# Week 4 Day 2: APIs

## Ethical web scraping

The phrase "data scraping" is colloquial and popular but has pejorative connotations. Data is valuable: other people invested time in collecting, organizing, and sharing it. When you show up demanding data with a scraper you built in maybe a dozen hours, you rarely pay the costs of labor, hosting, *etc*. that went into making the data available. There are *very* good rationales for making many kinds of data more available: reproducibility of scientific results, sharing publicly-funded and/or close-to-zero marginal cost resources, transparency and accountability in democratic institutions, remixing for innovative new analyses, *etc*.

But data breaches have become eponymous (Target in 2013, Equifax in 2017, Facebook in 2018, *etc*.) because they violate other values like privacy. These manifest most clearly in principles outlined in the 1978 [Belmont Report](https://en.wikipedia.org/wiki/Belmont_Report):
* **Respect for persons**: protecting the autonomy of all people and treating them with courtesy and respect and allowing for informed consent. Researchers must be truthful and conduct no deception;
* **Beneficence**: The philosophy of "Do no harm" while maximizing benefits for the research project and minimizing risks to the research subjects; and
* **Justice**: ensuring reasonable, non-exploitative, and well-considered procedures are administered fairly — the fair distribution of costs and benefits to potential research participants — and equally.

(A fourth principle, "Respect for Law and Public Interest," added by the 2012 Menlo Report, emphasizes compliance, accountability, and transparency in the conduct of research.)

In the context of data scraping, there are four "areas of difficulty":

* **Informed consent**: does the data scraper obtain consent from every person whose data is being retrieved?
* **Informational risk**: can the data scraper inflict economic, social, *etc*. harm on individuals by disclosing data?
* **Privacy**: does the data scraper know which information a person intended to be private or public? 
* **Decision-making under uncertainty**: does the data scraper know all the ways the data could be (mis)used? 

Ethical and legal risks involved with scraping:

* **[Copyright infringement](https://en.wikipedia.org/wiki/Copyright_infringement)**: compiling data that someone else can claim ownership over
* **[Trespass](https://en.wikipedia.org/wiki/Trespass_to_chattels#In_the_electronic_age)**: over-aggressive scraping shuts down someone else's property
* **[Computer Fraud & Abuse Act](https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act)**: misrepresenting yourself to access a system is "hacking"

While I cannot provide legal advice, we will revisit these concerns throughout the course through best practices for avoiding infringement, staggering data collection, simulating human requests, securing data, and protecting privacy.

James Densmore has a nice summary of [practices for ethical web scraping](https://towardsdatascience.com/ethics-in-web-scraping-b96b18136f01):

> * If you have a public API that provides the data I’m looking for, I’ll use it and avoid scraping all together.
> * I will always provide a User Agent string that makes my intentions clear and provides a way for you to contact me with questions or concerns.
> * I will request data at a reasonable rate. I will strive to never be confused for a DDoS attack.
> * I will only save the data I absolutely need from your page. If all I need it OpenGraph meta-data, that’s all I’ll keep.
> * I will respect any content I do keep. I’ll never pass it off as my own.
> * I will look for ways to return value to you. Maybe I can drive some (real) traffic to your site or credit you in an article or post.
> * I will respond in a timely fashion to your outreach and work with you towards a resolution.
> * I will scrape for the purpose of creating new value from the data, not to duplicate it.

Some other important components of ethical web scraping practices [include](http://robertorocha.info/on-the-ethics-of-web-scraping/):

* Read the Terms of Service and Privacy Policies for the site's rules on scraping.
* Inspect the robots.txt file for rules about what pages can be scraped, indexed, *etc*.
* Be gentle on smaller websites by running during off-peak hours and spacing out requests.
* Identify yourself by name and email in your User-Agent strings.

What does a robots.txt file look like? CNN's is a good example: it helpfully provides a sitemap so the robot can find other pages, allows all kinds of User-agents, and disallows crawling of pages in specific directories (ads, polls, tests).
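Python's standard library can read these rules for you. The sketch below parses a small, made-up robots.txt (the rules and the `example.com` URLs are illustrative assumptions, not CNN's actual file) and asks whether a given page may be fetched. `course-scraper` is a hypothetical User-agent name and `build_parser` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration (not copied from any real site).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /ads/
Disallow: /polls/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt):
    """Return a RobotFileParser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "course-scraper" is a hypothetical User-agent name.
print(parser.can_fetch("course-scraper", "https://example.com/ads/banner.html"))
print(parser.can_fetch("course-scraper", "https://example.com/news/story.html"))
```

For a live site, `parser.set_url(...)` followed by `parser.read()` downloads the file instead of parsing a string.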

![Should you build a scraper flowchart](http://www.storybench.org/wp-content/uploads/2016/04/flowchart_final.jpeg)

<!-- 

### What is an API?
An API is a communication tool. You, the ***User***, communicate with the ***Client***, the computer that sends the request to the ***Server***, the computer that responds to your request. 

The server is where the information you’re looking for is stored, and it’s what responds to your request. Information about the server appears in the documentation. The documentation will include the endpoints where specific data can be found as well as the structure of the data on the server. 

How you make a request depends on the API you are using, and that is where documentation such as the Wikipedia documentation comes into play. 

A core concept across APIs is an ***Endpoint***, a specific route or URL where an API can be accessed. Each endpoint corresponds to a particular function or data point that the API exposes for use. 

When interacting with an API, a **client** (the person or software using the API) will send **requests** to the **endpoint**. These requests tell the API what action to perform, and they often contain additional data or parameters to guide that action. Following a request, the API provides a **response**. This contains the data requested, or an error message detailing why the request couldn’t be completed.



<!-- JSON files are similar to dictionaries but they don't import as so-->

### import requests
One of the things we need in order to access an API is `requests`. This is a library that makes HTTP requests and retrieves data from a website's backend, such as an RSS feed. 

Documentation: https://requests.readthedocs.io/en/latest/

- `requests.get(url, params={key: value}, **kwargs)`: sends a GET request to the specified URL.
- `requests.get(url).json()`: decodes the JSON body of the response into Python dictionaries and lists.

Making a request with Requests is very simple. Begin by importing the Requests ... There's also a builtin JSON decoder. 

In [2]:
import requests
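
To see what the `params` argument does, the sketch below builds the same kind of URL by hand with the standard library's `urlencode`. This is only an offline illustration of the query string. `build_url` is a hypothetical helper, and in practice you can pass the dictionary straight to `requests.get(url, params=...)`.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Join a base URL and a dict of query parameters into one URL."""
    return base_url + "?" + urlencode(params)


# The TVMaze search URL used later in these notes, built from a dict.
print(build_url("https://api.tvmaze.com/search/shows", {"q": "girls"}))

# Spaces and other special characters are escaped for you.
print(build_url("https://api.tvmaze.com/search/shows", {"q": "the office"}))
```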

<!--  -->

## PokeApi
### We're going to try the PokeApi - https://pokeapi.co

This is an API where we can get a bunch of information about a given pokemon. 

--- try it on the UI

--- here are the pokemon options: https://pokeapi.co/api/v2/pokemon

the website you need to ping is: 

    "https://pokeapi.co/api/v2/pokemon/"+pokemonName


In [5]:
# Syntax:
# requests.get(url, params={key: value}, **kwargs) -- sends a GET request to the specified url

pokemonName = 'charmander' # the name of the pokemon you want to look up

pokemonSearch = requests.get("https://pokeapi.co/api/v2/pokemon/"+pokemonName).json()
# ^ this requests the endpoint for that pokemon and parses the JSON response into a dictionary

In [7]:
#import pprint so we can see it
import pprint

In [9]:
pprint.pprint(pokemonSearch)
# ^ this is the JSON response, parsed into a Python dictionary

{'abilities': [{'ability': {'name': 'blaze',
                            'url': 'https://pokeapi.co/api/v2/ability/66/'},
                'is_hidden': False,
                'slot': 1},
               {'ability': {'name': 'solar-power',
                            'url': 'https://pokeapi.co/api/v2/ability/94/'},
                'is_hidden': True,
                'slot': 3}],
 'base_experience': 62,
 'cries': {'latest': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/4.ogg',
           'legacy': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/4.ogg'},
 'forms': [{'name': 'charmander',
            'url': 'https://pokeapi.co/api/v2/pokemon-form/4/'}],
 'game_indices': [{'game_index': 176,
                   'version': {'name': 'red',
                               'url': 'https://pokeapi.co/api/v2/version/1/'}},
                  {'game_index': 176,
                   'version': {'name': 'blue',
                               'url': 'h

In [11]:
#write your own
#PokemonName = 'charmander'

# request info about the squirtle pokemon
pokemonSearch_squirtle = requests.get("https://pokeapi.co/api/v2/pokemon/"+"squirtle").json()

In [12]:
# pretty print 
pprint.pprint(pokemonSearch_squirtle)

{'abilities': [{'ability': {'name': 'torrent',
                            'url': 'https://pokeapi.co/api/v2/ability/67/'},
                'is_hidden': False,
                'slot': 1},
               {'ability': {'name': 'rain-dish',
                            'url': 'https://pokeapi.co/api/v2/ability/44/'},
                'is_hidden': True,
                'slot': 3}],
 'base_experience': 63,
 'cries': {'latest': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/7.ogg',
           'legacy': 'https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/7.ogg'},
 'forms': [{'name': 'squirtle',
            'url': 'https://pokeapi.co/api/v2/pokemon-form/7/'}],
 'game_indices': [{'game_index': 177,
                   'version': {'name': 'red',
                               'url': 'https://pokeapi.co/api/v2/version/1/'}},
                  {'game_index': 177,
                   'version': {'name': 'blue',
                               'url': 'htt

In [7]:
#get the keys of the dictionary
pokemonSearch.keys()

dict_keys(['abilities', 'base_experience', 'cries', 'forms', 'game_indices', 'height', 'held_items', 'id', 'is_default', 'location_area_encounters', 'moves', 'name', 'order', 'past_abilities', 'past_types', 'species', 'sprites', 'stats', 'types', 'weight'])

<!--  -->

## Exercise 1:

Write code to find the Pokemon's abilities, store them as a list, and then print them out. (Try a different pokemon.)

Hint: A Pokemon can have multiple abilities, so you'll need to iterate over them.

In [10]:
pokemonSearch_squirtle['abilities']


[{'ability': {'name': 'torrent',
   'url': 'https://pokeapi.co/api/v2/ability/67/'},
  'is_hidden': False,
  'slot': 1},
 {'ability': {'name': 'rain-dish',
   'url': 'https://pokeapi.co/api/v2/ability/44/'},
  'is_hidden': True,
  'slot': 3}]

In [17]:
abilities_list = []

# iterate through the list of squirtle's abilities
for entry in pokemonSearch_squirtle['abilities']:
    abilities_list.append(entry['ability']['name'])
print(abilities_list)

['torrent', 'rain-dish']
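
The same loop can be written as a list comprehension. The sketch below wraps it in a hypothetical helper, `ability_names`, and runs it on a trimmed copy of the squirtle response shown above so it works without a network call. The `include_hidden` flag is an addition for illustration.

```python
def ability_names(pokemon, include_hidden=True):
    """Return the ability names from a PokeAPI pokemon dictionary."""
    return [
        entry["ability"]["name"]
        for entry in pokemon["abilities"]
        if include_hidden or not entry["is_hidden"]
    ]


# Trimmed copy of the squirtle response printed above.
squirtle = {
    "abilities": [
        {"ability": {"name": "torrent"}, "is_hidden": False, "slot": 1},
        {"ability": {"name": "rain-dish"}, "is_hidden": True, "slot": 3},
    ]
}

print(ability_names(squirtle))                        # ['torrent', 'rain-dish']
print(ability_names(squirtle, include_hidden=False))  # ['torrent']
```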


<!--  -->

## TVMaze

https://www.tvmaze.com/api#show-search

Note: Futurama is show 538.

In [18]:
# make a query for the show 'girls'
# TVMaze is an API that lets you search for TV shows by name

requests.get('https://api.tvmaze.com/search/shows?q=girls')
# baseURL?q=girls
# q=girls is the search term and will return info about shows matching "girls"

<Response [200]>

In [19]:
# make a query for the show 'girls'
girlsQ = requests.get('https://api.tvmaze.com/search/shows?q=girls').json()
girlsQ

[{'score': 0.9049741,
  'show': {'id': 139,
   'url': 'https://www.tvmaze.com/shows/139/girls',
   'name': 'Girls',
   'type': 'Scripted',
   'language': 'English',
   'genres': ['Drama', 'Romance'],
   'status': 'Ended',
   'runtime': 30,
   'averageRuntime': 30,
   'premiered': '2012-04-15',
   'ended': '2017-04-16',
   'officialSite': 'http://www.hbo.com/girls',
   'schedule': {'time': '22:00', 'days': ['Sunday']},
   'rating': {'average': 6.4},
   'weight': 98,
   'network': {'id': 8,
    'name': 'HBO',
    'country': {'name': 'United States',
     'code': 'US',
     'timezone': 'America/New_York'},
    'officialSite': 'https://www.hbo.com/'},
   'webChannel': None,
   'dvdCountry': None,
   'externals': {'tvrage': 30124, 'thetvdb': 220411, 'imdb': 'tt1723816'},
   'image': {'medium': 'https://static.tvmaze.com/uploads/images/medium_portrait/31/78286.jpg',
    'original': 'https://static.tvmaze.com/uploads/images/original_untouched/31/78286.jpg'},
   'summary': '<p>This Emmy winnin

In [21]:
# look up futurama by name to find its show number
# make a new request
futuramaQ = requests.get('https://api.tvmaze.com/search/shows?q=futurama').json()
len(futuramaQ)

1

In [19]:
len(futuramaQ) 
futuramaQ[0]['show']['id'] # show id of futurama

538
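
Indexing with `[0]` raises an `IndexError` when a search returns no matches. Here is a minimal sketch of a safer lookup, assuming the result structure shown above (a list of dictionaries with `score` and `show` keys). `first_show_id` is a hypothetical helper name, and the sample list is a trimmed copy of the 'girls' result.

```python
def first_show_id(results):
    """Return the id of the first search result, or None if there are none."""
    if not results:
        return None
    return results[0]["show"]["id"]


# Trimmed copy of the 'girls' search result shown above.
girls_results = [{"score": 0.9049741, "show": {"id": 139, "name": "Girls"}}]

print(first_show_id(girls_results))  # 139
print(first_show_id([]))             # None
```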

<!--  -->

### APIs with keys

An application programming interface (API) key is a code used to identify and authenticate an application or user. API keys are available through platforms, such as a white-labeled internal marketplace. They also act as a unique identifier and provide a secret token for authentication purposes.

#### What is an API query?
Parameters are the variables passed to an API endpoint to provide explicit instructions for the API server to process. They can be included as part of the API request in the URL query string or in the request body.

![how-to-use-an-api-just-the-basics-4.png](attachment:how-to-use-an-api-just-the-basics-4.png)
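
Because a key identifies you, it is usually safer to keep it out of the notebook itself. One common approach, sketched here, reads the key from an environment variable. The variable name `MY_API_KEY` and the helpers `load_api_key` and `with_key` are assumptions for illustration, not part of any API. The `api_key` parameter name matches the Last.fm URLs used in these notes, and other services may expect a different name or a header.

```python
import os


def load_api_key(env_var="MY_API_KEY"):
    """Read an API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def with_key(params, api_key):
    """Return a copy of the query parameters with the key added."""
    return {**params, "api_key": api_key}


# Demo only: set a placeholder so the example runs on its own.
os.environ.setdefault("MY_API_KEY", "placeholder-key")
print(with_key({"format": "json"}, load_api_key()))
```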


<!--  -->

## Last.fm Music Discovery API

The Last.fm API allows anyone to build their own programs using Last.fm data.

- https://www.last.fm/api


In [None]:
# keys
# 

In [23]:
#this is a key
aKey = "815f527f75d594aa272fc6c9205136b2"

#the api root information is found here - https://www.last.fm/api/intro
rootURL = "http://ws.audioscrobbler.com/2.0/"

# write a query
artistSearchQuery = requests.get(rootURL+"?method=artist.search&artist=eminem&api_key="+
                          aKey+"&format=json").json()
# params:
    # method=artist.search - tells the API we want to search artists
    # artist=eminem - tells the API the search term is eminem
    # api_key=... - identifies us to the API
    # format=json - asks for the response as JSON

In [25]:
pprint.pprint(artistSearchQuery)

{'results': {'@attr': {'for': 'eminem'},
             'artistmatches': {'artist': [{'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
                                                      'size': 'small'},
                                                     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
                                                      'size': 'medium'},
                                                     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
                                                      'size': 'large'},
                                                     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
                                                      'size': 'extralarge'},
                                                     {'#text': 'https://lastfm.f

In [30]:
# find the top album for eminem
    #artistSearchQuery['results'].keys()
    #artistSearchQuery['results']['artistmatches'].keys()

# 1 - make a request for all of eminem's top albums
topAlbumSearchQuery = requests.get(rootURL+"?method=artist.getTopAlbums&artist=eminem&api_key="+
                          aKey+"&format=json").json()


In [31]:
topAlbumSearchQuery['topalbums']['album'][0]['name'] # 2 - index the first item to get the top album

'The Eminem Show'

In [25]:
artistSearchQuery['results'] # this is a dictionary

{'opensearch:Query': {'#text': '',
  'role': 'request',
  'searchTerms': 'eminem',
  'startPage': '1'},
 'opensearch:totalResults': '116575',
 'opensearch:startIndex': '0',
 'opensearch:itemsPerPage': '30',
 'artistmatches': {'artist': [{'name': 'Eminem',
    'listeners': '7058898',
    'mbid': 'b95ce3ff-3d05-4e87-9e01-c97b66af13d4',
    'url': 'https://www.last.fm/music/Eminem',
    'streamable': '0',
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'extralarge'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821

In [26]:
artistSearchQuery['results'].keys() 

dict_keys(['opensearch:Query', 'opensearch:totalResults', 'opensearch:startIndex', 'opensearch:itemsPerPage', 'artistmatches', '@attr'])

In [27]:
artistSearchQuery['results']['artistmatches'] # this is w/in the results dictionary 

{'artist': [{'name': 'Eminem',
   'listeners': '7058898',
   'mbid': 'b95ce3ff-3d05-4e87-9e01-c97b66af13d4',
   'url': 'https://www.last.fm/music/Eminem',
   'streamable': '0',
   'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
     'size': 'small'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
     'size': 'medium'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
     'size': 'large'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
     'size': 'extralarge'},
    {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
     'size': 'mega'}]},
  {'name': 'Eminem, Rihanna',
   'listeners': '21524',
   'mbid': '',
   'url': 'https://www.last.fm/music/Eminem,+Rihanna',
   'streamable': '0',
   'image': [{'#text': '', 'size': 'small'},
    {'#text

In [32]:
# get eminem's top albums

topalbum_artistSearchQuery = requests.get(rootURL+"?method=artist.getTopAlbums&artist=eminem&api_key="+
                          aKey+"&format=json").json() # method=artist.getTopAlbums is what we changed
topalbum_artistSearchQuery

{'topalbums': {'album': [{'name': 'The Eminem Show',
    'playcount': 66987229,
    'mbid': 'af71f60c-a8e8-4774-a2b3-30dbfaa13bd6',
    'url': 'https://www.last.fm/music/Eminem/The+Eminem+Show',
    'artist': {'name': 'Eminem',
     'mbid': 'b95ce3ff-3d05-4e87-9e01-c97b66af13d4',
     'url': 'https://www.last.fm/music/Eminem'},
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/74768435b4f70689863aa76f888d62a3.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/74768435b4f70689863aa76f888d62a3.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/74768435b4f70689863aa76f888d62a3.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/74768435b4f70689863aa76f888d62a3.png',
      'size': 'extralarge'}]},
   {'name': 'Recovery',
    'playcount': 44846260,
    'mbid': 'dddf01df-f9f1-4ba6-b414-5ddf1984fc7f',
    'url': 'https://www.last.fm/music/Eminem/Recovery',
    '

In [35]:
# get first album
topalbum_artistSearchQuery['topalbums']['album'][0]['name']

'The Eminem Show'

### Exercise 2

Make a search query for your own favorite artist.

In [32]:
# Exercise: make a search query for your own favorite artist and get their top tracks 

# 1 - make request
topTracksSearchQuery = requests.get(rootURL+"?method=artist.getTopTracks&artist=RedHotChiliPeppers&api_key="+
                          aKey+"&format=json").json() #?method=artist.getTopTracks --what we changed 
topTracksSearchQuery

{'toptracks': {'track': [{'name': 'Californication',
    'playcount': '22799920',
    'listeners': '2630466',
    'mbid': '084a24a9-b289-4584-9fb5-1ca0f7500eb3',
    'url': 'https://www.last.fm/music/Red+Hot+Chili+Peppers/_/Californication',
    'streamable': '0',
    'artist': {'name': 'Red Hot Chili Peppers',
     'mbid': '8bfac288-ccc5-448d-9573-c33ea2aa5c30',
     'url': 'https://www.last.fm/music/Red+Hot+Chili+Peppers'},
    'image': [{'#text': 'https://lastfm.freetls.fastly.net/i/u/34s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'small'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/64s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'medium'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/174s/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'large'},
     {'#text': 'https://lastfm.freetls.fastly.net/i/u/300x300/2a96cbd8b46e442fc41c2b86b821562f.png',
      'size': 'extralarge'}],
    '@attr': {'rank': '1'}},
   {'name': 'Under the Bridge',
    

In [33]:
# 2 - get top tracks 
topTracksSearchQuery['toptracks']['track'][0]['name']

'Californication'

In [38]:
# print top 5 top tracks 

# 1 - make empty list for tracks
myList = []

# 2 - iterate through the tracks in the request
for track in topTracksSearchQuery['toptracks']['track']:
    myList.append(track['name'])

# 3 - get only top 5 
print(myList[:5])

['Californication', 'Under the Bridge', 'Scar Tissue', "Can't Stop", 'Otherside']
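
Note that this response stores counts as strings (`'playcount': '22799920'`), so they need converting before any arithmetic or sorting. Here is a small sketch using a trimmed copy of the first track above. `track_stats` is a hypothetical helper.

```python
def track_stats(track):
    """Return a track's name with its counts converted from strings to ints."""
    return {
        "name": track["name"],
        "playcount": int(track["playcount"]),
        "listeners": int(track["listeners"]),
    }


# Trimmed copy of the first track in the response above.
californication = {
    "name": "Californication",
    "playcount": "22799920",
    "listeners": "2630466",
}

stats = track_stats(californication)
print(stats["playcount"] // stats["listeners"])  # plays per listener, rounded down
```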


## National Park Service API

https://www.nps.gov/subjects/developer/index.htm

In [40]:
# authentication
#https://www.nps.gov/subjects/developer/guides.html

# set base url and authentication
apiKey = "lLqHaJEKm2wfhbVIZVVSPrUxkBxWKbDj0GcgplEk" 

# request call
baseURL = "https://developer.nps.gov/api/v1"
HEADERS = {"X-Api-Key":apiKey} 
request = requests.get(baseURL+"/campgrounds",headers=HEADERS).json()

# the API key is sent in a request header (X-Api-Key), not as a URL parameter

In [41]:
pprint.pprint(request)

{'data': [{'accessibility': {'accessRoads': ['Paved Roads - All vehicles OK'],
                             'adaInfo': 'The main road leading to the '
                                        'campground is paved but the road that '
                                        'goes to each campsite is not.',
                             'additionalInfo': '',
                             'cellPhoneInfo': '',
                             'classifications': ['Limited Development '
                                                 'Campground'],
                             'fireStovePolicy': 'Ground fires are not '
                                                'permitted. Each campsite has '
                                                'a grill.',
                             'internetInfo': '',
                             'rvAllowed': '1',
                             'rvInfo': 'RV and Trailers are permitted',
                             'rvMaxLength': '0',
                             

In [61]:
# get park code amis


# request call
# get all parks 
baseURL = "https://developer.nps.gov/api/v1"
HEADERS = {"X-Api-Key":apiKey} 
request = requests.get(baseURL+"/parks",headers=HEADERS).json()

In [62]:
request['data'][0]['parkCode']

'abli'

In [63]:
pprint.pprint(request)

# iterate through the first half of the parks
for i in range(25):
    
    # checking each park for its park code
    if request['data'][i]['parkCode'] == "amis":
        print(i)
        
        # print all the data for that park if its park code is "amis"
        print(request['data'][i])

{'data': [{'activities': [{'id': '13A57703-BB1A-41A2-94B8-53B692EB7238',
                           'name': 'Astronomy'},
                          {'id': 'D37A0003-8317-4F04-8FB0-4CF0A272E195',
                           'name': 'Stargazing'},
                          {'id': '1DFACD97-1B9C-4F5A-80F2-05593604799E',
                           'name': 'Food'},
                          {'id': 'C6D3230A-2CEA-4AFE-BFF3-DC1E2C2C4BB4',
                           'name': 'Picnicking'},
                          {'id': 'B33DC9B6-0B7D-4322-BAD7-A13A34C584A3',
                           'name': 'Guided Tours'},
                          {'id': 'A0631906-9672-4583-91DE-113B93DB6B6E',
                           'name': 'Self-Guided Tours - Walking'},
                          {'id': '42FD78B9-2B90-4AA9-BC43-F10E9FEA8B5A',
                           'name': 'Hands-On'},
                          {'id': 'DF4A35E0-7983-4A3E-BC47-F37B872B0F25',
                           'name': 'Junior Ranger Progra

In [64]:
# print all the activities you can do in the park
for activity in request['data'][15]['activities']:
    print(activity['name'])

Boating
Motorized Boating
Sailing
Jet Skiing
Camping
Backcountry Camping
Canoe or Kayak Camping
Car or Front Country Camping
Group Camping
RV Camping
Fishing
Freshwater Fishing
Hiking
Front-Country Hiking
Hunting and Gathering
Hunting
Paddling
Canoeing
Kayaking
Junior Ranger Program
SCUBA Diving
Swimming
Freshwater Swimming
Water Skiing
Wildlife Watching
Birdwatching
Park Film
Shopping
Bookstore and Park Store
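
Rather than looping over index numbers, the park can be looked up directly by its code. Here is a minimal sketch, assuming the response structure shown above (a `data` list of parks, each with `parkCode` and `activities`). `find_park` is a hypothetical helper, and the sample is a heavily trimmed stand-in for the real response.

```python
def find_park(response, park_code):
    """Return the first park whose parkCode matches, or None if absent."""
    return next(
        (park for park in response["data"] if park["parkCode"] == park_code),
        None,
    )


# Heavily trimmed stand-in for the /parks response.
sample = {
    "data": [
        {"parkCode": "abli", "activities": [{"name": "Astronomy"}]},
        {"parkCode": "amis", "activities": [{"name": "Boating"}, {"name": "Camping"}]},
    ]
}

park = find_park(sample, "amis")
print([activity["name"] for activity in park["activities"]])  # ['Boating', 'Camping']
print(find_park(sample, "zzzz"))                               # None
```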


## Making Functions

## Wikipedia API

In [67]:
## Run this cell block
baseURL = "https://en.wikipedia.org/w/api.php" #base URL for the wikipedia API

In [68]:
search = 'microsoft' #the parameter we'll search for

testRequest = requests.get(baseURL+"?action=query&list=search&srsearch="+search+"&format=json").json() #our json request

In [69]:
testRequest

{'batchcomplete': '',
 'continue': {'sroffset': 10, 'continue': '-||'},
 'query': {'searchinfo': {'totalhits': 43596},
  'search': [{'ns': 0,
    'title': 'Microsoft',
    'pageid': 19001,
    'size': 228960,
    'wordcount': 19724,
    'snippet': '<span class="searchmatch">Microsoft</span> Corporation is an American multinational corporation and technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the',
    'timestamp': '2025-06-27T12:10:23Z'},
   {'ns': 0,
    'title': 'Microsoft Excel',
    'pageid': 20268,
    'size': 103566,
    'wordcount': 9663,
    'snippet': '<span class="searchmatch">Microsoft</span> Excel is a spreadsheet editor developed by <span class="searchmatch">Microsoft</span> for Windows, macOS, Android, iOS and iPadOS. It features calculation or computation capabilities',
    'timestamp': '2025-06-17T06:47:53Z'},
   {'ns': 0,
    'title': 'Microsoft Office',
    'pageid': 20288,
    'size': 198060,
    'wordcount': 16053,
    'snippet': 'Bi

In [71]:
# define a function
# base URL: https://en.wikipedia.org/w/api.php

# will request the wiki api to search for pages matching a term
# search is the word you want to look up
def wikiCall(search):
    # define the baseURL
    baseURL = "https://en.wikipedia.org/w/api.php" 
    
    # build the query
    q = requests.get(baseURL+"?action=query&list=search&srsearch="+search+"&format=json").json() #our json request
        # ?action=query - tells API you want to do a query
        # &list=search - tells it you want to search for pages
        # &srsearch=___ - tells it this is the search term you are looking for 
        # &format=json - asks for the response as JSON
    return q


In [75]:
wikiCall("Boulder")

{'batchcomplete': '',
 'continue': {'sroffset': 10, 'continue': '-||'},
 'query': {'searchinfo': {'totalhits': 35695,
   'suggestion': 'bolder',
   'suggestionsnippet': 'bolder'},
  'search': [{'ns': 0,
    'title': 'Boulder',
    'pageid': 60784,
    'size': 3827,
    'wordcount': 327,
    'snippet': 'In geology, a <span class="searchmatch">boulder</span> (or rarely bowlder) is a rock fragment with size greater than 25.6\xa0cm (10.1\xa0in) in diameter. Smaller pieces are called cobbles and',
    'timestamp': '2025-05-31T07:03:20Z'},
   {'ns': 0,
    'title': 'Boulder, Colorado',
    'pageid': 94341,
    'size': 101310,
    'wordcount': 8275,
    'snippet': '<span class="searchmatch">Boulder</span> is a home rule city in <span class="searchmatch">Boulder</span> County, Colorado, United States, and its county seat. With a population of 108,250 at the 2020 census, it is the',
    'timestamp': '2025-06-24T18:22:05Z'},
   {'ns': 0,
    'title': '2021 Boulder shooting',
    'pageid': 671831

In [69]:
wikiCall("Dog") # pings the api

{'batchcomplete': '',
 'continue': {'sroffset': 10, 'continue': '-||'},
 'query': {'searchinfo': {'totalhits': 124204,
   'suggestion': 'do',
   'suggestionsnippet': 'do'},
  'search': [{'ns': 0,
    'title': 'Dog',
    'pageid': 4269567,
    'size': 192447,
    'wordcount': 17696,
    'snippet': 'The <span class="searchmatch">dog</span> (Canis familiaris or Canis lupus familiaris) is a domesticated descendant of the gray wolf. Also called the domestic <span class="searchmatch">dog</span>, it was selectively bred',
    'timestamp': '2025-06-21T22:30:22Z'},
   {'ns': 0,
    'title': 'Dog (disambiguation)',
    'pageid': 2854454,
    'size': 7831,
    'wordcount': 1059,
    'snippet': 'Look up <span class="searchmatch">dog</span>, doggy, or doggie in Wiktionary, the free dictionary. The <span class="searchmatch">dog</span> is a domesticated canid species, Canis familiaris. <span class="searchmatch">Dog</span>(s), doggy, or doggie may',
    'timestamp': '2025-03-16T12:09:44Z'},
   {'ns': 

### Exercise 3: 

Call the same API and pass another variable as a search term.

In [79]:
wikiCall("Chicago")

{'batchcomplete': '',
 'continue': {'sroffset': 10, 'continue': '-||'},
 'query': {'searchinfo': {'totalhits': 325000},
  'search': [{'ns': 0,
    'title': 'Chicago',
    'pageid': 6886,
    'size': 265332,
    'wordcount': 22806,
    'snippet': '<span class="searchmatch">Chicago</span> is the most populous city in the U.S. state of Illinois and in the Midwestern United States. With a population of 2,746,388, as of the 2020 census',
    'timestamp': '2025-06-28T06:01:04Z'},
   {'ns': 0,
    'title': 'South Chicago, Chicago',
    'pageid': 1422349,
    'size': 29082,
    'wordcount': 2884,
    'snippet': 'South <span class="searchmatch">Chicago</span>, formerly known as Ainsworth, is one of the 77 community areas of <span class="searchmatch">Chicago</span>, Illinois. This chevron-shaped community is one of <span class="searchmatch">Chicago&#039;s</span> 16 lakefront',
    'timestamp': '2025-05-16T12:59:03Z'},
   {'ns': 0,
    'title': 'Chicago Lawn, Chicago',
    'pageid': 1892224,
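
The `snippet` field in these results comes back with HTML markup (`<span class="searchmatch">`) and escaped characters such as `&#039;`. Here is a rough sketch for turning it into plain text with the standard library. `clean_snippet` is a hypothetical helper, and the regular expression is a simple approach that assumes well-formed tags like the ones above.

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <span ...> or </span>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_snippet(snippet):
    """Remove HTML tags and decode HTML entities in a search snippet."""
    return html.unescape(TAG_PATTERN.sub("", snippet))


raw = 'one of <span class="searchmatch">Chicago&#039;s</span> 16 lakefront'
print(clean_snippet(raw))  # one of Chicago's 16 lakefront
```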
    

### Exercise 4: 
Create a function that searches for articles like in the previous two questions, but this time return the article with the highest word count. This should return the title, pageid, snippet and wordcount from the article with the highest word count from your search.

In [88]:
# define a function
# base URL: https://en.wikipedia.org/w/api.php
def wikiCall_highestWordCount(search):

    # make counter to track highest word count found so far 
    wordCounter = 0
    
    # define the baseURL
    baseURL = "https://en.wikipedia.org/w/api.php" 
    
    # build a query
    q = requests.get(baseURL+"?action=query&list=search&srsearch="+search+"&format=json").json()
    
    # loop through each search result in the JSON
        # q['query']['search'] - the list of articles
    for article in q['query']['search']: 
        # get word count for article and assign to variable
        wordCount = article['wordcount']

        # if this article's word count beats the highest so far, remember it
        if wordCounter < wordCount:
            wordCounter = wordCount
            
            title = article['title']
            pageid = article['pageid']
            snippet = article['snippet']
            print(title)
            print(wordCount)
    return "The article with the highest count is: " + title
   


In [89]:
wikiCall_highestWordCount("dog")

Dog
17696


'The article with the highest count is: Dog'
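
An alternative sketch returns the four requested fields rather than a sentence. It works on the list found at `q['query']['search']`, uses the built-in `max` with a `key` function, and returns `None` for an empty search. `highest_word_count` is a hypothetical name, and the sample is trimmed from the "Dog" results above with shortened snippets.

```python
def highest_word_count(articles):
    """Return title, pageid, snippet and wordcount of the longest article."""
    if not articles:
        return None
    best = max(articles, key=lambda article: article["wordcount"])
    return {field: best[field] for field in ("title", "pageid", "snippet", "wordcount")}


# Trimmed from the "Dog" search results above (snippets shortened).
dog_results = [
    {"title": "Dog", "pageid": 4269567, "wordcount": 17696, "snippet": "The dog ..."},
    {"title": "Dog (disambiguation)", "pageid": 2854454, "wordcount": 1059, "snippet": "Look up dog ..."},
]

print(highest_word_count(dog_results)["title"])  # Dog
```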