# An API explainer notebook

We use this notebook to explain APIs. APIs make it easy for us to download data from webpages in a format that we can work with. Let's see how!


Let's say we want to download the Wikipedia page on J. Robert Oppenheimer: https://en.wikipedia.org/wiki/J._Robert_Oppenheimer

How can we download a webpage? If we ask the web for help, a [top reply](https://stackoverflow.com/questions/22676/how-to-download-a-file-over-http) suggests that the following code should work:

In [1]:
import urllib.request

url = 'https://en.wikipedia.org/wiki/J._Robert_Oppenheimer'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8')

> Check that `url` holds the Wikipedia address we would like to download, then run the cell

> Now, print the _text_ variable below

In [2]:
text



There is some structure in this text (it is the raw HTML of the page), but it is difficult to see. If we want to work with what we just downloaded, we will spend _a lot_ of time cleaning the data.

APIs let us interact with webpages in a _structured_ fashion. Let's use the Wikipedia API to get the page we are interested in.

We can get the page in an easy-to-work-with format by requesting a new address built from a few base ingredients (see [the API quick start guide](https://www.mediawiki.org/wiki/API:Main_page)): the API _baseurl_, an _action_, a _data format_, and more:

In [3]:
baseurl = "https://en.wikipedia.org/w/api.php?"
action = "action=query"
title = "titles=J._Robert_Oppenheimer"
content = "prop=revisions&rvprop=content"
dataformat = "format=json"

query = "{}{}&{}&{}&{}".format(baseurl, action, content, title, dataformat)
print(query)

https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=J._Robert_Oppenheimer&format=json
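
The same address can also be put together with `urllib.parse.urlencode` from the standard library, which joins the ingredients with `&` and escapes characters that are not safe in a URL. This is only a sketch of an alternative to the string formatting above; the names `api_url`, `params` and `query_encoded` are made up for this example.

```python
from urllib.parse import urlencode

# Base address of the English Wikipedia API (like `baseurl` above, without the trailing "?").
api_url = "https://en.wikipedia.org/w/api.php"

# The same ingredients as above, written as a dictionary of parameter names and values.
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "titles": "J._Robert_Oppenheimer",
    "format": "json",
}

# urlencode joins the pairs with "&" and escapes characters that are not URL-safe.
query_encoded = api_url + "?" + urlencode(params)
print(query_encoded)
```

For this particular title nothing needs escaping, so the printed address should match the one above. The dictionary form starts to pay off when a title contains spaces or other special characters.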


> Try following the _query_ link. This is a webpage, but structured in a way that makes it easy for us to work with when we download it with Python.
> Explore the structure of the page. How do you get to the actual content of the page?

Now, let's download the nicely-structured page with Python. We do exactly what we did at the top of this notebook.

In [4]:
wikiresponse = urllib.request.urlopen(query)
wikidata = wikiresponse.read()
wikitext = wikidata.decode('utf-8')
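
If the download above is refused or throttled, one thing to try is sending a descriptive `User-Agent` header, which Wikimedia asks API clients to do. The sketch below wraps the same `urllib.request` calls in two small helper functions. The function names and the User-Agent string are placeholders invented for this example, not something the API prescribes.

```python
import urllib.request

def make_request(url, user_agent="api-explainer-notebook/0.1 (learning exercise)"):
    """Wrap a URL in a Request object that carries a descriptive User-Agent header."""
    # The default user_agent string is a placeholder; replace it with your own description.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def download_text(url, timeout=30):
    """Download a URL and return the response body decoded as UTF-8 text."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")

# Usage (needs network access):
# wikitext = download_text(query)
```

The `with` statement closes the connection once the body has been read, and `timeout` keeps the cell from hanging forever if the server does not answer.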

In [7]:
import json
json.loads(wikitext)

{'batchcomplete': '',
 'warnings': {'revisions': {'*': 'Because "rvslots" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used.'}},
 'query': {'normalized': [{'from': 'J._Robert_Oppenheimer',
    'to': 'J. Robert Oppenheimer'}],
  'pages': {'39034': {'pageid': 39034,
    'ns': 0,
    'title': 'J. Robert Oppenheimer',
    'revisions': [{'contentformat': 'text/x-wiki',
      'contentmodel': 'wikitext',

This might not look much better than what we had at first. But what we have now is a Python dictionary with the same nested structure as the page provided by the API. See:

In [8]:
wikijson = json.loads(wikitext)  # parse the JSON text once and reuse the result
print("keys:", wikijson.keys())
print("one level deeper:", wikijson["query"])
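
Printing a whole level of the dictionary quickly fills the screen, because the page content is one very long string. A small helper that lists only the keys and the type of each value can make exploring easier. The function `outline` and the `sample` dictionary below are made up for this example; `sample` is a tiny stand-in, not real API output.

```python
def outline(obj, indent=0, max_depth=4):
    """Return lines describing the nested keys of dictionaries and lists, without the long values."""
    lines = []
    pad = "  " * indent
    if indent >= max_depth:
        return lines  # stop descending once we are deep enough
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append("{}{!r}: {}".format(pad, key, type(value).__name__))
            lines.extend(outline(value, indent + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only look at the first element; lists from an API usually hold items of the same shape.
        lines.append("{}[0]: {}".format(pad, type(obj[0]).__name__))
        lines.extend(outline(obj[0], indent + 1, max_depth))
    return lines

# A tiny made-up dictionary, used here only to show what the outline looks like.
sample = {"query": {"pages": {"1": {"title": "Example", "revisions": [{"contentmodel": "wikitext"}]}}}}
print("\n".join(outline(sample, max_depth=6)))
```

To use it on the downloaded page, try `print("\n".join(outline(json.loads(wikitext), max_depth=6)))`.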



> Now explore the dictionary structure. Can you find the page content again?

> Also download the source for your four favorite Wikipedia pages and explore their structure.
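
For several pages, one option is to build one query per title. Another is to ask for all of them at once: the `titles` parameter accepts several titles separated by `|`. The sketch below builds such an address. The function name `build_query` and the second title are placeholders chosen for this example, and it assumes we only want the latest revision of each page.

```python
from urllib.parse import urlencode

def build_query(titles):
    """Build one API address that asks for the latest wikitext of several pages."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": "|".join(titles),  # several titles are separated by "|"
        "format": "json",
    }
    # urlencode escapes the "|" separator as %7C.
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

# Placeholder titles; swap in your own favorites.
favorites = ["J._Robert_Oppenheimer", "Manhattan_Project"]
print(build_query(favorites))
```

The resulting address can be downloaded and parsed exactly like `query` above. The `pages` part of the dictionary should then hold one entry per title.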

To sum up: 
- The web has _a lot_ of content that could be cool to work with. 
- APIs make it possible for us to download content in a structure that we can work with.
- APIs are great!