## Interacting with APIs

Python is a useful tool for interacting with Application Programming Interfaces (APIs). An API is basically a way for two programs to communicate with one another: one program might be a Python script, the other a database on the web.

Much of the modern web runs on APIs. Consider the National Weather Service data API, which many weather apps rely on to some extent.

"Modern" APIs are usually [REST](https://www.google.com/search?q=rest+api&rlz=1C1GCEA_enUS886US886&oq=rest+api&aqs=chrome..69i57j0l5j69i60l2.2887j0j4&sourceid=chrome&ie=UTF-8). Older ones are often [SOAP](https://www.google.com/search?q=soap+api&rlz=1C1GCEA_enUS886US886&oq=soap+api&aqs=chrome..69i57j69i59j35i39j0j46j0j69i60j69i61.1352j0j9&sourceid=chrome&ie=UTF-8). REST primarily exchanges data in JSON format (but can also do XML). SOAP exchanges data in XML. 

REST APIs are *FAR* easier to work with in Python. This is because you can run a query simply with a URL. For example: https://api.crossref.org/works/10.1038/nature02847. The API lives at the base URL, which is https://api.crossref.org/works/. The query parameter in this case is simply the DOI of an article tacked on to the end. But, it can be more complex with multiple parameters. 
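When a query does need several parameters, they usually follow a `?` as `key=value` pairs joined by `&`. Here is a minimal sketch using the standard library's `urllib.parse.urlencode`. The parameter names `query` and `rows` and the search text are assumptions for illustration, so check the API documentation before relying on them:

```python
from urllib.parse import urlencode

base_url = 'https://api.crossref.org/works'

# Parameter names and values are assumed for illustration only.
params = {'query': 'mars climate', 'rows': 5}

# urlencode() joins the pairs with '&' and escapes characters such as spaces.
query = base_url + '?' + urlencode(params)
print(query)
```

Letting `urlencode` do the escaping avoids hand-building strings that break on spaces or punctuation.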

*Tip: if you use Chrome as your browser, there is an extension named JSON Viewer that formats JSON in your browser to make it easy to read. It's a must for this stuff.*

Forming the REST query is simple. Just take a string that represents your query parameters and concatenate it to the base URL. Then you can send the message. If you have a list that represents your query parameters, then you can do this in a loop and send them off one at a time. We'll walk through this below.
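As a quick sketch of that idea, the loop below only builds the query URLs and sends nothing yet. The variable names are chosen here for illustration:

```python
base_url = 'https://api.crossref.org/works/'

# A short list of DOIs to look up.
dois = ['10.1038/nature02847', '10.1002/2016JE005244']

# Build one query URL per DOI by concatenating it onto the base URL.
queries = []
for doi in dois:
    queries.append(base_url + doi)

for query in queries:
    print(query)
```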

SOAP APIs work with XML. Rather than sending a query formed as a URL, a SOAP API only accepts queries that are formatted as XML. So, you send an XML message to the API's server and it returns an XML message. In SOAP parlance, you would call this an envelope. You send your query as an XML SOAP envelope, and the response comes back as XML in a SOAP envelope. 

Forming a SOAP query in an XML envelope is a little more challenging. Fortunately, there are some Python libraries that make this easier. I have used the library SUDS. SUDS basically takes your parameters, forms the XML envelope behind the scenes, and sends the message.
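To give a rough idea of what such an envelope looks like, here is a sketch that builds one by hand with the standard library. This is not what SUDS does internally. The operation name `GetRecord` and the `id` element are hypothetical, since every real SOAP service defines its own operations:

```python
import xml.etree.ElementTree as ET

# Standard namespace for SOAP 1.1 envelopes.
SOAP_NS = 'http://schemas.xmlsoap.org/soap/envelope/'
ET.register_namespace('soap', SOAP_NS)

# Envelope > Body > operation. 'GetRecord' and 'id' are hypothetical names.
envelope = ET.Element('{%s}Envelope' % SOAP_NS)
body = ET.SubElement(envelope, '{%s}Body' % SOAP_NS)
operation = ET.SubElement(body, 'GetRecord')
ET.SubElement(operation, 'id').text = '10.1038/nature02847'

# Serialize the tree to the XML text that would be sent to the server.
xml_message = ET.tostring(envelope, encoding='unicode')
print(xml_message)
```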

#### We can tackle SOAP later. This notebook covers REST. Know that the general concept is the same.

The process is:

- Form the query
- Send the message
- Parse the XML or JSON response
- Write the data into a format that makes sense for us hoomans to read, like a CSV
    
To send the queries, we'll use the [requests](https://requests.readthedocs.io/en/master/) library. Then, we'll use json and csv to work with the response. Aside from sending the request, everything covered here was introduced in the earlier notebooks.
    
    >>> import requests
    >>> import json
    >>> import csv

### We'll query the CrossRef API. 

This API has loads of different search options. See the full documentation [here](https://github.com/CrossRef/rest-api-doc).

We'll keep it simple and ask it for information about individual papers via the papers' DOIs.

Start by creating a string variable of the base url:
    
    >>> url = 'https://api.crossref.org/works/'

#### We'll start with just one paper.

    >>> doi = '10.1038/nature02847'

#### Now we'll concatenate these two together to form our query:

This is super easy!

    >>> query = url + doi

#### print it...

    >>> print(query)

#### Cool! Jupyter is smart enough to recognize it is a link and displays it as such.

#### Next, we'll use the requests function get() to retrieve the url:

    >>> requests.get(query)

#### Response 200 means it worked. 404 or something else means it went wrong.

What we need to do here is store this in a variable:

    >>> response = requests.get(query)
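With the response stored, its status code is available as `response.status_code`. Here is a small sketch of checking it in code rather than by eye. The helper `request_succeeded` is a hypothetical name, and the loop uses plain numbers so it runs without sending anything:

```python
from http import HTTPStatus

def request_succeeded(status_code):
    """Return True for any 2xx status code (hypothetical helper)."""
    return 200 <= status_code < 300

# With a real response this would be: request_succeeded(response.status_code)
for code in (200, 404):
    print(code, HTTPStatus(code).phrase, request_succeeded(code))
```

The requests library also provides `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes if you would rather stop on failure.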

### Now comes the fun part! 
We will read the contents of the response as JSON and parse it using the json module.

Note that in the previous notebook, we used json.load() to read the JSON. load() reads from a file object. Here the JSON arrives as the content of the response rather than as a file, so we'll decode it to a string and use json.loads() instead (the s meaning string).

    >>> data = json.loads(response.content.decode('utf-8'))
    
*Note: response.content is the raw content of the response, in bytes. .decode('utf-8') converts those bytes into a string, interpreting them as UTF-8. Encoding is a frequent issue when dealing with string data. Reading and writing it in UTF-8 ensures unusual characters like diacritics and Cyrillic letters don't wreck everything.*
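The difference between bytes, strings, load() and loads() can be seen without touching the network. In this sketch, `raw_bytes` is a made-up stand-in for `response.content`:

```python
import io
import json

# Stand-in for response.content: JSON text encoded as UTF-8 bytes.
raw_bytes = '{"status": "ok", "title": "Café"}'.encode('utf-8')

# decode() turns the bytes into a str, which json.loads() then parses.
text = raw_bytes.decode('utf-8')
from_string = json.loads(text)

# json.load() (no s) reads from a file-like object instead.
from_file = json.load(io.StringIO(text))

print(type(raw_bytes).__name__, type(text).__name__)
print(from_string == from_file)
```

The requests library also offers `response.json()`, which does the decoding and parsing in one step.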

#### Nothing happened? Great, that means no errors.

Run just data to view the JSON:

    >>> data

Notice this JSON is a bit more complex than the book catalog example.

#### Time to parse!
A good approach is usually to work through the hierarchy, starting from the top. Try taking a look at some of the different elements. Experiment for a minute:

    >>> data['status']
    >>> data['publisher']
    >>> data['license']
    >>> data['message']
   
    ...etcetera
    
Remember some of the methods we used previously to access different elements within the hierarchy... *(hint: use index positions)* Refer back to the json notebook for help.
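If you get lost in the hierarchy, listing the keys at each level helps. The sketch below uses a trimmed, made-up dictionary named `sample` whose shape is only assumed to resemble the real response, so your own `data` variable is left untouched:

```python
# Trimmed, made-up stand-in; the shape is an assumption for illustration.
sample = {
    'status': 'ok',
    'message': {
        'publisher': 'Example Publisher',
        'reference': [
            {'author': 'Smith', 'year': '2001'},
            {'unstructured': 'A citation with no separate fields.'},
        ],
    },
}

# keys() lists what is available at each level of the hierarchy.
print(list(sample.keys()))
print(list(sample['message'].keys()))

# Lists are accessed by index position, dictionaries by key.
first_ref = sample['message']['reference'][0]
print(first_ref['author'])
```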

#### Under what key are the data we're most interested in?

    >>> data['message']

#### Let's say we're going after cited references. 
These are found under ['message']['reference']. So, let's start by storing just the reference key in a variable on its own:

    >>> refs = data['message']['reference']
    
View them:

    >>> refs

#### Here, you could select out the elements of each citation you wish to keep. Perhaps all of them. 
Notice that CrossRef provides things like the DOI of the cited ref, the author (or at least the first author), the year, the journal title (sort of), and then an unstructured full citation.

**Create a for loop** that iterates over each cited ref and prints the different elements. Again, this follows the same general concept as the loops we used in previous notebooks to parse XML or JSON:

    >>> for ref in refs:
            auth = ref['author']
            year = ref['year']
            journ = ref['journal-title']
            vol = ref['volume']
            unst = ref['unstructured']
            print(auth, year, journ, vol, unst)

#### Oh no. Headache time. What happened here?
The first three worked fine, and then something went awry. 

Observe the fourth reference, and note that it does not have author, year, journal, and many other fields. Python couldn't find the author field in the fourth reference, so it threw an error and quit. It does seem to have the unstructured field. 

#### We'll have to insert some logic into the code using if and else statements. Remember that?

Let's test this on some known entities. Create a variable just for reference number 1 (0 index position) and reference 4 (3 index position) and we'll test our logic:

    >>> ref1 = refs[0]
    >>> ref4 = refs[3]

#### View them both:

    >>> ref1

    >>> ref4

#### Now we'll start developing an if else sequence to test if certain elements are present in the reference

Start by testing if 'author' is present in ref1. The syntax to check if a key is in a Python dictionary is as follows:

    >>> if 'author' in ref1:
            print('yes')
        else:
            print('not present')

**Note:** There are *many* ways to use [if else sequences in Python](https://www.google.com/search?q=python+if+else+sequence&rlz=1C1GCEA_enUS886US886&oq=python+if+else+sequence&aqs=chrome..69i57.3239j0j4&sourceid=chrome&ie=UTF-8). In our earlier exercise, we tested the value of something using operators, but this time we just checked the presence of a thing.

#### Now run the same test on ref4:

    >>> if 'author' in ref4:
            print('yes')
        else:
            print('not present')

#### Great. We can deploy this general logic sequence when creating variables for each individual element we're parsing.

We would say: if the element is present, create a variable from it; if not, create that variable with a string value of 'NA'.

    >>> for ref in refs:
            if 'author' in ref:
                auth = ref['author']
            else:
                auth = 'NA'
        
            print(auth)

#### Now we just need to build this logic into the rest of the variables we're creating...

**Build** the if else sequence into all of the variables in your loop.

It will start like this:

    >>> for ref in refs:
            if 'author' in ref:
                auth = ref['author']
            else:
                auth = 'NA'
            
            if 'year' in ref:
                year = ref['year']
            else:
                year = 'NA'
    
    etc...
        
        
            print(auth, year, journ, vol, unst)
        
        
*Tip: watch your indenting and colons*
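Once the if else version makes sense, a more compact alternative is the dictionary method `get()`, which returns a default value when a key is missing. The sketch below runs on made-up references (`sample_refs`) rather than the real CrossRef data:

```python
# Made-up references: the second one lacks most fields.
sample_refs = [
    {'author': 'Smith', 'year': '2001', 'unstructured': 'Smith (2001).'},
    {'unstructured': 'A citation with no separate fields.'},
]

rows = []
for ref in sample_refs:
    # get() returns the second argument when the key is missing.
    auth = ref.get('author', 'NA')
    year = ref.get('year', 'NA')
    unst = ref.get('unstructured', 'NA')
    rows.append((auth, year, unst))
    print(auth, year, unst)
```

Each `get()` call does the same job as one if else pair, so the loop body stays one line per field.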

### Problem solved--Headache over!

Now we just need to wrap this for loop into our csv writer as we have done in earlier notebooks. 

Try to script a process that will write each citation for a paper into a csv file. 
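If you want a starting point, here is one possible shape, sketched on made-up references and written to an in-memory buffer instead of a real file. It uses `csv.DictWriter`, which may differ from the writer used in the earlier notebooks. The field list and `sample_refs` are assumptions for illustration:

```python
import csv
import io

# Made-up references standing in for refs; the second lacks most fields.
sample_refs = [
    {'author': 'Smith', 'year': '2001', 'unstructured': 'Smith (2001).'},
    {'unstructured': 'A citation with no separate fields.'},
]

fields = ['author', 'year', 'journal-title', 'volume', 'unstructured']

# io.StringIO stands in for a file opened with open(..., 'w', newline='').
outfile = io.StringIO()

# restval fills in missing fields; extrasaction='ignore' skips unlisted keys.
writer = csv.DictWriter(outfile, fieldnames=fields, restval='NA',
                        extrasaction='ignore')
writer.writeheader()
for ref in sample_refs:
    writer.writerow(ref)

print(outfile.getvalue())
```

With a real file in place of the buffer, the same loop would run over `refs` from the CrossRef response.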

## I believe in your success. 

#### Give it a try:

### Feel like another challenge? 

#### If that was too #basic, try wrapping your parsing loop in another for loop that iterates over a list of four DOIs. Here is a list:


    >>> doiList = ['10.1002/2016JE005244',
                   '10.1038/s41561-017-0015-2',
                   '10.1029/2018GL078011',
                   '10.1002/2017GL074002']

#### What you'll need to build here is a for loop that will first iterate over each doi, concatenate that doi with the url variable, send each query, read each response as json, then iterate over the response to parse out the elements of the json we want and write it to a csv.

You have almost all of the code you need above; you just need to put it all together.

It will start something like this:

    >>> for doi in doiList:
        ...?