# Pulling data from (open) REST APIs

[Big source of public APIs](https://rapidapi.com/collection/list-of-free-apis)

We have already seen how to use `requests` to fetch a webpage:

In [2]:
import requests
r = requests.get('https://www.cnn.com')
r.text[0:300]

'<!DOCTYPE html><html class="no-js"><head><meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"><meta charset="utf-8"><meta content="text/html" http-equiv="Content-Type"><meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0"><link rel="dns-prefetch" href="/op'

If the URL points to a page that returns HTML, we say that we are fetching a webpage. If instead the URL returns data in some form, we say that we are accessing a *REST* API.
 
**REST** is an acronym for *REpresentational State Transfer* and is a very handy way to make something trivial sound very complicated. Anytime you see the word REST, just think "webpage that gives me data, not HTML." There is a massive industry and giant following behind this term but I don't see anything beyond "fetch data from a webpage".

Anyway, we are going to pull data from web servers that intentionally provide nice data spigot URLs. The information you need in order to get data is typically:

* Base URL, including machine name, port number, and "file" path
* The names and values of parameters
* What data comes back and in what format (XML, JSON, CSV, ...)
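
As a rough sketch of how those pieces fit together, the standard `urllib.parse` module can glue a base URL and a dictionary of parameters into one URL and then split it apart again. The machine name, port, path, and parameter names here are made up for illustration and do not refer to a real service.

```python
from urllib.parse import urlencode, urlsplit

def build_url(base, params):
    """Join a base URL and a dict of parameter names/values into one URL."""
    return f"{base}?{urlencode(params)}"

# Hypothetical endpoint and parameters, for illustration only
url = build_url("http://api.example.com:8080/data/search",
                {"query": "john chan", "page": 1})
print(url)

# Split the URL back into machine name, port, "file" path, and parameters
parts = urlsplit(url)
print(parts.hostname, parts.port, parts.path, parts.query)
```

Note that `urlencode` also takes care of characters such as spaces that cannot appear literally in a URL.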

## Looking up word definitions

The [dictionaryapi.dev](https://dictionaryapi.dev/) API lets us look up words in various languages and get the definitions. The format of the URL to access the API is just:

```
https://api.dictionaryapi.dev/api/v2/entries/<language_code>/<word>
```

So, we can get the English definition for *science* like this (and parse the JSON result):

In [4]:
import requests
import json

r = requests.get('https://api.dictionaryapi.dev/api/v2/entries/en_US/science')
r.text[0:200]

'[{"word":"science","phonetic":"ˈsʌɪəns","phonetics":[{"text":"ˈsʌɪəns","audio":"//ssl.gstatic.com/dictionary/static/sounds/20200429/science--_gb_1.mp3"}],"origin":"Middle English (denoting knowledge):'

In [7]:
data = json.loads(r.text)
print(data[0]['word'], data[0]['phonetic'])
data

science ˈsʌɪəns


[{'word': 'science',
  'phonetic': 'ˈsʌɪəns',
  'phonetics': [{'text': 'ˈsʌɪəns',
    'audio': '//ssl.gstatic.com/dictionary/static/sounds/20200429/science--_gb_1.mp3'}],
  'origin': 'Middle English (denoting knowledge): from Old French, from Latin scientia, from scire ‘know’.',
  'meanings': [{'partOfSpeech': 'noun',
    'definitions': [{'definition': 'the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment.',
      'example': 'the world of science and technology',
      'synonyms': ['branch of knowledge',
       'area of study',
       'discipline',
       'field'],
      'antonyms': []}]}]}]

The JSON looks like the following when formatted in the browser (I think I have a JSON viewer plug-in).

<img src="figures/dictionary-science-json.png" width="400">

The result is a list with one element, and that element is the dictionary holding the data we want, so `data[0]` gives us that dictionary. From there we can dig down to get the definition and phonetics.

In [15]:
data = data[0]

In [19]:
phonetic = data['phonetic']
sciencedef = data['meanings'][0]['definitions'][0]['definition']
print(phonetic)
print(sciencedef)

ˈsʌɪəns
the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment.
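
Hard-coded indexes like `[0]` only reach the first meaning and the first definition. Here is a small sketch that walks every meaning instead, assuming each entry has the same shape as the *science* record above. The helper name `all_definitions` is made up for this example, and the `entry` dictionary is a trimmed copy of the response (with the definition text shortened) so the code runs without a network connection.

```python
# A trimmed copy of the response shown above (definition text shortened)
entry = {
    "word": "science",
    "phonetic": "ˈsʌɪəns",
    "meanings": [
        {
            "partOfSpeech": "noun",
            "definitions": [
                {
                    "definition": "the systematic study of the physical and natural world",
                    "example": "the world of science and technology",
                    "synonyms": ["branch of knowledge", "area of study"],
                    "antonyms": [],
                }
            ],
        }
    ],
}

def all_definitions(entry):
    """Return (part of speech, definition) pairs for one dictionary entry."""
    pairs = []
    # .get() with a default avoids a KeyError if a field is missing
    for meaning in entry.get("meanings", []):
        for d in meaning.get("definitions", []):
            pairs.append((meaning.get("partOfSpeech"), d.get("definition")))
    return pairs

for part_of_speech, definition in all_definitions(entry):
    print(f"{entry['word']} ({part_of_speech}): {definition}")
```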


**Exercise**:  Print out the origin of the word science from that JSON.

**Exercise**: Use the API to fetch and print out the definition of *Merhaba* (a greeting) in the Turkish language. The result should be *karşılaşıldığında söylenilen bir selamlaşma sözü.* (*a word of greeting when encountered.*)  The Turkish language code is `tr`.

## JSON from openpayments.us

(This site seems to go down a lot when they reboot our computer science machine so forgive me if it's not up...)

Now, let's look at a website that will give us JSON data: [www.openpayments.us](http://www.openpayments.us).
 
There is a REST data API available at the following URL template:

```
URL = f"http://openpayments.us/data?query={q}" # for some q
```
**Exercise**: Use `curl` to fetch data about a doctor.

Here's how to fetch the data for a doctor's name, such as `John Chan`:

In [8]:
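import requests
import json

name = "John Chan"
# requests encodes the space in the query value for us
URL = f"http://openpayments.us/data?query={name}"

r = requests.get(URL)
data = json.loads(r.text)  # parse the JSON text into Python dicts and lists

# Convert back to a string and show just the first 1000 characters
print(json.dumps(data)[0:1000])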

{"query": "John AND Chan", "lucenequery": "+_composite:john +_composite:chan", "hits": 10, "page": 1, "pagesize": 50, "sortby": null, "reverse": false, "cols": ["Payment ID", "Vendor", "Date", "Amount", "Dr Name", "Nature of Payment", "Payment Type", "Dr Addr", "Dr Type", "Dr License", "Hospital", "Re Drug", "Re: Device", "Travel", "# Payments", "Submitter", "Year", "Publ Date", "Dr or Hosp", "Prod. Indicator", "Disputed?", "Dr Ownership", "3rd Party Recipient", "3rd Party", "Charity?", "3rd Party is Recipient", "Context", "Publ Delay"], "numresults": 10, "results": [{"Payment ID": "42285", "Vendor": "Shire US Holdings PA", "Date": "2013/08/26", "Amount": "124.99", "Dr Name": "JOHN CHAN", "Nature of Payment": "Food and Beverage", "Payment Type": "In-kind items and services", "Dr Addr": "8700 BEVERLY BLVD RM 5512 LOS ANGELES CA 90048 United States", "Dr Type": "DPM Internal Medicine", "Dr License": "CA", "Re: Device": "DERMAGRAFT", "# Payments": "1", "Submitter": "Shire US Holdings", "Y

This website gives you JSON, which is very easy to load and dump using the standard `json` package, as you can see from that code snippet. As before, you can grab one of the elements using dictionary-like indexing:

In [26]:
results = data['results']
results[0:2]

[{'Payment ID': '42285',
  'Vendor': 'Shire US Holdings PA',
  'Date': '2013/08/26',
  'Amount': '124.99',
  'Dr Name': 'JOHN CHAN',
  'Nature of Payment': 'Food and Beverage',
  'Payment Type': 'In-kind items and services',
  'Dr Addr': '8700 BEVERLY BLVD RM 5512 LOS ANGELES CA 90048 United States',
  'Dr Type': 'DPM Internal Medicine',
  'Dr License': 'CA',
  'Re: Device': 'DERMAGRAFT',
  '# Payments': '1',
  'Submitter': 'Shire US Holdings',
  'Year': '2013',
  'Publ Date': '2014/09/29',
  'Dr or Hosp': 'Dr',
  'Prod. Indicator': 'Covered',
  'Disputed?': 'No',
  'Dr Ownership': 'No',
  '3rd Party Recipient': 'No',
  'Publ Delay': 'No'},
 {'Payment ID': '683301',
  'Vendor': 'AstraZeneca Pharmaceuticals LP DE',
  'Date': '2013/09/10',
  'Amount': '12.1',
  'Dr Name': 'John Chan',
  'Nature of Payment': 'Food and Beverage',
  'Payment Type': 'In-kind items and services',
  'Dr Addr': '2111 Geer Rd Suite 500 Turlock CA 95382-2458 United States',
  'Dr Type': 'DDS Dental Providers/ Den

It is convenient to look at the records in a data frame:

In [29]:
import pandas as pd
pd.DataFrame.from_dict(results).head(3)

Unnamed: 0,Payment ID,Vendor,Date,Amount,Dr Name,Nature of Payment,Payment Type,Dr Addr,Dr Type,Dr License,Re: Device,# Payments,Submitter,Year,Publ Date,Dr or Hosp,Prod. Indicator,Disputed?,Dr Ownership,3rd Party Recipient,Publ Delay,Charity?,Context,3rd Party,3rd Party is Recipient,Travel
0,42285,Shire US Holdings PA,2013/08/26,124.99,JOHN CHAN,Food and Beverage,In-kind items and services,8700 BEVERLY BLVD RM 5512 LOS ANGELES CA 90048...,DPM Internal Medicine,CA,DERMAGRAFT,1,Shire US Holdings,2013,2014/09/29,Dr,Covered,No,No,No,No,,,,,
1,683301,AstraZeneca Pharmaceuticals LP DE,2013/09/10,12.1,John Chan,Food and Beverage,In-kind items and services,2111 Geer Rd Suite 500 Turlock CA 95382-2458 U...,DDS Dental Providers/ Dentist,CA,,1,AstraZeneca Pharmaceuticals LP,2013,2014/09/29,Dr,Covered,No,No,No,No,No,Informational Meal,,,
2,5406946,"Otsuka America Pharmaceutical, Inc. MD",2013/11/12,23.04,JOHN CHAN,Food and Beverage,In-kind items and services,8700 BEVERLY BLVD RM 5512 LOS ANGELES CA 90048...,MD Internal Medicine,CA,,1,"Otsuka America Pharmaceutical, Inc.",2013,2014/09/29,Dr,Covered,No,No,No,No,,,,,
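
Two things in that output are worth noticing: every value, including `Amount`, comes back as a string, and not every record carries every field (only one of the three has a `Context`). As a minimal sketch of dealing with both, here are the three records above trimmed to a few fields so that the code runs without the website being up. Using `Decimal` for the money amounts is my choice here, not something the API requires.

```python
from decimal import Decimal

# Three records trimmed to a few fields from the output above
results = [
    {"Payment ID": "42285", "Dr Name": "JOHN CHAN", "Amount": "124.99"},
    {"Payment ID": "683301", "Dr Name": "John Chan", "Amount": "12.1",
     "Context": "Informational Meal"},
    {"Payment ID": "5406946", "Dr Name": "JOHN CHAN", "Amount": "23.04"},
]

# Amounts are strings, so convert them before doing arithmetic
total = sum(Decimal(rec["Amount"]) for rec in results)
print(total)

# .get() returns None instead of raising KeyError when a field is missing
contexts = [rec.get("Context") for rec in results]
print(contexts)
```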


A **technical detail**: not every string is valid as part of a URL. Spaces are not allowed, so `John Chan` has to be encoded or "quoted". Fortunately, `requests` does this automatically for us. If you ever need to quote parameter values in URLs yourself, you can do this:

```python
from urllib.parse import quote
value = quote("John Chan")  # 'John%20Chan'
```

Because `&` is the separator between parameters, it cannot appear unencoded in a parameter name or value either. Here are some example conversions:

```python
>>> quote("john chan")
'john%20chan'
>>> quote("john&chan")
'john%26chan'
```

The conversion replaces each such character with `%` followed by its ASCII character code in 2-digit hexadecimal: `20` for space and `26` for ampersand. Sometimes you will see the space converted to a `+` instead, which also works in the parameter part of a URL: `John+Chan`.
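
To see both conventions side by side, here is a short sketch using only functions from the standard `urllib.parse` module. `quote_plus` produces the `+` form, `urlencode` builds a whole `name=value` parameter string, and `unquote`/`unquote_plus` reverse the conversion.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

pct = quote("John Chan")           # space becomes %20
plus = quote_plus("John Chan")     # space becomes +
amp = quote_plus("john&chan")      # ampersand is still percent-encoded
params = urlencode({"query": "John Chan"})  # full name=value string

print(pct, plus, amp, params)

# Decoding: unquote leaves '+' alone, unquote_plus turns it back into a space
print(unquote(pct), unquote_plus(plus))

# The hex digits after % are just the ASCII codes
print(hex(ord(" ")), hex(ord("&")))
```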