## The requests library

The easiest way to make API requests in Python is with the aptly named [requests](https://2.python-requests.org/en/master/) library. This library is not built into Python, so you would usually need to download and install it (for example, with `pip install requests`) before you can use it. Because we're doing all our development in Colab, however, this step is not necessary: requests is such a popular library that Google has pre-installed it in the Colab environment.

To start using the requests library, you need to import it.


```python
import requests
```

To experiment with creating an HTTP request, let's perform the same request to GitHub that we made in the previous checkpoint. Recall that the request was made to this endpoint:

```
https://api.github.com/search/repositories?q=tetris+language:assembly&sort=stars&order=desc
```

Let's break this down a bit. The base URL, or domain, is **https://api.github.com**. The endpoint path is **/search/repositories**. The query string, everything after the `?`, is **q=tetris+language:assembly&sort=stars&order=desc**. Its `key=value` pairs are separated by `&`:

Key | Value | Meaning
--- | --- | ---
q | tetris+language:assembly | The search term: repositories that contain the word *tetris* and are written in assembly language. In a query string, `+` stands for a space.
sort | stars | Sort the results by the number of stars
order | desc | Sort in descending order, that is, from highest to lowest
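Python's standard library can split this URL into the same pieces. The sketch below uses `urllib.parse.urlsplit` and `parse_qs`; note that `parse_qs` decodes the `+` in the `q` value to a space, because `+` is how a query string encodes a space.

```python
from urllib.parse import parse_qs, urlsplit

url = ("https://api.github.com/search/repositories"
       "?q=tetris+language:assembly&sort=stars&order=desc")

parts = urlsplit(url)
print(parts.scheme + "://" + parts.netloc)  # base URL: https://api.github.com
print(parts.path)                           # endpoint path: /search/repositories

# parse_qs returns a dict mapping each key to a list of values;
# the '+' in the q value is decoded to a space
params = parse_qs(parts.query)
print(params)  # {'q': ['tetris language:assembly'], 'sort': ['stars'], 'order': ['desc']}
```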

To make a GET request to this endpoint we can use the `requests.get()` function.

```
requests.get(url, params=None)
```

Here, `url` consists of the base URL plus the endpoint path (`https://api.github.com/search/repositories`).
`params` is an optional argument that defaults to `None`. When you use it, you usually pass a dictionary of query-string keys and values; requests encodes them and appends them to the URL for you.

The `requests.get()` function returns a `Response` object.

To pass the three query parameters, we put them in a dictionary named `query`. The full URL is the base URL followed by the endpoint path.

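```python
# Query-string parameters as a dictionary. Write the space in the search
# term literally: requests URL-encodes it as '+' when it builds the URL.
query = {'q': 'tetris language:assembly', 'sort': 'stars', 'order': 'desc'}
# Full URL: base URL plus endpoint path
url = 'https://api.github.com/search/repositories'
```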
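It can help to see what a params dictionary turns into on the wire. requests form-encodes `params` and appends the result to the URL (after a request, the final URL is available as `response.url`). The sketch below uses the standard library's `urllib.parse.urlencode`, which we assume encodes these simple values the same way requests does. Note the difference between a space and a literal `+`: a space becomes `+`, matching the URL used in Postman, while a `+` inside a value is escaped as `%2B`, so the server would receive an actual plus sign.

```python
from urllib.parse import urlencode

# A space in a value is encoded as '+', which the server decodes back to a space
with_space = urlencode({'q': 'tetris language:assembly', 'sort': 'stars', 'order': 'desc'})
print(with_space)  # q=tetris+language%3Aassembly&sort=stars&order=desc

# A literal '+' in a value is escaped as %2B, so the server sees a real plus sign
with_plus = urlencode({'q': 'tetris+language:assembly'})
print(with_plus)   # q=tetris%2Blanguage%3Aassembly
```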

Now that all the parts of the request are in place, we can make the actual call.

```python
response = requests.get(url, params=query)
```

The variable `response` now contains a `Response` object that wraps the HTTP response. Its most useful parts are:

Property | Meaning | Attribute
--- | --- | ---
Status code | The HTTP status code returned by the server | `status_code`
Text | The body of the response decoded as text (Unicode) | `text`
Ok | `True` if the status code is less than 400, otherwise `False` | `ok`
JSON | Method that decodes a JSON response body into Python objects | `json()`
Headers | A case-insensitive dictionary of response headers | `headers`
Content | The raw body of the response, as bytes | `content`

HTTP status codes are how web servers indicate the outcome of a request. Have you ever tried to visit a web page and gotten a **404 Not Found** error instead? That **404** is the status code indicating that the requested resource was not found on the server. The HTTP standard defines many status codes, and the [MDN](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) website lists them all along with their meanings. For now, it is enough to know that codes in the 200 range (200 to 299) indicate success.
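Python's standard library ships the standard codes and their reason phrases in `http.HTTPStatus`. The sketch below (the `describe` helper is a hypothetical name, for illustration) looks up a code's phrase and classifies it by its first digit, which is how the HTTP specification groups status codes.

```python
from http import HTTPStatus

# Status code classes, keyed by the first digit of the code
CLASSES = {1: 'informational', 2: 'success', 3: 'redirection',
           4: 'client error', 5: 'server error'}


def describe(code: int) -> str:
    """Return '<code> <reason phrase> (<class>)' for a standard HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f'{status.value} {status.phrase} ({CLASSES[code // 100]})'


for code in (200, 201, 301, 404, 500):
    print(describe(code))
# 200 OK (success)
# 201 Created (success)
# 301 Moved Permanently (redirection)
# 404 Not Found (client error)
# 500 Internal Server Error (server error)
```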

Let us see what status code we got from our request.

```python
response.status_code
```

```
200
```

**200**, great! That means the request was successful. However, servers sometimes respond with other success codes (for example, **201 Created**), so we cannot always rely on checking for exactly 200. Instead, we can check the `ok` property, which is `True` for any status code below 400, before we proceed.

```python
response.ok
```

```
True
```

Since GitHub's documentation tells us that it responds with JSON-formatted data, we can decode that data with the response object's `json()` method. Let's see what the result looks like.

```python
data = response.json()
data
```

```
{'total_count': 318,
 'incomplete_results': False,
 'items': [{'id': 68911683,
   'node_id': 'MDEwOlJlcG9zaXRvcnk2ODkxMTY4Mw==',
   'name': 'tetros',
   'full_name': 'daniel-e/tetros',
   'private': False,
   'owner': {'login': 'daniel-e',
    'id': 5294331,
    'node_id': 'MDQ6VXNlcjUyOTQzMzE=',
    'avatar_url': 'https://avatars2.githubusercontent.com/u/5294331?v=4',
    'gravatar_id': '',
    'url': 'https://api.github.com/users/daniel-e',
    'html_url': 'https://github.com/daniel-e',
    'followers_url': 'https://api.github.com/users/daniel-e/followers',
    'following_url': 'https://api.github.com/users/daniel-e/following{/other_user}',
    'gists_url': 'https://api.github.com/users/daniel-e/gists{/gist_id}',
    'starred_url': 'https://api.github.com/users/daniel-e/starred{/owner}{/repo}',
    'subscriptions_url': 'https://api.github.com/users/daniel-e/subscriptions',
    'organizations_url': 'https://api.github.com/users/daniel-e/orgs',
    'repos_url': 'https://api.github.com/
```

This looks just like the response we got when we tried this request in Postman. The difference is that `data` is now a Python data structure. In this case it is a dictionary, so we can process the data it contains just as we would any other dictionary. Let's see a few examples.

```python
data["total_count"]
```

```
318
```

```python
# List all the keys in the dictionary
print(data.keys())

# What is the value of total_count?
print(f'Total number of repositories matching query is {data["total_count"]}')

# how many items in the items array
print(f'The length of the items array is {len(data["items"])}')

# get the first item from the array
item1 = data['items'][0]

# What are the keys of the item?
print(item1.keys())
```

```
dict_keys(['total_count', 'incomplete_results', 'items'])
Total number of repositories matching query is 318
The length of the items array is 30
dict_keys(['id', 'node_id', 'name', 'full_name', 'private', 'owner', 'html_url', 'description', 'fork', 'url', 'forks_url', 'keys_url', 'collaborators_url', 'teams_url', 'hooks_url', 'issue_events_url', 'events_url', 'assignees_url', 'branches_url', 'tags_url', 'blobs_url', 'git_tags_url', 'git_refs_url', 'trees_url', 'statuses_url', 'languages_url', 'stargazers_url', 'contributors_url', 'subscribers_url', 'subscription_url', 'commits_url', 'git_commits_url', 'comments_url', 'issue_comment_url', 'contents_url', 'compare_url', 'merges_url', 'archive_url', 'downloads_url', 'issues_url', 'pulls_url', 'milestones_url', 'notifications_url', 'labels_url', 'releases_url', 'deployments_url', 'created_at', 'updated_at', 'pushed_at', 'git_url', 'ssh_url', 'clone_url', 'svn_url', 'homepage', 'size', 'stargazers_count', 'watchers_count', 'language', 'has_
```
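Because `data` is an ordinary dictionary, pulling a few fields out of each item is plain Python. The sketch below runs without a network call by using a small `sample` dictionary shaped like the search response; its repository names and star counts are invented for illustration, not real GitHub data. `full_name` and `stargazers_count` are keys that appear in the item keys printed above, and `summarize` is a hypothetical helper name.

```python
# Hypothetical sample shaped like the GitHub search response;
# the repository names and star counts are made up.
sample = {
    'total_count': 3,
    'incomplete_results': False,
    'items': [
        {'full_name': 'example/tetris-a', 'stargazers_count': 120},
        {'full_name': 'example/tetris-b', 'stargazers_count': 45},
        {'full_name': 'example/tetris-c', 'stargazers_count': 7},
    ],
}


def summarize(results, top=2):
    """Return (full_name, stars) pairs for the `top` most-starred items."""
    pairs = [(item['full_name'], item['stargazers_count']) for item in results['items']]
    # The request already asked for sort=stars&order=desc, but sorting here
    # keeps the function correct for input in any order.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:top]


for name, stars in summarize(sample):
    print(f'{name}: {stars} stars')
# example/tetris-a: 120 stars
# example/tetris-b: 45 stars
```

With the real response, you could call `summarize(data, top=5)`.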

Before we dive further into processing the data, it is instructive to see how a failed request behaves. This is easy to trigger: we can break the URL, or break one of the rules of the request itself.

```python
broken_url = 'http://api.github.com/this_is_broken'
broken_response = requests.get(broken_url)
print(broken_response.ok)
print(broken_response.status_code)
```

```
False
404
```


Calling `json()` on this response does not raise an error, because GitHub returns a small JSON body even for an invalid request. Other APIs may not: if a response body is not valid JSON, `json()` raises an exception.

```python
some_data = broken_response.json()
some_data
```

```
{'message': 'Not Found',
 'documentation_url': 'https://developer.github.com/v3'}
```
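Since not every API returns JSON on failure, a defensive helper can try `json()` and fall back to the raw text. In the sketch below, `error_message` and `FakeResponse` are hypothetical names: `FakeResponse` is a minimal stand-in for a requests `Response` so the example runs offline. We catch `ValueError` because the exception requests raises for an invalid JSON body is a `ValueError` subclass, as is the standard library's `json.JSONDecodeError`.

```python
import json


def error_message(response):
    """Return '<status>: <message>' for a failed response.

    Uses the 'message' key of a JSON body when present (as GitHub sends),
    otherwise falls back to the raw response text.
    """
    try:
        body = response.json()
    except ValueError:
        # The body was not valid JSON
        return f'{response.status_code}: {response.text}'
    if isinstance(body, dict) and 'message' in body:
        return f"{response.status_code}: {body['message']}"
    return f'{response.status_code}: {response.text}'


class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only for this demo."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)  # raises json.JSONDecodeError, a ValueError


github_like = FakeResponse(404, '{"message": "Not Found", "documentation_url": "https://developer.github.com/v3"}')
html_like = FakeResponse(404, '<html>Not Found</html>')
print(error_message(github_like))  # 404: Not Found
print(error_message(html_like))    # 404: <html>Not Found</html>
```

With a real request, `error_message(broken_response)` works the same way, since a requests `Response` has `status_code`, `text`, and `json()`.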

## openFDA

Great. We've seen how to make an API request. Now, let's see another example.

You are working with a company that is considering investing in different food products. As part of their research, they want to learn more about product recalls. The U.S. Food and Drug Administration (FDA) has an API that provides [recall data](https://open.fda.gov/apis/food/enforcement/) for food products. You have been tasked with getting some data from this API and presenting summary information about food recalls over the last year.

Visit the documentation and read about this API endpoint. The first thing to note is that no API key is needed. There are several ways to search the data, all of which are covered in the documentation. 


### The endpoint

The [documentation](https://open.fda.gov/apis/food/enforcement/how-to-use-the-endpoint/) for the recall endpoint explains how to make a call and what to expect. The base URL is `https://api.fda.gov/food/enforcement.json`. We can set a limit of 99 results per request, which is within the maximum the API allows, and perform a search within a particular date range.

To specify a date range, search on the `report_date` field and give the start and end dates of the range in `YYYYMMDD` form, as in `report_date:[20180101+TO+20181231]`.


```python
import requests

# The search expression is embedded directly in the URL; limit is passed as a parameter
url = 'https://api.fda.gov/food/enforcement.json?search=report_date:[20180101+TO+20181231]'
query = {'limit': 99}
response = requests.get(url, params=query)
response.ok
```
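Rather than hard-coding the dates, a small helper can build the search expression from `datetime.date` values, which makes "the last year" easy to express. `report_date_search` is a hypothetical name. If you pass the expression through `params` instead of embedding it in the URL, requests form-encodes it; the `urlencode` line shows roughly what that looks like. We assume openFDA decodes the percent-encoded `:`, `[` and `]` like any standard server; if in doubt, keep the search in the URL as the code above does.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def report_date_search(start: date, end: date) -> str:
    """Build an openFDA search expression for report dates from start to end."""
    return f'report_date:[{start:%Y%m%d} TO {end:%Y%m%d}]'


search = report_date_search(date(2018, 1, 1), date(2018, 12, 31))
print(search)   # report_date:[20180101 TO 20181231]

# Roughly what a params dictionary containing the search looks like once encoded
encoded = urlencode({'search': search, 'limit': 99})
print(encoded)  # search=report_date%3A%5B20180101+TO+20181231%5D&limit=99

# The 365 days up to today
last_year = report_date_search(date.today() - timedelta(days=365), date.today())
```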

That seemed to work. Let's grab some data from the response and process it. From the documentation, we know that the data has this format:

```json
{
  "meta": {
    "disclaimer": "Do not rely on openFDA to make decisions regarding medical care. While we make every effort to ensure that data is accurate, you should assume all results are unvalidated. We may limit or otherwise restrict your access to the API in line with our Terms of Service.",
    "terms": "https://open.fda.gov/terms/",
    "license": "https://open.fda.gov/license/",
    "last_updated": "2019-02-23",
    "results": {
      "skip": 0,
      "limit": 1,
      "total": 4024
    }
  },
  "results": [
    {
      "country": "United States",
      "city": "Medford",
      "reason_for_recall": "Product received from supplier is being recalled due to the potential to be contaminated with Salmonella",
      "address_1": "2500 S Pacific Hwy",
      "address_2": "",
      "code_info": "Best By Date on the recalled nut products - 01MAY11 through 24SEPT13.    The lot code format for the baskets is DDDYM(H or C), with DDD representing the Julian date, Y representing the year, and letter M, H or C representing the production facility, printed on the shipping container.  The affected lots would have been between 1460M(H or C) to 2682M(H or C).",
      "product_quantity": "15,264 - 12 oz. jars",
      "center_classification_date": "20121026",
      "distribution_pattern": "Nationwide and Canada through online ordering www.harryanddavid.com/h/home and through retail stores located throught the U.S.",
      "state": "OR",
      "product_description": "Harry & David Creamy Raspberry Peanut Spread, 12 oz. jars, labeled in part: \"HARRY & DAVID CREAMY RASPBERRY PEANUT SPREAD***NET WT. 12 OZ. (340g)***INGREDIENTS: ROASTED PEANUTS, SUGAR, RASPBERRY COMPOUND***MADE FOR: HARRY AND DAVID MEDFORD, OR 97501***7 80994 73872 0***\"    The 12 oz. jars are sold individually and also as gift add-ons for gift baskets.     The gift baskets are listed below:    Harry & David Apple Snack Box;  Wolferman's Bee Sweet Gift Basket;  Wolferman's Hearty Snack Gift Basket;  Wolferman's All-Day Assortment Gift Basket;  Wolferman's Fathers Day Basket",
      "report_date": "20121107",
      "classification": "Class I",
      "openfda": {},
      "recall_number": "F-0562-2013",
      "recalling_firm": "Harry and David Operations, Inc.",
      "initial_firm_notification": "Two or more of the following: Email, Fax, Letter, Press Release, Telephone, Visit",
      "event_id": "63306",
      "product_type": "Food",
      "termination_date": "20130314",
      "more_code_info": null,
      "recall_initiation_date": "20120927",
      "postal_code": "97501-8724",
      "voluntary_mandated": "Voluntary: Firm Initiated",
      "status": "Terminated"
    }
  ]
}
```

By examining that JSON data carefully, we can see that there are two top-level keys: *meta* and *results*. *meta* contains details about the search query itself: a disclaimer, the total number of results, the number skipped, and the limit on how many were returned. *results* is an array of objects, each describing a particular recall. We are only interested in a subset of this data, so we can iterate over the results and build a simpler dictionary for each one.

```python
# make sure we got a valid response
if response.ok:
    # get the full data from the response
    data = response.json()
    # get just the results
    raw_incidents = data['results']
    # iterate over the results and only grab the properties that we are interested in
    incidents = [{
        'city': incident['city'],
        'state': incident['state'],
        'reason': incident['reason_for_recall'],
        'date': incident['report_date'],
        'company': incident['recalling_firm'],
        'product_type': incident['product_type'],
        'postal_code': incident['postal_code']
    } for incident in raw_incidents]
    # print 5 items to see if this worked
    print(incidents[:5])
```

```
[{'city': 'Minneapolis', 'state': 'MN', 'reason': 'Shipping container from CA to HI was not held at proper temperature which could cause food items to be contaminated with spoilage organisms or pathogens', 'date': '20180509', 'company': 'Target Corporation', 'product_type': 'Food', 'postal_code': '55402-3601'}, {'city': 'Oakland', 'state': 'CA', 'reason': 'Sales Team employee noticed that some cartons of 48-fl oz. cartons of Dreyers Slow Churned Vanilla Bean Ice Cream contain Butter Pecan ice cream an have a Dreyers Slow Churned Butter Pecan lid.', 'date': '20180404', 'company': 'Nestle Dreyers Ice Cream Company', 'product_type': 'Food', 'postal_code': '94618-1325'}, {'city': 'City of Industry', 'state': 'CA', 'reason': 'Investigation of consumer complaint found that two products had undeclared wheat as flour in an ingredient and was not listed in the ingredient statement.', 'date': '20181219', 'company': 'Gemini Food Corporation Inc', 'product_type': 'Food', 'postal_code': '91789-5213
```
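The field selection can also be driven by a mapping from our short names to openFDA field names, which keeps the list of fields in one place. This variant uses `dict.get`, so a record missing a field yields `None` instead of raising `KeyError`; whether openFDA records ever omit these fields is not established here, so treat that as a defensive assumption. `FIELDS` and `simplify` are hypothetical names, and the record is trimmed from the documentation sample shown earlier, with `postal_code` deliberately left out to show the fallback.

```python
# Short names used in this section, mapped to the openFDA field names
FIELDS = {
    'city': 'city',
    'state': 'state',
    'reason': 'reason_for_recall',
    'date': 'report_date',
    'company': 'recalling_firm',
    'product_type': 'product_type',
    'postal_code': 'postal_code',
}


def simplify(record, fields=FIELDS):
    """Keep only the mapped fields; a missing field becomes None."""
    return {short: record.get(source) for short, source in fields.items()}


# Trimmed from the documentation sample (postal_code deliberately omitted)
record = {
    'city': 'Medford',
    'state': 'OR',
    'reason_for_recall': 'Product received from supplier is being recalled due to '
                         'the potential to be contaminated with Salmonella',
    'report_date': '20121107',
    'recalling_firm': 'Harry and David Operations, Inc.',
    'product_type': 'Food',
}

print(simplify(record))
```

Applied to a whole page, this becomes `[simplify(r) for r in data['results']]`.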

Of course, we only got the first 99 results. To get all of them, we need to make several requests to the server: the second call skips the first 99 results, the third call skips the first 198, and so on, until we have fetched every result. In the following code, print statements were added to illustrate the process as the code executes.
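The skip values for these calls form a simple arithmetic sequence, which `range` produces directly. The sketch below uses the total of 4024 from the sample `meta` block shown earlier purely as an example figure, together with the limit of 99 used in this section.

```python
import math

# Example values: the sample meta block reports 4024 results; 99 per request
total = 4024
limit = 99

# One skip value per request: 0, 99, 198, ...
skips = list(range(0, total, limit))
print(len(skips), math.ceil(total / limit))  # 41 41
print(skips[:3], skips[-1])                  # [0, 99, 198] 3960
```

The number of requests is the total divided by the limit, rounded up, and the last request fetches the remaining 64 results.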

```python
import requests


def process_data(raw_incidents):
    """Keep only the properties we are interested in from each raw incident."""
    return [{
        'city': incident['city'],
        'state': incident['state'],
        'reason': incident['reason_for_recall'],
        'date': incident['report_date'],
        'company': incident['recalling_firm'],
        'product_type': incident['product_type'],
        'postal_code': incident['postal_code']
    } for incident in raw_incidents]


# list that accumulates the processed results from every request
incidents = []

# variables that track how many results to skip and how many to fetch per request
skip = 0
limit = 99

# make an initial call
url = 'https://api.fda.gov/food/enforcement.json?search=report_date:[20180101+TO+20181231]'
query = {'limit': limit, 'skip': skip}
response = requests.get(url, params=query)
print('Querying {}'.format(response.url))

# make sure we got a valid response
if response.ok:
    # get the full data from the response
    data = response.json()

    # the meta data tells us how many results match the search in total
    total = data['meta']['results']['total']
    print('There is a total of {} results to fetch'.format(total))

    # process the results we have so far
    incidents = process_data(data['results'])
    print('{} results processed so far'.format(len(incidents)))

    # increment skip to move on to the next batch
    skip += limit

    while skip < total:
        query = {'limit': limit, 'skip': skip}
        response = requests.get(url, params=query)
        print('Querying {}'.format(response.url))
        if not response.ok:
            # stop instead of requesting the same batch forever
            print('Request failed with status code {}'.format(response.status_code))
            break
        # add the new results to the ones we already have
        incidents.extend(process_data(response.json()['results']))
        print('{} results processed so far'.format(len(incidents)))
        # increment skip
        skip += limit

print('{} results returned'.format(len(incidents)))
```
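The loop above mixes fetching, paging, and processing in one place. One alternative design, sketched here under the assumption that every page has the `meta`/`results` shape from the documentation, is a generator that only handles paging and receives the fetching function as a parameter, so the paging logic can be tested without the network. `iter_pages`, `fake_fetch`, and the demo records are hypothetical names and data for illustration.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], Dict], limit: int = 99) -> Iterator[List[Dict]]:
    """Yield the 'results' list of each page until meta.results.total is covered.

    fetch_page(skip, limit) is a hypothetical callable that returns the
    decoded JSON body of one request in the openFDA response shape.
    """
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        yield page['results']
        skip += limit
        if skip >= page['meta']['results']['total']:
            break


# Offline demo: a fake fetcher that serves 5 made-up records
records = [{'recall_number': 'demo-{}'.format(i)} for i in range(5)]


def fake_fetch(skip, limit):
    """Return one page of `records` in the same shape as an openFDA response."""
    return {
        'meta': {'results': {'skip': skip, 'limit': limit, 'total': len(records)}},
        'results': records[skip:skip + limit],
    }


all_results = [item for page in iter_pages(fake_fetch, limit=2) for item in page]
print(len(all_results))  # 5
```

With the real API, `fetch_page` could call `requests.get(url, params={'limit': limit, 'skip': skip})`, call `raise_for_status()` on the response so that a failed request raises an exception instead of looping, and return `response.json()`.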

At this point, we have a list of recall incidents from the API. This module does not go into the details of processing that data; we will look at data analysis in the next module.