# Interacting with APIs in Python

Chances are you've heard the acronym "API" before. But if you've never used one in your work, it may not be clear up front why you'd want to. So what problem are we trying to solve with APIs anyway?

## 0. Why APIs?

Consider the following scenarios:

1. We have a database of several thousand places which we'd like to display on a map, but we only have their addresses. In order to display them on the map, we need their latitude/longitude values. (For the uninitiated, this process is known as "geocoding".)
2. With the goal of analyzing sentiment or perhaps the relative frequency of descriptors of various candidates, we would like to find all New York Times articles relating to the 2020 election. Perhaps we'd additionally like to collect relevant tweets about the candidates.

In either case, we can imagine what the naïve solution might look like, done by hand:

1. Fire up a browser, open Google Maps, type in the first address, copy/paste the lat/lng into the database (i.e. spreadsheet); move on to the second address and repeat the process, and so on until we're done.
2. Navigate to the New York Times website and enter "2020 election" in the search bar, click on each article in the search results, determine that the article is in fact relevant, save the article somewhere, and move on to the next one, and so on until we're pretty sure we've got all relevant articles.

This is terrible. It looks like we've lined up some seriously painstaking work for ourselves. There must be a way to automate this stuff...

Enter the **Application Programming Interface (API)**. If someone (or some corporate entity---a needless distinction, as corporations are people, after all) has already created a service that solves our underlying need, e.g. Google Maps or the NYTimes search bar, then there's a good chance in this day and age that that service offers an API, which is essentially a tightly controlled, programmatic way to interact with the service.

One of the most obvious benefits is that we can automate away the tedium of doing the same thing many times. Another benefit, more relevant to application developers, is the ability to build "on top of" someone else's API in order to focus on a specific thing we want to accomplish. Consider, for example, building a real-time logistics app that displays the current location of our fleet of vehicles; all we have are the real-time GPS coordinates, so we use another service (e.g. Mapbox or Google/Apple/Bing Maps) for displaying the underlying roads and geographic data.

Hopefully by this point you're convinced of the potential utility of APIs. Let's get started!

## 1. Unauthenticated API: Geocoding

Continuing with the first scenario mentioned in the introduction above, let's suppose we have a set of addresses, and we need the latitude/longitude pairs.

We *could* use the Google Maps Geocoding API for this, but today (and this wasn't the case just a few years back) the service requires you to set up a billing account with a valid credit card before you can use the API, and I'd rather not get bogged down by that issue right now.

So instead, let's use the [US Census Geocoding API](https://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.html). The results won't be quite as high-quality (interpolated vs. rooftop), but the service is public and free, and illustrates the point.

### 1.1. A first pass

At the end of the day, using this API boils down to figuring out the right URL and parameters to send an HTTP request to. Let's take the address of Huang Engineering Center as an example: `475 Via Ortega, Stanford, CA 94305`. After reading the instructions on how to use the census geocoding API, we determine that one way to make this request is:

https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?benchmark=Public_AR_Current&address=475+Via+Ortega,Stanford,CA&format=json

Open the link in the browser just to get a peek. Next, we'll make this request programmatically.

(We won't talk about the `benchmark` argument for the purposes of this tutorial, but we'll touch on the `format` argument later. The `address` is the interesting thing here.)

To make HTTP requests from Python, we'll use the `requests` library ([docs](https://2.python-requests.org/en/master/)), which is not part of the standard library but makes web requests very simple. (This will come in handy when scraping later.) We'll also need `json` from the standard library, for making sense of the response data.

In [1]:
!pip3 install requests
import requests
import json


Okay, let's do precisely what we just did in the browser, but with the `requests` library:

In [2]:
response = requests.get('https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?benchmark=Public_AR_Current&address=475+Via+Ortega,Stanford,CA&format=json')
response

<Response [200]>

Well, we know that HTTP 200 means "success", but how do we get the data we want out of this response object? We need to inspect its `text` attribute:

In [3]:
response.text

'{"result":{"input":{"benchmark":{"id":"4","benchmarkName":"Public_AR_Current","benchmarkDescription":"Public Address Ranges - Current Benchmark","isDefault":false},"address":{"address":"475 Via Ortega,Stanford,CA"}},"addressMatches":[{"matchedAddress":"475 VIA ORTEGA, STANFORD, CA, 94305","coordinates":{"x":-122.17598,"y":37.428837},"tigerLine":{"tigerLineId":"122949876","side":"R"},"addressComponents":{"fromAddress":"401","toAddress":"499","preQualifier":"","preDirection":"","preType":"","streetName":"VIA ORTEGA","suffixType":"","suffixDirection":"","suffixQualifier":"","city":"STANFORD","state":"CA","zip":"94305"}}]}}'

Ah, the lat/lng is in there, buried inside a [JSON](http://json.org)-formatted string. The details of JSON aren't important for this lesson. Suffice it to say that if someone (e.g. an API) hands you data in JSON form, you can parse it in almost any language (e.g. with `json.loads` in Python), and it becomes an object native to that language.

JSON is an extremely common response format in APIs, so it's worth getting familiar with it.
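As a small, self-contained illustration of what parsing does, here is a sketch using a made-up miniature payload (the string below is invented for the example and is not real API output):

```python
import json

# A made-up miniature payload, shaped loosely like an API response.
raw = '{"ok": true, "matches": [{"name": "Example", "score": 0.5}], "next": null}'

parsed = json.loads(raw)

# JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
print(type(parsed))
print(parsed['ok'])
print(parsed['matches'][0]['name'])
print(parsed['next'])
```

The same type mapping applies to any JSON text, however deeply nested.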

Let's parse the response text:

In [4]:
results = json.loads(response.text)
results

{'result': {'input': {'benchmark': {'id': '4',
    'benchmarkName': 'Public_AR_Current',
    'benchmarkDescription': 'Public Address Ranges - Current Benchmark',
    'isDefault': False},
   'address': {'address': '475 Via Ortega,Stanford,CA'}},
  'addressMatches': [{'matchedAddress': '475 VIA ORTEGA, STANFORD, CA, 94305',
    'coordinates': {'x': -122.17598, 'y': 37.428837},
    'tigerLine': {'tigerLineId': '122949876', 'side': 'R'},
    'addressComponents': {'fromAddress': '401',
     'toAddress': '499',
     'preQualifier': '',
     'preDirection': '',
     'preType': '',
     'streetName': 'VIA ORTEGA',
     'suffixType': '',
     'suffixDirection': '',
     'suffixQualifier': '',
     'city': 'STANFORD',
     'state': 'CA',
     'zip': '94305'}}]}}

Hey, that's just a Python dictionary, which you all know and love by now! Here, I'll prove it to you:

In [5]:
type(results)

dict

Notice that the lat/lng pair we're interested in is buried a few levels deep:

In [6]:
results['result']['addressMatches'][0]['coordinates']

{'x': -122.17598, 'y': 37.428837}

As the names suggest, `x` is the longitude and `y` is the latitude, so the pair is listed in the opposite order from the usual "lat/lng".
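Since the nesting and the `x`/`y` naming are easy to trip over, it may help to wrap the lookup in a small helper. The following is only a sketch: the name `extract_lat_lng` is hypothetical, and the `sample` dictionary is a trimmed-down copy of the parsed response shown above.

```python
def extract_lat_lng(results):
  """Return (lat, lng) for the first match, or None if nothing matched."""
  matches = results.get('result', {}).get('addressMatches', [])
  if not matches:
    return None
  coords = matches[0]['coordinates']
  # The API reports longitude as 'x' and latitude as 'y'.
  return (coords['y'], coords['x'])


# Trimmed-down copy of the parsed response from above.
sample = {
  'result': {
    'addressMatches': [{
      'matchedAddress': '475 VIA ORTEGA, STANFORD, CA, 94305',
      'coordinates': {'x': -122.17598, 'y': 37.428837},
    }]
  }
}

print(extract_lat_lng(sample))  # (37.428837, -122.17598)
print(extract_lat_lng({'result': {'addressMatches': []}}))  # None
```

Returning `None` for an empty match list avoids an `IndexError` when the service cannot find an address.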

### 1.2. Tightening things up

Let's go back and take another look at that initial request:

https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?benchmark=Public_AR_Current&address=475+Via+Ortega,Stanford,CA&format=json

The `requests` library provides a way to break this up in a more semantically meaningful way, which we'll see comes in handy later. Everything after the `?` in the URL makes up the *parameters*, which can be passed as a dictionary via the `params` argument instead of being pasted into the URL. The params usually contain the interesting part of the API request, so it's handy to have them better structured, like so:

In [7]:
response = requests.get(
  'https://geocoding.geo.census.gov/geocoder/locations/onelineaddress',
  params={
    'address': '475 Via Ortega, Stanford, CA',
    'benchmark': 'Public_AR_Current',
    'format': 'json',
  })

Looks nicer, right? As before, we can get at the lat/lng as follows:

In [8]:
json.loads(response.text)['result']['addressMatches'][0]['coordinates']

{'x': -122.17598, 'y': 37.428837}

Great, this is looking good. We now have a programmatic way of punching in an address and getting back some coordinates!
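To get a feel for how a dictionary of parameters ends up in the URL, here is a rough sketch using `urlencode` from the standard library as a stand-in. This is an illustration of the idea, not necessarily the exact string `requests` sends; the precise escaping it applies may differ in small ways.

```python
from urllib.parse import urlencode

base_url = 'https://geocoding.geo.census.gov/geocoder/locations/onelineaddress'
params = {
  'address': '475 Via Ortega, Stanford, CA',
  'benchmark': 'Public_AR_Current',
  'format': 'json',
}

# Spaces become '+', commas become '%2C', and pairs are joined with '&'.
query = urlencode(params)
full_url = base_url + '?' + query
print(full_url)
```

Letting a library do the escaping means we never have to worry about spaces, commas, or other special characters in an address breaking the URL.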

We can do even better, though. Up to this point, we've been using the `onelineaddress` search type. After perusing the docs further, we decide it might be more robust to use the `address` search type, which takes a *structured* address. This is a way of being more explicit about what we are looking for, rather than leaving interpretation of the text string up to the API. So let's do the same search again, but in a structured manner, using the `address` endpoint:

In [9]:
response = requests.get(
  'https://geocoding.geo.census.gov/geocoder/locations/address',
  params={
    'street': '475 Via Ortega',
    'city': 'Stanford',
    'state': 'CA',
    'zip': '94305',
    'benchmark': 'Public_AR_Current',
    'format': 'json',
  })
json.loads(response.text)['result']['addressMatches'][0]['coordinates']

{'x': -122.17598, 'y': 37.428837}

Okay, we've been using this API enough that it's probably worth putting this all in a function that accepts only the pieces that are changing and grabs just the coordinates from the response:

In [10]:
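def geocode(structured_address):
  """Return the coordinates of the first match, or None if there is none."""
  # Parameters that stay the same for every request.
  params = {
    'benchmark': 'Public_AR_Current',
    'format': 'json',
  }
  # Add the pieces that change: street, city, state, zip.
  params.update(structured_address)
  response = requests.get(
      'https://geocoding.geo.census.gov/geocoder/locations/address',
      params=params)
  matches = json.loads(response.text)['result']['addressMatches']
  if matches:
    return matches[0]['coordinates']
  return None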

Now it's a little easier to perform geocoding:

In [11]:
geocode({'street': '475 Via Ortega', 'city': 'Stanford', 'state': 'CA', 'zip': '94305'})

{'x': -122.17598, 'y': 37.428837}
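Coming back to the original scenario of several thousand addresses, the natural next step is a loop. The sketch below is one possible shape for it, not a definitive implementation: `geocode_all`, `fake_geocode`, and the `delay_seconds` parameter are hypothetical names introduced for illustration. It takes the geocoding function as an argument, and `fake_geocode` is a stand-in so the example runs without network access; in practice one would pass in the `geocode` function defined above.

```python
import time


def geocode_all(addresses, geocode_fn, delay_seconds=0.0):
  """Geocode each structured address; return a list of (address, coords).

  coords is None when geocode_fn finds no match or raises an exception.
  """
  results = []
  for address in addresses:
    try:
      coords = geocode_fn(address)
    except Exception as error:
      # Keep going if a single lookup fails.
      print(f'Failed to geocode {address!r}: {error}')
      coords = None
    results.append((address, coords))
    if delay_seconds:
      # Pause between requests to be polite to the service.
      time.sleep(delay_seconds)
  return results


# Stand-in geocoder (hypothetical) so the sketch runs offline.
def fake_geocode(address):
  known = {'475 Via Ortega': {'x': -122.17598, 'y': 37.428837}}
  return known.get(address['street'])


addresses = [
  {'street': '475 Via Ortega', 'city': 'Stanford', 'state': 'CA', 'zip': '94305'},
  {'street': '1 Nowhere Ln', 'city': 'Stanford', 'state': 'CA', 'zip': '94305'},
]
for address, coords in geocode_all(addresses, fake_geocode):
  print(address['street'], '->', coords)
```

Recording `None` for failed lookups, rather than stopping, makes it easy to review the unmatched addresses by hand afterwards.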

Your turn! Geocode the Googleplex, `1600 Amphitheatre Pkwy, Mountain View, CA 94043`:

You should get `{'x': -122.086815, 'y': 37.42353}`.

## 2. Authenticated API

*TODO(sjespers)*: Thinking NYTimes. Can i really get people to sign up for a token?