# Geocoding through APIs

## Install what needs installing

We're installing two packages: [Geocoder](https://geocoder.readthedocs.io/), which provides shortcuts to geocoding services, and [tqdm](https://github.com/tqdm/tqdm), which allows you to have beautiful perfect progress bars in both the terminal and in notebooks.

In [None]:
!pip install geocoder tqdm==4.42.1

## Download our datasets

In [None]:
!curl https://raw.githubusercontent.com/jsoma/NICAR20-geocoding-apis/master/addresses.csv -o addresses.csv
!curl https://raw.githubusercontent.com/jsoma/NICAR20-geocoding-apis/master/pharma.csv -o pharma.csv

## Read in our addresses

We have a list of variously-strange addresses in `addresses.csv`. Because we're Python professionals, we'll use pandas to open it up.

In [None]:
import requests
import pandas as pd
pd.set_option("display.max_columns", 100)

df = pd.read_csv("addresses.csv")
df

# Census Geocoder

Our first target is the [Census Geocoder](https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?form). We're using the "one line" version instead of the "address" version because our data isn't split up into street number/street name/city/etc.

## Making a simple request

One way to deal with most of these services is by going out and grabbing the web page. The result comes back as JSON. Our URL is [this one here](https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?address=555+Canal+St,+New+Orleans,+LA+70130&benchmark=Public_AR_Current&format=json).

In [None]:
url = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?address=555+Canal+St,+New+Orleans,+LA+70130&benchmark=Public_AR_Current&format=json"
response = requests.get(url)
response.json()

We're going to stay pretty simple, but if you wanted to get crazy you could use the **geographies** endpoint instead of the **locations** endpoint.

In [None]:
url = "https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress?address=555+Canal+St,+New+Orleans,+LA+70130&benchmark=Public_AR_Current&vintage=Current_Current&format=json"
response = requests.get(url)
response.json()

## Making a readable request

I really don't like that whole `&blah&blah&blah` thing, so let's do it again but make it a little easier to read.

In [None]:
params = {
    'address': '555 Canal St, New Orleans, LA 70130',
    'benchmark': 'Public_AR_Current',
    'format': 'json'
}
url = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
response = requests.get(url, params=params)
response.json()
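
To see what `params=` is saving us from, here's a rough sketch using only the standard library's `urllib.parse`. I'm not claiming this is exactly what requests does under the hood, but the idea is the same: spaces and commas get escaped so we don't have to type them by hand.

```python
from urllib.parse import urlencode, parse_qs

params = {
    'address': '555 Canal St, New Orleans, LA 70130',
    'benchmark': 'Public_AR_Current',
    'format': 'json'
}

# Turn the dictionary into a URL-safe query string
query = urlencode(params)
full_url = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress?" + query
print(full_url)

# Decoding the query string gives back the original values
decoded = {key: values[0] for key, values in parse_qs(query).items()}
```

The round trip through `parse_qs` is just a sanity check that nothing got lost in the escaping.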

### Let's do something nice with those results

There's a lot going on in those results! Let's just pull out exactly what we need.

In [None]:
data = response.json()
matches = data['result']['addressMatches']

match = matches[0]
result = {
    'match_count': len(matches),
    'matched_address': match['matchedAddress'],
    'longitude': match['coordinates']['x'],
    'latitude': match['coordinates']['y']
}

result
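
One thing to watch out for: `matches[0]` only works if there's at least one match. Below is a small sketch of a safer version. The helper name `summarize_matches` is made up, and `sample` is a trimmed-down fake response with placeholder coordinates, built from the same keys we used above.

```python
# A trimmed-down, made-up response in the same shape as the Census results
# (the keys come from the cells above, the values are placeholders)
sample = {
    'result': {
        'addressMatches': [
            {
                'matchedAddress': '555 CANAL ST, NEW ORLEANS, LA, 70130',
                'coordinates': {'x': -90.0, 'y': 29.9}
            }
        ]
    }
}

def summarize_matches(data):
    # Hypothetical helper: returns None when nothing matched
    matches = data['result']['addressMatches']
    if not matches:
        return None
    best = matches[0]
    return {
        'match_count': len(matches),
        'matched_address': best['matchedAddress'],
        'longitude': best['coordinates']['x'],
        'latitude': best['coordinates']['y']
    }

summarize_matches(sample)
```

With a real response you'd call `summarize_matches(response.json())` and check for `None` before using the result.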

### But what if we want ALL of that data?

If we're too lazy to dig through for exactly what we want, why not just take it all? Maybe it'll be useful someday!

In [None]:
# json_normalize lives at the top level of pandas as of pandas 1.0
from pandas import json_normalize

json_normalize(match, sep='_')
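
If you're curious where column names like `coordinates_x` come from, here's a plain-Python sketch of the flattening idea. It's not pandas' actual implementation, `flatten` is a made-up helper, and the example values are placeholders. It only shows how nested keys get glued together with `sep`.

```python
def flatten(nested, sep='_', prefix=''):
    # Hypothetical helper: walk nested dictionaries and join the keys with sep
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            # Lists and plain values are kept as they are
            flat[name] = value
    return flat

example = {
    'matchedAddress': '555 CANAL ST, NEW ORLEANS, LA, 70130',
    'coordinates': {'x': -90.0, 'y': 29.9}
}
flatten(example)
```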

## Getting fancy with `.apply` and `tqdm`

In [None]:
# Combine all of the above
def census_geocode(row):    
    # Make the request
    params = {
        'address': row['full_address'],
        'benchmark': 'Public_AR_Current',
        'format': 'json'
    }
    url = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
    response = requests.get(url, params=params)

    # Find the matches
    data = response.json()
    matches = data['result']['addressMatches']

    if matches:
        # Grab the first (best?) match, combine it with the original
        match = matches[0]
        return {**row, **match}
    else:
        # If nothing matched, just return the original
        return {**row}

In [None]:
from tqdm import tqdm
tqdm.pandas()

# Use this if you aren't on Google Colab
# geocoded = df.progress_apply(census_geocode, axis=1)
geocoded = df.apply(census_geocode, axis=1)
geocoded

We can use `json_normalize` to make those results a beautiful DataFrame!

In [None]:
json_normalize(geocoded, sep='_')
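
Since `.apply` sends one request per row, a dataset with repeated addresses asks the same question over and over. Here's a sketch of one way around that using `functools.lru_cache`. `fake_geocode` is a stand-in I made up so the cell runs without a network connection; in practice you'd put your real request inside `cached_geocode`.

```python
from functools import lru_cache

calls = []

def fake_geocode(address):
    # Stand-in for a real API call so this runs without a network connection
    calls.append(address)
    return {'address': address, 'lat': 0.0, 'lng': 0.0}

@lru_cache(maxsize=None)
def cached_geocode(address):
    # Only reaches the "API" the first time we see an address
    return fake_geocode(address)

addresses = ['555 Canal St', '1 Main St', '555 Canal St']
results = [cached_geocode(a) for a in addresses]
print(len(results), "results from", len(calls), "requests")
```

The cache is keyed on the address string instead of the whole row because a pandas row isn't hashable. Cached results are shared objects, so copy them before changing anything.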

## Using geocoder

Life gets a lot lot lot easier if we use **geocoder** instead of separate HTTP requests!

In [None]:
import geocoder

g = geocoder.uscensus('555 Canal St, New Orleans, LA 70130')
g

What's the latitude and longitude?

In [None]:
g.latlng

Let's just check the whole result out as JSON.

In [None]:
g.json

If we have JSON, we can use a function + `.apply` + `json_normalize` just like before to make a nice and easy and clean and nice and perfect approach to geocoding all of our addresses!

## Using `.apply` with the Census Bureau and geocoder

In [None]:
def census_geocode(row):
    # Geocode the address
    address = row['full_address']
    g = geocoder.uscensus(address)

    # Combine the original data with the geocoded data
    combined = {**row, 'geo': g.json}

    return combined

In [None]:
# We already ran this once, but I'll leave it here for reference
# from tqdm import tqdm
# tqdm.pandas()

# Geocode it
# Use this if you aren't on Google Colab
# geocoded = df.progress_apply(census_geocode, axis=1)
geocoded = df.apply(census_geocode, axis=1)

# Turn it into something that looks nice
geocoded = json_normalize(geocoded, sep='_')

# Look at it
geocoded

Even though the code is nice, it sure misses a lot. What else can we try?
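
If you want a number for "a lot," here's a sketch of a match-rate check. It assumes rows shaped like the ones our functions return, with `geo` being `None` or missing when nothing was found. The rows below are made up, and `match_rate` is a hypothetical helper.

```python
# Made-up rows in the shape our geocoding functions return:
# 'geo' is assumed to be None (or missing) when nothing was found
rows = [
    {'full_address': '555 Canal St, New Orleans, LA 70130', 'geo': {'lat': 29.9, 'lng': -90.0}},
    {'full_address': 'somewhere near the big oak tree', 'geo': None},
    {'full_address': '???'},
]

def match_rate(rows):
    # Fraction of rows that came back with something in 'geo'
    if not rows:
        return 0.0
    found = sum(1 for row in rows if row.get('geo'))
    return found / len(rows)

print(f"{match_rate(rows):.0%} of addresses matched")
```

Running the same check after each service gives a quick way to compare them on your own data.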

# HERE Geocoder

The HERE geocoder doesn't work with **geocoder** (sorry!), but we can make do with the HTTP version.

In [None]:
# https://geocode.search.hereapi.com/v1/geocode?q=5+Rue+Daunou%2C+75000+Paris%2C+France&apikey=YOUR_API_KEY
import requests

params = {
    'q': '555 Canal St, New Orleans, LA 70130',
    'apikey': 'HERE_API_KEY'
}

response = requests.get('https://geocode.search.hereapi.com/v1/geocode', params=params)
response.json()
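
Pasting a real key straight into a notebook makes it easy to share by accident. Here's a sketch of reading it from an environment variable instead. The variable name `HERE_API_KEY` and the helper `get_api_key` are my own choices, not anything HERE requires.

```python
import os

def get_api_key(name):
    # Hypothetical helper: read a key from an environment variable
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key

# For demonstration only; normally you'd set this outside the notebook
os.environ.setdefault('HERE_API_KEY', 'not-a-real-key')

params = {
    'q': '555 Canal St, New Orleans, LA 70130',
    'apikey': get_api_key('HERE_API_KEY')
}
```

The same helper would work for the Google key later on, assuming you store it under its own variable name.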

Let's just grab the one result we're interested in.

In [None]:
results = response.json()['items']
results[0]
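
If all you need is the label and the coordinates, something like the sketch below would do it. I'm assuming each item has `address` and `position` keys like the result printed above. The `item` here is a made-up stand-in with placeholder values, and `here_summary` is a hypothetical helper.

```python
# A made-up item in the shape of a HERE result (values are placeholders)
item = {
    'title': '555 Canal St, New Orleans, LA 70130, United States',
    'address': {'label': '555 Canal St, New Orleans, LA 70130, United States'},
    'position': {'lat': 29.9, 'lng': -90.0}
}

def here_summary(item):
    # Hypothetical helper: .get keeps us safe if a field is missing
    position = item.get('position', {})
    return {
        'label': item.get('address', {}).get('label'),
        'latitude': position.get('lat'),
        'longitude': position.get('lng')
    }

here_summary(item)
```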

What would this look like as a function we could use with `.apply`?

## Using `.apply` with HERE

In [None]:
def here_geocode(row):
    # Pull out the address
    address = row['full_address']
    
    # Make the request
    params = {
        'q': address,
        'apikey': 'HERE_API_KEY'
    }

    response = requests.get('https://geocode.search.hereapi.com/v1/geocode', params=params)
    results = response.json()['items']
    
    # Is there at least one result?
    if len(results) > 0:
        result = results[0]

        # Combine the original data with the geocoded data
        combined = {**row, 'geo': result}
    else:
        combined = {**row}

    return combined

In [None]:
# We already ran this once, but I'll leave it here for reference
# from tqdm import tqdm
# tqdm.pandas()

# Geocode it
# Use this if you aren't on Google Colab
# geocoded = df.progress_apply(here_geocode, axis=1)
geocoded = df.apply(here_geocode, axis=1)

# Turn it into something that looks nice
geocoded = json_normalize(geocoded, sep='_')

# Look at it
geocoded

# Google

Google's geocoder is the real kingpin, although it unfortunately has a *lot* of restrictions on use. Let's put it to use with geocoder.

In [None]:
import geocoder

g = geocoder.google('555 Canal St, New Orleans, LA 70130', key='GOOGLE_API_KEY')
g

In [None]:
g.json

Hm, how many columns is that going to be? Let's use `json_normalize` to see.

In [None]:
json_normalize(g.json)

## Using `.apply` with Google and geocoder

In [None]:
def google_geocode(row):
    # Geocode the address
    address = row['full_address']
    g = geocoder.google(address, key='GOOGLE_API_KEY')

    # Combine the original data with the geocoded data
    combined = {**row, 'geo': g.json}

    return combined

In [None]:
# We already ran this once, but I'll leave it here for reference
# from tqdm import tqdm
# tqdm.pandas()

# Geocode it
# Use this if you aren't on Google Colab
# geocoded = df.progress_apply(google_geocode, axis=1)
geocoded = df.apply(google_geocode, axis=1)

# Turn it into something that looks nice
geocoded = json_normalize(geocoded, sep='_')

# Look at it
geocoded
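
With usage restrictions in mind, it can be polite (and safer) to space requests out. Here's a sketch of a wrapper that waits between calls. `throttled` is a hypothetical helper, the 0.05 second gap is an arbitrary number and not any official limit, and `fake_geocode` stands in for `google_geocode` so the cell runs without an API key.

```python
import time

def throttled(func, min_interval):
    # Hypothetical wrapper: wait so calls are at least min_interval seconds apart
    last_call = [None]

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

def fake_geocode(row):
    # Stand-in for google_geocode so this runs without an API key
    return {**row, 'geo': None}

slow_geocode = throttled(fake_geocode, min_interval=0.05)

start = time.monotonic()
results = [slow_geocode({'full_address': a}) for a in ['a', 'b', 'c']]
elapsed = time.monotonic() - start
```

In the notebook you could try `df.apply(throttled(google_geocode, 0.05), axis=1)`, picking an interval that suits whatever limits your account has.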

# Try it on another dataset!

We'll use it on a subset of [Joe Fox's pharmacies dataset](https://github.com/joemfox/NICAR20-geocoding). You can use this section as the reference point for code to steal in the future!

In [None]:
pharma = pd.read_csv("pharma.csv")
pharma.head()

Take note that **we need to change something** about `google_geocode` to make it work with this new dataset. What's different?

In [None]:
def google_geocode(row):
    # Geocode the address
    address = row['address']
    g = geocoder.google(address, key='GOOGLE_API_KEY')

    # Combine the original data with the geocoded data
    combined = {**row, 'geo': g.json}

    return combined
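
If editing the column name by hand every time gets old, one option is a little factory that builds the function for you. This is only a sketch: `make_geocoder` is a made-up helper, the sample row is invented, and `fake_geocode` stands in for a real call like `geocoder.google` so it runs offline.

```python
def make_geocoder(column, geocode):
    # Hypothetical factory: builds a function for .apply that reads from `column`
    def geocode_row(row):
        address = row[column]
        return {**row, 'geo': geocode(address)}
    return geocode_row

def fake_geocode(address):
    # Stand-in for a real geocoding call so this runs offline
    return {'query': address}

geocode_full_address = make_geocoder('full_address', fake_geocode)
geocode_address = make_geocoder('address', fake_geocode)

geocode_address({'name': 'Corner Pharmacy', 'address': '1 Main St'})
```

Then `pharma.apply(geocode_address, axis=1)` and `df.apply(geocode_full_address, axis=1)` can share the same logic.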

In [None]:
# We already ran this once, but I'll leave it here for reference
# from tqdm import tqdm
# tqdm.pandas()

# Geocode it
# Use this if you aren't on Google Colab
# geocoded = pharma.progress_apply(google_geocode, axis=1)
geocoded = pharma.apply(google_geocode, axis=1)

# Turn it into something that looks nice
geocoded = json_normalize(geocoded, sep='_')

# Look at it
geocoded

And now we might as well save it.

In [None]:
geocoded.to_csv("pharma_geocoded.csv", index=False)