# Geocoding with Python

Agenda:
- Geocoding addresses to latitude/longitude
- Exploring locations with the Google Places API
- Reverse geocoding latitude/longitude to an address
- Reverse geocoding latitude/longitude to a block FIPS code

**You'll need a Google API key to use the Google Maps Geocoding API and the Google Places API Web Service:**
1. Go to https://console.developers.google.com/project and sign in
1. Create a new project and call it cp255, then click create
1. On the screen with all the APIs listed, click "Google Places API Web Service" under Google Maps APIs, then click the Enable API button. Do the same for "Google Maps Geocoding API"
1. Go to Credentials and click create credentials, choose API Key
1. Copy your API key when it is displayed, then create keys.py with the line `google_maps_api_key='YOUR-KEY-HERE'`

In [None]:
import pandas as pd, requests, time
from geopy.geocoders import GoogleV3

# import my api key saved in a local file i called keys.py
from keys import google_maps_api_key

In [None]:
# set the pause duration between api requests
pause = 0.1
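
A fixed pause before every request is the simplest way to stay under a rate limit. As a hedged alternative, the sketch below sleeps only for whatever part of the interval has not already elapsed since the previous call. The `throttled` decorator and the `fake_request` function are hypothetical names made up for illustration; they are not part of any library used in this notebook.

```python
import functools
import time

def throttled(min_interval):
    """Hypothetical helper: keep calls to a function at least min_interval seconds apart."""
    def decorator(func):
        last_call = [None]  # mutable cell holding the time of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                elapsed = time.monotonic() - last_call[0]
                if elapsed < min_interval:
                    # sleep only for the remainder of the interval
                    time.sleep(min_interval - elapsed)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@throttled(0.1)
def fake_request(address):
    # stand-in for a real API call, so the sketch runs without network access
    return address.upper()

start = time.monotonic()
results = [fake_request(a) for a in ['a', 'b', 'c']]
elapsed_total = time.monotonic() - start
```

The first call goes out immediately, and each later call waits only if the previous one was less than `min_interval` seconds ago.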

## Part 1: Geocoding addresses to lat-long

We will use the Google Maps Geocoding API. Google now requires an API key for these requests, so pass the key you created above.
- Documentation: https://developers.google.com/maps/documentation/geocoding/intro
- Example request: https://maps.googleapis.com/maps/api/geocode/json?address=350+5th+Ave,+New+York,+NY+10118&key=YOUR-KEY-HERE
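
Addresses contain spaces, commas, and sometimes characters like `&` that must be URL-encoded before they go into a query string. The following is a minimal sketch of building such a request URL with the standard library. `build_geocode_url` is a hypothetical helper name, and the `key` parameter is included on the assumption that your requests are authenticated with an API key.

```python
from urllib.parse import urlencode

# Geocoding API endpoint, without any query string
GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(address, api_key):
    """Return a request URL with the address and key safely URL-encoded."""
    query = urlencode({'address': address, 'key': api_key})
    return '{}?{}'.format(GEOCODE_URL, query)

example_url = build_geocode_url('350 5th Ave, New York, NY 10118', 'YOUR-KEY-HERE')
print(example_url)
```

If you use `requests`, passing the same dict as `params=` to `requests.get` performs this encoding for you.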

In [None]:
locations = pd.DataFrame()
locations['address'] = ['350 5th Ave, New York, NY 10118',
                        '100 Larkin St, San Francisco, CA 94102',
                        'Wurster Hall, Berkeley, CA']
locations

In [None]:
# function that accepts an address string, sends it to the Google API, and returns the lat-long API result
def geocode(address):
    time.sleep(pause) #pause for some duration before each request, to not hammer their server
    url = 'https://maps.googleapis.com/maps/api/geocode/json' #api endpoint
    params = {'address': address, 'key': google_maps_api_key} #requests url-encodes these values for us
    response = requests.get(url, params=params) #send the request to the server and get the response
    data = response.json() #convert the response json string into a dict
    
    if len(data.get('results', [])) > 0: #if google was able to geolocate our address, extract lat-long from result
        latitude = data['results'][0]['geometry']['location']['lat']
        longitude = data['results'][0]['geometry']['location']['lng']
        return '{},{}'.format(latitude, longitude) #return lat-long as a string in the format google likes
    return None #no match (or an error response): make the missing result explicit

In [None]:
# for each value in the address column, geocode it, save results as new df column
locations['latlng'] = locations['address'].map(geocode)
locations

In [None]:
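# parse the result into separate lat and lon columns for easy mapping
# convert to float so the columns are numeric, and leave failed geocodes (None) as missing values
locations['latitude'] = locations['latlng'].map(lambda x: float(x.split(',')[0]) if x else None)
locations['longitude'] = locations['latlng'].map(lambda x: float(x.split(',')[1]) if x else None)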
locations

## Part 2: Google Places API

We will use Google's Places API to look up places in the vicinity of some location. You need an API key for this.
- Documentation: https://developers.google.com/places/
- Example request: https://maps.googleapis.com/maps/api/place/nearbysearch/json?keyword=coffee&location=37.8683811,-122.2589063&radius=1000&key=YOUR-KEY-HERE
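
To make the response structure concrete before calling the API, here is a minimal sketch that flattens a hand-written response of the same general shape: a `results` list whose items have `name`, `rating`, `vicinity`, and a nested `geometry` with a `location`. The place names, ratings, addresses, and coordinates are invented for illustration, and `flatten_place` is a hypothetical helper.

```python
# hand-written stand-in for a Places response (all values are made up)
sample_response = {
    'results': [
        {'name': 'Cafe A', 'rating': 4.2, 'vicinity': '1 Example St',
         'geometry': {'location': {'lat': 37.870, 'lng': -122.259}}},
        {'name': 'Diner B', 'vicinity': '2 Example Ave',  # this place has no rating
         'geometry': {'location': {'lat': 37.868, 'lng': -122.255}}},
        {'name': 'Bistro C', 'rating': 4.7, 'vicinity': '3 Example Rd',
         'geometry': {'location': {'lat': 37.866, 'lng': -122.258}}},
    ]
}

def flatten_place(place):
    """Pull the fields of interest out of one result, tolerating missing keys."""
    location = place.get('geometry', {}).get('location', {})
    return {
        'name': place.get('name'),
        'rating': place.get('rating'),
        'vicinity': place.get('vicinity'),
        'latitude': location.get('lat'),
        'longitude': location.get('lng'),
    }

flat_places = [flatten_place(p) for p in sample_response['results']]

# sort best-rated first; places without a rating go last
ranked = sorted(flat_places, key=lambda p: (p['rating'] is None, -(p['rating'] or 0)))
ranked_names = [p['name'] for p in ranked]
print(ranked_names)
```

Using `.get()` throughout means a result without a rating or geometry produces `None` values instead of raising a `KeyError`.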

In [None]:
# google places api url (nearby search endpoint), with placeholders
url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?keyword={}&location={}&radius={}&key={}'

# what keyword to search for
keyword = 'restaurant'

# define the radius (in meters) for the search
radius = 1000

# define the location coordinates (of wurster hall)
location = locations.loc[2, 'latlng']
location

In [None]:
# add our variables into the url, submit the request to the api, and load the response
request = url.format(keyword, location, radius, google_maps_api_key)
response = requests.get(request)
data = response.json()

In [None]:
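places = pd.DataFrame(data['results'])
# reindex keeps these four columns and fills any that are absent from the response with NaN
places = places.reindex(columns=['name', 'geometry', 'rating', 'vicinity'])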
places.head()

In [None]:
# parse out lat-long and return it as a series -> this creates a dataframe of all the results when you .apply()
def parse_coords(geometry):
    if isinstance(geometry, dict):
        lng = geometry['location']['lng']
        lat = geometry['location']['lat']
        return pd.Series({'latitude':lat, 'longitude':lng})
    # always return a series, so .apply() still builds a dataframe when a geometry is missing
    return pd.Series({'latitude':None, 'longitude':None})
    
# test our function
places['geometry'].head().apply(parse_coords)

In [None]:
# now run our function on the whole dataframe and SAVE THE OUTPUT to 2 new dataframe columns
places[['latitude', 'longitude']] = places['geometry'].apply(parse_coords)
places_clean = places.drop('geometry', axis=1)
places_clean.sort_values(by='rating', ascending=False).head()

## Part 3: Reverse geocoding (address lookup)

We'll use Google's reverse geocoding API.
- Documentation: https://developers.google.com/maps/documentation/geocoding/intro#ReverseGeocoding
- Example request: https://maps.googleapis.com/maps/api/geocode/json?latlng=34.537094,-82.630303&key=YOUR-KEY-HERE

You can do this manually, just like in the previous two sections, but parsing the address components in Google's results is a little more complicated. If we just want addresses, we can use [geopy](https://geopy.readthedocs.org/en/release-0.96.3/#geopy.geocoders.GoogleV3) to call Google's API for us.

In [None]:
# load usa point data and keep only the first 5 rows
usa = pd.read_csv('data/usa-latlong.csv')
usa = usa.head()
usa

In [None]:
# create a column to put lat-long into the format google likes - this just makes it easier to call their API
usa['latlng'] = usa.apply(lambda row: '{},{}'.format(row['latitude'], row['longitude']), axis=1)
usa

In [None]:
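# tell geopy to reverse geocode some lat-long string using Google's API and return the address
geolocator = GoogleV3(api_key=google_maps_api_key) #create the geocoder once and reuse it for every request

def reverse_geopy(latlng):
    time.sleep(pause)
    location = geolocator.reverse(latlng, exactly_one=True)
    return location.address if location is not None else None #geopy returns None when nothing is found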

usa['address'] = usa['latlng'].map(reverse_geopy)
usa

#### What if you just want the city or state?
You could try to parse the address strings, but you're relying on them always having a consistent format. This might not be the case if you have international location data. In this case, you should call the API manually and extract the individual address components you are interested in.

In [None]:
# pass the Google API latlng data to reverse geocode it
def reverse_geocode(latlng):
    time.sleep(pause)
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    params = {'latlng': latlng, 'key': google_maps_api_key}
    response = requests.get(url, params=params)
    data = response.json()
    if len(data.get('results', [])) > 0:
        return data['results'][0] #if we got results, return the first result
    return None #no results for this point
    
geocode_results = usa['latlng'].map(reverse_geocode)

Now look inside each reverse geocode result to see if `address_components` exists. If it does, look inside each component's `types` list to see if we can find the city or the state. Google calls the city by the abstract term 'locality' and the state by the abstract term 'administrative_area_level_1'. This lets them use the same terminology anywhere in the world.
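
The lookup described here can be sketched against a hand-written result, so it runs without calling the API. `get_component` is a hypothetical generic helper, and `sample_result` only imitates the parts of a geocode result that matter for this lookup. The `short_name` field is used on the assumption that each component carries one alongside `long_name`.

```python
# hand-written stand-in for one reverse geocode result
sample_result = {
    'address_components': [
        {'long_name': 'Berkeley', 'short_name': 'Berkeley',
         'types': ['locality', 'political']},
        {'long_name': 'California', 'short_name': 'CA',
         'types': ['administrative_area_level_1', 'political']},
        {'long_name': 'United States', 'short_name': 'US',
         'types': ['country', 'political']},
    ]
}

def get_component(geocode_result, component_type, field='long_name'):
    """Return the first address component of the given type, or None if absent."""
    if not isinstance(geocode_result, dict):
        return None  # e.g. the reverse geocode returned nothing for this point
    for component in geocode_result.get('address_components', []):
        if component_type in component.get('types', []):
            return component.get(field)
    return None

city = get_component(sample_result, 'locality')
state = get_component(sample_result, 'administrative_area_level_1', field='short_name')
missing = get_component(None, 'locality')
print(city, state, missing)
```

One parameterized function covers city, state, country, or any other component type, at the cost of slightly less self-documenting call sites.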

In [None]:
def get_city(geocode_result):
    # geocode_result is None when the reverse geocode returned nothing, so check the type first
    if isinstance(geocode_result, dict) and 'address_components' in geocode_result:
        for address_component in geocode_result['address_components']:
            if 'locality' in address_component['types']:
                return address_component['long_name']

def get_state(geocode_result):
    if isinstance(geocode_result, dict) and 'address_components' in geocode_result:
        for address_component in geocode_result['address_components']:
            if 'administrative_area_level_1' in address_component['types']:
                return address_component['long_name']

In [None]:
# now map our functions to extract city and state names
usa['city'] = geocode_results.map(get_city)
usa['state'] = geocode_results.map(get_state)
usa

## Part 4: Reverse geocoding to FIPS

We'll use the FCC's (very slow) Census Block Conversions API to turn lat/long into a block FIPS code. A block FIPS code is 15 digits long and contains, from left to right: the location's 2-digit state code, 3-digit county code, 6-digit census tract code, and 4-digit census block code (the first digit of which is the census block group code). With this code, you can join your data to census data at the tract level (or county, block group, or block level) without doing a spatial join.

- Documentation: https://www.fcc.gov/developers/census-block-conversions-api
- Example request: http://data.fcc.gov/api/block/find?format=json&latitude=37.861055&longitude=-122.256463
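
The digit layout described above can be sketched as plain string slicing. The helper names (`split_block_fips`, `tract_geoid`) are hypothetical, and the example code is a made-up 15-digit value used only to show the slicing. Keep FIPS codes as strings so that leading zeros are not lost.

```python
from collections import namedtuple

BlockFips = namedtuple('BlockFips', ['state', 'county', 'tract', 'block', 'block_group'])

def split_block_fips(fips_code):
    """Split a 15-digit block FIPS string into its parts (widths 2, 3, 6, 4)."""
    fips_code = str(fips_code)
    if len(fips_code) != 15 or not fips_code.isdigit():
        raise ValueError('expected a 15-digit block FIPS code, got {!r}'.format(fips_code))
    return BlockFips(
        state=fips_code[0:2],
        county=fips_code[2:5],
        tract=fips_code[5:11],
        block=fips_code[11:15],
        block_group=fips_code[11],  # first digit of the block code
    )

def tract_geoid(fips_code):
    """First 11 digits (state + county + tract), usable as a tract-level join key."""
    return str(fips_code)[:11]

example = '060014001001007'  # made-up code for illustration only
parts = split_block_fips(example)
print(parts)
print(tract_geoid(example))
```

Truncating the block code to its first 11 digits gives a key you can match against a tract identifier column in census tables, which is what makes the non-spatial join possible.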

In [None]:
# pass the FCC API lat/long and get FIPS data back - return block fips and county name
def get_fips(row):
    time.sleep(pause)
    url = 'http://data.fcc.gov/api/block/find?format=json&latitude={}&longitude={}'
    request = url.format(row['latitude'], row['longitude'])
    response = requests.get(request)
    data = response.json()
    
    # return multiple values as a series - this will create a dataframe with multiple columns
    return pd.Series({'fips_code':data['Block']['FIPS'], 'county':data['County']['name']})

In [None]:
# get block fips code and county name from FCC as new dataframe, then concatenate to join them
fips = usa.apply(get_fips, axis=1)
usa = pd.concat([usa, fips], axis=1)
usa