# Geocoding with Python

Agenda:
- Geocoding addresses to latitude/longitude
- Exploring locations with the Google Places API
- Reverse geocoding latitude/longitude to an address
- Reverse geocoding latitude/longitude to block FIPS code

**You'll need a Google API key to use the Google Maps Geocoding API and the Google Places API Web Service:**
1. Go to https://console.developers.google.com/project and sign in
1. Create a new (empty) project called cp255, then click Enable APIs
1. On the screen with all the APIs listed:
  1. click "Google Maps Geocoding API" under Google Maps APIs, then click the Enable API button
  1. click "Google Places API Web Service" under Google Maps APIs, then click the Enable API button
1. Go to Credentials (on the left) and click the Add Credentials button, choose API Key, then choose Browser Key
1. Give it a name like cp255-key, hit create, then copy and save your API key when it is displayed

In [1]:
import pandas as pd, json, requests, time
from geopy.geocoders import GoogleV3

In [2]:
# set your google api key here
api_key = ''

# set the pause duration
pause = 0.1

## Part 1: Geocoding addresses to lat-long

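We will use the Google Maps Geocoding API. This request was written to work without an API key, but Google has since made a key mandatory for Geocoding requests, so expect to add your key as a `key` parameter, as the Part 3 request does.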
- Documentation: https://developers.google.com/maps/documentation/geocoding/intro
- Example request: http://maps.googleapis.com/maps/api/geocode/json?address=350+5th+Ave,+New+York,+NY+10118&sensor=false
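
The example request above can be assembled from its parts rather than typed by hand, which also takes care of URL-encoding characters such as `&` or `#` in an address. A minimal sketch, assuming Python 3's standard `urllib.parse`; `build_geocode_url` is a hypothetical helper name (not part of any library), and the optional `key` argument is an assumption for when a key is needed:

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'http://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(address, key=None):
    """Return a geocoding request URL with the address safely URL-encoded."""
    params = {'address': address, 'sensor': 'false'}
    if key:
        params['key'] = key  # only added when a key is supplied
    return GEOCODE_ENDPOINT + '?' + urlencode(params)

print(build_geocode_url('350 5th Ave, New York, NY 10118'))
```

Commas come out percent-encoded (`%2C`) here, which is equivalent to the literal commas in the example request.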

In [3]:
locations = pd.DataFrame()
locations['address'] = ['350 5th Ave, New York, NY 10118',
                        '100 Larkin St, San Francisco, CA 94102',
                        'Wurster Hall, Berkeley, CA']
locations

Unnamed: 0,address
0,"350 5th Ave, New York, NY 10118"
1,"100 Larkin St, San Francisco, CA 94102"
2,"Wurster Hall, Berkeley, CA"


In [4]:
# function that accepts an address string, sends it to the Google API, and returns the lat-long API result
def geocode(address):
    time.sleep(pause) #pause for some duration before each request, to not hammer their server
    url = 'https://maps.googleapis.com/maps/api/geocode/json' #api endpoint, without the query string
    params = {'address': address, 'sensor': 'false'} #requests url-encodes these values for us
    if api_key:
        params['key'] = api_key #include the key if one was set above
    response = requests.get(url, params=params) #send the request to the server and get the response
    data = json.loads(response.text) #convert the response json string into a dict
    
    if len(data['results']) > 0: #if google was able to geolocate our address, extract lat-long from result
        latitude = data['results'][0]['geometry']['location']['lat']
        longitude = data['results'][0]['geometry']['location']['lng']
        return '{},{}'.format(latitude, longitude) #return lat-long as a string in the format google likes
    return None #no match, so the caller gets None instead of a lat-long string

In [5]:
# for each value in the address column, geocode it
locations['latlng'] = locations['address'].map(geocode)
locations

Unnamed: 0,address,latlng
0,"350 5th Ave, New York, NY 10118","40.748384,-73.9854792"
1,"100 Larkin St, San Francisco, CA 94102","37.779159,-122.415808"
2,"Wurster Hall, Berkeley, CA","37.8707352,-122.2548935"


In [6]:
# parse the result into separate lat and lon columns for easy mapping
locations['latitude'] = locations['latlng'].map(lambda x: x.split(',')[0])
locations['longitude'] = locations['latlng'].map(lambda x: x.split(',')[1])
locations

Unnamed: 0,address,latlng,latitude,longitude
0,"350 5th Ave, New York, NY 10118","40.748384,-73.9854792",40.748384,-73.9854792
1,"100 Larkin St, San Francisco, CA 94102","37.779159,-122.415808",37.779159,-122.415808
2,"Wurster Hall, Berkeley, CA","37.8707352,-122.2548935",37.8707352,-122.2548935
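
The `split` calls above leave latitude and longitude as strings, and they fail on a missing value (when `geocode` found no match). A minimal sketch of a numeric alternative; `parse_latlng` is a hypothetical helper, not something from a library:

```python
def parse_latlng(latlng):
    """Split a 'lat,lng' string into a (latitude, longitude) tuple of floats.

    Returns (None, None) when latlng is None, i.e. the address was not matched.
    """
    if latlng is None:
        return (None, None)
    latitude, longitude = latlng.split(',')
    return (float(latitude), float(longitude))

print(parse_latlng('40.748384,-73.9854792'))
```

It could be applied with `locations['latlng'].map(parse_latlng)`, which yields a column of tuples that can then be unpacked into two numeric columns.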


## Part 2: Google Places API

We will use Google's Places API to look up places in the vicinity of some location. You need an API key for this.
- Documentation: https://developers.google.com/places/
- Example request: https://maps.googleapis.com/maps/api/place/search/json?keyword=coffee&location=37.8683811,-122.2589063&radius=1000&sensor=false&key=YOUR-KEY-HERE

In [7]:
# google places api url, with placeholders
url = 'https://maps.googleapis.com/maps/api/place/search/json?keyword={}&location={}&radius={}&key={}&sensor=false'

# what keyword to search for
keyword = 'restaurant'

# define the radius (in meters) for the search
radius = 1000

# define the location coordinates (of wurster hall)
location = locations.loc[2, 'latlng']
print(location)

37.8707352,-122.2548935


In [8]:
# add our variables into the url, submit the request to the api, and load the response
request = url.format(keyword, location, radius, api_key)
response = requests.get(request)
data = json.loads(response.text)

In [9]:
# collect one dict per place in the api response, then build the dataframe in one step
rows = []
for place in data['results']:
    rows.append({'latitude':place['geometry']['location']['lat'],
                 'longitude':place['geometry']['location']['lng'],
                 'name':place['name'],
                 'rating':place['rating'] if 'rating' in place else None,
                 'vicinity':place['vicinity']})
places = pd.DataFrame(rows)

# show the highest-rated places first (sort_values replaces the removed DataFrame.sort)
places.sort_values('rating', ascending=False).head()

Unnamed: 0,latitude,longitude,name,rating,vicinity
14,37.868755,-122.266882,Razan's Organic Kitchen,4.5,"2119 Kittredge Street, Berkeley"
2,37.870291,-122.267111,Ippuku,4.5,"2130 Center Street, Berkeley"
13,37.867668,-122.266156,Great China,4.5,"2190 Bancroft Way, Berkeley"
16,37.86369,-122.258984,Kiraku,4.4,"2566 Telegraph Avenue, Berkeley"
12,37.871683,-122.267481,Mandarin Garden Restaurant,4.4,"2025 Shattuck Avenue, Berkeley"
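
The loop above reads `data['results']` directly, and that list is simply empty when a request is rejected. Google's JSON responses also carry a top-level `status` field (for example `OK`, `ZERO_RESULTS`, or `REQUEST_DENIED`), so it may help to check it first. A minimal sketch; `extract_places` is a hypothetical helper, and the sample dictionaries are made up for illustration rather than real API output:

```python
def extract_places(data):
    """Return the list of places from a Places API response dict.

    Raises ValueError for any status other than OK or ZERO_RESULTS.
    """
    status = data.get('status')
    if status == 'ZERO_RESULTS':
        return []
    if status != 'OK':
        # error_message is optional in the response, so fall back to ''
        message = '{} {}'.format(status, data.get('error_message', '')).strip()
        raise ValueError('Places request failed: ' + message)
    return data.get('results', [])

# made-up responses, shaped like the fields used above
ok = {'status': 'OK',
      'results': [{'name': 'Example Cafe', 'vicinity': '1 Example St'}]}
denied = {'status': 'REQUEST_DENIED', 'results': [],
          'error_message': 'The provided API key is invalid.'}

print(len(extract_places(ok)))
try:
    extract_places(denied)
except ValueError as error:
    print(error)
```

Failing loudly on a denied request makes a bad or missing key easier to spot than an empty dataframe would.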


## Part 3: Reverse geocoding (address lookup)

We'll use Google's reverse geocoding API.
- Documentation: https://developers.google.com/maps/documentation/geocoding/intro#ReverseGeocoding
- Example request: https://maps.googleapis.com/maps/api/geocode/json?latlng=34.537094,-82.630303

You can do this manually, just like in the previous two sections, but parsing Google's address components results is a little more involved. If we just want addresses, [geopy](https://geopy.readthedocs.org/en/release-0.96.3/#geopy.geocoders.GoogleV3) can call Google's API for us.

In [10]:
# load usa point data and keep only the first 5 rows
usa = pd.read_csv('data/usa-latlong.csv')
usa = usa.head()
usa

Unnamed: 0,latitude,longitude
0,34.537094,-82.630303
1,35.0257,-78.9705
2,39.151817,-77.16381
3,38.636738,-121.31955
4,47.616955,-122.348921


In [11]:
# create a column to put lat-long into the format google likes - this just makes it easier to call their API
usa['latlng'] = usa.apply(lambda row: '{},{}'.format(row['latitude'], row['longitude']), axis=1)
usa

Unnamed: 0,latitude,longitude,latlng
0,34.537094,-82.630303,"34.537094,-82.630303"
1,35.0257,-78.9705,"35.0257,-78.9705"
2,39.151817,-77.16381,"39.151817,-77.16381"
3,38.636738,-121.31955,"38.636738,-121.31955"
4,47.616955,-122.348921,"47.616955,-122.348921"


In [12]:
# tell geopy to reverse geocode some lat-long string using Google's API and return the address
def reverse_geopy(latlng):
    geolocator = GoogleV3()
    address, _ = geolocator.reverse(latlng, exactly_one=True)
    return address

usa['address'] = usa['latlng'].map(reverse_geopy)
usa

Unnamed: 0,latitude,longitude,latlng,address
0,34.537094,-82.630303,"34.537094,-82.630303","3 Simpson Rd, Anderson, SC 29621, USA"
1,35.0257,-78.9705,"35.0257,-78.9705","5340 Sumac Cir, Fayetteville, NC 28304, USA"
2,39.151817,-77.16381,"39.151817,-77.16381","Spiceberry Cirle, Gaithersburg, MD 20877, USA"
3,38.636738,-121.31955,"38.636738,-121.31955","7932 Fair Oaks Blvd, Carmichael, CA 95608, USA"
4,47.616955,-122.348921,"47.616955,-122.348921","249-299 Cedar St, Seattle, WA 98121, USA"


#### What if you just want the city or state?
You could try to parse the address strings, but that relies on them always having a consistent format, which may not hold for international location data. Instead, call the API manually and extract the individual address components you are interested in.

In [13]:
# pass the Google API latlng data to reverse geocode it
def reverse_geocode(latlng):
    time.sleep(pause)
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}&key={}'
    request = url.format(latlng, api_key)
    response = requests.get(request)
    data = json.loads(response.text)
    if len(data['results']) > 0:
        return data['results'][0] #if we got results, return the first result
    
geocode_results = usa['latlng'].map(reverse_geocode)

Now look inside each reverse geocode result to see if `address_components` exists. If it does, look inside each component for the city or the state. Google calls the city name 'locality' and the state name 'administrative_area_level_1'; these abstract terms let it use the same terminology anywhere in the world.

In [14]:
def get_city(geocode_results):
    # geocode_results is None when reverse_geocode found nothing for a point
    if geocode_results and 'address_components' in geocode_results:
        for address_component in geocode_results['address_components']:
            if 'locality' in address_component['types']:
                return address_component['long_name']

def get_state(geocode_results):
    # geocode_results is None when reverse_geocode found nothing for a point
    if geocode_results and 'address_components' in geocode_results:
        for address_component in geocode_results['address_components']:
            if 'administrative_area_level_1' in address_component['types']:
                return address_component['long_name']

In [15]:
# now map our functions to extract city and state names
usa['city'] = geocode_results.map(get_city)                
usa['state'] = geocode_results.map(get_state)
usa

Unnamed: 0,latitude,longitude,latlng,address,city,state
0,34.537094,-82.630303,"34.537094,-82.630303","3 Simpson Rd, Anderson, SC 29621, USA",Anderson,South Carolina
1,35.0257,-78.9705,"35.0257,-78.9705","5340 Sumac Cir, Fayetteville, NC 28304, USA",Fayetteville,North Carolina
2,39.151817,-77.16381,"39.151817,-77.16381","Spiceberry Cirle, Gaithersburg, MD 20877, USA",Gaithersburg,Maryland
3,38.636738,-121.31955,"38.636738,-121.31955","7932 Fair Oaks Blvd, Carmichael, CA 95608, USA",Carmichael,California
4,47.616955,-122.348921,"47.616955,-122.348921","249-299 Cedar St, Seattle, WA 98121, USA",Seattle,Washington
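
`get_city` and `get_state` differ only in the component type they search for, so one generic lookup could serve both. A minimal sketch; `get_component` is a hypothetical helper, and `sample_result` is a hand-made dictionary that imitates the shape of one reverse geocode result rather than real API output:

```python
def get_component(result, component_type, name_key='long_name'):
    """Return the name of the first address component of the given type, or None."""
    if not result:
        return None  # nothing came back for this point
    for component in result.get('address_components', []):
        if component_type in component.get('types', []):
            return component.get(name_key)
    return None

# hand-made example in the shape of a single geocode result
sample_result = {
    'address_components': [
        {'long_name': 'Seattle', 'short_name': 'Seattle',
         'types': ['locality', 'political']},
        {'long_name': 'Washington', 'short_name': 'WA',
         'types': ['administrative_area_level_1', 'political']},
    ]
}

print(get_component(sample_result, 'locality'))
print(get_component(sample_result, 'administrative_area_level_1', name_key='short_name'))
```

Something like `geocode_results.map(lambda r: get_component(r, 'locality'))` would then fill the city column, and the same call with another type covers counties or countries.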


## Part 4: Reverse geocoding to FIPS

We'll use the FCC's Census Block Conversions API to turn lat/long into a block FIPS code. A block FIPS code contains, from left to right, the location's 2-digit state code, 3-digit county code, 6-digit census tract code, and 4-digit census block code (the first digit of which is the census block group code). With it, you can join your data to tract-level (or county- or block-level) census data without doing a spatial join.

- Documentation: https://www.fcc.gov/developers/census-block-conversions-api
- Example request: http://data.fcc.gov/api/block/find?format=json&latitude=37.861055&longitude=-122.256463
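
The FIPS layout described above can be unpacked with plain string slicing. A minimal sketch; `split_block_fips` is a hypothetical helper, and the zero-padding step assumes that a code shorter than 15 digits has lost its leading zero by being stored as a number:

```python
def split_block_fips(fips_code):
    """Split a 15-digit block FIPS code into its parts."""
    fips = str(fips_code).zfill(15)  # restore a leading zero lost to numeric storage
    if len(fips) != 15 or not fips.isdigit():
        raise ValueError('expected a 15-digit block FIPS code, got {!r}'.format(fips_code))
    return {'state': fips[0:2],
            'county': fips[2:5],
            'tract': fips[5:11],
            'block': fips[11:15],
            'block_group': fips[11],     # first digit of the block code
            'tract_geoid': fips[0:11]}   # state + county + tract, usable as a join key

parts = split_block_fips('530330080011008')
print(parts['state'], parts['county'], parts['tract'], parts['block'])
```

Keeping FIPS codes as strings throughout avoids the padding step, since states such as California (`06`) start with a zero.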

In [16]:
# pass the FCC API lat/long and get FIPS data back - return block fips and county name
def get_fips(row):
    time.sleep(pause)
    url = 'http://data.fcc.gov/api/block/find?format=json&latitude={}&longitude={}'
    request = url.format(row['latitude'], row['longitude'])
    response = requests.get(request)
    data = json.loads(response.text)
    
    # return multiple values as a series - this will create a dataframe with multiple columns
    return pd.Series({'fips_code':data['Block']['FIPS'], 'county':data['County']['name']})

In [17]:
# get block fips code and county name from FCC as new dataframe, then concatenate to join them
fips = usa.apply(get_fips, axis=1)
usa = pd.concat([usa, fips], axis=1)
usa

Unnamed: 0,latitude,longitude,latlng,address,city,state,county,fips_code
0,34.537094,-82.630303,"34.537094,-82.630303","3 Simpson Rd, Anderson, SC 29621, USA",Anderson,South Carolina,Anderson,450070112021073
1,35.0257,-78.9705,"35.0257,-78.9705","5340 Sumac Cir, Fayetteville, NC 28304, USA",Fayetteville,North Carolina,Cumberland,370510019034012
2,39.151817,-77.16381,"39.151817,-77.16381","Spiceberry Cirle, Gaithersburg, MD 20877, USA",Gaithersburg,Maryland,Montgomery,240317007104010
3,38.636738,-121.31955,"38.636738,-121.31955","7932 Fair Oaks Blvd, Carmichael, CA 95608, USA",Carmichael,California,Sacramento,60670078012000
4,47.616955,-122.348921,"47.616955,-122.348921","249-299 Cedar St, Seattle, WA 98121, USA",Seattle,Washington,King,530330080011008
