# Geocode addresses

Import the following libraries as usual.

In [1]:
import pandas as pd
import json

## `censusgeocode` 

The Python library [`censusgeocode`](https://pypi.org/project/censusgeocode/) is a simple way to geocode addresses with the US Census Geocoder. Make sure you read its documentation.

[The Census also has documentation](https://www.census.gov/programs-surveys/geography/technical-documentation/complete-technical-documentation/census-geocoder.html) that is worth reading if you want to dig deeper into geocoding.

In [2]:
import censusgeocode as cg

## Geocode one address

Let's start by geocoding a single address. The Census returns a lot of data for each match. (This address was picked at random.)

In [3]:
result = cg.address('1241 HOLMAN RD', city='Oakland', state='California')
result

[{'tigerLine': {'side': 'R', 'tigerLineId': '125002398'},
  'geographies': {'2018 State Legislative Districts - Upper': [{'GEOID': '06009',
     'CENTLAT': '+37.8692954',
     'AREAWATER': 252962815,
     'STATE': '06',
     'BASENAME': '9',
     'OID': '212904690192874',
     'SLDU': '009',
     'LSADC': 'LU',
     'FUNCSTAT': 'N',
     'INTPTLAT': '+37.8695634',
     'NAME': 'State Senate District 9',
     'OBJECTID': 1008,
     'CENTLON': '-122.2639442',
     'LSY': '2018',
     'AREALAND': 512863741,
     'INTPTLON': '-122.2625626',
     'MTFCC': 'G5210',
     'LDTYP': 'O'}],
   'States': [{'STATENS': '01779778',
     'GEOID': '06',
     'CENTLAT': '+37.1547578',
     'AREAWATER': 20291712025,
     'STATE': '06',
     'BASENAME': 'California',
     'STUSAB': 'CA',
     'OID': '2749018475066',
     'LSADC': '00',
     'FUNCSTAT': 'A',
     'INTPTLAT': '+37.1551773',
     'DIVISION': '9',
     'NAME': 'California',
     'REGION': '4',
     'OBJECTID': 19,
     'CENTLON': '-119.527771
     ...

(Output truncated.)

Check out how much the Census returns! The geographies go all the way down to the Census tract and block level, so you could do a very granular analysis with geocoded data like this.
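To see which geography layers came back for a match (state, legislative district, tract, block, and so on), you can walk the nested `geographies` dict. This is a minimal sketch that assumes the structure shown in the output above: each match is a dict with a `geographies` dict mapping a layer name to a list of records carrying `GEOID` and `NAME`. The helper name `summarize_geographies` is hypothetical, and the exact layer names can vary with the Census vintage you query.

```python
def summarize_geographies(match):
    """Map each geography layer name to a list of (GEOID, NAME) pairs.

    Hypothetical helper. Assumes `match` is one element of the list returned
    by cg.address(), with a 'geographies' dict of {layer_name: [record, ...]}.
    """
    summary = {}
    for layer, records in match.get('geographies', {}).items():
        # .get() avoids a KeyError if a layer lacks GEOID or NAME
        summary[layer] = [(r.get('GEOID'), r.get('NAME')) for r in records]
    return summary


# Stand-in match using two layers copied from the output above
sample_match = {
    'geographies': {
        '2018 State Legislative Districts - Upper': [
            {'GEOID': '06009', 'NAME': 'State Senate District 9'}
        ],
        'States': [{'GEOID': '06', 'NAME': 'California'}],
    }
}

for layer, rows in summarize_geographies(sample_match).items():
    print(layer, rows)
```

In the notebook you would pass `result[0]` instead of `sample_match`.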

The Census may return more than one match (or none at all), so it's good to check how many results you got. The first result is usually the most likely match.

In [4]:
len(result)

1

In [5]:
result[0]['coordinates']  # x is longitude, y is latitude

{'x': -122.23014397082626, 'y': 37.806448352151385}
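Because the Census can return zero matches, indexing straight into `result[0]` will raise an `IndexError` for an address it can't find. A minimal sketch of a safer lookup, assuming the result is a list of match dicts whose `coordinates` hold `x` (longitude) and `y` (latitude) as shown above; `first_lat_lon` is a hypothetical helper name.

```python
def first_lat_lon(result):
    """Return (lat, lon) for the first match, or None if nothing matched.

    Hypothetical helper; assumes the list-of-matches shape returned by cg.address().
    """
    if not result:
        return None
    coords = result[0]['coordinates']
    # The Census gives x = longitude, y = latitude; flip to the usual (lat, lon)
    return coords['y'], coords['x']


# Stand-in results: one match (coordinates from the output above) and no match
print(first_lat_lon([{'coordinates': {'x': -122.23014397082626, 'y': 37.806448352151385}}]))
print(first_lat_lon([]))
```

Returning `(lat, lon)` is just a convention choice here; many mapping libraries expect that order, while GeoJSON uses `(lon, lat)`.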

## Batch geocode

Instead of sending one request per address, it's more efficient to upload a whole batch of addresses to the Census and get all the results back at once. So let's do that.

### Sample addresses

The Census wants a CSV in the following format:

```
1,4600 Silver Hill Road,Washington,DC,20233
```
How do I know this is what the Census wants? [It's on their website.](https://geocoding.geo.census.gov/geocoder/locations/addressbatch?form) Click the link that says `Download a sample CSV file here`.

What does that format look like to you?

```
index, street_address, city, state_abbreviation, zip_code
```
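To make the format concrete, here is a small sketch that reads the Census sample line with Python's `csv` module and labels each field. The field names in `FIELDS` are our own labels taken from the breakdown above; the Census file itself has no header row.

```python
import csv
import io

# Our own labels for the five columns; the uploaded file does NOT include them
FIELDS = ['index', 'street_address', 'city', 'state_abbreviation', 'zip_code']

sample = '1,4600 Silver Hill Road,Washington,DC,20233\n'
row = next(csv.reader(io.StringIO(sample)))
record = dict(zip(FIELDS, row))
print(record)
# {'index': '1', 'street_address': '4600 Silver Hill Road', 'city': 'Washington', 'state_abbreviation': 'DC', 'zip_code': '20233'}
```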

### Import addresses

In [6]:
addresses = pd.read_csv('exports/random_addresses_10.csv')
addresses

|   | ADDRESS_CLEANED |
|---|---|
| 0 | 500 E. 22ND ST |
| 1 | 900 36TH AV |
| 2 | 47TH ST & DOVER |
| 3 | 850 PINE ST |
| 4 | 5300 BLOCK OF JAMES AVE |
| 5 | INTERNATIONAL BLVD & 42ND AV |
| 6 | 2000 CAMPBELL ST |
| 7 | 611 OLD QUARRY LOOP |
| 8 | 2045 EAST 15TH ST |
| 9 | MUNSON & E 15TH ST |


In [7]:
# Fill in missing data
# This is from Oakland 311 data, so we know what the city and state will be.
# We can leave Zip blank for now.
addresses['City'] = 'Oakland'
addresses['State'] = 'CA'
addresses['Zip'] = ''
addresses

|   | ADDRESS_CLEANED | City | State | Zip |
|---|---|---|---|---|
| 0 | 500 E. 22ND ST | Oakland | CA | |
| 1 | 900 36TH AV | Oakland | CA | |
| 2 | 47TH ST & DOVER | Oakland | CA | |
| 3 | 850 PINE ST | Oakland | CA | |
| 4 | 5300 BLOCK OF JAMES AVE | Oakland | CA | |
| 5 | INTERNATIONAL BLVD & 42ND AV | Oakland | CA | |
| 6 | 2000 CAMPBELL ST | Oakland | CA | |
| 7 | 611 OLD QUARRY LOOP | Oakland | CA | |
| 8 | 2045 EAST 15TH ST | Oakland | CA | |
| 9 | MUNSON & E 15TH ST | Oakland | CA | |


In [8]:
# Export file to upload to Census
addresses.to_csv(
    'exports/addresses_10.csv', 
    index=True,  # Census is expecting an index as an ID! 
    header=False # This removes the top row with column names because Census doesn't want that
)
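Before uploading, it can help to preview exactly what pandas will write. Calling `to_csv` without a path returns the CSV as a string instead of writing a file. This sketch uses two addresses from the table above in a small stand-in DataFrame. Selecting the columns by name is a defensive choice on our part, so the order always comes out as id, street, city, state, zip even if columns get added or shuffled.

```python
import pandas as pd

# Stand-in for the `addresses` DataFrame built above
sample = pd.DataFrame({
    'ADDRESS_CLEANED': ['500 E. 22ND ST', '900 36TH AV'],
    'City': ['Oakland', 'Oakland'],
    'State': ['CA', 'CA'],
    'Zip': ['', ''],
})

# Explicit column order; index=True writes the row index first as the ID,
# header=False leaves out the column-name row the Census doesn't want
preview = sample[['ADDRESS_CLEANED', 'City', 'State', 'Zip']].to_csv(index=True, header=False)
print(preview)
```

Each printed line should look like the Census sample: an ID, the street, the city, the state, and an empty ZIP at the end.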


In [9]:
results = cg.addressbatch('exports/addresses_10.csv')

In [10]:
# Create a dataframe with the results
results_df = pd.DataFrame.from_records(results)
results_df

|   | id | address | match | matchtype | parsed | tigerlineid | side | statefp | countyfp | tract | block | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 500 E. 22ND ST, Oakland, CA, | True | Exact | 500 22ND ST, OAKLAND, CA, 94612 | 124996500.0 | R | 6.0 | 1.0 | 402801.0 | 1011.0 | 37.811109 | -122.269588 |
| 1 | 1 | 900 36TH AV, Oakland, CA, | True | Exact | 900 36TH AVE, OAKLAND, CA, 94601 | 125006474.0 | R | 6.0 | 1.0 | 406100.0 | 2002.0 | 37.77294 | -122.223884 |
| 2 | 2 | 47TH ST & DOVER, Oakland, CA, | False | | | | | | | | | | |
| 3 | 3 | 850 PINE ST, Oakland, CA, | True | Exact | 850 PINE ST, OAKLAND, CA, 94607 | 124995322.0 | R | 6.0 | 1.0 | 401700.0 | 1006.0 | 37.809568 | -122.302636 |
| 4 | 4 | 5300 BLOCK OF JAMES AVE, Oakland, CA, | True | Exact | 5300 JAMES AVE, OAKLAND, CA, 94618 | 124999407.0 | R | 6.0 | 1.0 | 400300.0 | 4011.0 | 37.838561 | -122.253707 |
| 5 | 5 | INTERNATIONAL BLVD & 42ND AV, Oakland, CA, | False | | | | | | | | | | |
| 6 | 6 | 2000 CAMPBELL ST, Oakland, CA, | True | Exact | 2000 CAMPBELL ST, OAKLAND, CA, 94607 | 606185175.0 | R | 6.0 | 1.0 | 401700.0 | 2007.0 | 37.816176 | -122.29194 |
| 7 | 7 | 611 OLD QUARRY LOOP, Oakland, CA, | False | | | | | | | | | | |
| 8 | 8 | 2045 EAST 15TH ST, Oakland, CA, | True | Exact | 2045 E 15TH ST, OAKLAND, CA, 94606 | 606190585.0 | R | 6.0 | 1.0 | 405901.0 | 1005.0 | 37.786372 | -122.238749 |
| 9 | 9 | MUNSON & E 15TH ST, Oakland, CA, | True | Non_Exact | MUNSON WAY & E 15TH ST, OAKLAND, CA, 94606 | | | 6.0 | 1.0 | 405901.0 | 1010.0 | 37.784883 | -122.236251 |


In [11]:
results_df_notnull = results_df.dropna(subset=['lat'])
results_df_notnull

|   | id | address | match | matchtype | parsed | tigerlineid | side | statefp | countyfp | tract | block | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 500 E. 22ND ST, Oakland, CA, | True | Exact | 500 22ND ST, OAKLAND, CA, 94612 | 124996500.0 | R | 6 | 1 | 402801 | 1011 | 37.811109 | -122.269588 |
| 1 | 1 | 900 36TH AV, Oakland, CA, | True | Exact | 900 36TH AVE, OAKLAND, CA, 94601 | 125006474.0 | R | 6 | 1 | 406100 | 2002 | 37.77294 | -122.223884 |
| 3 | 3 | 850 PINE ST, Oakland, CA, | True | Exact | 850 PINE ST, OAKLAND, CA, 94607 | 124995322.0 | R | 6 | 1 | 401700 | 1006 | 37.809568 | -122.302636 |
| 4 | 4 | 5300 BLOCK OF JAMES AVE, Oakland, CA, | True | Exact | 5300 JAMES AVE, OAKLAND, CA, 94618 | 124999407.0 | R | 6 | 1 | 400300 | 4011 | 37.838561 | -122.253707 |
| 6 | 6 | 2000 CAMPBELL ST, Oakland, CA, | True | Exact | 2000 CAMPBELL ST, OAKLAND, CA, 94607 | 606185175.0 | R | 6 | 1 | 401700 | 2007 | 37.816176 | -122.29194 |
| 8 | 8 | 2045 EAST 15TH ST, Oakland, CA, | True | Exact | 2045 E 15TH ST, OAKLAND, CA, 94606 | 606190585.0 | R | 6 | 1 | 405901 | 1005 | 37.786372 | -122.238749 |
| 9 | 9 | MUNSON & E 15TH ST, Oakland, CA, | True | Non_Exact | MUNSON WAY & E 15TH ST, OAKLAND, CA, 94606 | | | 6 | 1 | 405901 | 1010 | 37.784883 | -122.236251 |


What are you noticing here?

In [12]:
# Export your data
results_df_notnull.to_csv('exports/census_results.csv', index=False)
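Dropping rows with a missing `lat` throws away the addresses the Census couldn't match, and you may want to review those later. One option is to split on the `match` column instead. This is a minimal sketch on a small stand-in DataFrame with rows copied from the batch results above; it assumes `match` holds Python booleans, as the output suggests.

```python
import pandas as pd

# Small stand-in for results_df, using three rows from the batch output above
results_df = pd.DataFrame({
    'id': [0, 2, 9],
    'address': [
        '500 E. 22ND ST, Oakland, CA,',
        '47TH ST & DOVER, Oakland, CA,',
        'MUNSON & E 15TH ST, Oakland, CA,',
    ],
    'match': [True, False, True],
    'matchtype': ['Exact', None, 'Non_Exact'],
})

# astype(bool) guards against the column coming back as object dtype
is_match = results_df['match'].astype(bool)
matched = results_df[is_match]
unmatched = results_df[~is_match]
print(unmatched['address'].tolist())
```

Saving `unmatched` with `to_csv` gives you a list to clean up and resubmit. `Non_Exact` matches deserve a look too, since the Census matched them to something slightly different from what you sent (`MUNSON & E 15TH ST` came back as `MUNSON WAY & E 15TH ST`).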