<a href="https://colab.research.google.com/github/kevin-kaianalytics/reverse_geo_id/blob/master/GeoCodingApp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
This script was developed for a COVID-19 related study at UW to help recover address information from GeoIDs (latitude and longitude). Simply upload a CSV file with two columns: "Location Latitude" and "Location Longitude". You may also wish to include additional columns, such as a unique ID, to link the results back to your dataset. I recommend against uploading any private information to Google Colab. The best practice is to save a copy of this code to your own Google Drive. *Be aware of any laws governing how and where your dataset can be uploaded.*

For any questions please email us at: info@kaianalytics.com

Kevin Chang
Founder and CEO, Kai Analytics and Survey Research Inc.

Copyright (c) 2020 Kai Analytics and Survey Research Inc.

# Licensing

MIT License

Copyright (c) 2020 Kai Analytics and Survey Research Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

**The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.**

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

In [None]:
import pandas as pd

# Upload your file using the Files panel in the toolbar on the left.
# Make sure the file name matches the one in single quotes below.
df = pd.read_csv('covidCopingGeoID.csv')

# Check that the data loaded properly by looking at its first 10 rows.
pd.options.display.max_columns = None
display(df.head(10))

Unnamed: 0,Location Latitude,Location Longitude,PID
0,41.683701,-88.349899,40720162244
1,43.701904,-72.2827,40820090711
2,41.596405,-71.2565,40820090420
3,40.7957,-77.861801,40820090950
4,33.868103,-118.183105,40820091104
5,29.165695,-82.127296,40820092004
6,42.465195,-83.3713,40820090838
7,32.430099,-80.669403,40820092817
8,36.215805,-115.066002,40820092652
9,33.664703,-117.966301,40820092932
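
Before geocoding, it can help to confirm that the two required columns are present and that each coordinate falls in a valid range. The following is a minimal sketch, not part of the original notebook; the helper names `missing_columns` and `is_valid_coordinate` are hypothetical.

```python
# Hypothetical helpers (not from the original notebook) for sanity-checking input.
REQUIRED_COLUMNS = ("Location Latitude", "Location Longitude")


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def is_valid_coordinate(lat, lon):
    """True when lat/lon are numbers inside the valid geographic ranges."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    # NaN fails both comparisons, so missing values are rejected as well.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

In the notebook, `missing_columns(df.columns)` should return an empty list before you continue, and `is_valid_coordinate` can be applied row by row to flag records that are not worth sending to the geocoder.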


In [None]:
df['latlon'] = df["Location Latitude"].map(str) + "," + df["Location Longitude"].map(str)

In [None]:
print(df['latlon'].head())

0           41.68370056,-88.34989929
1    43.701904299999995,-72.28269958
2           41.59640503,-71.25650024
3           40.79570007,-77.86180115
4           33.86810303,-118.1831055
Name: latlon, dtype: object
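
The output above shows floating-point noise in some strings (for example `43.701904299999995`). If tidier strings are preferred, one option is to format each coordinate to a fixed number of decimals. This is a sketch under the assumption that six decimals (roughly 0.1 m) is enough precision for the study; `format_latlon` is a hypothetical helper, not part of the original notebook.

```python
def format_latlon(lat, lon, precision=6):
    """Build a "lat,lon" string with a fixed number of decimal places."""
    return f"{float(lat):.{precision}f},{float(lon):.{precision}f}"


# Example with the second row of the sample data.
print(format_latlon(43.701904299999995, -72.28269958))
```

It could be applied to the DataFrame with `df['latlon'] = [format_latlon(a, b) for a, b in zip(df["Location Latitude"], df["Location Longitude"])]`.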


In [None]:
!pip install geopy



In [None]:
# geopy is an open-source geocoding package; Nominatim is its client for
# OpenStreetMap data.
# Use min_delay_seconds to limit how often address lookups are sent. The public
# Nominatim service asks for at most one request per second.

from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

locator = Nominatim(user_agent='myGeocoder')
#locator = Nominatim(user_agent="geoapiExercises")

rgeocode = RateLimiter(locator.reverse, min_delay_seconds=1)
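
To make the idea behind `min_delay_seconds` concrete, here is a rough, standard-library illustration of a wrapper that spaces out calls. It is only a sketch of the concept and not geopy's actual implementation; the name `throttle` is hypothetical.

```python
import time


def throttle(func, min_delay_seconds=1.0):
    """Wrap `func` so consecutive calls start at least `min_delay_seconds` apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)  # pause until the minimum gap has passed
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper
```

With a one-second delay, a dataset of 1,000 rows needs at least about 17 minutes per pass over the data, so plan the run time accordingly.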

In [None]:
# Define city, county, and state lookup functions.
# These could be combined into one larger function, but the steps are easier
# to follow when broken down one by one.

def lookup_address(coord):
    # Go through the rate-limited wrapper so min_delay_seconds is respected.
    location = rgeocode(coord, exactly_one=True)
    # The lookup returns None when nothing is found at the coordinates.
    return location.raw.get('address', {}) if location else {}

def city(coord):
    return lookup_address(coord).get('city', 'N/A')

def county(coord):
    return lookup_address(coord).get('county', 'N/A')

def state(coord):
    return lookup_address(coord).get('state', 'N/A')
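
Nominatim can report smaller settlements under keys such as `town` or `village` rather than `city`, which is one likely reason a city lookup returns 'N/A'. The sketch below pulls all three fields out of a single address dictionary, so one request per coordinate would be enough. The helper name `extract_place` and the fallback order are assumptions, not part of the original notebook.

```python
# Hypothetical helper: the key order below is an assumption about which
# settlement label should win when several are present.
SETTLEMENT_KEYS = ("city", "town", "village", "hamlet")


def extract_place(address):
    """Return city, county and state from a Nominatim-style address dict."""
    city = next((address[k] for k in SETTLEMENT_KEYS if k in address), "N/A")
    return {
        "city": city,
        "county": address.get("county", "N/A"),
        "state": address.get("state", "N/A"),
    }


# Illustrative input only; real values come from location.raw['address'].
sample = {"town": "Hanover", "county": "Grafton County", "state": "New Hampshire"}
print(extract_place(sample))
```

Because the function works on a plain dictionary, it can be tried out without sending any request to the geocoding service.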

In [None]:
# Populate city data
df['city'] = df['latlon'].apply(city)

In [None]:
# Populate county data
df['county'] = df['latlon'].apply(county)

In [None]:
# Populate state data
df['state'] = df['latlon'].apply(state)
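
If the dataset contains repeated coordinates, looking each distinct value up only once reduces the number of requests. The following is a minimal sketch assuming `lookup` is any function that takes a "lat,lon" string; `lookup_unique` is a hypothetical helper, not part of the original notebook.

```python
def lookup_unique(coords, lookup):
    """Call `lookup` once per distinct coordinate and return results in input order."""
    cache = {}
    results = []
    for coord in coords:
        if coord not in cache:
            cache[coord] = lookup(coord)  # only the first occurrence triggers a call
        results.append(cache[coord])
    return results
```

For example, `df['county'] = lookup_unique(df['latlon'], county)` would fill the column while contacting the service once per distinct location.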

In [None]:
#print(df['address'])
print(df)

   Location Latitude  Location Longitude          PID  \
0          41.683701          -88.349899  40720162244   
1          43.701904          -72.282700  40820090711   
2          41.596405          -71.256500  40820090420   
3          40.795700          -77.861801  40820090950   
4          33.868103         -118.183105  40820091104   
5          29.165695          -82.127296  40820092004   
6          42.465195          -83.371300  40820090838   
7          32.430099          -80.669403  40820092817   
8          36.215805         -115.066002  40820092652   
9          33.664703         -117.966301  40820092932   

                            latlon              city              county  \
0         41.68370056,-88.34989929               N/A      Kendall County   
1  43.701904299999995,-72.28269958               N/A      Grafton County   
2         41.59640503,-71.25650024               N/A      Newport County   
3         40.79570007,-77.86180115     State College       Centre Co

In [None]:
from google.colab import files

df.to_csv('uwReverseGeocodeResults.csv', index=False)
files.download('uwReverseGeocodeResults.csv')
