<a href="https://colab.research.google.com/github/peterhaasme/How-to-Geocode-with-Python-and-Pandas/blob/master/How_to_Geocode_with_Python_and_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Are you running a data analysis on a dataset that has physical addresses? Would you like to plot those on a map? You'll need to *geocode* them to get their latitude and longitude coordinates.

## How to look up the geocode for a single address

You'll be using [geopy](https://geopy.readthedocs.io/en/stable/), a Python client for several popular geocoding web services. Start by installing geopy.

In [1]:
!pip install geopy



Import the library you just installed. You'll be using the [Nominatim](https://nominatim.org/release-docs/develop/api/Overview/) geocoding service. There are many geocoding services available, but this one does not require an API key to access.

In [0]:
from geopy.geocoders import Nominatim

Create a geolocator object for the [OpenStreetMap Nominatim API](https://nominatim.org/release-docs/develop/api/Overview/). It's a good idea to raise the timeout from the 1-second default to 10 seconds so that a slow response is less likely to raise a `GeocoderTimedOut` exception. You also need to pass a `user_agent` string that identifies your application; any name works for this tutorial.

To test out the geolocator, pass it an address. Print out the location object. Then print the latitude and longitude from the object.

In [5]:
geolocator = Nominatim(timeout=10, user_agent = "myGeolocator")
location = geolocator.geocode('4550 Kester Mill Rd,Winston-Salem,NC')
print(location)
print((location.latitude, location.longitude))

Walmart Supercenter, 4550, Kester Mill Road, Salem Woods, Winston-Salem, Forsyth County, North Carolina, 27103, United States of America
(36.067591, -80.337243)
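
One thing to watch for: `geolocator.geocode` returns `None` when the service finds no match, so reading `.latitude` from the result would raise an `AttributeError`. The sketch below shows one way to guard against that. The helper name `lookup_coords` and the offline stand-in `fake_geocode` are hypothetical, added only so the example runs without a network connection; in the notebook you would pass `geolocator.geocode` instead.

```python
from collections import namedtuple


def lookup_coords(geocode, address):
    """Return (latitude, longitude) for address, or None if nothing matched."""
    location = geocode(address)
    if location is None:
        return None
    return (location.latitude, location.longitude)


# Hypothetical offline stand-in for geolocator.geocode, used only for this demo.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
KNOWN = {
    "4550 Kester Mill Rd,Winston-Salem,NC": FakeLocation(36.067591, -80.337243),
}


def fake_geocode(address):
    # Mirrors the geocoder's behavior of returning None for an unknown address.
    return KNOWN.get(address)


print(lookup_coords(fake_geocode, "4550 Kester Mill Rd,Winston-Salem,NC"))
print(lookup_coords(fake_geocode, "not a real address"))
```

The first call prints the coordinate tuple and the second prints `None` instead of raising an error.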


## How to calculate the distance between two addresses

Another helpful tool is the ability to [calculate the geodesic distance](https://geopy.readthedocs.io/en/stable/#module-geopy.distance) between two addresses. You can do this by importing geopy's distance module.

In [0]:
from geopy import distance

Geocode the addresses. Extract the latitude and longitude of each as a tuple, then pass both tuples to the distance calculator.

In [7]:
location_1 = geolocator.geocode('11415 Quaker Ave,Lubbock,TX')
location_1_gps = (location_1.latitude, location_1.longitude)
location_2 = geolocator.geocode('4550 Kester Mill Rd,Winston-Salem,NC')
location_2_gps = (location_2.latitude, location_2.longitude)
# distance in miles
distance_calc_mi = distance.distance(location_1_gps, location_2_gps).miles
print(distance_calc_mi)
# distance in kilometers
distance_calc_km = distance.distance(location_1_gps, location_2_gps).km
print(distance_calc_km)

1236.7167978105956
1990.3027582556954
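
To get a feel for what the distance calculation does, here is a rough sketch of the haversine formula, which treats the Earth as a perfect sphere. This is not geopy's method: `distance.distance` computes a geodesic on an ellipsoid, so the numbers should be close to the ones above but not identical. The function name `haversine_km` and the radius constant are choices made for this illustration, and the coordinates are the ones geocoded in this article.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(point_1, point_2):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, point_1)
    lat2, lon2 = map(math.radians, point_2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


lubbock = (33.489513, -101.901833)
winston_salem = (36.067591, -80.337243)

km = haversine_km(lubbock, winston_salem)
print(f"{km:.1f} km")
print(f"{km / 1.609344:.1f} miles")  # 1 mile is exactly 1.609344 km
```

If you want a spherical calculation without writing it yourself, geopy also provides `distance.great_circle`.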


## How to geocode in a pandas DataFrame

Do you have more than one address? Are you looking to analyze these in a pandas DataFrame? No problem. Start by loading the pandas module.

In [0]:
import pandas as pd

Upload your CSV to Google Colab. You can find the example file for this article [here](https://raw.githubusercontent.com/peterhaasme/How-to-Geocode-with-Python-and-Pandas/master/sample_addresses.csv).

In [0]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Make a DataFrame from the CSV.

In [9]:
df = pd.read_csv('sample_addresses.csv')
print(df.head())

               address           city state    zip
0  4550 Kester Mill Rd  Winston-Salem    NC  27103
1     11415 Quaker Ave        Lubbock    TX  79424
2   8885 N Florida Ave          Tampa    FL  33604
3   16375 Merchants Ln    King George    VA  22485


Add a column that combines the address, city, and state into one line. You need to do this so you can pass a single string to the geolocator.

In [0]:
df['full_address'] = df.address + "," + df.city + "," + df.state
print(df.head())
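
If you also want the zip code in the query string, keep in mind that pandas will likely read the `zip` column as integers, so it has to be converted to text first, and a leading zero may need to be restored. A minimal sketch, assuming 5-digit US ZIP codes; the helper `join_address` and the second sample address are made up for illustration.

```python
def join_address(address, city, state, zip_code=None):
    """Build a one-line address string; the ZIP code is optional."""
    line = ",".join(str(part).strip() for part in (address, city, state))
    if zip_code is not None:
        # zfill restores a leading zero that is lost when the ZIP is read as an integer.
        line += " " + str(zip_code).zfill(5)
    return line


print(join_address("4550 Kester Mill Rd", "Winston-Salem", "NC", 27103))
# Made-up address, included only to show the leading-zero case.
print(join_address("1 Example St", "Boston", "MA", 2134))
```

With the DataFrame from this article, the helper could be applied as `df['full_address'] = [join_address(a, c, s, z) for a, c, s, z in zip(df['address'], df['city'], df['state'], df['zip'])]`.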

Add a column with the geocoded location for each address.

In [12]:
df['gcode'] = df.full_address.apply(geolocator.geocode)
print(df)

               address  ...                                              gcode
0  4550 Kester Mill Rd  ...  (Walmart Supercenter, 4550, Kester Mill Road, ...
1     11415 Quaker Ave  ...  (Walmart Supercenter, 11415, Quaker Avenue, Ha...
2   8885 N Florida Ave  ...  (Walmart Neighborhood Market, 8885, North Flor...
3   16375 Merchants Ln  ...  (Walmart Supercenter, 16375, Merchants Lane, K...

[4 rows x 6 columns]
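
Four addresses are no problem, but for a longer list you should space out your requests: the usage policy for the public Nominatim server asks for no more than one request per second. Below is a minimal sketch of a throttling wrapper built with the standard library. The name `throttled` is hypothetical, and the demo wraps `str.upper` with a short delay so that it runs quickly and offline; in the notebook you would wrap `geolocator.geocode` with a delay of at least one second.

```python
import time


def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the time the previous call finished

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        try:
            return func(*args, **kwargs)
        finally:
            last_call[0] = time.monotonic()

    return wrapper


# Demo with a stand-in function and a short delay.
slow_upper = throttled(str.upper, min_delay_seconds=0.05)
start = time.monotonic()
results = [slow_upper(s) for s in ["a", "b", "c"]]
elapsed = time.monotonic() - start
print(results, f"took {elapsed:.2f}s")
```

geopy ships a helper for the same purpose, `RateLimiter` in `geopy.extra.rate_limiter`, which can be used as `geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)` and then passed to `df.full_address.apply`.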


Add columns for latitude and longitude.

In [13]:
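# geocode returns None when an address has no match, so fall back to None
# instead of raising AttributeError on a failed lookup
df['lat'] = [g.latitude if g else None for g in df.gcode]
df['long'] = [g.longitude if g else None for g in df.gcode]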
print(df)

               address           city  ...        lat        long
0  4550 Kester Mill Rd  Winston-Salem  ...  36.067591  -80.337243
1     11415 Quaker Ave        Lubbock  ...  33.489513 -101.901833
2   8885 N Florida Ave          Tampa  ...  28.032076  -82.457679
3   16375 Merchants Ln    King George  ...  38.351884  -77.057379

[4 rows x 8 columns]


## How to map the addresses in your pandas DataFrame using Folium

Now that you have the latitude and longitude, you can plot your addresses using [Folium](https://python-visualization.github.io/folium/), a Python library for making interactive maps.

Start by installing and loading the library.

In [14]:
!pip install folium
import folium



Create an empty map. Center it on a region that encompasses your addresses. *Don't use `map` as a variable name, because it would shadow Python's built-in `map` function.*

In [15]:
mapa = folium.Map(location=(36.104087829589844,-86.77576446533203), zoom_start=5)
display(mapa)
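
Rather than hard-coding the center, you could derive it from your data. The sketch below computes the bounding box of a list of `(lat, long)` pairs and takes its midpoint. The function name `bounding_box` is hypothetical, the points are the four coordinates geocoded above, and the approach assumes your addresses don't straddle the 180° meridian.

```python
def bounding_box(points):
    """Return (center, south_west, north_east) for a list of (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    south_west = (min(lats), min(lons))
    north_east = (max(lats), max(lons))
    # Midpoint of the box; fine for map centering, not a true geographic centroid.
    center = ((south_west[0] + north_east[0]) / 2,
              (south_west[1] + north_east[1]) / 2)
    return center, south_west, north_east


points = [
    (36.067591, -80.337243),
    (33.489513, -101.901833),
    (28.032076, -82.457679),
    (38.351884, -77.057379),
]

center, south_west, north_east = bounding_box(points)
print("center:", center)
print("bounds:", south_west, north_east)
```

You could then pass `center` as the `location` argument of `folium.Map`, or call `mapa.fit_bounds([south_west, north_east])` to let Folium choose a zoom level that shows every point.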

Add the geocoded locations to the map.

In [16]:
for index, row in df.iterrows():
  folium.Marker(location=(row['lat'],row['long'])).add_to(mapa)

display(mapa)

Congratulations, you've covered the basics of geocoding and mapping in Python. For more detailed and complex operations, be sure to read through the documentation for geopy and Folium.