# Introduction

Ever dealt with geographical data (including physical addresses)?

Ever wanted to get accurate coordinates of a given city, place, or even a building's address?

Ever wanted to engineer a location-related feature?

I found myself in the same situation and ended up using `geopy`, a Python interface for geocoding APIs. This notebook demonstrates how I use `geopy` to extract information from postcodes and other location-related fields.

Enjoy!

In [3]:
!pip install pandas geopy

Collecting geopy
  Downloading geopy-2.4.0-py3-none-any.whl (125 kB)
Collecting geographiclib<3,>=1.52
  Downloading geographiclib-2.0-py3-none-any.whl (40 kB)
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-2.0 geopy-2.4.0




In [4]:
import pandas as pd
full_data = pd.read_csv('../data/London.csv')
full_data

Unnamed: 0.1,Unnamed: 0,Property Name,Price,House Type,Area in sq ft,No. of Bedrooms,No. of Bathrooms,No. of Receptions,Location,City/County,Postal Code
0,0,Queens Road,1675000,House,2716,5,5,5,Wimbledon,London,SW19 8NY
1,1,Seward Street,650000,Flat / Apartment,814,2,2,2,Clerkenwell,London,EC1V 3PA
2,2,Hotham Road,735000,Flat / Apartment,761,2,2,2,Putney,London,SW15 1QL
3,3,Festing Road,1765000,House,1986,4,4,4,Putney,London,SW15 1LP
4,4,Spencer Walk,675000,Flat / Apartment,700,2,2,2,Putney,London,SW15 1PL
...,...,...,...,...,...,...,...,...,...,...,...
3475,3475,One Lillie Square,3350000,New development,1410,3,3,3,,Lillie Square,SW6 1UE
3476,3476,St. James's Street,5275000,Flat / Apartment,1749,3,3,3,St James's,London,SW1A 1JT
3477,3477,Ingram Avenue,5995000,House,4435,6,6,6,Hampstead Garden Suburb,London,NW11 6TG
3478,3478,Cork Street,6300000,New development,1506,3,3,3,Mayfair,London,W1S 3AR


# What is a geocoding API?

Think Google Maps: you send a query string, the service searches for it, and if it finds a matching place or location, it sends back coordinates. It can also give you a more structured address for that location:

- street
- district
- state
- country
- maybe even postal codes, if available!

`geopy` provides a simple, Pythonic interface to these APIs, so you don't have to craft the requests and parse the responses by hand. One of the free and open-source geocoding APIs is Nominatim, provided by OpenStreetMap contributors.
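
For a sense of what `geopy` saves you from, here is a minimal sketch of building a Nominatim search URL by hand with only the standard library. It assumes the public endpoint at `nominatim.openstreetmap.org/search` and its `q`, `format`, `addressdetails`, and `countrycodes` query parameters. `build_search_url` is a hypothetical helper name, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed public Nominatim search endpoint.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def build_search_url(query, country_codes=None):
    """Build (but do not send) a Nominatim search URL for a free-text query."""
    params = {"q": query, "format": "json", "addressdetails": 1}
    if country_codes:
        # Optional restriction to one or more country codes, e.g. 'gb'.
        params["countrycodes"] = country_codes
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"

url = build_search_url("London")
print(url)
```

With `geopy`, all of this (plus sending the request and parsing the JSON response) collapses into a single method call.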

In [None]:
from geopy import geocoders

# Instantiate a geocoder
# Note: Nominatim requires the user_agent argument to be filled in with
# a general idea of what you are using Nominatim for.
geocoder = geocoders.Nominatim(user_agent = 'WebGIS Tutorial')

In [None]:
# Use the geocoder to send a sample request,
# plus additional address details.
result = geocoder.geocode('London', addressdetails = True)
result

# The response object

The following cells demonstrate what can be accessed from the `Location` object that `geopy` returns.

In [None]:
result.point

In [None]:
(result.latitude, result.longitude, result.altitude)

In [None]:
result.raw

# The 'address' field is our juicy information.

`result.raw['address']` above returns a dictionary that lists a very neatly structured address hierarchy. If we query a specific enough place, we can even get the name of the building, the street, and the surrounding suburbs!

In [None]:
# Search for Hornsey Town Hall within Great Britain only.
result = geocoder.geocode('Hornsey Town Hall', addressdetails = True, country_codes = 'gb')
result
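
The keys inside `result.raw['address']` vary from place to place (a result may carry `city`, `town`, or `village`, for example), so it helps to read them defensively. The sketch below uses a hand-written dictionary that only imitates the shape of such a response; the values are illustrative, not real Nominatim output, and `pick_locality` is a hypothetical helper.

```python
# Hand-written stand-in for result.raw['address']; values are illustrative only.
sample_address = {
    "amenity": "Example Hall",
    "road": "Example Road",
    "suburb": "Example Suburb",
    "city": "London",
    "country": "United Kingdom",
    "country_code": "gb",
}

def pick_locality(address):
    """Return the first locality-like field present, or None if there is none."""
    for key in ("city", "town", "village"):
        if key in address:
            return address[key]
    return None

print(pick_locality(sample_address))        # London
print(pick_locality({"road": "Nowhere"}))   # None
```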

# Processing and rate-limiting

We can make the geocoding calls more convenient by writing a helper function that

- processes our inputs by adding the arguments that we'll always use, and
- processes the outputs by returning only the fields that we want.

Furthermore, we can rate-limit our requests so we don't overwhelm the server.

In [None]:
# The helper function below keeps only these address fields
cols = ['suburb', 'town', 'city', 'state_district', 'state', 'country']
def geocode(query):
    # Make a geocoding request with query,
    # restrict the results to Great Britain,
    # and give detailed address structure.
    result = geocoder.geocode(query, country_codes = 'gb', addressdetails = True)
    
    # If there is no result with that query, return None
    if result is None:
        return None
    
    # If there is a result, return only the desired fields
    address = result.raw['address']
    address = {key: value for key, value in address.items() if key in cols}
    return address

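# Rate-limit the requests to at most 20 per second.
# Note: the public Nominatim usage policy asks for no more than
# 1 request per second, so min_delay_seconds = 1 is the safer setting there.
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geocode, min_delay_seconds = 1/20)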
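
To make the idea of a minimum delay concrete, here is a standard-library sketch of a wrapper that sleeps until enough time has passed since the previous call. It is not how `geopy`'s `RateLimiter` is implemented (that class also retries on errors); `min_delay` and `echo` are hypothetical names used only for this illustration.

```python
import time
from functools import wraps

def min_delay(seconds):
    """Decorator: ensure at least `seconds` elapse between successive calls."""
    def decorator(func):
        last_call = None  # monotonic timestamp of the previous call

        @wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_call
            if last_call is not None:
                remaining = seconds - (time.monotonic() - last_call)
                if remaining > 0:
                    time.sleep(remaining)
            last_call = time.monotonic()
            return func(*args, **kwargs)

        return wrapper
    return decorator

@min_delay(0.05)
def echo(query):
    # Stand-in for a geocoding call; it just returns its input.
    return query

start = time.monotonic()
results = [echo(q) for q in ("a", "b", "c")]
elapsed = time.monotonic() - start
print(results, f"took at least {elapsed:.2f}s")
```

The first call goes through immediately; only the second and third calls wait, so three calls cost roughly two delays.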

In [None]:
# Call our function above with a postcode
addr = geocode('SW19 8NY')
addr

In [None]:
# Call our function above with a street name plus city
addr = geocode('Seward Street, London')
addr

In [None]:
# Call our function with a place that definitely does not exist
# in Great Britain
addr = geocode('Uttar Pradesh')
addr is None

In [None]:
# Call our function with a typo
addr = geocode('Birminghm')
addr is None

In [None]:
pd.json_normalize(full_data["Postal Code"].sample(10).apply(geocode))

In [None]:
%%time

# Get a fully structured dataframe of addresses by searching for UK postcodes
# in Nominatim's OpenStreetMap database.
nominatim_address = full_data['Postal Code'].apply(geocode)

# Convert the Series of dicts into a proper dataframe
nominatim_address = pd.json_normalize(nominatim_address)

# Append the original Postal Code to the left
nominatim_address = pd.concat([full_data['Postal Code'], nominatim_address], axis = 1)

# Display
nominatim_address
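
Because every request is rate-limited, one way to shorten a run like this is to avoid asking for the same postcode twice, assuming the column contains repeated postcodes. The sketch below wraps a stand-in function with `functools.lru_cache`; `fake_geocode` and its return value are hypothetical placeholders for the rate-limited `geocode` above, so nothing is sent over the network.

```python
from functools import lru_cache

calls = []  # records every query that reaches the (fake) geocoder

def fake_geocode(query):
    # Hypothetical placeholder for the real, rate-limited geocode function.
    calls.append(query)
    return {"query": query}

@lru_cache(maxsize=None)
def cached_geocode(query):
    """Only forward queries that have not been seen before."""
    return fake_geocode(query)

postcodes = ["SW19 8NY", "EC1V 3PA", "SW19 8NY", "SW15 1QL", "EC1V 3PA"]
addresses = [cached_geocode(p) for p in postcodes]

print(len(postcodes), "lookups,", len(calls), "requests")  # 5 lookups, 3 requests
```

Note that cached results are shared objects, so treat the returned dictionaries as read-only.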

Some caveats:

- Nominatim is a free, open-source API, supported by community contributions and donations. Being free, it rate-limits your use: the cell above takes about 30 minutes for roughly 3,500 entries, or about 2 entries per second. Bulk geocoding and scraping are also not allowed, as they violate the terms of service.

- Nominatim's database (OpenStreetMap) is also powered by community contributions. In my experience, results for rural or remote places are less detailed, maybe because those areas have fewer contributors. However, it's still a very capable free alternative for research and experimentation.

- Nominatim does not handle typos! You get what you type in! This is actually a good thing: you won't get the coordinates of a wrong place halfway across the world because of a typo on your end or in your data.
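
The run-time figure in the first caveat follows from simple arithmetic, sketched here so you can estimate your own jobs before starting them. `estimate_minutes` is a hypothetical helper, and the rate of 2 entries per second is the observed throughput quoted above, not a guaranteed figure.

```python
def estimate_minutes(n_queries, queries_per_second):
    """Rough wall-clock estimate, in minutes, for a sequential geocoding run."""
    return n_queries / queries_per_second / 60

print(round(estimate_minutes(3500, 2), 1))  # 29.2
```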

# Closing

That's it! I hope this notebook helps you deal with addresses and other location-related data using free, open-source geocoding APIs. Being a citizen of the internet who loves freedom in research, you know I love free things ;)

Feel free to upvote and fork if you'd like to try this yourself. Keep learning and happy data-sciencing!