- Geocoding: converting a physical address or location into latitude/longitude.
- Reverse geocoding: converting a latitude/longitude pair into a physical address or location.

Python offers a number of packages that make this task easy. Here I'm using pygeocoder, a wrapper for Google's geo-API, to demonstrate how geocoding and reverse geocoding work.
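pygeocoder handles the HTTP requests for you, but it can help to see roughly what a call to Google's Geocoding API looks like. The sketch below only builds request URLs with the standard library and never sends them. It assumes the current JSON endpoint and its address, latlng, and key parameters; YOUR_API_KEY is a placeholder, the helper names are hypothetical, and pygeocoder's own requests may include other parameters.

```python
from urllib.parse import urlencode

# Google's Geocoding API endpoint (JSON output); assumed, not taken from pygeocoder
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build a forward-geocoding request URL for a free-text address (hypothetical helper)."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def reverse_geocode_url(lat, lng, api_key):
    """Build a reverse-geocoding request URL for a lat/long pair (hypothetical helper)."""
    # The API expects latlng as a single "lat,lng" string
    return GEOCODE_URL + "?" + urlencode({"latlng": f"{lat},{lng}", "key": api_key})


print(geocode_url("235 Albany Street, Cambridge, MA 02139", "YOUR_API_KEY"))
print(reverse_geocode_url(30.295053, -104.014528, "YOUR_API_KEY"))
```

urlencode escapes the spaces and commas, so the address can be passed as ordinary text.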

In [2]:
# Load packages:
# pygeocoder for its geo-functionality
# pandas for dataframe structures
# numpy for dealing with missing values (np.nan)
from pygeocoder import Geocoder
import pandas as pd
import numpy as np

### Create some simulated geo data

Geo-data usually consist of latitude and longitude strings, with the two coordinates in each pair separated by a comma. For demonstration, we prepare a dictionary of five latitude/longitude strings.
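Since each site's coordinates arrive as a single string, a small helper that splits the string and checks the ranges (latitude between -90 and 90, longitude between -180 and 180) can catch malformed entries early. This is a minimal sketch; parse_lat_lon is a hypothetical name, not part of pygeocoder or pandas.

```python
def parse_lat_lon(pair):
    """Split a 'lat, lon' string into a (latitude, longitude) tuple of floats.

    Raises ValueError if the string is not two comma-separated numbers
    or if either value falls outside the valid coordinate range.
    """
    parts = pair.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'lat, lon', got {pair!r}")
    # float() ignores surrounding whitespace, so ' -109.5' parses fine
    lat, lon = float(parts[0]), float(parts[1])
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


print(parse_lat_lon("31.336968, -109.560959"))  # (31.336968, -109.560959)
```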

In [3]:
# Create a dictionary of raw data
data = {'Site 1': '31.336968, -109.560959',
        'Site 2': '31.347745, -108.229963',
        'Site 3': '32.277621, -107.734724',
        'Site 4': '31.655494, -106.420484',
        'Site 5': '30.295053, -104.014528'}

In [4]:
# Convert the dictionary of raw data into a pandas dataframe
df = pd.DataFrame.from_dict(data, orient='index')

In [5]:
# View the dataframe
df

| | 0 |
|---|---|
| Site 5 | 30.295053, -104.014528 |
| Site 4 | 31.655494, -106.420484 |
| Site 3 | 32.277621, -107.734724 |
| Site 2 | 31.347745, -108.229963 |
| Site 1 | 31.336968, -109.560959 |


As shown above, we have a dataframe with five rows, each containing a single latitude/longitude string. Before we can work with the data, we'll need to
- separate each string into latitude and longitude
- convert them into floats

In [6]:
# Create two empty lists to hold the loop results
lat = []
lon = []

# For each row in the raw coordinate column,
for row in df[0]:
    try:
        # Split the row on the comma into its two parts
        lat_str, lon_str = row.split(',')
        # Convert both parts to floats before appending anything,
        # so lat and lon always stay the same length
        lat_val, lon_val = float(lat_str), float(lon_str)
    # If the row is malformed (wrong number of commas, non-numeric
    # text, or not a string at all),
    except (ValueError, AttributeError):
        # use a missing value for both coordinates
        lat_val, lon_val = np.nan, np.nan
    lat.append(lat_val)
    lon.append(lon_val)

In [7]:
# Create two new columns from lat and lon
df['latitude'] = lat
df['longitude'] = lon

In [8]:
df

| | 0 | latitude | longitude |
|---|---|---|---|
| Site 5 | 30.295053, -104.014528 | 30.295053 | -104.014528 |
| Site 4 | 31.655494, -106.420484 | 31.655494 | -106.420484 |
| Site 3 | 32.277621, -107.734724 | 32.277621 | -107.734724 |
| Site 2 | 31.347745, -108.229963 | 31.347745 | -108.229963 |
| Site 1 | 31.336968, -109.560959 | 31.336968 | -109.560959 |
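The same split-and-convert step can be done without an explicit loop. This sketch uses pandas' Series.str.split with expand=True to produce two columns, then pd.to_numeric(..., errors='coerce'), which turns unparseable entries into NaN much as the try/except loop does. It rebuilds the example dataframe so it runs on its own.

```python
import pandas as pd

# Same raw data as above: one "lat, lon" string per site
data = {'Site 1': '31.336968, -109.560959',
        'Site 2': '31.347745, -108.229963',
        'Site 3': '32.277621, -107.734724',
        'Site 4': '31.655494, -106.420484',
        'Site 5': '30.295053, -104.014528'}
df = pd.DataFrame.from_dict(data, orient='index')

# Split each string on the first comma into two string columns (0 and 1)
parts = df[0].str.split(',', n=1, expand=True)

# Strip stray spaces, then convert to floats; bad values become NaN
df['latitude'] = pd.to_numeric(parts[0].str.strip(), errors='coerce')
df['longitude'] = pd.to_numeric(parts[1].str.strip(), errors='coerce')

print(df)
```

The vectorized version is usually shorter and faster on large frames, while the explicit loop makes the error handling easier to customize.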


# Reverse Geocoding

To reverse geocode, we pass a specific latitude and longitude pair, in this case the one in the first row of the dataframe (position 0), to pygeocoder's reverse_geocode function.

In [9]:
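# Convert latitude and longitude to a location
# (.iloc[0] selects the first row by position, since the index holds site names)
results = Geocoder.reverse_geocode(df['latitude'].iloc[0], df['longitude'].iloc[0])

# Show the coordinates of the matched location
results.coordinates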

(30.30077769999999, -104.0129162)
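Note that the coordinates returned are those of the matched location, not the point we passed in, so they differ slightly from the input. A haversine calculation gives a rough sense of how far apart the two points are. This is an illustrative sketch, not part of pygeocoder: haversine_km is a hypothetical helper, and it assumes a spherical Earth with a mean radius of 6,371 km, so the result is only approximate.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Input point (first row) vs. the coordinates returned by reverse_geocode
query = (30.295053, -104.014528)
match = (30.30077769999999, -104.0129162)
print(round(haversine_km(*query, *match), 3))
```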

In [10]:
# Print the city
results.city

u'Marfa'

In [11]:
# Print the country
results.country

u'United States'

In [14]:
# Print the first-level administrative area (the state, in the US)
results.administrative_area_level_1

u'Texas'

In [15]:
# Print the street address (if applicable)
results.formatted_address

u'E Madrid St, Marfa, TX 79843, USA'
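The cells above reverse geocode only the first row. To label every site, one option is to call a lookup function row by row, skipping missing coordinates and pausing between calls to stay within the API's rate limits. In this sketch, reverse_geocode_rows and fake_lookup are hypothetical; with pygeocoder, the lookup could be something like lambda lat, lon: Geocoder.reverse_geocode(lat, lon).city, but an offline stand-in keeps the example runnable.

```python
import time

import numpy as np
import pandas as pd


def reverse_geocode_rows(df, reverse_lookup, delay=0.0):
    """Return a Series with one reverse-geocoded value per row of df.

    reverse_lookup(lat, lon) is any callable returning a place name; rows
    with missing coordinates or failed lookups get NaN.
    """
    places = []
    for lat, lon in zip(df['latitude'], df['longitude']):
        if pd.isna(lat) or pd.isna(lon):
            places.append(np.nan)
            continue
        try:
            places.append(reverse_lookup(lat, lon))
        except Exception:
            # Catch broadly: network and API errors vary by client library
            places.append(np.nan)
        time.sleep(delay)  # pause between calls to respect rate limits
    return pd.Series(places, index=df.index)


# Offline stand-in for a real lookup (hypothetical, for illustration only)
def fake_lookup(lat, lon):
    return f"place near {lat:.2f}, {lon:.2f}"


sites = pd.DataFrame({'latitude': [30.295053, np.nan],
                      'longitude': [-104.014528, -106.420484]},
                     index=['Site 5', 'Site 4'])
sites['place'] = reverse_geocode_rows(sites, fake_lookup)
print(sites)
```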

# Geocoding

For geocoding, we pass a string containing an address or location (such as a city) to the geocode function.

However, not all strings are formatted in a way that Google's geo-API can make sense of. We can check whether an input resolves to a valid street address with the valid_address attribute of the result returned by Geocoder.geocode().

In [17]:
# Verify that an address is valid (i.e. in Google's system)
Geocoder.geocode("235 Albany Street, Cambridge, MA 02139").valid_address

True

Because the output is True, we know Google recognizes this as a valid address, so we can geocode it and work with the result.

In [18]:
# Find the lat/long of a certain address
result = Geocoder.geocode("235 Albany Street, Cambridge, MA 02139")

Once Google's geo-API has processed the address, we can parse the result and easily separate out the street number, street name, and so on.

In [19]:
# Print the street number
result.street_number

u'235'

In [24]:
# Print the street name
result.route

u'Albany Street'
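The street_number and route attributes correspond to entries in the address_components list of the raw response printed below, where each component carries one or more types. Here is a minimal sketch of pulling components out by type from a trimmed copy of that response; component_lookup is a hypothetical helper, and pygeocoder's own attribute lookup may be implemented differently.

```python
# A trimmed copy of the address_components returned for
# "235 Albany Street, Cambridge, MA 02139"
address_components = [
    {'long_name': '235', 'short_name': '235', 'types': ['street_number']},
    {'long_name': 'Albany Street', 'short_name': 'Albany St', 'types': ['route']},
    {'long_name': 'Cambridge', 'short_name': 'Cambridge',
     'types': ['locality', 'political']},
    {'long_name': 'Massachusetts', 'short_name': 'MA',
     'types': ['administrative_area_level_1', 'political']},
    {'long_name': '02139', 'short_name': '02139', 'types': ['postal_code']},
]


def component_lookup(components, name_key='long_name'):
    """Map each component type to its name, e.g. 'route' -> 'Albany Street'.

    'political' is skipped because it tags several components at once.
    """
    lookup = {}
    for component in components:
        for comp_type in component['types']:
            if comp_type != 'political':
                # Keep the first component seen for each type
                lookup.setdefault(comp_type, component[name_key])
    return lookup


parts = component_lookup(address_components)
print(parts['street_number'], parts['route'])  # 235 Albany Street
print(component_lookup(address_components, 'short_name')['administrative_area_level_1'])  # MA
```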

In [21]:
# Print all data
result.data

[{u'address_components': [{u'long_name': u'235',
    u'short_name': u'235',
    u'types': [u'street_number']},
   {u'long_name': u'Albany Street',
    u'short_name': u'Albany St',
    u'types': [u'route']},
   {u'long_name': u'MIT',
    u'short_name': u'MIT',
    u'types': [u'neighborhood', u'political']},
   {u'long_name': u'Cambridge',
    u'short_name': u'Cambridge',
    u'types': [u'locality', u'political']},
   {u'long_name': u'Middlesex County',
    u'short_name': u'Middlesex County',
    u'types': [u'administrative_area_level_2', u'political']},
   {u'long_name': u'Massachusetts',
    u'short_name': u'MA',
    u'types': [u'administrative_area_level_1', u'political']},
   {u'long_name': u'United States',
    u'short_name': u'US',
    u'types': [u'country', u'political']},
   {u'long_name': u'02139',
    u'short_name': u'02139',
    u'types': [u'postal_code']}],
  u'formatted_address': u'235 Albany St, Cambridge, MA 02139, USA',
  u'geometry': {u'location': {u'lat': 42.3586984, u'