# Geocoding

Adapted from materials prepared by [Jeff Allen](http://jamaps.github.io) (April, 2023)

Geocoding is the process of taking an address string (e.g., "123 Main St") and converting it to an X,Y coordinate that we can map or use for spatial analysis.

Let's learn how to geocode addresses via **Nominatim**, the free-to-use geocoder for OpenStreetMap.

Nominatim uses OpenStreetMap data to find locations by name and address (forward geocoding), but it can also do the reverse: find an address for any location on Earth (reverse geocoding). We can access these functions through an API, but first, take some time to use the following UI to gain an understanding of the data returned from a search. For example, after searching for a city or address, select **Details** for the result and notice that XY coordinates are included for the centre point of the location.

- https://nominatim.openstreetmap.org/ui/
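Reverse geocoding uses a separate `/reverse` endpoint of the same API, which takes a `lat` and a `lon` instead of a search string. Below is a minimal sketch that only builds the request URL with the standard library, so it runs offline; the helper name `reverse_url` is our own, and the coordinates are an arbitrary point in downtown Toronto. Open the printed URL in a browser, or pass it to `requests.get`, to see the address Nominatim returns.

```python
from urllib.parse import urlencode

def reverse_url(lat, lon, fmt="jsonv2"):
    """Build a Nominatim reverse-geocoding URL for a latitude/longitude pair."""
    # Nominatim's reverse endpoint expects the parameters `lat` and `lon`
    params = {"lat": lat, "lon": lon, "format": fmt}
    return "https://nominatim.openstreetmap.org/reverse?" + urlencode(params)

# Arbitrary example point in downtown Toronto
print(reverse_url(43.6532, -79.3832))
```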

## 1. Create a Python function to geocode an address using the Nominatim API

Let's create a Python function that reads in an address and geocodes it!

This uses the [`requests`](https://pypi.org/project/requests/) library to query the **Nominatim API**, and then the [`json`](https://docs.python.org/3/library/json.html) library to parse the JSON output into Python objects (a list of dictionaries, one per matching location).

The API documentation details the format required for a search query:
- https://nominatim.org/release-docs/latest/api/Search/ 

We will be requesting results in the [JSON WITH ADDRESS DETAILS](https://nominatim.org/release-docs/latest/api/Search/#json-with-address-details) format, which uses the following parameters:

1. **q**: the address to be queried (e.g., 'Canada', 'Toronto', or '100 St George Street').
2. **format**: the output format. `jsonv2` returns the results as JSON.
3. **addressdetails**: set to 1 to include a breakdown of each result's address into its components (e.g., house number, road, city, country); 0 (the default) omits this breakdown.
4. **limit**: the maximum number of results returned for a query. For example, if you search for 'London' with a limit of 1, only the top-ranked result is returned, even though many places share that name.

An example search query may therefore look like:

- https://nominatim.openstreetmap.org/search?q=London&format=jsonv2&addressdetails=1&limit=1
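The same query can be assembled from a dictionary of parameters instead of by hand. Here is a small sketch using the standard library's `urllib.parse.urlencode`, which also percent-encodes characters such as spaces and commas that would otherwise break the URL (the helper name `search_url` is just for illustration):

```python
from urllib.parse import urlencode

def search_url(query, addressdetails=1, limit=1):
    """Build a Nominatim search URL from a free-form query string."""
    params = {
        "q": query,
        "format": "jsonv2",
        "addressdetails": addressdetails,
        "limit": limit,
    }
    # urlencode escapes unsafe characters, e.g. spaces become '+' and commas '%2C'
    return "https://nominatim.openstreetmap.org/search?" + urlencode(params)

print(search_url("London"))
# https://nominatim.openstreetmap.org/search?q=London&format=jsonv2&addressdetails=1&limit=1
print(search_url("100 St George Street, Toronto"))
```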


In [None]:
import requests
import json

In [None]:
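def geocode(address):

    # Query parameters for the Nominatim search API; passing them to requests via `params`
    # ensures spaces, commas, etc. in the address are URL-encoded correctly
    params = {"q": address, "format": "jsonv2", "addressdetails": 1, "limit": 5}

    # Nominatim's usage policy requires a User-Agent that identifies your application
    # ("geocoding-tutorial" is a placeholder: replace it with your own application name)
    headers = {"User-Agent": "geocoding-tutorial"}

    try:
        # Use get() method from requests library to send request to API and retrieve data in JSON format
        response = requests.get("https://nominatim.openstreetmap.org/search", params=params, headers=headers, timeout=10)
        response.raise_for_status()

        # Convert data in JSON format into Python objects (a list of dictionaries) for easier use
        data = json.loads(response.content)

        # Switch this out with the next line if you want to see the full data that is being returned for all locations matching query
        #return data

        # Return only lon and lat values (as floats) from the first location that matches the query [0]
        return (float(data[0]['lon']), float(data[0]['lat']))

    except (requests.RequestException, ValueError, IndexError, KeyError):

        # The request failed or no location matched the query
        return (None, None)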

Let's use our geocode function to return the coordinates for a place of interest in the cell below. I have added in 'Toronto' to get you started, however, I encourage you to try geocoding places of different scales (e.g., a building address vs a country).

Consider the issue of geocoding a location that has the same name as another location (e.g., London, ON and London, UK). Because our limit parameter is set to 5, you can swap the final `return` line in the cell above for the commented-out `#return data` line to see data for up to five locations that match the query. Notice how adding more detail, such as a county, state, or province, improves the accuracy of the geocoding process.

In [None]:
data = geocode('Toronto')
data
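When you switch to `return data`, the function gives back the full list of matches (up to `limit`), each a dictionary with keys such as `display_name`, `lat` and `lon`, where the coordinates arrive as strings. A minimal sketch for comparing those candidates side by side, for example to tell London, ON from London, UK; `summarize_candidates` is a hypothetical helper, not part of any library:

```python
def summarize_candidates(results):
    """Return (display_name, lon, lat) for each match in a Nominatim JSON result list."""
    summary = []
    for place in results:
        # Nominatim returns coordinates as strings, so convert them to floats
        summary.append((place["display_name"], float(place["lon"]), float(place["lat"])))
    return summary
```

With `return data` enabled, `summarize_candidates(geocode('London'))` lists the name and coordinates of each London that Nominatim matched, which makes it easy to see which one you actually want.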

## 2. Applying the geocode function to tabulated building permit address data

We'll be using cleared Building Permit data from the City of Toronto's Open Data catalogue: 
- https://open.toronto.ca/dataset/building-permits-cleared-permits/

In [None]:
import pandas as pd
import geopandas as gpd
import time

In [None]:
df = pd.read_csv('clearedbuildingpermits_2017.csv', dtype=str)

Geocoding can take a while, so let's just work with a subset of this data: permits for new laneway or rear yard suites with an application date on or after September 1, 2023.

In [None]:
df.head()

In [None]:
df = df.loc[(df["WORK"] == 'New Laneway / Rear Yard Suite') & (df["APPLICATION_DATE"] >= '2023-09-01')]
df

The table above has columns for the address (STREET_NUM, STREET_NAME, STREET_TYPE, and STREET_DIRECTION), but no spatial coordinates!
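Since the permit data already splits the address into components, Nominatim's structured search is an alternative to a single free-form `q` string: it accepts separate parameters such as `street` (house number plus street name), `city` and `country`, and should not be combined with `q`. Below is a sketch of building those parameters from the permit columns. It skips any missing component, since pandas typically stores empty cells as NaN (a float) even when the CSV is read with `dtype=str`. The function name and the fixed `city`/`country` defaults are assumptions for this dataset.

```python
def structured_params(street_num, street_name, street_type, street_dir,
                      city="Toronto", country="Canada"):
    """Build Nominatim structured-search parameters from address components."""
    # Keep only components that are non-empty strings (missing values are NaN floats)
    parts = [street_num, street_name, street_type, street_dir]
    street = " ".join(p.strip() for p in parts if isinstance(p, str) and p.strip())
    return {
        "street": street,  # "<housenumber> <streetname>"
        "city": city,
        "country": country,
        "format": "jsonv2",
        "limit": 1,
    }
```

For a permit row, `structured_params(row["STREET_NUM"], row["STREET_NAME"], row["STREET_TYPE"], row["STREET_DIRECTION"])` could be passed as `params=` to `requests.get` on the `/search` endpoint, with the same `User-Agent` header as the free-form query.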


Now let's loop over the dataframe, trying to geocode each row with our `geocode` function. In line with Nominatim's [Usage Policy](https://operations.osmfoundation.org/policies/nominatim/), we will limit requests to a maximum of 1 per second:

In [None]:
%%time

# Cache of address -> (lon, lat) so each unique address is only geocoded once
geocoded = {}

# List to store [permit number, address, lon, lat] for every permit
coordinates = []

# Loop through each row in dataframe
for index, row in df.iterrows():

    # Collect the address components, skipping any that are missing (NaN),
    # e.g. permits with no STREET_DIRECTION
    parts = [row["STREET_NUM"], row["STREET_NAME"], row["STREET_TYPE"], row["STREET_DIRECTION"]]
    street = " ".join(part for part in parts if isinstance(part, str))

    # Build a search query for the full address in Toronto
    address = street + ", Toronto, Canada"

    # Only query the API for addresses that have not been geocoded yet
    if address not in geocoded:

        # Print address being processed
        print(address)

        # Use previously created function to geocode address
        geocoded[address] = geocode(address)

        # Ensure compliance with Nominatim usage policy by limiting requests to 1 per second
        time.sleep(1)

    # Add permit number, address, and coordinates to the list
    lon, lat = geocoded[address]
    coordinates.append([row["PERMIT_NUM"], address, lon, lat])
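The loop above waits a full second after every request, on top of however long the request itself took. An alternative is to space requests so that each one *starts* at least one second after the previous one, which still keeps to 1 request per second without over-waiting. A minimal sketch; `Throttle` is a hypothetical helper, not part of `requests` or `time`:

```python
import time

class Throttle:
    """Make successive wait() calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None  # time of the previous call, from time.monotonic()

    def wait(self):
        if self._last is not None:
            # Sleep only for whatever is left of the interval since the last call
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

# Usage inside the loop: call throttle.wait() immediately before geocode(address)
throttle = Throttle(min_interval=1.0)
```

Using `time.monotonic()` rather than `time.time()` means the spacing is not thrown off if the system clock is adjusted while the loop runs.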
    

We can save the output as a CSV, or convert it to a GeoPandas GeoDataFrame (which could in turn be saved as a .geojson or any other spatial format).

In [None]:
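# Save output as pandas dataframe
dfc = pd.DataFrame(coordinates, columns = ['PERMIT_NUM', 'address', 'X', 'Y'])

# Export to csv (without the pandas index column)
dfc.to_csv("geocoded-building-permits.csv", index=False)

# Drop addresses that could not be geocoded before building point geometries
dfc = dfc.dropna(subset=['X', 'Y'])

# Convert to GeoDataFrame; Nominatim returns WGS84 longitude/latitude (EPSG:4326)
dfc = gpd.GeoDataFrame(dfc, geometry=gpd.points_from_xy(dfc.X, dfc.Y), crs="EPSG:4326")
dfc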

In [None]:
dfc.plot()
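GeoPandas can write `dfc` straight to a spatial file, e.g. `dfc.to_file("geocoded-building-permits.geojson", driver="GeoJSON")`. If you only need GeoJSON, the `coordinates` list can also be converted with the standard library alone. Below is a sketch following the GeoJSON format (RFC 7946), in which each point is written as `[longitude, latitude]`. The function name and property names are our own choices, and rows whose coordinates are missing (`None`) are skipped.

```python
import json

def to_geojson(rows):
    """Convert [permit_num, address, lon, lat] rows into a GeoJSON FeatureCollection dict."""
    features = []
    for permit_num, address, lon, lat in rows:
        # Skip rows that could not be geocoded
        if lon is None or lat is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {"PERMIT_NUM": permit_num, "address": address},
        })
    return {"type": "FeatureCollection", "features": features}

# Example: write the geocoded permits to disk
# with open("geocoded-building-permits.geojson", "w") as f:
#     json.dump(to_geojson(coordinates), f)
```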

## 3. Alternative approaches

Writing our own Python function to access the Nominatim API is just one way to geocode address queries.

The code below uses a class in the [`geopy`](https://geopy.readthedocs.io/en/stable/#nominatim) library to access the Nominatim geocoder, though there are many classes for alternative APIs! 


In [None]:
import geopy
from geopy.geocoders import Nominatim

# Initialise new Nominatim client using geopy Nominatim class
geolocator = Nominatim(user_agent="my-geocoder")

location = geolocator.geocode('Toronto')

print(location, location.longitude, location.latitude)

In [None]:
from geopy.extra.rate_limiter import RateLimiter

# Wrap geocode() so calls are at least 1 second apart, per Nominatim's usage policy
geocode_limited = RateLimiter(geolocator.geocode, min_delay_seconds=1)

coords = []

# Loop through each row in dataframe
for index, row in df.iterrows():

    # Extract address components, skipping any that are missing (NaN)
    parts = [row["STREET_NUM"], row["STREET_NAME"], row["STREET_TYPE"], row["STREET_DIRECTION"]]
    street = " ".join(part for part in parts if isinstance(part, str))

    # Build a search query for the full address in Toronto
    address = street + ", Toronto, Canada"

    location = geocode_limited(address)

    # geopy returns None when no match is found
    if location is not None:
        coords.append([row["PERMIT_NUM"], address, location.longitude, location.latitude])
    else:
        coords.append([row["PERMIT_NUM"], address, None, None])

# Save output as pandas dataframe etc...
#dfc = pd.DataFrame(coords, columns = ['PERMIT_NUM', 'address', 'X', 'Y'])

You may also use GIS software. The following tutorials demonstrate how to geocode addresses using plugins in QGIS:

- [GeoCoding and mmqgis](https://guides.library.ucsc.edu/DS/Resources/QGIS) (UC Santa Cruz)
- [mmqgis](https://www.lib.uwo.ca/madgic/projects/gis/Geocoding%20Using%20QGIS.pdf) (Western)

If you have access to an Esri licence, [dragging a CSV into a map in ArcGIS Online](https://support.esri.com/en-us/knowledge-base/how-to-geocode-addresses-from-an-imported-csv-file-in-a-000021263) is a super quick way to plot places of interest!
    