# Geocoding with Geopy and GeoPandas

## Overview

We will be working with an Excel sheet containing addresses of Hurricane Evacuation Centers in New York City. We will geocode these addresses and display the locations on a map.

Input Layers:

`Hurricane_Evacuation_Centers.xlsx`: A spreadsheet containing names and addresses of hurricane shelters in NYC.

Output:

`hurricane_evacuation_centers.shp`: A shapefile with geocoded locations of the hurricane evacuation centers.

Data Credit: NYC Open Data Portal. [source](https://data.cityofnewyork.us/Public-Safety/Hurricane-Evacuation-Centers-Map-/ayer-cga7)

## Setup and Data Download

The following blocks of code will install the required packages and download the datasets to your Colab environment.

In [105]:
%%capture
if 'google.colab' in str(get_ipython()):
    !apt install libspatialindex-dev
    !pip install fiona shapely pyproj rtree mapclassify
    !pip install geopandas
    !pip install openpyxl
    !pip install leafmap

In [106]:
import os
import re
import pandas as pd
import geopandas as gpd
import leafmap.foliumap as leafmap
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

In [107]:
data_folder = 'data'
output_folder = 'output'

if not os.path.exists(data_folder):
    os.mkdir(data_folder)
if not os.path.exists(output_folder):
    os.mkdir(output_folder)

In [108]:
def download(url):
    filename = os.path.join(data_folder, os.path.basename(url))
    if not os.path.exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

download('https://github.com/spatialthoughts/python-tutorials/raw/main/data/' +
         'Hurricane_Evacuation_Centers.xlsx')

## Procedure

Read the `Hurricane_Evacuation_Centers.xlsx` file into a pandas DataFrame. The first row of the sheet is skipped with `skiprows=[0]`.

In [109]:
excel_file = 'Hurricane_Evacuation_Centers.xlsx'
excel_file_path = os.path.join(data_folder, excel_file)
address_df = pd.read_excel(excel_file_path, skiprows=[0])
address_df

Unnamed: 0,CITY,EC_Name,ADDRESS,ZIP_CODE,BOROCODE,STATE,ACCESSIBLE
0,Flushing,J.H.S. 185 - Queens,147-26 25 Drive,11354,4,NY,Y
1,Brooklyn,I.S. 258 - Brooklyn,141 Macon Street,11216,3,NY,Y
2,Brooklyn,I.S. 88 - Brooklyn,544 7 Avenue,11215,3,NY,Y
3,Bronx,Walton HS - X,2780 Reservoir Avenue,10468,2,NY,Y
4,Brooklyn,P.S./I.S.30 Mary White Ovington - Brooklyn,7002 4 Avenue,11209,3,NY,Y
...,...,...,...,...,...,...,...
59,Bronx,I.S. 201 - Bronx,730 Bryant Avenue,10474,2,NY,Y
60,Bronx,P.S. 46 - Bronx,2760 Briggs Avenue,10458,2,NY,Y
61,Long Island City,Aviation HS - Q,45-30 36 Street,11101,4,NY,Y
62,Bronx,P.S. 102 - Bronx,1827 Archer Street,10460,2,NY,Y


The Nominatim geocoder expects addresses in a certain format. We fix our source addresses by converting numbered street names to ordinal numbers.

Examples

- `45-30 36 Street` → `45-30 36th Street`
- `544 7 Avenue` → `544 7th Avenue`

In [110]:
def make_ordinal(match):
    n = int(match.group(1))
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    else:
        suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    return str(n) + suffix + match.group(2)

def update_address(row):
    old_address = row['ADDRESS']
    pattern = r'(\d+)(\s+(?:Street|Avenue|Blvd|Drive))'
    result = re.sub(pattern, make_ordinal, old_address)
    return result

address_df['ADDRESS_FIXED'] = address_df.apply(update_address, axis=1)
address_df

Unnamed: 0,CITY,EC_Name,ADDRESS,ZIP_CODE,BOROCODE,STATE,ACCESSIBLE,ADDRESS_FIXED
0,Flushing,J.H.S. 185 - Queens,147-26 25 Drive,11354,4,NY,Y,147-26 25th Drive
1,Brooklyn,I.S. 258 - Brooklyn,141 Macon Street,11216,3,NY,Y,141 Macon Street
2,Brooklyn,I.S. 88 - Brooklyn,544 7 Avenue,11215,3,NY,Y,544 7th Avenue
3,Bronx,Walton HS - X,2780 Reservoir Avenue,10468,2,NY,Y,2780 Reservoir Avenue
4,Brooklyn,P.S./I.S.30 Mary White Ovington - Brooklyn,7002 4 Avenue,11209,3,NY,Y,7002 4th Avenue
...,...,...,...,...,...,...,...,...
59,Bronx,I.S. 201 - Bronx,730 Bryant Avenue,10474,2,NY,Y,730 Bryant Avenue
60,Bronx,P.S. 46 - Bronx,2760 Briggs Avenue,10458,2,NY,Y,2760 Briggs Avenue
61,Long Island City,Aviation HS - Q,45-30 36 Street,11101,4,NY,Y,45-30 36th Street
62,Bronx,P.S. 102 - Bronx,1827 Archer Street,10460,2,NY,Y,1827 Archer Street
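
As a quick check of the suffix logic, the sketch below restates the ordinal rules from the cell above in standalone form and runs them on a few edge cases. The helper names `ordinal_suffix` and `fix_street_number`, and the last two sample addresses, are illustrative assumptions and not part of the dataset.

```python
import re

def ordinal_suffix(n):
    """Return the English ordinal suffix for a non-negative integer."""
    # Numbers ending in 11, 12 or 13 always take 'th' (11th, 112th)
    if 11 <= (n % 100) <= 13:
        return 'th'
    # Otherwise the last digit decides: 1 -> st, 2 -> nd, 3 -> rd, rest -> th
    return ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]

def fix_street_number(address):
    """Add an ordinal suffix to a number that directly precedes a street type."""
    pattern = r'(\d+)(\s+(?:Street|Avenue|Blvd|Drive))'

    def replace(match):
        number = int(match.group(1))
        return str(number) + ordinal_suffix(number) + match.group(2)

    return re.sub(pattern, replace, address)

samples = [
    '45-30 36 Street',   # from the dataset
    '544 7 Avenue',      # from the dataset
    '141 Macon Street',  # named street: should stay unchanged
    '10 112 Street',     # hypothetical: 112 takes 'th', not 'nd'
    '5 23 Drive',        # hypothetical: 23 takes 'rd'
]
fixed = [fix_street_number(s) for s in samples]
for before, after in zip(samples, fixed):
    print(before, '->', after)
```

Named streets such as `141 Macon Street` pass through untouched because the pattern only matches digits that sit directly before the street type.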


We also create a new column containing the full address. The values in the `CITY` column are actually boroughs or neighborhoods within NYC, and including them causes many geocoding requests to fail. Instead, we use the city name 'NYC' for all addresses.

In [111]:
address_df['Full_Address'] = (
    address_df['ADDRESS_FIXED'] + ',' +
    'NYC' + ',' +
    address_df['STATE'] + ',' +
    address_df['ZIP_CODE'].astype(str))
address_df

Unnamed: 0,CITY,EC_Name,ADDRESS,ZIP_CODE,BOROCODE,STATE,ACCESSIBLE,ADDRESS_FIXED,Full_Address
0,Flushing,J.H.S. 185 - Queens,147-26 25 Drive,11354,4,NY,Y,147-26 25th Drive,"147-26 25th Drive,NYC,NY,11354"
1,Brooklyn,I.S. 258 - Brooklyn,141 Macon Street,11216,3,NY,Y,141 Macon Street,"141 Macon Street,NYC,NY,11216"
2,Brooklyn,I.S. 88 - Brooklyn,544 7 Avenue,11215,3,NY,Y,544 7th Avenue,"544 7th Avenue,NYC,NY,11215"
3,Bronx,Walton HS - X,2780 Reservoir Avenue,10468,2,NY,Y,2780 Reservoir Avenue,"2780 Reservoir Avenue,NYC,NY,10468"
4,Brooklyn,P.S./I.S.30 Mary White Ovington - Brooklyn,7002 4 Avenue,11209,3,NY,Y,7002 4th Avenue,"7002 4th Avenue,NYC,NY,11209"
...,...,...,...,...,...,...,...,...,...
59,Bronx,I.S. 201 - Bronx,730 Bryant Avenue,10474,2,NY,Y,730 Bryant Avenue,"730 Bryant Avenue,NYC,NY,10474"
60,Bronx,P.S. 46 - Bronx,2760 Briggs Avenue,10458,2,NY,Y,2760 Briggs Avenue,"2760 Briggs Avenue,NYC,NY,10458"
61,Long Island City,Aviation HS - Q,45-30 36 Street,11101,4,NY,Y,45-30 36th Street,"45-30 36th Street,NYC,NY,11101"
62,Bronx,P.S. 102 - Bronx,1827 Archer Street,10460,2,NY,Y,1827 Archer Street,"1827 Archer Street,NYC,NY,10460"
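
As an alternative to a single comma-separated string, geopy's `Nominatim.geocode` also accepts a structured query given as a dictionary with keys such as `street`, `city`, `state` and `postalcode`. The sketch below only builds such a dictionary and makes no network call. The helper name `to_structured_query` and the default city `'New York'` are assumptions for illustration, and whether this form gives better matches for this dataset is untested here.

```python
def to_structured_query(street, state, zip_code, city='New York'):
    """Build a structured query dict from address parts (hypothetical helper)."""
    return {
        'street': street,
        'city': city,  # assumed default; the tutorial uses 'NYC' in its string form
        'state': state,
        # ZIP codes read from a spreadsheet are often integers,
        # so convert to text and zero-pad to five digits
        'postalcode': str(zip_code).zfill(5),
    }

query = to_structured_query('544 7th Avenue', 'NY', 11215)
print(query)

# With a geopy Nominatim instance this could be passed as:
#   locator.geocode(query)
```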


We now geocode each full address with Nominatim, using `tqdm` to show progress.

In [117]:
from tqdm.notebook import tqdm

tqdm.pandas()

locator = Nominatim(user_agent='spatialthoughts', timeout=10)
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

address_df['location'] = address_df['Full_Address'].progress_apply(geocode)
address_df

Unnamed: 0,EC_Name,Full_Address,location
0,J.H.S. 185 - Queens,"147-26 25th Drive,NYC,NY,11354","(147-26, 25th Drive, Linden Hill, Queens, City..."
1,I.S. 258 - Brooklyn,"141 Macon Street,NYC,NY,11216","(Junior High School 258, 141, Macon Street, Be..."
2,I.S. 88 - Brooklyn,"544 7th Avenue,NYC,NY,11215","(544, 7th Avenue, South Slope, Sunset Park, Ki..."
3,Walton HS - X,"2780 Reservoir Avenue,NYC,NY,10468","(Celia Cruz High School of Music, 2780, Reserv..."
4,P.S./I.S.30 Mary White Ovington - Brooklyn,"7002 4th Avenue,NYC,NY,11209","(7002, 4th Avenue, Bay Ridge, Brooklyn, Kings ..."
...,...,...,...
59,I.S. 201 - Bronx,"730 Bryant Avenue,NYC,NY,10474","(730, Bryant Avenue, Hunts Point, Bronx County..."
60,P.S. 46 - Bronx,"2760 Briggs Avenue,NYC,NY,10458","(2760, Briggs Avenue, Bedford Park, Bronx Coun..."
61,Aviation HS - Q,"45-30 36th Street,NYC,NY,11101","(45-30, 36th Street, Sunnyside, Queens, City o..."
62,P.S. 102 - Bronx,"1827 Archer Street,NYC,NY,10460","(Public School 102, 1827, Archer Street, Bronx..."
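
The public Nominatim server asks clients to send at most one request per second, which is why the geocoder above is wrapped in `RateLimiter` with `min_delay_seconds=1`. The sketch below illustrates the idea of that delay with a plain-Python wrapper. It is not geopy's implementation: `throttle` and `fake_geocode` are hypothetical names, and the stand-in geocoder returns made-up text without any network access.

```python
import time

def throttle(func, min_delay_seconds):
    """Wrap func so consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the start time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper

def fake_geocode(address):
    # Stand-in for a real geocoder call (assumption for this sketch)
    return 'result for ' + address

# A short delay keeps the demo fast; the tutorial uses 1 second
slow_geocode = throttle(fake_geocode, min_delay_seconds=0.2)

start = time.monotonic()
results = [slow_geocode(a) for a in ['A', 'B', 'C']]
elapsed = time.monotonic() - start
print(results)
print('elapsed seconds:', round(elapsed, 2))
```

Three calls with a 0.2 second delay take roughly 0.4 seconds, since only the second and third calls have to wait.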


In [118]:
address_df['latitude'] = address_df['location'].apply(lambda loc: loc.latitude if loc else None)
address_df['longitude'] = address_df['location'].apply(lambda loc: loc.longitude if loc else None)
address_df

Unnamed: 0,EC_Name,Full_Address,location,latitude,longitude
0,J.H.S. 185 - Queens,"147-26 25th Drive,NYC,NY,11354","(147-26, 25th Drive, Linden Hill, Queens, City...",40.774893,-73.818642
1,I.S. 258 - Brooklyn,"141 Macon Street,NYC,NY,11216","(Junior High School 258, 141, Macon Street, Be...",40.681918,-73.945597
2,I.S. 88 - Brooklyn,"544 7th Avenue,NYC,NY,11215","(544, 7th Avenue, South Slope, Sunset Park, Ki...",40.660181,-73.987931
3,Walton HS - X,"2780 Reservoir Avenue,NYC,NY,10468","(Celia Cruz High School of Music, 2780, Reserv...",40.870595,-73.897498
4,P.S./I.S.30 Mary White Ovington - Brooklyn,"7002 4th Avenue,NYC,NY,11209","(7002, 4th Avenue, Bay Ridge, Brooklyn, Kings ...",40.633625,-74.024038
...,...,...,...,...,...
59,I.S. 201 - Bronx,"730 Bryant Avenue,NYC,NY,10474","(730, Bryant Avenue, Hunts Point, Bronx County...",40.816040,-73.885264
60,P.S. 46 - Bronx,"2760 Briggs Avenue,NYC,NY,10458","(2760, Briggs Avenue, Bedford Park, Bronx Coun...",40.867729,-73.890294
61,Aviation HS - Q,"45-30 36th Street,NYC,NY,11101","(45-30, 36th Street, Sunnyside, Queens, City o...",40.743475,-73.929464
62,P.S. 102 - Bronx,"1827 Archer Street,NYC,NY,10460","(Public School 102, 1827, Archer Street, Bronx...",40.838106,-73.865790
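
A geocoder can return a confident match in the wrong place, so it can help to check that every coordinate falls inside the expected area. The sketch below tests points against a rough bounding box for New York City. The box values are approximate assumptions, `in_bounds` is a hypothetical helper, and the last two sample points are invented to show what gets flagged.

```python
# Approximate bounding box of New York City in degrees (assumed values)
NYC_BOUNDS = {'min_lat': 40.47, 'max_lat': 40.92,
              'min_lon': -74.27, 'max_lon': -73.69}

def in_bounds(lat, lon, bounds=NYC_BOUNDS):
    """Return True if the coordinate lies inside the bounding box."""
    if lat is None or lon is None:
        return False  # a failed geocoding result has no coordinates
    return (bounds['min_lat'] <= lat <= bounds['max_lat'] and
            bounds['min_lon'] <= lon <= bounds['max_lon'])

points = [
    ('J.H.S. 185 - Queens', 40.774893, -73.818642),  # from the table above
    ('I.S. 258 - Brooklyn', 40.681918, -73.945597),  # from the table above
    ('swapped example', -73.945597, 40.681918),      # invented: lat/lon swapped
    ('failed example', None, None),                  # invented: no match found
]

suspicious = [name for name, lat, lon in points if not in_bounds(lat, lon)]
print(suspicious)
```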


In [119]:
failed = address_df[address_df['location'].isna()]
failed

Unnamed: 0,EC_Name,Full_Address,location,latitude,longitude


In [None]:
# Keep only the rows that were geocoded successfully
address_df = address_df[~address_df['location'].isna()]
address_df = address_df[['EC_Name', 'Full_Address', 'latitude', 'longitude']]
# Assign the result instead of renaming a slice in place
address_df = address_df.rename(columns={'EC_Name': 'Name', 'Full_Address': 'Address'})

In [126]:
geometry = gpd.points_from_xy(address_df.longitude, address_df.latitude)
address_gdf = gpd.GeoDataFrame(address_df, crs='EPSG:4326', geometry=geometry)
address_gdf

Unnamed: 0,Name,Address,latitude,longitude,geometry
0,J.H.S. 185 - Queens,"147-26 25th Drive,NYC,NY,11354",40.774893,-73.818642,POINT (-73.81864 40.77489)
1,I.S. 258 - Brooklyn,"141 Macon Street,NYC,NY,11216",40.681918,-73.945597,POINT (-73.94560 40.68192)
2,I.S. 88 - Brooklyn,"544 7th Avenue,NYC,NY,11215",40.660181,-73.987931,POINT (-73.98793 40.66018)
3,Walton HS - X,"2780 Reservoir Avenue,NYC,NY,10468",40.870595,-73.897498,POINT (-73.89750 40.87059)
4,P.S./I.S.30 Mary White Ovington - Brooklyn,"7002 4th Avenue,NYC,NY,11209",40.633625,-74.024038,POINT (-74.02404 40.63362)
...,...,...,...,...,...
59,I.S. 201 - Bronx,"730 Bryant Avenue,NYC,NY,10474",40.816040,-73.885264,POINT (-73.88526 40.81604)
60,P.S. 46 - Bronx,"2760 Briggs Avenue,NYC,NY,10458",40.867729,-73.890294,POINT (-73.89029 40.86773)
61,Aviation HS - Q,"45-30 36th Street,NYC,NY,11101",40.743475,-73.929464,POINT (-73.92946 40.74347)
62,P.S. 102 - Bronx,"1827 Archer Street,NYC,NY,10460",40.838106,-73.865790,POINT (-73.86579 40.83811)


In [127]:
m = leafmap.Map(width=800, height=500)
m.add_gdf(address_gdf, layer_name='Shelters', style={'color':'blue', 'weight':0.5})
m.zoom_to_gdf(address_gdf)
m

In [128]:
output_file = 'hurricane_evacuation_centers.shp'
output_path = os.path.join(output_folder, output_file)

address_gdf.to_file(filename=output_path)
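
Shapefile attribute tables limit field names to 10 characters, so longer column names are shortened when the file is written. The renamed columns used here all fit within that limit. The sketch below is a small check that could be run before saving; `too_long` is a hypothetical helper.

```python
MAX_FIELD_NAME = 10  # field-name length limit of the shapefile attribute table

def too_long(columns, limit=MAX_FIELD_NAME):
    """Return the column names that exceed the field-name limit."""
    return [c for c in columns if len(c) > limit]

renamed = too_long(['Name', 'Address', 'latitude', 'longitude'])
original = too_long(['EC_Name', 'Full_Address', 'ADDRESS_FIXED'])
print(renamed)
print(original)
```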