# 01-03 Geocoding

Sometimes a dataset contains neither a geometry column nor latitude and longitude values, but it does contain addresses. In that case, you can use geocoding to obtain coordinates.

__Geocoding__ is the process of taking an address and mapping it to a location on the Earth's surface.

To perform geocoding in Python, you can use _GeoPandas_ together with _GeoPy_.

In [44]:
import pandas as pd
import geopandas as gpd
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from shapely.geometry import Point

import matplotlib.pyplot as plt
import plotly.express as px
import folium

# progress bar
from tqdm import tqdm
from tqdm.notebook import tqdm_notebook

## 1. A simple example

GeoPy offers a number of geocoding services to choose from, such as Google Maps and ArcGIS (NOTE: some of these need an API key!). A common choice is the Nominatim geocoding service, which is built on top of OpenStreetMap data. You can create it in Python using ``Nominatim(user_agent="myGeocoder")``, where ``user_agent`` is a name that identifies your application.

Let's say you want to get the location of the following address: 

_Soembastraat 53, 1782SM, Den Helder, Netherlands_

Then what you do is:

In [5]:
# Define the locator
locator = Nominatim(user_agent="myGeocoder")
address = 'Soembastraat 53, 1782SM, Den Helder, Netherlands'
location = locator.geocode(address)

# Then, you can print the coordinates:
print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

Latitude = 52.9581664, Longitude = 4.7528677
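
When no match is found, `geocode` returns `None`, so reading `.latitude` from the result would raise an `AttributeError`. The following is a minimal sketch of a guard against that. The names `describe_location` and `FakeLocation` are hypothetical; the stand-in object is only there so the snippet runs without a network call.

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by locator.geocode();
# it only mimics the latitude and longitude attributes.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def describe_location(location):
    """Return a printable string, tolerating a missing result (None)."""
    if location is None:
        return "Address not found"
    return "Latitude = {}, Longitude = {}".format(
        location.latitude, location.longitude
    )


found = FakeLocation(latitude=52.9581664, longitude=4.7528677)
print(describe_location(found))
print(describe_location(None))
```

With a real geocoder, you would pass the result of `locator.geocode(address)` to the helper instead of the stand-in.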


## 2. An example with multiple addresses

Often, however, you don't have just a single address but many addresses stored in a CSV file.
For example, the _addresses.csv_ file in the Data folder contains 25 different addresses.
In that case, you can do the following:

1. Create a column that contains addresses in the suitable format
2. Create the geocoder using the RateLimiter and adding a delay of 1 second
3. Apply the geocoder and add a location column
4. From the location column, get a tuple point column
5. From this tuple column, get new columns containing separate lat, lon and altitude

If an address is not found, the geocoder returns _None_ for it.
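
To make the idea behind step 2 concrete, here is a minimal sketch of a wrapper that enforces a minimum delay between calls. It is not GeoPy's implementation (the real `RateLimiter` also handles retries on errors); `make_rate_limited` and `fake_geocode` are hypothetical names, and the short delay is chosen only so the demo finishes quickly.

```python
import time


def make_rate_limited(func, min_delay_seconds=1.0):
    """Wrap func so consecutive calls start at least min_delay_seconds apart."""
    last_call = None  # time of the previous call, None before the first one

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            remaining = min_delay_seconds - (time.monotonic() - last_call)
            if remaining > 0:
                time.sleep(remaining)
        last_call = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


# Hypothetical offline stand-in for a geocoding call.
def fake_geocode(address):
    return None if address == "" else (0.0, 0.0)


slow_geocode = make_rate_limited(fake_geocode, min_delay_seconds=0.05)
start = time.monotonic()
results = [slow_geocode(a) for a in ["A street 1", "", "B street 2"]]
elapsed = time.monotonic() - start
print(results)
```

Public geocoding services usually limit how many requests you may send per second, which is why the delay matters when geocoding a whole column.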

In [27]:
# Let's say you have the following dataset:
df = pd.read_csv('../Data/addresses.csv')

## 1. ##
df['address'] = df['Address1']+','+df['Address3']+','+df['Address4']+','+df['Address5']+', Sweden'

## 2. ##
# convenience wrapper that adds a delay of 1 second between consecutive geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

## 3. ##
# create location column in the dataset, this is where the geographical information will go.
df['location'] = df['address'].apply(geocode)

## 4. ##
# create a (latitude, longitude, altitude) tuple from the location column
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)

## 5. ##
# split point column into latitude, longitude and altitude columns
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)
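
Splitting a column of tuples can be fragile when some lookups return _None_. One possibly more robust variant, sketched below, reads each attribute directly and falls back to a missing value. The `FakeLocation` stand-in, the `df_demo` frame and its coordinate values are made up for illustration so the snippet runs offline.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for geocoding results (offline illustration).
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

df_demo = pd.DataFrame({
    "address": ["unknown place", "address B", "address C"],
    "location": [None, FakeLocation(59.33, 18.07), FakeLocation(57.71, 11.97)],
})

# getattr falls back to None when the lookup failed, so no row raises.
df_demo["latitude"] = df_demo["location"].apply(
    lambda loc: getattr(loc, "latitude", None)
)
df_demo["longitude"] = df_demo["location"].apply(
    lambda loc: getattr(loc, "longitude", None)
)

# Keep only the rows for which coordinates were found.
found = df_demo.dropna(subset=["latitude"])
print(found[["address", "latitude", "longitude"]])
```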

In [28]:
# Now let's make a selection
df = df.drop(['Address1','Address3','Address4','Address5','Telefon','address','location','point'], axis=1)

# And drop the ones for which we didn't find addresses
df = df.dropna(subset=['latitude'])

## 3. Turning this normal DataFrame into a GeoDataFrame

Now we have the latitude and longitude, but we want a GeoDataFrame that also contains a geometry column. To do that, we simply use ``gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))``. Alternatively, you can first build the geometry as a separate variable and then pass it to ``GeoDataFrame()``.

In [48]:
# Method 1
# Geocoded latitude/longitude values are in degrees (WGS 84), i.e. EPSG:4326
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude), crs="EPSG:4326")

# or Method 2
geometry = [Point(xy) for xy in zip(df.longitude, df.latitude)]
gdf2 = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

The addresses we have found can now be plotted on a map. First, check that both GeoDataFrames carry the expected CRS.

In [49]:
print(gdf.crs)
print(gdf2.crs)

EPSG:4326
EPSG:4326
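
Note that both methods pass the longitude first and the latitude second, because a point is stored as (x, y). A small sketch of a helper that makes this ordering explicit is shown below; `to_xy` is a hypothetical name, and the range check only catches swaps in which the value given as latitude exceeds 90 degrees in absolute value.

```python
def to_xy(latitude, longitude):
    """Return (x, y) = (longitude, latitude) after a basic range check."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude out of range: {}".format(latitude))
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude out of range: {}".format(longitude))
    return (longitude, latitude)


# Coordinates of the Den Helder address from section 1.
xy = to_xy(52.9581664, 4.7528677)
print(xy)
```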


## 4. Reverse GeoCoding

Conversely, we may have coordinates but no addresses. In that case, we can use reverse geocoding. Again, we create a Nominatim geocoder, except now we call ``locator.reverse()``.

In [51]:
locator = Nominatim(user_agent="myGeocoder")
coordinates = "53.480837, -2.244914"
location = locator.reverse(coordinates)

print(location.address)

Eagle Insurance Buildings, 68, Cross Street, City Centre, Manchester, Greater Manchester, North West England, England, M2 4JG, United Kingdom
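
For several coordinate pairs, the same call can be applied in a loop. The sketch below assumes a callable that takes a `"lat, lon"` string and returns an object with an `address` attribute, or `None` when nothing is found. The names `reverse_many`, `FakeAddress` and `fake_reverse` are hypothetical, and the stand-in lookup replaces the network call.

```python
from collections import namedtuple

# Hypothetical stand-in for a reverse geocoding result.
FakeAddress = namedtuple("FakeAddress", ["address"])


def reverse_many(pairs, reverse):
    """Look up an address for each (latitude, longitude) pair."""
    addresses = []
    for latitude, longitude in pairs:
        query = "{}, {}".format(latitude, longitude)
        location = reverse(query)
        # Keep None for coordinates that could not be resolved.
        addresses.append(location.address if location is not None else None)
    return addresses


# Offline stand-in: knows a single coordinate string, returns None otherwise.
def fake_reverse(query):
    known = {"53.480837, -2.244914": FakeAddress("Cross Street, Manchester")}
    return known.get(query)


addresses = reverse_many([(53.480837, -2.244914), (0.0, 0.0)], fake_reverse)
print(addresses)
```

With a real geocoder, you would pass `locator.reverse` (ideally wrapped in a rate limiter) instead of `fake_reverse`.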
