# GP13 GeoPandas
GeoPandas Full Documentation: https://geopandas.org/en/stable/docs.html

Spatial Reference: https://spatialreference.org/

GeoPy Documentation: https://geopy.readthedocs.io/en/stable/
___
## 19. Geocoding
* Datasets often need pre-processing because geographic information, such as the latitude and longitude of a location, is missing.
* Geocoding is the process of taking a text-based description of a location (such as an address) and returning geographic coordinates, frequently a latitude/longitude pair, that identify a location on the Earth's surface.
* GeoPy (a Python geocoding toolbox) can locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.
* GeoPy supports various geocoding services (see below), including Google Maps and Azure Maps; some of them require API keys. This exercise uses the OpenStreetMap geocoding service, Nominatim.


<img src="https://geopy.readthedocs.io/en/stable/_static/geopy_and_geocoding_services.svg"></img>

In [1]:
# All libraries import
import pandas as pd
import geopandas as gpd
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import matplotlib.pyplot as plt
import folium
from folium.plugins import FastMarkerCluster

### Single Address Geocoding
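* The geocoder returns the geographic coordinates that correspond to the searched location.
* **IMPORTANT NOTE**: For Nominatim(user_agent=''), create and use your own unique application name (see below). Nominatim's usage policy requires every application to identify itself with a valid, application-specific user agent.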
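Under the hood, geopy's `Nominatim` geocoder sends an HTTP request to the Nominatim server and passes the `user_agent` string as the HTTP `User-Agent` header. The sketch below only *builds* a comparable request with the standard library (nothing is sent), assuming the public endpoint `https://nominatim.openstreetmap.org/search` and the `q`, `format` and `limit` query parameters; the helper `build_search_request` is hypothetical and geopy's real requests may carry additional parameters.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"  # assumed public endpoint


def build_search_request(query: str, user_agent: str, limit: int = 1) -> Request:
    """Build (but do not send) a Nominatim search request. Hypothetical helper."""
    if not user_agent:
        # Nominatim asks every client to identify itself with its own user agent
        raise ValueError("user_agent must be a non-empty, application-specific name")
    params = urlencode({"q": query, "format": "json", "limit": limit})
    return Request(f"{NOMINATIM_SEARCH}?{params}", headers={"User-Agent": user_agent})


req = build_search_request("Gwanghwamun, Seoul, South Korea", user_agent="test2")
print(req.full_url)                          # the encoded query string
print(parse_qs(urlparse(req.full_url).query))  # decoded back into parameters
print(req.get_header("User-agent"))          # urllib stores header names capitalized
```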

In [2]:
# Method One: Specific address + the user agent name as test1
a = '128 Uam-ro, Samseong-dong, Dong-gu, Daejeon'
geolocator = Nominatim(user_agent='test1')
location = geolocator.geocode(a, timeout=10, exactly_one=False)
print(location)

[Location(솔브릿지국제경영대학, 128, 우암로, 삼성동, 동구, 대전, 34613, 대한민국, (36.338458700000004, 127.43247014647005, 0.0))]


In [12]:
# Method Two: Input a landmark name + print the full address + print the coordinates.
locator = Nominatim(user_agent='test2')
location = locator.geocode("Gwanghwamun, Seoul, South Korea")
print(location.address)
print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

광화문, 172, 세종대로, 중학동, 종로1·2·3·4가동, 종로구, 서울, 03141, 대한민국
Latitude = 37.5716162, Longitude = 126.9769026
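If Nominatim finds no match, geopy's `geocode()` returns `None`, and `location.address` would then raise an `AttributeError`. A small guard avoids this. The sketch below uses a hypothetical helper `describe_location` and a `SimpleNamespace` stand-in with the same attribute names as a geopy `Location`, so it runs without any network call.

```python
from types import SimpleNamespace


def describe_location(location) -> str:
    """Format a geocoding result, handling the None returned when nothing matches (hypothetical helper)."""
    if location is None:
        return "No match found"
    return f"{location.address} (lat={location.latitude}, lon={location.longitude})"


# Stand-in for a geopy Location object; no request is sent
fake = SimpleNamespace(address="Gwanghwamun, Seoul", latitude=37.5716162, longitude=126.9769026)
print(describe_location(fake))
print(describe_location(None))
```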


### Geocoding with Pandas DataFrame
* Load example dataset file: 'localdata/sweden_addresses.csv'
* Display data rows.

In [3]:
df = pd.read_csv("localdata/sweden_addresses.csv")
df.head(3)

|   | Typ | Nr | Namn | Address1 | Address3 | Address4 | Address5 | Telefon |
|---|---|---|---|---|---|---|---|---|
| 0 | Butik | 102 | Fältöversten | Karlaplan 13 | 115 20 | STOCKHOLM | Stockholms län | 08/662 22 89 |
| 1 | Butik | 104 |  | Nybrogatan 47 | 114 39 | STOCKHOLM | Stockholms län | 08/662 50 16 |
| 2 | Butik | 106 | Garnisonen | Karlavägen 100 A | 115 26 | STOCKHOLM | Stockholms län | 08/662 64 85 |


* Create a new ['ADDRESS'] column by combining the Address1, Address3, Address4 and Address5 columns and appending the country name (Sweden).

In [4]:
df['ADDRESS'] = df['Address1'].astype(str) + ', ' + df['Address3'] + ', ' + df['Address4'] + ', ' + df['Address5'] + ', Sweden'   
df.head(3)

|   | Typ | Nr | Namn | Address1 | Address3 | Address4 | Address5 | Telefon | ADDRESS |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Butik | 102 | Fältöversten | Karlaplan 13 | 115 20 | STOCKHOLM | Stockholms län | 08/662 22 89 | Karlaplan 13, 115 20, STOCKHOLM, Stockholms lä... |
| 1 | Butik | 104 |  | Nybrogatan 47 | 114 39 | STOCKHOLM | Stockholms län | 08/662 50 16 | Nybrogatan 47, 114 39, STOCKHOLM, Stockholms l... |
| 2 | Butik | 106 | Garnisonen | Karlavägen 100 A | 115 26 | STOCKHOLM | Stockholms län | 08/662 64 85 | Karlavägen 100 A, 115 26, STOCKHOLM, Stockholm... |
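Concatenating columns with `+` as in the cell above gives a missing (NaN) address for the whole row as soon as any one part is missing. The sketch below is a stdlib alternative that skips `None`, NaN and blank parts before joining; `build_address` is a hypothetical helper, not part of pandas or geopy.

```python
import math


def build_address(parts, country="Sweden"):
    """Join address parts with ', ', skipping None, NaN and blank values (hypothetical helper)."""
    cleaned = []
    for part in parts:
        if part is None:
            continue
        if isinstance(part, float) and math.isnan(part):
            continue  # pandas represents missing cells as float NaN
        text = str(part).strip()
        if text:
            cleaned.append(text)
    if country:
        cleaned.append(country)
    return ", ".join(cleaned)


print(build_address(["Karlaplan 13", "115 20", "STOCKHOLM", "Stockholms län"]))
print(build_address(["Nybrogatan 47", float("nan"), "STOCKHOLM", None]))
```

In pandas it could be applied row-wise, for example `df[['Address1', 'Address3', 'Address4', 'Address5']].apply(lambda r: build_address(r.tolist()), axis=1)`.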


* Apply the geocoder to the addresses in the newly created ['ADDRESS'] column. Two new columns are created: ['location'] (the geopy result) and ['point'] (a (latitude, longitude, altitude) tuple, or None when no match is found).

In [None]:
from geopy.extra.rate_limiter import RateLimiter
locator = Nominatim(user_agent='test3')
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
df['location'] = df['ADDRESS'].apply(geocode)
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)
df.head(3)
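`RateLimiter` waits between consecutive calls so that the public Nominatim server receives at most one request per second, as its usage policy requires. The sketch below is a simplified illustration of that idea, not geopy's implementation (the real class also retries after errors); the class name `MinDelayLimiter` and the injectable `clock`/`sleep` parameters are hypothetical, added so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinDelayLimiter:
    """Wrap `func` so consecutive calls are at least `min_delay` seconds apart (simplified sketch)."""

    def __init__(self, func: Callable, min_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.func = func
        self.min_delay = min_delay
        self.clock = clock  # injectable so the logic can be tested without real waiting
        self.sleep = sleep
        self._last_call: Optional[float] = None  # time of the previous call

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            wait = self.min_delay - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)  # block until the minimum delay has elapsed
        self._last_call = self.clock()
        return self.func(*args, **kwargs)


class FakeClock:
    """Fake time source: sleeping just advances `now` and records the wait."""

    def __init__(self):
        self.now = 0.0
        self.waits = []

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.waits.append(seconds)
        self.now += seconds


fake = FakeClock()
limited_upper = MinDelayLimiter(str.upper, min_delay=1.0, clock=fake.time, sleep=fake.sleep)
results = [limited_upper(word) for word in ["a", "b", "c"]]
print(results, fake.waits)
```

The first call runs immediately and each later call waits out the remainder of the delay, which is the behaviour the notebook relies on when it passes `RateLimiter(locator.geocode, min_delay_seconds=1)` to `apply`.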

* Split the ['point'] column into latitude, longitude and altitude columns.

In [7]:
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)
df.head(3)

|   | Typ | Nr | Namn | Address1 | Address3 | Address4 | Address5 | Telefon | ADDRESS | location | point | latitude | longitude | altitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Butik | 102 | Fältöversten | Karlaplan 13 | 115 20 | STOCKHOLM | Stockholms län | 08/662 22 89 | Karlaplan 13, 115 20, STOCKHOLM, Stockholms lä... | (13, Karlaplan, Östermalm, Norra Innerstaden, ... | (59.3388767, 18.0908655, 0.0) | 59.338877 | 18.090865 | 0.0 |
| 1 | Butik | 104 |  | Nybrogatan 47 | 114 39 | STOCKHOLM | Stockholms län | 08/662 50 16 | Nybrogatan 47, 114 39, STOCKHOLM, Stockholms l... | (47, Nybrogatan, Villastaden, Östermalm, Norra... | (59.3372072, 18.0790982, 0.0) | 59.337207 | 18.079098 | 0.0 |
| 2 | Butik | 106 | Garnisonen | Karlavägen 100 A | 115 26 | STOCKHOLM | Stockholms län | 08/662 64 85 | Karlavägen 100 A, 115 26, STOCKHOLM, Stockholm... | (Karlavägen, Östermalm, Norra Innerstaden, Sto... | (59.3354138, 18.1004767, 0.0) | 59.335414 | 18.100477 | 0.0 |
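The `pd.DataFrame(df['point'].tolist(), index=df.index)` line expands each (latitude, longitude, altitude) tuple into three columns. Rows where geocoding failed hold `None` in ['point'] (that is what the lambda in the geocoding cell produces), and those rows are presumably the ones that later show up as missing latitude values. A stdlib sketch of the same split that keeps such rows explicitly as `None`; `split_points` is a hypothetical helper.

```python
from typing import Iterable, List, Optional, Tuple

Point3 = Optional[Tuple[float, float, float]]


def split_points(points: Iterable[Point3]) -> Tuple[List[Optional[float]], List[Optional[float]], List[Optional[float]]]:
    """Split (lat, lon, alt) tuples into three lists; None rows stay None (hypothetical helper)."""
    lats, lons, alts = [], [], []
    for p in points:
        if p is None:
            # geocoding failed for this row: keep the row but mark it as missing
            lats.append(None)
            lons.append(None)
            alts.append(None)
        else:
            lat, lon, alt = p
            lats.append(lat)
            lons.append(lon)
            alts.append(alt)
    return lats, lons, alts


lats, lons, alts = split_points([(59.3388767, 18.0908655, 0.0), None, (59.3372072, 18.0790982, 0.0)])
print(lats)
print(sum(v is None for v in lats))  # number of rows without coordinates
```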


In [8]:
# Column names check
df.columns

Index(['Typ', 'Nr', 'Namn', 'Address1', 'Address3', 'Address4', 'Address5',
       'Telefon', 'ADDRESS', 'location', 'point', 'latitude', 'longitude',
       'altitude'],
      dtype='object')

In [9]:
# Drop the unnecessary columns
df = df.drop(['Address1', 'Address3', 'Address4', 'Address5','Telefon', 'ADDRESS', 'location', 'point'], axis=1)
df.head(3)

|   | Typ | Nr | Namn | latitude | longitude | altitude |
|---|---|---|---|---|---|---|
| 0 | Butik | 102 | Fältöversten | 59.338877 | 18.090865 | 0.0 |
| 1 | Butik | 104 |  | 59.337207 | 18.079098 | 0.0 |
| 2 | Butik | 106 | Garnisonen | 59.335414 | 18.100477 | 0.0 |


In [11]:
# Check the number of missing values in the latitude column
df.latitude.isnull().sum()

4

In [12]:
# Keep only the rows that have a latitude value (pd.notnull) + recheck
df = df[pd.notnull(df["latitude"])]
df.latitude.isnull().sum()

0

In [19]:
# Create a Folium map + add the data points (latitude and longitude) as circle markers
m1 = folium.Map(location=[59.33,18.05], tiles='cartodbpositron', zoom_start=12.5)
df.apply(lambda row:folium.CircleMarker(location=[row["latitude"], row["longitude"]]).add_to(m1), axis=1)
m1
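The map centre `[59.33, 18.05]` is hard-coded in the cell above. One option is to derive it from the data, for example as the mean latitude and longitude of the points, which is adequate for a city-sized area (it would misbehave for points spanning the 180° meridian). A stdlib sketch; `map_center` is a hypothetical helper.

```python
import math
from statistics import fmean
from typing import Iterable, List, Optional, Tuple


def map_center(coords: Iterable[Tuple[Optional[float], Optional[float]]]) -> List[float]:
    """Mean latitude/longitude of the valid points, skipping None and NaN (hypothetical helper)."""
    valid = [
        (lat, lon) for lat, lon in coords
        if lat is not None and lon is not None and not (math.isnan(lat) or math.isnan(lon))
    ]
    if not valid:
        raise ValueError("no valid coordinates to centre the map on")
    return [fmean(lat for lat, _ in valid), fmean(lon for _, lon in valid)]


print(map_center([(59.0, 18.0), (60.0, 19.0), (None, None), (float("nan"), 18.5)]))
```

With the notebook's DataFrame this could be used as `folium.Map(location=map_center(zip(df["latitude"], df["longitude"])), tiles="cartodbpositron", zoom_start=12)`.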

In [None]:
# Save and export if required
# m1.save("sweden_addresses1.html")

In [20]:
# Create a Folium map + add the data points (latitude and longitude) as a marker cluster
m2 = folium.Map(location=[59.33,18.05], zoom_start=12, tiles='CartoDB dark_matter')
FastMarkerCluster(data=list(zip(df['latitude'].values, df['longitude'].values))).add_to(m2)
folium.LayerControl().add_to(m2)
m2
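`FastMarkerCluster` hands clustering to the Leaflet.markercluster plugin in the browser, which groups nearby markers differently at each zoom level. As a much simpler stand-in for that idea (a different technique: fixed-size grid bucketing, not the plugin's algorithm), the sketch below counts points per grid cell; `grid_clusters` and its `cell_deg` parameter are hypothetical. Larger cells behave like zooming out, smaller cells like zooming in.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def grid_clusters(coords: Iterable[Tuple[float, float]], cell_deg: float = 0.01) -> Dict[Tuple[int, int], int]:
    """Count points per square grid cell of `cell_deg` degrees (hypothetical helper)."""
    counts: Counter = Counter()
    for lat, lon in coords:
        # integer cell index; all points in the same cell form one "cluster"
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return dict(counts)


points = [(59.3388767, 18.0908655), (59.3372072, 18.0790982), (59.3354138, 18.1004767)]
print(grid_clusters(points, cell_deg=0.1))    # coarse cells, like zooming out
print(grid_clusters(points, cell_deg=0.001))  # fine cells, like zooming in
```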

In [None]:
# Save and export if required
# m2.save("sweden_addresses2.html")

___
## 20. Reverse Geocoding
* Reverse geocoding is the process of converting geographical coordinates to a human-readable address or place name.
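A reverse-geocoding request sends a coordinate pair instead of a text query. The sketch below only builds such a URL with the standard library, assuming the public endpoint `https://nominatim.openstreetmap.org/reverse` with `lat`, `lon` and `format` query parameters; `build_reverse_url` is a hypothetical helper, and geopy's `reverse()` constructs its own request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"  # assumed public endpoint


def build_reverse_url(lat: float, lon: float) -> str:
    """Build (but do not send) a Nominatim reverse-geocoding URL (hypothetical helper)."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"invalid coordinates: {lat}, {lon}")
    query = urlencode({"lat": lat, "lon": lon, "format": "json"})
    return f"{NOMINATIM_REVERSE}?{query}"


url = build_reverse_url(36.3384587, 127.4324701)
print(url)
print(parse_qs(urlparse(url).query))  # the parameters decoded again
```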

In [22]:
# Further library imports
import plotly.express as px
from tqdm import tqdm
from tqdm.notebook import tqdm_notebook

### Single Set of Coordinates Reverse Geocoding

In [24]:
# Method .reverse() with a set of provided coordinates
coor = "36.338458700000004, 127.43247014647005"
locator = Nominatim(user_agent='test3')
loc1 = locator.reverse(coor)
loc1

Location(솔브릿지국제경영대학, 우암로, 삼성동, 동구, 대전, 34611, 대한민국, (36.338327750000005, 127.43243066372395, 0.0))

In [25]:
# Detailed (raw) response behind the reverse-geocoding result
loc1.raw

{'place_id': 228454021,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',
 'osm_type': 'way',
 'osm_id': 108332744,
 'lat': '36.338327750000005',
 'lon': '127.43243066372395',
 'class': 'amenity',
 'type': 'university',
 'place_rank': 30,
 'importance': 9.99999999995449e-06,
 'addresstype': 'amenity',
 'name': '솔브릿지국제경영대학',
 'display_name': '솔브릿지국제경영대학, 우암로, 삼성동, 동구, 대전, 34611, 대한민국',
 'address': {'amenity': '솔브릿지국제경영대학',
  'road': '우암로',
  'quarter': '삼성동',
  'suburb': '삼성동',
  'borough': '동구',
  'city': '대전',
  'ISO3166-2-lvl4': 'KR-30',
  'postcode': '34611',
  'country': '대한민국',
  'country_code': 'kr'},
 'boundingbox': ['36.3380853', '36.3388146', '127.4319994', '127.4327213']}
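The address components live under `raw['address']`, and which keys are present varies from place to place (this result has `quarter`, `suburb` and `borough` but no `house_number`), so reading them with `.get()` is safer than indexing. Note also that `lat` and `lon` arrive as strings. A sketch using a trimmed copy of the output above; `summarize_address` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def summarize_address(
    raw: dict,
    keys: Iterable[str] = ("house_number", "road", "suburb", "city", "postcode", "country_code"),
) -> Dict[str, Optional[str]]:
    """Pick selected fields from raw['address']; absent keys map to None (hypothetical helper)."""
    address = raw.get("address", {})
    return {key: address.get(key) for key in keys}


# Trimmed copy of the loc1.raw output shown above (city 대전 = Daejeon)
raw = {
    "lat": "36.338327750000005",
    "lon": "127.43243066372395",
    "address": {"amenity": "솔브릿지국제경영대학", "road": "우암로", "suburb": "삼성동",
                "borough": "동구", "city": "대전", "postcode": "34611", "country_code": "kr"},
}
summary = summarize_address(raw)
print(summary)
lat, lon = float(raw["lat"]), float(raw["lon"])  # convert the string coordinates to numbers
print(lat, lon)
```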

In [28]:
# Print the address with either of the methods below:
print(loc1.address)
loc1.raw["display_name"]

솔브릿지국제경영대학, 우암로, 삼성동, 동구, 대전, 34611, 대한민국


'솔브릿지국제경영대학, 우암로, 삼성동, 동구, 대전, 34611, 대한민국'

### Reverse Geocoding with Pandas

In [38]:
# Load sample dataset with the selected columns
cols = ["X", "Y", "POLE_NUM","TYPE","HEIGHT","POLE_DATE","OWNER"]
df = pd.read_csv('localdata/pole_sample.csv', usecols=cols)
df.head(3)

|   | X | Y | POLE_NUM | TYPE | HEIGHT | POLE_DATE | OWNER |
|---|---|---|---|---|---|---|---|
| 0 | -75.170097 | 39.942766 | 214423 | WP |  | 1997-06-09T00:00:00.000Z | PECO |
| 1 | -75.166112 | 39.941477 | 215645 | AAPT | 25.0 | 1997-06-10T00:00:00.000Z | Streets |
| 2 | -75.163483 | 39.943068 | 215926 | WP |  | 1997-06-04T00:00:00.000Z | PECO |


In [30]:
# Number of rows and columns check
df.shape

(150, 7)

In [35]:
# Plotly scatter map of the poles (the "open-street-map" style needs no Mapbox access token)
px.scatter_mapbox(df, lat="Y", lon="X", zoom=15, mapbox_style="open-street-map")

In [39]:
# Create a new ['GEOM'] column with the combined coordinates as a "latitude, longitude" string
df["GEOM"] = df["Y"].map(str) + ', ' + df['X'].map(str)
df.head(3)

|   | X | Y | POLE_NUM | TYPE | HEIGHT | POLE_DATE | OWNER | GEOM |
|---|---|---|---|---|---|---|---|---|
| 0 | -75.170097 | 39.942766 | 214423 | WP |  | 1997-06-09T00:00:00.000Z | PECO | 39.9427660880249, -75.17009743393821 |
| 1 | -75.166112 | 39.941477 | 215645 | AAPT | 25.0 | 1997-06-10T00:00:00.000Z | Streets | 39.9414773141344, -75.166112027818 |
| 2 | -75.163483 | 39.943068 | 215926 | WP |  | 1997-06-04T00:00:00.000Z | PECO | 39.9430681055253, -75.1634826347411 |
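GEOM is built as Y first, then X, because in this dataset X holds the longitude and Y the latitude, while `reverse()` expects a "latitude, longitude" string. A swapped order is easy to miss, so the stdlib sketch below adds a range check; `to_latlon_string` is a hypothetical helper. The check can only catch a swap when the longitude's absolute value exceeds 90 (true for Daejeon, not for Philadelphia).

```python
def to_latlon_string(x: float, y: float) -> str:
    """Format X (longitude) / Y (latitude) values as 'lat, lon' with a range check (hypothetical helper)."""
    lon, lat = x, y
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat} (are X and Y swapped?)")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat}, {lon}"


print(to_latlon_string(-75.17009743393821, 39.9427660880249))
```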


In [43]:
# Apply reverse geocoding to the coordinates in the ['GEOM'] column + write the result into the new ['ADDRESS'] column
# Note: The process may take some time; tqdm displays the progress.
# Nominatim's usage policy allows at most 1 request per second, so use min_delay_seconds=1 when querying the public server.
locator = Nominatim(user_agent='test4')
rgeocode = RateLimiter(locator.reverse, min_delay_seconds=0.001)

tqdm.pandas()
df['ADDRESS'] = df['GEOM'].progress_apply(rgeocode)
df.head(3)

100%|██████████| 150/150 [01:16<00:00,  1.95it/s]


|   | X | Y | POLE_NUM | TYPE | HEIGHT | POLE_DATE | OWNER | GEOM | ADDRESS |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -75.170097 | 39.942766 | 214423 | WP |  | 1997-06-09T00:00:00.000Z | PECO | 39.9427660880249, -75.17009743393821 | (720, South Chadwick Street, South Philadelphi... |
| 1 | -75.166112 | 39.941477 | 215645 | AAPT | 25.0 | 1997-06-10T00:00:00.000Z | Streets | 39.9414773141344, -75.166112027818 | (Tindley Temple United Methodist Church, 750, ... |
| 2 | -75.163483 | 39.943068 | 215926 | WP |  | 1997-06-04T00:00:00.000Z | PECO | 39.9430681055253, -75.1634826347411 | (621, South 13th Street, Martin Luther King Pl... |


In [42]:
# Full address print
df['ADDRESS'][0]

Location(720, South Chadwick Street, South Philadelphia, Philadelphia, Pennsylvania, 19146, United States, (39.94280445, -75.1702057628465, 0.0))

___