<h1>Combine with Latitude and Longitude using pgeocode</h1>

My host machine has trouble querying the geocoding server, so I switched to an alternative library, pgeocode, which supports offline translation of postal codes to GPS coordinates.
Here is the link to its documentation: <a href="https://pypi.org/project/pgeocode/#description"> pgeocode 0.2.1</a>

#### Import libraries 

In [1]:
import pgeocode
import pandas as pd

#### Read the CSV file scraped from Wikipedia

In [2]:
df = pd.read_csv('postal_code_CAN.csv')
df.drop(['Unnamed: 0'], axis=1, inplace=True)

In [3]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M9A,Etobicoke,Islington Avenue


#### Experiment with pgeocode:

In [4]:
#set country to be canada - CA
n = pgeocode.Nominatim('CA')
#test with PostalCode M3A
n.query_postal_code('M3A')

postal_code                                                     M3A
country code                                                     CA
place_name        North York (York Heights / Victoria Village / ...
state_name                                                  Ontario
state_code                                                       ON
county_name                                             North York 
county_code                                                     NaN
community_name                                                  NaN
community_code                                                  NaN
latitude                                                    43.7545
longitude                                                    -79.33
accuracy                                                          1
Name: 0, dtype: object

In [5]:
#query once and read the coordinates by label instead of by position
result = n.query_postal_code('M3A')
print(result['latitude'], result['longitude'])

43.7545 -79.33


The query returns a pandas Series, not a list. The latitude and longitude sit at positions 9 and 10 (counting from zero) and can also be read by their labels, `latitude` and `longitude`.

In [6]:
#create two lists for storing latitude and longitude
lat = []
lon = []

#set country to be canada - CA
nomi = pgeocode.Nominatim('CA')

#query each postal code once and read the coordinates by label
for pc in df['PostalCode']:
    result = nomi.query_postal_code(pc)
    lat.append(result['latitude'])
    lon.append(result['longitude'])
    
#append two new columns to the data frame
df['Latitude'] = lat
df['Longitude'] = lon
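
The loop above can also be written as a single batch lookup. This is a minimal sketch, assuming `query_postal_code` accepts a list of codes and returns a data frame with one row per code, in input order, with `latitude` and `longitude` columns, as the pgeocode documentation describes. The helper name `attach_coordinates` is hypothetical, and `FakeGeocoder` is only a stand-in so the sketch runs without the library or its data files:

```python
import pandas as pd


def attach_coordinates(frame, geocoder, code_col="PostalCode"):
    """Return a copy of frame with Latitude/Longitude columns added.

    geocoder is any object whose query_postal_code(list_of_codes) returns a
    DataFrame with 'latitude' and 'longitude' columns, one row per code.
    """
    result = geocoder.query_postal_code(frame[code_col].tolist())
    out = frame.copy()
    # to_numpy() assigns by position, so differing indexes cannot misalign rows
    out["Latitude"] = result["latitude"].to_numpy()
    out["Longitude"] = result["longitude"].to_numpy()
    return out


class FakeGeocoder:
    """Hypothetical stand-in for pgeocode.Nominatim('CA'), for offline runs."""

    _table = {"M3A": (43.7545, -79.33)}  # value taken from the query shown earlier

    def query_postal_code(self, codes):
        missing = (float("nan"), float("nan"))
        rows = [self._table.get(code, missing) for code in codes]
        return pd.DataFrame(rows, columns=["latitude", "longitude"])


sample = pd.DataFrame({"PostalCode": ["M3A", "XXX"]})
with_coords = attach_coordinates(sample, FakeGeocoder())
print(with_coords)
```

With the real library, the call would presumably be `attach_coordinates(df, pgeocode.Nominatim('CA'))`. Codes the geocoder does not know end up as NaN in this sketch, so they are easy to spot afterwards.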

#### Compare results with the csv file provided by the course:

In [7]:
#sort the data frame so it is in the same order as the provided file, then display it
df=df.sort_values(by=['PostalCode'], ignore_index=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.193
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7878,-79.1564
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866
3,M1G,Scarborough,Woburn,43.7712,-79.2144
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389


In [8]:
#read in the csv provided by the course
df1 = pd.read_csv('Geospatial_Coordinates.csv')
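#reset the index after sorting so the inserted columns line up row by row with df
df1 = df1.sort_values(by=['Postal Code'], ignore_index=True)

#insert the two columns into the new data frame and display it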
df1.insert(1,'Borough',df['Borough'])
df1.insert(2,'Neighborhood',df['Neighborhood'])
df1.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
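
Inserting columns from `df` into `df1` relies on both frames listing the same postal codes in the same order. Below is a sketch of a key-based alternative using `DataFrame.merge`, with two toy frames built from the first rows of the tables above. The names `scraped`, `course`, and `combined` are illustrative and not part of the notebook:

```python
import pandas as pd

# Toy stand-ins for df (scraped) and df1 (course file), deliberately in different row order
scraped = pd.DataFrame({
    "PostalCode": ["M1C", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge Hill, Port Union, Highland Creek", "Malvern, Rouge"],
})
course = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Match rows on the postal code itself; validate raises if a code is duplicated
combined = course.merge(
    scraped,
    left_on="Postal Code",
    right_on="PostalCode",
    how="left",
    validate="one_to_one",
).drop(columns="PostalCode")

# Put the columns in the same order as the notebook's df1
combined = combined[["Postal Code", "Borough", "Neighborhood", "Latitude", "Longitude"]]
print(combined)
```

Because the rows are matched on the key, the result does not depend on how either frame was sorted, and a postal code missing from the scraped data would show up as NaN in `Borough` and `Neighborhood` rather than silently shifting rows.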


There is some variance between the two sources. For this project, I will use the provided file, so the data frame I use from here on is <b>df1</b>.
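
To put a number on that variance, the gap between the two sources can be expressed as a great-circle distance. This is a minimal sketch using the haversine formula with a mean Earth radius of 6371 km, applied to the M1B row from each table. The function name `haversine_km` is illustrative:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# M1B: pgeocode result vs. the course-provided coordinates
gap_km = haversine_km(43.8113, -79.193, 43.806686, -79.194353)
print(f"M1B differs by {gap_km:.2f} km")
```

For this row the two sources differ by roughly half a kilometre. Applying the same function to every row would show whether that gap is typical.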

In [9]:
#export the second data frame
df1.to_csv('Postal_CAN_LATLON.csv')