## <h1 align=center><font size = 7>Applied Data Science Capstone <br><font size = 5> Part 2: Segmenting and Clustering in Toronto <br> Adding latitude and longitude</font></font></h1>



This part of the assignment requires obtaining the latitude and longitude of each neighbourhood from its postal code.

To save time and space, I will summarize the code used in the previous part:

In [137]:
#Load Pandas
import pandas as pd

#URL of the Wikipedia page containing the data:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

#read_html returns a list of dataframes; the postal code table is the first one:
data_raw = pd.read_html(url)
df = data_raw[0]

#Dropping the rows whose borough is "Not assigned":
df = df[df["Borough"] != "Not assigned"].reset_index(drop = True)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


I tried the geocoder option but could not get it to work with the supplied code, so I am using the supplied coordinates file instead.

In [138]:
#Importing the postal code dataframe
url2 = 'https://cocl.us/Geospatial_data'
postal_df = pd.read_csv(url2)

I check that the shape of the supplied dataframe is compatible with the dataframe I got in the first part:

In [139]:
postal_df.shape

(103, 3)

Since it has the same number of rows as the dataframe I created in the first part, the two dataframes can be combined on the postal code.
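
Equal row counts are a necessary condition, but on their own they do not show that both tables hold the same postal codes. A minimal sketch of a stricter check is given here, using two small hypothetical stand-in frames (`left` and `right`, built from three codes in the supplied coordinates data, not the full tables). It compares the sets of codes and confirms that no code is listed twice:

```python
import pandas as pd

# Hypothetical stand-ins for df and postal_df (three codes only, in different order).
left = pd.DataFrame({"Postal Code": ["M1E", "M1B", "M1C"]})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# Same number of rows...
same_length = len(left) == len(right)
# ...the same set of codes, regardless of row order...
same_codes = set(left["Postal Code"]) == set(right["Postal Code"])
# ...and no code listed twice in either frame.
no_duplicates = left["Postal Code"].is_unique and right["Postal Code"].is_unique

print(same_length, same_codes, no_duplicates)
```

If all three values are `True`, every postal code in one frame has exactly one partner in the other.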

In [140]:
postal_df

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


To add the latitude and longitude columns according to the postal code, I will first create empty columns in the `df` dataframe.

In [149]:
df['Latitude'] = ""
df['Longitude'] = ""
df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,,
1,M4A,North York,Victoria Village,,
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",,
3,M6A,North York,"Lawrence Manor, Lawrence Heights",,
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",,
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",,
99,M4Y,Downtown Toronto,Church and Wellesley,,
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",,
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",,
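
Empty-string placeholders make the new columns non-numeric (the last cell of this notebook reports `dtype=object`). As a sketch of one alternative, shown on a small hypothetical frame named `demo` rather than on the real `df`, the placeholders could be `NaN`, which keeps both columns as `float64`:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of df, using two rows from the table above.
demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Victoria Village"],
})

# NaN placeholders keep the new columns numeric (float64).
demo["Latitude"] = np.nan
demo["Longitude"] = np.nan

print(demo.dtypes)
```
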


With a place to store the coordinates in the `df` dataframe, I will now use a loop to go through each row, look up the coordinates for the postal code stored in that row, and write them into the corresponding columns.

Before using a loop, I will test the code with the first row of the `df` dataframe to make sure it is being constructed correctly:

In [151]:
#Write into row 0 of df the coordinates found by filtering postal_df on the postal code stored in that row.
#.loc is used instead of chained indexing (df['Latitude'][0] = ...), which may assign to a temporary copy:
df.loc[0, 'Latitude'] = postal_df[postal_df['Postal Code'] == df['Postal Code'][0]]['Latitude'].values[0]
df.loc[0, 'Longitude'] = postal_df[postal_df['Postal Code'] == df['Postal Code'][0]]['Longitude'].values[0]
df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7533,-79.3297
1,M4A,North York,Victoria Village,,
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",,
3,M6A,North York,"Lawrence Manor, Lawrence Heights",,
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",,
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",,
99,M4Y,Downtown Toronto,Church and Wellesley,,
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",,
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",,


For the first row everything seems to have worked fine. Now, I will use a loop to populate the latitude and longitude columns:

In [152]:
for row, postal_code in zip(df.index, df['Postal Code']):
    #Select the matching row of postal_df once, then copy both coordinates with .loc
    match = postal_df[postal_df['Postal Code'] == postal_code]
    df.loc[row, 'Latitude'] = match['Latitude'].values[0]
    df.loc[row, 'Longitude'] = match['Longitude'].values[0]
df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7533,-79.3297
1,M4A,North York,Victoria Village,43.7259,-79.3156
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7185,-79.4648
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.6537,-79.5069
99,M4Y,Downtown Toronto,Church and Wellesley,43.6659,-79.3832
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.6627,-79.3216
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.6363,-79.4985
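
The row-by-row lookup can also be expressed as a single join on the postal code. The sketch below shows that alternative with `DataFrame.merge`; it is not the method used in this notebook. The frames `places` and `coords` are hypothetical miniatures of `df` and `postal_df`, and the coordinates are the rounded values displayed above:

```python
import pandas as pd

# Hypothetical miniature of df (three rows from the table above).
places = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})

# Hypothetical miniature of postal_df, deliberately in a different row order.
coords = pd.DataFrame({
    "Postal Code": ["M5A", "M3A", "M4A"],
    "Latitude": [43.6543, 43.7533, 43.7259],
    "Longitude": [-79.3606, -79.3297, -79.3156],
})

# A left join keeps every row of `places` in its original order and
# attaches the coordinates whose postal code matches.
merged = places.merge(coords, on="Postal Code", how="left", validate="one_to_one")

print(merged)
```

With `validate="one_to_one"`, the merge raises an error if a postal code appears more than once in either frame. Because the result columns come straight from `coords`, they keep their numeric dtype.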


To make sure the coordinates were assigned properly, I will check the postal code **M9V** against the sample dataframe, where **M9V** corresponds to Etobicoke and the coordinates 43.739416, -79.588437:

In [191]:
df[df['Postal Code'] == 'M9V']

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
89,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.7394,-79.5884


The notebook truncates the decimal places when it displays the table, and so far I have not been able to override this. However, printing only the values as an array shows that the full decimals are stored:

In [200]:
df[df['Postal Code'] == 'M9V']['Latitude'].values

array([43.739416399999996], dtype=object)
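
The `dtype=object` in this output indicates that the column holds generic Python objects rather than floats, presumably because it was created with empty strings. Here is a minimal sketch, on a hypothetical one-row frame `demo`, of converting such a column to `float64` with `pd.to_numeric`. Whether this alone changes how a particular notebook renders the table is an assumption I have not verified:

```python
import pandas as pd

# Hypothetical one-row frame mimicking the object-dtype Latitude column above.
demo = pd.DataFrame({
    "Postal Code": ["M9V"],
    "Latitude": pd.Series([43.739416399999996], dtype=object),
})
print(demo["Latitude"].dtype)  # object

# Convert the column to a numeric dtype; the stored value is unchanged.
demo["Latitude"] = pd.to_numeric(demo["Latitude"])
print(demo["Latitude"].dtype)  # float64

print(demo)
```
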