 # Toronto Neighborhoods
 
 1. [Getting Data and Setting Up Dataframe](#Section_1)
 2. [Adding the Latitude and Longitude](#Section_2)

In [1]:
import numpy as np
import pandas as pd
import json
#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
# conda could not install folium because of environment conflicts, so pip is used instead
#!conda install -c conda-forge folium --yes
!pip install folium
import folium

print('libraries imported')


libraries imported


<a id="Section_1"></a>
## 1. Getting Data and Setting Up Dataframe

Download the Wikipedia page and load its first table into a dataframe.

In [2]:
url ='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

toronto_data = pd.read_html(url)

toronto_df = toronto_data[0]
toronto_df


    

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


Rows where the Borough is 'Not assigned' need to be removed.

In [3]:
cleaned_df = toronto_df[toronto_df.Borough!='Not assigned']
cleaned_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


If any Neighbourhoods are not assigned then replace the value with the Borough value

In [4]:
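# replace() returns a new Series, so the result has to be assigned back to take effect
unassigned = cleaned_df['Neighbourhood'] == 'Not assigned'
cleaned_df = cleaned_df.assign(Neighbourhood=cleaned_df['Neighbourhood'].mask(unassigned, cleaned_df['Borough']))
cleaned_df.shape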

(103, 3)
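
To illustrate how this fill behaves, here is a minimal sketch on a toy frame. The two rows are made up for illustration and do not come from the Wikipedia table; `Series.mask` swaps in the Borough wherever the Neighbourhood is 'Not assigned'.

```python
import pandas as pd

# Hypothetical rows, for illustration only
toy_df = pd.DataFrame({
    'Borough': ['North York', 'Example Borough'],
    'Neighbourhood': ['Parkwoods', 'Not assigned'],
})

# mask() replaces values where the condition is True with the aligned Borough value
unassigned = toy_df['Neighbourhood'] == 'Not assigned'
toy_df['Neighbourhood'] = toy_df['Neighbourhood'].mask(unassigned, toy_df['Borough'])

print(toy_df)
```

Rows that already have a Neighbourhood are left untouched, so the shape of the frame does not change.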

Find out if any Postal codes are repeated

In [5]:
print(len(cleaned_df['Postal Code'].unique()))

103


There are as many unique Postal Codes as there are rows in the dataframe, so no Postal Code is repeated.
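
The same check can be written more directly with `Series.is_unique`, and `duplicated` can list any repeated codes if the check fails. This is only a sketch; the postal codes below are hypothetical, with one repeated on purpose.

```python
import pandas as pd

# Hypothetical postal codes, with 'M3A' repeated on purpose
codes = pd.Series(['M3A', 'M4A', 'M3A', 'M5A'], name='Postal Code')

# is_unique is False as soon as any value repeats
print(codes.is_unique)

# keep=False marks every occurrence of a repeated value, not only the later ones
repeated = codes[codes.duplicated(keep=False)]
print(repeated.tolist())
```

On the real data, `cleaned_df['Postal Code'].is_unique` would be expected to return `True`, given the count of 103 shown above.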

In [6]:
# reset the index
cleaned_df.reset_index(drop=True, inplace=True)
cleaned_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [7]:
cleaned_df.shape

(103, 3)

<a id = 'Section_2'></a>
## 2. Adding the Latitude and Longitude

The geocoder did not return coordinates, so the coordinates are loaded from the CSV file instead.

In [8]:

latlon_df = pd.read_csv('http://cocl.us/Geospatial_data')
latlon_df

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [9]:
# Are all the postal codes unique? Yes.
print(len(latlon_df['Postal Code'].unique()))

103


Both dataframes contain only unique Postal Codes and have the same number of rows, so I assumed they cover the same codes. Comparing the spreadsheet against coord_df (cleaned_df sorted by Postal Code, built in the next cell) confirmed that the two lists of Postal Codes are identical.
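
The manual comparison could also be done in code. The sketch below uses small hypothetical frames as stand-ins for cleaned_df and latlon_df; it checks first that both hold the same set of codes, and then that the codes line up row by row after sorting.

```python
import pandas as pd

# Hypothetical stand-ins for cleaned_df and latlon_df
left = pd.DataFrame({'Postal Code': ['M4A', 'M1B', 'M3A']})
right = pd.DataFrame({'Postal Code': ['M1B', 'M3A', 'M4A']})

# Same set of codes, regardless of row order
same_codes = set(left['Postal Code']) == set(right['Postal Code'])

# Same codes in the same positions once the left frame is sorted
left_sorted = left.sort_values(by='Postal Code', ignore_index=True)
same_order = left_sorted['Postal Code'].equals(right['Postal Code'])

print(same_codes, same_order)
```

If either check came back `False` on the real data, copying columns by position would pair coordinates with the wrong postal codes.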

In [10]:
# sort the cleaned_df by the Postal Codes so the order will match the latlon_df
coord_df = cleaned_df.sort_values(by='Postal Code', ignore_index = True)
coord_df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


Now add the Latitude and Longitude columns to coord_df. The assignment aligns rows by index, so it relies on both dataframes being sorted by Postal Code.

In [11]:
coord_df['Latitude']= latlon_df['Latitude']
coord_df['Longitude'] = latlon_df['Longitude']
coord_df

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
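
An alternative that does not depend on row order is to join the two frames on the Postal Code column. This is a sketch of that approach, not what the notebook does; the frames are hypothetical stand-ins for cleaned_df and latlon_df, and the coordinate values are placeholders.

```python
import pandas as pd

# Hypothetical stand-ins for cleaned_df and latlon_df (coordinates are placeholders)
areas = pd.DataFrame({
    'Postal Code': ['M4A', 'M1B'],
    'Borough': ['North York', 'Scarborough'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M4A'],
    'Latitude': [1.0, 2.0],
    'Longitude': [-1.0, -2.0],
})

# validate='one_to_one' raises MergeError if a code repeats on either side
merged = areas.merge(coords, on='Postal Code', how='left', validate='one_to_one')
print(merged)
```

With `how='left'`, the row order of the first frame is kept and each row receives the coordinates matching its own code, so no prior sorting is needed.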
