# Clustering of neighborhoods in Toronto, Canada

### This notebook scrapes the list of Toronto postal codes from the following Wikipedia page:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

First, we import the libraries we will need.

In [1]:
import pandas as pd
import numpy as np
print("Library importing successful!")

Library importing successful!


Next, we pass the page's URL to the pandas read_html function, which returns a list of dataframes (one per HTML table found on the page), and take a quick look at the first 5 rows of the first table.

In [2]:
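url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
dfs = pd.read_html(url)  # list of dataframes, one per HTML table on the page
df1 = dfs[0]             # the postal code table is the first element of the list
df1.head()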

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
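
Taking the first element of the list works only as long as the postal code table stays first on the page. A slightly more defensive option, sketched here with a hypothetical helper `pick_table` (not part of pandas) and toy dataframes standing in for the scraped list, is to select the first table that contains the expected columns.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first dataframe whose columns include all required ones.

    Hypothetical helper for illustration only; it is not a pandas function.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table with columns {sorted(required)}")

# Toy stand-ins for the list that pd.read_html would return
tables = [
    pd.DataFrame({"Legend": ["a", "b"]}),
    pd.DataFrame({"Postal Code": ["M1A", "M3A"],
                  "Borough": ["Not assigned", "North York"],
                  "Neighbourhood": ["Not assigned", "Parkwoods"]}),
]

postal_table = pick_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(postal_table.shape)
```

With the real scraped list, the call would be `pick_table(dfs, [...])` in place of `dfs[0]`.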


After reading the Wikipedia table into a dataframe (df1), we clean it by dropping the rows whose borough is "Not assigned", and then check whether any remaining row still has a "Not assigned" neighbourhood.

In [3]:
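# Keep only the rows that have an assigned borough
dfAssigned=df1[df1['Borough']!='Not assigned'].reset_index(drop=True)

# Count the remaining rows whose neighbourhood is still 'Not assigned' (the comparison is case-sensitive)
print(" this shows the portion of the dataframe that has a not assigned neighbourhood",dfAssigned[dfAssigned['Neighbourhood']=='Not assigned'].count())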
dfAssigned.head(10)

 this shows the portion of the dataframe that has a not assigned neighbourhood Postal Code      0
Borough          0
Neighbourhood    0
dtype: int64


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
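
The placeholder comparison is case-sensitive, so the spelling used in the check has to match the data exactly. The following is a minimal sketch on toy data (the `M9Z` row is invented for illustration and does not come from the Wikipedia table) showing how a mismatched spelling silently reports zero, and how normalising case avoids that.

```python
import pandas as pd

# Toy rows modelled on the table above; the M9Z row is made up to show a gap
df = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M5A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto", "Etobicoke"],
    "Neighbourhood": ["Not assigned", "Parkwoods",
                      "Regent Park, Harbourfront", "Not assigned"],
})

# Drop rows without a borough, as in the cell above
assigned = df[df["Borough"] != "Not assigned"].reset_index(drop=True)

# "Not Assigned" (capital A) never matches "Not assigned"
wrong_case = int((assigned["Neighbourhood"] == "Not Assigned").sum())
right_case = int((assigned["Neighbourhood"] == "Not assigned").sum())

# Lower-casing both sides makes the check robust to either spelling
normalised = int((assigned["Neighbourhood"].str.lower() == "not assigned").sum())

print(wrong_case, right_case, normalised)
```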


After making sure the 'Not assigned' rows are excluded, let's count the rows of the remaining dataframe (the postal codes with available neighborhood data)

In [4]:
dfAssigned.shape[0]

103


Now we need to add two columns containing the latitude and longitude of each postal code

### Latitude and longitude extraction from Geospatial data

Because geocoder proved unreliable, we load the coordinates from a ready-made CSV file of geospatial data instead

In [6]:
url2='http://cocl.us/Geospatial_data'
df4=pd.read_csv(url2)
df4.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now, we have every Postal Code with the corresponding latitude and longitude
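
Before combining the tables, it can help to confirm that claim. The sketch below uses toy stand-ins for `dfAssigned` and `df4` (three postal codes whose coordinates are copied from the outputs in this notebook) and checks that no postal code lacks coordinates and that none is listed twice.

```python
import pandas as pd

# Toy stand-ins for dfAssigned and df4
assigned = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M1B"]})
geo = pd.DataFrame({
    "Postal Code": ["M1B", "M3A", "M4A"],
    "Latitude": [43.806686, 43.753259, 43.725882],
    "Longitude": [-79.194353, -79.329656, -79.315572],
})

# Postal codes that would end up without coordinates
missing = assigned.loc[~assigned["Postal Code"].isin(geo["Postal Code"]), "Postal Code"]

# Postal codes listed more than once in the geospatial table
duplicated = geo.loc[geo["Postal Code"].duplicated(), "Postal Code"]

print(len(missing), len(duplicated))
```

If either count is non-zero on the real data, a positional lookup such as `.at[0, ...]` would fail or pick an arbitrary match.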

Let's create an empty dataframe that will hold the neighbourhood information together with the latitude and longitude data

In [7]:
column_names = ['Postal Code', 'Borough', 'Neighborhood', 'Latitude','Longitude']

neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude


Next, we loop over the rows of dfAssigned and use the postal code to look up the matching coordinates in the geospatial data, which combines the 2 dataframes

In [10]:

rows = []  # one dict per postal code; the dataframe is built once at the end
for index, row in dfAssigned.iterrows():
    PostalCode = row['Postal Code']
    # Find the row of the geospatial data that matches this postal code
    desiredRowDF4 = df4[df4['Postal Code'] == PostalCode].reset_index()
    rows.append({'Postal Code': PostalCode,
                 'Borough': row['Borough'],
                 'Neighborhood': row['Neighbourhood'],
                 'Latitude': desiredRowDF4.at[0, 'Latitude'],
                 'Longitude': desiredRowDF4.at[0, 'Longitude']})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the list instead
neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
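
The same combination can also be written without an explicit loop. As an alternative to the row-by-row lookup, not the approach used above, this sketch joins two toy dataframes (three rows copied from the outputs in this notebook) on the postal code with `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for dfAssigned and df4
assigned = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M1B"],
    "Borough": ["North York", "North York", "Scarborough"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Malvern, Rouge"],
})
geo = pd.DataFrame({
    "Postal Code": ["M1B", "M3A", "M4A"],
    "Latitude": [43.806686, 43.753259, 43.725882],
    "Longitude": [-79.194353, -79.329656, -79.315572],
})

# A left join keeps every assigned postal code, in its original order;
# validate raises an error if a postal code appears more than once on either side
merged = (
    assigned.merge(geo, on="Postal Code", how="left", validate="one_to_one")
            .rename(columns={"Neighbourhood": "Neighborhood"})
)
print(merged)
```

A left join leaves `NaN` coordinates for any postal code missing from the geospatial data instead of raising an error, so a follow-up check such as `merged["Latitude"].isna().sum()` is worth running.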
