# Segmenting and Clustering Neighborhoods in Toronto

### This notebook explores, segments, and clusters neighborhoods in the city of Toronto. The neighborhood data is scraped with Beautiful Soup from the Wikipedia page <a href=https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M>List of postal codes of Canada: M</a>.

In [1]:
#Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import urllib.request   # used to open URLs
from bs4 import BeautifulSoup  # used to parse HTML from Wikipedia / web pages
!pip install folium
import folium



In [2]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
neigbor_toronto= urllib.request.urlopen(url)

In [3]:
#Parse the HTML from the URL into a Beautiful Soup tree
Soup=BeautifulSoup(neigbor_toronto,'lxml')
Soup

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>
...

In [4]:
#Let's explore the soup object, starting with the page title
Soup.title.string

'List of postal codes of Canada: M - Wikipedia'

In [5]:
#Find the postal code table
postalcode_table=Soup.find('table',class_='wikitable sortable')
postalcode_table

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td

In [6]:
#Loop through the rows
A=[]
B=[]
C=[]

for row in postalcode_table.find_all('tr'):
    cells=row.find_all('td')
    # keep only data rows: postal code, borough, neighbourhood (header rows use <th>)
    if len(cells)==3:
        A.append(cells[0].find(string=True))
        B.append(cells[1].find(string=True))
        C.append(cells[2].find(string=True))
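
The row filter above can be tried offline on a small sample. The sketch below is only an illustration: it swaps Beautiful Soup for the standard library's `html.parser` so that it runs without a network connection or third-party packages, and the names `SAMPLE_HTML`, `TableRowParser` and `records` are hypothetical. It keeps rows with exactly three `<td>` cells and strips the trailing newline from each cell as it is read.

```python
from html.parser import HTMLParser

# Sample markup copied from the table output shown earlier (two data rows only).
SAMPLE_HTML = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
</tbody></table>
"""


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and drop surrounding whitespace such as "\n".
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableRowParser()
parser.feed(SAMPLE_HTML)
parser.close()

# The header row holds only <th> cells, so it contributes zero <td> cells.
records = [row for row in parser.rows if len(row) == 3]
print(records)
```

Stripping at parse time is a design choice: it would make the later `.str.strip()` calls unnecessary, at the cost of differing from the raw values the notebook displays.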
        

In [7]:
#Create a dataframe from html table
df=pd.DataFrame(A, columns=['PostalCode'])
df['Borough']=B
df['Neighborhood']=C
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [8]:
print('There are {} rows and {} columns in the dataset'.format(df.shape[0],df.shape[1]))

There are 180 rows and 3 columns in the dataset


In [9]:
df.Borough.unique()

array(['Not assigned\n', 'North York\n', 'Downtown Toronto\n',
       'Etobicoke\n', 'Scarborough\n', 'East York\n', 'York\n',
       'East Toronto\n', 'West Toronto\n', 'Central Toronto\n',
       'Mississauga\n'], dtype=object)

In [11]:
df.Borough.head()

0        Not assigned

1        Not assigned

2          North York

3          North York

4    Downtown Toronto

Name: Borough, dtype: object

#### Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [12]:
# Keep only rows with an assigned borough; strip() drops the trailing newline left by scraping
df=df[~df['Borough'].str.strip().isin(['Not assigned'])].reset_index(drop=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [13]:
#Check whether any 'Not assigned' values remain in the Neighborhood column
df['Neighborhood'].str.strip().isin(['Not assigned']).sum()

0

In [14]:
print('There are {} rows and {} columns in the dataset now'.format(df.shape[0],df.shape[1]))

There are 103 rows and 3 columns in the dataset now
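
The filtering step can be checked on a toy frame. This is a minimal sketch, assuming the scraped values carry a trailing newline as the `unique()` output suggests; the frame `raw` and the names `clean` and `assigned` are made up for illustration. It strips every text column once, then drops the unassigned rows.

```python
import pandas as pd

# Toy rows mirroring the scraped values, including the trailing newline.
raw = pd.DataFrame({
    "PostalCode": ["M1A\n", "M3A\n", "M4A\n"],
    "Borough": ["Not assigned\n", "North York\n", "North York\n"],
    "Neighborhood": ["Not assigned\n", "Parkwoods\n", "Victoria Village\n"],
})

# Strip every text column once so later comparisons need no .str.strip().
clean = raw.apply(lambda col: col.str.strip())

# Keep only rows with an assigned borough and renumber from zero.
assigned = clean[clean["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```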


## Now get the longitude & latitude coordinates of each neighborhood

In [15]:
df_latlng=pd.read_csv('http://cocl.us/Geospatial_data')
df_latlng=df_latlng.rename(columns={'Postal Code':'PostalCode'})
df_latlng.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
#Remove surrounding whitespace from the PostalCode column
df['PostalCode']=df['PostalCode'].str.strip()
df_latlng['PostalCode']=df_latlng['PostalCode'].str.strip()

In [17]:
df=pd.merge(df,df_latlng,on='PostalCode')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [18]:
#Check for any NaN values
df.isnull().sum()

PostalCode      0
Borough         0
Neighborhood    0
Latitude        0
Longitude       0
dtype: int64
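
`pd.merge` performs an inner join by default, so a postal code with no coordinates would be dropped silently rather than show up as NaN in the check above. The sketch below shows one way to surface such rows, using a left join together with the `indicator` and `validate` options of `pd.merge`. The two small frames are illustrative, and the code `M9Z` and its borough are invented for the example.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M9Z"],  # "M9Z" is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = pd.merge(
    neighborhoods,
    coords,
    on="PostalCode",
    how="left",             # keep every neighborhood row, matched or not
    validate="one_to_one",  # raise if a postal code repeats on either side
    indicator=True,         # add a "_merge" column describing each row's origin
)

# Rows found only in the left frame have no coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```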

## Explore & Cluster neighborhoods in Toronto

In [39]:
#Create a function to assign a color to each of the four Toronto boroughs
def boroughcolors(row):
    # strip() removes the trailing newline left by scraping before comparing
    borough = row['Borough'].strip()
    if borough == 'Downtown Toronto':
        return 'blue'
    elif borough == 'East Toronto':
        return 'red'
    elif borough == 'West Toronto':
        return 'green'
    elif borough == 'Central Toronto':
        return 'orange'
    else:
        # every other borough shares one color
        return 'pink'
    
df['colors']=df.apply(boroughcolors,axis=1)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,colors
0,M3A,North York,Parkwoods,43.753259,-79.329656,pink
1,M4A,North York,Victoria Village,43.725882,-79.315572,pink
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,blue
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,pink
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,blue


In [40]:
locations=df[['Latitude','Longitude']]
locationlist=locations.values.tolist()
locationlist[0]

[43.7532586, -79.3296565]
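
The borough-to-color rule can also be written as a dictionary lookup, which keeps the mapping in one place. This is a sketch of an alternative, not the notebook's own function; `BOROUGH_COLORS`, `DEFAULT_COLOR` and `borough_color` are hypothetical names.

```python
# Colors for the four Toronto boroughs; every other borough falls back to the default.
BOROUGH_COLORS = {
    "Downtown Toronto": "blue",
    "East Toronto": "red",
    "West Toronto": "green",
    "Central Toronto": "orange",
}
DEFAULT_COLOR = "pink"


def borough_color(borough):
    """Return the marker color for a borough name, ignoring surrounding whitespace."""
    return BOROUGH_COLORS.get(borough.strip(), DEFAULT_COLOR)


sample = ["Downtown Toronto\n", "North York\n", "East Toronto"]
colors = [borough_color(name) for name in sample]
print(colors)
```

With a DataFrame, the same function could be applied column-wise, for example `df['Borough'].map(borough_color)`.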

### Create a map of neighborhoods centered around Downtown Toronto

In [41]:
df_downtown=df[df['Borough'].str.strip()=='Downtown Toronto']
df_downtown.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,colors
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,blue
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,blue
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,blue
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,blue
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,blue


In [42]:
#Create a map centered on Downtown Toronto using latitude & longitude values
#Row 2 is the first Downtown Toronto entry (M5A); zoom_start sets the initial zoom

map_downtown=folium.Map(location=locationlist[2],zoom_start=15)

#add a marker for every neighborhood, colored by borough
for value in range(len(locationlist)):
    folium.Marker(
        locationlist[value],
        popup=df['Neighborhood'][value].strip()+', '+df['Borough'][value].strip(),
        icon=folium.Icon(color=df['colors'][value],icon_color='white',icon='male',angle=0,prefix='fa')).add_to(map_downtown)
map_downtown
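
The map is centered on one hard-coded row of `locationlist`. A possible alternative is to center it on the mean coordinate of the borough's rows. The sketch below assumes a plain average of latitude and longitude is adequate at city scale; `center_of` is a hypothetical helper, and the points are the five Downtown Toronto rows shown above.

```python
from statistics import mean

# (latitude, longitude) pairs from the first five Downtown Toronto rows shown above.
downtown_points = [
    (43.65426, -79.360636),
    (43.662301, -79.389494),
    (43.657162, -79.378937),
    (43.651494, -79.375418),
    (43.644771, -79.373306),
]


def center_of(points):
    """Return [mean latitude, mean longitude] for a list of (lat, lng) pairs."""
    lats, lngs = zip(*points)
    return [mean(lats), mean(lngs)]


downtown_center = center_of(downtown_points)
print(downtown_center)
```

The result could then be passed as the `location` argument of `folium.Map` in place of a fixed row index.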

### Create a map of neighborhoods centered around East Toronto

In [43]:
df_east=df[df['Borough'].str.strip()=='East Toronto']
df_east.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,colors
19,M4E,East Toronto,The Beaches,43.676357,-79.293031,red
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,red
47,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,red
54,M4M,East Toronto,Studio District,43.659526,-79.340923,red
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,red


In [44]:
#Create a map centered on East Toronto using latitude & longitude values
#Row 19 is the first East Toronto entry (M4E); zoom_start sets the initial zoom

map_east=folium.Map(location=locationlist[19],zoom_start=15)

#add a marker for every neighborhood, colored by borough
for value in range(len(locationlist)):
    folium.Marker(
        locationlist[value],
        popup=df['Neighborhood'][value].strip()+', '+df['Borough'][value].strip(),
        icon=folium.Icon(color=df['colors'][value],icon_color='white',icon='male',angle=0,prefix='fa')).add_to(map_east)
map_east

### Create a map of neighborhoods centered around West Toronto

In [45]:
df_west=df[df['Borough'].str.strip()=='West Toronto']
df_west.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,colors
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,green
37,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,green
43,M6K,West Toronto,"Brockton, Parkdale Village, Exhibition Place",43.636847,-79.428191,green
69,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,green
75,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325,green


In [46]:
#Create a map centered on West Toronto using latitude & longitude values
#Row 31 is the first West Toronto entry (M6H); zoom_start sets the initial zoom

map_west=folium.Map(location=locationlist[31],zoom_start=12)

#add a marker for every neighborhood, colored by borough
for value in range(len(locationlist)):
    folium.Marker(
        locationlist[value],
        popup=df['Neighborhood'][value].strip()+', '+df['Borough'][value].strip(),
        icon=folium.Icon(color=df['colors'][value],icon_color='white',icon='male',angle=0,prefix='fa')).add_to(map_west)
map_west

### Create a map of neighborhoods centered around Central Toronto

In [47]:
df_central=df[df['Borough'].str.strip()=='Central Toronto']
df_central.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,colors
61,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,orange
62,M5N,Central Toronto,Roselawn,43.711695,-79.416936,orange
67,M4P,Central Toronto,Davisville North,43.712751,-79.390197,orange
68,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307,orange
73,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678,orange


In [50]:
#Create a map centered on Central Toronto using latitude & longitude values
#Row 61 is the first Central Toronto entry (M4N); zoom_start sets the initial zoom

map_central=folium.Map(location=locationlist[61],zoom_start=12)

#add a marker for every neighborhood, colored by borough
for value in range(len(locationlist)):
    folium.Marker(
        locationlist[value],
        popup=df['Neighborhood'][value].strip()+', '+df['Borough'][value].strip(),
        icon=folium.Icon(color=df['colors'][value],icon_color='white',icon='male',angle=0,prefix='fa')).add_to(map_central)
map_central