# <center>Neighborhoods in Toronto

As part of this project, we will explore, segment, and cluster the neighborhoods in the city of Toronto based on postal code and borough information.
#### Note: All parts of the project are contained in this single notebook.

## <center> Part 1: Data Importing and Pre-Processing

First, we import the required libraries.

In [1]:
import pandas as pd
import numpy as np

#!conda config --set channel_priority false    # works around: "failed with initial frozen solve. Retrying with flexible solve"
#!conda install -c conda-forge folium=0.5.0 --yes   # uncomment if your environment doesn't have folium

import folium   # required for map rendering
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors



We will scrape the Toronto postal codes from Wikipedia using the pandas `read_html` function, which converts the HTML tables on a page into DataFrames.
<br> Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

TRT = pd.read_html(url, header=0)[0]                    # read_html returns a list of DataFrames; keep the first table
print('The Data shape is {}'.format(TRT.shape))
TRT.head()


The Data shape is (180, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Data Cleaning and Pre-Processing

77 postal codes are not assigned to any borough, so we will drop these rows to get a cleaned dataset.


In [3]:
TRT['Borough'].value_counts()

Not assigned        77
North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

In [4]:
TRT1 = TRT[TRT.Borough != 'Not assigned']   # drop rows whose borough is 'Not assigned'
TRT1.shape

(103, 3)

A postal code area can contain more than one neighborhood, so we combine all neighborhoods that share a postal code into a single comma-separated "Neighbourhood" value.

In [5]:
TRT1 = TRT1.groupby(['Postal Code','Borough'], sort=False).agg(', '.join)
TRT1.reset_index(inplace=True)
print(TRT1.shape)
TRT1.head()

(103, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
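
The shape is (103, 3) both before and after the grouping, so each postal code already occupied a single row in this table. A minimal sketch on made-up rows (the split of M5A across two rows below is hypothetical, purely for illustration) shows what the aggregation does when a postal code is spread over several rows:

```python
import pandas as pd

# Toy data (hypothetical layout): postal code M5A is split across two rows.
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Join every neighbourhood name that shares a postal code and borough.
# sort=False keeps the groups in order of first appearance.
combined = (
    toy.groupby(['Postal Code', 'Borough'], sort=False)['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

Selecting the `Neighbourhood` column before aggregating makes it explicit which column is being joined.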


If a row has a borough but a "Not assigned" neighborhood, the neighborhood should be set to the borough name. The current data contains no such case, but the script below handles it anyway.

In [6]:
TRT1['Neighbourhood'] = np.where(TRT1['Neighbourhood'] == 'Not assigned',TRT1['Borough'], TRT1['Neighbourhood'])
print((TRT1['Neighbourhood'] == 'Not assigned').value_counts())


False    103
Name: Neighbourhood, dtype: int64
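
Since the real data has no such row, the effect of the fallback is not visible in the output above. The following sketch applies the same `np.where` call to a small made-up frame (the second row is a hypothetical example of a borough without a neighbourhood):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): the second row has a borough but no neighbourhood.
toy = pd.DataFrame({
    'Borough': ['North York', "Queen's Park"],
    'Neighbourhood': ['Parkwoods', 'Not assigned'],
})

# Where the neighbourhood is 'Not assigned', fall back to the borough name;
# otherwise keep the existing neighbourhood.
toy['Neighbourhood'] = np.where(
    toy['Neighbourhood'] == 'Not assigned',
    toy['Borough'],
    toy['Neighbourhood'],
)
print(toy)
```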


In [7]:
print('Shape of final Data is :',TRT1.shape)

Shape of final Data is : (103, 3)


## <center> Part 2: Appending Latitude and Longitude Coordinates
Because the Geocoder Python package is unreliable, we will instead use the provided CSV file, which contains the geographical coordinates of each Toronto postal code.
<br>Link: http://cocl.us/Geospatial_data

In [8]:
Cords = pd.read_csv('https://cocl.us/Geospatial_data')
print(Cords.shape)
Cords.head()

(103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


We need to merge the coordinates with the postal code data, joining on the "Postal Code" column, for further analysis.

In [9]:
TRT2 = pd.merge(TRT1,Cords,on='Postal Code')
TRT2.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
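
`pd.merge` performs an inner join by default, so a postal code missing from either table would be dropped without any message. As a sketch of one way to check for this, the example below uses the `how`, `validate`, and `indicator` parameters of `pd.merge` on two small made-up frames (the postal code `M9Z` is a hypothetical value chosen so that it has no coordinates):

```python
import pandas as pd

# Toy frames: 'M9Z' is a hypothetical code with no coordinate entry.
left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Etobicoke'],
})
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of the left frame,
# validate raises if a postal code is duplicated on either side,
# indicator adds a '_merge' column recording where each row came from.
merged = pd.merge(left, right, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)

# Postal codes that found no match in the coordinate table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code received coordinates.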


## <center> Part 3: Data Exploration and Cluster Formation

We will work only with boroughs whose name contains the word "Toronto", so we filter the records accordingly.

In [10]:
TRT2 = TRT2[TRT2['Borough'].str.contains('Toronto',regex=False)]
print(TRT2.shape)
TRT2

(39, 5)


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


Now we will plot these points on a folium map.

In [11]:
map_TRT = folium.Map(location=[43.651070,-79.347015],zoom_start=10)  # latitude and longitude of Toronto, ON, Canada

for lat,lng,borough,neighbourhood in zip(TRT2['Latitude'],TRT2['Longitude'],TRT2['Borough'],TRT2['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='steelblue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_TRT)
map_TRT

### Cluster Formation
We will cluster the postal code areas by geographic location rather than by borough.
1. First, drop the postal code, borough, and neighbourhood columns so that only the coordinates remain.
2. Form the clusters with k-means on the latitude and longitude values.
3. Append the cluster labels to the data and inspect the result.
4. Plot the clusters on a map.

In [12]:
k = 4   # number of clusters
toronto_clustering = TRT2.drop(['Postal Code','Borough','Neighbourhood'], axis=1)   # keep only Latitude and Longitude; axis must be a keyword in pandas 2.0+
kmeans = KMeans(n_clusters=k, random_state=0).fit(toronto_clustering)
TRT2.insert(0, 'Cluster Labels', kmeans.labels_)

In [13]:
TRT2

Unnamed: 0,Cluster Labels,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,3,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,3,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,0,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,3,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,1,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,3,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,1,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
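
To inspect a clustering beyond the label column, it can help to look at the cluster sizes and centres. The sketch below is illustrative only: it runs k-means on six of the coordinate pairs from the table above with an assumed `k = 3` (not the `k = 4` used in this notebook) and sets `n_init` explicitly, since its default has changed between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Six coordinate pairs taken from the table above (M5A, M5C, M4E, M6G, M6H, M5E).
coords = pd.DataFrame({
    'Latitude':  [43.654260, 43.651494, 43.676357, 43.669542, 43.669005, 43.644771],
    'Longitude': [-79.360636, -79.375418, -79.293031, -79.422564, -79.442259, -79.373306],
})

k = 3  # assumed value for this small sample
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords)

# Number of points assigned to each cluster label.
sizes = pd.Series(model.labels_).value_counts().sort_index()

# One row per cluster: the mean latitude and longitude of its members.
centers = pd.DataFrame(model.cluster_centers_, columns=coords.columns)

print(sizes)
print(centers)
```

Note that k-means treats latitude and longitude as plain Euclidean coordinates, which is a reasonable approximation over an area as small as a single city.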


In [14]:
# create map

map_TRT2 = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

# set color scheme for the clusters: one hex colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, neighbourhood, cluster in zip(TRT2['Latitude'], TRT2['Longitude'], TRT2['Neighbourhood'], TRT2['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_TRT2)
       
map_TRT2
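
The colour assignment can also be written as an explicit mapping from cluster label to colour. The sketch below uses a hypothetical helper, `cluster_palette`, that is not part of the notebook; it indexes the palette directly by label instead of by `cluster-1`, which in the cell above sends label 0 to the last colour through negative indexing.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(k):
    """Return k hex colour strings sampled evenly from the rainbow colormap."""
    return [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]

k = 4
palette = cluster_palette(k)

# Map each cluster label 0..k-1 to its own colour.
label_to_color = {label: palette[label] for label in range(k)}
print(label_to_color)
```

With this mapping, a marker for a given `cluster` would use `label_to_color[cluster]` for both `color` and `fill_color`.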

## <center>Thank you