# Visualization of Toronto neighbourhoods 

GitHub does not display the Folium maps correctly, so you can view the notebook through nbviewer instead:

https://nbviewer.jupyter.org/github/petrKantek/Coursera_Capstone/blob/main/Torronto.ipynb

### First part - loading the neighbourhoods dataset

```python
import pandas as pd
import numpy as np
%matplotlib inline
```

Let's load the table of postal codes, boroughs, and neighbourhoods from the Wikipedia page *List of postal codes of Canada: M*.

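```python
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# read_html returns a list with one DataFrame per <table> on the page
# (it needs lxml, or BeautifulSoup4 with html5lib); the postal-code table is the first one
df_hoods = pd.read_html(url)[0]
df_hoods
```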

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| ... | ... | ... | ... |
| 175 | M5Z | Not assigned | Not assigned |
| 176 | M6Z | Not assigned | Not assigned |
| 177 | M7Z | Not assigned | Not assigned |
| 178 | M8Z | Etobicoke | Mimico NW, The Queensway West, South of Bloor,... |


Some boroughs and neighbourhoods are 'Not assigned'. We remove every row whose borough is 'Not assigned'.

```python
# keep only the postal codes that belong to a borough
df_hoods = df_hoods[df_hoods["Borough"] != "Not assigned"]
df_hoods
```

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 160 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 165 | M4Y | Downtown Toronto | Church and Wellesley |
| 168 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 169 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |


Let's check that no 'Not assigned' values remain in either the Borough or the Neighbourhood column.

```python
# number of rows still marked 'Not assigned' in each column
print(df_hoods[df_hoods["Borough"] == "Not assigned"].shape[0],
      df_hoods[df_hoods["Neighbourhood"] == "Not assigned"].shape[0])
```

```
0 0
```


Lastly, let's check the shape of the preprocessed dataset.

```python
df_hoods.shape
```

```
(103, 3)
```
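The cleaning steps of this part can be condensed into two small helpers and tried on a handful of rows copied from the Wikipedia table above. This is a minimal sketch: `drop_unassigned` and `count_unassigned` are hypothetical names, and the `reset_index(drop=True)` call is an extra choice the notebook does not make.

```python
import pandas as pd


def drop_unassigned(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose Borough is assigned (hypothetical helper)."""
    return df[df["Borough"] != "Not assigned"].reset_index(drop=True)


def count_unassigned(df: pd.DataFrame) -> dict:
    """Count remaining 'Not assigned' values per column (hypothetical helper)."""
    return df[["Borough", "Neighbourhood"]].eq("Not assigned").sum().to_dict()


# Five rows copied from the first table above
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A", "M5A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village",
                      "Regent Park, Harbourfront"],
})

cleaned = drop_unassigned(sample)
print(cleaned.shape)              # (3, 3)
print(count_unassigned(cleaned))  # zero for both columns
```

Returning the counts instead of printing them makes the check easy to turn into an assertion in a script.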

### Second part - loading the geocoordinates

The geocoding API calls are slow and unreliable, so it is more efficient to load a CSV file with coordinates that were already prepared.

```python
geocoords = pd.read_csv("data/Geospatial_Coordinates.csv")
geocoords
```

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| ... | ... | ... | ... |
| 98 | M9N | 43.706876 | -79.518188 |
| 99 | M9P | 43.696319 | -79.532242 |
| 100 | M9R | 43.688905 | -79.554724 |
| 101 | M9V | 43.739416 | -79.588437 |


We can now merge the geocoordinates with the neighbourhood dataset on the *Postal Code* column.

```python
# use Postal Code as the index of both tables; join is a left join on that index
complete_data = df_hoods.set_index("Postal Code").join(geocoords.set_index("Postal Code"))
complete_data
```

| Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|
| M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... |
| M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
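`DataFrame.join` performs a left join on the index by default, so a postal code missing from the CSV would silently get NaN coordinates, and a duplicated postal code would duplicate rows. A small sketch of an explicit check, using `DataFrame.merge` with `how="left"` and `validate="one_to_one"` on three rows copied from the tables above (the frame names `hoods` and `coords` are illustrative):

```python
import pandas as pd

# Three rows copied from the neighbourhood and coordinate tables above
hoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})
coords = pd.DataFrame({
    "Postal Code": ["M5A", "M3A", "M4A"],  # deliberately in a different order
    "Latitude": [43.654260, 43.753259, 43.725882],
    "Longitude": [-79.360636, -79.329656, -79.315572],
})

# Left merge keeps every neighbourhood in its original order;
# validate raises pandas.errors.MergeError if a postal code repeats on either side.
merged = hoods.merge(coords, on="Postal Code", how="left", validate="one_to_one")

# Postal codes that found no coordinates would show up here with NaN values
missing = merged[merged[["Latitude", "Longitude"]].isna().any(axis=1)]
print(len(missing))  # 0
```

On the full data the same two checks would confirm that all 103 postal codes received coordinates.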


### Third part - clustering

```python
from sklearn.cluster import KMeans
```

We will use the K-means algorithm to cluster the neighbourhoods by their latitude and longitude.
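To make the clustering step less of a black box, here is a minimal NumPy sketch of Lloyd's algorithm, the classic iteration behind K-means: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. It is an illustration under simplifying assumptions, not scikit-learn's implementation: `kmeans_lloyd` is a hypothetical name, the initial centroids are random data points rather than scikit-learn's k-means++ seeds, and there is a single run instead of several restarts. Like the notebook, it measures plain Euclidean distance in degrees, so a degree of longitude weighs the same as a degree of latitude even though it is physically shorter at Toronto's latitude.

```python
import numpy as np


def kmeans_lloyd(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(cents):
        # Assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: each centroid moves to the mean of its points (kept if it has none)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids


# Latitude/longitude of M3A, M4A (North York) and M5A, M7A, M4Y (downtown) from the tables above
points = np.array([
    [43.753259, -79.329656],
    [43.725882, -79.315572],
    [43.654260, -79.360636],
    [43.662301, -79.389494],
    [43.665860, -79.383160],
])
labels, centroids = kmeans_lloyd(points, k=2)
print(labels)
```

With two clusters, the two North York postal codes land in one cluster and the three downtown ones in the other; which of them is called 0 or 1 depends on the random initialisation.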

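```python
# cluster on the coordinates only; columns= replaces the positional axis
# argument, which pandas 2.0 no longer accepts in DataFrame.drop
cluster_data = complete_data.drop(columns=["Borough", "Neighbourhood"])
# n_init=10 keeps the scikit-learn default used before version 1.4
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(cluster_data)
```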

The resulting labels are the following.

```python
kmeans.labels_
```

```
array([4, 4, 2, 0, 2, 1, 3, 4, 4, 2, 0, 1, 3, 4, 4, 2, 2, 1, 3, 4, 2, 2,
       3, 4, 2, 2, 3, 0, 0, 4, 2, 2, 3, 4, 0, 4, 2, 2, 4, 0, 0, 2, 2, 2,
       4, 0, 1, 4, 2, 1, 1, 3, 0, 1, 2, 0, 1, 1, 4, 0, 1, 0, 0, 1, 1, 4,
       0, 0, 2, 1, 1, 4, 0, 0, 2, 2, 1, 1, 3, 2, 2, 1, 4, 2, 2, 3, 2, 2,
       1, 1, 4, 2, 2, 1, 1, 3, 2, 2, 1, 2, 4, 1, 1], dtype=int32)
```

We can add the K-means labels to the neighbourhood dataset, so that each neighbourhood can be plotted on a map coloured by its cluster.

```python
# copy, so that adding the label column does not also modify complete_data
res = complete_data.copy()
res["Cluster data"] = kmeans.labels_
```

```python
res
```

| Postal Code | Borough | Neighbourhood | Latitude | Longitude | Cluster data |
|---|---|---|---|---|---|
| M3A | North York | Parkwoods | 43.753259 | -79.329656 | 4 |
| M4A | North York | Victoria Village | 43.725882 | -79.315572 | 4 |
| M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 | 2 |
| M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 0 |
| M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 2 |
| ... | ... | ... | ... | ... | ... |
| M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 | 1 |
| M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 | 2 |
| M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 | 4 |
| M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 | 1 |
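With the labels attached, a quick `groupby` shows how many neighbourhoods fall into each cluster and where each cluster sits on average. This sketch runs on the five rows displayed above, so only clusters 0, 2 and 4 appear; `res_sample` and the output column names `count`, `mean_lat` and `mean_lon` are illustrative, and the same `groupby` call should work unchanged on the full `res` frame.

```python
import pandas as pd

# The first five rows of the labelled table above
res_sample = pd.DataFrame(
    {
        "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront",
                          "Lawrence Manor, Lawrence Heights",
                          "Queen's Park, Ontario Provincial Government"],
        "Latitude": [43.753259, 43.725882, 43.654260, 43.718518, 43.662301],
        "Longitude": [-79.329656, -79.315572, -79.360636, -79.464763, -79.389494],
        "Cluster data": [4, 4, 2, 0, 2],
    },
    index=pd.Index(["M3A", "M4A", "M5A", "M6A", "M7A"], name="Postal Code"),
)

# Neighbourhoods per cluster and the mean position of each cluster (named aggregation)
summary = res_sample.groupby("Cluster data").agg(
    count=("Neighbourhood", "count"),
    mean_lat=("Latitude", "mean"),
    mean_lon=("Longitude", "mean"),
)
print(summary)
```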


For plotting the clusters on a map, we will use the Folium library.

```python
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

k_clusters = 5
toronto_latitude = 43.6532
toronto_longitude = -79.3832

map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=11)

# one evenly spaced rainbow colour per cluster, converted to hex strings for Folium
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# one circle marker per neighbourhood; labels run 0 .. k_clusters-1, so they index the palette directly
for lat, lon, poi, cluster in zip(res["Latitude"], res["Longitude"],
                                  res["Neighbourhood"], res["Cluster data"]):
    label = folium.Popup(f"{poi} Cluster {cluster}", parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7,
    ).add_to(map_clusters)

map_clusters
```
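The map cell builds its palette from matplotlib's `rainbow` colormap. If matplotlib is not available, a similar set of evenly spaced colours can be produced with the standard library's `colorsys` module by stepping the HSV hue. This is a swapped-in technique, not what the notebook does, and `cluster_palette` is a hypothetical helper. Because K-means labels run from 0 to k - 1, a label can index the palette directly.

```python
import colorsys


def cluster_palette(k):
    """Return k fully saturated hex colours with evenly spaced hues (hypothetical helper)."""
    palette = []
    for i in range(k):
        # Hue steps through [0, 1); saturation and value stay at their maximum
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
print(palette[0])  # #ff0000
```

The resulting strings can be passed to `color` and `fill_color` of `folium.CircleMarker` in place of the `rainbow` list.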