Installing folium, a library for visualizing data on interactive maps

In [3]:
!pip install folium

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0

In [56]:
import folium
import pandas as pd
import numpy as np
import requests
from sklearn.cluster import KMeans

### The code below repeats the tasks solved in the previous two notebooks of the Week 3 series, because their output is needed for the clustering and visualization steps that follow.

In [43]:
# Scrape the postal code table (Postal Code, Borough, Neighborhood) from Wikipedia
post_canada = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)[0]
# Drop rows whose borough is not assigned, then renumber so labels match positions
post_canada = post_canada[post_canada['Borough'] != 'Not assigned'].reset_index(drop=True)
# Where a neighborhood is not assigned, use the borough name as the neighborhood
mask = post_canada['Neighborhood'] == 'Not assigned'
post_canada.loc[mask, 'Neighborhood'] = post_canada.loc[mask, 'Borough']
# Attach latitude and longitude for each postal code
geodata = pd.read_csv('Geospatial_Coordinates.csv')
df = post_canada.merge(geodata, on='Postal Code', how='left')
df


| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
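The cleaning rules used above can be checked without network access on a tiny hand-made frame. The rows below are hypothetical (they only mimic the column layout of the Wikipedia table), and `clean_postal_codes` is an illustrative helper name, not part of the original notebook. The sketch assumes the rule "an unassigned neighborhood takes its borough's name".

```python
import pandas as pd

# Hypothetical rows that only mimic the Wikipedia table layout (not scraped data)
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M7A', 'M9Z'],
    'Borough': ['Not assigned', 'North York', "Queen's Park", 'Etobicoke'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned', 'Not assigned'],
})


def clean_postal_codes(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop unassigned boroughs, fill unassigned neighborhoods."""
    # Keep only rows with an assigned borough and renumber the index 0..n-1
    cleaned = table[table['Borough'] != 'Not assigned'].reset_index(drop=True)
    # Boolean mask avoids mixing positional and label-based indexing
    mask = cleaned['Neighborhood'] == 'Not assigned'
    cleaned.loc[mask, 'Neighborhood'] = cleaned.loc[mask, 'Borough']
    return cleaned


cleaned = clean_postal_codes(raw)
print(cleaned)
```

Using a boolean mask instead of `enumerate` keeps the fill correct even when earlier drops leave gaps in the index.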


### Next, keep only the boroughs whose name contains 'Toronto' and drop everything else. This narrows the data to the part of the city we are interested in.

In [81]:
# Keep only rows whose borough name contains 'Toronto', then renumber the index
df = df[df['Borough'].str.contains('Toronto')].reset_index(drop=True)
df.shape

(39, 6)
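A minimal sketch of the same filter on a hypothetical list of borough names shows which names survive. `str.contains` is case-sensitive and treats its pattern as a regular expression by default; passing `regex=False` and `na=False` here is an added assumption that makes the match literal and treats missing boroughs as non-matches.

```python
import pandas as pd

# Hypothetical borough names, for illustration only
boroughs = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'North York', 'East Toronto',
                'Scarborough', 'West Toronto', 'East York'],
})

# True where the name contains the literal substring 'Toronto'
mask = boroughs['Borough'].str.contains('Toronto', regex=False, na=False)
toronto_only = boroughs[mask].reset_index(drop=True)
print(toronto_only['Borough'].tolist())
# ['Downtown Toronto', 'East Toronto', 'West Toronto']
```

Note that 'East York' and 'North York' are excluded even though they are part of the wider Toronto area, because the match is purely on the borough name.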


## Question: Generate maps to visualize the neighborhoods and how they cluster together.

We use the k-means clustering algorithm to group the neighborhoods by their latitude and longitude, display the clusters on a map of Toronto, and draw a conclusion from the result.
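To make the clustering step concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic iteration behind k-means: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. It is not scikit-learn's implementation (`KMeans` defaults to k-means++ seeding and several restarts, while this sketch seeds with randomly chosen points and runs once), and the six coordinates are hypothetical points near downtown and the eastern waterfront.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's algorithm on raw coordinates (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Seed the centroids with k distinct points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: mean of the assigned points (keep old centroid if a cluster is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (latitude, longitude) pairs: three near downtown, three further east
points = np.array([[43.650, -79.380], [43.655, -79.385], [43.652, -79.375],
                   [43.676, -79.293], [43.680, -79.290], [43.672, -79.297]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The label numbers themselves are arbitrary (they depend on the random seeding); only the grouping is meaningful, which is also true of the `Cluster` values produced by scikit-learn below.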

In [90]:
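# Base map centred on Toronto with a dark tile layer
toronto = folium.Map(location=[43.65, -79.4], zoom_start=12, tiles='CartoDB dark_matter')
# Cluster the neighborhoods into 4 groups using only their coordinates
num_of_clusters = 4
cluster = KMeans(n_clusters=num_of_clusters, random_state=0).fit(df[['Latitude', 'Longitude']])
# Store each neighborhood's cluster label (0 to num_of_clusters - 1)
df['Cluster'] = cluster.labels_
df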

| | Postal Code | Borough | Neighborhood | Latitude | Longitude | Cluster |
|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 1 |
| 1 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 | 1 |
| 2 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 | 1 |
| 3 | M5C | Downtown Toronto | St. James Town | 43.651494 | -79.375418 | 1 |
| 4 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 3 |
| 5 | M5E | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 | 1 |
| 6 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 | 1 |
| 7 | M6G | Downtown Toronto | Christie | 43.669542 | -79.422564 | 2 |
| 8 | M5H | Downtown Toronto | Richmond, Adelaide, King | 43.650571 | -79.384568 | 1 |
| 9 | M6H | West Toronto | Dufferin, Dovercourt Village | 43.669005 | -79.442259 | 2 |
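One way to read the cluster labels is to summarize each cluster's size and average position. The sketch below does this for the ten rows displayed above only (not all 39 rows, and not the fitted `cluster_centers_`), so it is an illustration of the grouping rather than a full result; cluster 0 does not appear among these rows.

```python
import pandas as pd

# The ten rows displayed above (a subset of the 39 Toronto rows)
shown = pd.DataFrame({
    'Neighborhood': ['Regent Park, Harbourfront',
                     "Queen's Park, Ontario Provincial Government",
                     'Garden District, Ryerson', 'St. James Town', 'The Beaches',
                     'Berczy Park', 'Central Bay Street', 'Christie',
                     'Richmond, Adelaide, King', 'Dufferin, Dovercourt Village'],
    'Latitude': [43.65426, 43.662301, 43.657162, 43.651494, 43.676357,
                 43.644771, 43.657952, 43.669542, 43.650571, 43.669005],
    'Longitude': [-79.360636, -79.389494, -79.378937, -79.375418, -79.293031,
                  -79.373306, -79.387383, -79.422564, -79.384568, -79.442259],
    'Cluster': [1, 1, 1, 1, 3, 1, 1, 2, 1, 2],
})

# Number of neighborhoods and mean coordinates per cluster (named aggregation)
summary = shown.groupby('Cluster').agg(
    n_neighborhoods=('Neighborhood', 'count'),
    mean_lat=('Latitude', 'mean'),
    mean_lon=('Longitude', 'mean'),
)
print(summary)
```

Sorted by mean longitude, cluster 3 lies furthest east, cluster 1 sits in the downtown core, and cluster 2 lies to the west, which matches the borough names in the table.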


In [101]:
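# One fill colour per cluster label (indices 0 to 3, matching num_of_clusters = 4)
color = ['red', 'blue', 'purple', 'green']
for index, row in df.iterrows():
    # Popup shows the borough name when a marker is clicked
    label = folium.Popup(row['Borough'], parse_html=True)
    folium.CircleMarker(
        [row['Latitude'], row['Longitude']],
        popup=label,
        color='yellow',                            # marker outline
        fill=True,
        fill_color=color[int(row['Cluster'])],     # fill colour encodes the cluster
        fill_opacity=0.2,
    ).add_to(toronto)
toronto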

We observe that the markers are most densely packed over the Downtown Toronto borough, and most of its neighborhoods fall into the same cluster. Since the clusters were built from latitude and longitude alone, this suggests that the postal codes and their boroughs agree well with the actual positions given by the location markers: neighborhoods in the same borough tend to end up in the same location-based cluster.
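The observation can be made quantitative with a cross-tabulation of borough against cluster. The sketch below uses only the ten rows displayed in the clustering output above, so the counts are illustrative; running `pd.crosstab(df['Borough'], df['Cluster'])` on the full 39-row frame would be the complete check.

```python
import pandas as pd

# Borough and cluster of the ten rows displayed above (a subset of the data)
shown = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Downtown Toronto',
                'Downtown Toronto', 'East Toronto', 'Downtown Toronto',
                'Downtown Toronto', 'Downtown Toronto', 'Downtown Toronto',
                'West Toronto'],
    'Cluster': [1, 1, 1, 1, 3, 1, 1, 2, 1, 2],
})

# Rows: boroughs, columns: cluster labels, cells: number of neighborhoods
table = pd.crosstab(shown['Borough'], shown['Cluster'])
print(table)

# Share of Downtown Toronto neighborhoods that fall in its most common cluster
downtown = table.loc['Downtown Toronto']
print(downtown.max() / downtown.sum())
# 0.875
```

In these rows, 7 of the 8 Downtown Toronto neighborhoods share cluster 1; the exception, Christie, lies on the western edge of the borough and is grouped with West Toronto.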