## Toronto cluster analysis

This notebook answers the questions presented for week 3.


Importing libraries


In [1]:
import pandas as pd
from pandas import DataFrame
import numpy as np

Let's create an empty dataframe with the expected columns

In [2]:
columns_names = ['Postal Code', 'Borough', 'Neighborhood']
df = pd.DataFrame(columns = columns_names)
df

Unnamed: 0,Postal Code,Borough,Neighborhood


Let's get the raw data from Wikipedia and convert it to a dataframe for later manipulation

In [3]:
url = 'https://wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [4]:
table = pd.read_html(url)

In [5]:
table[0]

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


Let's take the first table on the page and save it to a CSV file

In [6]:
table[0].to_csv('Canada_Postal_Code.csv')

Let's visualize the dataframe

In [7]:
table[0]

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


In [8]:
df = table[0]

In [9]:
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


Cleaning process


First, we drop every row whose Borough value is Not assigned

In [10]:
index_names = df[df['Borough'] == 'Not assigned'].index

In [11]:
df.drop(index_names, inplace=True)
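
An equivalent way to express this step is a boolean mask, which returns a new dataframe instead of modifying `df` in place. The sketch below uses a small hand-made sample; the `sample` and `assigned` names are illustrative and not part of the notebook.

```python
import pandas as pd

# Small illustrative sample with the same columns as the Wikipedia table.
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A', 'M8A'],
    'Borough': ['Not assigned', 'North York', 'North York', 'Not assigned'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village', 'Not assigned'],
})

# Keep only the rows whose Borough has been assigned, then renumber the index.
assigned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

Resetting the index is optional; it only avoids the gaps in row labels that `drop` leaves behind.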

Let's save the dataframe to a CSV file

In [12]:
df.to_csv('Canada_Postal_code_clean.csv')

Let's see the new dataframe

In [13]:
df.head(11)

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


Let's check for missing values and note the dataframe dimensions

In [14]:
df.isna()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,False,False,False
3,False,False,False
4,False,False,False
5,False,False,False
6,False,False,False
...,...,...,...
160,False,False,False
165,False,False,False
168,False,False,False
169,False,False,False


Therefore, the dataframe has three columns and 103 rows
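
The dimensions and the missing-value counts can also be read directly rather than from a truncated display. A minimal sketch on an illustrative sample (`sample` is a made-up frame, not the notebook's data):

```python
import pandas as pd

sample = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Regent Park, Harbourfront'],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape

# isna().sum() counts missing values per column; all zeros means no gaps.
missing_per_column = sample.isna().sum()

print(n_rows, n_cols)
print(missing_per_column)
```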


Add coordinates based on Postal Code

In [15]:
df_c = pd.read_csv('Geospatial_Coordinates.csv')

In [16]:
df_c


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [17]:
df.sort_values('Postal Code')

Unnamed: 0,Postal Code,Borough,Neighbourhood
9,M1B,Scarborough,"Malvern, Rouge"
18,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
27,M1E,Scarborough,"Guildwood, Morningside, West Hill"
36,M1G,Scarborough,Woburn
45,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
107,M9P,Etobicoke,Westmount
116,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
143,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


Let's join both dataframes into one

In [18]:
df_new = df.merge(df_c)
df_new

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
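
Called with no arguments, `merge` joins on the columns the two dataframes share, here `Postal Code`. Spelling the key out makes the intent explicit. The sketch below does that on two small illustrative frames; `boroughs` and `coords` are made-up names, and `validate` is an optional check added here, not something the notebook uses.

```python
import pandas as pd

boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame({
    'Postal Code': ['M5A', 'M3A', 'M4A'],
    'Latitude': [43.654260, 43.753259, 43.725882],
    'Longitude': [-79.360636, -79.329656, -79.315572],
})

# Inner join on the shared key; validate raises an error if a postal code
# appears more than once on either side.
merged = boroughs.merge(coords, on='Postal Code', how='inner', validate='one_to_one')
print(merged)
```

An inner join keeps only postal codes present in both frames, so comparing the row counts before and after is a quick way to spot codes without coordinates.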


As stated, only Toronto data should be used. As I understand it, that means keeping the boroughs whose name contains "Toronto", for example "Downtown Toronto" and "East Toronto". To do that, I will export the dataframe to a CSV file and then edit it externally in Excel so that the Borough column contains Toronto boroughs only...

In [19]:
df_new.to_csv('new_Canada_Postal_code_clean.csv')
df_new.shape

(103, 5)

...the edited CSV file on my hard drive is called Toronto.csv. Let's import it into a new dataframe called toronto

In [40]:

toronto = pd.read_csv('Toronto.csv')
toronto

Unnamed: 0.1,Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,19,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
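
The manual Excel edit can also be reproduced inside the notebook. A minimal sketch, assuming the goal is simply to keep boroughs whose name contains "Toronto" (the `sample` frame and the `toronto_only` name are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({
    'Postal Code': ['M3A', 'M5A', 'M4E', 'M9A', 'M6H'],
    'Borough': ['North York', 'Downtown Toronto', 'East Toronto',
                'Etobicoke', 'West Toronto'],
})

# str.contains returns a boolean mask; regex=False treats 'Toronto' as plain text.
mask = sample['Borough'].str.contains('Toronto', regex=False)
toronto_only = sample[mask].reset_index(drop=True)
print(toronto_only)
```

Filtering in code keeps the whole pipeline reproducible, since no hand-edited file is needed.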


Let's clean the new dataframe by deleting the Unnamed columns

In [41]:
toronto = toronto.loc[:, ~toronto.columns.str.contains('^Unnamed')]
toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259
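
The `Unnamed` columns appear because `to_csv` writes the index by default and `read_csv` reads it back as an unnamed column. A sketch of avoiding them at write time with `index=False`, using an in-memory buffer in place of a file (the `sample` frame is illustrative):

```python
import io
import pandas as pd

sample = pd.DataFrame({
    'Postal Code': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto'],
})

# Default behaviour: the index is written as a header-less first column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(reloaded.columns.tolist())

# index=False leaves the index out, so nothing needs to be dropped later.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
round_trip = pd.read_csv(without_index)
print(round_trip.columns.tolist())
```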


We now have a new dataframe with Postal Code, Borough, Neighbourhood, Latitude and Longitude.
Now let's move on to visualization. First of all, let's install folium and import it together with the standard-library webbrowser module...

In [26]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium 
import webbrowser

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



Let's take Toronto's coordinates from Google and create a map object named traffic_map

In [29]:
latitude = 43.651070
longitude = -79.347015

traffic_map = folium.Map(location=[latitude, longitude], zoom_start=4)
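
Instead of looking the coordinates up, the map centre could be derived from the data itself. A sketch assuming the mean of the neighbourhood coordinates is an acceptable centre (the `sample` frame holds three rows copied from the Toronto table; `center` is an illustrative name):

```python
import pandas as pd

sample = pd.DataFrame({
    'Latitude': [43.654260, 43.662301, 43.657162],
    'Longitude': [-79.360636, -79.389494, -79.378937],
})

# Average the coordinates to get a point roughly in the middle of the markers.
center = [sample['Latitude'].mean(), sample['Longitude'].mean()]
print(center)
```

The resulting list could then be passed as `location` to `folium.Map` in place of the hard-coded values.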

Now, let's create the map instance m using the same location...

In [30]:
m = folium.Map(location=[latitude, longitude])

and let's add the markers to the map using the CircleMarker class...


In [39]:
for lat, lng, label in zip(toronto['Latitude'], toronto['Longitude'], toronto['Borough']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(m)
m

Finally, we have visualized the Toronto boroughs on the map.