# Segmenting and Clustering Neighborhoods in Toronto

# Web scraping postal codes of neighborhoods in Toronto

Use `beautifulsoup4` to scrape this Wikipedia [page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M), which lists the postal codes of the neighborhoods of Toronto.

The postal codes will be used for geocoding.

From the [documentation](https://beautiful-soup-4.readthedocs.io/en/latest/#making-the-soup) we see that `beautifulsoup4` needs the HTML of the page as input. We can get it with the `requests` module, which handles the GET call to the Wikipedia page and returns the response as text (we could also save it to a file if needed).

In [1]:
from bs4 import BeautifulSoup
import requests

url_postal_codes = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source_page = requests.get(url_postal_codes).text

soup = BeautifulSoup(source_page, 'lxml')

In [2]:
# explore the soup
# print(soup.prettify())

### Extracting the table with postal codes

By inspecting the website (from the browser or from the soup object above) we see that the relevant information is in a `<table>` object with class `wikitable sortable`.

We can use the `find` and `find_all` methods ([docs](https://beautiful-soup-4.readthedocs.io/en/latest/#searching-the-tree)) to locate the table and extract its rows.

In [3]:
table = soup.find('table', class_='wikitable sortable')
print(table)

<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
<tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>
<tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>
<tr>
<td>M6A</td>

In [4]:
rows = table.find_all('tr')
print(rows)

[<tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>, <tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>, <tr>
<td>M2A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>, <tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>, <tr>
<td>M4A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Victoria_Village" title="Victoria Village">Victoria Village</a>
</td></tr>, <tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Harbourfront_(Toronto)" title="Harbourfront (Toronto)">Harbourfront</a>
</td></tr>, <tr>
<td>M5A</td>
<td><a href="/wiki/Downtown_Toronto" title="Downtown Toronto">Downtown Toronto</a></td>
<td><a href="/wiki/Regent_Park" title="Regent Park">Regent Park</a>
</td></tr>, <tr>
<td>M6A</td>
<td><a href="/wiki/North_York" ti

#### Tests (supplementary)

This is what a row of the table looks like:

In [5]:
test = rows[7]
print(test.text)


M6A
North York
Lawrence Heights



And this is how to split a row into its elements, keeping only the ones with text. This assumes that the first and last lines of the row's text are empty.

In [6]:
test.text.split('\n')[1:4]

['M6A', 'North York', 'Lawrence Heights']

In [7]:
test.find_all('td')

[<td>M6A</td>,
 <td><a href="/wiki/North_York" title="North York">North York</a></td>,
 <td><a href="/wiki/Lawrence_Heights" title="Lawrence Heights">Lawrence Heights</a>
 </td>]

#### Get the data from each row in the table

Iterate through all the rows in the table to extract the `Postal Code`, `Borough` and `Neighborhood`, assuming they sit at positions 1 to 3 of the row text once it is split on newlines.

In [8]:
header = rows[0].text
table = []
for r in rows[1:]:
    try:
        line = r.text.split('\n')[1:4]
    except Exception as e:
        print('cannot get line {}'.format(r))
        line = []
    table.append(line)
# print(table)
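
Splitting `r.text` on newlines works as long as every cell holds exactly one line of text. As a hedged alternative, the sketch below reads the `<td>` cells of each row directly and strips the surrounding whitespace. It runs on a small inline HTML fragment copied from the output above rather than on the live page, and it uses Python's built-in `html.parser` instead of `lxml`. The names `sample_html`, `sample_table` and `records` are introduced here only for illustration.

```python
from bs4 import BeautifulSoup

# Small fragment with the same structure as the Wikipedia table shown above
sample_html = """
<table class="wikitable sortable">
<tbody><tr>
<th>Postcode</th>
<th>Borough</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A</td>
<td>Not assigned</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A</td>
<td><a href="/wiki/North_York" title="North York">North York</a></td>
<td><a href="/wiki/Parkwoods" title="Parkwoods">Parkwoods</a>
</td></tr>
</tbody></table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_soup.find("table", class_="wikitable sortable")

records = []
for row in sample_table.find_all("tr"):
    cells = row.find_all("td")
    # the header row has <th> cells only, so it yields no <td> and is skipped
    if len(cells) != 3:
        continue
    records.append([cell.get_text(strip=True) for cell in cells])

print(records)
```

Reading the cells by tag avoids relying on where the newlines fall in the row text, at the cost of one extra `find_all` call per row.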

### Create a `pandas` dataframe with the postal codes

In [9]:
import pandas as pd
print(pd.__version__)

0.25.0


Use `from_records` to create a DataFrame directly from the table of data, giving names to the columns:

In [10]:
# do this if the rows in the table were parsed as a whole
# pc = pd.DataFrame.from_records(table,exclude=['0','1'],columns=['0','PostalCode','Borough','Neighborhood','1'])
pc = pd.DataFrame.from_records(table,columns=['PostalCode','Borough','Neighborhood'])

In [11]:
pc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


### Process dataframe to remove unwanted items

In [12]:
import numpy as np
print(np.__version__)

1.17.0


Keep only the rows where `Borough` is different from `Not assigned`.

In [13]:
pc_clean = pc[pc.Borough != 'Not assigned'].copy()  # copy, so that later assignments do not act on a view of pc

In [14]:
pc_clean.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


If a `Neighborhood` does not have a name, assign the name of the corresponding `Borough`:

In [15]:
pc_clean.query('Neighborhood == "Not assigned"')

Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Not assigned


In [16]:
pc_clean.at[8,'Neighborhood']=pc_clean.at[8,'Borough']

In [17]:
pc_clean.loc[8]

PostalCode               M7A
Borough         Queen's Park
Neighborhood    Queen's Park
Name: 8, dtype: object

If there were multiple instances to change, we could have used `DataFrame.where` to replace the values.

In [18]:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html#pandas.DataFrame.where
#test = pc[pc.Borough != 'Not assigned'].copy()
#test['Neighborhood'] = test.Neighborhood.where(test.Neighborhood != 'Not assigned', test.Borough)
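
A minimal sketch of such a bulk replacement, on a small made-up frame. It uses `Series.mask`, the inverse of `where`: values are replaced where the condition is `True`. The `M9Z` row is hypothetical and only there to show that more than one row can be fixed at once.

```python
import pandas as pd

sample = pd.DataFrame({
    'PostalCode': ['M7A', 'M3A', 'M9Z'],          # 'M9Z' is a made-up code
    'Borough': ["Queen's Park", 'North York', 'Etobicoke'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# True for the rows whose neighborhood has no name
missing = sample['Neighborhood'] == 'Not assigned'

# replace those entries with the borough of the same row
sample['Neighborhood'] = sample['Neighborhood'].mask(missing, sample['Borough'])

print(sample)
```

Unlike the `.at[8, ...]` assignment, this does not depend on knowing the index label of each affected row.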

### Merge neighborhoods with the same postal code

Group by `PostalCode` and merge the neighborhoods.

In [19]:
pc_grouped = pc_clean.groupby('PostalCode')

You can iterate through the groups to see which neighborhoods share each postal code:

In [20]:
for name, group in pc_grouped:
    group

Unnamed: 0,PostalCode,Borough,Neighborhood
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


Unnamed: 0,PostalCode,Borough,Neighborhood
27,M1C,Scarborough,Highland Creek
28,M1C,Scarborough,Rouge Hill
29,M1C,Scarborough,Port Union


Unnamed: 0,PostalCode,Borough,Neighborhood
42,M1E,Scarborough,Guildwood
43,M1E,Scarborough,Morningside
44,M1E,Scarborough,West Hill


Unnamed: 0,PostalCode,Borough,Neighborhood
53,M1G,Scarborough,Woburn


Unnamed: 0,PostalCode,Borough,Neighborhood
62,M1H,Scarborough,Cedarbrae


Unnamed: 0,PostalCode,Borough,Neighborhood
76,M1J,Scarborough,Scarborough Village


Unnamed: 0,PostalCode,Borough,Neighborhood
91,M1K,Scarborough,East Birchmount Park
92,M1K,Scarborough,Ionview
93,M1K,Scarborough,Kennedy Park


Unnamed: 0,PostalCode,Borough,Neighborhood
107,M1L,Scarborough,Clairlea
108,M1L,Scarborough,Golden Mile
109,M1L,Scarborough,Oakridge


Unnamed: 0,PostalCode,Borough,Neighborhood
123,M1M,Scarborough,Cliffcrest
124,M1M,Scarborough,Cliffside
125,M1M,Scarborough,Scarborough Village West


Unnamed: 0,PostalCode,Borough,Neighborhood
140,M1N,Scarborough,Birch Cliff
141,M1N,Scarborough,Cliffside West


Unnamed: 0,PostalCode,Borough,Neighborhood
151,M1P,Scarborough,Dorset Park
152,M1P,Scarborough,Scarborough Town Centre
153,M1P,Scarborough,Wexford Heights


Unnamed: 0,PostalCode,Borough,Neighborhood
164,M1R,Scarborough,Maryvale
165,M1R,Scarborough,Wexford


Unnamed: 0,PostalCode,Borough,Neighborhood
180,M1S,Scarborough,Agincourt


Unnamed: 0,PostalCode,Borough,Neighborhood
191,M1T,Scarborough,Clarks Corners
192,M1T,Scarborough,Sullivan
193,M1T,Scarborough,Tam O'Shanter


Unnamed: 0,PostalCode,Borough,Neighborhood
205,M1V,Scarborough,Agincourt North
206,M1V,Scarborough,L'Amoreaux East
207,M1V,Scarborough,Milliken
208,M1V,Scarborough,Steeles East


Unnamed: 0,PostalCode,Borough,Neighborhood
236,M1W,Scarborough,L'Amoreaux West


Unnamed: 0,PostalCode,Borough,Neighborhood
246,M1X,Scarborough,Upper Rouge


Unnamed: 0,PostalCode,Borough,Neighborhood
63,M2H,North York,Hillcrest Village


Unnamed: 0,PostalCode,Borough,Neighborhood
77,M2J,North York,Fairview
78,M2J,North York,Henry Farm
79,M2J,North York,Oriole


Unnamed: 0,PostalCode,Borough,Neighborhood
94,M2K,North York,Bayview Village


Unnamed: 0,PostalCode,Borough,Neighborhood
110,M2L,North York,Silver Hills
111,M2L,North York,York Mills


Unnamed: 0,PostalCode,Borough,Neighborhood
126,M2M,North York,Newtonbrook
127,M2M,North York,Willowdale


Unnamed: 0,PostalCode,Borough,Neighborhood
142,M2N,North York,Willowdale South


Unnamed: 0,PostalCode,Borough,Neighborhood
154,M2P,North York,York Mills West


Unnamed: 0,PostalCode,Borough,Neighborhood
166,M2R,North York,Willowdale West


Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods


Unnamed: 0,PostalCode,Borough,Neighborhood
14,M3B,North York,Don Mills North


Unnamed: 0,PostalCode,Borough,Neighborhood
31,M3C,North York,Flemingdon Park
32,M3C,North York,Don Mills South


Unnamed: 0,PostalCode,Borough,Neighborhood
64,M3H,North York,Bathurst Manor
65,M3H,North York,Downsview North
66,M3H,North York,Wilson Heights


Unnamed: 0,PostalCode,Borough,Neighborhood
80,M3J,North York,Northwood Park
81,M3J,North York,York University


Unnamed: 0,PostalCode,Borough,Neighborhood
95,M3K,North York,CFB Toronto
96,M3K,North York,Downsview East


Unnamed: 0,PostalCode,Borough,Neighborhood
112,M3L,North York,Downsview West


Unnamed: 0,PostalCode,Borough,Neighborhood
128,M3M,North York,Downsview Central


Unnamed: 0,PostalCode,Borough,Neighborhood
143,M3N,North York,Downsview Northwest


Unnamed: 0,PostalCode,Borough,Neighborhood
3,M4A,North York,Victoria Village


Unnamed: 0,PostalCode,Borough,Neighborhood
15,M4B,East York,Woodbine Gardens
16,M4B,East York,Parkview Hill


Unnamed: 0,PostalCode,Borough,Neighborhood
33,M4C,East York,Woodbine Heights


Unnamed: 0,PostalCode,Borough,Neighborhood
47,M4E,East Toronto,The Beaches


Unnamed: 0,PostalCode,Borough,Neighborhood
56,M4G,East York,Leaside


Unnamed: 0,PostalCode,Borough,Neighborhood
67,M4H,East York,Thorncliffe Park


Unnamed: 0,PostalCode,Borough,Neighborhood
82,M4J,East York,East Toronto


Unnamed: 0,PostalCode,Borough,Neighborhood
97,M4K,East Toronto,The Danforth West
98,M4K,East Toronto,Riverdale


Unnamed: 0,PostalCode,Borough,Neighborhood
113,M4L,East Toronto,The Beaches West
114,M4L,East Toronto,India Bazaar


Unnamed: 0,PostalCode,Borough,Neighborhood
129,M4M,East Toronto,Studio District


Unnamed: 0,PostalCode,Borough,Neighborhood
144,M4N,Central Toronto,Lawrence Park


Unnamed: 0,PostalCode,Borough,Neighborhood
156,M4P,Central Toronto,Davisville North


Unnamed: 0,PostalCode,Borough,Neighborhood
168,M4R,Central Toronto,North Toronto West


Unnamed: 0,PostalCode,Borough,Neighborhood
183,M4S,Central Toronto,Davisville


Unnamed: 0,PostalCode,Borough,Neighborhood
196,M4T,Central Toronto,Moore Park
197,M4T,Central Toronto,Summerhill East


Unnamed: 0,PostalCode,Borough,Neighborhood
211,M4V,Central Toronto,Deer Park
212,M4V,Central Toronto,Forest Hill SE
213,M4V,Central Toronto,Rathnelly
214,M4V,Central Toronto,South Hill
215,M4V,Central Toronto,Summerhill West


Unnamed: 0,PostalCode,Borough,Neighborhood
239,M4W,Downtown Toronto,Rosedale


Unnamed: 0,PostalCode,Borough,Neighborhood
249,M4X,Downtown Toronto,Cabbagetown
250,M4X,Downtown Toronto,St. James Town


Unnamed: 0,PostalCode,Borough,Neighborhood
262,M4Y,Downtown Toronto,Church and Wellesley


Unnamed: 0,PostalCode,Borough,Neighborhood
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park


Unnamed: 0,PostalCode,Borough,Neighborhood
17,M5B,Downtown Toronto,Ryerson
18,M5B,Downtown Toronto,Garden District


Unnamed: 0,PostalCode,Borough,Neighborhood
34,M5C,Downtown Toronto,St. James Town


Unnamed: 0,PostalCode,Borough,Neighborhood
48,M5E,Downtown Toronto,Berczy Park


Unnamed: 0,PostalCode,Borough,Neighborhood
57,M5G,Downtown Toronto,Central Bay Street


Unnamed: 0,PostalCode,Borough,Neighborhood
68,M5H,Downtown Toronto,Adelaide
69,M5H,Downtown Toronto,King
70,M5H,Downtown Toronto,Richmond


Unnamed: 0,PostalCode,Borough,Neighborhood
83,M5J,Downtown Toronto,Harbourfront East
84,M5J,Downtown Toronto,Toronto Islands
85,M5J,Downtown Toronto,Union Station


Unnamed: 0,PostalCode,Borough,Neighborhood
99,M5K,Downtown Toronto,Design Exchange
100,M5K,Downtown Toronto,Toronto Dominion Centre


Unnamed: 0,PostalCode,Borough,Neighborhood
115,M5L,Downtown Toronto,Commerce Court
116,M5L,Downtown Toronto,Victoria Hotel


Unnamed: 0,PostalCode,Borough,Neighborhood
130,M5M,North York,Bedford Park
131,M5M,North York,Lawrence Manor East


Unnamed: 0,PostalCode,Borough,Neighborhood
145,M5N,Central Toronto,Roselawn


Unnamed: 0,PostalCode,Borough,Neighborhood
157,M5P,Central Toronto,Forest Hill North
158,M5P,Central Toronto,Forest Hill West


Unnamed: 0,PostalCode,Borough,Neighborhood
169,M5R,Central Toronto,The Annex
170,M5R,Central Toronto,North Midtown
171,M5R,Central Toronto,Yorkville


Unnamed: 0,PostalCode,Borough,Neighborhood
184,M5S,Downtown Toronto,Harbord
185,M5S,Downtown Toronto,University of Toronto


Unnamed: 0,PostalCode,Borough,Neighborhood
198,M5T,Downtown Toronto,Chinatown
199,M5T,Downtown Toronto,Grange Park
200,M5T,Downtown Toronto,Kensington Market


Unnamed: 0,PostalCode,Borough,Neighborhood
216,M5V,Downtown Toronto,CN Tower
217,M5V,Downtown Toronto,Bathurst Quay
218,M5V,Downtown Toronto,Island airport
219,M5V,Downtown Toronto,Harbourfront West
220,M5V,Downtown Toronto,King and Spadina
221,M5V,Downtown Toronto,Railway Lands
222,M5V,Downtown Toronto,South Niagara


Unnamed: 0,PostalCode,Borough,Neighborhood
240,M5W,Downtown Toronto,Stn A PO Boxes 25 The Esplanade


Unnamed: 0,PostalCode,Borough,Neighborhood
251,M5X,Downtown Toronto,First Canadian Place
252,M5X,Downtown Toronto,Underground city


Unnamed: 0,PostalCode,Borough,Neighborhood
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor


Unnamed: 0,PostalCode,Borough,Neighborhood
19,M6B,North York,Glencairn


Unnamed: 0,PostalCode,Borough,Neighborhood
35,M6C,York,Humewood-Cedarvale


Unnamed: 0,PostalCode,Borough,Neighborhood
49,M6E,York,Caledonia-Fairbanks


Unnamed: 0,PostalCode,Borough,Neighborhood
58,M6G,Downtown Toronto,Christie


Unnamed: 0,PostalCode,Borough,Neighborhood
71,M6H,West Toronto,Dovercourt Village
72,M6H,West Toronto,Dufferin


Unnamed: 0,PostalCode,Borough,Neighborhood
86,M6J,West Toronto,Little Portugal
87,M6J,West Toronto,Trinity


Unnamed: 0,PostalCode,Borough,Neighborhood
101,M6K,West Toronto,Brockton
102,M6K,West Toronto,Exhibition Place
103,M6K,West Toronto,Parkdale Village


Unnamed: 0,PostalCode,Borough,Neighborhood
117,M6L,North York,Downsview
118,M6L,North York,North Park
119,M6L,North York,Upwood Park


Unnamed: 0,PostalCode,Borough,Neighborhood
132,M6M,York,Del Ray
133,M6M,York,Keelesdale
134,M6M,York,Mount Dennis
135,M6M,York,Silverthorn


Unnamed: 0,PostalCode,Borough,Neighborhood
146,M6N,York,The Junction North
147,M6N,York,Runnymede


Unnamed: 0,PostalCode,Borough,Neighborhood
159,M6P,West Toronto,High Park
160,M6P,West Toronto,The Junction South


Unnamed: 0,PostalCode,Borough,Neighborhood
172,M6R,West Toronto,Parkdale
173,M6R,West Toronto,Roncesvalles


Unnamed: 0,PostalCode,Borough,Neighborhood
186,M6S,West Toronto,Runnymede
187,M6S,West Toronto,Swansea


Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Queen's Park


Unnamed: 0,PostalCode,Borough,Neighborhood
174,M7R,Mississauga,Canada Post Gateway Processing Centre


Unnamed: 0,PostalCode,Borough,Neighborhood
265,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern


Unnamed: 0,PostalCode,Borough,Neighborhood
225,M8V,Etobicoke,Humber Bay Shores
226,M8V,Etobicoke,Mimico South
227,M8V,Etobicoke,New Toronto


Unnamed: 0,PostalCode,Borough,Neighborhood
243,M8W,Etobicoke,Alderwood
244,M8W,Etobicoke,Long Branch


Unnamed: 0,PostalCode,Borough,Neighborhood
255,M8X,Etobicoke,The Kingsway
256,M8X,Etobicoke,Montgomery Road
257,M8X,Etobicoke,Old Mill North


Unnamed: 0,PostalCode,Borough,Neighborhood
266,M8Y,Etobicoke,Humber Bay
267,M8Y,Etobicoke,King's Mill Park
268,M8Y,Etobicoke,Kingsway Park South East
269,M8Y,Etobicoke,Mimico NE
270,M8Y,Etobicoke,Old Mill South
271,M8Y,Etobicoke,The Queensway East
272,M8Y,Etobicoke,Royal York South East
273,M8Y,Etobicoke,Sunnylea


Unnamed: 0,PostalCode,Borough,Neighborhood
282,M8Z,Etobicoke,Kingsway Park South West
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West
286,M8Z,Etobicoke,South of Bloor


Unnamed: 0,PostalCode,Borough,Neighborhood
10,M9A,Etobicoke,Islington Avenue


Unnamed: 0,PostalCode,Borough,Neighborhood
22,M9B,Etobicoke,Cloverdale
23,M9B,Etobicoke,Islington
24,M9B,Etobicoke,Martin Grove
25,M9B,Etobicoke,Princess Gardens
26,M9B,Etobicoke,West Deane Park


Unnamed: 0,PostalCode,Borough,Neighborhood
38,M9C,Etobicoke,Bloordale Gardens
39,M9C,Etobicoke,Eringate
40,M9C,Etobicoke,Markland Wood
41,M9C,Etobicoke,Old Burnhamthorpe


Unnamed: 0,PostalCode,Borough,Neighborhood
122,M9L,North York,Humber Summit


Unnamed: 0,PostalCode,Borough,Neighborhood
138,M9M,North York,Emery
139,M9M,North York,Humberlea


Unnamed: 0,PostalCode,Borough,Neighborhood
150,M9N,York,Weston


Unnamed: 0,PostalCode,Borough,Neighborhood
163,M9P,Etobicoke,Westmount


Unnamed: 0,PostalCode,Borough,Neighborhood
176,M9R,Etobicoke,Kingsview Village
177,M9R,Etobicoke,Martin Grove Gardens
178,M9R,Etobicoke,Richview Gardens
179,M9R,Etobicoke,St. Phillips


Unnamed: 0,PostalCode,Borough,Neighborhood
228,M9V,Etobicoke,Albion Gardens
229,M9V,Etobicoke,Beaumond Heights
230,M9V,Etobicoke,Humbergate
231,M9V,Etobicoke,Jamestown
232,M9V,Etobicoke,Mount Olive
233,M9V,Etobicoke,Silverstone
234,M9V,Etobicoke,South Steeles
235,M9V,Etobicoke,Thistletown


Unnamed: 0,PostalCode,Borough,Neighborhood
245,M9W,Etobicoke,Northwest


#### Create final dataframe

In [21]:
column_names = ['PostalCode','Borough','Neighborhood']
df = pd.DataFrame(columns=column_names)

for name, group in pc_grouped:
    p = list(dict.fromkeys(group.PostalCode))[0]
    b = list(dict.fromkeys(group.Borough))[0]
    n = list(dict.fromkeys(group.Neighborhood))
    df = df.append({'PostalCode': p, 'Borough': b, 'Neighborhood': ', '.join(n)}, ignore_index=True)

df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


Final shape:

In [22]:
df.shape

(103, 3)
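
The loop with `df.append` copies the dataframe on every iteration. As a hedged alternative, the same merge can be sketched with a single `groupby(...).agg(...)` call. The frame below is a small excerpt of the data shown above, and the name `merged` is introduced only for this example.

```python
import pandas as pd

sample = pd.DataFrame({
    'PostalCode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Rouge', 'Malvern', 'Woburn'],
})

merged = sample.groupby('PostalCode', as_index=False).agg({
    # like the loop above, keep the first borough found for the postal code
    'Borough': 'first',
    # join the neighborhoods in order of appearance, dropping duplicates
    'Neighborhood': lambda s: ', '.join(dict.fromkeys(s)),
})

print(merged)
```

This also sidesteps `DataFrame.append`, which later pandas releases deprecated and then removed.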

# Get geospatial coordinates of the neighborhoods

In [23]:
# import geocoder

In [24]:
# # initialize your variable to None
# lat_lng_coords = None
# postal_code = 'M9W'

# # loop until you get the coordinates
# while(lat_lng_coords is None):
#     g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#     lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]
# print(latitude,longitude)

In [25]:
# g = geocoder.canadapost('{}, Toronto, Ontario'.format(postal_code))
# g

In [26]:
# g = geocoder.google('M9N, Toronto, Ontario')
# g

Since `geocoder.google` does not seem to work (it returns `REQUEST DENIED`) and `geocoder.canadapost` does not provide latitude or longitude, we use the provided `Geospatial_Coordinates.csv` file.
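
Note that the commented-out `while` loop above would never terminate if every request is denied. A minimal sketch of a bounded retry is shown below. The helper `retry_until_result` is hypothetical (it is not part of `geocoder`), and the geocoder call is replaced by a stand-in that fails twice before returning placeholder coordinates.

```python
import time


def retry_until_result(fetch, attempts=3, delay=0.0):
    """Call fetch() up to `attempts` times; return the first non-None result, else None."""
    for _ in range(attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(delay)  # wait before the next attempt
    return None


# Stand-in for a geocoder call: None twice, then placeholder coordinates
responses = iter([None, None, [43.70, -79.40]])
coords = retry_until_result(lambda: next(responses), attempts=5)
print(coords)

# A service that always fails gives up after three attempts instead of looping forever
always_denied = retry_until_result(lambda: None, attempts=3)
print(always_denied)
```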

In [27]:
gc = pd.read_csv('Geospatial_Coordinates.csv')
gc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [28]:
gc.shape

(103, 3)

Merge the neighborhood dataset with the coordinates dataset. Note that the key column is named `PostalCode` in the first and `Postal Code` in the second.

In [29]:
toronto_data = pd.merge(df, gc, left_on='PostalCode', right_on='Postal Code')

In [39]:
toronto_data.drop(columns='Postal Code',inplace=True)

In [40]:
toronto_data.shape

(103, 5)
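
Both frames have 103 rows here, so the default inner merge loses nothing. When that is not guaranteed, a left merge with `indicator=True` makes unmatched postal codes visible. The sketch below uses two coordinate rows from the CSV above plus a made-up code, `M9Z`, that has no match. The names `left`, `right`, `checked` and `unmatched` are introduced only for this example.

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],   # 'M9Z' is a made-up code with no coordinates
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
})
right = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

checked = pd.merge(
    left, right,
    left_on='PostalCode', right_on='Postal Code',
    how='left',             # keep every row of the left frame
    validate='one_to_one',  # raise if a postal code is duplicated on either side
    indicator=True,         # add a '_merge' column describing each row's origin
)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```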

# Maps and segmentation of the city of Toronto

Import libraries for maps and clustering

In [56]:
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import folium # map rendering library
# import k-means from scikit-learn
from sklearn.cluster import KMeans

## Create a Folium Map for Toronto

First get the address of Toronto to center the map:

In [32]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Then use the coordinates saved in `toronto_data` to place a pin on the map at the center of each postal-code area. To begin with, let's plot a single marker per borough (the first postal code listed for each one).

In [48]:
neighborhoods = toronto_data.groupby('Borough').head(1)
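
To make explicit what `groupby('Borough').head(1)` keeps, here is a small sketch on an excerpt of the data. The frame `sample` and the name `first_per_borough` exist only in this example.

```python
import pandas as pd

sample = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M3A'],
    'Borough': ['Scarborough', 'Scarborough', 'North York'],
})

# head(1) keeps the first row of each group and preserves the original row order
first_per_borough = sample.groupby('Borough').head(1)
print(first_per_borough)
```

The result is therefore one representative postal code per borough, not a centroid of the borough.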

In [49]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(width='100%',height='100%',location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postal, borough in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['PostalCode'], neighborhoods['Borough']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup('{}, {}'.format(postal, borough), parse_html=True),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

In [50]:
map_toronto

### Explore Downtown Toronto

For illustrative purposes, let's focus on Downtown Toronto.

In [51]:
downtown_toronto = toronto_data[toronto_data['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


In [57]:
downtown_toronto.shape

(18, 5)

Visualize the 18 postal codes on the map after getting the coordinates of Downtown Toronto:

In [52]:
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [54]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown = folium.Map(width='100%',height='100%',location=[latitude, longitude], zoom_start=13)

# add markers to map
# (the loop variable is named `neighborhood` so that it does not overwrite the `neighborhoods` dataframe)
for lat, lng, postal, neighborhood in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], downtown_toronto['PostalCode'], downtown_toronto['Neighborhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup('{}, {}'.format(postal, neighborhood), parse_html=True),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)

In [55]:
map_downtown

We can now reproduce the same analysis we did for Manhattan, New York.

## Clustering the neighborhoods of Downtown Toronto

There are 18 different postal codes in Downtown Toronto, corresponding to different neighborhoods.

#### Use the Foursquare API

Gather the credentials to access the API (they are stored in a file). Then define the query that retrieves the first 100 venues within a 500 m radius of a given latitude and longitude (Foursquare needs the geospatial coordinates).

In [58]:
# get credentials from file
with open('../credentials.json') as f:
    cred = json.load(f)
CLIENT_ID = cred['client_id'] # your Foursquare ID
CLIENT_SECRET = cred['client_secret'] # your Foursquare Secret
VERSION = '20180605'
LIMIT = 100

For illustration purposes, let's fix the postal code to `M5T`; later we will repeat the whole process for the full list of postal codes.

In [59]:
radius = 500
latitude = downtown_toronto.query('PostalCode=="M5T"').Latitude.values[0]
longitude = downtown_toronto.query('PostalCode=="M5T"').Longitude.values[0]
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
# print the URL with the credentials masked, so that they do not end up in the saved output
print(url.replace(CLIENT_ID, '<CLIENT_ID>').replace(CLIENT_SECRET, '<CLIENT_SECRET>'))

https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=43.6532057,-79.4000493&v=20180605&radius=500&limit=100


In [60]:
results = requests.get(url).json()
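
Formatting the query string by hand is easy to get slightly wrong. A hedged sketch of the alternative is to keep the parameters in a dictionary and let the library encode them. The example below uses the standard library's `urlencode` so that it runs without network access, and the credential values are placeholders. `requests.get` accepts the same dictionary through its `params` argument.

```python
from urllib.parse import urlencode

base_url = 'https://api.foursquare.com/v2/venues/explore'
params = {
    'client_id': 'YOUR_CLIENT_ID',          # placeholder, not a real credential
    'client_secret': 'YOUR_CLIENT_SECRET',  # placeholder, not a real credential
    'v': '20180605',
    'll': '43.6532057,-79.4000493',         # latitude,longitude of postal code M5T
    'radius': 500,
    'limit': 100,
}

# urlencode joins the pairs with '&' and percent-encodes the values
query = urlencode(params)
print('{}?{}'.format(base_url, query))

# With requests, the same dictionary can be passed directly:
# results = requests.get(base_url, params=params).json()
```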

In [62]:
results

{'meta': {'code': 200, 'requestId': '5d63753c018cbb002c0c78fc'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Kensington',
  'headerFullLocation': 'Kensington, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 109,
  'suggestedBounds': {'ne': {'lat': 43.6577057045, 'lng': -79.3938414091248},
   'sw': {'lat': 43.6487056955, 'lng': -79.40625719087521}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b116957f964a520087c23e3',
       'name': 'Kid Icarus',
       'location': {'address': '205 Augusta Ave.',
        'crossStreet': 'Denison Square',
        'lat': 43.653933260442265,
        'lng': -79.40171859012935,
        'labeledLatLngs': [{'label': 'disp

#### Clean up and collect different venues for a specific postal code

We extract the information about the venues from the result of the API call.

In [63]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [64]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Kid Icarus,Arts & Crafts Store,43.653933,-79.401719
1,Seven Lives - Tacos y Mariscos,Mexican Restaurant,43.654418,-79.400545
2,Little Pebbles,Coffee Shop,43.654883,-79.400264
3,El Rey,Cocktail Bar,43.652764,-79.400048
4,The Moonbean Cafe,Café,43.654147,-79.400182


In [65]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


We need to repeat this process for all the neighborhoods (postal codes) in Downtown Toronto. This is analogous to what we did for Manhattan, New York.

#### Define a function to gather venues from Foursquare for all neighborhoods

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list]) # just a nested loop using list comprehension:https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
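
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]`, which raises an `IndexError` for a venue that has no category. As a hedged sketch, the flattening step can be isolated in a small helper that returns `None` in that case, mirroring `get_category_type` above. The helper name `venue_rows` and the second sample venue are hypothetical; the first venue is taken from the response shown earlier.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


sample_items = [
    {'venue': {'name': 'Kid Icarus',
               'location': {'lat': 43.653933, 'lng': -79.401719},
               'categories': [{'name': 'Arts & Crafts Store'}]}},
    # made-up venue with an empty category list
    {'venue': {'name': 'Hypothetical Venue',
               'location': {'lat': 43.65, 'lng': -79.40},
               'categories': []}},
]

rows = venue_rows('Chinatown, Grange Park, Kensington Market',
                  43.6532057, -79.4000493, sample_items)
for row in rows:
    print(row)
```

Keeping this step separate from the HTTP call also makes it possible to check the parsing on a saved response without calling the API.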

#### Get all venues

Run the function for all the rows in the `downtown_toronto` dataframe

In [67]:
venues = getNearbyVenues(names=downtown_toronto['Neighborhood'],
                        latitudes=downtown_toronto['Latitude'],
                        longitudes=downtown_toronto['Longitude']
                        )

Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie


We now have venues for each list of neighborhoods (one list corresponds to a specific postal code).

In [69]:
venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,56,56,56,56,56,56
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",16,16,16,16,16,16
"Cabbagetown, St. James Town",46,46,46,46,46,46
Central Bay Street,84,84,84,84,84,84
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,15,15,15,15,15,15
Church and Wellesley,84,84,84,84,84,84
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100


In [70]:
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))

There are 206 unique categories.


### Analyze the neighborhoods

In order to use a clustering algorithm to segment Downtown Toronto, we need to transform the venues and venue categories into numbers. We want to use these numbers as features for each neighborhood.
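
As a small illustration of this transformation, the sketch below one-hot encodes a made-up table of four venues and averages the indicators per neighborhood. The data and the `toy_*` names are hypothetical; only `pd.get_dummies`, `DataFrame.insert` and `groupby(...).mean()` are standard pandas calls.

```python
import pandas as pd

# toy data: three venues in neighborhood A, one in B (made-up values)
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Park'],
})

# one indicator column per category
toy_onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
# put the neighborhood first; insert() raises a ValueError if a
# category is itself called 'Neighborhood', instead of overwriting it
toy_onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# the mean of the indicators is the share of each category per neighborhood
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Each row of `toy_grouped` sums to 1 across the category columns, which is what makes neighborhoods with different numbers of venues comparable.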

In [71]:
# one hot encoding of categories for clustering
toronto_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
# add the neighborhood column to the one-hot dataframe
toronto_onehot['Neighborhood'] = venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [72]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,...,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,...,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.0625,0.0625,0.0625,0.125,0.1875,0.125,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,...,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,...,0.0,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0
5,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.04,0.01,0.0
6,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Church and Wellesley,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,...,0.0,0.0,0.011905,0.011905,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.0,0.0
8,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
9,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0


In [73]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [74]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions after the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Breakfast Spot,Gym,Restaurant,Asian Restaurant,Hotel
1,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Bakery,Seafood Restaurant,Farmers Market,Steakhouse,Cheese Shop,Café,Park
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Coffee Shop,Plane,Sculpture Garden,Boutique,Boat or Ferry,Airport Food Court
3,"Cabbagetown, St. James Town",Coffee Shop,Park,Restaurant,Café,Italian Restaurant,Pizza Place,Pub,Bakery,Gym / Fitness Center,American Restaurant
4,Central Bay Street,Coffee Shop,Italian Restaurant,Ice Cream Shop,Middle Eastern Restaurant,Sandwich Place,Burger Joint,Café,Bubble Tea Shop,Spa,Bakery
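
The suffix logic in the cell above is correct for the ten columns used here, but it would label positions such as 21 as `21th` if `num_top_venues` were larger. A minimal sketch of a general helper follows; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# same column names as above, built without try/except
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
```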


### Cluster the neighborhoods

Run the k-means clustering algorithm with 4 clusters.

In [78]:
# set number of clusters
kclusters = 4

# remove the Neighborhood name from the dataframe, leaving only the category frequencies (one column per venue category) for the 18 neighborhoods
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([3, 3, 2, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3], dtype=int32)

Changing the number of clusters does not seem to matter much. Most of the neighborhoods fall into the same cluster because they are quite similar (they are all in Downtown Toronto, after all). With 4 clusters, the exceptions are the third, seventh and 15th rows, each of which forms its own cluster.
These are very specific neighborhoods: for example, the third has an airport and the 15th is dominated by parks.
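
A quick tally of the labels printed above makes the imbalance concrete. This sketch uses only the standard library; the list is copied by hand from the `kmeans.labels_` output, and the variable names are illustrative.

```python
from collections import Counter

# labels copied from the kmeans.labels_ output above
labels = [3, 3, 2, 3, 3, 3, 0, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3]

cluster_sizes = Counter(labels)
# 0-based positions of neighborhoods that sit alone in their cluster
singletons = [i for i, label in enumerate(labels) if cluster_sizes[label] == 1]

print(dict(cluster_sizes))
print(singletons)
```

The positions in `singletons` index the rows of `toronto_grouped`, so they can be passed to `.iloc` to inspect those neighborhoods.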

In [82]:
neighborhoods_venues_sorted.iloc[[2,14]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Coffee Shop,Plane,Sculpture Garden,Boutique,Boat or Ferry,Airport Food Court
14,Rosedale,Park,Playground,Trail,Building,Women's Store,Dim Sum Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### Map the different clusters

Create a dataframe with the cluster label for each neighborhood and plot the neighborhoods on a map, using a different marker color for each cluster.

In [83]:
# add clustering labels to the sorted venue dataframe
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_) # insert as first column.

downtown_merged = downtown_toronto

# join downtown_toronto (which holds latitude/longitude) with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,1,Park,Playground,Trail,Building,Women's Store,Dim Sum Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,3,Coffee Shop,Park,Restaurant,Café,Italian Restaurant,Pizza Place,Pub,Bakery,Gym / Fitness Center,American Restaurant
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,3,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Pub,Men's Store,Gastropub,Hotel,Fast Food Restaurant
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,3,Coffee Shop,Café,Pub,Bakery,Park,Theater,Breakfast Spot,Mexican Restaurant,Restaurant,Spa
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,3,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Café,Ramen Restaurant,Diner,Italian Restaurant,Ice Cream Shop,Bubble Tea Shop


The map is centered on the latitude and longitude of Downtown Toronto.

In [84]:
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [85]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))  # array of colors from the rainbow colormap
rainbow = [colors.rgb2hex(i) for i in colors_array]  # get HEX code for each color

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

ModuleNotFoundError: No module named 'matplotlib'

In [86]:
map_clusters

ModuleNotFoundError: No module named 'matplotlib'
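
The `ModuleNotFoundError` above means matplotlib was not installed in the environment where the notebook ran; installing it is the direct fix. As an alternative sketch, the cluster colors can be generated with the standard-library `colorsys` module instead. The helper name `cluster_palette` and the saturation and brightness values are assumptions chosen for illustration.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced HEX colors (hypothetical helper)."""
    palette = []
    for i in range(n_clusters):
        # spread the hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

# one color per k-means cluster (4 clusters in this notebook)
rainbow = cluster_palette(4)
```

The resulting `rainbow` list can be indexed by cluster label in the `folium.CircleMarker` loop, in place of the matplotlib-based list.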