# Segmenting and Clustering Neighborhoods in Toronto
## Applied data capstone

First let's import the necessary packages.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import json
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

## 1 - Building the Dataframe

Now I download the HTML source of the Wikipedia page, store it in a variable, and parse it with BeautifulSoup.

In [2]:
wikipage = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [3]:
soup = BeautifulSoup(wikipage, 'lxml')

Here I use BeautifulSoup to select only the table with class "wikitable sortable", i.e. the table that holds the information I want.

In [4]:
toronto_table = soup.find('table', {'class':'wikitable sortable'})

Here I collect every table row ("tr" element) in the "row" list, again with BeautifulSoup.

In [5]:
row = toronto_table.find_all('tr')
len(row)

288

Now I loop over all the rows (skipping the header), extract the cells of each row, and append each value to the corresponding list. The "get_text(strip=True)" call returns the text of a cell without the HTML tags, such as "td" and "tr", and without surrounding whitespace. If the neighborhood is "Not assigned", an "if" statement uses the borough name as the neighborhood instead.

At the end I create the dataframe and discard every row whose borough is "Not assigned".

In [6]:
postal_code=[]
borough=[]
neighborhood=[]

for i in range(1,len(row)):
    column = row[i].find_all('td')
    postal_code.append(column[0].get_text(strip=True))
    borough.append(column[1].get_text(strip=True))
    if column[2].get_text(strip=True) == 'Not assigned':
        neighborhood.append(borough[i-1])
    else:
        neighborhood.append(column[2].get_text(strip=True))

In [7]:
merge={'postal_code':postal_code, 'borough':borough, 'neighborhood':neighborhood}
df=pd.DataFrame(merge)
df=df[df.borough!='Not assigned']
df.reset_index(inplace=True, drop=True)
df.head(10)

Unnamed: 0,postal_code,borough,neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Queen's Park,Queen's Park
6,M9A,Queen's Park,Queen's Park
7,M1B,Scarborough,Rouge
8,M1B,Scarborough,Malvern
9,M3B,North York,Don Mills North
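The "Not assigned" handling described above can also be written with pandas alone. This is only a minimal sketch on a hypothetical three-row sample (the names `raw`, `clean` and `missing` are mine and do not appear in the notebook); it assumes the scraped cells have already been loaded into a dataframe with the same column names.

```python
import pandas as pd

# Hypothetical sample shaped like the scraped table (not the full page).
raw = pd.DataFrame({
    'postal_code': ['M1A', 'M3A', 'M7A'],
    'borough': ['Not assigned', 'North York', "Queen's Park"],
    'neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Drop rows without a borough, then fall back to the borough name
# wherever the neighborhood is missing.
clean = raw[raw['borough'] != 'Not assigned'].copy()
missing = clean['neighborhood'] == 'Not assigned'
clean.loc[missing, 'neighborhood'] = clean.loc[missing, 'borough']
clean = clean.reset_index(drop=True)

print(clean)
```

Filtering first and filling afterwards avoids relying on the position of each row in the lists, which is what the loop above does with "borough[i-1]".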


Now I create a dictionary with the "groupby" method from pandas. The dictionary groups the rows by postal code and gives me, for each postal code, the index of the rows that belong to it.

Finally, for each postal code I build a single string with all of its neighborhoods, using the "join" function and separating them with a comma.

In [8]:
dictio=df.groupby(['postal_code']).groups
dictio

{'M1B': Int64Index([7, 8], dtype='int64'),
 'M1C': Int64Index([20, 21, 22], dtype='int64'),
 'M1E': Int64Index([32, 33, 34], dtype='int64'),
 'M1G': Int64Index([38], dtype='int64'),
 'M1H': Int64Index([42], dtype='int64'),
 'M1J': Int64Index([53], dtype='int64'),
 'M1K': Int64Index([65, 66, 67], dtype='int64'),
 'M1L': Int64Index([78, 79, 80], dtype='int64'),
 'M1M': Int64Index([92, 93, 94], dtype='int64'),
 'M1N': Int64Index([107, 108], dtype='int64'),
 'M1P': Int64Index([116, 117, 118], dtype='int64'),
 'M1R': Int64Index([126, 127], dtype='int64'),
 'M1S': Int64Index([140], dtype='int64'),
 'M1T': Int64Index([146, 147, 148], dtype='int64'),
 'M1V': Int64Index([154, 155, 156, 157], dtype='int64'),
 'M1W': Int64Index([181], dtype='int64'),
 'M1X': Int64Index([187], dtype='int64'),
 'M2H': Int64Index([43], dtype='int64'),
 'M2J': Int64Index([54, 55, 56], dtype='int64'),
 'M2K': Int64Index([68], dtype='int64'),
 'M2L': Int64Index([81, 82], dtype='int64'),
 'M2M': Int64Index([95, 96], dty

In [9]:
s=','
neigh=[]
for code,ind in dictio.items():
    a=[]
    for i in ind:
        a.append(df.neighborhood[i])   
    b=s.join(a)
    neigh.append(b)
neigh

['Rouge,Malvern',
 'Highland Creek,Rouge Hill,Port Union',
 'Guildwood,Morningside,West Hill',
 'Woburn',
 'Cedarbrae',
 'Scarborough Village',
 'East Birchmount Park,Ionview,Kennedy Park',
 'Clairlea,Golden Mile,Oakridge',
 'Cliffcrest,Cliffside,Scarborough Village West',
 'Birch Cliff,Cliffside West',
 'Dorset Park,Scarborough Town Centre,Wexford Heights',
 'Maryvale,Wexford',
 'Agincourt',
 "Clarks Corners,Sullivan,Tam O'Shanter",
 "Agincourt North,L'Amoreaux East,Milliken,Steeles East",
 "L'Amoreaux West",
 'Upper Rouge',
 'Hillcrest Village',
 'Fairview,Henry Farm,Oriole',
 'Bayview Village',
 'Silver Hills,York Mills',
 'Newtonbrook,Willowdale',
 'Willowdale South',
 'York Mills West',
 'Willowdale West',
 'Parkwoods',
 'Don Mills North',
 'Flemingdon Park,Don Mills South',
 'Bathurst Manor,Downsview North,Wilson Heights',
 'Northwood Park,York University',
 'CFB Toronto,Downsview East',
 'Downsview West',
 'Downsview Central',
 'Downsview Northwest',
 'Victoria Village',
 'Woodb

Now I create the table of postal codes and boroughs, discarding the duplicates.

In [10]:
df_unique=df[['postal_code','borough']].drop_duplicates()
df_unique.head(10)

Unnamed: 0,postal_code,borough
0,M3A,North York
1,M4A,North York
2,M5A,Downtown Toronto
3,M6A,North York
5,M7A,Queen's Park
6,M9A,Queen's Park
7,M1B,Scarborough
9,M3B,North York
10,M4B,East York
12,M5B,Downtown Toronto


Here I create the new dataframe with the grouped neighborhoods, and the postal code as index.

In [11]:
df_merge=pd.DataFrame(neigh,dictio.keys())
df_merge.head(10)

Unnamed: 0,0
M1B,"Rouge,Malvern"
M1C,"Highland Creek,Rouge Hill,Port Union"
M1E,"Guildwood,Morningside,West Hill"
M1G,Woburn
M1H,Cedarbrae
M1J,Scarborough Village
M1K,"East Birchmount Park,Ionview,Kennedy Park"
M1L,"Clairlea,Golden Mile,Oakridge"
M1M,"Cliffcrest,Cliffside,Scarborough Village West"
M1N,"Birch Cliff,Cliffside West"


Here I add the borough column with the "merge" method from pandas, using the postal code as the common key so that each borough stays aligned with the right row. After that I rename the columns with the correct names.

In [12]:
df_final = df_unique.merge(df_merge, left_on='postal_code', right_index=True)
df_final.reset_index(inplace=True, drop=True)
final_columns={'postal_code':'PostalCode','borough':'Borough',0:'Neighborhood'}
df_final.rename(columns=final_columns, inplace=True)
df_final.shape

(103, 3)
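The grouping steps of this section (the "groups" dictionary, the "join" loop, "drop_duplicates" and "merge") could probably be condensed into a single "groupby" followed by "agg". The sketch below runs on a hypothetical five-row sample (`df_toy` and `grouped` are illustrative names) and assumes, as in the table above, that each postal code belongs to exactly one borough.

```python
import pandas as pd

# Hypothetical sample with the same columns as "df".
df_toy = pd.DataFrame({
    'postal_code': ['M6A', 'M6A', 'M1B', 'M1B', 'M3A'],
    'borough': ['North York', 'North York', 'Scarborough',
                'Scarborough', 'North York'],
    'neighborhood': ['Lawrence Heights', 'Lawrence Manor',
                     'Rouge', 'Malvern', 'Parkwoods'],
})

# Group by postal code and borough, keeping the order of first appearance,
# and join the neighborhoods of each group with a comma.
grouped = (
    df_toy.groupby(['postal_code', 'borough'], sort=False)['neighborhood']
    .agg(','.join)
    .reset_index()
)
grouped.columns = ['PostalCode', 'Borough', 'Neighborhood']

print(grouped)
```

Passing "sort=False" keeps the postal codes in the order in which they appear in the source table, which matches the row order of "df_final".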

## 2 - Adding the geographical coordinates

Here I load the geographical coordinates of the postal codes into a new dataframe and merge the two dataframes as before. The coordinates come from the CSV file linked on the description page of the project.

In [13]:
geo=pd.read_csv('Geospatial_Coordinates.csv')

In [14]:
df_geo = df_final.merge(geo, left_on='PostalCode', right_on='Postal Code')
df_geo.drop(columns=['Postal Code'], inplace=True)
df_geo.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Queen's Park,Queen's Park,43.667856,-79.532242
6,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657162,-79.378937


## 3 - Clustering the postal codes

### 3.1 - Prepare the dataframe

Before clustering I need features, which I obtain by requesting from Foursquare the venues around each postal code.

__Important: for this purpose, each postal code will be considered a neighborhood__

In [15]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 50
radius = 500

Now I create a function "getNearbyVenues" that iterates over all the neighborhoods (or postal codes) and sends a request to the Foursquare server for each one, to find up to 50 venues within a radius of 500 metres of its geographical coordinates.

The result of each request, a JSON document, is then processed and only the relevant information is stored in the "nearby_venues" dataframe.
The inputs of the function are Series with the names of the locations and their geographical coordinates. The output is a dataframe built from the results of the requests, with the category and position of each venue.

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
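The chained lookup on the JSON response raises an exception if a level is missing or if a venue has an empty "categories" list. I do not know how often the service returns such responses, so the following is only a defensive sketch: `extract_venues` and the "Unknown" placeholder are hypothetical, and the sample payload is hand-written to mimic the fields accessed in "getNearbyVenues".

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue lists no category.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload; not an actual response from the service.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))  # no groups: empty list
```

Keeping the parsing in a separate function also makes it possible to check it without sending any request.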

With the function defined, I make the actual requests: I pass the input Series and store the output in the "Toronto_neigh_venues" dataframe.

In [17]:
Toronto_neigh_venues = getNearbyVenues(names=df_geo['PostalCode'],
                                   latitudes=df_geo['Latitude'],
                                   longitudes=df_geo['Longitude']
                                  )
Toronto_neigh_venues.head()

M3A
M4A
M5A
M6A
M7A
M9A
M1B
M3B
M4B
M5B
M6B
M9B
M1C
M3C
M4C
M5C
M6C
M9C
M1E
M4E
M5E
M6E
M1G
M4G
M5G
M6G
M1H
M2H
M3H
M4H
M5H
M6H
M1J
M2J
M3J
M4J
M5J
M6J
M1K
M2K
M3K
M4K
M5K
M6K
M1L
M2L
M3L
M4L
M5L
M6L
M9L
M1M
M2M
M3M
M4M
M5M
M6M
M9M
M1N
M2N
M3N
M4N
M5N
M6N
M9N
M1P
M2P
M4P
M5P
M6P
M9P
M1R
M2R
M4R
M5R
M6R
M7R
M9R
M1S
M4S
M5S
M6S
M1T
M4T
M5T
M1V
M4V
M5V
M8V
M9V
M1W
M4W
M5W
M8W
M9W
M1X
M4X
M5X
M8X
M4Y
M7Y
M8Y
M8Z


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,M4A,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,M4A,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,M4A,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


This is a backup of the dataframe just created, in case I reach the limit of requests to the Foursquare server. In that case I only have to reload the "Toronto_neigh_venues.csv" file stored on my PC.

In [18]:
Toronto_neigh_venues.to_csv(r'C:\X Federico Bianchi Drive\Lab Python\Applied Data Science\Toronto_neigh_venues.csv')
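When the backup is read back, the index written by "to_csv" comes back as an extra column named "Unnamed: 0" unless it is declared as the index. A minimal sketch of the round trip, using an in-memory buffer in place of the file on disk (the two-row `venues` sample is hypothetical):

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for "Toronto_neigh_venues".
venues = pd.DataFrame({
    'Neighborhood': ['M3A', 'M4A'],
    'Venue': ['Brookbanks Park', 'Tim Hortons'],
    'Venue Category': ['Park', 'Coffee Shop'],
})

# Write to an in-memory buffer instead of a file path.
buffer = io.StringIO()
venues.to_csv(buffer)

# Without index_col the saved index becomes a column called "Unnamed: 0".
buffer.seek(0)
naive = pd.read_csv(buffer)

# With index_col=0 the saved index is restored as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(naive.columns.tolist())
print(restored.columns.tolist())
```

An alternative would be to write the file with "index=False", since the default integer index carries no information here.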


Now I check the number of venues found for each postal code. As you can see, many locations have very few venues, so the result of the clustering may not be very meaningful.

In [19]:
Toronto_neigh_venues.groupby('Neighborhood').count().head()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M1B,1,1,1,1,1,1
M1C,2,2,2,2,2,2
M1E,9,9,9,9,9,9
M1G,5,5,5,5,5,5
M1H,8,8,8,8,8,8


The following operation is very important: I replace the categorical variable holding the type of venue with a new dataframe in which each category is a separate column (one-hot encoding). For each venue, the value is "1" only in the column of its own category and "0" everywhere else. This operation can be easily done with the "get_dummies" function from pandas. The resulting dataframe is "Toronto_dummies".

In [20]:
Toronto_dummies = pd.get_dummies(Toronto_neigh_venues[['Venue Category']], prefix="", prefix_sep="")
Toronto_dummies['Neighborhood'] = Toronto_neigh_venues['Neighborhood'] 
neighborhoods = Toronto_dummies['Neighborhood']
Toronto_dummies.drop(labels=['Neighborhood'], axis=1,inplace = True)
Toronto_dummies.insert(0, 'Neighborhood', neighborhoods)

Toronto_dummies.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


This is another important step of the data preparation. Here I aggregate all the venues of a postal code into a single row and take the mean of each column. The mean of a one-hot column is the relative frequency of that category, so this is effectively a normalization of the data: the values in each row sum to "1".

In [21]:
Toronto_dummies = Toronto_dummies.groupby('Neighborhood').mean().reset_index()
Toronto_dummies.head(10)

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M1J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M1K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,M1L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M1M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M1N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
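The claim that each row sums to 1 can be checked on a small example. The sketch below uses the five venues shown for M3A and M4A in the output of cell In [17]; the names `onehot` and `freq` are only illustrative.

```python
import pandas as pd

# The five venues listed for M3A and M4A in the output of In [17].
venues = pd.DataFrame({
    'Neighborhood': ['M3A', 'M3A', 'M4A', 'M4A', 'M4A'],
    'Venue Category': ['Park', 'Food & Drink Shop', 'Hockey Arena',
                       'Coffee Shop', 'Portuguese Restaurant'],
})

# One column per category, 1.0 for the category of the venue, 0.0 otherwise.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean per postal code = relative frequency of each category.
freq = onehot.groupby('Neighborhood').mean()

print(freq)
print(freq.sum(axis=1))  # each postal code sums to 1
```

With only two venues, M3A gets a frequency of 0.5 for each of its categories, which shows how coarse the features become when few venues are found.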


So I have a dataframe with 100 postal codes (originally there were 103, but 3 of them do not have any venue) and 258 different kinds of venues (the remaining column holds the name of the neighborhood/postal code).

In [22]:
Toronto_dummies.shape

(100, 259)

### 3.2 - Clustering, finally!

In the following code I define the number of clusters and run the KMeans algorithm from sklearn, then I display the resulting labels as an array.

In [23]:
ncluster=4
Toronto_dummies_clust = Toronto_dummies.drop(columns='Neighborhood')
clustering = KMeans(n_clusters=ncluster, random_state=0).fit(Toronto_dummies_clust)
clustering.labels_

array([3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 2, 0, 0, 0, 0, 2, 1, 0, 1, 0, 0])
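Counting how many postal codes fall in each cluster gives a quick impression of how unbalanced the result is. This is a small sketch that copies the label array printed above and counts the labels with "np.bincount"; the variable names are illustrative.

```python
import numpy as np

# Labels copied from the output of "clustering.labels_" above.
labels = np.array([
    3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
    0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
    0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 2, 0, 0, 0, 0, 2, 1, 0, 1, 0, 0])

# Number of postal codes assigned to each of the 4 clusters.
counts = np.bincount(labels, minlength=4)
for cluster, n in enumerate(counts):
    print(f'Cluster {cluster}: {n} postal codes')
```

The large majority of the postal codes end up in cluster 0, while the other clusters contain only a handful of them.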

Now I create a dataframe with the 3 most common venues for each zone, to help interpret the clustering. The result is stored in the "Toronto_sorted" dataframe.

In [24]:
num_top_venues = 3

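# Venue frequencies of each postal code, rounded to 2 decimals.
# Note: "temp" is overwritten at every iteration and is not used afterwards.
for hood in Toronto_dummies['Neighborhood']: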
    temp = Toronto_dummies[Toronto_dummies['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Toronto_sorted = pd.DataFrame(columns=columns)
Toronto_sorted['Neighborhood'] = Toronto_dummies['Neighborhood']

for ind in np.arange(Toronto_dummies.shape[0]):
    Toronto_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_dummies.iloc[ind, :], num_top_venues)

Toronto_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M1B,Fast Food Restaurant,Yoga Studio,Event Space
1,M1C,Moving Target,Bar,Yoga Studio
2,M1E,Electronics Store,Medical Center,Spa
3,M1G,Coffee Shop,Soccer Field,Korean Restaurant
4,M1H,Gas Station,Bakery,Fried Chicken Joint
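For postal codes with fewer than three venues, the remaining positions are filled with categories whose frequency is zero (M1B has a single venue, so its 2nd and 3rd entries do not correspond to real venues). A possible variant that ignores zero-frequency categories is sketched below; `top_venues_nonzero` and the "(none)" filler are hypothetical, and the function expects only the frequency columns of a row (for example "row.iloc[1:]").

```python
import pandas as pd

def top_venues_nonzero(freq_row, num_top_venues=3, filler='(none)'):
    """Return the most frequent categories, ignoring zero-frequency ones."""
    present = freq_row[freq_row > 0].sort_values(ascending=False)
    top = list(present.index[:num_top_venues])
    # Pad with a placeholder so that every result has the same length.
    return top + [filler] * (num_top_venues - len(top))

# A postal code with a single venue, as in the M1B case.
single = pd.Series({'Fast Food Restaurant': 1.0,
                    'Yoga Studio': 0.0,
                    'Event Space': 0.0})
print(top_venues_nonzero(single))

# A postal code with several categories.
several = pd.Series({'A': 0.2, 'B': 0.5, 'C': 0.3, 'D': 0.0})
print(top_venues_nonzero(several))
```

The placeholder makes it explicit in the final table which postal codes do not have enough venues to fill all three positions.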


Now I insert the result of the clustering into the "Toronto_sorted" dataframe. After that I merge "Toronto_sorted" with the dataframe that contains the geographical coordinates, using the "merge" method of pandas. This allows me to create a map with the "folium" package afterwards.

In [27]:
# add clustering labels
Toronto_sorted.insert(0, 'Cluster Labels', clustering.labels_)

In [28]:
# merge toronto_sorted with df_geo to add latitude/longitude for each neighborhood/postal_code
Toronto_sorted = Toronto_sorted.merge(df_geo, left_on='Neighborhood', right_on='PostalCode')
Toronto_sorted.drop(columns=['Neighborhood_x', 'Borough'], inplace=True)
Toronto_sorted.rename({'Neighborhood_y':'Neighborhood'}, axis=1, inplace=True)
Toronto_sorted=Toronto_sorted[['PostalCode','Neighborhood','Cluster Labels','Latitude','Longitude','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue']]
Toronto_sorted.head()

Unnamed: 0,PostalCode,Neighborhood,Cluster Labels,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,M1B,"Rouge,Malvern",3,43.806686,-79.194353,Fast Food Restaurant,Yoga Studio,Event Space
1,M1C,"Highland Creek,Rouge Hill,Port Union",0,43.784535,-79.160497,Moving Target,Bar,Yoga Studio
2,M1E,"Guildwood,Morningside,West Hill",0,43.763573,-79.188711,Electronics Store,Medical Center,Spa
3,M1G,Woburn,0,43.770992,-79.216917,Coffee Shop,Soccer Field,Korean Restaurant
4,M1H,Cedarbrae,0,43.773136,-79.239476,Gas Station,Bakery,Fried Chicken Joint


Finally I create the map of all the neighborhoods, or postal codes, with the "folium" package, showing each cluster in a different color.

In [29]:
lat_toronto=43.716589
lon_toronto=-79.340686
map_neigh_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=11)

In [30]:
# set color scheme for the clusters
x = np.arange(ncluster)
ys = [i + x + (i*x)**2 for i in range(ncluster)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Toronto_sorted['Latitude'], Toronto_sorted['Longitude'], Toronto_sorted['Neighborhood'], Toronto_sorted['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_neigh_toronto)
       
map_neigh_toronto

## 4 - Conclusions

Except for a few postal codes/neighborhoods in the city center, the number of features (venues) was too small to produce a meaningful clustering (3 postal codes have no venues at all). I also tried different numbers of clusters, but the results did not change much.

It seems that most of the neighborhoods have the same features (regarding the venues), but they really have too few venues.

I could have tried a larger radius to obtain more venues per neighborhood, but then many venues could have belonged to more than one neighborhood, so I discarded this option.
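The trade-off with the radius can be estimated from the coordinates themselves: two search circles of radius r start to overlap when their centers are closer than 2r. The sketch below computes the great-circle distance with the haversine formula for M3A and M4A, using the coordinates from section 2; `haversine_m` is a helper written for this illustration, and the mean Earth radius of 6371 km is an approximation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r_earth = 6371000.0  # approximate mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r_earth * math.asin(math.sqrt(a))

# Coordinates of M3A and M4A from the table in section 2.
d = haversine_m(43.753259, -79.329656, 43.725882, -79.315572)

radius = 500  # search radius used for the requests, in metres
print(round(d), 'm between the two centers')
print('search circles overlap:', d < 2 * radius)
```

Repeating the check for every pair of postal codes would show how far the radius could be increased before the circles of neighboring postal codes begin to share venues.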