# Segmenting and Clustering Neighborhoods in Toronto

In [8]:
#Importing required dependencies
import pandas as pd 
import numpy as np
import requests
import io

Obtaining the data from the provided Wikipedia page and creating a pandas dataframe which excludes the cells in which a Borough was not assigned

In [9]:
data_table=pd.read_html('https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=862527922',header=0)
df1=data_table[0]
df_clean=df1[df1['Borough']!='Not assigned']
df_clean.head()
df_clean.shape

(212, 3)

Combining into one row neighborhoods which belong to the same post code. These neighborhoods will be separated by a comma

In [10]:
df_join = df_clean.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(list).apply(lambda x:', '.join(x)).to_frame().reset_index()

In [11]:

df_join.shape

(103, 3)

In [12]:
df_join.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


Any neighborhood with the 'Not Assigned' id, will be replaced by the Borough name

In [14]:
df_final = df_join
df_final.loc[df_final['Neighbourhood']=='Not assigned','Neighbourhood']=df_final.loc[df_final['Neighbourhood']=='Not assigned']['Borough']

#An example of this case corrsponds to Postcode M7A:
df_final[df_final['Postcode']=='M7A']

Unnamed: 0,Postcode,Borough,Neighbourhood
85,M7A,Queen's Park,Queen's Park


Using the shape command to print the number of rows in the dataframe

In [15]:
df_final.shape[0]

103

Using the link to a csv file containing the geographical coordinates of each neighborhood, we can get the latitude and longitude



In [16]:
#url="http://cocl.us/Geospatial_data"
#answer=requests.get(url).content
#geo=pd.read_csv(io.StringIO(answer.decode('utf-8')))
geo = pd.read_csv("/Users/juan/Downloads/Geospatial_Coordinates.csv")
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


We need to rename the first column of the acquired data frame in order to merge it with our existing data frame (df_final)

In [17]:
geo.columns = ['Postcode', 'Latitude', 'Longitude']
df_final = pd.merge(geo, df_final, on='Postcode')

Reordering the column names in the merged data frame, containing the latitude and longitude of each neighborhood

In [19]:
df_final = df_final[['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
df_final.head()
df_final.shape

(103, 5)

In [20]:
df_final.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


# Create a map of Toronto with neighborhoods superimposed on top

In [21]:
import folium

tlat=43.653963
tlong=-79.387207
torontoMap=folium.Map(location=[tlat,tlong],zoom_start=10.7)

for lat,long,borough,neighborhood in zip(df_final['Latitude'],df_final['Longitude'],df_final['Borough'],df_final['Neighbourhood']):
    label='{}, {}'.format(neighborhood,borough)
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fill_opacity=0.7,
                       parse_html=False).add_to(torontoMap)
    
torontoMap

Finding the boroughs that contain the word Toronto

In [22]:
torontoData = df_final[df_final['Borough'].str.contains('Toronto')].reset_index(drop=True)
torontoData.shape

(38, 5)

In [23]:
torontoData.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [24]:
print('The geographical coordinates of Toronto are {}, {}'.format(tlat,tlong))

The geographical coordinates of Toronto are 43.653963, -79.387207


Create a visualization of the Boroughs that contain Toronto in their name

In [25]:
torontonameMap=folium.Map(location=[tlat,tlong],zoom_start=12)
for lat,long,label in zip(torontoData['Latitude'],torontoData['Longitude'],torontoData['Neighbourhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker([lat,long],
                       radius=5,
                       popup=label,
                       color='blue',
                       fill=True,
                       fill_color='#3186cc',
                       fil_opacity=0.7,
                       parse_html=False).add_to(torontonameMap)
torontonameMap

Define the Foursquare credentials and version

In [26]:
CLIENT_ID='OCAZNCAZQIXBS3DO52YEIUSZDUVF1N1GEMMCISX12RUEK41C'
CLIENT_SECRET='XLHVDTTDJZ3ZZBJQ2IMWZGUUYYWH2MFXBKGWGM4JTTIAVFDJ'
VERSION='20180605'
print('Your credentials:')
print('Client ID:  '+ CLIENT_ID)
print('Client Secret:  '+CLIENT_SECRET)

Your credentials:
Client ID:  OCAZNCAZQIXBS3DO52YEIUSZDUVF1N1GEMMCISX12RUEK41C
Client Secret:  XLHVDTTDJZ3ZZBJQ2IMWZGUUYYWH2MFXBKGWGM4JTTIAVFDJ


Exploring the first neighborhood in the dataframe

In [27]:
nName=torontoData.loc[0,'Neighbourhood']
nLat=torontoData.loc[0,'Latitude']
nLong=torontoData.loc[0,'Longitude']
print('Latitude and longitude values of {} are {}, {}.'.format(nName,nLat,nLong))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


Obtaining the top 100 venues located within Boroughs with Toronto in their name within a radius of 500 m

In [28]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{},&radius={}&limit={}'.format(
       CLIENT_ID,
       CLIENT_SECRET,
       VERSION,
       tlat,
       tlong,
       radius,
       LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?client_id=OCAZNCAZQIXBS3DO52YEIUSZDUVF1N1GEMMCISX12RUEK41C&client_secret=XLHVDTTDJZ3ZZBJQ2IMWZGUUYYWH2MFXBKGWGM4JTTIAVFDJ&v=20180605&ll=43.653963,-79.387207,&radius=500&limit=100'

Sending the get request and examining the result

In [29]:
import requests
import json
results=requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e9f8037b4b684001c069c54'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 44,
  'suggestedBounds': {'ne': {'lat': 43.6584630045, 'lng': -79.38099903084075},
   'sw': {'lat': 43.649462995499995, 'lng': -79.39341496915925}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          

The get_category_type function from the Foursquare lab extracts the category of the venue

In [30]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

We will clean the json and structure it into a pandas dataframe

In [31]:
from pandas.io.json import json_normalize
venues=results['response']['groups'][0]['items']
venuesNearBy=json_normalize(venues)

# filter columns
filtered_columns=['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venuesNearBy=venuesNearBy.loc[:, filtered_columns]

# clean columns
venuesNearBy.columns = [col.split(".")[-1] for col in venuesNearBy.columns]

venuesNearBy.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,"[{'id': '4f2a25ac4b909258e854f55f', 'name': 'N...",43.653232,-79.385296
1,Japango,"[{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...",43.655268,-79.385165
2,Rolltation,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",43.654918,-79.387424
3,Sansotei Ramen 三草亭,"[{'id': '55a59bace4b013909087cb24', 'name': 'R...",43.655157,-79.386501
4,Karine's,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",43.653699,-79.390743


Checking number of venues returned by Foursquare

In [32]:
print('{} venues were returned by Foursquare.'.format(venuesNearBy.shape[0]))

44 venues were returned by Foursquare.


# Explore Neighborhoods in Toronto

We create a function that will reapeat the same process to all the neighborhoods in Toronto

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each neighborhood and create the dataframe torontoVenues

In [34]:
torontoVenues=getNearbyVenues(names=torontoData['Neighbourhood'],
                                   latitudes=torontoData['Latitude'],
                                   longitudes=torontoData['Longitude']
                                  )

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

Check the size of the resulting dataframe

In [36]:
print(torontoVenues.shape)
torontoVenues.head()

(1582, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


Check how many venues were returned for each neighborhood

In [37]:
torontoVenues.rename(columns={'Neighborhood':'Neighbourhood'}, inplace = True)

In [38]:
torontoVenues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",94,94,94,94,94,94
Berczy Park,57,57,57,57,57,57
"Brockton, Exhibition Place, Parkdale Village",23,23,23,23,23,23
Business reply mail Processing Centre969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",16,16,16,16,16,16
"Cabbagetown, St. James Town",46,46,46,46,46,46
Central Bay Street,65,65,65,65,65,65
"Chinatown, Grange Park, Kensington Market",56,56,56,56,56,56
Christie,18,18,18,18,18,18
Church and Wellesley,72,72,72,72,72,72


Find out how many unique catergories can be curated from the returned values

In [39]:
print('There are {} unique categories.'.format(len(torontoVenues['Venue Category'].unique())))

There are 228 unique categories.


# Analyzing each Neighborhood

In [40]:
# one hot encoding
torontoOnehot=pd.get_dummies(torontoVenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
torontoOnehot['Neighbourhood'] = torontoVenues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [torontoOnehot.columns[-1]] + list(torontoOnehot.columns[:-1])
torontoOnehot=torontoOnehot[fixed_columns]

torontoOnehot.head()

Unnamed: 0,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's examine the new dataframe size

In [41]:
torontoOnehot.shape

(1582, 229)

Next, we group the rows by neighborhood by taking the mean of the frequency of occurrence of each category

In [42]:
torontoGrouped=torontoOnehot.groupby('Neighbourhood').mean().reset_index()
torontoGrouped

Unnamed: 0,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.031915,0.0,0.0,...,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.010638,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business reply mail Processing Centre969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.035714,0.0,0.053571,0.017857,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,...,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778


Examining the new dataframe size

In [43]:
torontoGrouped.shape

(38, 229)

Let's write a function to sort the venues in descending order

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let's create the new dataframe and display the top 10 venues for each neighborhood

In [45]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted=pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood']=torontoGrouped['Neighbourhood']

for ind in np.arange(torontoGrouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:]=return_most_common_venues(torontoGrouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Restaurant,Gym,American Restaurant,Clothing Store,Thai Restaurant,Hotel,Deli / Bodega,Cosmetics Shop
1,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Farmers Market,Bakery,Italian Restaurant,Cheese Shop,Café,Restaurant,Beer Bar
2,"Brockton, Exhibition Place, Parkdale Village",Café,Nightclub,Coffee Shop,Breakfast Spot,Bakery,Convenience Store,Performing Arts Venue,Pet Store,Climbing Gym,Restaurant
3,Business reply mail Processing Centre969 Eastern,Park,Auto Workshop,Comic Shop,Pizza Place,Butcher,Recording Studio,Restaurant,Burrito Place,Brewery,Light Rail Station
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Service,Airport Terminal,Airport,Harbor / Marina,Rental Car Location,Sculpture Garden,Plane,Coffee Shop,Boat or Ferry


# Cluster Neighborhoods

In [48]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

torontoGroupedClustering=torontoGrouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans=KMeans(n_clusters=kclusters, random_state=0).fit(torontoGroupedClustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [49]:
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0,
       4, 0, 1, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [50]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

torontoMerged = torontoData

torontoMerged = torontoMerged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

torontoMerged.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Health Food Store,Pub,Neighborhood,Trail,Dog Run,Dessert Shop,Diner,Discount Store,Distribution Center,Yoga Studio
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Yoga Studio,Pub,Pizza Place,Lounge
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Sandwich Place,Park,Fast Food Restaurant,Food & Drink Shop,Liquor Store,Burrito Place,Restaurant,Italian Restaurant,Fish & Chips Shop,Ice Cream Shop
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Brewery,Bakery,American Restaurant,Yoga Studio,Comfort Food Restaurant,Sandwich Place,Cheese Shop
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,Park,Bus Line,Swim School,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


Finally, let's visualize the resulting clusters

In [51]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters=folium.Map(location=[tlat,tlong], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array=cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(torontoMerged['Latitude'],torontoMerged['Longitude'],torontoMerged['Neighbourhood'],torontoMerged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now that we have clustered the venues, lets examine them

## Cluster 1

In [52]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 0, torontoMerged.columns[[1] + list(range(5, torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Health Food Store,Pub,Neighborhood,Trail,Dog Run,Dessert Shop,Diner,Discount Store,Distribution Center,Yoga Studio
1,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Yoga Studio,Pub,Pizza Place,Lounge
2,East Toronto,0,Sandwich Place,Park,Fast Food Restaurant,Food & Drink Shop,Liquor Store,Burrito Place,Restaurant,Italian Restaurant,Fish & Chips Shop,Ice Cream Shop
3,East Toronto,0,Café,Coffee Shop,Gastropub,Brewery,Bakery,American Restaurant,Yoga Studio,Comfort Food Restaurant,Sandwich Place,Cheese Shop
5,Central Toronto,0,Sandwich Place,Breakfast Spot,Hotel,Food & Drink Shop,Department Store,Gym,Park,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
6,Central Toronto,0,Clothing Store,Coffee Shop,Yoga Studio,Sporting Goods Shop,Mexican Restaurant,Dessert Shop,Diner,Café,Fast Food Restaurant,Salon / Barbershop
7,Central Toronto,0,Dessert Shop,Sandwich Place,Pizza Place,Café,Coffee Shop,Italian Restaurant,Gym,Sushi Restaurant,Park,Diner
9,Central Toronto,0,Pub,Coffee Shop,Bagel Shop,Supermarket,Bank,Sports Bar,Fried Chicken Joint,Pizza Place,Sushi Restaurant,American Restaurant
11,Downtown Toronto,0,Coffee Shop,Park,Restaurant,Café,Pub,Italian Restaurant,Bakery,Pizza Place,Grocery Store,Convenience Store
12,Downtown Toronto,0,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Restaurant,Yoga Studio,Mediterranean Restaurant,Hotel,Smoke Shop,Gay Bar,Gastropub


The first cluster, offers more coffee shops and restaurants as top venues. 

## Cluster 2

In [53]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 1, torontoMerged.columns[[1] + list(range(5, torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Toronto,1,Park,Trail,Playground,Summer Camp,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
10,Downtown Toronto,1,Park,Playground,Trail,Deli / Bodega,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant


The top venue for cluster 2, corresponding to Central Toronto are Parks followed by Trails and Playgrounds

## Cluster 3

In [54]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 2, torontoMerged.columns[[1] + list(range(5, torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,2,Ice Cream Shop,Garden,Yoga Studio,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


For cluster number 3, also located in Central Toronto, the top venue is an Ice Cream Shop

## Cluster 4

In [56]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 3, torontoMerged.columns[[1] + list(range(5, torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,3,Jewelry Store,Trail,Bus Line,Sushi Restaurant,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


For cluster 4, the top venue is a Jewelry Store

## Cluster 5

In [57]:
torontoMerged.loc[torontoMerged['Cluster Labels'] == 4, torontoMerged.columns[[1] + list(range(5, torontoMerged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,4,Park,Bus Line,Swim School,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


Similarly to cluster 2, cluster 5 also offers a park as its top venue, with other venues consisting in restaurants and shops

The clusters show that the greatest concentration of restaurants and coffee shops are located in Downtown and Central Toronto. Parks are also popular venues in Central Toronto.