#Segmenting and Clustering Neighborhoods in Toronto

In this notebook, I explore, segment, and cluster the neighborhoods of Toronto, Canada.
The data for this task is scraped from a Wikipedia article (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) and combined with several other data sources.

##Part1 - Scraping and preparing the data

Part 1 will consist of scraping the initial Toronto neighborhood data set from Wikipedia and turning it into a pandas dataframe.


In [0]:
#Importing the necessary libraries
import requests #required to send HTTP requests
from bs4 import BeautifulSoup #package for parsing HTML data
import pandas as pd
import numpy as np

In [0]:
website_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [0]:
#Scraping source code from website
website = requests.get(website_url).text

In [4]:
#Turning website source code into a readable format
soup = BeautifulSoup(website, 'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"Xk4pBApAAEUAAEJ7l8sAAAAH","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":935851093,"wgRevisionId":935851093,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communi

In [0]:
#Getting the source code for the table containing the Toronto neighborhood data
neighborhood_table = soup.find('table',{'class':'wikitable sortable'})

In [6]:
#Extracting column names

column_raw = neighborhood_table.find_all('th')
column_names = []

for col in column_raw:
  col_name = col.get_text().replace("\n", " ").strip()
  column_names.append(col_name)

#Replacing Postcode with PostalCode to be in line with the assignment
column_names[0] = 'PostalCode'

print(column_names)

['PostalCode', 'Borough', 'Neighbourhood']


In [7]:
#Extracting row data

row_raw_data = neighborhood_table.find_all('tr')
row_data = []
row = []

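#Note: row is rebound to a new list before it is appended, so each appended list
#is filled with the cells of the NEXT <tr>. The header row has no <td> cells, so no
#data is lost, but the last list in row_data stays empty.
for row_dat in row_raw_data:
  for r in row_dat.find_all('td'):
    row.append(r.get_text().replace("\n", " ").strip())
  row = []
  row_data.append(row)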

print(row_data)

[['M1A', 'Not assigned', 'Not assigned'], ['M2A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods'], ['M4A', 'North York', 'Victoria Village'], ['M5A', 'Downtown Toronto', 'Harbourfront'], ['M6A', 'North York', 'Lawrence Heights'], ['M6A', 'North York', 'Lawrence Manor'], ['M7A', 'Downtown Toronto', "Queen's Park"], ['M8A', 'Not assigned', 'Not assigned'], ['M9A', "Queen's Park", 'Not assigned'], ['M1B', 'Scarborough', 'Rouge'], ['M1B', 'Scarborough', 'Malvern'], ['M2B', 'Not assigned', 'Not assigned'], ['M3B', 'North York', 'Don Mills North'], ['M4B', 'East York', 'Woodbine Gardens'], ['M4B', 'East York', 'Parkview Hill'], ['M5B', 'Downtown Toronto', 'Ryerson'], ['M5B', 'Downtown Toronto', 'Garden District'], ['M6B', 'North York', 'Glencairn'], ['M7B', 'Not assigned', 'Not assigned'], ['M8B', 'Not assigned', 'Not assigned'], ['M9B', 'Etobicoke', 'Cloverdale'], ['M9B', 'Etobicoke', 'Islington'], ['M9B', 'Etobicoke', 'Martin Grove'], ['M9B', 'Etobicoke', 'Princess Gard

In [8]:
#Creating a Pandas dataframe by combining column names and row data

neighborhood_df = pd.DataFrame(data=row_data, columns=column_names)

print('Dataframe shape:', neighborhood_df.shape)
neighborhood_df.head()

Dataframe shape: (288, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [9]:
#Cleaning the dataframe

#Dropping entries without a borough
neighborhood_df = neighborhood_df.drop(neighborhood_df[neighborhood_df['Borough']=='Not assigned'].index.values).reset_index(drop=True)
print('Dataframe shape:', neighborhood_df.shape)
neighborhood_df.head()

Dataframe shape: (211, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


In [10]:
#Combining the neighbourhoods that share a postal code into one row

neighborhood_df = neighborhood_df.groupby(['PostalCode', 'Borough'])['Neighbourhood'].apply(lambda Neighbourhoods: ','.join(Neighbourhoods)).reset_index()

print('Dataframe shape:', neighborhood_df.shape)
neighborhood_df.head()

Dataframe shape: (103, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
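
To make explicit what the `groupby(...).apply(','.join)` step does, here is a minimal, dependency-free sketch of the same aggregation. The function name `group_neighbourhoods` is hypothetical, and the three sample rows are taken from the tables above.

```python
from collections import defaultdict

# Sample rows in the same (PostalCode, Borough, Neighbourhood) layout as the dataframe
rows = [
    ("M1B", "Scarborough", "Rouge"),
    ("M1B", "Scarborough", "Malvern"),
    ("M1G", "Scarborough", "Woburn"),
]

def group_neighbourhoods(rows):
    """Join all neighbourhoods that share a (postal code, borough) pair."""
    grouped = defaultdict(list)
    for postal_code, borough, neighbourhood in rows:
        grouped[(postal_code, borough)].append(neighbourhood)
    # Sort by key to mirror the sorted group order of a pandas groupby
    return [(pc, b, ",".join(names)) for (pc, b), names in sorted(grouped.items())]

grouped_rows = group_neighbourhoods(rows)
print(grouped_rows)
```

Each postal code ends up on a single row, with its neighbourhoods joined in their original order.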


In [11]:
#Getting the size of the dataframe
print('The dataframe contains', neighborhood_df.shape[0], 'rows.')

The dataframe contains 103 rows.


In [0]:
#Keeping only the boroughs whose name contains 'Toronto'
neighborhood_df = neighborhood_df[neighborhood_df['Borough'].str.contains("Toronto")].reset_index()

In [13]:
#Getting the size of the dataframe
print('The dataframe contains', neighborhood_df.shape[0], 'rows.')

The dataframe contains 39 rows.


##Part2 - Geocoding


To use the neighbourhood data with the Foursquare API, we need to add latitude and longitude coordinates for each neighbourhood.

In [14]:
neighborhood_df.head()

Unnamed: 0,index,PostalCode,Borough,Neighbourhood
0,37,M4E,East Toronto,The Beaches
1,41,M4K,East Toronto,"The Danforth West,Riverdale"
2,42,M4L,East Toronto,"The Beaches West,India Bazaar"
3,43,M4M,East Toronto,Studio District
4,44,M4N,Central Toronto,Lawrence Park


In [18]:
#Scraping latitude and longitude from geocoder.ca

import time
import random

lat_long = []

for ZIP in neighborhood_df.PostalCode.iloc[np.r_[0:39]]: 
  x = requests.get('https://geocoder.ca/?addresst=&stno=1&city=&prov=ON&postal={}&geoit=GeoCode+it%21'.format(ZIP)).text
  x = BeautifulSoup(x, 'lxml')
  lat_long.append([x.find_all('strong')[0].get_text()])
  time.sleep(random.random()*3)


#Showing the first five pairs of geospatial data
lat_long[0:5]

[[' 43.68755,-79.291656'],
 [' 43.667854,-79.38896'],
 [' 43.670441,-79.311844'],
 [' 43.648724,-79.342476'],
 [' 43.719292,-79.386154']]
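
The scraped values are strings with a leading space, so they need to be converted before they can be used as numbers. A small sketch of that conversion, assuming every scraped string has the `'lat,long'` shape shown above (the helper name `parse_lat_long` is hypothetical):

```python
def parse_lat_long(raw):
    """Convert a scraped 'lat,long' string into a (float, float) tuple."""
    lat_text, long_text = raw.strip().split(",")
    return float(lat_text), float(long_text)

# Two of the scraped strings shown above
scraped = [" 43.68755,-79.291656", " 43.667854,-79.38896"]
coordinates = [parse_lat_long(s) for s in scraped]
print(coordinates)
```

A string that does not contain exactly one comma raises a `ValueError`, which makes a failed lookup visible instead of silently producing a bad coordinate.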

In [19]:
#Turning latitude and longitude data into a dataframe
lat_long_df = pd.DataFrame(lat_long)
lat_long_df = lat_long_df[0].str.split(',', expand = True)
lat_long_df.columns = ['Latitude', 'Longitude']
lat_long_df['PostalCode'] = neighborhood_df['PostalCode']

lat_long_df.head()

Unnamed: 0,Latitude,Longitude,PostalCode
0,43.68755,-79.291656,M4E
1,43.667854,-79.38896,M4K
2,43.670441,-79.311844,M4L
3,43.648724,-79.342476,M4M
4,43.719292,-79.386154,M4N


In [20]:
#Importing Geospatial data from .csv file in order to compare it to the scraped data
url = 'https://raw.githubusercontent.com/jofrank21/DataScience_Capstone/master/Geospatial_Coordinates.csv'

lat_long_csv = pd.read_csv(url, sep=',')

lat_long_csv.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [21]:
#Comparing the two sets of geospatial data

#Scraped geospatial data
print(lat_long_df.head())
print()

#Geospatial data from provided .csv file
print(lat_long_csv.head())

     Latitude   Longitude PostalCode
0    43.68755  -79.291656        M4E
1   43.667854   -79.38896        M4K
2   43.670441  -79.311844        M4L
3   43.648724  -79.342476        M4M
4   43.719292  -79.386154        M4N

  Postal Code   Latitude  Longitude
0         M1B  43.806686 -79.194353
1         M1C  43.784535 -79.160497
2         M1E  43.763573 -79.188711
3         M1G  43.770992 -79.216917
4         M1H  43.773136 -79.239476


Comparing the geospatial data scraped from geocoder.ca with the data from the provided .csv file clearly shows discrepancies between the two datasets. Therefore, I will use the .csv file for the remainder of the project.

In [22]:
#Joining latitude and longitude data with the neighborhood dataframe

neighborhood_df['Latitude'] = lat_long_csv['Latitude']
neighborhood_df['Longitude'] = lat_long_csv['Longitude']

neighborhood_df.head()

Unnamed: 0,index,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,37,M4E,East Toronto,The Beaches,43.806686,-79.194353
1,41,M4K,East Toronto,"The Danforth West,Riverdale",43.784535,-79.160497
2,42,M4L,East Toronto,"The Beaches West,India Bazaar",43.763573,-79.188711
3,43,M4M,East Toronto,Studio District,43.770992,-79.216917
4,44,M4N,Central Toronto,Lawrence Park,43.773136,-79.239476
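
In the table above, M4E carries the coordinates that the .csv file lists for M1B, because assigning a column from one dataframe to another aligns rows by index label rather than by postal code. A minimal sketch of a key-based join follows, assuming the column names `PostalCode` and `Postal Code` shown earlier. The coordinate values are placeholders, not real data.

```python
import pandas as pd

# Small stand-ins for neighborhood_df and lat_long_csv.
# Latitude/Longitude values are placeholders chosen only to make the matching visible.
neighborhoods = pd.DataFrame({
    "PostalCode": ["M4E", "M4K"],
    "Neighbourhood": ["The Beaches", "The Danforth West,Riverdale"],
})
coordinates = pd.DataFrame({
    "Postal Code": ["M1B", "M4E", "M4K"],
    "Latitude": [10.0, 20.0, 30.0],
    "Longitude": [-10.0, -20.0, -30.0],
})

# Match rows on the postal code instead of on their position in the table
merged = neighborhoods.merge(
    coordinates, left_on="PostalCode", right_on="Postal Code", how="left"
).drop(columns="Postal Code")
print(merged)
```

With `how="left"`, every neighborhood row is kept, and a postal code missing from the coordinate table would show up as `NaN` rather than receiving another row's values.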


##Part3 - Exploring and clustering the Toronto neighborhoods

In [0]:
#Importing the necessary libraries
import folium

In [24]:
#Scraping geospatial data for Toronto
geo_toronto = requests.get('https://geocoder.ca/?locate=toronto+ON+canada&geoit=GeoCode').text
geo_toronto = BeautifulSoup(geo_toronto, 'lxml')

geo_toronto = geo_toronto.find_all('strong')[0].get_text()

toronto_lat = float(geo_toronto.split(',')[0].strip())
toronto_long = float(geo_toronto.split(',')[1].strip())

print('Latitude for Toronto is:', toronto_lat, 'and the longitude is:', toronto_long)

Latitude for Toronto is: 43.653226 and the longitude is: -79.383184


In [25]:
#Creating a map of Toronto's neighborhoods

map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start=11)

map_toronto

In [26]:
#Adding markers for the different neighborhoods to the map

for lat, lng, borough, neighborhood in zip(neighborhood_df['Latitude'], neighborhood_df['Longitude'], neighborhood_df['Borough'], neighborhood_df['Neighbourhood']):
  label = '{},{}'.format(neighborhood, borough)
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      [lat, lng],
      radius = 4,
      popup = label,
      color = 'red',
      fill = False,
      parse_html = False).add_to(map_toronto)

#Displaying the map
map_toronto

#####Utilizing the Foursquare API to explore the Toronto neighborhoods and segment them.

In [0]:
# @hidden_cell
#Defining the Foursquare credentials and version

CLIENT_ID = ' '
CLIENT_SECRET = ' '
version = '20180604'

####Exploring the first neighborhood

In [28]:
#Let's see what the first neighborhood in our dataset is
neigh_name = neighborhood_df.Neighbourhood[0]
print(neigh_name)

The Beaches


In [29]:
#Getting the latitude and longitude of the first neighborhood
neigh_lat = neighborhood_df.Latitude[0]
neigh_lng = neighborhood_df.Longitude[0]

print('The latitude and longitude of {} are: {}, {}'.format(neigh_name, neigh_lat, neigh_lng))

The latitude and longitude of The Beaches are: 43.806686299999996, -79.19435340000001


In [0]:
#Getting the top 50 venues within 1200 meters of our first neighborhood

#Setting limit and radius for Foursquare API call
limit = 50
radius = 1200

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    version, 
    neigh_lat, 
    neigh_lng,
    radius, 
    limit)
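
As a hedged alternative to formatting the query string by hand, the URL can be assembled from a dictionary so that every parameter is URL-encoded. The parameter names are the ones used in the cell above. The helper name `build_explore_url` and the credential placeholders are assumptions made for this sketch.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble a venues/explore request URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; the coordinates and limits are the values used above
example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180604",
                                43.806686, -79.194353, 1200, 50)
print(example_url)
```

Keeping the parameters in one dictionary also makes it easy to see which radius and limit a request actually uses.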

In [31]:
#Printing the name of the first venue
results = requests.get(url).json()

venue_one = results['response']['groups'][0]['items'][0]['venue']['name']

print('The first venue is', venue_one)

The first venue is Wendy's


In [0]:
#Function that extracts the category of the venue (borrowed from Foursquare Lab)

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [33]:
#Extracting nearby venues for all Toronto neighborhoods

venues_list=[] #list containing the venues for each neighborhood

for borough, name, lat, lng in zip(neighborhood_df.Borough, neighborhood_df.Neighbourhood, neighborhood_df.Latitude, neighborhood_df.Longitude):
    print(name)
            
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        version, 
        lat, 
        lng, 
        radius, 
        limit)

    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # return only relevant information for each nearby venue
    venues_list.append([(
          borough,
          name, 
          lat, 
          lng, 
          v['venue']['name'], 
          v['venue']['location']['lat'], 
          v['venue']['location']['lng'],  
          v['venue']['categories'][0]['name']) for v in results])

nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['Borough',
                         'Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

The Beaches
The Danforth West,Riverdale
The Beaches West,India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park,Summerhill East
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
Rosedale
Cabbagetown,St. James Town
Church and Wellesley
Harbourfront
Ryerson,Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide,King,Richmond
Harbourfront East,Toronto Islands,Union Station
Design Exchange,Toronto Dominion Centre
Commerce Court,Victoria Hotel
Roselawn
Forest Hill North,Forest Hill West
The Annex,North Midtown,Yorkville
Harbord,University of Toronto
Chinatown,Grange Park,Kensington Market
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place,Underground city
Christie
Dovercourt Village,Dufferin
Little Portugal,Trinity
Brockton,Exhibition Place,Parkdale Village
High Park,The Junction South
Parkdale,Roncesvalles
Runnymede

In [34]:
#Taking a look at the first five venues found
nearby_venues.head()

Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,East Toronto,The Beaches,43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
1,East Toronto,The Beaches,43.806686,-79.194353,Staples Morningside,43.800285,-79.196607,Paper / Office Supplies Store
2,East Toronto,The Beaches,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
3,East Toronto,The Beaches,43.806686,-79.194353,Caribbean Wave,43.798558,-79.195777,Caribbean Restaurant
4,East Toronto,The Beaches,43.806686,-79.194353,Harvey's,43.80002,-79.198307,Restaurant


In [35]:
#Number of neighborhoods for which we could find nearby venues
print('There are {} neighborhoods with venues nearby.'.format(len(nearby_venues.Neighborhood.unique())))

There are 38 neighborhoods with venues nearby.


It seems that Foursquare found no venues within the 1,200-metre search radius for one of the 39 neighborhoods.

In [36]:
#How many venues could we find in total?
print('There is a total of {} venues near our Toronto neighborhoods.'.format(nearby_venues.shape[0]))

There is a total of 1249 venues near our Toronto neighborhoods.


In [37]:
#How many unique categories are among our venues?
print('There are {} unique venue categories.'.format(len(nearby_venues['Venue Category'].unique())))

There are 203 unique venue categories.


In [84]:
#Taking a look at the distribution of the different venue categories

pd.set_option('display.max_rows', 203) #showing every row of the dataframe

venue_dist = nearby_venues['Venue Category'].value_counts().to_frame()
venue_dist['Share'] = venue_dist['Venue Category'] / nearby_venues.shape[0]


venue_dist.head()

Unnamed: 0,Venue Category,Share
Coffee Shop,97,0.077662
Pizza Place,56,0.044836
Park,52,0.041633
Chinese Restaurant,44,0.035228
Sandwich Place,38,0.030424


In [86]:
#Getting the share of venue categories that only appear once or twice.

sum(venue_dist['Venue Category'] <= 2) / len(nearby_venues['Venue Category'].unique())


0.5467980295566502

More than half of the venue categories only appear once or twice in our entire dataset. This might prove difficult for our later analysis and clustering.
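
The share computed above is the number of categories with at most two venues divided by the number of distinct categories. A small, dependency-free sketch of the same arithmetic on hypothetical data:

```python
from collections import Counter

# Hypothetical venue categories, one entry per venue
categories = ["Coffee Shop", "Coffee Shop", "Coffee Shop",
              "Park", "Zoo Exhibit", "Bakery", "Bakery"]

counts = Counter(categories)

# Categories that occur only once or twice
rare = [name for name, n in counts.items() if n <= 2]

# Share of distinct categories that are rare
rare_share = len(rare) / len(counts)
print(rare_share)
```

Here three of the four distinct categories are rare, so the share is 0.75.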

####Analyzing each neighborhood

Having gathered all the necessary information about the Toronto neighborhoods, it is now time to analyze them.

In [38]:
#First step is to get the data we collected into the right format.

#One hot encoding
toronto_onehot = pd.get_dummies(nearby_venues['Venue Category'], prefix="", prefix_sep="")

print(toronto_onehot.shape)
print()
toronto_onehot.head()

(1249, 203)



Unnamed: 0,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Beer Store,Big Box Store,Bike Shop,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,...,Salon / Barbershop,Sandwich Place,Science Museum,Seafood Restaurant,Shanghai Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [39]:
#Adding the one hot encoded venue categories to the neighborhood data to create a complete data set

toronto_fulldat = nearby_venues.copy() #copy so that nearby_venues itself is not modified

for col in toronto_onehot.columns:
  if col == 'Neighborhood':
    toronto_fulldat['Neighborhood_ven'] = toronto_onehot[col]
  else:
    toronto_fulldat[col] = toronto_onehot[col]


#Take a look at our full data set
print(toronto_fulldat.shape) 
print()
toronto_fulldat.head()

(1249, 211)



Unnamed: 0,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Beer Store,Big Box Store,Bike Shop,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,...,Salon / Barbershop,Sandwich Place,Science Museum,Seafood Restaurant,Shanghai Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,East Toronto,The Beaches,43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,East Toronto,The Beaches,43.806686,-79.194353,Staples Morningside,43.800285,-79.196607,Paper / Office Supplies Store,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,East Toronto,The Beaches,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,East Toronto,The Beaches,43.806686,-79.194353,Caribbean Wave,43.798558,-79.195777,Caribbean Restaurant,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,East Toronto,The Beaches,43.806686,-79.194353,Harvey's,43.80002,-79.198307,Restaurant,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [0]:
#For the further analysis we will focus only on Boroughs in the center of Toronto
toronto_fulldat = toronto_fulldat[toronto_fulldat['Borough'].str.contains("Toronto")]

Since not every neighborhood will have the same number of venues close by, we will check how many neighborhoods have at least 5 venues in their proximity.

In [41]:
#Creating dataframe that aggregates across venues for each neighborhood
venue_type_count = toronto_fulldat.groupby('Neighborhood').sum().reset_index()

venue_type_count['No_venues'] = venue_type_count.iloc[:,6:258].sum(axis=1)

print('There are {} neighborhoods that have at least 5 venues nearby'.format(sum(venue_type_count['No_venues'] >= 5)))

There are 38 neighborhoods that have at least 5 venues nearby
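
The venue count above relies on column positions (`iloc[:,6:258]`). As a hedged alternative, the number of venues per neighborhood can be counted directly from the long-format venue table, which has one row per venue. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical long-format venue table: one row per venue
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Bakery", "Park", "Gym", "Park"],
})

# Count the rows in each group; no column positions are involved
venue_counts = (
    venues.groupby("Neighborhood")
    .size()
    .rename("No_venues")
    .reset_index()
)
print(venue_counts)

# Number of neighborhoods with at least 3 venues in this toy example
print((venue_counts["No_venues"] >= 3).sum())
```

Counting rows this way does not depend on how many category columns the one-hot encoding produced.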


In [0]:
#Grouping by neighborhood and calculating ratio for each venue type
toronto_grouped = toronto_fulldat.groupby('Neighborhood').mean().reset_index()

#Adding the number of venues to the toronto_grouped dataframe
toronto_grouped.insert(1, 'No_venues', venue_type_count['No_venues'])

In [43]:
toronto_grouped.head()

Unnamed: 0,Neighborhood,No_venues,Neighborhood Latitude,Neighborhood Longitude,Venue Latitude,Venue Longitude,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Beer Store,Big Box Store,Bike Shop,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,...,Salon / Barbershop,Sandwich Place,Science Museum,Seafood Restaurant,Shanghai Restaurant,Shop & Service,Shopping Mall,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,"Adelaide,King,Richmond",50,43.778517,-79.346556,43.778419,-79.344297,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton,Exhibition Place,Parkdale Village",33,43.761631,-79.520999,43.758256,-79.519632,0.0,0.030303,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0
2,Business Reply Mail Processing Centre 969 Eastern,50,43.70906,-79.363452,43.707794,-79.365684,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
3,"CN Tower,Bathurst Quay,Island airport,Harbourf...",50,43.7259,-79.340923,43.728813,-79.341652,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
4,"Cabbagetown,St. James Town",35,43.750072,-79.295849,43.747559,-79.296271,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0
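
Taking the mean of a one-hot column within a neighborhood gives the share of that neighborhood's venues that fall into the category. The following dependency-free sketch reproduces that calculation on hypothetical data; the function name `category_shares` is an assumption made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, category) pairs, one per venue
venues = [("A", "Park"), ("A", "Bakery"), ("A", "Park"), ("A", "Gym"),
          ("B", "Gym"), ("B", "Park")]

def category_shares(venues):
    """Return, per neighborhood, the share of venues in each category."""
    per_neighborhood = defaultdict(Counter)
    for neighborhood, category in venues:
        per_neighborhood[neighborhood][category] += 1
    shares = {}
    for neighborhood, counts in per_neighborhood.items():
        total = sum(counts.values())
        shares[neighborhood] = {cat: n / total for cat, n in counts.items()}
    return shares

shares = category_shares(venues)
print(shares)
```

The shares of each neighborhood sum to 1, which is why a neighborhood with 50 venues shows values in steps of 0.02.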


In [44]:
#Getting the Top5 venues for each neighborhood 

for neighborhood in toronto_grouped['Neighborhood']:
    
    print("----"+neighborhood+"----")
    
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == neighborhood].T.reset_index()
    temp.columns = ['Venue','Freq']
    temp = temp.iloc[6:] #start at 6 to exclude the name, venue count, latitude and longitude rows
    temp['Freq'] = temp['Freq'].astype(float)
    temp = temp.round({'Freq': 2})

    print(temp.sort_values('Freq', ascending=False).reset_index(drop=True).head(5))
    print('\n')


----Adelaide,King,Richmond----
                 Venue  Freq
0          Coffee Shop  0.12
1       Clothing Store  0.12
2          Gas Station  0.04
3  Japanese Restaurant  0.04
4            Juice Bar  0.04


----Brockton,Exhibition Place,Parkdale Village----
            Venue  Freq
0           Hotel  0.12
1         Theater  0.06
2  Discount Store  0.06
3     Coffee Shop  0.06
4     Pizza Place  0.06


----Business Reply Mail Processing Centre 969 Eastern----
              Venue  Freq
0       Coffee Shop  0.08
1            Bakery  0.06
2  Sushi Restaurant  0.04
3   Thai Restaurant  0.04
4        Restaurant  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
                 Venue  Freq
0          Coffee Shop  0.10
1           Restaurant  0.08
2                  Gym  0.06
3  Japanese Restaurant  0.04
4        Movie Theater  0.04


----Cabbagetown,St. James Town----
                       Venue  Freq
0  Middle Eastern Restaur

#####Turning the top venues for each neighborhood into a Pandas Dataframe

In [45]:
#Defining a function that will return the n top venues

def return_most_common_venues(row, n_top_venues):
    row_categories = row.iloc[6:] #skip the six non-category columns
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:n_top_venues]

#Setting the number of top venues we want to consider. Given that all 38 neighborhoods with venues have at least 5 of them, 5 seems a reasonable choice.
n_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(n_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], n_top_venues)

neighborhoods_venues_sorted.insert(1,'No_venues', toronto_grouped['No_venues'])
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,No_venues,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Adelaide,King,Richmond",50,Coffee Shop,Clothing Store,Japanese Restaurant,Juice Bar,Bakery
1,"Brockton,Exhibition Place,Parkdale Village",33,Hotel,Fast Food Restaurant,Pizza Place,Discount Store,Theater
2,Business Reply Mail Processing Centre 969 Eastern,50,Coffee Shop,Bakery,Indian Restaurant,Thai Restaurant,Sporting Goods Shop
3,"CN Tower,Bathurst Quay,Island airport,Harbourf...",50,Coffee Shop,Restaurant,Gym,Beer Store,Japanese Restaurant
4,"Cabbagetown,St. James Town",35,Middle Eastern Restaurant,Pizza Place,Grocery Store,Intersection,Burger Joint
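
The column labels and the ranking can also be produced without relying on an exception for the "th" suffix. This is a minimal sketch under that assumption; the helper names `ordinal` and `top_categories` and the sample shares are hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

def top_categories(freqs, n_top):
    """Return the n_top category names with the highest share."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n_top]]

# Hypothetical category shares for one neighborhood
freqs = {"Coffee Shop": 0.12, "Gas Station": 0.04, "Bakery": 0.08}

columns = ["{} Most Common Venue".format(ordinal(i + 1)) for i in range(3)]
top = top_categories(freqs, 3)
print(dict(zip(columns, top)))
```

Because Python's sort is stable, categories with equal shares keep their original order, so ties are resolved by column order rather than by any measure of importance.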


#Clustering the Toronto neighborhoods

Now that we have prepared our data, we can train a k-means model to cluster our neighborhoods.

In [0]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

In [48]:
#Getting the cluster sizes for different numbers of clusters and calculating the mean silhouette score for each cluster

print('Clusters and their respective size')
print()

for k in range(2,11):
  toronto_grouped_clusters = toronto_grouped.drop(['Neighborhood','No_venues','Neighborhood Latitude','Neighborhood Longitude','Venue Latitude','Venue Longitude'], axis=1)

  #Run the k-means clustering algorithm
  kmeans = KMeans(n_clusters = k, random_state = 3).fit(toronto_grouped_clusters)
  print('--- k = {} --- Share --- Mean Silhouette Score ---'.format(k))
  for i in range(0,k):
    silhouette_cluster = (silhouette_samples(toronto_grouped_clusters, kmeans.labels_)[kmeans.labels_ == i]).mean()
    print('Cluster {}:'.format(i), sum(kmeans.labels_ == i), '({0:.2f}%)'.format(100*sum(kmeans.labels_ == i)/len(kmeans.labels_)), '- - {0:.2f}'.format(silhouette_cluster))
  print('--- --- --- --- ---')
  print('Silhouette Score:', silhouette_score(toronto_grouped_clusters, kmeans.labels_))
  print()

Clusters and their respective size

--- k = 2 --- Share --- Mean Silhouette Score ---
Cluster 0: 31 (81.58%) - - 0.25
Cluster 1: 7 (18.42%) - - -0.08
--- --- --- --- ---
Silhouette Score: 0.1868213195802903

--- k = 3 --- Share --- Mean Silhouette Score ---
Cluster 0: 7 (18.42%) - - -0.07
Cluster 1: 1 (2.63%) - - 0.00
Cluster 2: 30 (78.95%) - - 0.24
--- --- --- --- ---
Silhouette Score: 0.1768124046416386

--- k = 4 --- Share --- Mean Silhouette Score ---
Cluster 0: 16 (42.11%) - - 0.11
Cluster 1: 4 (10.53%) - - -0.09
Cluster 2: 1 (2.63%) - - 0.00
Cluster 3: 17 (44.74%) - - -0.01
--- --- --- --- ---
Silhouette Score: 0.033299617939596036

--- k = 5 --- Share --- Mean Silhouette Score ---
Cluster 0: 1 (2.63%) - - 0.00
Cluster 1: 20 (52.63%) - - 0.06
Cluster 2: 1 (2.63%) - - 0.00
Cluster 3: 15 (39.47%) - - 0.04
Cluster 4: 1 (2.63%) - - 0.00
--- --- --- --- ---
Silhouette Score: 0.04528209662871089

--- k = 6 --- Share --- Mean Silhouette Score ---
Cluster 0: 1 (2.63%) - - 0.00
Cluster 1:

Since k = 3 yields the second-best silhouette score (0.177, just behind 0.187 for k = 2), the remaining analysis uses three clusters.
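To make the selection step more concrete, here is a minimal sketch of comparing silhouette scores across several values of k. It runs on synthetic data (three artificial blobs), not on the Toronto venue data, and the helper name `silhouette_by_k` is my own assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

def silhouette_by_k(X, k_values, seed=0):
    """Return {k: overall silhouette score} for each candidate k (hypothetical helper)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)  # k with the highest silhouette score

for k, s in scores.items():
    print(k, round(s, 3))
print('best k:', best_k)
```

On real data the highest score is not always the most useful choice, which is why a slightly lower-scoring k can still be a reasonable pick.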

In [87]:
#Set the number of clusters
k = 3

toronto_grouped_clusters = toronto_grouped.drop(columns=['Neighborhood','No_venues','Neighborhood Latitude','Neighborhood Longitude','Venue Latitude','Venue Longitude'])

#Run the k-means clustering algorithm
#Note: random_state and init differ from the scan above, so the cluster sizes differ from the k = 3 result printed there
kmeans = KMeans(n_clusters = k, random_state = 4, init='random').fit(toronto_grouped_clusters)

#Getting the cluster labels for the first five neighborhoods
kmeans.labels_[0:5]

array([2, 2, 2, 2, 2], dtype=int32)

Regardless of k, most neighborhoods fall into one or two large clusters, and many of the remaining clusters contain only a single neighborhood.
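A quick way to check this imbalance is to count the members of each cluster. The sketch below is only an illustration: the helper `cluster_shares` is a hypothetical name, and the label array is constructed by hand to reproduce the k = 3 sizes from the scan output (7, 1 and 30) rather than taken from a fitted model.

```python
import numpy as np

def cluster_shares(labels):
    """Return (sizes, shares in percent) per cluster label (hypothetical helper)."""
    labels = np.asarray(labels)
    sizes = np.bincount(labels)        # number of members per label 0..k-1
    shares = 100 * sizes / sizes.sum() # share of all neighborhoods in percent
    return sizes, shares

# Hand-built labels reproducing the k = 3 sizes from the scan: 7, 1 and 30
labels = np.array([0] * 7 + [1] * 1 + [2] * 30)

sizes, shares = cluster_shares(labels)
singletons = int((sizes == 1).sum())   # clusters holding a single neighborhood

print(sizes, np.round(shares, 2), singletons)
```

With a fitted model, `kmeans.labels_` would take the place of the hand-built array.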

Combining the top venues and cluster data into a new dataframe

In [89]:
#Add the clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = neighborhood_df

#Join the cluster labels and top venues onto the neighborhood data, which holds latitude/longitude (note the different spellings of the two key columns)
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

#Drop rows where Cluster Label is NaN
toronto_merged = toronto_merged.dropna(subset=['Cluster Labels'])

toronto_merged.head()

Unnamed: 0,index,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,No_venues,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,37,M4E,East Toronto,The Beaches,43.806686,-79.194353,2.0,19.0,Fast Food Restaurant,Trail,Zoo Exhibit,Filipino Restaurant,Spa
1,41,M4K,East Toronto,"The Danforth West,Riverdale",43.784535,-79.160497,0.0,6.0,Playground,Breakfast Spot,Gym / Fitness Center,Park,Italian Restaurant
2,42,M4L,East Toronto,"The Beaches West,India Bazaar",43.763573,-79.188711,2.0,27.0,Pizza Place,Fast Food Restaurant,Coffee Shop,Bank,Sandwich Place
3,43,M4M,East Toronto,Studio District,43.770992,-79.216917,2.0,20.0,Pizza Place,Park,Department Store,Indian Restaurant,Coffee Shop
4,44,M4N,Central Toronto,Lawrence Park,43.773136,-79.239476,2.0,43.0,Coffee Shop,Bakery,Indian Restaurant,Gas Station,Pharmacy
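The cluster labels show up as floats (2.0, 0.0) in the table above. A likely explanation is that the left join introduces NaN for neighborhoods without venue data, which forces the integer column to float. The following sketch reproduces this on two hypothetical miniature frames (the rows and the `venues` name are made up for illustration) and casts the labels back to integers after dropping the unmatched rows.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames being joined
neighborhood_df = pd.DataFrame({
    'PostalCode': ['M4E', 'M4K', 'M9Z'],
    'Neighbourhood': ['The Beaches', 'Riverdale', 'No Venues Here'],
})
venues = pd.DataFrame({
    'Cluster Labels': [2, 0],
    'Neighborhood': ['The Beaches', 'Riverdale'],
    '1st Most Common Venue': ['Trail', 'Playground'],
})

# Left join: the key is spelled 'Neighbourhood' on the left, 'Neighborhood' on the right
joined = neighborhood_df.join(venues.set_index('Neighborhood'), on='Neighbourhood')
dtype_after_join = joined['Cluster Labels'].dtype  # float, because of the NaN row

# Drop neighborhoods without a cluster label, then restore integer labels
merged = joined.dropna(subset=['Cluster Labels']).copy()
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)

print(dtype_after_join)
print(merged)
```

Casting back to integers is optional, but it avoids having to convert each label before using it as a list index later on.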


Visualizing the clusters with Folium

In [90]:
import matplotlib.cm as cm
import matplotlib.colors as colors

#Create map centered around Toronto
map_clusters = folium.Map(location=[toronto_lat, toronto_long], zoom_start=11)

#Set the color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#Add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    cluster = int(cluster) #labels are floats after the join
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

####Examining the clusters

Having plotted the clusters on the map, I will now take a closer look at the venue categories to identify how the clusters differ from one another.

In [91]:
#Cluster 1
pd.set_option('display.max_rows', 100) #show up to 100 rows of a dataframe
print(toronto_merged[toronto_merged['Cluster Labels'] == 0].shape)

toronto_merged[toronto_merged['Cluster Labels'] == 0]

(1, 13)


Unnamed: 0,index,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,No_venues,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,41,M4K,East Toronto,"The Danforth West,Riverdale",43.784535,-79.160497,0.0,6.0,Playground,Breakfast Spot,Gym / Fitness Center,Park,Italian Restaurant


Cluster 1 consists of only one neighborhood and is distinguished by playgrounds and breakfast spots.

In [92]:
#Cluster 2
print(toronto_merged[toronto_merged['Cluster Labels'] == 1].shape)

toronto_merged[toronto_merged['Cluster Labels'] == 1]

(9, 13)


Unnamed: 0,index,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,No_venues,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
7,47,M4S,Central Toronto,Davisville,43.711112,-79.284577,1.0,31.0,Intersection,Bakery,Bus Station,Bus Line,Park
9,49,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.692657,-79.264848,1.0,12.0,Park,Ice Cream Shop,Thai Restaurant,Café,General Entertainment
17,57,M5G,Downtown Toronto,Central Bay Street,43.803762,-79.363452,1.0,33.0,Park,Pharmacy,Chinese Restaurant,Bank,Bakery
19,59,M5J,Downtown Toronto,"Harbourfront East,Toronto Islands,Union Station",43.786947,-79.385975,1.0,15.0,Bank,Japanese Restaurant,Gas Station,Trail,Café
20,60,M5K,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",43.75749,-79.374714,1.0,9.0,Park,Japanese Restaurant,Pub,Gym / Fitness Center,Gym
23,64,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.752758,-79.400049,1.0,40.0,Coffee Shop,Park,Gym,Restaurant,Thai Restaurant
25,66,M5S,Downtown Toronto,"Harbord,University of Toronto",43.753259,-79.329656,1.0,27.0,Park,Pharmacy,Bus Stop,Convenience Store,Cosmetics Shop
31,76,M6H,West Toronto,"Dovercourt Village,Dufferin",43.739015,-79.506944,1.0,8.0,Park,Pizza Place,Vietnamese Restaurant,Shopping Mall,Coffee Shop
32,77,M6J,West Toronto,"Little Portugal,Trinity",43.728496,-79.495697,1.0,15.0,Vietnamese Restaurant,Pharmacy,Park,Fast Food Restaurant,Coffee Shop


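Cluster 2 is dominated by parks: they are the most common venue in five of its nine neighborhoods and appear among the top five venues in eight of them. This will be the outdoors cluster.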
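Rather than reading the dominant category off the table by eye, it can also be counted. Below is a minimal sketch, assuming the column naming used in this notebook ("... Most Common Venue"). The data is a small hypothetical excerpt (a few rows, neighborhood names shortened, first two venue columns only), and `top_categories` is a helper name I made up for this illustration.

```python
import pandas as pd

# Hypothetical excerpt: a few rows and the first two venue columns only
sample = pd.DataFrame({
    'Neighbourhood': ['Davisville', 'Central Bay Street', 'Design Exchange', 'The Beaches'],
    'Cluster Labels': [1, 1, 1, 2],
    '1st Most Common Venue': ['Intersection', 'Park', 'Park', 'Fast Food Restaurant'],
    '2nd Most Common Venue': ['Bakery', 'Pharmacy', 'Japanese Restaurant', 'Trail'],
})

def top_categories(df, cluster, n=3):
    """Count how often each category appears in the top-venue columns of one cluster."""
    venue_cols = [c for c in df.columns if c.endswith('Most Common Venue')]
    in_cluster = df.loc[df['Cluster Labels'] == cluster, venue_cols]
    # Stack all venue columns into one Series and count the categories
    return in_cluster.stack().value_counts().head(n)

park_cluster = top_categories(sample, cluster=1)
print(park_cluster)
```

Applied to `toronto_merged`, the same helper would give a ranked list of categories per cluster, which makes labels like "outdoors cluster" easier to justify.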

In [93]:
#Cluster 3
print(toronto_merged[toronto_merged['Cluster Labels'] == 2].shape)
toronto_merged[toronto_merged['Cluster Labels'] == 2]

(28, 13)


Unnamed: 0,index,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,No_venues,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,37,M4E,East Toronto,The Beaches,43.806686,-79.194353,2.0,19.0,Fast Food Restaurant,Trail,Zoo Exhibit,Filipino Restaurant,Spa
2,42,M4L,East Toronto,"The Beaches West,India Bazaar",43.763573,-79.188711,2.0,27.0,Pizza Place,Fast Food Restaurant,Coffee Shop,Bank,Sandwich Place
3,43,M4M,East Toronto,Studio District,43.770992,-79.216917,2.0,20.0,Pizza Place,Park,Department Store,Indian Restaurant,Coffee Shop
4,44,M4N,Central Toronto,Lawrence Park,43.773136,-79.239476,2.0,43.0,Coffee Shop,Bakery,Indian Restaurant,Gas Station,Pharmacy
5,45,M4P,Central Toronto,Davisville North,43.744734,-79.239476,2.0,24.0,Pharmacy,Ice Cream Shop,Breakfast Spot,Sandwich Place,Bookstore
6,46,M4R,Central Toronto,North Toronto West,43.727929,-79.262029,2.0,29.0,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Discount Store,Convenience Store
8,48,M4T,Central Toronto,"Moore Park,Summerhill East",43.716316,-79.239476,2.0,22.0,Harbor / Marina,Pizza Place,Beach,Ice Cream Shop,Discount Store
10,50,M4W,Downtown Toronto,Rosedale,43.75741,-79.273304,2.0,47.0,Indian Restaurant,Restaurant,Fast Food Restaurant,Coffee Shop,Bakery
11,51,M4X,Downtown Toronto,"Cabbagetown,St. James Town",43.750072,-79.295849,2.0,35.0,Middle Eastern Restaurant,Pizza Place,Grocery Store,Intersection,Burger Joint
12,52,M4Y,Downtown Toronto,Church and Wellesley,43.7942,-79.262029,2.0,50.0,Chinese Restaurant,Shopping Mall,Bakery,Park,Pizza Place


Cluster 3 is characterized mainly by coffee shops and Asian restaurants, so this will be the food & drinks cluster.

Thank you for reading and reviewing my notebook - looking forward to your feedback!