# Clustering Toronto Neighbourhoods


## Introduction
In this notebook, we will explore and cluster neighbourhoods in Toronto. 

To do this we will need a list of all the neighborhoods in Toronto with details like their names, postal codes, boroughs, latitude and longitude values.

Once we have this data we can use it to find neighborhoods that are similar. We will use the K-Means algorithm to cluster the neighborhoods. Finally, we will visualize the clusters on a map.

This notebook will have 3 sections: Data Collection and Preprocessing, Fetching location data and Analysis. In the first section, we will get the data for the neighborhoods and process it. In the second section, we will get the location information for each neighborhood through an API. In the third section, we will use K-Means on the dataset and visualize the result on a map.

## Table of Contents
I. <a href="#section1">Data Collection and Preprocessing</a>
  1. <a href="#step1">Scrape neighbourhood data</a>
  2. <a href="#step2">Extract required data<a>
  3. <a href="#step3">Explore and Preprocess the dataset<a>
    
II. <a href="#section2">Fetch Location Data</a>
  1. <a href="#step4">Get location data</a>
  2. <a href="#step5">Add location data to the dataset</a>

III. <a href="#section3">Clustering and Analysis</a>
  1. <a href="#step6">Create a map of Toronto visualizing all the neighborhoods</a>
  2. <a href="#step7">Choose one borough for clustering</a>
  3. <a href="#step8">Get the top venues in each neighborhood from the chosen borough</a>
  4. <a href="#step9">Analyze each neighborhood</a>
  5. <a href="#step10">Cluster the neighborhoods</a>
  6. <a href="#step11">Examine the clusters</a>

## <a id="section1" style="text-decoration:none; color: #000;">I. Data Collection and Preprocessing</a>

### <a id="step1" style="text-decoration:none; color: #000;">1. Scrape neighbourhood data</a>
Let's start by getting the data for the neighbourhoods in Toronto.

The data we need can be found here: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M 

The wikipedia page has the neighborhood data displayed in a table. We will scrape this table and then extract the text content.

There are many python libraries and packages for web scraping. We will use one of the most common ones, BeautifulSoup. The installation details and documentation can be found here: https://beautiful-soup-4.readthedocs.io/en/latest/

In [1]:
# Import the libraries
from bs4 import BeautifulSoup
import pandas as pd
import requests

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Get the html content from the web page and pass it to the BeautifulSoup constructor.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

page = requests.get(url).text
soup = BeautifulSoup(page, "lxml")

# Print the title of the web page
print(soup.title.string)

List of postal codes of Canada: M - Wikipedia


The BeautifulSoup constructor also takes a parser argument. There are different parsers available. We will use lxml for it's speed. 

The soup object is an object which represents the html document as a tree. This can then be used to find elements by type, id, class or any other attributes.

### <a id="step2" style="text-decoration:none; color: #000;">2. Extract required data<a>

The html table element has the css classes wikitable and sortable. We can pass these as arguments to the soup object to get the table.

We will then loop through each row and extract the text content of each cell.

In [3]:
# Extract the table
postal_table = soup.find("table", {"class": "wikitable sortable"})

postal_data = []

# Get the table headers
headers = postal_table.findAll("th")
headers = [h.string.replace("\n", "") for h in headers]

# Loop through the table rows and extract the text of the elements
for row in postal_table.findAll("tr"):
    columns = row.findAll("td")
    if len(columns) > 0:
        post = {}
        for index in range(len(columns)):
            link = columns[index].find("a")
            if link is not None:
                post[headers[index]] = link.string.replace("\n", "")
            else:
                post[headers[index]] = columns[index].string.replace("\n", "")
        postal_data.append(post)

In [4]:
postal_data[0:5]

[{'Borough': 'Not assigned',
  'Neighbourhood': 'Not assigned',
  'Postcode': 'M1A'},
 {'Borough': 'Not assigned',
  'Neighbourhood': 'Not assigned',
  'Postcode': 'M2A'},
 {'Borough': 'North York', 'Neighbourhood': 'Parkwoods', 'Postcode': 'M3A'},
 {'Borough': 'North York',
  'Neighbourhood': 'Victoria Village',
  'Postcode': 'M4A'},
 {'Borough': 'Downtown Toronto',
  'Neighbourhood': 'Harbourfront',
  'Postcode': 'M5A'}]

Now that we have the table content in a list, let's convert it into a pandas dataframe.

In [5]:
postal_df = pd.DataFrame(postal_data)
postal_df.columns = ["Borough", "Neighborhood", "PostalCode"]

# Sort the values first by PostalCode and then by Neighborhood
postal_df = postal_df.sort_values(by=["PostalCode", "Neighborhood"]).reset_index(drop=True)

# Make PostalCode the first column
fixed_columns = [postal_df.columns[-1]] + list(postal_df.columns[:-1])
postal_df = postal_df[fixed_columns]

### <a id="step3" style="text-decoration:none; color: #000;">3. Explore and Preprocess the dataset<a>

Let's explore the dataset and fix any inconsistencies.

In [6]:
postal_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 289 entries, 0 to 288
Data columns (total 3 columns):
PostalCode      289 non-null object
Borough         289 non-null object
Neighborhood    289 non-null object
dtypes: object(3)
memory usage: 6.9+ KB


In [7]:
postal_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M1B,Scarborough,Malvern
2,M1B,Scarborough,Rouge
3,M1C,Scarborough,Highland Creek
4,M1C,Scarborough,Port Union


The first row has neither borough nor neighborhood. 

In [8]:
postal_df.describe(include="all")

Unnamed: 0,PostalCode,Borough,Neighborhood
count,289,289,289
unique,180,12,210
top,M8Y,Not assigned,Not assigned
freq,8,77,78


There are 77 boroughs and 78 neighborhoods with the value "Not assigned". 

Let's drop the rows without borough names. 

In [9]:
print("Original size of the dataset: {0}, {1}".format(postal_df.shape[0], postal_df.shape[1]))

postal_df = postal_df[postal_df["Borough"] != "Not assigned"]

print("New size of the dataset: {0}, {1}".format(postal_df.shape[0], postal_df.shape[1]))

Original size of the dataset: 289, 3
New size of the dataset: 212, 3


In [10]:
unique_neighborhoods = postal_df["Neighborhood"].unique().tolist()
print("There are {} unique neighborhoods".format(len(unique_neighborhoods)))

print("\n\nNumber of unassigned neighborhoods: {0}\n".format(unique_neighborhoods.count("Not assigned")))

postal_df[postal_df["Neighborhood"] == "Not assigned"]

There are 210 unique neighborhoods


Number of unassigned neighborhoods: 1



Unnamed: 0,PostalCode,Borough,Neighborhood
195,M7A,Queen's Park,Not assigned


One row has an assigned Borough but no Neighborhood. We will set the value of the Borough to the Neighborhood.

In [11]:
postal_df.loc[195, :]["Neighborhood"] = postal_df.loc[195, :]["Borough"]
postal_df.loc[195, :]

PostalCode               M7A
Borough         Queen's Park
Neighborhood    Queen's Park
Name: 195, dtype: object

In [12]:
postal_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
1,M1B,Scarborough,Malvern
2,M1B,Scarborough,Rouge
3,M1C,Scarborough,Highland Creek
4,M1C,Scarborough,Port Union
5,M1C,Scarborough,Rouge Hill


Some of the postal codes have multiple neighborhoods. For example, Highland Creek, Port Union and Rouge Hill have the postal code M1C. We will combine these into a single row with the neighborhood names separated by commas.

In [13]:
grouped_df = postal_df.groupby(["Borough", "PostalCode"])["Neighborhood"].apply(lambda x: ', '.join(x)).reset_index()
grouped_df.head(15)

Unnamed: 0,Borough,PostalCode,Neighborhood
0,Central Toronto,M4N,Lawrence Park
1,Central Toronto,M4P,Davisville North
2,Central Toronto,M4R,North Toronto West
3,Central Toronto,M4S,Davisville
4,Central Toronto,M4T,"Moore Park, Summerhill East"
5,Central Toronto,M4V,"Deer Park, Forest Hill SE, Rathnelly, South Hi..."
6,Central Toronto,M5N,Roselawn
7,Central Toronto,M5P,"Forest Hill North, Forest Hill West"
8,Central Toronto,M5R,"North Midtown, The Annex, Yorkville"
9,Downtown Toronto,M4W,Rosedale


In [14]:
grouped_df.shape

(103, 3)

## <a id="section2" style="text-decoration:none; color: #000;">II. Fetch Location Data</a>

We will be using the Foursquare API to get information about the different neighborhoods. For this, we need to get the latitude and longitude of each neighborhood.

### <a id="step4" style="text-decoration:none; color: #000;">1. Get location data</a>
The geocoder python package can be used to get location data for each neighborhood in the dataset. It takes in an address and returns the latitude and longitude. Documentation for the packages can be found here: https://geocoder.readthedocs.io/index.html. 

The geocoder API does not always return the location data. So we will use the following csv file containing the location data for the neighborhoods: https://cocl.us/Geospatial_data

In [15]:
loc_url = "https://cocl.us/Geospatial_data"

loc_df = pd.read_csv(loc_url)
loc_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### <a id="step5" style="text-decoration:none; color: #000;">2. Add location data to the dataset</a>


Now we can merge both the datasets on postal code. The name for this column is different in each dataframe. So we use the left_on and right_on parameters to the merge function and drop the duplicate column.

In [16]:
toronto_df = grouped_df.merge(loc_df, left_on="PostalCode", right_on="Postal Code")
toronto_df.drop("Postal Code", axis=1, inplace=True)
toronto_df.head()

Unnamed: 0,Borough,PostalCode,Neighborhood,Latitude,Longitude
0,Central Toronto,M4N,Lawrence Park,43.72802,-79.38879
1,Central Toronto,M4P,Davisville North,43.712751,-79.390197
2,Central Toronto,M4R,North Toronto West,43.715383,-79.405678
3,Central Toronto,M4S,Davisville,43.704324,-79.38879
4,Central Toronto,M4T,"Moore Park, Summerhill East",43.689574,-79.38316


Rearraging the columns,

In [17]:
fixed_columns = [toronto_df.columns[1], toronto_df.columns[0]] + list(toronto_df.columns[2:])
toronto_df = toronto_df[fixed_columns]
toronto_df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
5,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049
6,M5N,Central Toronto,Roselawn,43.711695,-79.416936
7,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307
8,M5R,Central Toronto,"North Midtown, The Annex, Yorkville",43.67271,-79.405678
9,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529


## <a id="section3" style="text-decoration:none; color: #000;">III. Clustering and Analysis</a>
### <a id="step6" style="text-decoration:none; color: #000;">1. Create a map of Toronto visualizing all the neighborhoods</a>

We will use <a href="http://python-visualization.github.io/folium/quickstart.html">Folium</a>, which is a python package that can be used to create interactive maps. 

We will also use the geopy package. It takes in an address and returns the latitude and longitude values for that place. We will use it fetch the coordinates of Toronto and use them to create a map with the neighborhoods from our dataset.

In [18]:
# Install and import Nominatim from geopy
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

!conda install -c conda-forge folium=0.5.0 --yes
import folium

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge


In [19]:
from geopy.extra.rate_limiter import RateLimiter

address = "Toronto, Ontario"

geolocator = Nominatim(user_agent="my_app")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
location = geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The geograpical coordinates of Toronto are {}, {}.".format(latitude, longitude))

The geograpical coordinates of Toronto are 43.653963, -79.387207.


In [20]:
# Create a map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to the map
for lat, lng, borough, neighborhood in zip(toronto_df["Latitude"], toronto_df["Longitude"], toronto_df["Borough"], toronto_df["Neighborhood"]):
    label = "{}, {}".format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="blue",
        fill=True,
        fill_color="#3186cc",
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

You can zoom in or out and click on the cirle markers to reveal the name of the neighborhoods.

### <a id="step7" style="text-decoration:none; color: #000;">2. Choose one borough for clustering</a>
Let's focus on the neighborhoods in West, East, Central and Downtown Toronto for this analysis.

In [21]:
toronto_subset_df = toronto_df[toronto_df["Borough"].apply(lambda x: x.count("Toronto") > 0)].reset_index(drop=True)

print("Unique boroughs:")
print(toronto_subset_df["Borough"].unique().tolist())

toronto_subset_df.head()

Unique boroughs:
['Central Toronto', 'Downtown Toronto', 'East Toronto', 'West Toronto']


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
1,M4P,Central Toronto,Davisville North,43.712751,-79.390197
2,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
3,M4S,Central Toronto,Davisville,43.704324,-79.38879
4,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316


In [22]:
address = "Toronto, Toronto, Ontario"

geolocator = Nominatim(user_agent="my_app")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
location = geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The geograpical coordinates are {}, {}.".format(latitude, longitude))

The geograpical coordinates are 43.653963, -79.387207.


In [23]:
# Create map of Manhattan using latitude and longitude values
toronto_subset_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add markers to map
for lat, lng, label in zip(toronto_subset_df["Latitude"], toronto_subset_df["Longitude"], toronto_subset_df["Neighborhood"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color="blue",
        fill=True,
        fill_color="#3186cc",
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_subset_map)  
    
toronto_subset_map

### <a id="step8" style="text-decoration:none; color: #000;">3. Get the top venues in each neighborhood from the chosen borough</a>

We will use the Foursquare API to find the top venues in each of the neighborhoods in out dataset. Foursquare is a recommendation based website that let's users post reviews and photos about places they have visited. You can enter the name of a place and the type of place you are looking for, like restaurants, cafes, etc. and get recommendations based on other user's experiences. You can find more information on their website: https://foursquare.com/

Our API calls will need a client id and secret key for authentication. We get these when we open a developer account. There are call limits based on the type of account that you create. We will use the default free version. You can open an account using this link: https://foursquare.com/developers/signup

Let's start by defining our API credentials and version.

In [24]:
CLIENT_ID = "your-client-id" # your Foursquare ID
CLIENT_SECRET = "your-secret" # your Foursquare Secret
VERSION = "20190110" # Foursquare API version

In [25]:
# The code was removed by Watson Studio for sharing.

Let's get the top 100 venues within a 500 metre radius of each neighborhood.

In [26]:
def get_venues(limit, radius, neighborhoods, latitudes, longitudes):
    venues = []
    
    print("Requesting data for:")
    
    for neighborhood, lat, lon in zip(neighborhoods, latitudes, longitudes):
        print(neighborhood)
        
        # API request url
        url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}".format(
            CLIENT_ID, CLIENT_SECRET, lat, lon, VERSION, radius, limit)
        
        # Make a GET request
        response = requests.get(url).json()["response"]["groups"][0]["items"]
        
        # Append relevant data from response to venues list
        venues.append([{
            "Neighborhood": neighborhood,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lon,
            "Venue": r["venue"]["name"], 
            "Venue Latitude": r["venue"]["location"]["lat"], 
            "Venue Longitude": r["venue"]["location"]["lng"],  
            "Venue Category": r["venue"]["categories"][0]["name"]} for r in response])
    
    venues_df = pd.DataFrame([v for venue in venues for v in venue])
    
    return venues_df

In [28]:
LIMIT = 100
radius = 500
toronto_venues_df = get_venues(LIMIT, radius, toronto_subset_df["Neighborhood"], toronto_subset_df["Latitude"], toronto_subset_df["Longitude"])

Requesting data for:
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Roselawn
Forest Hill North, Forest Hill West
North Midtown, The Annex, Yorkville
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
The Beaches
Riverdale, The Danforth West
India Bazaar, The Beaches West
Studio District
Business Reply Mail Processing Centre 969 Eastern
Dovercourt Village, Dufferin
Little Portuga

In [29]:
toronto_venues_df.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category,Venue Latitude,Venue Longitude
0,Lawrence Park,43.72802,-79.38879,Lawrence Park Ravine,Park,43.726963,-79.394382
1,Lawrence Park,43.72802,-79.38879,Dim Sum Deluxe,Dim Sum Restaurant,43.726953,-79.39426
2,Lawrence Park,43.72802,-79.38879,Zodiac Swim School,Swim School,43.728532,-79.38286
3,Lawrence Park,43.72802,-79.38879,TTC Bus #162 - Lawrence-Donway,Bus Line,43.728026,-79.382805
4,Davisville North,43.712751,-79.390197,Sherwood Park,Park,43.716551,-79.387776


Let's check how many venues were returned for each neighborhood

In [30]:
toronto_venues_df.groupby("Neighborhood").count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category,Venue Latitude,Venue Longitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
"Bathurst Quay, CN Tower, Harbourfront West, Island airport, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
Berczy Park,54,54,54,54,54,54
"Brockton, Exhibition Place, Parkdale Village",19,19,19,19,19,19
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
"Cabbagetown, St. James Town",47,47,47,47,47,47
Central Bay Street,82,82,82,82,82,82
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,15,15,15,15,15,15
Church and Wellesley,87,87,87,87,87,87


Let's find the unique venue categories in the dataframe

In [31]:
print(toronto_venues_df["Venue Category"].unique().tolist())
print("There are {} unique venue categories.".format(len(toronto_venues_df["Venue Category"].unique())))

['Park', 'Dim Sum Restaurant', 'Swim School', 'Bus Line', 'Food & Drink Shop', 'Breakfast Spot', 'Hotel', 'Grocery Store', 'Sandwich Place', 'Gym', 'Burger Joint', 'Clothing Store', 'Dance Studio', 'Diner', 'Yoga Studio', 'Salon / Barbershop', 'Coffee Shop', 'Sporting Goods Shop', 'Mexican Restaurant', 'Spa', 'Chinese Restaurant', 'Fast Food Restaurant', 'Dessert Shop', 'Furniture / Home Store', 'Bagel Shop', 'Rental Car Location', 'Café', 'Indian Restaurant', 'Pizza Place', 'Sushi Restaurant', 'Seafood Restaurant', 'Italian Restaurant', 'Toy / Game Store', 'Thai Restaurant', 'Brewery', 'Restaurant', 'Gourmet Shop', 'Greek Restaurant', 'Farmers Market', 'Pharmacy', 'Flower Shop', 'Discount Store', 'Fried Chicken Joint', 'Tennis Court', 'Playground', 'Convenience Store', 'Supermarket', 'American Restaurant', 'Pub', 'Sports Bar', 'Vietnamese Restaurant', 'Light Rail Station', 'Garden', 'Music Venue', 'Pool', 'Trail', 'Jewelry Store', 'Vegetarian / Vegan Restaurant', 'BBQ Joint', 'History

There is also a category called Neighborhood.

In [32]:
toronto_venues_df[toronto_venues_df["Venue Category"] == "Neighborhood"]

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category,Venue Latitude,Venue Longitude
656,"Adelaide, King, Richmond",43.650571,-79.384568,Downtown Toronto,Neighborhood,43.653232,-79.385296
743,"Harbourfront East, Toronto Islands, Union Station",43.640816,-79.381752,Harbourfront,Neighborhood,43.639526,-79.380688
1403,The Beaches,43.676357,-79.293031,Upper Beaches,Neighborhood,43.680563,-79.292869
1477,Studio District,43.659526,-79.340923,Leslieville,Neighborhood,43.66207,-79.337856


We will drop these rows as they don't make much sense to avoid conflicts with our original Neighborhood column.

In [33]:
toronto_venues_dropped = toronto_venues_df.drop(toronto_venues_df[toronto_venues_df["Venue Category"] == "Neighborhood"].index, axis=0)
print(toronto_venues_dropped["Venue Category"].unique().tolist())
print("There are {} unique venue categories.".format(len(toronto_venues_dropped["Venue Category"].unique())))

['Park', 'Dim Sum Restaurant', 'Swim School', 'Bus Line', 'Food & Drink Shop', 'Breakfast Spot', 'Hotel', 'Grocery Store', 'Sandwich Place', 'Gym', 'Burger Joint', 'Clothing Store', 'Dance Studio', 'Diner', 'Yoga Studio', 'Salon / Barbershop', 'Coffee Shop', 'Sporting Goods Shop', 'Mexican Restaurant', 'Spa', 'Chinese Restaurant', 'Fast Food Restaurant', 'Dessert Shop', 'Furniture / Home Store', 'Bagel Shop', 'Rental Car Location', 'Café', 'Indian Restaurant', 'Pizza Place', 'Sushi Restaurant', 'Seafood Restaurant', 'Italian Restaurant', 'Toy / Game Store', 'Thai Restaurant', 'Brewery', 'Restaurant', 'Gourmet Shop', 'Greek Restaurant', 'Farmers Market', 'Pharmacy', 'Flower Shop', 'Discount Store', 'Fried Chicken Joint', 'Tennis Court', 'Playground', 'Convenience Store', 'Supermarket', 'American Restaurant', 'Pub', 'Sports Bar', 'Vietnamese Restaurant', 'Light Rail Station', 'Garden', 'Music Venue', 'Pool', 'Trail', 'Jewelry Store', 'Vegetarian / Vegan Restaurant', 'BBQ Joint', 'History

### <a id="step9" style="text-decoration:none; color: #000;">4. Analyze each neighborhood</a>

Let's use one hot encoding and convert venue category strings to numerical values

In [34]:
ohe_toronto_df = pd.get_dummies(toronto_venues_dropped[["Venue Category"]], prefix="", prefix_sep="")

# Add the neighborhood column to the new dataframe
ohe_toronto_df["Neighborhood"] = toronto_venues_dropped["Neighborhood"]

# Set neighborhood as the first column
fixed_columns = [ohe_toronto_df.columns[-1]] + list(ohe_toronto_df.columns[:-1])
ohe_toronto_df = ohe_toronto_df[fixed_columns]

ohe_toronto_df.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plane,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Lawrence Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Davisville North,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [35]:
print("Size of the new dataframe: {0}, {1}".format(ohe_toronto_df.shape[0], ohe_toronto_df.shape[1]))

Size of the new dataframe: 1703, 238


Let's group the neighborhoods and get the mean of the frequency of category occurences

In [36]:
toronto_grouped = ohe_toronto_df.groupby("Neighborhood").mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plane,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040404,0.0,0.0,0.0,0.010101,0.010101,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.010101,0.020202,0.0,0.0,0.010101,0.020202,0.010101,0.0,0.0,0.050505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.010101,0.060606,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.010101,0.0,0.0,0.010101,0.0,0.010101,0.0,0.030303,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.020202,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.010101,0.0,0.0,0.030303,0.0,0.010101,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.040404,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040404,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.010101,0.0,0.010101,0.0
1,"Bathurst Quay, CN Tower, Harbourfront West, Is...",0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.037037,0.0,0.0,0.0,0.018519,0.018519,0.037037,0.0,0.018519,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.018519,0.055556,0.074074,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.037037,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Brockton, Exhibition Place, Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.085106,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.042553,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.021277,0.0,0.042553,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.085106,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.036585,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.036585,0.0,0.0,0.0,0.073171,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.012195,0.0,0.158537,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.012195,0.0,0.012195,0.012195,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.04878,0.012195,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.012195,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0,0.0,0.012195,0.0,0.02439,0.0,0.02439,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0,0.012195
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.07,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.02,0.01,0.0,0.0,0.07,0.0,0.02,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.04,0.01,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011494,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.022989,0.0,0.034483,0.011494,0.0,0.0,0.022989,0.0,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.057471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.022989,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.045977,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.011494,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.022989,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.011494,0.0,0.0,0.011494,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.011494,0.0,0.0,0.0,0.034483,0.011494,0.0,0.011494,0.0,0.0,0.011494,0.011494,0.011494,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.011494,0.0,0.0,0.057471,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.011494,0.011494,0.0,0.011494,0.0,0.011494


In [37]:
print("Size of the new grouped dataframe: {0}, {1}".format(toronto_grouped.shape[0], toronto_grouped.shape[1]))

Size of the new grouped dataframe: 38, 238


Let's get the top 10 venues in each neighborhood.

In [38]:
def get_most_common_venues(row, num_of_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_of_venues]

In [39]:
import numpy as np

num_of_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to the number of top venues
columns = ["Neighborhood"]
for ind in np.arange(num_of_venues):
    try:
        columns.append("{}{} Most Common Venue".format(ind + 1, indicators[ind]))
    except:
        columns.append("{}th Most Common Venue".format(ind + 1))

# Create a new dataframe
toronto_top10_venues = pd.DataFrame(columns=columns)
toronto_top10_venues["Neighborhood"] = toronto_grouped["Neighborhood"]

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_top10_venues.iloc[ind, 1:] = get_most_common_venues(toronto_grouped.iloc[ind, :], num_of_venues)

toronto_top10_venues

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Thai Restaurant,American Restaurant,Clothing Store,Gym,Hotel,Bakery,Bar
1,"Bathurst Quay, CN Tower, Harbourfront West, Is...",Airport Lounge,Airport Service,Airport Terminal,Boat or Ferry,Sculpture Garden,Plane,Airport,Airport Food Court,Airport Gate,Harbor / Marina
2,Berczy Park,Coffee Shop,Restaurant,Cocktail Bar,Café,Farmers Market,Pub,Seafood Restaurant,Cheese Shop,Beer Bar,Italian Restaurant
3,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Coffee Shop,Café,Burrito Place,Stadium,Bar,Caribbean Restaurant,Furniture / Home Store,Climbing Gym,Italian Restaurant
4,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Garden,Pizza Place,Park,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Smoke Shop
5,"Cabbagetown, St. James Town",Restaurant,Coffee Shop,Bakery,Pizza Place,Market,Pub,Italian Restaurant,Café,Gastropub,Bank
6,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Bar,Burger Joint,Thai Restaurant,Sandwich Place,Salad Place,Indian Restaurant,Ice Cream Shop
7,"Chinatown, Grange Park, Kensington Market",Bar,Café,Vegetarian / Vegan Restaurant,Bakery,Vietnamese Restaurant,Coffee Shop,Dumpling Restaurant,Chinese Restaurant,Mexican Restaurant,Dim Sum Restaurant
8,Christie,Café,Grocery Store,Park,Diner,Coffee Shop,Nightclub,Restaurant,Italian Restaurant,Baby Store,Convenience Store
9,Church and Wellesley,Japanese Restaurant,Coffee Shop,Sushi Restaurant,Gay Bar,Restaurant,Burger Joint,Gastropub,Fast Food Restaurant,Men's Store,Café


### <a id="step10" style="text-decoration:none; color: #000;">5. Cluster the neighborhoods</a>

Let's cluster these neighborhoods. We will be using the k-means algorithm from the scikit-learn package. Scikit-learn is a machine learning python package containing implementations of various machine learning algorithms. You can find installation details and documentation here: https://scikit-learn.org/stable/index.html

In [40]:
from sklearn.cluster import KMeans

num_of_clusters = 3

# Drop neighborhood column
toronto_venues = toronto_grouped.drop("Neighborhood", axis=1)

kmeans = KMeans(n_clusters=num_of_clusters).fit(toronto_venues)

# Print the cluster labels for each row
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

Let's create dataframe with the top 10 venues and their assigned clusters.

In [41]:
toronto_merged = toronto_subset_df

# Add generated cluster labels
toronto_merged["Cluster Labels"] = kmeans.labels_

# Merge toronto_merged with toronto_top10_venues to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(toronto_top10_venues.set_index("Neighborhood"), on="Neighborhood")

# Drop postal code as we won't be needing it
toronto_merged.drop("PostalCode", axis=1, inplace=True)

toronto_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Park,Swim School,Dim Sum Restaurant,Bus Line,Yoga Studio,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant
1,Central Toronto,Davisville North,43.712751,-79.390197,1,Dance Studio,Burger Joint,Breakfast Spot,Gym,Hotel,Grocery Store,Sandwich Place,Park,Food & Drink Shop,Clothing Store
2,Central Toronto,North Toronto West,43.715383,-79.405678,1,Sporting Goods Shop,Coffee Shop,Clothing Store,Yoga Studio,Chinese Restaurant,Mexican Restaurant,Spa,Sandwich Place,Salon / Barbershop,Bagel Shop
3,Central Toronto,Davisville,43.704324,-79.38879,1,Sandwich Place,Pizza Place,Dessert Shop,Seafood Restaurant,Italian Restaurant,Sushi Restaurant,Café,Coffee Shop,Gym,Discount Store
4,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,1,Playground,Park,Tennis Court,Gym,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant


Let's visualize the clusters on a map.

In [42]:
# Create a map
toronto_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
rainbow = ['#8000ff', '#00b5eb', '#b03304', '#ffb060', '#ff0000']

# add markers to the map
markers_colors = []
for lat, lon, neigh, cluster in zip(toronto_merged["Latitude"], toronto_merged["Longitude"], toronto_merged["Neighborhood"], toronto_merged["Cluster Labels"]):
    label = folium.Popup(str(neigh) + " [Cluster " + str(cluster) + "]", parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(toronto_clusters)
       
toronto_clusters

### <a id="step11" style="text-decoration:none; color: #000;">6. Examine the clusters</a>

Let's examine each of the clusters.

#### Cluster 1

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Stn A PO Boxes 25 The Esplanade,Coffee Shop,Restaurant,Café,Cocktail Bar,Hotel,Pub,Beer Bar,Italian Restaurant,Seafood Restaurant,Cosmetics Shop
26,Christie,Café,Grocery Store,Park,Diner,Coffee Shop,Nightclub,Restaurant,Italian Restaurant,Baby Store,Convenience Store
31,Business Reply Mail Processing Centre 969 Eastern,Yoga Studio,Garden,Pizza Place,Park,Recording Studio,Restaurant,Burrito Place,Brewery,Skate Park,Smoke Shop


#### Cluster 2

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Lawrence Park,Park,Swim School,Dim Sum Restaurant,Bus Line,Yoga Studio,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant
1,Davisville North,Dance Studio,Burger Joint,Breakfast Spot,Gym,Hotel,Grocery Store,Sandwich Place,Park,Food & Drink Shop,Clothing Store
2,North Toronto West,Sporting Goods Shop,Coffee Shop,Clothing Store,Yoga Studio,Chinese Restaurant,Mexican Restaurant,Spa,Sandwich Place,Salon / Barbershop,Bagel Shop
3,Davisville,Sandwich Place,Pizza Place,Dessert Shop,Seafood Restaurant,Italian Restaurant,Sushi Restaurant,Café,Coffee Shop,Gym,Discount Store
4,"Moore Park, Summerhill East",Playground,Park,Tennis Court,Gym,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
5,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",Pub,Coffee Shop,Convenience Store,Light Rail Station,Supermarket,Fried Chicken Joint,Sports Bar,Sushi Restaurant,American Restaurant,Vietnamese Restaurant
6,Roselawn,Pool,Music Venue,Garden,Yoga Studio,Dog Run,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
7,"Forest Hill North, Forest Hill West",Trail,Sushi Restaurant,Bus Line,Jewelry Store,Yoga Studio,Donut Shop,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market
8,"North Midtown, The Annex, Yorkville",Café,Sandwich Place,Coffee Shop,Pizza Place,Pharmacy,Jewish Restaurant,Furniture / Home Store,BBQ Joint,Pub,Indian Restaurant
9,Rosedale,Park,Playground,Trail,Dog Run,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


#### Cluster 3

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,"Dovercourt Village, Dufferin",Bakery,Supermarket,Pharmacy,Discount Store,Middle Eastern Restaurant,Bank,Fast Food Restaurant,Bar,Music Venue,Café


#### Observations

Cluster 2 has the most neighborhoods. These seem to be located in commercial areas with restaurants, stores and theatres. Restaurants are not as popular in the neighborhoods from clusters 1 and 3 as they are in cluster 2 with the exception of the first neighborhood from cluster 1 (Stn A PO Boxes 25 The Esplanade).