# Introduction

We are going to explore Toronto, Ontario using the Foursquare API and several Python libraries.
Using the Foursquare API, we will gather information about each area, including nearby venues and their categories.
Finally, we will cluster the neighborhoods based on the venues (e.g. coffee shops, parks, restaurants) present in each one.

## Data Gathering

We are going to scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.
The data consists of postal codes, boroughs and neighborhoods. We will then use a second data source to add latitude and longitude to each postal code.

Let us first download and install all the dependencies that we will need.

In [1]:
# Install the required packages
#!conda install anaconda bs4 -y
#!conda install -c conda-forge folium=0.5.0 --yes
#!conda install -c conda-forge geopy --yes

In [2]:
# Import required libraries
from bs4 import BeautifulSoup
from requests import get
from geopy.geocoders import Nominatim
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
from sklearn.cluster import KMeans

import pandas as pd
import numpy as np
import json
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

print ('Libraries imported')

Libraries imported


### 1. Website Scraping and Preprocessing

Create a BeautifulSoup object and parse the page with the built-in HTML parser. Then extract the table rows and store them in a list.

In [3]:
# Define the url of website and create beautifulsoup object
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Store the table contents to a list
contents = []

table = html_soup.find('table')   
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [element.text.strip() for element in cols]
    if cols:
        contents.append([element for element in cols if element])

# Print samples
print (contents[0:5])

[['M1A', 'Not assigned', 'Not assigned'], ['M2A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods'], ['M4A', 'North York', 'Victoria Village'], ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront']]


Load the list into a pandas dataframe, then preprocess it in two steps: remove rows whose borough is 'Not assigned', and for rows that have a borough but a 'Not assigned' neighborhood, use the borough name as the neighborhood.

In [4]:
# Create dataframe
df = pd.DataFrame(data=contents, columns=['Postal Code', 'Borough', 'Neighborhood'])
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [5]:
# Remove rows with borough of 'Not assigned' value
df = df[df.Borough != 'Not assigned']
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [6]:
# Check the data for multiple postal code values
df.describe()

Unnamed: 0,Postal Code,Borough,Neighborhood
count,103,103,103
unique,103,10,99
top,M1R,North York,Downsview
freq,1,24,4


From the above, the Postal Code column has 103 unique values across 103 rows, so there are no duplicate postal codes.
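
The same check can be made explicit instead of being read off `describe()`. A minimal sketch on a small made-up frame (the three rows below are illustrative, not the scraped data):

```python
import pandas as pd

# Toy stand-in for the scraped table; values are illustrative only.
sample = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})

# True when every postal code appears exactly once.
codes_unique = sample['Postal Code'].is_unique

# Rows whose postal code occurs more than once (empty here).
duplicates = sample[sample['Postal Code'].duplicated(keep=False)]

print(codes_unique, len(duplicates))
```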

In [7]:
# For neighborhood with 'Not assigned' value, we are going to use its borough value as neighborhood value.
df.loc[df['Neighborhood'] == 'Not assigned', 'Neighborhood'] = df['Borough']
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Check the dataframe shape.

In [8]:
df.shape

(103, 3)

### 2. Geographical Coordinates Input

Let us download the coordinates data from http://cocl.us/Geospatial_data.

In [9]:
!wget -q -O 'coordinates.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

Data downloaded!


Now, using the coordinates data, let us add the corresponding latitude and longitude to each postal code.

In [10]:
# Set df index to Postal Code
df.set_index('Postal Code', inplace=True)
print (df.head())

# Read coordinates table and set index to Postal Code
df_coord = pd.read_csv('coordinates.csv', index_col='Postal Code')
print (df_coord.head())

                      Borough                                 Neighborhood
Postal Code                                                               
M3A                North York                                    Parkwoods
M4A                North York                             Victoria Village
M5A          Downtown Toronto                    Regent Park, Harbourfront
M6A                North York             Lawrence Manor, Lawrence Heights
M7A          Downtown Toronto  Queen's Park, Ontario Provincial Government
              Latitude  Longitude
Postal Code                      
M1B          43.806686 -79.194353
M1C          43.784535 -79.160497
M1E          43.763573 -79.188711
M1G          43.770992 -79.216917
M1H          43.773136 -79.239476


In [11]:
# Take latitude and longitude values for each given Postal Code and update the dataframe with the given values.
for ind in df.index:
    df.loc[ind, 'latitude'] = df_coord.loc[ind, 'Latitude']
    df.loc[ind, 'longitude'] = df_coord.loc[ind, 'Longitude']
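
The row-by-row loop above can also be written as a single index-aligned join. This is a sketch, assuming both frames are indexed by `Postal Code` as in the previous cell; the two-row frames `left` and `right` are stand-ins for `df` and `df_coord`:

```python
import pandas as pd

# Stand-ins for df and df_coord, both indexed by 'Postal Code'.
left = pd.DataFrame(
    {'Borough': ['North York', 'North York'],
     'Neighborhood': ['Parkwoods', 'Victoria Village']},
    index=pd.Index(['M3A', 'M4A'], name='Postal Code'))
right = pd.DataFrame(
    {'Latitude': [43.725882, 43.753259],
     'Longitude': [-79.315572, -79.329656]},
    index=pd.Index(['M4A', 'M3A'], name='Postal Code'))

# join aligns on the index, so row order in the two frames need not match.
# Renaming keeps the lowercase column names used in the rest of the notebook.
joined = left.join(
    right.rename(columns={'Latitude': 'latitude', 'Longitude': 'longitude'}),
    how='left')

print(joined)
```

A left join keeps every postal code from the first frame; a code missing from the coordinates file would show up as NaN instead of raising a `KeyError` as the `.loc` lookup does.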

Briefly check the output dataframe.

In [12]:
df.head()

Unnamed: 0_level_0,Borough,Neighborhood,latitude,longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
M3A,North York,Parkwoods,43.753259,-79.329656
M4A,North York,Victoria Village,43.725882,-79.315572
M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


The coordinates have been filled in for each postal code.

### 3. Geographical Map Creation

Let us find the geographical coordinates of Toronto, Ontario using the geopy geolocator.

In [13]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are ({}, {})'.format(latitude, longitude))

The geographical coordinates of Toronto are (43.6534817, -79.3839347)


Using the coordinates fetched above, let's plot a map of Toronto, Ontario with folium and superimpose a marker for each postal code area, labeled with its neighborhood and borough.

In [14]:
# create map of Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['latitude'], df['longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### 4. Data Gathering Using Foursquare

Store the API credentials in variables.

In [None]:
CLIENT_ID = '========================================' # your Foursquare ID
CLIENT_SECRET = '========================================' # your Foursquare Secret
VERSION = '20200703' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
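
Printing or committing secrets is easy to do by accident in a notebook. One common alternative, sketched here, reads them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions chosen for illustration, not something Foursquare requires:

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) from the environment.

    The variable names are hypothetical; use whatever your setup defines.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the last few characters when logging a secret."""
    return '*' * max(len(secret) - visible, 0) + secret[-visible:]

# Example with a fake mapping instead of the real environment.
cid, csecret = load_credentials({'FOURSQUARE_CLIENT_ID': 'abcd1234',
                                 'FOURSQUARE_CLIENT_SECRET': 'wxyz9876'})
print('CLIENT_ID:', mask(cid))
```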

Let us define a function to get venues within 1000m radius from the list of neighborhoods.

In [16]:
# Explore neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
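
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A more defensive extraction step could look like the sketch below. `extract_venue_rows` is a hypothetical helper, and the sample payload is hand-written to mimic only the fields the function above already reads:

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten an explore-style response into rows, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to None when a venue carries no category at all.
        category = categories[0].get('name') if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Hand-written payload with the same shape as the fields used above.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample_payload, 'Parkwoods', 43.753259, -79.329656)
print(rows)
```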

Let us use the function above to get the venues within 1000m from the neighborhoods in our dataframe.

In [17]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Quickly check the dataframe of venues gathered from the above step.

In [18]:
print(toronto_venues.shape)
toronto_venues.head()

(4899, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store


As we can see from above, each row now has a venue and its category.
To see this more clearly, let us group the data by neighborhood.

In [19]:
toronto_venues.groupby('Neighborhood').count().head()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,45,45,45,45,45,45
"Alderwood, Long Branch",23,23,23,23,23,23
"Bathurst Manor, Wilson Heights, Downsview North",31,31,31,31,31,31
Bayview Village,16,16,16,16,16,16
"Bedford Park, Lawrence Manor East",41,41,41,41,41,41


The grouping works as expected.
Let's look at summary statistics for the venue data.

In [20]:
toronto_venues.describe(include='all')

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
count,4899,4899.0,4899.0,4899,4899.0,4899.0,4899
unique,98,,,2799,,,330
top,Studio District,,,Tim Hortons,,,Coffee Shop
freq,100,,,101,,,386
mean,,43.684692,-79.39241,,43.684398,-79.392709,
std,,0.044842,0.068861,,0.044692,0.068764,
min,,43.602414,-79.615819,,43.593866,-79.62696,
25%,,43.651494,-79.41975,,43.650775,-79.419212,
50%,,43.668999,-79.384568,,43.666747,-79.386557,
75%,,43.70906,-79.360636,,43.707287,-79.36064,


It is interesting to note that the most frequent venue category is a coffee shop.

In [21]:
# Check the number of venue categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 330 unique categories.


## Neighborhood Exploration

Check for the top 10 venue categories from the neighborhoods.

In [22]:
NV = toronto_venues[['Venue Category','Neighborhood']].groupby('Venue Category').count().sort_values(by=['Neighborhood'], ascending=False)
NV.head(10)

Unnamed: 0_level_0,Neighborhood
Venue Category,Unnamed: 1_level_1
Coffee Shop,386
Café,207
Pizza Place,151
Park,149
Restaurant,144
Italian Restaurant,111
Bakery,106
Grocery Store,99
Japanese Restaurant,88
Sandwich Place,87


Let us one-hot encode the venue categories so we can use them later in our clustering algorithm.

In [23]:
# one hot encoding 'Venue Category'
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot.head()

Unnamed: 0,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
# Check the dataframe
toronto_onehot.describe()

Unnamed: 0,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
count,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,...,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0,4899.0
mean,0.000612,0.000408,0.000204,0.000408,0.007757,0.000408,0.000204,0.000612,0.000612,0.00592,...,0.000612,0.006328,0.000204,0.000204,0.002041,0.000612,0.000612,0.001021,0.006328,0.000204
std,0.024741,0.020203,0.014287,0.020203,0.087739,0.020203,0.014287,0.024741,0.024741,0.076719,...,0.024741,0.079304,0.014287,0.014287,0.045138,0.024741,0.024741,0.031934,0.079304,0.014287
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [25]:
# add neighborhood column values back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# Group by neighborhood and display mean for each venue category.
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0


In [26]:
# Check the size
toronto_grouped.shape

(98, 330)
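
To make the grouped table easier to read: each value is the share of a neighborhood's venues that fall in a given category. A toy example with made-up rows (the neighborhoods `A` and `B` are placeholders):

```python
import pandas as pd

# Four made-up venues across two neighborhoods.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Café', 'Park', 'Café'],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)

# Averaging the indicators within a neighborhood gives category frequencies.
freq = onehot.groupby(venues['Neighborhood']).mean()

print(freq)
```

Each row of `freq` sums to 1, which is why the clustering step later compares neighborhoods by their mix of venue types and not by raw venue counts.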

Let us display the top 5 venues in each neighborhood.

In [27]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue  freq
0    Chinese Restaurant  0.11
1         Shopping Mall  0.09
2                Bakery  0.04
3        Sandwich Place  0.04
4  Caribbean Restaurant  0.04


----Alderwood, Long Branch----
               venue  freq
0     Discount Store  0.13
1        Pizza Place  0.09
2           Pharmacy  0.09
3  Convenience Store  0.04
4                Gym  0.04


----Bathurst Manor, Wilson Heights, Downsview North----
           venue  freq
0           Park  0.06
1    Coffee Shop  0.06
2           Bank  0.06
3  Deli / Bodega  0.03
4    Bridal Shop  0.03


----Bayview Village----
                 venue  freq
0                 Bank  0.12
1  Japanese Restaurant  0.12
2        Grocery Store  0.12
3          Gas Station  0.12
4   Chinese Restaurant  0.06


----Bedford Park, Lawrence Manor East----
                venue  freq
0  Italian Restaurant  0.07
1         Coffee Shop  0.07
2                Bank  0.05
3          Restaurant  0.05
4      Sandwich Place  0.05

                venue  freq
0            Pharmacy  0.12
1   Mobile Phone Shop  0.06
2         Pizza Place  0.06
3  Chinese Restaurant  0.06
4          Beer Store  0.06


----Lawrence Manor, Lawrence Heights----
                    venue  freq
0          Clothing Store  0.09
1             Coffee Shop  0.06
2  Furniture / Home Store  0.06
3    Fast Food Restaurant  0.06
4   Vietnamese Restaurant  0.04


----Lawrence Park----
                  venue  freq
0          College Quad  0.12
1           College Gym  0.12
2  Gym / Fitness Center  0.12
3             Bookstore  0.12
4                 Trail  0.12


----Leaside----
                    venue  freq
0             Coffee Shop  0.07
1  Furniture / Home Store  0.05
2           Grocery Store  0.05
3     Sporting Goods Shop  0.05
4       Electronics Store  0.05


----Little Portugal, Trinity----
                           venue  freq
0                           Café  0.09
1                            Bar  0.06
2  Vegetarian / Vegan Restauran

                venue  freq
0         Pizza Place  0.18
1         Gas Station  0.12
2      Ice Cream Shop  0.06
3  Chinese Restaurant  0.06
4      Sandwich Place  0.06


----Weston----
               venue  freq
0        Coffee Shop  0.13
1      Train Station  0.13
2  Convenience Store  0.07
3           Pharmacy  0.07
4          Gift Shop  0.07


----Wexford, Maryvale----
                       venue  freq
0  Middle Eastern Restaurant  0.10
1              Grocery Store  0.10
2                Pizza Place  0.10
3               Burger Joint  0.07
4                 Smoke Shop  0.03


----Willowdale, Newtonbrook----
                       venue  freq
0          Korean Restaurant  0.12
1                       Café  0.09
2  Middle Eastern Restaurant  0.09
3                Coffee Shop  0.06
4                      Diner  0.06


----Willowdale, Willowdale East----
                 venue  freq
0          Coffee Shop  0.07
1  Japanese Restaurant  0.06
2      Bubble Tea Shop  0.06
3     Ramen Resta

In [28]:
# Create a function to fetch the most common venues 

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
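
To see what the function returns, here is a small run on one made-up row shaped like a row of `toronto_grouped` (neighborhood name first, frequencies after). The function body is repeated so the snippet runs on its own:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequencies for a single neighborhood.
row = pd.Series({'Neighborhood': 'Example', 'Bakery': 0.1,
                 'Coffee Shop': 0.6, 'Park': 0.3})

top_two = return_most_common_venues(row, 2)
print(list(top_two))
```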

Create a dataframe of top 10 venue categories in each neighborhood.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Shopping Mall,Bakery,Pizza Place,Coffee Shop,Caribbean Restaurant,Sandwich Place,Pool,Mediterranean Restaurant,Sri Lankan Restaurant
1,"Alderwood, Long Branch",Discount Store,Pharmacy,Pizza Place,Grocery Store,Donut Shop,Shopping Mall,Dance Studio,Intersection,Sandwich Place,Coffee Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Park,Convenience Store,Supermarket,Diner,Chinese Restaurant,Fried Chicken Joint,Sushi Restaurant,Sandwich Place
3,Bayview Village,Grocery Store,Bank,Gas Station,Japanese Restaurant,Intersection,Chinese Restaurant,Park,Restaurant,Skating Rink,Dog Run
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Bank,Restaurant,Sandwich Place,Bridal Shop,Skating Rink,Intersection,Sushi Restaurant,Juice Bar


Summarize the most common venue category at each rank.

In [30]:
neighborhoods_venues_sorted.describe()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,98,98,98,98,98,98,98,98,98,98,98
unique,98,27,36,41,43,55,55,61,60,62,67
top,Glencairn,Coffee Shop,Coffee Shop,Coffee Shop,Pizza Place,Pizza Place,Bakery,Bakery,Bank,Sushi Restaurant,Restaurant
freq,1,40,18,11,7,7,6,7,5,4,6


## K-means Clustering

Finally, let us use the k-means algorithm to cluster the neighborhoods based on the venue categories available in each.

In [31]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)
toronto_grouped_clustering.head(5)

Unnamed: 0,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0


In [32]:
# look into the data
toronto_grouped_clustering.describe()

Unnamed: 0,Accessories Store,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
count,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,...,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0
mean,0.000541,0.000364,0.00034,0.000883,0.006121,0.000204,0.000102,0.000306,0.000306,0.003061,...,0.000842,0.004565,0.000182,0.000102,0.001183,0.000969,0.000757,0.001291,0.003862,0.000102
std,0.003205,0.003608,0.003367,0.00736,0.011696,0.00202,0.00101,0.002249,0.001732,0.006594,...,0.006451,0.011394,0.001804,0.00101,0.004015,0.006737,0.004411,0.007121,0.007936,0.00101
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,0.021739,0.035714,0.033333,0.071429,0.0625,0.02,0.01,0.02,0.01,0.03,...,0.0625,0.075758,0.017857,0.01,0.023256,0.0625,0.032258,0.052632,0.035088,0.01


In [33]:
# fit the data into the model
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(toronto_grouped_clustering)
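
The value `kclusters = 5` is fixed by hand. One way to sanity-check such a choice is to compare silhouette scores over a few candidate values. The sketch below uses synthetic data from `make_blobs` purely for illustration; it is not run on the Toronto data and makes no claim about which `k` suits it:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic points around three far-apart centers (illustrative only).
X, _ = make_blobs(n_samples=150,
                  centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5,
                  random_state=0)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score is the best-separated clustering.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real venue-frequency data the scores are often much flatter than on synthetic blobs, so the result is better treated as a hint than as a rule.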

In [34]:
# check for the cluster labels samples
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 1, 1, 1, 1, 2, 1], dtype=int32)

In [35]:
# check the dataframe before adding the generated cluster labels.
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Shopping Mall,Bakery,Pizza Place,Coffee Shop,Caribbean Restaurant,Sandwich Place,Pool,Mediterranean Restaurant,Sri Lankan Restaurant
1,"Alderwood, Long Branch",Discount Store,Pharmacy,Pizza Place,Grocery Store,Donut Shop,Shopping Mall,Dance Studio,Intersection,Sandwich Place,Coffee Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Park,Convenience Store,Supermarket,Diner,Chinese Restaurant,Fried Chicken Joint,Sushi Restaurant,Sandwich Place
3,Bayview Village,Grocery Store,Bank,Gas Station,Japanese Restaurant,Intersection,Chinese Restaurant,Park,Restaurant,Skating Rink,Dog Run
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Bank,Restaurant,Sandwich Place,Bridal Shop,Skating Rink,Intersection,Sushi Restaurant,Juice Bar


In [36]:
# add the clustering labels to the dataframe
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# copy the original dataframe (Borough, Neighborhood, coordinates; indexed by Postal Code)
toronto_merged = df.copy()

In [37]:
# check the dataframe after adding the generated cluster labels.
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Agincourt,Chinese Restaurant,Shopping Mall,Bakery,Pizza Place,Coffee Shop,Caribbean Restaurant,Sandwich Place,Pool,Mediterranean Restaurant,Sri Lankan Restaurant
1,0,"Alderwood, Long Branch",Discount Store,Pharmacy,Pizza Place,Grocery Store,Donut Shop,Shopping Mall,Dance Studio,Intersection,Sandwich Place,Coffee Shop
2,0,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Park,Convenience Store,Supermarket,Diner,Chinese Restaurant,Fried Chicken Joint,Sushi Restaurant,Sandwich Place
3,0,Bayview Village,Grocery Store,Bank,Gas Station,Japanese Restaurant,Intersection,Chinese Restaurant,Park,Restaurant,Skating Rink,Dog Run
4,1,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Bank,Restaurant,Sandwich Place,Bridal Shop,Skating Rink,Intersection,Sushi Restaurant,Juice Bar


In [38]:
# join the cluster labels and top venues onto the original dataframe, matching on Neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0_level_0,Borough,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
M3A,North York,Parkwoods,43.753259,-79.329656,2.0,Park,Convenience Store,Bus Stop,Pharmacy,Shopping Mall,Pizza Place,Road,Fish & Chips Shop,Food & Drink Shop,Shop & Service
M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Coffee Shop,Hockey Arena,Gym / Fitness Center,Playground,Pizza Place,Portuguese Restaurant,Men's Store,French Restaurant,Lounge,Golf Course
M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1.0,Coffee Shop,Theater,Park,Café,Diner,Pub,Bakery,Breakfast Spot,Italian Restaurant,Restaurant
M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Clothing Store,Fast Food Restaurant,Furniture / Home Store,Coffee Shop,Vietnamese Restaurant,Restaurant,Sushi Restaurant,Fried Chicken Joint,Dessert Shop,Women's Store
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1.0,Coffee Shop,Park,Café,Sushi Restaurant,Pizza Place,Italian Restaurant,Middle Eastern Restaurant,Yoga Studio,Ice Cream Shop,Diner


Check the data.

In [39]:
toronto_merged.info()

<class 'pandas.core.frame.DataFrame'>
Index: 103 entries, M3A to M8Z
Data columns (total 15 columns):
Borough                   103 non-null object
Neighborhood              103 non-null object
latitude                  103 non-null float64
longitude                 103 non-null float64
Cluster Labels            102 non-null float64
1st Most Common Venue     102 non-null object
2nd Most Common Venue     102 non-null object
3rd Most Common Venue     102 non-null object
4th Most Common Venue     102 non-null object
5th Most Common Venue     102 non-null object
6th Most Common Venue     102 non-null object
7th Most Common Venue     102 non-null object
8th Most Common Venue     102 non-null object
9th Most Common Venue     102 non-null object
10th Most Common Venue    102 non-null object
dtypes: float64(3), object(12)
memory usage: 17.9+ KB


From above, we can see that one row has null values for the cluster label and the venue columns. This means that no venues were found within the 1000m radius for one neighborhood, so it never appeared in the clustering input. We will drop this row.
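
Before dropping, it can help to see which row failed to match. A sketch on a toy frame with one unmatched neighborhood (the names and values are made up):

```python
import pandas as pd

# Toy merged table: 'C' has no venue data, so its label is missing.
merged = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Cluster Labels': [0.0, 1.0, float('nan')],
})

# Rows where the join found no cluster label.
unmatched = merged[merged['Cluster Labels'].isna()]
print(unmatched['Neighborhood'].tolist())

# Drop only on the label column, then restore an integer dtype.
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
```

Restricting `dropna` to the label column avoids discarding rows that are missing some unrelated value.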

In [40]:
toronto_merged.dropna(inplace=True)

In [41]:
# check the dataframe again.
toronto_merged.info()

<class 'pandas.core.frame.DataFrame'>
Index: 102 entries, M3A to M8Z
Data columns (total 15 columns):
Borough                   102 non-null object
Neighborhood              102 non-null object
latitude                  102 non-null float64
longitude                 102 non-null float64
Cluster Labels            102 non-null float64
1st Most Common Venue     102 non-null object
2nd Most Common Venue     102 non-null object
3rd Most Common Venue     102 non-null object
4th Most Common Venue     102 non-null object
5th Most Common Venue     102 non-null object
6th Most Common Venue     102 non-null object
7th Most Common Venue     102 non-null object
8th Most Common Venue     102 non-null object
9th Most Common Venue     102 non-null object
10th Most Common Venue    102 non-null object
dtypes: float64(3), object(12)
memory usage: 12.8+ KB
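
`Cluster Labels` is reported as `float64` even though cluster labels are whole numbers. pandas stores an integer column as float once it contains a missing value, which is the likely cause here. The sketch below reproduces the effect on a hypothetical miniature of the merge step (it assumes the labels were attached with an index join, which this section does not show) and casts the column back to integers after the missing row is gone.

```python
import pandas as pd

# Hypothetical miniature: three postal codes, but labels for only two of them
neighborhoods = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"]},
    index=pd.Index(["M1A", "M1B", "M1C"], name="Postal Code"),
)
labels = pd.DataFrame(
    {"Cluster Labels": [0, 2]},
    index=pd.Index(["M1A", "M1C"], name="Postal Code"),
)

# M1B has no label, so the joined column holds NaN and is upcast to float
merged = neighborhoods.join(labels)
dtype_after_join = merged["Cluster Labels"].dtype
print(dtype_after_join)

# Once the incomplete row is dropped, the cast back to int is safe
merged = merged.dropna()
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged["Cluster Labels"].dtype)
```

Casting once here would make the later `.astype(int)` call in the map cell unnecessary.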


## Visualization

Let's create a map that shows each neighborhood colored by its cluster, where the clusters are based on the venues available in the neighborhood.

In [42]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['latitude'], toronto_merged['longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels'].astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
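
The color handling in the cell above can be checked in isolation. This sketch assumes `kclusters = 5`, matching the five clusters examined in this notebook, and builds one hex color per cluster the same way. It also shows what indexing with `cluster - 1` does when labels start at 0: label 0 wraps around to the last color, so every cluster still gets a distinct color, just shifted by one position.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5  # assumption: five clusters, labeled 0 to 4

# One evenly spaced rainbow color per cluster, as hex strings usable by folium
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Colors picked with the cluster - 1 index versus the label itself
shifted = [palette[c - 1] for c in range(kclusters)]
direct = [palette[c] for c in range(kclusters)]

for label in range(kclusters):
    print(label, shifted[label], direct[label])
```

Either indexing gives a valid map; using the label directly only makes the color order follow the label order.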

<a id='item5'></a>

## Examine the Clusters

Let us examine each cluster.

#### Cluster 1

In [43]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Postal Code,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M6A,"Lawrence Manor, Lawrence Heights",Clothing Store,Fast Food Restaurant,Furniture / Home Store,Coffee Shop,Vietnamese Restaurant,Restaurant,Sushi Restaurant,Fried Chicken Joint,Dessert Shop,Women's Store
M1B,"Malvern, Rouge",Fast Food Restaurant,Trail,Coffee Shop,Spa,Restaurant,Construction & Landscaping,Martial Arts Dojo,Supermarket,Caribbean Restaurant,Bank
M4B,"Parkview Hill, Woodbine Gardens",Brewery,Pizza Place,Fast Food Restaurant,Bank,Intersection,Breakfast Spot,Café,Fabric Shop,Gastropub,Bakery
M6B,Glencairn,Grocery Store,Fast Food Restaurant,Coffee Shop,Gas Station,Pizza Place,Park,Sushi Restaurant,Latin American Restaurant,Mediterranean Restaurant,Metro Station
M6C,Humewood-Cedarvale,Pizza Place,Coffee Shop,Convenience Store,Grocery Store,Gastropub,Tennis Court,Bagel Shop,Field,Bank,Sandwich Place
M1E,"Guildwood, Morningside, West Hill",Pizza Place,Bank,Coffee Shop,Fast Food Restaurant,Convenience Store,Greek Restaurant,Beer Store,Liquor Store,Discount Store,Supermarket
M4G,Leaside,Coffee Shop,Electronics Store,Grocery Store,Sporting Goods Shop,Furniture / Home Store,Burger Joint,Sports Bar,Brewery,Restaurant,Sandwich Place
M1H,Cedarbrae,Coffee Shop,Bank,Gas Station,Pharmacy,Indian Restaurant,Bakery,Intersection,Burger Joint,Fast Food Restaurant,Martial Arts Dojo
M3H,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Park,Convenience Store,Supermarket,Diner,Chinese Restaurant,Fried Chicken Joint,Sushi Restaurant,Sandwich Place
M4H,Thorncliffe Park,Coffee Shop,Grocery Store,Indian Restaurant,Shopping Mall,Burger Joint,Sandwich Place,Pizza Place,Supermarket,Turkish Restaurant,Bank


Seems like a convenient area with many grocery stores and pizza places, close to transportation.
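
A quick tally of venue categories across the ranked columns can back up a reading like this one. The sketch below uses three rows copied from the Cluster 1 table above (M6B, M6C, M1E), limited to the first three ranks to keep it short; the variable names are illustrative.

```python
import pandas as pd

# Three rows from the Cluster 1 table, first three venue ranks only
cluster = pd.DataFrame(
    {
        "1st Most Common Venue": ["Grocery Store", "Pizza Place", "Pizza Place"],
        "2nd Most Common Venue": ["Fast Food Restaurant", "Coffee Shop", "Bank"],
        "3rd Most Common Venue": ["Coffee Shop", "Convenience Store", "Coffee Shop"],
    },
    index=pd.Index(["M6B", "M6C", "M1E"], name="Postal Code"),
)

# Flatten every ranked venue column into one series and count the categories
venue_cols = [c for c in cluster.columns if c.endswith("Most Common Venue")]
counts = pd.Series(cluster[venue_cols].to_numpy().ravel()).value_counts()
print(counts)
```

On the full DataFrame, the same tally could be run on the rows selected for each cluster label. Note that it ignores rank, so a category in 10th place counts as much as one in 1st.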

#### Cluster 2

In [44]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Postal Code,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M4A,Victoria Village,Coffee Shop,Hockey Arena,Gym / Fitness Center,Playground,Pizza Place,Portuguese Restaurant,Men's Store,French Restaurant,Lounge,Golf Course
M5A,"Regent Park, Harbourfront",Coffee Shop,Theater,Park,Café,Diner,Pub,Bakery,Breakfast Spot,Italian Restaurant,Restaurant
M7A,"Queen's Park, Ontario Provincial Government",Coffee Shop,Park,Café,Sushi Restaurant,Pizza Place,Italian Restaurant,Middle Eastern Restaurant,Yoga Studio,Ice Cream Shop,Diner
M3B,Don Mills,Restaurant,Coffee Shop,Japanese Restaurant,Gym,Bank,Supermarket,Burger Joint,Asian Restaurant,Pizza Place,Beer Store
M5B,"Garden District, Ryerson",Coffee Shop,Japanese Restaurant,Gastropub,Italian Restaurant,Restaurant,Cosmetics Shop,Pizza Place,Hotel,Seafood Restaurant,Café
M3C,Don Mills,Restaurant,Coffee Shop,Japanese Restaurant,Gym,Bank,Supermarket,Burger Joint,Asian Restaurant,Pizza Place,Beer Store
M5C,St. James Town,Café,Coffee Shop,Restaurant,Hotel,Italian Restaurant,Cosmetics Shop,Japanese Restaurant,Seafood Restaurant,Gastropub,Furniture / Home Store
M4E,The Beaches,Coffee Shop,Pub,Pizza Place,Breakfast Spot,Beach,Japanese Restaurant,Bakery,Burger Joint,Tea Room,Caribbean Restaurant
M5E,Berczy Park,Coffee Shop,Café,Hotel,Japanese Restaurant,Restaurant,Park,Beer Bar,Bakery,Gastropub,Art Gallery
M5G,Central Bay Street,Coffee Shop,Café,Clothing Store,Park,Hotel,Theater,Bubble Tea Shop,Burrito Place,Ramen Restaurant,Plaza


Neighborhoods with many coffee shops, cafés, and restaurants.

#### Cluster 3

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Postal Code,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M3A,Parkwoods,Park,Convenience Store,Bus Stop,Pharmacy,Shopping Mall,Pizza Place,Road,Fish & Chips Shop,Food & Drink Shop,Shop & Service
M9A,"Islington Avenue, Humber Valley Village",Pharmacy,Convenience Store,Bank,Bakery,Golf Course,Shopping Mall,Park,Grocery Store,Café,Skating Rink
M9B,"West Deane Park, Princess Gardens, Martin Grov...",Park,Pizza Place,Hotel,Theater,Gym,Grocery Store,Mexican Restaurant,Fish & Chips Shop,Restaurant,Clothing Store
M1C,"Rouge Hill, Port Union, Highland Creek",Breakfast Spot,Park,Burger Joint,Playground,Italian Restaurant,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
M4C,Woodbine Heights,Coffee Shop,Park,Athletics & Sports,Pizza Place,Sandwich Place,Thai Restaurant,Pastry Shop,Beer Store,Liquor Store,Café
M9C,"Eringate, Bloordale Gardens, Old Burnhamthorpe...",Coffee Shop,Fish & Chips Shop,Grocery Store,Convenience Store,Skating Rink,Café,Shopping Plaza,Shopping Mall,Liquor Store,Beer Store
M6E,Caledonia-Fairbanks,Pharmacy,Park,Fast Food Restaurant,Japanese Restaurant,Coffee Shop,Grocery Store,Bakery,Bus Stop,Discount Store,Falafel Restaurant
M1G,Woburn,Park,Coffee Shop,Chinese Restaurant,Indian Restaurant,Fast Food Restaurant,Pharmacy,Mobile Phone Shop,Curling Ice,Falafel Restaurant,Dumpling Restaurant
M2H,Hillcrest Village,Pharmacy,Park,Coffee Shop,Convenience Store,Grocery Store,Shopping Mall,Chinese Restaurant,Korean Restaurant,Sandwich Place,Bank
M6L,"North Park, Maple Leaf Park, Upwood Park",Coffee Shop,Chinese Restaurant,Bakery,Convenience Store,Mediterranean Restaurant,Pizza Place,Gas Station,Park,Dim Sum Restaurant,Athletics & Sports


Seems like a commercial area, with pharmacies, convenience stores, and grocery stores alongside many parks.

#### Cluster 4

In [46]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Postal Code,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M2L,"York Mills, Silver Hills",Park,Pool,Zoo,Falafel Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space


Seems like a peaceful area whose most common venues are a park, a pool, and a zoo.

#### Cluster 5

In [47]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Postal Code,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M9W,"Northwest, West Humber - Clairville",Hotel,Dog Run,Coffee Shop,Fish Market,Fish & Chips Shop,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant


Seems like a fairly small residential/commercial area.
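
Clusters 4 and 5 each list a single neighborhood in the tables above, so their descriptions rest on one data point. Counting the members of each label shows at a glance which clusters are singletons. The sketch below uses made-up labels; in the notebook, the input would be `toronto_merged['Cluster Labels']`.

```python
import pandas as pd

# Illustrative labels only, not the notebook's actual assignments
labels = pd.Series([0, 0, 1, 1, 1, 2, 3, 4], name="Cluster Labels")

# Number of neighborhoods per cluster, ordered by label
sizes = labels.value_counts().sort_index()
print(sizes)

# Labels that contain exactly one neighborhood
singletons = sizes[sizes == 1].index.tolist()
print("Single-member clusters:", singletons)
```

Single-member clusters often point to outliers, so they may be worth reviewing when choosing the number of clusters.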