# Segmenting and Clustering Neighborhoods in Toronto
Week 3 assignment for IBM Data Science Capstone Project

## Part 1: Scraping Wikipedia Page for Postal Code Data
This section covers the scraping of data to create a dataframe in pandas.
First import required packages:

In [42]:
import numpy as np
import pandas as pd
import json
import folium
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

Retrieve table from website using pandas and rename columns to match instructions:

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
postal_df = pd.read_html(url, attrs={'class': 'wikitable sortable'})[0]
postal_df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
postal_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Remove rows with no borough assigned:

In [16]:
postal_filt_df = postal_df[postal_df['Borough'] != 'Not assigned']
postal_filt_df.reset_index(drop=True, inplace=True)
postal_filt_df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


We confirm that there are no duplicates in the PostalCode column so that no merging of this info is necessary:

In [4]:
print('Number of rows in the dataframe:', postal_filt_df.shape[0])
print('Number of unique Postal Codes:', len(pd.unique(postal_filt_df['PostalCode'])))

Number of rows in the dataframe: 103
Number of unique Postal Codes: 103
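Had a postal code appeared more than once, the rows could have been collapsed into one per code by joining the neighborhood names. A minimal sketch on made-up rows (the `sample` data and the `combined` name are illustrative and not taken from the scraped table):

```python
import pandas as pd

# Made-up rows in which M5A appears twice (illustrative only)
sample = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# One row per (PostalCode, Borough); neighborhood names joined with ', '
combined = (
    sample.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
    .agg(', '.join)
    .reset_index()
)
print(combined)
```

Passing `sort=False` keeps the postal codes in their order of first appearance rather than sorting them alphabetically.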


We confirm that there are no missing neighborhoods in the DataFrame, so we do not need to copy borough names:

In [5]:
print('Number of neighborhoods with NaN value:', postal_filt_df['Neighborhood'].isna().sum())
print("Number of neighborhoods with 'Not assigned' value:", (postal_filt_df['Neighborhood'] == 'Not assigned').sum())

Number of neighborhoods with NaN value: 0
Number of neighborhoods with 'Not assigned' value: 0
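If some neighborhoods had been missing, one possible way to copy the borough name into them is sketched below. The `sample` rows are made up for illustration, and `missing` and `filled` are hypothetical names:

```python
import pandas as pd

# Made-up rows: one 'Not assigned' neighborhood and one missing value
sample = pd.DataFrame({
    'PostalCode': ['M7A', 'M3A', 'M9A'],
    'Borough': ["Queen's Park", 'North York', 'Etobicoke'],
    'Neighborhood': ['Not assigned', 'Parkwoods', None],
})

# Flag rows whose neighborhood is NaN or the literal 'Not assigned'
missing = sample['Neighborhood'].isna() | (sample['Neighborhood'] == 'Not assigned')

# Replace the flagged neighborhoods with the borough from the same row
filled = sample.assign(Neighborhood=sample['Neighborhood'].mask(missing, sample['Borough']))
print(filled)
```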


We're now ready to show the number of rows in this filtered dataframe:

In [6]:
print('Number of rows in the filtered dataframe:', postal_filt_df.shape[0])

Number of rows in the filtered dataframe: 103


## Part 2: Adding Latitude, Longitude Data to the DataFrame

Load the latitude and longitude for each postal code from `Geospatial_Coordinates.csv`:

In [7]:
geo_data = pd.read_csv('Geospatial_Coordinates.csv')
geo_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Add latitude and longitude by finding matching postal codes between dataframes:

In [17]:
loc_df = postal_filt_df.copy()
for ind, pc in enumerate(loc_df['PostalCode']):
    loc_df.loc[ind, 'Latitude'] = geo_data.loc[geo_data['Postal Code'] == pc, 'Latitude'].item()
    loc_df.loc[ind, 'Longitude'] = geo_data.loc[geo_data['Postal Code'] == pc, 'Longitude'].item()
loc_df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
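The same lookup can also be written as a single join instead of a row-by-row loop. The sketch below uses three rows copied from the tables above; the `postal`, `geo`, and `merged` names are illustrative:

```python
import pandas as pd

# Three postal codes from the filtered table
postal = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M1B'],
    'Borough': ['North York', 'North York', 'Scarborough'],
})

# Matching rows from the coordinates file, in a different order
geo = pd.DataFrame({
    'Postal Code': ['M1B', 'M3A', 'M4A'],
    'Latitude': [43.806686, 43.753259, 43.725882],
    'Longitude': [-79.194353, -79.329656, -79.315572],
})

# Left join keeps the row order of `postal`; validate raises if a code is duplicated
merged = postal.merge(
    geo,
    how='left',
    left_on='PostalCode',
    right_on='Postal Code',
    validate='one_to_one',
).drop(columns='Postal Code')
print(merged)
```

With `how='left'`, a postal code that has no match in the coordinates table gets NaN coordinates instead of raising, which differs from the `.item()` call in the loop.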


## Part 3: Segmenting and Clustering Neighborhoods
We'll follow the example of our previous NY lab. We'll begin by keeping only postal codes whose borough contains the word 'Toronto'.

In [23]:
tor_df = loc_df[loc_df['Borough'].str.contains('Toronto')]
tor_df.reset_index(drop=True, inplace=True)
tor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


Declare Foursquare credentials for use:

In [27]:
CLIENT_ID = '1HSOI055ARYN0FAGG1Y0POLIPU04ZEM1ELOJM2BTU1DU1IRY' # your Foursquare ID
CLIENT_SECRET = 'CN1D45E54UE4OKLG2L1S5CSO10BQKIT3CILDDCFAFUVJ3PRQ' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 100

We use the getNearbyVenues function as in the New York lab to find venues near the postal codes in our dataframe:

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 
                  'PostalCode Latitude', 
                  'PostalCode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
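`getNearbyVenues` indexes straight into the JSON, so a response without `groups`, or a venue without `categories`, would raise an exception. A more defensive parsing step might look like the sketch below. `parse_explore_items` is a hypothetical helper, and the `payload` dictionary is a hand-written stand-in that only mirrors the keys the function above reads, not a real API response:

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one explore-style response into rows (hypothetical helper)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Fall back to None when a venue has no category listed
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written stand-in payload (not real API output)
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe X', 'location': {'lat': 1.5, 'lng': 2.5},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Nameless', 'location': {'lat': 1.6, 'lng': 2.6},
               'categories': []}},
]}]}}

rows = parse_explore_items('M5A', 1.0, 2.0, payload)
print(rows)
```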

In [34]:
toronto_venues = getNearbyVenues(names=tor_df['PostalCode'], latitudes=tor_df['Latitude'], longitudes=tor_df['Longitude'])

M5A
M7A
M5B
M5C
M4E
M5E
M5G
M6G
M5H
M6H
M5J
M6J
M4K
M5K
M6K
M4L
M5L
M4M
M4N
M5N
M4P
M5P
M6P
M4R
M5R
M6R
M4S
M5S
M6S
M4T
M5T
M4V
M5V
M4W
M5W
M4X
M5X
M4Y
M7Y


We check how many venues are in each postal code:

In [35]:
toronto_venues.groupby('PostalCode').count()

PostalCode,PostalCode Latitude,PostalCode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,4,4,4,4,4,4
M4K,42,42,42,42,42,42
M4L,20,20,20,20,20,20
M4M,40,40,40,40,40,40
M4N,3,3,3,3,3,3
M4P,7,7,7,7,7,7
M4R,20,20,20,20,20,20
M4S,35,35,35,35,35,35
M4T,2,2,2,2,2,2
M4V,16,16,16,16,16,16


We remove postal codes with a low venue count, since category frequencies computed from only a few venues are unreliable. We keep only postal codes that have 20 or more venues:

In [123]:
# keep only postal codes with at least 20 venues, filtering on the group key itself
# rather than on latitude values, which only works while latitudes are unique per code
toronto_venues = toronto_venues.groupby('PostalCode').filter(lambda g: len(g) >= 20)
print(pd.unique(toronto_venues['PostalCode']))
toronto_venues.groupby('PostalCode').count()

['M5A' 'M7A' 'M5B' 'M5C' 'M5E' 'M5G' 'M5H' 'M5J' 'M6J' 'M4K' 'M5K' 'M6K'
 'M4L' 'M5L' 'M4M' 'M6P' 'M4R' 'M5R' 'M4S' 'M5S' 'M6S' 'M5T' 'M5W' 'M4X'
 'M5X' 'M4Y']


PostalCode,PostalCode Latitude,PostalCode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4K,42,42,42,42,42,42
M4L,20,20,20,20,20,20
M4M,40,40,40,40,40,40
M4R,20,20,20,20,20,20
M4S,35,35,35,35,35,35
M4X,45,45,45,45,45,45
M4Y,73,73,73,73,73,73
M5A,48,48,48,48,48,48
M5B,100,100,100,100,100,100
M5C,76,76,76,76,76,76
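A count-based filter like this can also be expressed by attaching each row's group size and comparing it to the threshold. A small sketch on made-up data, where `min_venues` stands in for the threshold of 20 and the `venues` rows are illustrative:

```python
import pandas as pd

# Made-up venues: postal code 'A' has three, 'B' has one
venues = pd.DataFrame({
    'PostalCode': ['A', 'A', 'A', 'B'],
    'Venue': ['v1', 'v2', 'v3', 'v4'],
})

min_venues = 2  # stand-in for the threshold of 20

# Size of the group each row belongs to, aligned with the original rows
group_sizes = venues.groupby('PostalCode')['Venue'].transform('size')

# Keep rows whose postal code meets the threshold
kept = venues[group_sizes >= min_venues]
print(kept)
```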


We break down each postal code by venue category types:

In [124]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add the postal code column back to the dataframe
toronto_onehot['PostalCode'] = toronto_venues['PostalCode']

# move the postal code column to the first position
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,PostalCode,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group rows by postal code and find the frequency of occurrence for each category:

In [125]:
toronto_grouped = toronto_onehot.groupby('PostalCode').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,PostalCode,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M4K,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
1,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4M,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025
3,M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
4,M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
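To see why the mean of the one-hot columns gives a frequency, consider a toy example with made-up venues (the `venues`, `onehot`, and `freq` names are illustrative). Each row has a single 1, so averaging a column within a postal code yields the share of venues in that category:

```python
import pandas as pd

# Made-up venues: 'A' has 2 cafés, 1 park, 1 bar; 'B' has 2 parks
venues = pd.DataFrame({
    'PostalCode': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar', 'Park', 'Park'],
})

# One indicator column per category
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'PostalCode', venues['PostalCode'])

# Mean of the indicators = fraction of venues in each category
freq = onehot.groupby('PostalCode').mean()
print(freq)
```

Each row of `freq` sums to 1, which makes postal codes with very different venue counts comparable.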


In [126]:
toronto_grouped.shape

(26, 210)

Examine the top 5 most common venues for each postal code:

In [127]:
num_top_venues = 5

for pc in toronto_grouped['PostalCode']:
    print("----"+pc+"----")
    temp = toronto_grouped[toronto_grouped['PostalCode'] == pc].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4K----
                venue  freq
0    Greek Restaurant  0.21
1  Italian Restaurant  0.07
2         Coffee Shop  0.07
3      Ice Cream Shop  0.05
4           Bookstore  0.05


----M4L----
                  venue  freq
0  Fast Food Restaurant  0.10
1                  Park  0.10
2         Burrito Place  0.05
3      Sushi Restaurant  0.05
4           Pizza Place  0.05


----M4M----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.08
2  American Restaurant  0.05
3            Gastropub  0.05
4               Bakery  0.05


----M4R----
                 venue  freq
0       Clothing Store  0.15
1          Coffee Shop  0.10
2          Yoga Studio  0.05
3  Sporting Goods Shop  0.05
4                Diner  0.05


----M4S----
              venue  freq
0              Café  0.09
1      Dessert Shop  0.09
2       Pizza Place  0.09
3    Sandwich Place  0.09
4  Sushi Restaurant  0.06


----M4X----
                venue  freq
0         Coffee Shop  0.09
1         

We'll cluster postal codes using k-means with k=5, based on the frequency of each venue type:

In [128]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='PostalCode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
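As a small illustration of what k-means does with frequency rows, the sketch below clusters four made-up postal codes into two groups. The `freqs` values are invented, and `n_init=10` is set explicitly here as an assumption rather than something taken from the cell above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are postal codes, columns are category frequencies (made-up values)
freqs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# Rows with similar frequency profiles end up with the same label
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freqs)
labels = model.labels_
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label is meaningful.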

We create a dataframe that includes the clustering labels and the top 10 most common venues so we can analyze these together.

In [129]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
pc_venues_sorted = pd.DataFrame(columns=columns)
pc_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']

for ind in np.arange(toronto_grouped.shape[0]):
    pc_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

pc_venues_sorted.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4K,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Brewery,Pizza Place,Spa,Yoga Studio
1,M4L,Park,Fast Food Restaurant,Coffee Shop,Ice Cream Shop,Burrito Place,Brewery,Fish & Chips Shop,Restaurant,Italian Restaurant,Pub
2,M4M,Café,Coffee Shop,Brewery,American Restaurant,Gastropub,Bakery,Coworking Space,Seafood Restaurant,Sandwich Place,Cheese Shop
3,M4R,Clothing Store,Coffee Shop,Yoga Studio,Restaurant,Chinese Restaurant,Mexican Restaurant,Shoe Store,Sporting Goods Shop,Diner,Dessert Shop
4,M4S,Dessert Shop,Sandwich Place,Café,Pizza Place,Sushi Restaurant,Gym,Italian Restaurant,Coffee Shop,Pharmacy,Restaurant
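The ranking step can also be sketched with `Series.nlargest`, which returns the highest values of a row in descending order. The `freq` table below is made up and `top_n` is an illustrative parameter:

```python
import pandas as pd

# Made-up frequency table indexed by postal code
freq = pd.DataFrame(
    {'Bar': [0.1, 0.6], 'Café': [0.5, 0.1], 'Park': [0.4, 0.3]},
    index=pd.Index(['A', 'B'], name='PostalCode'),
)

top_n = 2

# For each row, take the names of the top_n categories by frequency
top = freq.apply(lambda row: row.nlargest(top_n).index.tolist(), axis=1)
print(top)
```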


In [130]:
# add clustering labels
pc_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = tor_df

# join the cluster labels and top venues onto the postal code data, which holds latitude/longitude
toronto_merged = toronto_merged.join(pc_venues_sorted.set_index('PostalCode'), on='PostalCode')

# lose the postal codes that were removed for having too few venues
toronto_merged.dropna(inplace=True)
toronto_merged.reset_index(drop=True, inplace=True)
toronto_merged = toronto_merged.astype({'Cluster Labels': 'int32'})
toronto_merged # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Pub,Park,Bakery,Restaurant,Café,Theater,Breakfast Spot,Event Space,Ice Cream Shop
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Sushi Restaurant,Yoga Studio,Diner,Café,Sandwich Place,Restaurant,College Auditorium,Creperie,Distribution Center
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Italian Restaurant,Middle Eastern Restaurant,Cosmetics Shop,Japanese Restaurant,Bubble Tea Shop,Restaurant,Bookstore
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Gastropub,American Restaurant,Cocktail Bar,Restaurant,Beer Bar,Italian Restaurant,Gym,Clothing Store
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Seafood Restaurant,Bakery,Restaurant,Café,Park,Concert Hall
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Thai Restaurant,Ice Cream Shop,Bubble Tea Shop,Burger Joint,Salad Place,Bar
6,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,1,Coffee Shop,Café,Restaurant,Gym,Hotel,Thai Restaurant,Clothing Store,Deli / Bodega,Cosmetics Shop,Seafood Restaurant
7,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0,Coffee Shop,Aquarium,Café,Hotel,Restaurant,Fried Chicken Joint,Brewery,Scenic Lookout,Sporting Goods Shop,Italian Restaurant
8,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,1,Bar,Restaurant,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Men's Store,Asian Restaurant,Yoga Studio,Cuban Restaurant,Record Shop
9,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Brewery,Pizza Place,Spa,Yoga Studio


Let's visualise this on a map:

In [131]:
# create map
map_clusters = folium.Map(location=[43.70011, -79.4163], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['PostalCode'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now let's look at each of the five clusters to look for patterns:

#### Cluster 0

In [133]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,0,Coffee Shop,Pub,Park,Bakery,Restaurant,Café,Theater,Breakfast Spot,Event Space,Ice Cream Shop
1,M7A,0,Coffee Shop,Sushi Restaurant,Yoga Studio,Diner,Café,Sandwich Place,Restaurant,College Auditorium,Creperie,Distribution Center
2,M5B,0,Clothing Store,Coffee Shop,Café,Italian Restaurant,Middle Eastern Restaurant,Cosmetics Shop,Japanese Restaurant,Bubble Tea Shop,Restaurant,Bookstore
5,M5G,0,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Thai Restaurant,Ice Cream Shop,Bubble Tea Shop,Burger Joint,Salad Place,Bar
7,M5J,0,Coffee Shop,Aquarium,Café,Hotel,Restaurant,Fried Chicken Joint,Brewery,Scenic Lookout,Sporting Goods Shop,Italian Restaurant
16,M4R,0,Clothing Store,Coffee Shop,Yoga Studio,Restaurant,Chinese Restaurant,Mexican Restaurant,Shoe Store,Sporting Goods Shop,Diner,Dessert Shop


Cluster 0 is dominated by coffee shops and cafes, but also features shopping, yoga, and restaurants.

#### Cluster 1

In [134]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,M5C,1,Coffee Shop,Café,Gastropub,American Restaurant,Cocktail Bar,Restaurant,Beer Bar,Italian Restaurant,Gym,Clothing Store
4,M5E,1,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Seafood Restaurant,Bakery,Restaurant,Café,Park,Concert Hall
6,M5H,1,Coffee Shop,Café,Restaurant,Gym,Hotel,Thai Restaurant,Clothing Store,Deli / Bodega,Cosmetics Shop,Seafood Restaurant
8,M6J,1,Bar,Restaurant,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Men's Store,Asian Restaurant,Yoga Studio,Cuban Restaurant,Record Shop
10,M5K,1,Coffee Shop,Café,Hotel,Restaurant,Japanese Restaurant,American Restaurant,Italian Restaurant,Seafood Restaurant,Salad Place,Sushi Restaurant
11,M6K,1,Café,Coffee Shop,Breakfast Spot,Yoga Studio,Stadium,Gym,Furniture / Home Store,Intersection,Italian Restaurant,Nightclub
13,M5L,1,Coffee Shop,Restaurant,Café,Hotel,American Restaurant,Gym,Deli / Bodega,Seafood Restaurant,Italian Restaurant,Japanese Restaurant
14,M4M,1,Café,Coffee Shop,Brewery,American Restaurant,Gastropub,Bakery,Coworking Space,Seafood Restaurant,Sandwich Place,Cheese Shop
15,M6P,1,Café,Thai Restaurant,Mexican Restaurant,Bakery,Bar,Diner,Italian Restaurant,Fried Chicken Joint,Fast Food Restaurant,Cajun / Creole Restaurant
19,M5S,1,Café,Restaurant,Japanese Restaurant,Italian Restaurant,Bookstore,Bakery,Bar,Sushi Restaurant,Sandwich Place,Pub


Cluster 1 is also dominated by cafes and coffee shops, but features more bars and pubs than Cluster 0.

#### Cluster 2

In [135]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,M4K,2,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Bookstore,Ice Cream Shop,Brewery,Pizza Place,Spa,Yoga Studio


Cluster 2 contains a single postal code, M4K, where Greek restaurants are the most common venue, ahead of coffee shops and Italian restaurants.

#### Cluster 3

In [136]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,M5R,3,Sandwich Place,Café,Coffee Shop,Burger Joint,Furniture / Home Store,Park,Pharmacy,Pizza Place,Pub,Donut Shop
18,M4S,3,Dessert Shop,Sandwich Place,Café,Pizza Place,Sushi Restaurant,Gym,Italian Restaurant,Coffee Shop,Pharmacy,Restaurant


Cluster 3 groups two postal codes, M5R and M4S, where sandwich places rank first or second among the most common venues.

#### Cluster 4

In [137]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,M4L,4,Park,Fast Food Restaurant,Coffee Shop,Ice Cream Shop,Burrito Place,Brewery,Fish & Chips Shop,Restaurant,Italian Restaurant,Pub


Cluster 4 contains a single postal code, M4L, where parks are the most common venue, followed by fast food restaurants and coffee shops.
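A quick way to compare clusters side by side is to tabulate their size and most frequent top venue. The sketch below uses four rows copied from the merged table above; the `merged` and `summary` names are illustrative:

```python
import pandas as pd

# Four rows copied from the merged table (subset of columns)
merged = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A', 'M5C', 'M4K'],
    'Cluster Labels': [0, 0, 1, 2],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Coffee Shop', 'Greek Restaurant'],
})

# Per cluster: how many postal codes, and which top venue occurs most often
summary = merged.groupby('Cluster Labels').agg(
    postal_codes=('PostalCode', 'count'),
    top_first_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]),
)
print(summary)
```

Clusters with a single postal code, such as Clusters 2 and 4 here, are better read as outliers than as general patterns.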