---
# Coursera Capstone Project
---
## Segmenting and Clustering Neighborhoods in the City of Toronto
by Malte Jörg

In [31]:
import pandas as pd
import numpy as np
import geocoder

---
## 1. Scraping and Preprocessing Data

*In this part of the project I will scrape a table from a Wikipedia page, save the information into a pandas DataFrame and preprocess the data for further analysis.*

Read DataFrame from given URL

In [32]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url)[0]
df

,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


Drop rows whose borough is 'Not assigned', then check whether a neighborhood is 'Not assigned'. If so, use the borough name as the neighborhood.

In [33]:
df = df[~df['Borough'].isin(['Not assigned'])].reset_index(drop=True)
for i in range(len(df.index)):
    # Use a single .loc call so the assignment writes to df itself, not to a temporary copy
    if df.loc[i, 'Neighborhood'] == 'Not assigned':
        df.loc[i, 'Neighborhood'] = df.loc[i, 'Borough']

*If no borough is assigned, no neighborhood is assigned either. The first line of code drops all rows that are not relevant. The for-loop is included only as a safeguard, in case a different DataFrame is loaded into the model.*
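The same fallback can also be written without a loop. The following is a minimal sketch on a small made-up DataFrame (the rows are illustrative, not scraped data); it uses a boolean mask together with `DataFrame.loc`:

```python
import pandas as pd

# Illustrative rows only; the real table comes from the Wikipedia page.
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Drop rows without a borough, then renumber the index.
sample = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)

# Where the neighborhood is missing, copy the borough name into it.
missing = sample['Neighborhood'] == 'Not assigned'
sample.loc[missing, 'Neighborhood'] = sample.loc[missing, 'Borough']

print(sample)
```

Because the mask selects all affected rows at once, no per-row indexing is needed.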

In [34]:
df

,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


*No postal code appears more than once in the DataFrame, so no grouping is needed. A postal code can cover multiple neighborhoods, and each borough has multiple postal codes.*

Print the shape of the preprocessed DataFrame:

In [35]:
df.shape

(103, 3)

Copy the geographical coordinates of each postal code to the DataFrame

In [36]:
df_geo = pd.read_csv('Geospatial_Coordinates.csv')
latitude = []
longitude = []
for postal_code in df['Postal Code']:
    i = list(df_geo['Postal Code']).index(postal_code)
    latitude.append(df_geo['Latitude'].loc[i])
    longitude.append(df_geo['Longitude'].loc[i])  
df['Latitude'] = latitude
df['Longitude'] = longitude
df


,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
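The coordinate lookup can also be expressed as a join. The sketch below is an alternative under stated assumptions, not the code used above: `postal` and `geo` are small stand-ins for `df` and the contents of `Geospatial_Coordinates.csv`, filled with a few coordinate values taken from this notebook's outputs.

```python
import pandas as pd

# Illustrative stand-ins for df and Geospatial_Coordinates.csv.
postal = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
})
geo = pd.DataFrame({
    'Postal Code': ['M4A', 'M3A', 'M1B'],
    'Latitude': [43.725882, 43.753259, 43.806686],
    'Longitude': [-79.315572, -79.329656, -79.194353],
})

# A left merge keeps every row of `postal` in its original order and
# looks up the coordinates by postal code; unmatched codes would get NaN.
merged = postal.merge(geo, on='Postal Code', how='left')
print(merged)
```

Unlike the loop, a left merge does not raise an error for a postal code that is missing from the coordinate file, so a check for NaN values afterwards may be useful.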


In [7]:
print('Postal Codes: ', len(df['Postal Code'].unique()))
print('Boroughs: ', len(df['Borough'].unique()))
print('Neighborhoods: ', len(df['Neighborhood'].unique()))

Postal Codes:  103
Boroughs:  10
Neighborhoods:  99


### *Analysis*
*There are 103 postal codes with geographic coordinates, covering 99 unique neighborhood entries and 10 boroughs in Toronto. The further analysis of Toronto venues is carried out per postal code and its assigned neighborhoods.*

---
## 2. Exploration and Clustering of Toronto's Neighborhoods

*The next step is to explore Toronto's boroughs.*

In [8]:
import json
import requests
from pandas import json_normalize  # top-level location since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

Create a map of Toronto with markers of the boroughs and neighborhoods:

In [9]:
toronto_lat = 43.651070
toronto_long = -79.347015
map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)
for lat, lng, borough, neighborhoods in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(borough, neighborhoods)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

Define Foursquare Credentials and Version

In [10]:
import os
# Credentials come from the environment (variable names are placeholders), not from the notebook
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'

Get the venues around each postal code in Toronto and tag every venue with the corresponding neighborhoods and borough

In [11]:
def getVenues(postal_codes, borough_names, neighborhood_names, latitudes, longitudes, radius, LIMIT): 
    venues_list=[]
    del_list = []
    for postal_code, bname, nname, lat, lng in zip(postal_codes, borough_names, neighborhood_names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                LIMIT)

        response = requests.get(url).json()["response"]  # request once and reuse the result
        if response['totalResults'] == 0:
            del_list.append(postal_codes[postal_codes == postal_code].index[0]) #append index value of postal code in df
            
        else:
            results = response['groups'][0]['items']
            venues_list.append([(
                postal_code,
                bname,
                nname, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

    venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    venues.columns = ['Postal Code',
                  'Borough',
                  'Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    print('done')    
    return(venues, del_list)

In [12]:
limit = 100         #max 100 venues
radius = 500        #500m radius
toronto_venues, del_postals = getVenues(postal_codes=df['Postal Code'],
                            borough_names=df['Borough'],
                            neighborhood_names=df['Neighborhood'],
                            latitudes=df['Latitude'],
                            longitudes=df['Longitude'],
                            radius=radius,
                            LIMIT=limit
                            )

done
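The parsing step inside `getVenues` can be tried without calling the API. The sketch below is only an illustration: `extract_venues` is a hypothetical helper, and `fake_response` is a hand-written dictionary that mimics just the fields accessed above (`totalResults`, `groups`, `items`, `venue`). The guard for an empty `categories` list is an assumption about what the data may contain.

```python
def extract_venues(response, postal_code):
    """Return (postal code, name, lat, lng, category) tuples from an explore-style payload."""
    if response.get('totalResults', 0) == 0:
        return []
    rows = []
    for item in response['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Assumption: a venue may lack a category; use None instead of failing.
        category = categories[0]['name'] if categories else None
        rows.append((postal_code, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload that mimics the fields accessed in getVenues.
fake_response = {
    'totalResults': 2,
    'groups': [{'items': [
        {'venue': {'name': 'Example Park',
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': [{'name': 'Park'}]}},
        {'venue': {'name': 'Example Stop',
                   'location': {'lat': 43.76, 'lng': -79.32},
                   'categories': []}},
    ]}],
}

rows = extract_venues(fake_response, 'M3A')
print(rows)
```

Separating the parsing from the HTTP request makes it possible to test the row layout on fixed input before spending API calls.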


In [13]:
del_postals

[5, 11, 95]

### *Analysis:*
*Three postal codes in the DataFrame have no venues within a 500 m radius. These postal codes have to be deleted from the DataFrame so that the DataFrames in the further analysis have matching shapes.*

In [37]:
df.drop(del_postals, axis=0, inplace=True)

In [39]:
df.reset_index(drop=True, inplace=True)
df.shape

(100, 5)
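Dropping by the positional indices in `del_postals` only works as long as the index of `df` has not changed since the venues were collected. An alternative that does not depend on the index is to keep exactly those postal codes that appear in the venues table. A minimal sketch, with `codes` and `venues` as made-up stand-ins for `df` and `toronto_venues`:

```python
import pandas as pd

# Illustrative stand-ins for df and toronto_venues.
codes = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A', 'M6A']})
venues = pd.DataFrame({
    'Postal Code': ['M3A', 'M3A', 'M5A'],
    'Venue': ['Venue A', 'Venue B', 'Venue C'],
})

# Keep only the postal codes that have at least one venue.
has_venues = codes['Postal Code'].isin(venues['Postal Code'])
filtered = codes[has_venues].reset_index(drop=True)
print(filtered['Postal Code'].tolist())
```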

Analyze Toronto's venues by category:

In [15]:
print('There are {} unique venue categories.'.format(len(toronto_venues['Venue Category'].unique())))
toronto_venues.groupby('Venue Category').count().sort_values(by=['Borough'], ascending=False)

There are 271 unique venue categories.


Venue Category,Postal Code,Borough,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Coffee Shop,176,176,176,176,176,176,176,176
Café,101,101,101,101,101,101,101,101
Restaurant,67,67,67,67,67,67,67,67
Pizza Place,52,52,52,52,52,52,52,52
Park,48,48,48,48,48,48,48,48
...,...,...,...,...,...,...,...,...
Flea Market,1,1,1,1,1,1,1,1
Other Great Outdoors,1,1,1,1,1,1,1,1
Organic Grocery,1,1,1,1,1,1,1,1
Optical Shop,1,1,1,1,1,1,1,1


### *Analysis:*
*There are 271 unique venue categories in Toronto. Coffee shops, cafés and restaurants are the three most common categories. The least common categories include, for example, Hostel, Recording Studio and College Gym, which are very specific and most likely belong to a more general category.*

To analyze each neighborhood in Toronto and find out which venue category is the most popular there, create a DataFrame of venue-category dummy variables (one-hot encoding):

In [40]:
# Rename the venue category 'Neighborhood' to 'Quarter' in one vectorized step
toronto_venues['Venue Category'] = toronto_venues['Venue Category'].replace('Neighborhood', 'Quarter')
# dtype=int keeps the dummies as 0/1 (newer pandas versions default to booleans)
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
toronto_onehot['Postal Code'] = toronto_venues['Postal Code']
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

,Postal Code,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### *Comment:*
*The Toronto venue data contain three venues with the category 'Neighborhood'. To proceed with the analysis, this category was renamed to 'Quarter' to avoid analytical and technical misunderstandings.*

In [41]:
len(toronto_onehot['Postal Code'].unique())

100

Group the rows of the one-hot DataFrame by postal code and take the mean. This gives the relative frequency of each venue category and shows which one is the most common for each postal code.

In [42]:

toronto_grouped = toronto_onehot.groupby('Postal Code').mean().reset_index()
toronto_grouped.head()

,Postal Code,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [47]:
toronto_grouped.shape

(100, 269)
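To see why the mean of the one-hot columns equals a relative frequency, the sketch below repeats the two steps on five made-up venue rows. The categories are chosen so that the result mirrors the kind of shares printed in this notebook (two venues with one each give 0.5, two out of three give 0.67).

```python
import pandas as pd

# Five illustrative venues in two postal codes.
venues = pd.DataFrame({
    'Postal Code': ['M1B', 'M1B', 'M1G', 'M1G', 'M1G'],
    'Venue Category': ['Fast Food Restaurant', 'Print Shop',
                       'Coffee Shop', 'Coffee Shop', 'Korean Restaurant'],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# The mean of a 0/1 column is the share of venues in that category.
grouped = onehot.groupby('Postal Code').mean().reset_index()
print(grouped.round(2))
```

Each row of `grouped` sums to 1 across the category columns, which is what makes rows comparable between postal codes with very different venue counts.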

Calculate the Top 5 venue categories for each postal code in Toronto.

In [48]:
num_top_venues = 5

for hood in toronto_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M1B----
                      venue  freq
0      Fast Food Restaurant   0.5
1                Print Shop   0.5
2            Massage Studio   0.0
3            Medical Center   0.0
4  Mediterranean Restaurant   0.0


----M1C----
                 venue  freq
0                  Bar   1.0
1    Accessories Store   0.0
2   Miscellaneous Shop   0.0
3  Moroccan Restaurant   0.0
4  Monument / Landmark   0.0


----M1E----
                 venue  freq
0         Intersection  0.14
1   Mexican Restaurant  0.14
2                 Bank  0.14
3  Rental Car Location  0.14
4    Electronics Store  0.14


----M1G----
                 venue  freq
0          Coffee Shop  0.67
1    Korean Restaurant  0.33
2    Accessories Store  0.00
3   Miscellaneous Shop  0.00
4  Moroccan Restaurant  0.00


----M1H----
                  venue  freq
0      Hakka Restaurant  0.11
1    Athletics & Sports  0.11
2  Caribbean Restaurant  0.11
3   Fried Chicken Joint  0.11
4                  Bank  0.11


----M1J----
            

Get the 10 most common venues for each postal code:

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 'rd' use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = toronto_grouped['Postal Code']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Print Shop,Fast Food Restaurant,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
1,M1C,Bar,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Farmers Market
2,M1E,Electronics Store,Rental Car Location,Breakfast Spot,Medical Center,Intersection,Bank,Mexican Restaurant,Yoga Studio,Dog Run,Discount Store
3,M1G,Coffee Shop,Korean Restaurant,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
4,M1H,Caribbean Restaurant,Gas Station,Bank,Bakery,Fried Chicken Joint,Athletics & Sports,Thai Restaurant,Lounge,Hakka Restaurant,Cosmetics Shop
...,...,...,...,...,...,...,...,...,...,...,...
95,M9N,Park,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
96,M9P,Pizza Place,Intersection,Sandwich Place,Coffee Shop,Chinese Restaurant,Discount Store,Distribution Center,Dim Sum Restaurant,Diner,Dog Run
97,M9R,Pizza Place,Bus Line,Mobile Phone Shop,Sandwich Place,Distribution Center,Dim Sum Restaurant,Diner,Discount Store,Dog Run,College Rec Center
98,M9V,Grocery Store,Liquor Store,Fried Chicken Joint,Pizza Place,Sandwich Place,Beer Store,Fast Food Restaurant,Pharmacy,General Entertainment,Curling Ice
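The column names above are built with a list of suffixes and an exception fallback, which is sufficient for ten columns. If more columns were ever needed, a small helper could produce the suffix directly. The function `ordinal` below is a hypothetical helper, sketched only to illustrate the rule (11, 12 and 13 take 'th', unlike 1, 2 and 3):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


columns = ['Postal Code'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```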


### *Analysis:*
*As already seen in the analysis of the most common venue categories, the coffee shop is the most common venue in multiple neighborhoods. There are also multiple restaurant categories that differ only in the type of restaurant. A more general category would give better results in the analysis of a neighborhood.*
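One possible way to build such a general category is to map every category name that ends in 'Restaurant' onto a single label before the one-hot encoding. This is only a sketch of the idea; the rule in `generalize` is a hypothetical choice and is not applied anywhere in this notebook.

```python
import pandas as pd


def generalize(category):
    """Collapse every '... Restaurant' category into plain 'Restaurant' (hypothetical rule)."""
    return 'Restaurant' if category.endswith('Restaurant') else category


# A few category names as they appear in the venue data.
categories = pd.Series(['Korean Restaurant', 'Coffee Shop', 'Hakka Restaurant',
                        'Restaurant', 'Pizza Place'])
general = categories.map(generalize)
print(general.value_counts())
```

A suffix rule like this misses categories such as 'Pizza Place', so a hand-written mapping might be needed for a real analysis.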

---
## 3. Cluster Postal Codes
In this part of the project I will group the postal codes into 10 clusters based on their venue categories.

In [50]:
from sklearn.cluster import KMeans

Create cluster labels with k-means clustering

In [104]:
kclusters = 10
toronto_grouped_clustering = toronto_grouped.drop(columns='Postal Code')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
kmeans.labels_

array([6, 4, 6, 6, 0, 0, 6, 0, 6, 0, 0, 0, 6, 0, 7, 0, 6, 6, 0, 1, 2, 6,
       7, 0, 7, 0, 6, 6, 6, 7, 0, 5, 0, 6, 0, 0, 0, 6, 0, 1, 6, 0, 6, 7,
       0, 6, 0, 3, 6, 7, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 7, 0, 6,
       6, 6, 6, 6, 6, 0, 7, 7, 0, 0, 6, 6, 7, 6, 0, 6, 6, 6, 6, 6, 0, 6,
       0, 7, 5, 0, 0, 0, 5, 1, 0, 0, 0, 9])
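The number of clusters is set by hand here. One common way to compare candidate values is the mean silhouette coefficient, which is higher when clusters are compact and well separated. The sketch below is not part of the original analysis and runs on synthetic points rather than the Toronto data; the blob centers, spread and range of `k` are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs (not the Toronto data).
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [10, 10], [-10, 10]])
X = np.vstack([c + 0.5 * rng.randn(20, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette coefficient.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

On sparse frequency data such as `toronto_grouped_clustering`, the scores may be far less clear-cut than on these blobs, so they are best read as a rough guide.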

Create a merged Toronto DataFrame with the coordinates, cluster labels and 10 most popular venues of each postal code

In [105]:
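# Insert the labels in front of the venue columns; drop any earlier copy so the cell can be re-run
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_.astype(int))
# Postal codes are unique, so summing only the coordinate columns keeps each row's own values
toronto_merged = df.groupby('Postal Code')[['Latitude', 'Longitude']].sum()
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Postal Code'), on='Postal Code')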

toronto_merged

Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
M1B,43.806686,-79.194353,6,Print Shop,Fast Food Restaurant,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
M1C,43.784535,-79.160497,4,Bar,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Farmers Market
M1E,43.763573,-79.188711,6,Electronics Store,Rental Car Location,Breakfast Spot,Medical Center,Intersection,Bank,Mexican Restaurant,Yoga Studio,Dog Run,Discount Store
M1G,43.770992,-79.216917,6,Coffee Shop,Korean Restaurant,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
M1H,43.773136,-79.239476,0,Caribbean Restaurant,Gas Station,Bank,Bakery,Fried Chicken Joint,Athletics & Sports,Thai Restaurant,Lounge,Hakka Restaurant,Cosmetics Shop
...,...,...,...,...,...,...,...,...,...,...,...,...,...
M9N,43.706876,-79.518188,1,Park,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
M9P,43.696319,-79.532242,0,Pizza Place,Intersection,Sandwich Place,Coffee Shop,Chinese Restaurant,Discount Store,Distribution Center,Dim Sum Restaurant,Diner,Dog Run
M9R,43.688905,-79.554724,0,Pizza Place,Bus Line,Mobile Phone Shop,Sandwich Place,Distribution Center,Dim Sum Restaurant,Diner,Discount Store,Dog Run,College Rec Center
M9V,43.739416,-79.588437,0,Grocery Store,Liquor Store,Fried Chicken Joint,Pizza Place,Sandwich Place,Beer Store,Fast Food Restaurant,Pharmacy,General Entertainment,Curling Ice


Map the clusters in a folium map with colored markers

In [106]:
map_clusters = folium.Map(location=[toronto_lat, toronto_long], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged.index, toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

---
## 4. Examine the Clusters
To create a full analysis of the clusters, I will examine each cluster and check out the most common venues.

### Cluster 1

In [107]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M1H,Caribbean Restaurant,Gas Station,Bank,Bakery,Fried Chicken Joint
M1J,Spa,Playground,Donut Shop,Dim Sum Restaurant,Diner
M1L,Bus Line,Bakery,Ice Cream Shop,Metro Station,Park
M1N,College Stadium,Skating Rink,Café,General Entertainment,Farm
M1P,Indian Restaurant,Vietnamese Restaurant,Furniture / Home Store,Chinese Restaurant,Brewery
M1R,Auto Garage,Sandwich Place,Smoke Shop,Breakfast Spot,Bakery
M1T,Pizza Place,Pharmacy,Bank,Fried Chicken Joint,Italian Restaurant
M1W,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Bank,Supermarket
M2K,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio
M2R,Pizza Place,Bank,Discount Store,Pharmacy,Coffee Shop


### Cluster 2

In [108]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M2L,Park,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant
M4J,Park,Convenience Store,Donut Shop,Dim Sum Restaurant,Diner
M9N,Park,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant


### Cluster 3

In [109]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M2M,Piano Bar,Deli / Bodega,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


### Cluster 4

In [110]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M4T,Tennis Court,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner


### Cluster 5

In [111]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M1C,Bar,Yoga Studio,Donut Shop,Diner,Discount Store


### Cluster 6

In [112]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M3M,Baseball Field,Food Truck,Yoga Studio,Diner,Discount Store
M8Y,Baseball Field,Deli / Bodega,Yoga Studio,Discount Store,Distribution Center
M9M,Baseball Field,Yoga Studio,Diner,Discount Store,Distribution Center


### Cluster 7

In [113]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M1B,Print Shop,Fast Food Restaurant,Yoga Studio,Dim Sum Restaurant,Diner
M1E,Electronics Store,Rental Car Location,Breakfast Spot,Medical Center,Intersection
M1G,Coffee Shop,Korean Restaurant,Drugstore,Diner,Discount Store
M1K,Discount Store,Train Station,Coffee Shop,Bus Station,Chinese Restaurant
M1M,Motel,American Restaurant,Dim Sum Restaurant,Diner,Discount Store
M1S,Lounge,Latin American Restaurant,Skating Rink,Breakfast Spot,Donut Shop
M2H,Athletics & Sports,Golf Course,Pool,Mediterranean Restaurant,Dog Run
M2J,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Toy / Game Store
M2N,Ramen Restaurant,Sandwich Place,Coffee Shop,Café,Pizza Place
M3C,Restaurant,Gym,Asian Restaurant,Coffee Shop,Beer Store


### Cluster 8

In [114]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M1V,Playground,Park,Yoga Studio,Doner Restaurant,Dessert Shop
M2P,Park,Construction & Landscaping,Convenience Store,Yoga Studio,Doner Restaurant
M3A,Park,Construction & Landscaping,Food & Drink Shop,Yoga Studio,Doner Restaurant
M3K,Snack Place,Airport,Park,Electronics Store,Doner Restaurant
M4N,Bus Line,Park,Swim School,Yoga Studio,Doner Restaurant
M4W,Park,Playground,Trail,Yoga Studio,Dog Run
M5P,Trail,Park,Sushi Restaurant,Jewelry Store,Yoga Studio
M6C,Trail,Park,Field,Hockey Arena,Yoga Studio
M6E,Park,Women's Store,Pool,Doner Restaurant,Dessert Shop
M6L,Park,Bakery,Construction & Landscaping,Yoga Studio,Donut Shop


### Cluster 9

In [115]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 8, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M5N,Garden,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner


### Cluster 10

In [116]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 9, toronto_merged.columns[[3] + list(range(4, toronto_merged.shape[1]-5))]]

Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
M9W,Rental Car Location,Drugstore,Yoga Studio,Dog Run,Dessert Shop


## *Analysis:*
*Because the data set contains a large number of unique venue categories, many clusters are needed to get reasonable results. The examination of the clusters shows five very specific clusters that contain only one postal code each. Cluster 7 is the biggest cluster and is mostly located in downtown Toronto. It consists mostly of coffee shops, cafés, pubs and restaurants, which is characteristic of a downtown area. Cluster 1, on the other hand, is spread all over Toronto, though mostly around the downtown area. Its most common venues are more diverse and are characteristic of areas with a lower population density.*
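The cluster sizes behind this analysis can be checked by counting the labels. The sketch below copies the `kmeans.labels_` output from section 3 into a plain list and counts it with the standard library. The headings in section 4 are 1-based, so label 6 corresponds to "Cluster 7".

```python
from collections import Counter

# Cluster labels copied from the kmeans.labels_ output in section 3.
labels = [
    6, 4, 6, 6, 0, 0, 6, 0, 6, 0, 0, 0, 6, 0, 7, 0, 6, 6, 0, 1, 2, 6,
    7, 0, 7, 0, 6, 6, 6, 7, 0, 5, 0, 6, 0, 0, 0, 6, 0, 1, 6, 0, 6, 7,
    0, 6, 0, 3, 6, 7, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 8, 7, 0, 6,
    6, 6, 6, 6, 6, 0, 7, 7, 0, 0, 6, 6, 7, 6, 0, 6, 6, 6, 6, 6, 0, 6,
    0, 7, 5, 0, 0, 0, 5, 1, 0, 0, 0, 9,
]

sizes = Counter(labels)

# Print the sizes using the 1-based cluster numbers of the headings.
for label, size in sorted(sizes.items()):
    print(f'Cluster {label + 1}: {size} postal codes')

# Clusters that contain a single postal code.
singletons = sorted(label + 1 for label, size in sizes.items() if size == 1)
print('Single-postal-code clusters:', singletons)
```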