# Segmenting and Clustering Neighborhoods in Toronto

## Part 1: Scraping the Wikipedia page

We start by retrieving the necessary data from Wikipedia. Setup:

In [148]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

webpage = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

Now we can do the scraping:

In [162]:
response = requests.get(url = webpage)

soup = BeautifulSoup(response.content, 'html.parser')

postal_codes = pd.DataFrame([], columns = ["Postal Code", "Borough", "Neighbourhood"])
i = 0

for tr in soup.find("table").find_all("tr"):
    cells = tr.find_all("td")
    if cells:  # header rows contain only <th> cells, so they are skipped
        # get_text(strip=True) removes the trailing newline without assuming where it sits
        postal_codes.loc[i] = [cell.get_text(strip=True) for cell in cells[:3]]
        i += 1

In [163]:
postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [164]:
postal_codes.shape

(180, 3)
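To make the row-extraction logic easier to follow without hitting Wikipedia, here is a minimal sketch that parses a small inline HTML fragment using only the standard library's `html.parser` instead of BeautifulSoup. The fragment `SAMPLE_HTML` and the class name `TableRowParser` are made up for illustration; the real page layout may differ.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only hold <th> cells and stay empty
                self.rows.append(self._row)
            self._row = None


# Hypothetical fragment imitating the cell layout (text followed by a newline)
SAMPLE_HTML = """
<table>
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
<tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
print(rows)
```

Stripping whitespace explicitly, rather than slicing off the last character, keeps the values intact if a cell ever arrives without the trailing newline.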

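We need to clean the data a bit: drop rows without an assigned borough, combine neighbourhoods that share a postal code, and fall back to the borough name where no neighbourhood is assigned: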

In [165]:
postal_codes = postal_codes.query("Borough != 'Not assigned'")

def conc_nbhd(nbhds):
    # Join all neighbourhoods that share a postal code into one comma-separated string
    return ", ".join(nbhds)

postal_codes = postal_codes.groupby(["Postal Code", "Borough"]).agg({"Neighbourhood" : conc_nbhd}).reset_index()

postal_codes['Neighbourhood'] = np.where(postal_codes["Neighbourhood"] == "Not assigned",
                                         postal_codes["Borough"],
                                         postal_codes["Neighbourhood"])

Data looks like this now:

In [166]:
postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [167]:
postal_codes.shape

(103, 3)
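The cleaning step can be checked on a tiny example. The sketch below uses a made-up frame `toy` (the rows are illustrative, not taken from the scraped table) to show the group-and-join followed by the borough fallback.

```python
import pandas as pd

# Made-up rows: one postal code split over two lines, one without a neighbourhood
toy = pd.DataFrame({
    "Postal Code": ["M1B", "M1B", "M5A", "M9A"],
    "Borough": ["Scarborough", "Scarborough", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Malvern", "Rouge", "Regent Park", "Not assigned"],
})

# One row per (postal code, borough); neighbourhoods joined with ", "
combined = (
    toy.groupby(["Postal Code", "Borough"], as_index=False)
       .agg({"Neighbourhood": lambda s: ", ".join(s)})
)

# Where the neighbourhood is unassigned, reuse the borough name
combined["Neighbourhood"] = combined["Neighbourhood"].where(
    combined["Neighbourhood"] != "Not assigned", combined["Borough"]
)

print(combined)
```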

## Part 2: Joining in latitude and longitude

We read in the .csv data with latitudes and longitudes:

In [168]:
geospatial = pd.read_csv("Geospatial_Coordinates.csv")
geospatial.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now we just join it to the postal codes:

In [169]:
postal_codes = pd.merge(postal_codes, geospatial, how = "left", on = "Postal Code")
postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
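A left join silently leaves `NaN` coordinates for postal codes that are missing from the lookup file, so it can be worth checking for them. The sketch below uses toy frames (`codes`, `coords`, and the postal code `M9Z` are made up) and the `indicator` option of `merge` to list unmatched rows.

```python
import pandas as pd

codes = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9Z"],   # "M9Z" is a hypothetical unmatched code
    "Borough": ["Scarborough", "Scarborough", "Nowhere"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True adds a "_merge" column telling where each row came from
merged = codes.merge(coords, how="left", on="Postal Code", indicator=True)

# Rows that exist only on the left side have no coordinates
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```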


## Part 3: Clustering and Data Exploring

Since I live in Munich, I thought it would be more fun to run the analysis for Munich instead; I've never even been to Toronto. The postal code data with coordinates is available at https://github.com/zauberware/postal-codes-json-xml-csv, and the neighbourhood names are listed at https://www.muenchen.de/leben/service/postleitzahlen.html. I'll just load everything and get the data ready:

In [203]:
pc_de = pd.read_csv("postal_codes_de.csv")
pc_de = pc_de.query("place == 'München'")
pc_de.head()

Unnamed: 0,country_code,zipcode,place,state,state_code,province,province_code,community,community_code,latitude,longitude
2869,DE,80331,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,11.571
2870,DE,80333,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1452,11.5668
2871,DE,80335,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1427,11.5552
2872,DE,80336,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,11.559
2873,DE,80337,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1224,11.5449


In [204]:
stadtteile = pd.read_csv("stadtteile.tsv", sep = "\t")
stadtteile = pd.concat([pd.Series(row['Stadtteil'], row['Postleitzahl'].split(','))              
                    for _, row in stadtteile.iterrows()]).reset_index().rename(columns = {"index" : "zipcode", 0 : "Neighbourhood"})
stadtteile.zipcode = stadtteile.zipcode.apply(pd.to_numeric)
stadtteile.head()

Unnamed: 0,zipcode,Neighbourhood
0,80995,Allach-Untermenzing
1,80997,Allach-Untermenzing
2,80999,Allach-Untermenzing
3,81247,Allach-Untermenzing
4,81249,Allach-Untermenzing


In [205]:
pc_de = pd.merge(pc_de, stadtteile, how = "left", on = "zipcode")
pc_de = pc_de.reset_index()[["zipcode", "Neighbourhood", "latitude", "longitude"]]
pc_de.head()

Unnamed: 0,zipcode,Neighbourhood,latitude,longitude
0,80331,Altstadt-Lehel,48.1345,11.571
1,80333,Altstadt-Lehel,48.1452,11.5668
2,80333,Maxvorstadt,48.1452,11.5668
3,80335,Altstadt-Lehel,48.1427,11.5552
4,80335,Ludwigsvorstadt-Isarvorstadt,48.1427,11.5552


It's not really ideal: a zipcode can belong to several neighbourhoods, so the same coordinates show up under more than one neighbourhood. But let's see what comes out of it. Here's a map:

### Data Analysis

In [216]:
import folium

# create map of Munich using latitude and longitude values
map_munich = folium.Map(location=[48.1351, 11.5820], zoom_start=12)

# add one marker per (zipcode, neighbourhood) pair
for lat, lng, zipcode, neighbourhood in zip(pc_de['latitude'], pc_de['longitude'], pc_de['zipcode'], pc_de['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_munich)
    
map_munich

I disagree with some of the markers, but let's not question the data, ok?

We move on to Foursquare and run the same analysis as in the lab. Hopefully the cell with the API credentials won't be shared publicly.

In [217]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
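One way to keep credentials out of a shared notebook is to read them from environment variables and only ever print a masked version. The sketch below is one possible approach; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helper names, and the demo values are all hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping (by default the process environment)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical variable name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with a fake environment so the example runs anywhere
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-1234",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-5678",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```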


Two useful functions:

In [221]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [243]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
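            # some venues come back without a category; use None instead of raising IndexError
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])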

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
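The tuple-building step can be tested without calling the API. The sketch below pulls that step into a hypothetical helper, `extract_venues`, and runs it on a hand-written list `sample_items` that imitates the `items` structure used above; the second item has no category to show the fallback. The field layout is assumed from the function above, not from the Foursquare documentation.

```python
def extract_venues(name, lat, lng, items):
    """Turn raw 'items' entries into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use the first category name if there is one, otherwise None
        category = categories[0]["name"] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows


# Hand-written stand-in for results["response"]["groups"][0]["items"]
sample_items = [
    {"venue": {"name": "The High",
               "location": {"lat": 48.133101, "lng": 11.572939},
               "categories": [{"name": "Cocktail Bar"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 48.134, "lng": 11.571},
               "categories": []}},
]

venue_rows = extract_venues("Altstadt-Lehel", 48.1345, 11.571, sample_items)
for row in venue_rows:
    print(row)
```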

Aaaaand we load all the venues:

In [244]:
munich_venues = getNearbyVenues(names=pc_de['Neighbourhood'],
                                latitudes=pc_de['latitude'],
                                longitudes=pc_de['longitude'])

Altstadt-Lehel
Altstadt-Lehel
Maxvorstadt
Altstadt-Lehel
Ludwigsvorstadt-Isarvorstadt
Maxvorstadt
Schwanthalerhöhe
Altstadt-Lehel
Ludwigsvorstadt-Isarvorstadt
Sendling
Ludwigsvorstadt-Isarvorstadt
Sendling
Schwanthalerhöhe
Altstadt-Lehel
Ludwigsvorstadt-Isarvorstadt
Sendling
Altstadt-Lehel
Schwabing-Freimann
Altstadt-Lehel
Maxvorstadt
Neuhausen-Nymphenburg
Maxvorstadt
Neuhausen-Nymphenburg
Moosach
Neuhausen-Nymphenburg
Moosach
Neuhausen-Nymphenburg
Neuhausen-Nymphenburg
Laim
Sendling-Westpark
Laim
Pasing-Obermenzing
Hadern
Laim
Pasing-Obermenzing
Schwabing-West
Maxvorstadt
Schwabing-West
Maxvorstadt
Schwabing-West
Maxvorstadt
Schwabing-West
Maxvorstadt
Schwabing-Freimann
Schwabing-West
Maxvorstadt
Schwabing-Freimann
Schwabing-Freimann
Schwabing-West
Schwabing-Freimann
Schwabing-West
Schwabing-Freimann
Milbertshofen-Am Hart
Schwabing-Freimann
Milbertshofen-Am Hart
Schwabing-West
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Schwabing-Freimann


In [245]:
print(munich_venues.shape)
munich_venues.head()

(3409, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Altstadt-Lehel,48.1345,11.571,Asamkirche (St. Johann Nepomuk),48.135053,11.569746,Church
1,Altstadt-Lehel,48.1345,11.571,The High,48.133101,11.572939,Cocktail Bar
2,Altstadt-Lehel,48.1345,11.571,Ringlers,48.134097,11.568302,Sandwich Place
3,Altstadt-Lehel,48.1345,11.571,TeeGschwendner,48.135398,11.569455,Tea Room
4,Altstadt-Lehel,48.1345,11.571,Kleinschmecker,48.134659,11.573565,German Restaurant


In [246]:
print('There are {} unique categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 232 unique categories.


Now for the clustering:

In [249]:
# one hot encoding
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
munich_onehot['Neighborhood'] = munich_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [munich_onehot.columns[-1]] + list(munich_onehot.columns[:-1])
munich_onehot = munich_onehot[fixed_columns]

munich_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Altstadt-Lehel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [250]:
munich_onehot.shape

(3409, 233)

In [251]:
munich_grouped = munich_onehot.groupby('Neighborhood').mean().reset_index()
munich_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Allach-Untermenzing,0.0,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.012346,...,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.0
1,Altstadt-Lehel,0.009091,0.002273,0.0,0.006818,0.009091,0.002273,0.022727,0.0,0.002273,...,0.004545,0.0,0.002273,0.0,0.004545,0.015909,0.006818,0.004545,0.0,0.0
2,Au-Haidhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.008658,0.0,0.0,...,0.017316,0.0,0.008658,0.0,0.004329,0.012987,0.004329,0.004329,0.0,0.0
3,Aubing-Lochhausen-Langwied,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.015152,...,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0
4,Berg am Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,...,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bogenhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.010309,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0
6,Feldmoching-Hasenbergl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hadern,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Ludwigsvorstadt-Isarvorstadt,0.012766,0.004255,0.0,0.004255,0.0,0.0,0.025532,0.0,0.004255,...,0.004255,0.0,0.004255,0.0,0.004255,0.034043,0.008511,0.008511,0.0,0.0


In [252]:
munich_grouped.shape

(25, 233)
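The one-hot encoding followed by a per-neighbourhood mean yields, for each neighbourhood, the share of its venues that fall into each category. The sketch below reproduces this on a made-up frame `venues` and compares it with `pd.crosstab(..., normalize="index")`, which should give the same shares.

```python
import pandas as pd

# Made-up venues for two fictional neighbourhoods "A" and "B"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Hotel", "Bakery", "Supermarket", "Bar"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the indicator columns = relative frequency of each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)

# Same shares computed directly as a normalised contingency table
shares = pd.crosstab(venues["Neighborhood"], venues["Venue Category"], normalize="index")
print(shares)
```

Each row of `grouped` sums to 1, which is why neighbourhoods with very different venue counts can still be compared.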

In [258]:
num_top_venues = 5

for hood in munich_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = munich_grouped[munich_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allach-Untermenzing----
                venue  freq
0                Café  0.10
1  Italian Restaurant  0.09
2                 Bar  0.06
3            Bus Stop  0.06
4              Bakery  0.05


----Altstadt-Lehel----
                venue  freq
0                Café  0.10
1               Hotel  0.07
2  Italian Restaurant  0.05
3                 Bar  0.04
4               Plaza  0.03


----Au-Haidhausen----
                venue  freq
0  Italian Restaurant  0.09
1   German Restaurant  0.06
2         Supermarket  0.05
3                 Bar  0.04
4              Bakery  0.04


----Aubing-Lochhausen-Langwied----
                venue  freq
0                Café  0.12
1                 Bar  0.08
2  Italian Restaurant  0.08
3      Ice Cream Shop  0.05
4              Bakery  0.05


----Berg am Laim----
              venue  freq
0       Supermarket  0.15
1            Bakery  0.12
2         Drugstore  0.09
3  Asian Restaurant  0.06
4             Hotel  0.06


----Bogenhausen----
             

In [253]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [256]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = munich_grouped['Neighborhood']

for ind in np.arange(munich_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,Café,Italian Restaurant,Bar,Bus Stop,Bakery,Supermarket,Ice Cream Shop,Hotel,Asian Restaurant,Sushi Restaurant
1,Altstadt-Lehel,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint
2,Au-Haidhausen,Italian Restaurant,German Restaurant,Supermarket,Bar,Bakery,Café,Plaza,Hotel,Pizza Place,Drugstore
3,Aubing-Lochhausen-Langwied,Café,Bar,Italian Restaurant,Ice Cream Shop,Bakery,Hotel,Sushi Restaurant,Asian Restaurant,Gym / Fitness Center,Plaza
4,Berg am Laim,Supermarket,Bakery,Drugstore,Asian Restaurant,Hotel,Italian Restaurant,Fast Food Restaurant,Sandwich Place,Metro Station,Motel
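The column names above rely on a three-element suffix list plus an exception handler, which is fine for ten columns but would produce labels such as "21th" for longer lists. A small sketch of a more general helper (the name `ordinal` is made up) is:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```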


This is where we do the KMeans thing:

In [262]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

munich_grouped_clustering = munich_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(munich_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 2, 0, 1, 2, 4, 4, 1, 3], dtype=int32)
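To give an intuition for what the clustering step does with the frequency vectors, here is a simplified NumPy sketch of the classic assign-then-update iteration (Lloyd's algorithm). It is an illustration only, not scikit-learn's implementation, and the function name `lloyd_kmeans` and the toy matrix `X` are made up.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=20, seed=0):
    """Very small k-means: alternate nearest-centre assignment and centre update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centre, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New centre = mean of assigned points (keep the old centre if none assigned)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy "venue frequency" rows: three bar-heavy and three supermarket-heavy areas
X = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.00],
    [0.85, 0.15, 0.00],
    [0.10, 0.10, 0.80],
    [0.00, 0.20, 0.80],
    [0.05, 0.15, 0.80],
])

labels, centers = lloyd_kmeans(X, k=2)
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label matters, which is also true for the `kmeans.labels_` array above.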

In [277]:
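# add the cluster labels so they are carried into the merged dataframe
# (insert raises a ValueError if the column already exists, so run this cell only once)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = pc_de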

munich_merged = munich_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), left_on='Neighbourhood', right_on = "Neighborhood")

munich_merged.head() # check the last columns!

Unnamed: 0,zipcode,Neighbourhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,80331,Altstadt-Lehel,48.1345,11.571,3,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint
1,80333,Altstadt-Lehel,48.1452,11.5668,3,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint
2,80335,Altstadt-Lehel,48.1427,11.5552,3,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint
3,80336,Altstadt-Lehel,48.1345,11.559,3,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint
4,80469,Altstadt-Lehel,48.1299,11.5732,3,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint


In [275]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[48.1351, 11.5820], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged['latitude'], munich_merged['longitude'], munich_merged['Neighbourhood'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Results

Each neighbourhood appears once per zipcode, so let's drop the duplicates to get a more readable output:

In [287]:
munich_2 = munich_merged.drop(["latitude", "longitude", "zipcode"], axis = 1).drop_duplicates()

We'll call Cluster 0 "Living the good life":

In [304]:
munich_2.loc[munich_2['Cluster Labels'] == 0, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Maxvorstadt,0,Café,Italian Restaurant,Bar,Hotel,Ice Cream Shop,Plaza,Asian Restaurant,Sushi Restaurant,Restaurant,Chinese Restaurant
29,Schwabing-Freimann,0,Café,Italian Restaurant,Bar,Hotel,Bakery,German Restaurant,Sushi Restaurant,Restaurant,Ice Cream Shop,Plaza
64,Schwabing-West,0,Café,Italian Restaurant,Bar,Ice Cream Shop,Sushi Restaurant,Hotel,Plaza,Asian Restaurant,Chinese Restaurant,Bagel Shop
79,Allach-Untermenzing,0,Café,Italian Restaurant,Bar,Bus Stop,Bakery,Supermarket,Ice Cream Shop,Hotel,Asian Restaurant,Sushi Restaurant
84,Aubing-Lochhausen-Langwied,0,Café,Bar,Italian Restaurant,Ice Cream Shop,Bakery,Hotel,Sushi Restaurant,Asian Restaurant,Gym / Fitness Center,Plaza
87,Thalkirchen-Obersendling-Fürstenried-Forstenri...,0,Café,Italian Restaurant,Bar,Supermarket,Bus Stop,Ice Cream Shop,Bakery,Bagel Shop,Sushi Restaurant,Pizza Place
96,Ramersdorf-Perlach,0,Italian Restaurant,Café,Supermarket,Bus Stop,Bar,Ice Cream Shop,Bakery,Lounge,Plaza,Asian Restaurant


Cluster 1 has to be "Enough to survive":

In [305]:
munich_2.loc[munich_2['Cluster Labels'] == 1, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,Moosach,1,Supermarket,Bakery,Bus Stop,Café,Hotel,Electronics Store,Italian Restaurant,Drugstore,Asian Restaurant,Clothing Store
47,Laim,1,Supermarket,Bank,Bakery,Doner Restaurant,Sporting Goods Shop,Sandwich Place,Tram Station,Tanning Salon,Drugstore,Mobile Phone Shop
50,Sendling-Westpark,1,Supermarket,Bakery,Bus Stop,Hotel,Bank,Drugstore,Ice Cream Shop,Café,Doner Restaurant,German Restaurant
55,Pasing-Obermenzing,1,Bakery,Supermarket,Drugstore,Café,Italian Restaurant,Coffee Shop,Bus Stop,Hotel,German Restaurant,Gym / Fitness Center
72,Milbertshofen-Am Hart,1,Bakery,Museum,Metro Station,Rental Car Location,German Restaurant,Gift Shop,Supermarket,Bus Stop,Drugstore,Burger Joint
113,Berg am Laim,1,Supermarket,Bakery,Drugstore,Asian Restaurant,Hotel,Italian Restaurant,Fast Food Restaurant,Sandwich Place,Metro Station,Motel
123,Trudering-Riem,1,Bus Stop,Hotel,Supermarket,Drugstore,Bakery,Indian Restaurant,Golf Course,German Restaurant,Mobile Phone Shop,Motel


Cluster 2 would be "Not for tourists":

In [306]:
munich_2.loc[munich_2['Cluster Labels'] == 2, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Neuhausen-Nymphenburg,2,German Restaurant,Café,Indian Restaurant,Italian Restaurant,Bakery,Hotel,Plaza,Tram Station,Supermarket,Drugstore
92,Obergiesing,2,Supermarket,Italian Restaurant,Bus Stop,Bar,German Restaurant,Ice Cream Shop,Bakery,Pizza Place,Plaza,Drugstore
103,Au-Haidhausen,2,Italian Restaurant,German Restaurant,Supermarket,Bar,Bakery,Café,Plaza,Hotel,Pizza Place,Drugstore
110,Untergiesing-Harlaching,2,Italian Restaurant,German Restaurant,Plaza,Café,Drugstore,Park,Taverna,Bar,Greek Restaurant,Doner Restaurant
117,Bogenhausen,2,Italian Restaurant,Supermarket,Bus Stop,Plaza,German Restaurant,Tram Station,Drugstore,Bakery,Café,Burger Joint


While Cluster 3 is definitely "Tourist traps":

In [307]:
munich_2.loc[munich_2['Cluster Labels'] == 3, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt-Lehel,3,Café,Hotel,Italian Restaurant,Bar,Plaza,Asian Restaurant,Coffee Shop,Cocktail Bar,German Restaurant,Burger Joint
16,Ludwigsvorstadt-Isarvorstadt,3,Hotel,Café,Bar,Vietnamese Restaurant,Italian Restaurant,Asian Restaurant,Supermarket,Pizza Place,Cocktail Bar,Middle Eastern Restaurant
20,Schwanthalerhöhe,3,Hotel,Asian Restaurant,Bakery,Thai Restaurant,Doner Restaurant,Bavarian Restaurant,Coffee Shop,Wine Bar,Drugstore,Ice Cream Shop
22,Sendling,3,Hotel,Café,Supermarket,Italian Restaurant,Bakery,Vietnamese Restaurant,Bar,Gastropub,German Restaurant,Pizza Place


Cluster 4 is "I live here because the rent is comparatively low":

In [308]:
munich_2.loc[munich_2['Cluster Labels'] == 4, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Hadern,4,Bus Stop,Supermarket,Bakery,Residential Building (Apartment / Condo),Greek Restaurant,Ice Cream Shop,Asian Restaurant,Chinese Restaurant,Drugstore,Metro Station
76,Feldmoching-Hasenbergl,4,Bus Stop,Supermarket,Indian Restaurant,Bakery,Food & Drink Shop,Greek Restaurant,Beer Garden,Lottery Retailer,Gastropub,Korean Restaurant
