# Segmenting and Clustering Neighborhoods in Toronto

## Part 1: Scraping the Wikipedia page

We start by retrieving the necessary data from Wikipedia. Setup:

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

webpage = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

Now we can do the scraping:

In [3]:
response = requests.get(url = webpage)

soup = BeautifulSoup(response.content, 'html.parser')

postal_codes = pd.DataFrame([], columns = ["Postal Code", "Borough", "Neighbourhood"])
i = 0

for tr in soup.find("table").find_all("tr"):
    if tr.find_all('td') != []:
        postal_codes.loc[i] = [tr.find_all('td')[0].string[:-1],
                               tr.find_all('td')[1].string[:-1],
                               tr.find_all('td')[2].string[:-1]]
        i += 1

            

In [4]:
postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [5]:
postal_codes.shape

(180, 3)

We need to clean the data a bit:

In [6]:
postal_codes = postal_codes.query("Borough != 'Not assigned'")

def conc_nbhd(nbhds):
    if(len(nbhds) == 1):
        return nbhds
    else:
        output = ""
        for nbhd in nbhds:
            output += nbhd
            output += ", "
        return(output[:-2])
    
postal_codes = postal_codes.groupby(["Postal Code", "Borough"]).agg({"Neighbourhood" : (lambda x : conc_nbhd(x))}).reset_index()

postal_codes['Neighbourhood'] = np.where(postal_codes["Neighbourhood"] == "Not assigned",
                                         postal_codes["Borough"],
                                         postal_codes["Neighbourhood"])

Data looks like this now:

In [7]:
postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [8]:
postal_codes.shape

(103, 3)

## Part 2: Joining in latitude and longitude

We read in the .csv data with latitudes and longitudes:

In [9]:
geospatial = pd.read_csv("Geospatial_Coordinates.csv")
geospatial.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now we just join it to the postal codes:

In [10]:
postal_codes = pd.merge(postal_codes, geospatial, how = "left", on = "Postal Code")
postal_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


## Part 3: Clustering and Data Exploring

Since I live in Munich, I thought it would be more fun to do it for Munich instead - I've never even been in Toronto. The neighbourhood data is already available on https://github.com/zauberware/postal-codes-json-xml-csv, and the neighbourhood names are here https://www.muenchen.de/leben/service/postleitzahlen.html, I'll just load everything and get the data ready:

In [224]:
pc_de = pd.read_csv("postal_codes_de.csv")
pc_de = pc_de.query("place == 'München'")
pc_de.head()

Unnamed: 0,country_code,zipcode,place,state,state_code,province,province_code,community,community_code,latitude,longitude
2869,DE,80331,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,11.571
2870,DE,80333,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1452,11.5668
2871,DE,80335,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1427,11.5552
2872,DE,80336,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,11.559
2873,DE,80337,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1224,11.5449


In [225]:
pc_de.latitude.max(), pc_de.latitude.min(), pc_de.longitude.max(), pc_de.longitude.min()

(48.2225, 48.0827, 11.6972, 11.4141)

In [226]:
stadtteile = pd.read_csv("stadtteile.tsv", sep = "\t")
stadtteile = pd.concat([pd.Series(row['Stadtteil'], row['Postleitzahl'].split(','))              
                    for _, row in stadtteile.iterrows()]).reset_index().rename(columns = {"index" : "zipcode", 0 : "Neighbourhood"})
stadtteile.zipcode = stadtteile.zipcode.apply(pd.to_numeric)
stadtteile.head()

Unnamed: 0,zipcode,Neighbourhood
0,80995,Allach-Untermenzing
1,80997,Allach-Untermenzing
2,80999,Allach-Untermenzing
3,81247,Allach-Untermenzing
4,81249,Allach-Untermenzing


In [227]:
pc_de = pd.merge(pc_de, stadtteile, how = "left", on = "zipcode")
pc_de = pc_de.reset_index()[["zipcode", "Neighbourhood", "latitude", "longitude"]]
pc_de.head()

Unnamed: 0,zipcode,Neighbourhood,latitude,longitude
0,80331,Altstadt-Lehel,48.1345,11.571
1,80333,Altstadt-Lehel,48.1452,11.5668
2,80333,Maxvorstadt,48.1452,11.5668
3,80335,Altstadt-Lehel,48.1427,11.5552
4,80335,Ludwigsvorstadt-Isarvorstadt,48.1427,11.5552


In [229]:
pc_de.to_csv("postal_codes_munich_2.csv", index = False)

It's not really ideal, since zipcodes and neighbourhoods overlap, so let's modify the neighbourhoods to include the zip code information:

In [120]:
pc_de["Neighbourhood"] = pc_de["zipcode"].astype("str")

In [121]:
pc_de.head()

Unnamed: 0,country_code,zipcode,place,state,state_code,province,province_code,community,community_code,latitude,longitude,Neighbourhood
2869,DE,80331,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,11.571,80331
2870,DE,80333,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1452,11.5668,80333
2871,DE,80335,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1427,11.5552,80335
2872,DE,80336,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,11.559,80336
2873,DE,80337,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1224,11.5449,80337


Here's a map:

In [122]:
import folium

# create map of New York using latitude and longitude values
map_munich = folium.Map(location=[48.1351, 11.5820], zoom_start=12)

# add markers to map
for lat, lng, borough in zip(pc_de['latitude'], pc_de['longitude'], pc_de['Neighbourhood']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

I disagree with some of the markers, but let's not question the data, ok?

We move on to Foursquare and run the same analysis as in the Lab. Hopefully the cell with the API won't be shared publicly.

In [123]:
CLIENT_ID = 'UZSSZWSN5S1CICBZM2RI2QIYJKUWUODBRTIFOGLMBWU1TEB5' # your Foursquare ID
CLIENT_SECRET = 'HGUQ35X4OZKUQ4JXNNOWJNXE1K2MAVKENFC2XCKZA3WPERH4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: UZSSZWSN5S1CICBZM2RI2QIYJKUWUODBRTIFOGLMBWU1TEB5
CLIENT_SECRET:HGUQ35X4OZKUQ4JXNNOWJNXE1K2MAVKENFC2XCKZA3WPERH4


In [124]:
neighborhood_latitude = 48.1345
neighborhood_longitude = 11.5710
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
#create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=UZSSZWSN5S1CICBZM2RI2QIYJKUWUODBRTIFOGLMBWU1TEB5&client_secret=HGUQ35X4OZKUQ4JXNNOWJNXE1K2MAVKENFC2XCKZA3WPERH4&v=20180605&ll=48.1345,11.571&radius=500&limit=100'

In [125]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60016f3846ba4d41989f4283'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Altstadt',
  'headerFullLocation': 'Altstadt, Munich',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 160,
  'suggestedBounds': {'ne': {'lat': 48.139000004500005,
    'lng': 11.577730159481835},
   'sw': {'lat': 48.1299999955, 'lng': 11.564269840518165}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e33e9a9e4cdf7a42cab7ac9',
       'name': 'Asamkirche (St. Johann Nepomuk)',
       'location': {'address': 'Sendlinger Str. 32',
        'lat': 48.135053450258326,
        'lng': 11.569746277160712,
        'labeledLatLngs': [{'label': 'display',
       

In [126]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [127]:
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  nearby_venues = json_normalize(venues) # flatten JSON


Unnamed: 0,name,categories,lat,lng
0,Asamkirche (St. Johann Nepomuk),Church,48.135053,11.569746
1,The High,Cocktail Bar,48.133101,11.572939
2,Ringlers,Sandwich Place,48.134097,11.568302
3,TeeGschwendner,Tea Room,48.135398,11.569455
4,Kleinschmecker,German Restaurant,48.134659,11.573565


In [128]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [129]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [130]:
munich_venues = getNearbyVenues(names=pc_de['Neighbourhood'],
                                latitudes=pc_de['latitude'],
                                longitudes=pc_de['longitude'])

80331
80333
80335
80336
80337
80339
80469
80538
80539
80634
80636
80637
80638
80639
80686
80687
80689
80796
80797
80798
80799
80801
80802
80803
80804
80805
80807
80809
80933
80935
80937
80939
80992
80993
80995
80997
80999
81241
81243
81245
81247
81249
81369
81371
81373
81375
81377
81379
81475
81476
81477
81479
81539
81541
81543
81545
81547
81549
81667
81669
81671
81673
81675
81677
81679
81735
81737
81739
81825
81827
81829
81925
81927
81929


In [131]:
print(munich_venues.shape)
munich_venues.head()

(1899, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,80331,48.1345,11.571,Asamkirche (St. Johann Nepomuk),48.135053,11.569746,Church
1,80331,48.1345,11.571,The High,48.133101,11.572939,Cocktail Bar
2,80331,48.1345,11.571,Ringlers,48.134097,11.568302,Sandwich Place
3,80331,48.1345,11.571,TeeGschwendner,48.135398,11.569455,Tea Room
4,80331,48.1345,11.571,Kleinschmecker,48.134659,11.573565,German Restaurant


In [132]:
munich_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
80331,100,100,100,100,100,100
80333,65,65,65,65,65,65
80335,37,37,37,37,37,37
80336,73,73,73,73,73,73
80337,38,38,38,38,38,38
...,...,...,...,...,...,...
81827,10,10,10,10,10,10
81829,9,9,9,9,9,9
81925,7,7,7,7,7,7
81927,7,7,7,7,7,7


In [133]:
print('There are {} uniques categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 232 uniques categories.


In [134]:
# one hot encoding
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
munich_onehot['Neighborhood'] = munich_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [munich_onehot.columns[-1]] + list(munich_onehot.columns[:-1])
munich_onehot = munich_onehot[fixed_columns]

munich_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,80331,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,80331,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,80331,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,80331,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,80331,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [135]:
munich_onehot.shape

(1899, 233)

In [136]:
munich_grouped = munich_onehot.groupby('Neighborhood').mean().reset_index()
munich_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,80331,0.010000,0.0,0.0,0.000000,0.000000,0.01,0.010000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.010000,0.000000,0.010000,0.010000,0.0,0.0
1,80333,0.000000,0.0,0.0,0.015385,0.046154,0.00,0.030769,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.015385,0.000000,0.000000,0.0,0.0
2,80335,0.027027,0.0,0.0,0.000000,0.000000,0.00,0.027027,0.0,0.000000,...,0.027027,0.0,0.000000,0.0,0.000000,0.000000,0.027027,0.000000,0.0,0.0
3,80336,0.000000,0.0,0.0,0.013699,0.000000,0.00,0.013699,0.0,0.000000,...,0.000000,0.0,0.013699,0.0,0.013699,0.013699,0.000000,0.000000,0.0,0.0
4,80337,0.000000,0.0,0.0,0.000000,0.000000,0.00,0.026316,0.0,0.026316,...,0.000000,0.0,0.000000,0.0,0.000000,0.078947,0.000000,0.026316,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,81827,0.000000,0.0,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.1,0.000000,0.000000,0.000000,0.000000,0.0,0.0
69,81829,0.000000,0.0,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.0
70,81925,0.000000,0.0,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.0
71,81927,0.000000,0.0,0.0,0.000000,0.000000,0.00,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.0


In [137]:
munich_grouped.shape

(73, 233)

In [138]:
num_top_venues = 5

for hood in munich_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = munich_grouped[munich_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----80331----
                venue  freq
0                Café  0.08
1               Plaza  0.05
2         Coffee Shop  0.04
3  Italian Restaurant  0.04
4               Hotel  0.04


----80333----
            venue  freq
0            Café  0.17
1  History Museum  0.08
2       Nightclub  0.06
3    Burger Joint  0.05
4           Plaza  0.05


----80335----
                 venue  freq
0                Hotel  0.16
1  Bavarian Restaurant  0.05
2               Bakery  0.05
3          Coffee Shop  0.05
4    Afghan Restaurant  0.03


----80336----
                       venue  freq
0                      Hotel  0.26
1                       Café  0.05
2  Middle Eastern Restaurant  0.05
3                  Nightclub  0.04
4               Camera Store  0.04


----80337----
                   venue  freq
0  Vietnamese Restaurant  0.08
1                   Café  0.08
2     Italian Restaurant  0.08
3            Supermarket  0.05
4              Gastropub  0.05


----80339----
              venue  fre

                venue  freq
0            Bus Stop  0.17
1      Ice Cream Shop  0.17
2  Italian Restaurant  0.17
3             Taverna  0.17
4       Metro Station  0.17


----81549----
                  venue  freq
0           Supermarket  0.25
1                Lounge  0.25
2             Drugstore  0.12
3  Fast Food Restaurant  0.12
4          Tennis Court  0.12


----81667----
                venue  freq
0  Italian Restaurant  0.10
1   German Restaurant  0.08
2   French Restaurant  0.06
3                Café  0.06
4               Plaza  0.06


----81669----
                venue  freq
0         Supermarket  0.09
1  Italian Restaurant  0.06
2                 Gym  0.03
3        Burger Joint  0.03
4           Nightclub  0.03


----81671----
               venue  freq
0       Soccer Field  0.25
1        Supermarket  0.25
2  German Restaurant  0.25
3             Bakery  0.25
4  Afghan Restaurant  0.00


----81673----
                venue  freq
0         Supermarket  0.13
1    Asian Restaur

In [139]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [216]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = munich_grouped['Neighborhood']

for ind in np.arange(munich_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,80331,Café,Plaza,Coffee Shop,Italian Restaurant,Hotel,German Restaurant,Clothing Store,Church,Cocktail Bar,Bavarian Restaurant
1,80333,Café,History Museum,Nightclub,Burger Joint,Plaza,Art Museum,Restaurant,Italian Restaurant,Botanical Garden,Asian Restaurant
2,80335,Hotel,Bavarian Restaurant,Bakery,Coffee Shop,Afghan Restaurant,Spa,Mediterranean Restaurant,Sushi Restaurant,Brewery,Supermarket
3,80336,Hotel,Café,Middle Eastern Restaurant,Nightclub,Camera Store,Italian Restaurant,Mexican Restaurant,Burger Joint,Hostel,Indie Movie Theater
4,80337,Vietnamese Restaurant,Café,Italian Restaurant,Supermarket,Gastropub,Deli / Bodega,Seafood Restaurant,Gym,Grocery Store,German Restaurant


In [217]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 10

munich_grouped_clustering = munich_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(munich_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 2, 2, 2, 2, 0, 2, 0, 2], dtype=int32)

In [218]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = pc_de

munich_merged = munich_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), left_on='Neighbourhood', right_on = "Neighborhood")

munich_merged.head() # check the last columns!

Unnamed: 0,country_code,zipcode,place,state,state_code,province,province_code,community,community_code,latitude,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,DE,80331,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,...,Café,Plaza,Coffee Shop,Italian Restaurant,Hotel,German Restaurant,Clothing Store,Church,Cocktail Bar,Bavarian Restaurant
1,DE,80333,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1452,...,Café,History Museum,Nightclub,Burger Joint,Plaza,Art Museum,Restaurant,Italian Restaurant,Botanical Garden,Asian Restaurant
2,DE,80335,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1427,...,Hotel,Bavarian Restaurant,Bakery,Coffee Shop,Afghan Restaurant,Spa,Mediterranean Restaurant,Sushi Restaurant,Brewery,Supermarket
3,DE,80336,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1345,...,Hotel,Café,Middle Eastern Restaurant,Nightclub,Camera Store,Italian Restaurant,Mexican Restaurant,Burger Joint,Hostel,Indie Movie Theater
4,DE,80337,München,Bayern,BY,Upper Bavaria,91,München,9162,48.1224,...,Vietnamese Restaurant,Café,Italian Restaurant,Supermarket,Gastropub,Deli / Bodega,Seafood Restaurant,Gym,Grocery Store,German Restaurant


In [219]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[48.1351, 11.5820], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged['latitude'], munich_merged['longitude'], munich_merged['Neighbourhood'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [154]:
munich_2 = munich_merged.drop(["latitude", "longitude", "zipcode"], axis = 1).drop_duplicates()

In [155]:
munich_2.loc[munich_2['Cluster Labels'] == 0, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,country_code,place,state,state_code,province,province_code,community,community_code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80331,0,Café,Plaza,Coffee Shop,Italian Restaurant,Hotel,German Restaurant,Clothing Store,Church,Cocktail Bar,Bavarian Restaurant
1,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80333,0,Café,History Museum,Nightclub,Burger Joint,Plaza,Art Museum,Restaurant,Italian Restaurant,Botanical Garden,Asian Restaurant
2,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80335,0,Hotel,Bavarian Restaurant,Bakery,Coffee Shop,Afghan Restaurant,Spa,Mediterranean Restaurant,Sushi Restaurant,Brewery,Supermarket
3,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80336,0,Hotel,Café,Middle Eastern Restaurant,Nightclub,Camera Store,Italian Restaurant,Mexican Restaurant,Burger Joint,Hostel,Indie Movie Theater
4,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80337,0,Vietnamese Restaurant,Café,Italian Restaurant,Supermarket,Gastropub,Deli / Bodega,Seafood Restaurant,Gym,Grocery Store,German Restaurant
5,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80339,0,Hotel,Asian Restaurant,Thai Restaurant,Doner Restaurant,Pizza Place,Bakery,Drugstore,Tapas Restaurant,Boutique,Bus Stop
6,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80469,0,Café,Bar,Cocktail Bar,Vietnamese Restaurant,Asian Restaurant,Ice Cream Shop,Pizza Place,Restaurant,Coffee Shop,French Restaurant
7,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80538,0,Italian Restaurant,German Restaurant,Bar,Indian Restaurant,Market,Outdoor Sculpture,Pastry Shop,Performing Arts Venue,Persian Restaurant,Plaza
8,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80539,0,Café,Italian Restaurant,Bar,Ice Cream Shop,Bagel Shop,Chinese Restaurant,Sushi Restaurant,Asian Restaurant,Plaza,River
9,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80634,0,Indian Restaurant,Café,Hotel,Drugstore,Bakery,German Restaurant,Sushi Restaurant,Supermarket,Ice Cream Shop,Vietnamese Restaurant


In [156]:
munich_2.loc[munich_2['Cluster Labels'] == 1, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,country_code,place,state,state_code,province,province_code,community,community_code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80937,1,Bus Stop,Mexican Restaurant,Metro Station,German Restaurant,Afghan Restaurant,Optical Shop,Motel,Movie Theater,Museum,Music School
45,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,81375,1,Bus Stop,Metro Station,Greek Restaurant,German Restaurant,Asian Restaurant,Residential Building (Apartment / Condo),Movie Theater,Museum,Optical Shop,Music School
55,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,81547,1,Bus Stop,Ice Cream Shop,Italian Restaurant,Taverna,Metro Station,German Restaurant,Music Store,Noodle House,Nightclub,Music Venue


In [157]:
munich_2.loc[munich_2['Cluster Labels'] == 2, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,country_code,place,state,state_code,province,province_code,community,community_code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80686,2,Supermarket,Bank,Bakery,Mobile Phone Shop,Laundromat,Sandwich Place,Organic Grocery,Metro Station,Chinese Restaurant,German Restaurant
16,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80689,2,Supermarket,Bus Stop,Home Service,Bakery,Shop & Service,Drugstore,Noodle House,Nightclub,Music Venue,Music Store
26,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80807,2,Bakery,Supermarket,Burger Joint,Drugstore,Cocktail Bar,Food & Drink Shop,Soccer Field,Bus Stop,Gastropub,German Restaurant
29,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80935,2,Bus Stop,Supermarket,Food & Drink Shop,Greek Restaurant,Bakery,Optical Shop,Monument / Landmark,Motel,Movie Theater,Museum
32,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80992,2,Supermarket,Ice Cream Shop,Plaza,Hotel,Metro Station,Bakery,Trattoria/Osteria,Afghan Restaurant,Music Store,Noodle House
34,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80995,2,Indian Restaurant,Bus Stop,Supermarket,Lottery Retailer,Modern Greek Restaurant,Motel,Movie Theater,Museum,Music School,Music Store
35,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80997,2,Bus Stop,Supermarket,Bakery,Afghan Restaurant,Organic Grocery,Monument / Landmark,Motel,Movie Theater,Museum,Music School
39,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,81245,2,Bakery,Supermarket,German Restaurant,Soccer Field,Post Office,Photography Studio,Hotel,Bistro,Music Venue,Music Store
40,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,81247,2,Bakery,Bus Stop,Hostel,German Restaurant,Supermarket,Tennis Court,Drugstore,Italian Restaurant,Discount Store,Hotel
42,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,81369,2,Supermarket,Hotel,Drugstore,Tunnel,Bakery,Pet Store,Greek Restaurant,Italian Restaurant,Museum,Movie Theater


In [158]:
munich_2.loc[munich_2['Cluster Labels'] == 3, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,country_code,place,state,state_code,province,province_code,community,community_code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,80933,3,Gastropub,Beer Garden,Afghan Restaurant,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Museum,Music School,Music Store


In [159]:
munich_2.loc[munich_2['Cluster Labels'] == 4, munich_2.columns[[0,1] + list(range(2, munich_2.shape[1]))]]

Unnamed: 0,country_code,place,state,state_code,province,province_code,community,community_code,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
38,DE,München,Bayern,BY,Upper Bavaria,91,München,9162,81243,4,Gym / Fitness Center,Bus Stop,Soccer Field,Bakery,Afghan Restaurant,Motel,Movie Theater,Museum,Music School,Music Store
