 # Berlin or Munich 

## Clustering the Neighbourhoods of Berlin and Munich

Rekha Remadevi


08 March 2021

### Introduction

It is quite difficult to find two more dramatically contrasting cities within the same country than Berlin and Munich.

Munich and Berlin are two of Germany's best-known cities, and both have a lot to offer. Berlin is not only Germany’s capital and largest city; it is also the cultural hub of the nation. One of the most fascinating cities in Europe, Berlin is vibrant and edgy and is Germany’s centre for fashion, art and culture. Its nightlife is famous, and the city is often called the techno capital of the world. Berlin also offers a wide variety of affordable food, ranging from traditional German dishes to American burgers.

Munich is the wealthy capital of Bavaria and the gateway to the Alps. It is said to be one of the most beautiful and charming cities in Germany and is filled with museums and beautiful architecture. It is most famous as the centre of the Oktoberfest festivities, which attract over 6 million visitors every year.

### Business Problem

Someone who wants to relocate to Germany is considering buying an apartment in a neighbourhood of Berlin or Munich. To make a decision, they need information about the neighbourhoods and districts of both cities. The same information helps tourists choose destinations based on what each neighbourhood has to offer and what they would like to experience.

### Data Description

Geographical location data for both Berlin and Munich is essential. Postal codes in each city serve as a starting point: from the postal codes we can determine the neighbourhoods and boroughs, and then the venues and their most popular categories.

### Berlin

For Berlin, we scrape the data from http://www.places-in-germany.com/14356-places-within-a-radius-of-15km-around-berlin.html

This page on www.places-in-germany.com lists the places within a 15 km radius of Berlin. We use two fields:

1. Borough: name of the neighbourhood

2. Zip code: postal code of the place in Berlin

The data is scraped with Beautiful Soup and loaded directly into a pandas DataFrame. It needs some cleaning so that only the boroughs of the city are kept.

### Munich

The postal codes and names of all districts in Munich are required. The data published at https://www.muenchen.de/int/en/living/postal-codes.html is used for this.

The data is fetched with the pandas library and its built-in pd.read_html() function, which reads the tables available on the page and returns each one as a DataFrame.

1. District: name of the district

2. Postal Code: postal code of the district in Munich

### Geodata

The Python geopy library is used to obtain the latitude and longitude values. Given an address string such as the name of a neighbourhood, optionally combined with a postal code, it returns the latitude and longitude of the best match.
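As a rough illustration of the lookup described above, the sketch below separates query building and failure handling from the network call. The helper name `geocode_places`, the `Location` tuple and the stand-in `fake_geocode` are hypothetical and exist only for this example; with geopy, the callable passed in would be the `geocode` method of a `Nominatim` instance.

```python
from collections import namedtuple
import math

# Minimal stand-in for the object a geocoder returns (hypothetical, for illustration).
Location = namedtuple('Location', ['latitude', 'longitude'])

def geocode_places(places, city, geocode):
    """Return one (lat, lon) pair per (zipcode, name) place; NaN when there is no match."""
    coords = []
    for zipcode, name in places:
        # One plain query string, e.g. "10115 Berlin Mitte"
        query = '{} {} {}'.format(zipcode, city, name)
        location = geocode(query)  # expected to return None when nothing matches
        if location is None:
            coords.append((math.nan, math.nan))
        else:
            coords.append((location.latitude, location.longitude))
    return coords

def fake_geocode(query):
    # Hypothetical lookup table standing in for a real geocoding service.
    known = {'10115 Berlin Mitte': Location(52.517, 13.389)}
    return known.get(query)

coords = geocode_places([('10115', 'Mitte'), ('00000', 'Nowhere')], 'Berlin', fake_geocode)
print(coords)
```

Keeping the geocoder as a parameter makes the missing-value handling easy to check without any network access.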

### Venue data

As a next step, up to 100 venues are fetched for each postal code. For this, we use the Foursquare API to explore the neighbourhoods of the two cities. The data includes information about the places around each neighbourhood, such as restaurants, hotels, coffee shops, parks, theaters, art galleries, museums and many more. We selected one Borough from each city to analyze their neighborhoods.

We will use a machine learning technique, clustering, to segment the neighbourhoods into groups with similar venue profiles. Venue categories are weighted by how frequently they occur in each neighbourhood, which serves as a rough proxy for activity there. This helps to locate tourist areas and hubs, and lets us judge how similar or dissimilar the two cities are on that basis.

### Methodology

In [80]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import json
from geopy.geocoders import Nominatim
from pandas import json_normalize  # top-level location since pandas 1.0
from bs4 import BeautifulSoup
import requests
import geopy as geo
import geopandas as gpd
import folium 
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

pd.options.mode.chained_assignment = None  
print('Imports done!')

Imports done!


## Exploring Berlin

### Neighbourhoods of Berlin

We start by collecting and refining the data needed for our business problem.

### Data Collection

To get the neighbourhoods in Berlin, we start by scraping the list of areas of Berlin using the page http://www.places-in-germany.com/

In [3]:
url='http://www.places-in-germany.com/14356-places-within-a-radius-of-15km-around-berlin.html'
req=requests.get(url)
soup=BeautifulSoup(req.text,"html.parser")
table = soup.find_all('table')
df=pd.read_html(str(table), header=0)[0]

In [4]:
df

Unnamed: 0,Distance,Route,Postal code / Place,Population
0,1.2 km (0.8 miles),,10115 Mitte,79582
1,2.1 km (1.3 miles),,10119 Prenzlauer Berg,140881
2,2.8 km (1.7 miles),,10115 Mitte,333534
3,2.9 km (1.8 miles),,13347 Gesundbrunnen,82110
4,3.3 km (2.0 miles),,10243 Friedrichshain-Kreuzberg,269398
...,...,...,...,...
93,14.6 km (9.1 miles),,12459 Köpenick,59201
94,14.8 km (9.2 miles),,12524 Altglienicke,26101
95,14.9 km (9.3 miles),,16341 Schwanebeck bei Bernau bei Berlin,-
96,15.0 km (9.3 miles),,12305 Lichtenrade,49451


Next, we keep only the column that contains the postal codes and place names.

### Data Preprocessing

In [5]:
df.rename({'Postal code / Place':'Borough'},axis=1, inplace=True)

In [6]:
berlin=df.Borough.str.split(expand=True) # Splitting zip code and place name

In [7]:
# Re-joining the borough name from its first two tokens only;
# longer names are therefore truncated (e.g. 'Schwanebeck bei')
borough=[]
for name, values in berlin.iterrows():
    #print(name, values[0], values[1],values[2])
    if values[2] is None:
        borough.append(values[1])
    else:
        borough.append(values[1] + ' ' + values[2])

In [8]:
berlin['Borough']=borough

In [9]:
berlin.rename({0:'Zipcode'}, axis=1, inplace=True)

In [10]:

berlin

Unnamed: 0,Zipcode,1,2,3,4,5,Borough
0,10115,Mitte,,,,,Mitte
1,10119,Prenzlauer,Berg,,,,Prenzlauer Berg
2,10115,Mitte,,,,,Mitte
3,13347,Gesundbrunnen,,,,,Gesundbrunnen
4,10243,Friedrichshain-Kreuzberg,,,,,Friedrichshain-Kreuzberg
...,...,...,...,...,...,...,...
93,12459,Köpenick,,,,,Köpenick
94,12524,Altglienicke,,,,,Altglienicke
95,16341,Schwanebeck,bei,Bernau,bei,Berlin,Schwanebeck bei
96,12305,Lichtenrade,,,,,Lichtenrade
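The output above shows that multi-word place names are cut after two tokens (for example 'Schwanebeck bei'). A possible alternative, shown here only as a sketch on a few sample values taken from the table, is to split once on the first whitespace so the full place name is kept. The variable names `places` and `parts` are made up for this example.

```python
import pandas as pd

# A few 'Postal code / Place' values as they appear in the scraped table
places = pd.Series([
    '10115 Mitte',
    '10119 Prenzlauer Berg',
    '16341 Schwanebeck bei Bernau bei Berlin',
])

# n=1 splits only at the first run of whitespace: zip code | rest of the name
parts = places.str.split(n=1, expand=True)
parts.columns = ['Zipcode', 'Borough']
print(parts)
```

With this variant no re-joining loop is needed, although names such as 'Schwanebeck bei Bernau bei Berlin' may still need cleaning before geocoding.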


### Feature Selection

We need only the boroughs and zip codes for the following steps.

In [11]:
# Keeping only the columns with Zip Code and Borough
berlin=berlin[['Zipcode', 'Borough']]
berlin.shape

(98, 2)

In [12]:
#Removing Duplicates
berlin.drop_duplicates(inplace=True)

In [13]:
berlin.shape

(97, 2)

In [14]:
#Dropping empty cells
berlin.dropna(axis=0, inplace=True) 
berlin.shape

(96, 2)

### Geolocations of the Berlin Neighbourhoods

#### Latitude and longitude from geopy and Nominatim

The Python geopy library is used to obtain the latitude and longitude values. Given an address string such as the name of a neighbourhood, optionally combined with a postal code, it returns the latitude and longitude of the best match.

In [17]:
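from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

lat=[]
lon=[]

geolocator = Nominatim(user_agent='markus.gorges')
# Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

for line, boroughs in berlin.iterrows():
    # Build one query string, e.g. "10115 Berlin Mitte" (the original passed a tuple)
    address = '{} Berlin {}'.format(boroughs['Zipcode'], boroughs['Borough'])
    try:
        location = geocode(address)
        lat.append(location.latitude)
        lon.append(location.longitude)
    except Exception:
        # No match (location is None) or a geocoder error: record a missing value
        lat.append(np.nan)
        lon.append(np.nan)

berlin['latitude']=lat
berlin['longitude']=lon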

In [18]:
berlin.dropna(axis=0, inplace=True) #Drop empty cells
#berlin.to_csv('Ausgabe.csv', sep=';', decimal=',', index=True)
berlin.shape

(95, 4)

### Visualize the Map of Berlin


To visualize the map of Berlin and its neighbourhoods, we make use of the folium package.

In [19]:
#Creating the Map of Berlin
address = 'Berlin'

geolocator = Nominatim(user_agent="markus.gorges")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Berlin are {}, {}.'.format(latitude, longitude))

# Creating Folium Map
map_berlin = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough in zip(berlin['latitude'], berlin['longitude'], berlin['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_berlin)

map_berlin

The geographical coordinates of Berlin are 52.5170365, 13.3888599.


### Importing Venue Data Using the Foursquare API

To proceed with the next part, we need to define Foursquare API credentials.

Using Foursquare API, we are able to get the venue and venue categories around each neighbourhood in Berlin.

In [21]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # replace with your own secret; do not publish real credentials
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
radius=500

We define a function that fetches the nearby venues for each neighbourhood. This gives us the venue categories, which are the basis of our analysis.

In [23]:
#Def for getting all Venues from Foursquare

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
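The function above assumes that every response contains at least one group and that every venue has at least one category. As a hedged sketch, the parsing step could be isolated in a helper that tolerates missing pieces. The name `extract_venues` and the `sample` payload are hypothetical; only the nesting (`response`, `groups`, `items`, `venue`, `location`, `categories`) is taken from the code above.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Venues without a category get None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hypothetical payload with the same nesting as the parsed response above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Komische Oper',
               'location': {'lat': 52.515968, 'lng': 13.386701},
               'categories': [{'name': 'Opera House'}]}},
    {'venue': {'name': 'Unnamed place',
               'location': {'lat': 52.5, 'lng': 13.4},
               'categories': []}},
]}]}}

rows = extract_venues(sample)
empty = extract_venues({'response': {}})
print(rows)
```

A neighbourhood with no venues then yields an empty list rather than an exception, so one failed lookup does not stop the whole loop.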

#### Getting the Venues in Berlin

In [42]:
venues_berlin = getNearbyVenues(names=berlin['Borough'],
                                   latitudes=berlin['latitude'],
                                   longitudes=berlin['longitude']
                                  )

Mitte
Prenzlauer Berg
Gesundbrunnen
Friedrichshain-Kreuzberg
Friedrichshain
Tiergarten
Wedding
Moabit
Kreuzberg
Hansaviertel
Fennpfuhl
Alt-Treptow
Weißensee
Pankow
Heinersdorf
Neukölln
Lichtenberg
Schöneberg
Lichtenberg
Rummelsburg
Niederschönhausen
Tempelhof
Stadtrandsiedlung Malchow
Reinickendorf
Plänterwald
Charlottenburg
Wilhelmsruh
Alt-Hohenschönhausen
Wilmersdorf
Charlottenburg-Nord
Friedenau
Neu-Hohenschönhausen
Charlottenburg-Wilmersdorf
Friedrichsfelde
Malchow
Rosenthal
Baumschulenweg
Tempelhof-Schöneberg
Halensee
Blankenburg
Grunewald
Märkisches Viertel
Britz
Steglitz
Westend
Karlshorst
Wartenberg
Schmargendorf
Wittenau
Siemensstadt
Französisch Buchholz
Marzahn
Mariendorf
Biesdorf
Falkenberg
Tegel
Niederschöneweide
Lübars
Oberschöneweide
Waidmannslust
Lankwitz
Blankenfelde
Dahlem
Karow
Johannisthal
Haselhorst
Gropiusstadt
Buckow
Lichterfelde
Hermsdorf
Kaulsdorf
Marzahn-Hellersdorf
Schildow
Marienfelde
Hellersdorf
Ahrensfelde
Ahrensfelde bei
Eiche bei
Glienicke /
Rudow
Adlersh

#### Sampling The Data

In [43]:
venues_berlin.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.517012,13.388822,Dussmann das KulturKaufhaus,52.518312,13.388708,Bookstore
1,Mitte,52.517012,13.388822,Dussmann English Bookshop,52.518223,13.389239,Bookstore
2,Mitte,52.517012,13.388822,Cookies Cream,52.516569,13.388008,Vegetarian / Vegan Restaurant
3,Mitte,52.517012,13.388822,Freundschaft,52.518294,13.390344,Wine Bar
4,Mitte,52.517012,13.388822,Komische Oper,52.515968,13.386701,Opera House


In [44]:
venues_berlin.shape

(1513, 7)

There are 1513 venue records, which should make the clustering interesting.

### Grouping by Venue Categories

Next, we look at how many venue categories there are for further processing.

In [45]:
venues_berlin.groupby('Neighborhood').head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.517012,13.388822,Dussmann das KulturKaufhaus,52.518312,13.388708,Bookstore
1,Mitte,52.517012,13.388822,Dussmann English Bookshop,52.518223,13.389239,Bookstore
2,Mitte,52.517012,13.388822,Cookies Cream,52.516569,13.388008,Vegetarian / Vegan Restaurant
3,Mitte,52.517012,13.388822,Freundschaft,52.518294,13.390344,Wine Bar
4,Mitte,52.517012,13.388822,Komische Oper,52.515968,13.386701,Opera House
...,...,...,...,...,...,...,...
1507,Steglitz-Zehlendorf,52.429205,13.229974,Rossosiena,52.431397,13.231050,Italian Restaurant
1508,Steglitz-Zehlendorf,52.429205,13.229974,Villa Medici,52.429884,13.224213,Italian Restaurant
1509,Steglitz-Zehlendorf,52.429205,13.229974,Trattoria Piazza Siciliana,52.430530,13.229966,Italian Restaurant
1510,Steglitz-Zehlendorf,52.429205,13.229974,H Clauertstraße,52.427399,13.229574,Bus Stop


We can see 424 records. This shows how diverse and vibrant Berlin and its neighbourhoods are.

In [46]:
venues_berlin.groupby('Venue Category').max()

Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
ATM,Marzahn-Hellersdorf,52.572848,13.587663,Berliner Volksbank,52.571720,13.587203
Adult Boutique,Kreuzberg,52.539773,13.385951,Other Nature,52.535850,13.386317
African Restaurant,Schöneberg,52.482157,13.355190,Sahara Sudanesische Spezialitäten,52.479845,13.351810
American Restaurant,Märkisches Viertel,52.599312,13.450290,Tony Roma's,52.597617,13.453561
Argentinian Restaurant,Schöneberg,52.484340,13.523585,Steakhouse Barbecue,52.486614,13.527388
...,...,...,...,...,...,...
Wine Bar,Schöneberg,52.517012,13.435350,Weinverein Rote Insel,52.518294,13.433147
Wine Shop,Rummelsburg,52.632392,13.483514,Weinladen,52.632129,13.480473
Women's Store,Buckow,52.418662,13.428950,Esprit Gropiuspassagen,52.421385,13.427098
Yoga Studio,Zehlendorf,52.528634,13.450290,sunyoga Friedrichshain,52.529607,13.448852


### Looking for Unique Venues

In [48]:
venues_berlin['Venue Category'].unique()

array(['Bookstore', 'Vegetarian / Vegan Restaurant', 'Wine Bar',
       'Opera House', 'Hotel', 'Cocktail Bar', 'Exhibit',
       'Clothing Store', 'Gourmet Shop', 'Restaurant', 'Souvenir Shop',
       'Gym', 'Chocolate Shop', 'Cosmetics Shop', 'Beer Garden',
       'Modern European Restaurant', 'Coffee Shop', 'Plaza',
       'Sandwich Place', 'Boutique', 'Department Store',
       'Outdoor Sculpture', 'Concert Hall', 'Italian Restaurant',
       'Monument / Landmark', 'Roof Deck', 'Electronics Store',
       'Optical Shop', 'Historic Site', 'Church', 'German Restaurant',
       'Art Museum', 'Drugstore', 'Hotel Bar', 'Theater', 'Memorial Site',
       'Ice Cream Shop', 'Music Venue', 'Pharmacy', 'Art Gallery',
       'Supermarket', 'Beer Bar', 'Café', 'Movie Theater',
       'Gym / Fitness Center', 'Deli / Bodega', 'Cooking School', 'Park',
       'Bistro', 'Yoga Studio', 'Fountain', 'Spa', 'Soup Place',
       'Israeli Restaurant', 'Tea Room', 'Thai Restaurant', 'Dive Bar',
       'H

In [50]:
# Count identical rows across all columns; counts above 1 indicate duplicated venue rows
venues_berlin.value_counts()

Neighborhood       Neighborhood Latitude  Neighborhood Longitude  Venue                   Venue Latitude  Venue Longitude  Venue Category        
Lichtenberg        52.532161              13.511893               Comfort Hotel           52.533387       13.517571        Hotel                     2
                                                                  Möbel Höffner           52.533778       13.506196        Furniture / Home Store    2
                                                                  Globus Baumarkt         52.533271       13.509472        Hardware Store            2
                                                                  H Schalkauer Straße     52.535226       13.510215        Tram Station              2
                                                                  IKEA                    52.534204       13.513304        Furniture / Home Store    2
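Counts of 2 in the output above indicate venue rows that occur twice, which is consistent with Lichtenberg appearing twice in the list of boroughs that was queried. Assuming exact duplicates are unwanted, they could be removed before encoding. The following is a minimal sketch on a few made-up rows, not the full dataset; `venues` and `deduplicated` are names chosen for this example.

```python
import pandas as pd

# Toy rows: the IKEA entry for Lichtenberg is listed twice
venues = pd.DataFrame({
    'Neighborhood': ['Lichtenberg', 'Lichtenberg', 'Lichtenberg', 'Mitte'],
    'Venue': ['IKEA', 'IKEA', 'Comfort Hotel', 'Komische Oper'],
    'Venue Category': ['Furniture / Home Store', 'Furniture / Home Store',
                       'Hotel', 'Opera House'],
})

# Keep the first occurrence of each fully identical row
deduplicated = venues.drop_duplicates().reset_index(drop=True)
print(len(venues), '->', len(deduplicated))
```

Without this step, a borough that is queried twice contributes every venue twice, although the per-neighbourhood mean frequencies computed later are unaffected by exact doubling.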
                                                                                                   

### One Hot Encoding


We need to one-hot encode the venue categories so that they can be used as numeric features for clustering.

In [53]:
berlin_venue_cat = pd.get_dummies(venues_berlin[['Venue Category']], prefix="", prefix_sep="")
berlin_venue_cat

Unnamed: 0,ATM,Adult Boutique,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Austrian Restaurant,Auto Dealership,...,Volleyball Court,Warehouse Store,Waterfall,Waterfront,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1508,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1509,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1510,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1511,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Adding Neighbourhood into the mix.

In [55]:
berlin_venue_cat['Neighborhood'] = venues_berlin['Neighborhood'] 

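# Build a copy with the Neighborhood column first. Note: the copy is stored under a
# different name (berlin_venue_catt), so berlin_venue_cat itself keeps Neighborhood last.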
fixed_columns = [berlin_venue_cat.columns[-1]] + list(berlin_venue_cat.columns[:-1])
berlin_venue_catt = berlin_venue_cat[fixed_columns]

berlin_venue_cat.head()

Unnamed: 0,ATM,Adult Boutique,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Austrian Restaurant,Auto Dealership,...,Warehouse Store,Waterfall,Waterfront,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit,Neighborhood
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mitte
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mitte
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mitte
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,Mitte
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Mitte


### Venue categories mean value


We group the rows by neighbourhood and take the mean of each one-hot column, which gives the relative frequency of each venue category in that neighbourhood.

In [56]:
berlin_grouped = berlin_venue_cat.groupby('Neighborhood').mean().reset_index()
berlin_grouped.head()

Unnamed: 0,Neighborhood,ATM,Adult Boutique,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Austrian Restaurant,...,Volleyball Court,Warehouse Store,Waterfall,Waterfront,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,Adlershof,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ahrensfelde,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ahrensfelde bei,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alt-Hohenschönhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alt-Treptow,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
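To make the meaning of these means concrete, here is a small sketch on invented data (the neighbourhood names 'A' and 'B' and the venues are made up). Each value in the grouped table is the share of a neighbourhood's venues that fall into a category, so every row sums to 1.

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Park', 'Hotel', 'Café'],
})

# One indicator column per category (dtype=int keeps 0/1 instead of booleans)
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot['Neighborhood'] = toy['Neighborhood']

# Mean of 0/1 indicators = fraction of venues in that category
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```

Neighbourhood 'A' has two cafés among four venues, so its 'Café' value is 0.5.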


Let's define a function that returns the most common venue categories for a neighbourhood.

In [57]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

There are far too many venue categories to inspect by hand, so we summarise each neighbourhood by its 15 most common categories.

The code below builds correctly labelled column names (1st, 2nd, 3rd, 4th, ...) and fills them in for each neighbourhood.

In [61]:

num_top_venues = 15
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third entry
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = berlin_grouped['Neighborhood']

for ind in np.arange(berlin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(berlin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Adlershof,Supermarket,Italian Restaurant,Trattoria/Osteria,Greek Restaurant,Pizza Place,Drugstore,Steakhouse,Furniture / Home Store,Gaming Cafe,Fried Chicken Joint,Fish Market,French Restaurant,Fountain,Food Court,Food & Drink Shop
1,Ahrensfelde,Supermarket,Gas Station,Zoo Exhibit,Gay Bar,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store
2,Ahrensfelde bei,Supermarket,Gas Station,Zoo Exhibit,Gay Bar,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store
3,Alt-Hohenschönhausen,Tram Station,Supermarket,Indian Restaurant,Drugstore,Asian Restaurant,Coffee Shop,Greek Restaurant,Big Box Store,Fried Chicken Joint,Food Court,Fountain,French Restaurant,Zoo Exhibit,Flower Shop,Furniture / Home Store
4,Alt-Treptow,Bakery,Italian Restaurant,Nightclub,Mexican Restaurant,Tapas Restaurant,Big Box Store,Garden Center,Bus Stop,Snack Place,Café,Outdoor Sculpture,Seafood Restaurant,Paper / Office Supplies Store,Electronics Store,Drugstore
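Because every neighbourhood is given 15 ranked categories, neighbourhoods with only a handful of venues are padded with categories whose frequency is zero ('Zoo Exhibit' in third place for Ahrensfelde is most likely such a filler). One possible variant, sketched here under that assumption, keeps only categories that actually occur. The function name `top_nonzero_categories` and the sample row are hypothetical.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names with a frequency above zero."""
    freqs = row.iloc[1:].astype(float)      # skip the leading 'Neighborhood' entry
    freqs = freqs[freqs > 0]                # drop categories that never occur
    freqs = freqs.sort_values(ascending=False, kind='mergesort')
    return list(freqs.index[:num_top_venues])

# Invented row in the same layout as berlin_grouped: name first, then frequencies
row = pd.Series({'Neighborhood': 'Example', 'Bakery': 0.25, 'Café': 0.5,
                 'Park': 0.0, 'Zoo Exhibit': 0.0, 'Hotel': 0.25})

top = top_nonzero_categories(row, 15)
print(top)
```

The result can be shorter than 15 entries, so a table built from it would need padding with empty values rather than with unrelated categories.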


### Model Building


#### K Means


Let's cluster the neighbourhoods of Berlin into 5 groups to make them easier to analyze.

We use the K Means clustering technique to do so.

In [63]:
# Set number of clusters
k_num_clusters = 5

berlin_grouped_clustering = berlin_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=k_num_clusters, random_state=0).fit(berlin_grouped_clustering)
kmeans

KMeans(n_clusters=5, random_state=0)
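The choice of 5 clusters is a convenience rather than a derived value. One common way to sanity-check it is to compare the within-cluster sum of squares (scikit-learn's `inertia_` attribute) for several values of k. The sketch below does this on synthetic two-dimensional points, not on the Berlin data; the points and the range of k are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

# Fit k-means for k = 1..6 and record the inertia of each fit
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On real data the same loop could be run over `berlin_grouped_clustering`; a pronounced bend in the inertia curve would suggest a reasonable k, although venue-frequency data often shows no sharp bend.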

### Labelling Clustered Data

In [64]:
kmeans.labels_[0:100]

array([0, 4, 4, 0, 2, 4, 0, 2, 2, 2, 2, 4, 0, 0, 2, 2, 0, 2, 0, 0, 2, 0,
       2, 2, 0, 2, 0, 0, 2, 2, 2, 0, 4, 2, 0, 2, 0, 4, 0, 0, 2, 2, 0, 2,
       0, 2, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 4, 2, 2, 0, 0, 2, 0, 2, 2, 0,
       0, 2, 3, 2, 1, 2, 2, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0, 2, 2, 0,
       2, 2, 2], dtype=int32)
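Before interpreting the clusters, it helps to check how many neighbourhoods fall into each one. The sketch below copies the label array printed above and counts the members per label with `np.bincount`; the variable names are chosen for this example. In this output, two of the five labels (1 and 3) are assigned to a single neighbourhood each, so most of the structure sits in labels 0 and 2.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above
labels = np.array([
    0, 4, 4, 0, 2, 4, 0, 2, 2, 2, 2, 4, 0, 0, 2, 2, 0, 2, 0, 0, 2, 0,
    2, 2, 0, 2, 0, 0, 2, 2, 2, 0, 4, 2, 0, 2, 0, 4, 0, 0, 2, 2, 0, 2,
    0, 2, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 4, 2, 2, 0, 0, 2, 0, 2, 2, 0,
    0, 2, 3, 2, 1, 2, 2, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0, 2, 2, 0,
    2, 2, 2])

# Number of neighbourhoods per cluster label 0..4
counts = np.bincount(labels, minlength=5)
for cluster, size in enumerate(counts):
    print('label {}: {} neighbourhoods'.format(cluster, size))
```

Single-member clusters usually point to outliers, for example neighbourhoods for which only one venue was returned.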

##### Let's add the cluster label column to the table of the top 15 venue categories

In [65]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#### Join venues_berlin with neighborhoods_venues_sorted on Neighborhood to attach the cluster label and top venue categories to each venue before plotting

In [68]:
Berlin_merged = venues_berlin

Berlin_merged = Berlin_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Berlin_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,...,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Mitte,52.517012,13.388822,Dussmann das KulturKaufhaus,52.518312,13.388708,Bookstore,2,Hotel,Italian Restaurant,...,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
1,Mitte,52.517012,13.388822,Dussmann English Bookshop,52.518223,13.389239,Bookstore,2,Hotel,Italian Restaurant,...,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
2,Mitte,52.517012,13.388822,Cookies Cream,52.516569,13.388008,Vegetarian / Vegan Restaurant,2,Hotel,Italian Restaurant,...,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
3,Mitte,52.517012,13.388822,Freundschaft,52.518294,13.390344,Wine Bar,2,Hotel,Italian Restaurant,...,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
4,Mitte,52.517012,13.388822,Komische Oper,52.515968,13.386701,Opera House,2,Hotel,Italian Restaurant,...,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket


Drop the rows without a cluster label so that every plotted venue belongs to a cluster.

In [69]:
Berlin_merged_nonan =Berlin_merged.dropna(subset=['Cluster Labels'])

### Plotting the clusters on the map

In [72]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Berlin_merged_nonan['Venue Latitude'], Berlin_merged_nonan['Venue Longitude'], Berlin_merged_nonan['Neighborhood'], Berlin_merged_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) +1) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to k-1, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)]
        ).add_to(map_clusters)
        
map_clusters

### Let's verify each of our clusters

In [None]:
# cluster 1

In [167]:
cluster0 = Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 0, Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]
cluster0['1st Most Common Venue'].value_counts()

Supermarket             139
Bakery                   31
Tram Station             28
Metro Station            19
Platform                 15
Bus Stop                 12
Gastropub                12
Café                     12
Soccer Field             12
Italian Restaurant       10
Drugstore                10
Track                     9
Climbing Gym              9
Light Rail Station        8
German Restaurant         4
Restaurant                3
Fast Food Restaurant      2
Name: 1st Most Common Venue, dtype: int64
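Note that these counts are per venue row, not per neighbourhood: the merged table has one row for each venue, so a neighbourhood with many venues contributes its '1st Most Common Venue' many times. As a sketch on invented rows (the neighbourhoods 'A', 'B' and 'C' are made up), counting each neighbourhood once could look like this.

```python
import pandas as pd

# Toy merged table: one row per venue, cluster columns repeated for each venue
merged = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Cluster Labels': [0, 0, 0, 0, 0, 0],
    '1st Most Common Venue': ['Supermarket', 'Supermarket', 'Supermarket',
                              'Bakery', 'Supermarket', 'Supermarket'],
})

# Counting venue rows, as in the cell above
per_row = merged['1st Most Common Venue'].value_counts()

# Counting each neighbourhood once
per_neighborhood = (merged.drop_duplicates('Neighborhood')
                          ['1st Most Common Venue'].value_counts())

print(per_row)
print(per_neighborhood)
```

The per-neighbourhood view is usually the more natural summary when describing what kind of area a cluster represents.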

In [73]:
Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 0,
                         Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
321,52.550123,13.341583,Ice Cream Shop,0,Track,Gas Station,Tram Station,Park,Bus Stop,Ice Cream Shop,Big Box Store,Supermarket,Tennis Court,Food Court,Food & Drink Shop,Zoo Exhibit,Fountain,French Restaurant,Flower Shop
322,52.550123,13.338853,Park,0,Track,Gas Station,Tram Station,Park,Bus Stop,Ice Cream Shop,Big Box Store,Supermarket,Tennis Court,Food Court,Food & Drink Shop,Zoo Exhibit,Fountain,French Restaurant,Flower Shop
323,52.550123,13.345990,Big Box Store,0,Track,Gas Station,Tram Station,Park,Bus Stop,Ice Cream Shop,Big Box Store,Supermarket,Tennis Court,Food Court,Food & Drink Shop,Zoo Exhibit,Fountain,French Restaurant,Flower Shop
324,52.550123,13.347554,Supermarket,0,Track,Gas Station,Tram Station,Park,Bus Stop,Ice Cream Shop,Big Box Store,Supermarket,Tennis Court,Food Court,Food & Drink Shop,Zoo Exhibit,Fountain,French Restaurant,Flower Shop
325,52.550123,13.338990,Tennis Court,0,Track,Gas Station,Tram Station,Park,Bus Stop,Ice Cream Shop,Big Box Store,Supermarket,Tennis Court,Food Court,Food & Drink Shop,Zoo Exhibit,Fountain,French Restaurant,Flower Shop
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1508,52.429205,13.224213,Italian Restaurant,0,Italian Restaurant,Bus Stop,Liquor Store,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
1509,52.429205,13.229966,Italian Restaurant,0,Italian Restaurant,Bus Stop,Liquor Store,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
1510,52.429205,13.229574,Bus Stop,0,Italian Restaurant,Bus Stop,Liquor Store,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop
1511,52.429205,13.224909,Bus Stop,0,Italian Restaurant,Bus Stop,Liquor Store,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop


In [None]:
# Cluster 2

In [168]:
cluster1 = Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 1, Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]
cluster1['1st Most Common Venue'].value_counts()

Pharmacy    1
Name: 1st Most Common Venue, dtype: int64

In [76]:
Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 1,
                         Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
1501,52.630412,13.508478,Pharmacy,1,Pharmacy,Zoo Exhibit,Fish Market,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store


In [None]:
#Cluster 3 

In [169]:
cluster2 = Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 2, Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]
cluster2['1st Most Common Venue'].value_counts()

Café                      453
Hotel                     105
Bar                        74
Bakery                     73
Italian Restaurant         69
Coffee Shop                55
Turkish Restaurant         49
Park                       43
Drugstore                  42
German Restaurant          31
Sushi Restaurant           29
Restaurant                 29
Furniture / Home Store     27
Clothing Store             16
Gym / Fitness Center       15
Lounge                     10
Tram Station                7
Nature Preserve             4
Greek Restaurant            4
Zoo Exhibit                 4
Sporting Goods Shop         4
Lake                        3
Name: 1st Most Common Venue, dtype: int64

In [77]:
Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 2,
                         Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,52.517012,13.388708,Bookstore,2,Hotel,Italian Restaurant,Coffee Shop,German Restaurant,Wine Bar,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
1,52.517012,13.389239,Bookstore,2,Hotel,Italian Restaurant,Coffee Shop,German Restaurant,Wine Bar,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
2,52.517012,13.388008,Vegetarian / Vegan Restaurant,2,Hotel,Italian Restaurant,Coffee Shop,German Restaurant,Wine Bar,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
3,52.517012,13.390344,Wine Bar,2,Hotel,Italian Restaurant,Coffee Shop,German Restaurant,Wine Bar,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
4,52.517012,13.386701,Opera House,2,Hotel,Italian Restaurant,Coffee Shop,German Restaurant,Wine Bar,Department Store,Clothing Store,Cosmetics Shop,Bookstore,Boutique,Opera House,Plaza,Concert Hall,Restaurant,Supermarket
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1496,52.453910,13.572462,Tram Station,2,Clothing Store,Gym / Fitness Center,Drugstore,Bakery,Tram Station,Sporting Goods Shop,German Restaurant,Café,Food Court,Health Food Store,Burger Joint,Indian Restaurant,Falafel Restaurant,Discount Store,Garden
1497,52.453910,13.570137,Gym / Fitness Center,2,Clothing Store,Gym / Fitness Center,Drugstore,Bakery,Tram Station,Sporting Goods Shop,German Restaurant,Café,Food Court,Health Food Store,Burger Joint,Indian Restaurant,Falafel Restaurant,Discount Store,Garden
1498,52.453910,13.578013,Health Food Store,2,Clothing Store,Gym / Fitness Center,Drugstore,Bakery,Tram Station,Sporting Goods Shop,German Restaurant,Café,Food Court,Health Food Store,Burger Joint,Indian Restaurant,Falafel Restaurant,Discount Store,Garden
1499,52.453910,13.578351,Food Court,2,Clothing Store,Gym / Fitness Center,Drugstore,Bakery,Tram Station,Sporting Goods Shop,German Restaurant,Café,Food Court,Health Food Store,Burger Joint,Indian Restaurant,Falafel Restaurant,Discount Store,Garden


In [None]:
# Cluster 4

In [171]:
cluster3 = Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 3, Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]
cluster3['1st Most Common Venue'].value_counts()

Miscellaneous Shop    1
Name: 1st Most Common Venue, dtype: int64

In [78]:
Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 3,
                         Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
1306,52.627445,13.386192,Miscellaneous Shop,3,Miscellaneous Shop,Zoo Exhibit,Gay Bar,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Flower Shop,Fishing Store


In [None]:
# Cluster 5

In [172]:
cluster4 = Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 4, Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]
cluster4['1st Most Common Venue'].value_counts()

Supermarket    30
Name: 1st Most Common Venue, dtype: int64

In [79]:
Berlin_merged_nonan.loc[Berlin_merged_nonan['Cluster Labels'] == 4,
                         Berlin_merged_nonan.columns[[1] + list(range(5, Berlin_merged_nonan.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
549,52.572848,13.431064,Fish Market,4,Supermarket,ATM,Tram Station,Chinese Restaurant,Fish Market,Doner Restaurant,Donut Shop,Garden Center,Dessert Shop,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
550,52.572848,13.443919,Supermarket,4,Supermarket,ATM,Tram Station,Chinese Restaurant,Fish Market,Doner Restaurant,Donut Shop,Garden Center,Dessert Shop,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
551,52.572848,13.437083,Tram Station,4,Supermarket,ATM,Tram Station,Chinese Restaurant,Fish Market,Doner Restaurant,Donut Shop,Garden Center,Dessert Shop,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
552,52.572848,13.436969,Chinese Restaurant,4,Supermarket,ATM,Tram Station,Chinese Restaurant,Fish Market,Doner Restaurant,Donut Shop,Garden Center,Dessert Shop,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
553,52.572848,13.44318,ATM,4,Supermarket,ATM,Tram Station,Chinese Restaurant,Fish Market,Doner Restaurant,Donut Shop,Garden Center,Dessert Shop,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
554,52.572848,13.430937,Supermarket,4,Supermarket,ATM,Tram Station,Chinese Restaurant,Fish Market,Doner Restaurant,Donut Shop,Garden Center,Dessert Shop,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
889,52.566331,13.516705,Supermarket,4,Supermarket,Movie Theater,Lounge,Zoo Exhibit,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop
890,52.566331,13.508935,Movie Theater,4,Supermarket,Movie Theater,Lounge,Zoo Exhibit,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop
891,52.566331,13.519378,Supermarket,4,Supermarket,Movie Theater,Lounge,Zoo Exhibit,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop
892,52.566331,13.509772,Lounge,4,Supermarket,Movie Theater,Lounge,Zoo Exhibit,Fishing Store,Gas Station,Garden Center,Garden,Gaming Cafe,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop
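
The cluster cells above repeat the same selection once per label. A minimal sketch of the same summary in a single pass, using a small made-up frame in place of `Berlin_merged_nonan` (the rows below are illustrative and not taken from the real data):

```python
import pandas as pd

# Toy stand-in for Berlin_merged_nonan; values are illustrative only.
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 2, 2, 2],
    "1st Most Common Venue": ["Bus Stop", "Italian Restaurant", "Pharmacy",
                              "Café", "Café", "Hotel"],
})

# Count the top venue category within each cluster in a single groupby.
summary = (
    toy.groupby("Cluster Labels")["1st Most Common Venue"]
       .value_counts()
)
print(summary)
```

The result is indexed by cluster label and venue category, so the dominant category of every cluster can be read from one table.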


## Exploring Munich

#### Districts of Munich

#### Data Collection 

Let's get the postal codes of Munich, Germany.

In [104]:
url = 'https://www.muenchen.de/int/en/living/postal-codes.html'
munich_data_list = pd.read_html(url)
munich_data = munich_data_list[0]
munich_data

Unnamed: 0,District,Postal Code
0,Allach-Untermenzing,"80995, 80997, 80999, 81247, 81249"
1,Altstadt-Lehel,"80331, 80333, 80335, 80336, 80469, 80538, 80539"
2,Au-Haidhausen,"81541, 81543, 81667, 81669, 81671, 81675, 81677"
3,Aubing-Lochhausen-Langwied,"81243, 81245, 81249"
4,Berg am Laim,"81671, 81673, 81735, 81825"
5,Bogenhausen,"81675, 81677, 81679, 81925, 81927, 81929"
6,Feldmoching-Hasenbergl,"80933, 80935, 80995"
7,Hadern,"80689, 81375, 81377"
8,Laim,"80686, 80687, 80689"
9,Ludwigsvorstadt-Isarvorstadt,"80335, 80336, 80337, 80469"


#### Let's get the geographical coordinates of Munich.

In [81]:
address = 'Munich'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_munich = location.latitude
longitude_munich = location.longitude
print('The geographical coordinates of Munich are {}, {}.'.format(latitude_munich, longitude_munich))

The geographical coordinates of Munich are 48.1371079, 11.5753822.


### Data Preprocessing



##### Let's preprocess the data to bring it into an acceptable format

##### First step: Split all places according to their postal codes

In [101]:
munich_data_cleaned = pd.DataFrame(columns=['District', 'Postal Code'])
munich_data_cleaned

Unnamed: 0,District,Postal Code


In [105]:
items = []
for idx, codes in enumerate(munich_data['Postal Code']):
    code_list = codes.split(',')
    district = munich_data['District'][idx]
    for element in code_list:
        element = element.replace(' ', '')
        items.append({'District': district, 'Postal Code': element})

In [106]:
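# DataFrame.append is not available in pandas 2.x, so build the frame directly from the collected rows
munich_data_clean = pd.DataFrame(items, columns=munich_data_cleaned.columns)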
munich_data_clean.head()

Unnamed: 0,District,Postal Code
0,Allach-Untermenzing,80995
1,Allach-Untermenzing,80997
2,Allach-Untermenzing,80999
3,Allach-Untermenzing,81247
4,Allach-Untermenzing,81249
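
The split step above can also be written without an explicit loop. A minimal sketch, assuming a pandas version that provides `DataFrame.explode` and using two rows from the postal code table as toy input (the names `toy` and `exploded` are made up for this example):

```python
import pandas as pd

# Toy stand-in for munich_data, using two rows from the postal code table.
toy = pd.DataFrame({
    "District": ["Hadern", "Laim"],
    "Postal Code": ["80689, 81375, 81377", "80686, 80687, 80689"],
})

# Split each comma-separated string into a list, then give every code its own row.
exploded = (
    toy.assign(**{"Postal Code": toy["Postal Code"].str.split(",")})
       .explode("Postal Code")
       .reset_index(drop=True)
)
# Remove the blanks left over from the ", " separators.
exploded["Postal Code"] = exploded["Postal Code"].str.strip()
print(exploded)
```

Either approach yields one row per district and postal code pair; the loop is easier to step through, while `explode` keeps the transformation in one expression.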


#### Let's now set up the Foursquare API credentials and fetch latitude and longitude values for each postal code with the Nominatim geocoder

In [93]:
CLIENT_ID = 'EX2BNQTN2XIVF1UZY2CX5T13AHTW5BNGQEVHZABLYY3JZPC5'
CLIENT_SECRET = 'KC0VSTIG5QAT1YXSW3GQP5PFDFIEPA3NCVXWUOXTEP43OOQ3' 
VERSION = '20200410' # Foursquare API version
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: EX2BNQTN2XIVF1UZY2CX5T13AHTW5BNGQEVHZABLYY3JZPC5
CLIENT_SECRET:KC0VSTIG5QAT1YXSW3GQP5PFDFIEPA3NCVXWUOXTEP43OOQ3
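
API credentials are usually better kept out of the notebook itself. A minimal sketch, assuming they are supplied through environment variables (the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `mask` are made up for this example):

```python
import os

# Hypothetical variable names; set them in the shell before starting the notebook.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

def mask(secret, visible=4):
    """Show only the first few characters so printed output stays safe to share."""
    if not secret:
        return "<not set>"
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

print("CLIENT_ID: " + mask(CLIENT_ID))
print("CLIENT_SECRET: " + mask(CLIENT_SECRET))
```

Printing a masked value still confirms that the credentials were loaded, without writing them into the saved notebook output.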


##### Create a new dataframe that adds latitude and longitude values to each district and postal code pair

In [143]:

data_munich = pd.DataFrame(columns=['District', 'Postal Code', 'Latitude', 'Longitude'])

# create the geocoder once, then look up every district / postal code pair
geolocator = Nominatim(user_agent="ny_explorer")
items = []
for district, code in zip(munich_data_clean['District'], munich_data_clean['Postal Code']):
    address = district + ', ' + code  # address format: "<district>, <postal code>"
    location = geolocator.geocode(address)
    items.append({'District': district,
                  'Postal Code': code,
                  'Latitude': location.latitude,
                  'Longitude': location.longitude})

In [144]:
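# DataFrame.append is not available in pandas 2.x, so build the frame directly from the collected rows
data_munich = pd.DataFrame(items, columns=data_munich.columns)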
data_munich.head()

Unnamed: 0,District,Postal Code,Latitude,Longitude
0,Allach-Untermenzing,80995,48.195157,11.462973
1,Allach-Untermenzing,80997,48.195157,11.462973
2,Allach-Untermenzing,80999,48.195157,11.462973
3,Allach-Untermenzing,81247,48.195157,11.462973
4,Allach-Untermenzing,81249,48.195157,11.462973
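
A geocoder can return no result for an address, in which case reading `location.latitude` fails. A minimal sketch of a loop that records such addresses instead of stopping; `geocode_all`, `Location` and `fake_geocode` are hypothetical names introduced for illustration, and in the notebook `geolocator.geocode` would be passed in place of the stub:

```python
from collections import namedtuple

# Minimal stand-in for the object returned by a geocoder.
Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_all(pairs, geocode):
    """Geocode (district, postal code) pairs, skipping addresses with no result."""
    items, missing = [], []
    for district, code in pairs:
        address = district + ", " + code
        location = geocode(address)
        if location is None:          # no match for this address
            missing.append(address)
            continue
        items.append({"District": district, "Postal Code": code,
                      "Latitude": location.latitude,
                      "Longitude": location.longitude})
    return items, missing

# Stub geocoder for demonstration; the coordinates are illustrative.
def fake_geocode(address):
    known = {"Laim, 80686": Location(48.14, 11.50)}
    return known.get(address)

items, missing = geocode_all([("Laim", "80686"), ("Laim", "00000")], fake_geocode)
print(items)
print(missing)
```

Note also that, in the output above, all postal codes of a district received the same coordinates, so the lookup appears to resolve to the district rather than to the individual postal code area.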


### Visualize the Map of Munich

Using the folium package, we visualize a map of Munich and its districts.

In [145]:
# create map of munich using latitude and longitude values
map_munich = folium.Map(location=[data_munich["Latitude"].iloc[0], data_munich["Longitude"].iloc[0]], zoom_start=11)

# add markers to map
for lat, lng, district in zip(data_munich['Latitude'], data_munich['Longitude'], data_munich['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

### Venues in Munich


Let's now explore all districts in Munich by fetching the venues near each district with the help of the Foursquare API.

In [146]:
# function for getting all venues of munich
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
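
The list comprehension inside `getNearbyVenues` assumes that every returned venue has at least one category. A minimal sketch of the same flattening step with a guard for an empty category list; `flatten_items` is a hypothetical helper, and the sample data is hand-written to mirror only the fields the function above reads, not a real API response:

```python
def flatten_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Guard against venues that come back without any category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written sample shaped like the fields getNearbyVenues reads.
sample_items = [
    {"venue": {"name": "Lidl",
               "location": {"lat": 48.194428, "lng": 11.465612},
               "categories": [{"name": "Supermarket"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 48.19, "lng": 11.46},
               "categories": []}},
]

rows = flatten_items("Allach-Untermenzing", 48.195157, 11.462973, sample_items)
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to check the flattening logic without calling the API.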

Getting the Venues in Munich

In [147]:
munich_venues = getNearbyVenues(names=data_munich['District'],
                                   latitudes=data_munich['Latitude'],
                                   longitudes=data_munich['Longitude']
                                  )

Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Allach-Untermenzing
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Altstadt-Lehel
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Au-Haidhausen
Aubing-Lochhausen-Langwied
Aubing-Lochhausen-Langwied
Aubing-Lochhausen-Langwied
Berg am Laim
Berg am Laim
Berg am Laim
Berg am Laim
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Bogenhausen
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Feldmoching-Hasenbergl
Hadern
Hadern
Hadern
Laim
Laim
Laim
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Ludwigsvorstadt-Isarvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Maxvorstadt
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Milbertshofen-Am Hart
Moosach
Moosach
Moosach
Moosach
Moosach
Neuhausen-Nymphenburg
Neuhausen-Nym

In [148]:
# let's get the shape of the new dataframe
munich_venues.shape

(3376, 7)

We have collected 3376 venue records.

Sampling our data

In [149]:
# Let's look at the head of the new dataframe
munich_venues.head()

Unnamed: 0,District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Allach-Untermenzing,48.195157,11.462973,Bäckerei Schuhmair,48.197175,11.459016,Bakery
1,Allach-Untermenzing,48.195157,11.462973,Sport Bittl,48.191447,11.466553,Sporting Goods Shop
2,Allach-Untermenzing,48.195157,11.462973,dm-drogerie markt,48.194118,11.46564,Drugstore
3,Allach-Untermenzing,48.195157,11.462973,Sicilia,48.193331,11.459387,Italian Restaurant
4,Allach-Untermenzing,48.195157,11.462973,Lidl,48.194428,11.465612,Supermarket


#####  Let's find out how many unique categories can be curated from all the returned venues



In [150]:
print('There are {} unique categories.'.format(len(munich_venues['Venue Category'].unique())))

There are 168 unique categories.


#### Grouping by District
We now check how many venues were returned for each district before further processing.

In [121]:
# Let's check how many venues were returned for each district

In [151]:
munich_venues.groupby('District').count()

District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Allach-Untermenzing,40,40,40,40,40,40
Altstadt-Lehel,700,700,700,700,700,700
Au-Haidhausen,252,252,252,252,252,252
Berg am Laim,23,23,23,23,23,23
Bogenhausen,72,72,72,72,72,72
Feldmoching-Hasenbergl,3,3,3,3,3,3
Hadern,36,36,36,36,36,36
Laim,63,63,63,63,63,63
Ludwigsvorstadt-Isarvorstadt,376,376,376,376,376,376
Maxvorstadt,396,396,396,396,396,396


### One Hot Encoding
We need to one-hot encode the venue categories so that they can be used as numeric features for clustering.

### Analyze each District

Now let's analyze each district to get an idea of the venues it contains.

In [152]:
# one-hot encode the venue categories
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")
# add District column to dataframe
munich_onehot.insert(0, 'District', data_munich['District'])
munich_onehot.head()

Unnamed: 0,District,Afghan Restaurant,American Restaurant,Arcade,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beach Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Bookstore,Boutique,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Café,Candy Store,Chinese Restaurant,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Costume Shop,Cultural Center,Cupcake Shop,Currywurst Joint,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Doner Restaurant,Drugstore,Electronics Store,English Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Food,Food & Drink Shop,Food Court,Fountain,French Restaurant,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hawaiian Restaurant,Hill,Historic Site,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Lake,Laundry Service,Light Rail Station,Liquor Store,Lounge,Manti Place,Market,Martial Arts School,Men's Store,Metro Station,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater,Museum,Music Venue,Nightclub,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Park,Pastry Shop,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,River,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Yoga Studio
0,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Allach-Untermenzing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
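
One detail worth checking in this step: `DataFrame.insert` aligns a Series on index labels, not on position. `munich_onehot` has one row per venue, while `data_munich` has one row per postal code, so taking the `District` column from `munich_venues` may be the safer choice. A minimal sketch of the difference on toy data (all frames and values below are made up for illustration):

```python
import pandas as pd

# One row per venue (toy data), like munich_venues.
venues = pd.DataFrame({
    "District": ["A", "A", "A", "B"],
    "Venue Category": ["Bakery", "Café", "Bakery", "Hotel"],
})
# One row per postal code (toy data), like data_munich.
per_code = pd.DataFrame({"District": ["A", "B"]})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# insert() aligns a Series on index labels, not on position.
misaligned = onehot.copy()
misaligned.insert(0, "District", per_code["District"])

aligned = onehot.copy()
aligned.insert(0, "District", venues["District"])

print(misaligned["District"].tolist())
print(aligned["District"].tolist())
```

In the misaligned frame, only the rows whose index label also exists in the shorter table receive a district, and the label they receive need not match the venue in that row; the remaining rows are left empty and drop out of any later `groupby('District')`.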


### Next, let's group rows by district and take the mean of the frequency of occurrence of each category

In [153]:
munich_grouped = munich_onehot.groupby('District').mean().reset_index()
munich_grouped.head(10)

Unnamed: 0,District,Afghan Restaurant,American Restaurant,Arcade,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beach Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Bookstore,Boutique,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Café,Candy Store,Chinese Restaurant,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Costume Shop,Cultural Center,Cupcake Shop,Currywurst Joint,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Doner Restaurant,Drugstore,Electronics Store,English Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Food,Food & Drink Shop,Food Court,Fountain,French Restaurant,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hawaiian Restaurant,Hill,Historic Site,Hookah Bar,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kebab Restaurant,Lake,Laundry Service,Light Rail Station,Liquor Store,Lounge,Manti Place,Market,Martial Arts School,Men's Store,Metro Station,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Motel,Movie Theater,Museum,Music Venue,Nightclub,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Park,Pastry Shop,Pet Store,Pharmacy,Photography Studio,Pizza Place,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Restaurant,River,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shipping Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Yoga Studio
0,Allach-Untermenzing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altstadt-Lehel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Au-Haidhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Aubing-Lochhausen-Langwied,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Berg am Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bogenhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Feldmoching-Hasenbergl,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hadern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Laim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Ludwigsvorstadt-Isarvorstadt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
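
Taking the mean of one-hot columns within a group gives the share of that group's rows that fall in each category, so every row of the grouped table sums to 1. A minimal sketch on toy data (the districts and venues below are made up for illustration):

```python
import pandas as pd

venues = pd.DataFrame({
    "District": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bakery", "Bakery", "Café", "Hotel", "Hotel", "Hotel"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot.insert(0, "District", venues["District"])

# The mean of a 0/1 column is the share of rows where it equals 1.
grouped = onehot.groupby("District").mean().reset_index()
print(grouped)
```

Here district A has two bakeries among four venues, so its Bakery frequency is 0.5; district B contains only hotels, so its Hotel frequency is 1.0.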


In [154]:
munich_grouped.shape

(25, 169)

Let's write a function that returns the most common venue categories for a district.

In [155]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
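
To see what the function returns, here is a small usage sketch. The function body is repeated so the example runs on its own, and the row is toy data with illustrative frequencies:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: drop the District label, sort, take the top names.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Toy row; the first entry is the district label, the rest are category frequencies.
row = pd.Series({"District": "Toyville", "Bakery": 0.5, "Café": 0.3,
                 "Hotel": 0.15, "Park": 0.05})
top = return_most_common_venues(row, 2)
print(list(top))
```

Categories with equal frequency, including the many zeros in a sparse row, are ordered by the sort's tie-breaking, so the tail of a top-N list should not be over-interpreted.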

There are far too many venue categories to compare directly, so we take the top 15 to characterize the districts.

First, let's print the most frequent venue categories for each district.

In [156]:
num_top_venues = 15

for hood in munich_grouped['District']:
    print("----"+hood+"----")
    temp = munich_grouped[munich_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Allach-Untermenzing----
                  venue  freq
0                Bakery   0.2
1    Italian Restaurant   0.2
2           Supermarket   0.2
3             Drugstore   0.2
4   Sporting Goods Shop   0.2
5     Outdoor Sculpture   0.0
6         Movie Theater   0.0
7                Museum   0.0
8           Music Venue   0.0
9             Nightclub   0.0
10          Opera House   0.0
11         Optical Shop   0.0
12      Organic Grocery   0.0
13    Afghan Restaurant   0.0
14                 Park   0.0


----Altstadt-Lehel----
                  venue  freq
0             Drugstore  0.29
1   Sporting Goods Shop  0.14
2    Italian Restaurant  0.14
3           Supermarket  0.14
4       Automotive Shop  0.14
5                Bakery  0.14
6     Outdoor Sculpture  0.00
7         Movie Theater  0.00
8                Museum  0.00
9           Music Venue  0.00
10            Nightclub  0.00
11          Opera House  0.00
12         Optical Shop  0.00
13      Organic Grocery  0.00
14    Afghan Rest

In [157]:

num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
district_venues_sorted = pd.DataFrame(columns=columns)
district_venues_sorted['District'] = munich_grouped['District']

for ind in np.arange(munich_grouped.shape[0]):
    district_venues_sorted.iloc[ind, 1:] = return_most_common_venues(munich_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,Sporting Goods Shop,Italian Restaurant,Supermarket,Bakery,Drugstore,Deli / Bodega,Currywurst Joint,Food,Fish Market,Fast Food Restaurant
1,Altstadt-Lehel,Drugstore,Sporting Goods Shop,Italian Restaurant,Supermarket,Automotive Shop,Bakery,Department Store,Dim Sum Restaurant,Food & Drink Shop,Food
2,Au-Haidhausen,Supermarket,Drugstore,Sporting Goods Shop,Automotive Shop,Bakery,English Restaurant,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant
3,Aubing-Lochhausen-Langwied,Drugstore,Supermarket,Italian Restaurant,Electronics Store,Food,Fish Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
4,Berg am Laim,Sporting Goods Shop,Bakery,Supermarket,Automotive Shop,Food & Drink Shop,Food,Fish Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant
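
The column labels above come from a three-item suffix list with a fallback to `th`. That is correct for ranks 1 to 20 but would label rank 21 as `21th`. If more columns are ever needed, a small helper can generate the suffixes instead. The function name `ordinal` is hypothetical and not part of the original notebook; this is only a sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # numbers ending in 11, 12 or 13 always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["District"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```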


### Model Building

#### K Means

Let's group the districts of Munich into 5 clusters to make them easier to analyze.

We use the K Means clustering technique to do so.

#### Clustering Neighborhoods

Now that we have an overview of the data and have done some initial exploration, it's time to cluster the neighborhoods to get an idea of the types of neighborhoods and of which districts are similar to one another.

In [158]:
num_clusters = 5

# keep only the numeric venue-frequency columns as features
X = munich_grouped.drop(columns='District')

kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(X)
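
The choice of five clusters is a judgment call. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below is not part of the original analysis: it uses synthetic data (`X_toy`, an assumption) in place of the Munich frequency matrix, and the range of k values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic stand-in for the venue-frequency matrix:
# three tight, well-separated groups of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X_toy = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# fit K-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_toy)
    scores[k] = silhouette_score(X_toy, labels)

# the k with the highest silhouette score is a reasonable candidate
best_k = max(scores, key=scores.get)
print(best_k)
```

With only 25 districts in `munich_grouped`, such scores are likely to be noisy, so they would serve as a rough guide rather than a definitive answer.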

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [159]:
# add clustering labels
district_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = data_munich

# merge labels and data about venues to district data and latitude plus longitude data to have all in one dataframe
munich_merged = munich_merged.join(district_venues_sorted.set_index('District'), on='District')

munich_merged.head()

Unnamed: 0,District,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allach-Untermenzing,80995,48.195157,11.462973,1,Sporting Goods Shop,Italian Restaurant,Supermarket,Bakery,Drugstore,Deli / Bodega,Currywurst Joint,Food,Fish Market,Fast Food Restaurant
1,Allach-Untermenzing,80997,48.195157,11.462973,1,Sporting Goods Shop,Italian Restaurant,Supermarket,Bakery,Drugstore,Deli / Bodega,Currywurst Joint,Food,Fish Market,Fast Food Restaurant
2,Allach-Untermenzing,80999,48.195157,11.462973,1,Sporting Goods Shop,Italian Restaurant,Supermarket,Bakery,Drugstore,Deli / Bodega,Currywurst Joint,Food,Fish Market,Fast Food Restaurant
3,Allach-Untermenzing,81247,48.195157,11.462973,1,Sporting Goods Shop,Italian Restaurant,Supermarket,Bakery,Drugstore,Deli / Bodega,Currywurst Joint,Food,Fish Market,Fast Food Restaurant
4,Allach-Untermenzing,81249,48.195157,11.462973,1,Sporting Goods Shop,Italian Restaurant,Supermarket,Bakery,Drugstore,Deli / Bodega,Currywurst Joint,Food,Fish Market,Fast Food Restaurant
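
Because the join is keyed on `District`, every postal-code row inherits the label of its district, which is why the rows above repeat. A minimal sketch of this behaviour on invented data follows; the frames `postal` and `labels` are hypothetical. It also shows what would happen if a district had no venues and therefore no label, which is an assumption and not something observed in the output above.

```python
import pandas as pd

# hypothetical postal-code table: one row per postal code
postal = pd.DataFrame({
    "District": ["A", "A", "B", "C"],
    "Postal Code": [1, 2, 3, 4],
})
# hypothetical per-district cluster labels; district "C" has no label
labels = pd.DataFrame({"District": ["A", "B"], "Cluster Labels": [1, 0]})

merged = postal.join(labels.set_index("District"), on="District")
# unmatched districts get NaN, which turns the label column into floats
n_missing = int(merged["Cluster Labels"].isna().sum())

# drop unlabeled rows and restore integer labels before using them as list indices
merged_clean = merged.dropna(subset=["Cluster Labels"]).copy()
merged_clean["Cluster Labels"] = merged_clean["Cluster Labels"].astype(int)
print(merged_clean)
```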


### Visualizing the clustered Districts


Finally, let's visualize the resulting clusters on a map.

In [160]:
# create map
map_clusters = folium.Map(location=[latitude_munich, longitude_munich], zoom_start=11)

# set color scheme for the clusters
indian_red = '#CD5C5C'
blue = '#2980B9'
purple = '#5B2C6F'
gold = '#F1C40F'
green = '#239B56'
# cluster labels index into this list via rainbow[cluster-1],
# so label 0 wraps around to the last entry (green)
rainbow = [indian_red, blue, purple, gold, green]
# add markers to the map
for lat, lon, poi, cluster in zip(munich_merged['Latitude'], munich_merged['Longitude'], munich_merged['District'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
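
The map code picks colors with `rainbow[cluster-1]`, so label 0 uses index -1, which is the last entry in the list. The short sketch below spells out the resulting label-to-color mapping; the `names` list and the `color_of` dictionary are added here purely for illustration.

```python
rainbow = ["#CD5C5C", "#2980B9", "#5B2C6F", "#F1C40F", "#239B56"]
names = ["indian red", "blue", "purple", "gold", "green"]

# same indexing as the map code: label 0 wraps around to the last color
color_of = {cluster: names[cluster - 1] for cluster in range(5)}
hex_of = {cluster: rainbow[cluster - 1] for cluster in range(5)}
print(color_of)
```

This is why cluster zero is drawn in green and cluster four in gold.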

### Examining the Clusters

As a last step, we examine each cluster by its most frequent venues and name the clusters accordingly.

In [162]:
# first: let's examine the green cluster (number zero)
cluster0 = munich_merged.loc[munich_merged['Cluster Labels'] == 0, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster0['1st Most Common Venue'].value_counts()

Sporting Goods Shop    4
Plaza                  3
Name: 1st Most Common Venue, dtype: int64

In [163]:
# next: let's examine the indian red cluster (number one)
cluster1 = munich_merged.loc[munich_merged['Cluster Labels'] == 1, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster1['1st Most Common Venue'].value_counts()

Drugstore              19
Supermarket             7
Sporting Goods Shop     5
Bakery                  3
Name: 1st Most Common Venue, dtype: int64

In [164]:
# next: let's examine the blue cluster (number two)
cluster2 = munich_merged.loc[munich_merged['Cluster Labels'] == 2, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster2['1st Most Common Venue'].value_counts()

Hotel                19
Plaza                17
Coffee Shop          12
German Restaurant     7
Electronics Store     5
Boutique              4
Fountain              4
Café                  4
Name: 1st Most Common Venue, dtype: int64

In [165]:
# now let's examine the purple cluster (number three)
cluster3 = munich_merged.loc[munich_merged['Cluster Labels'] == 3, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster3['1st Most Common Venue'].value_counts()

Afghan Restaurant    5
Hotel                4
Plaza                3
Name: 1st Most Common Venue, dtype: int64

In [166]:
# now let's examine the gold cluster (number four)
cluster4 = munich_merged.loc[munich_merged['Cluster Labels'] == 4, munich_merged.columns[[1] + list(range(5, munich_merged.shape[1]))]]
cluster4['1st Most Common Venue'].value_counts()

Restaurant    2
Name: 1st Most Common Venue, dtype: int64

## Results and Discussion

The neighbourhoods of Berlin are very multicultural. The city reinforces this impression with its many supermarkets, restaurants, bars, coffee shops, ice cream shops, drugstores and clothing stores. It also offers plenty of shopping options, including fish markets, garden centers, gaming cafés, dessert shops, bookstores and sporting goods shops. The main modes of transport seem to be trams and buses. For leisure, the neighbourhoods have plenty of parks, plus zoos, opera houses and historic sites. After further exploration, each cluster can be named after its most common venue. In our case, the most common venues in cluster number zero are Supermarket and Bakery. This cluster has the greatest number of tram stations, metro stations and bus stops, and it also contains cafés, soccer fields, climbing gyms and German restaurants. It is especially well suited to families with kids. Cluster number three contains only one shop. In the last cluster the most common venue is Supermarket.


Overall, the city of Berlin offers a multicultural, diverse and certainly an entertaining experience.

Munich is only about one third the size of Berlin in terms of city area. It has a wide variety of cuisines and eateries, including Italian, Asian and Chinese restaurants and Currywurst joints. There are many hangout spots, including numerous restaurants and bars, and Munich also has many sporting goods shops. As one can see, the blue cluster is the most common cluster in Munich, so many of the city's districts appear to be similar to one another. In our case, cluster number zero is dominated by sporting goods shops. In cluster number one, drugstores and supermarkets are the most common venues, which makes it the Drugstore+Supermarket cluster. Cluster number two has many hotels and plazas; it is my favorite, since it also has a lot of coffee shops. In cluster number three the most common venues are Afghan restaurants. The last cluster is the smallest, and its most common venue is Restaurant.

## Conclusion

The purpose of this project was to explore the cities of Berlin and Munich and see how attractive they are to potential migrants and tourists. We explored both cities based on their postal codes, extracted the common venues present in each neighbourhood, and finally clustered similar neighbourhoods together.

The neighbourhoods of Berlin and Munich have broadly similar venues. Each neighbourhood in both cities offers a wide variety of experiences and is unique in its own way. The two cities differ in some venues and facilities, but not to a large extent. The cultural diversity is quite evident, which also gives a sense of inclusion. Overall, it's up to the stakeholders to decide which experience they would prefer and which would suit their tastes better.

### References

1.	Munich Neighbourhood clustering using the K-means Algorithm
https://medium.com/@brus.patrick63/munich-neighborhood-clustering-using-k-means-cde98a6e3199?sk=d55c0f9773160e217f3da572bb594c60 

2.	IBM Capstone Project — The Battle of Neighbourhoods in Berlin: Restaurants 

https://medium.com/@lyan.fu.fly/ibm-capstone-project-the-battle-of-neighborhoods-in-berlin-restaurants-dd326f1bfacb


3.	A Tale of Two Cities: Clustering Neighbourhoods of London and Paris using Machine Learning
https://medium.com/analytics-vidhya/a-tale-of-two-cities-clustering-neighborhoods-of-london-and-paris-5328f69cd8b6

4.	Foursquare API