# A new court in town

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

A small Spanish investment fund has spotted a growing interest in **racquet sports** among Europeans. One such sport, **padel**, currently most popular in Spain and Hispanic American countries, is now spreading rapidly across Europe and other continents. Padel is easier to play than tennis, so it is easy to attract and engage potential players. However, the sport requires a **special court**, and it is still hard to find one in most European cities.

  
<center><img src="https://padeladdict.com/wp-content/uploads/2020/01/mejores-puntos-del-2019-world-padel-tour.jpg" alt="Padel" width="500"/></center>

The managers of the fund have analysed different European cities and noticed that padel is becoming particularly popular in **Munich, Germany**, where the scarce infrastructure available for this sport seems to be always booked in spite of high prices. However, the managers have never played the sport and have never been to Munich either. As a result, they do not know what **kind of neighborhood would suit a padel court or where in Munich they would find such a neighborhood**. They suspect that good transport connections are essential. But what would padel players rather have nearby? Sports stores or other sports facilities? Supermarkets or restaurants? And once that is settled, which neighborhoods of Munich would be the best match for the padel court?

One of the requirements of the investment fund is for the court to be profitable in the short term, so **expensive central areas should be avoided** at all costs. They also want to minimize risk and save costs by locating their new facility in **a neighborhood similar to those where an equivalent sports facility has proven successful**, while at the same time **keeping clear of competitors in the area**. This report aims to use data science methods to provide the investment fund with a handful of candidate neighborhoods.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* Location of Munich neighborhoods.
* Public venues in each neighborhood.
* Venue categories near sports facilities.

The city of Munich is divided into 25 boroughs ("Bezirke"), which in turn can be further broken down into 108 neighborhoods ("Bezirksteile"). The neighborhood is in this case the unit selected for exploration.

The following data sources will be needed to extract or generate the required information:
* An **online public list** with the names of the different neighborhoods.
* The coordinates of the center of each neighborhood will be obtained from its name using **GeoPy's Nominatim geocoder**.
* The number of public venues in every neighborhood, together with their type and location, will be obtained using the **Foursquare API**.

### Neighborhoods

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# uncomment next line if you haven't done this installation already
#!conda install -c conda-forge geocoder --yes
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# uncomment next line if you haven't done this installation already
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

For this report, we will need to find a latitude and a longitude for each of the neighborhoods. Unlike in other cities, in Munich there is no one-to-one relation between neighborhoods and postal codes, so we will rely only on the **names of the neighborhoods** to extract the required data by means of **geocoding**. The names can be found in a CSV file in the **open data portal of the city of Munich**:

In [2]:
df = pd.read_csv("https://www.opengov-muenchen.de/dataset/15aa8720-cd0c-45db-ba37-d90924a9dc5c/resource/ffcdc8ec-1fc4-4186-9fbf-9114448e80a7/download/bevoelkerunghws201712stadtbezirksteile.csv")
df = df.rename(columns={"bezirksteil_bezeichnung": "Bezirksteil"})
df = df[["Bezirksteil"]]
df["Bezirksteil"] = df["Bezirksteil"].str[5:]
df = df.drop_duplicates()
print(df.shape)
df.head()

(108, 1)


Unnamed: 0,Bezirksteil
0,Graggenau
1,Angerviertel
2,Hackenviertel
3,Kreuzviertel
4,Lehel


The required data was extracted from the CSV file after dropping all unnecessary information and possible duplicates.
Next, we will use the dataframe and the Nominatim geolocator to **obtain the latitude and longitude** of each neighborhood. In a few cases the geolocator is not able to find a neighborhood by name alone. Only two of the 108 neighborhoods are affected; dropping them is not ideal, but given the lack of alternative data sources it is not expected to have a major impact on the analysis.

In [3]:
# create the geolocator once and reuse it for every lookup
locator = Nominatim(user_agent="myGeocoder")
for idx,bezirksteil in zip(df.index,df["Bezirksteil"]):
    location = locator.geocode("{}, Munich, Germany".format(bezirksteil))
    if location is None:
        # neighborhood not found: mark it with 0 so it can be dropped below
        df.loc[idx,"Latitude"] = 0
        df.loc[idx,"Longitude"] = 0
    else:
        df.loc[idx,"Latitude"] = location.latitude
        df.loc[idx,"Longitude"] = location.longitude
df = df.drop(df[df.Latitude == 0].index)
df = df.reset_index(drop=True)
print(df.shape)
df.head()

(106, 3)


Unnamed: 0,Bezirksteil,Latitude,Longitude
0,Graggenau,48.139563,11.580182
1,Angerviertel,48.13367,11.571569
2,Hackenviertel,48.135731,11.569955
3,Kreuzviertel,48.139698,11.573209
4,Lehel,48.139656,11.587921
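
The lookup-and-drop logic can also be sketched independently of the geocoding service. The following is a minimal illustration, not the notebook's code: `geocode_names`, `Point` and the stub `fake_geocode` are hypothetical names, and the function accepts any callable that returns either `None` or an object with `latitude` and `longitude` attributes, which is how `Nominatim.geocode` reports a failed or successful lookup.

```python
from collections import namedtuple

# Stand-in for the location object returned by a geocoder (assumption:
# only the latitude and longitude attributes are needed).
Point = namedtuple("Point", ["latitude", "longitude"])


def geocode_names(names, geocode, suffix=", Munich, Germany"):
    """Split names into (found, missing) using the supplied geocode callable."""
    found, missing = {}, []
    for name in names:
        location = geocode(name + suffix)
        if location is None:
            missing.append(name)  # report the failure instead of hiding it
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing


# Hypothetical stub so the sketch runs offline; the coordinates are the ones
# shown in the table above.
KNOWN = {
    "Graggenau, Munich, Germany": Point(48.139563, 11.580182),
    "Lehel, Munich, Germany": Point(48.139656, 11.587921),
}


def fake_geocode(query):
    return KNOWN.get(query)


found, missing = geocode_names(["Graggenau", "Lehel", "Nowhere"], fake_geocode)
print(found)
print(missing)
```

With the real service, `fake_geocode` would be replaced by `locator.geocode`, possibly wrapped in geopy's `RateLimiter` (from `geopy.extra.rate_limiter`) to space out the requests. Returning the list of missing names makes it easy to see which neighborhoods were dropped.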


In the following section of code we will create a **map from our neighborhoods dataframe**. The map shows a high density of small neighborhoods in the center and a few much larger ones on the outskirts of the city, already hinting at the variable structure and composition of Munich's neighborhoods.

In [4]:
# create map of Munich using latitude and longitude values
locat = locator.geocode("Munich, Germany")
map_munich = folium.Map(location=[locat.latitude, locat.longitude], zoom_start=11.4)

# add markers to map
for lat, lng, bezirksteil in zip(df['Latitude'], df['Longitude'], df['Bezirksteil']):
    label = '{}'.format(bezirksteil)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_munich)  
    
map_munich

### Foursquare API

Next, we are going to start utilizing the **Foursquare API** to explore the neighborhoods and segment them.

In [5]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder, do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
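
Hard-coding the ID and secret in a notebook makes it easy to leak them when the notebook is shared. One possible alternative, sketched here as an assumption rather than as part of the original workflow, is to read them from environment variables and to mask the secret whenever it is printed. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    hidden = max(len(secret) - visible, 0)
    return secret[:visible] + "*" * hidden


# Demonstration with a fake environment, so nothing real is exposed
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-1234",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-5678",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + client_id)
print("CLIENT_SECRET: " + mask(client_secret))
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the two literal assignments.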


First, we are going to create a **function to systematically explore all neighborhoods in Munich by the type and number of venues** within a radius of 500 m.

In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Collect the Foursquare venues found within `radius` metres of each neighborhood center."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the query parameters of the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Bezirksteil',
                  'Bezirksteil Latitude',
                  'Bezirksteil Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

In [7]:
munich_venues = getNearbyVenues(names=df['Bezirksteil'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )
print(munich_venues.shape)
print('There are {} unique categories.'.format(len(munich_venues['Venue Category'].unique())))
munich_venues.head()

(2925, 7)
There are 279 unique categories.


Unnamed: 0,Bezirksteil,Bezirksteil Latitude,Bezirksteil Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Graggenau,48.139563,11.580182,Bayerische Staatsoper,48.139639,11.578933,Opera House
1,Graggenau,48.139563,11.580182,Nationaltheater München,48.139599,11.579207,Opera House
2,Graggenau,48.139563,11.580182,Hotel Vier Jahreszeiten Kempinski,48.138918,11.581775,Hotel
3,Graggenau,48.139563,11.580182,Pusser's,48.1385,11.580426,Cocktail Bar
4,Graggenau,48.139563,11.580182,Louis Vuitton München Residenzpost,48.139248,11.577647,Boutique


The function has found a total of 2925 venues for the 106 neighborhoods, with 279 unique venue categories.
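
As a sanity check on the 500 m radius, the distance between a neighborhood center and a returned venue can be computed directly from the coordinates. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; `haversine_m` is a hypothetical helper, not something the notebook defines, and the two points are the Graggenau center and the first venue from the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Graggenau center and Bayerische Staatsoper, as listed in the table above
center = (48.139563, 11.580182)
opera = (48.139639, 11.578933)
distance = haversine_m(*center, *opera)
print(round(distance), distance <= 500)
```

For this pair the distance comes out at a bit over 90 m, well inside the search radius. Applying the same function to every row would confirm whether Foursquare respected the radius for all venues.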

## Methodology <a name="methodology"></a>

In this project we will try to identify the neighborhoods in Munich that are most suitable for the construction of a new padel court. To that end, we will first find neighborhoods with sports facilities and study their characteristics. Our new padel court should perform well in a similar environment.

In the first step we collected the required **data: the location of every neighborhood in Munich**. We also **identified all public venues within a 500 m radius of each neighborhood's center** (according to Foursquare).

The second step in our analysis will be **detecting downtown neighborhoods**. These are deemed too expensive by the managers of the fund, and their typical activity and structure are usually not compatible with a racquet sports court because of the space it requires.

Thirdly, we will try to **find neighborhoods with sports facilities** that we can use as benchmarks for our analysis.

In the fourth and final step we will use **k-means clustering** to identify neighborhoods similar to those benchmarks. These should hint at potentially successful locations and serve as a starting point for the managers' final 'street level' exploration and search for the optimal site.

## Analysis <a name="analysis"></a>

In order to compare neighborhoods numerically, we need to **one-hot encode** the venue categories:

In [8]:
# one hot encoding
munich_onehot = pd.get_dummies(munich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
munich_onehot["Bezirksteil"] = munich_venues['Bezirksteil'] 

# move neighborhood column to the first column
fixed_columns = [munich_onehot.columns[-1]] + list(munich_onehot.columns[:-1])
munich_onehot = munich_onehot[fixed_columns]

print(munich_onehot.shape)
munich_onehot.head()

(2925, 280)


Unnamed: 0,Bezirksteil,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bathing Area,Bavarian Restaurant,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bistro,Board Shop,Boarding House,Bookstore,Bosnian Restaurant,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Camera Store,Campground,Candy Store,Casino,Castle,Caucasian Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cultural Center,Cupcake Shop,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Event Space,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Fish Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Spring,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Laundromat,Laundry Service,Lebanese Restaurant,Light Rail Station,Liquor Store,Lottery Retailer,Lounge,Manti Place,Market,Martial Arts School,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Modern Greek Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Neighborhood,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Post Office,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rest Area,Restaurant,River,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Art,Strip Club,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Taverna,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Tiki Bar,Toy / Game Store,Track,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Graggenau,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Graggenau,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Graggenau,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Graggenau,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Graggenau,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
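
To make the encoding concrete, here is a minimal plain-Python sketch of what one-hot encoding does to a categorical column. It is an illustration only and does not replace `pd.get_dummies`; the helper `one_hot` is a hypothetical name, and the three venues are taken from the Graggenau rows listed earlier.

```python
def one_hot(rows, key):
    """One-hot encode the `key` field of a list of dict rows."""
    categories = sorted({row[key] for row in rows})
    encoded = []
    for row in rows:
        # exactly one indicator per row is 1, all the others are 0
        indicators = {category: int(row[key] == category) for category in categories}
        encoded.append(indicators)
    return categories, encoded


venues = [
    {"Venue": "Bayerische Staatsoper", "Venue Category": "Opera House"},
    {"Venue": "Hotel Vier Jahreszeiten Kempinski", "Venue Category": "Hotel"},
    {"Venue": "Pusser's", "Venue Category": "Cocktail Bar"},
]
categories, encoded = one_hot(venues, "Venue Category")
print(categories)
for venue, indicators in zip(venues, encoded):
    print(venue["Venue"], [indicators[c] for c in categories])
```

Each venue becomes a row of zeros with a single one in the column of its category, which is why the table above is so sparse.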


We can then calculate the **frequency of each venue category** in each neighborhood with a GroupBy:

In [9]:
munich_grouped = munich_onehot.groupby('Bezirksteil').mean().reset_index()
print(munich_grouped.shape)
munich_grouped.head()

(106, 280)


Unnamed: 0,Bezirksteil,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bathing Area,Bavarian Restaurant,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bistro,Board Shop,Boarding House,Bookstore,Bosnian Restaurant,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Camera Store,Campground,Candy Store,Casino,Castle,Caucasian Restaurant,Cemetery,Cheese Shop,Chinese Restaurant,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cultural Center,Cupcake Shop,Currywurst Joint,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Event Space,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Fish Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grilled Meat Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Hill,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Spring,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Lake,Laundromat,Laundry Service,Lebanese Restaurant,Light Rail Station,Liquor Store,Lottery Retailer,Lounge,Manti Place,Market,Martial Arts School,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Modern Greek Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Neighborhood,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Palace,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Post Office,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rest Area,Restaurant,River,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Art,Strip Club,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tapas Restaurant,Taverna,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Tiki Bar,Toy / Game Store,Track,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Alt Moosach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Altaubing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Altbogenhausen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.166667,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0
3,Alte Heide - Hirschau,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alte Kaserne,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
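
Because every one-hot row contains a single 1, the mean of a category column within a neighborhood is simply that category's share of the neighborhood's venues. The plain-Python sketch below illustrates this arithmetic on made-up toy data; `category_frequencies` is a hypothetical helper. The toy neighborhood has 21 venues, 4 of them bakeries, which reproduces the values 0.190476 and 0.047619 seen in the Alt Moosach row (they would be consistent with 4/21 and 1/21).

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


# Made-up toy data: 21 venues in neighborhood "A", 4 of them bakeries
toy = (
    [("A", "Bakery")] * 4
    + [("A", "Supermarket")]
    + [("A", "Other {}".format(i)) for i in range(16)]
)
freq = category_frequencies(toy)
print(round(freq["A"]["Bakery"], 6))       # share of bakeries, 4 / 21
print(round(freq["A"]["Supermarket"], 6))  # share of supermarkets, 1 / 21
```

One consequence worth keeping in mind is that the frequencies of a neighborhood always sum to 1, so a neighborhood with a single venue (such as Altaubing) gets a frequency of 1.0 for that venue's category.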


One important feature of a neighborhood is the **number of venues** found in Foursquare. Downtown areas will for instance present a high number of venues in comparison to neighborhoods in the suburbs:

In [10]:
munich_venues_unique = munich_venues.groupby('Bezirksteil').nunique()
munich_venues_unique.head()

Bezirksteil (index),Bezirksteil,Bezirksteil Latitude,Bezirksteil Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alt Moosach,1,1,1,21,21,21,18
Altaubing,1,1,1,1,1,1,1
Altbogenhausen,1,1,1,12,12,12,9
Alte Heide - Hirschau,1,1,1,16,16,16,15
Alte Kaserne,1,1,1,20,20,20,20


We will use the frequency dataframe to produce a new dataframe showing a **ranking of the ten most common venue categories** in each neighborhood. This will help us understand at a glance what kind of neighborhoods we are dealing with:

In [11]:
def return_most_common_venues(row, num_top_venues):
    if num_top_venues > 10:
        num_top_venues = 10
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Bezirksteil']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the third take the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Bezirksteil'] = munich_grouped['Bezirksteil']

for ind in np.arange(munich_grouped.shape[0]):
    num_venues = munich_venues_unique.loc[neighbourhoods_venues_sorted.iloc[ind, 0],"Venue Category"]
    neighbourhoods_venues_sorted.iloc[ind, 1:num_venues+1] = return_most_common_venues(munich_grouped.iloc[ind, :], num_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Bezirksteil,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alt Moosach,Bakery,Italian Restaurant,Asian Restaurant,Fast Food Restaurant,Burrito Place,Food,Light Rail Station,Drugstore,Supermarket,German Restaurant
1,Altaubing,Design Studio,,,,,,,,,
2,Altbogenhausen,Bakery,Italian Restaurant,Plaza,Indian Restaurant,Restaurant,Japanese Restaurant,Vietnamese Restaurant,Café,Sporting Goods Shop,
3,Alte Heide - Hirschau,Soccer Field,Pastry Shop,Athletics & Sports,Bus Stop,Café,Stadium,Supermarket,Gastropub,Italian Restaurant,Track
4,Alte Kaserne,Austrian Restaurant,Asian Restaurant,Post Office,Italian Restaurant,German Restaurant,Bar,Drugstore,Bakery,Greek Restaurant,Indian Restaurant
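
The ranking step boils down to sorting the categories of one neighborhood by frequency and keeping the first few that actually occur. A minimal plain-Python sketch of that idea follows; `top_categories` and the frequency row are hypothetical, and the alphabetical tie-break is an assumption made here for reproducibility (the notebook relies on the order produced by `sort_values`).

```python
def top_categories(frequencies, n=10):
    """Return up to `n` category names, most frequent first, skipping zero shares."""
    present = [(cat, share) for cat, share in frequencies.items() if share > 0]
    # sort by descending share; ties are broken alphabetically
    present.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in present[:n]]


# Hypothetical frequency row for one neighborhood
row = {"Bakery": 0.4, "Café": 0.2, "Plaza": 0.2, "Hotel": 0.0, "Supermarket": 0.2}
print(top_categories(row, n=3))
print(top_categories(row))
```

When a neighborhood has fewer distinct categories than requested ranks, the returned list is simply shorter, which corresponds to the empty cells in rows such as Altaubing.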


### Clustering in search of the city center

Downtown neighborhoods usually have **similar characteristics**, for instance large numbers of cafés and restaurants. We can expect a **k-means clustering algorithm** to pick up such a pattern. If it does, we can eliminate these neighborhoods from our pool of candidates right away. We will give it a try with six clusters:

In [12]:
# set number of clusters
kclusters = 6

# keep only the numeric frequency columns (recent pandas versions require the keyword form)
munich_grouped_clustering = munich_grouped.drop(columns='Bezirksteil')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(munich_grouped_clustering)

# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged = df

# join the cluster labels and venue rankings onto df, which holds latitude/longitude for each neighborhood
munich_merged = munich_merged.join(neighbourhoods_venues_sorted.set_index('Bezirksteil'), on='Bezirksteil')

munich_merged.head()

Unnamed: 0,Bezirksteil,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Graggenau,48.139563,11.580182,2,Bavarian Restaurant,Café,Hotel,Plaza,Boutique,Italian Restaurant,German Restaurant,Clothing Store,Coffee Shop,Restaurant
1,Angerviertel,48.13367,11.571569,2,Café,Plaza,Coffee Shop,Hotel,German Restaurant,Cocktail Bar,Ice Cream Shop,Bavarian Restaurant,Italian Restaurant,Tea Room
2,Hackenviertel,48.135731,11.569955,2,Café,German Restaurant,Clothing Store,Italian Restaurant,Plaza,Church,Boutique,Cocktail Bar,Hotel,Bavarian Restaurant
3,Kreuzviertel,48.139698,11.573209,2,Café,Plaza,Restaurant,Bavarian Restaurant,Clothing Store,German Restaurant,Boutique,Department Store,Hotel,Coffee Shop
4,Lehel,48.139656,11.587921,2,Italian Restaurant,German Restaurant,Bar,Plaza,Hotel,Snack Place,Surf Spot,Sushi Restaurant,Burger Joint,Museum


After clustering the neighborhoods we can draw a **map** with each cluster in a different color:

In [13]:
# create map
map_clusters = folium.Map(location=[locat.latitude, locat.longitude], zoom_start=11.4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged['Latitude'], munich_merged['Longitude'], munich_merged['Bezirksteil'], munich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

We can see in the map that **all central neighborhoods do indeed belong to the same cluster** (cluster 2). These neighborhoods are dominated by restaurants and cafés:

In [14]:
munich_merged.loc[munich_merged['Cluster Labels'] == 2, munich_merged.columns[[0] + list(range(4, munich_merged.shape[1]))]].head(10)

Unnamed: 0,Bezirksteil,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Graggenau,Bavarian Restaurant,Café,Hotel,Plaza,Boutique,Italian Restaurant,German Restaurant,Clothing Store,Coffee Shop,Restaurant
1,Angerviertel,Café,Plaza,Coffee Shop,Hotel,German Restaurant,Cocktail Bar,Ice Cream Shop,Bavarian Restaurant,Italian Restaurant,Tea Room
2,Hackenviertel,Café,German Restaurant,Clothing Store,Italian Restaurant,Plaza,Church,Boutique,Cocktail Bar,Hotel,Bavarian Restaurant
3,Kreuzviertel,Café,Plaza,Restaurant,Bavarian Restaurant,Clothing Store,German Restaurant,Boutique,Department Store,Hotel,Coffee Shop
4,Lehel,Italian Restaurant,German Restaurant,Bar,Plaza,Hotel,Snack Place,Surf Spot,Sushi Restaurant,Burger Joint,Museum
5,Englischer Garten Süd,Bus Stop,Café,IT Services,Tram Station,Playground,Bank,Cafeteria,Hotel Pool,Hotel Bar,Hotel
6,Gärtnerplatz,Café,Coffee Shop,Bavarian Restaurant,Italian Restaurant,Pizza Place,Hotel,German Restaurant,Cocktail Bar,Plaza,Ice Cream Shop
7,Deutsches Museum,Café,Hotel,Coffee Shop,Cocktail Bar,Bar,Burger Joint,Indian Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,German Restaurant
8,Glockenbach,Café,Bar,Vietnamese Restaurant,Italian Restaurant,Cocktail Bar,Gay Bar,Asian Restaurant,Coffee Shop,Afghan Restaurant,Indie Movie Theater
9,Dreimühlen,Bar,Café,Italian Restaurant,Plaza,Pub,Supermarket,Cocktail Bar,Seafood Restaurant,Beer Bar,Beach


We can therefore **discard these neighborhoods** and evaluate the venue categories belonging to the remaining ones:

In [15]:
munich_venues_merged = pd.merge(munich_venues,munich_merged[['Bezirksteil','Cluster Labels']])

munich_venues_merged = munich_venues_merged[munich_venues_merged['Cluster Labels']!=2]

print('There are {} unique categories.'.format(len(munich_venues_merged['Venue Category'].unique())))
munich_venues_merged['Venue Category'].unique()

There are 71 unique categories.


array(['Bakery', 'Bus Stop', 'Supermarket', 'Asian Restaurant',
       'Rental Car Location', 'Plaza', 'German Restaurant',
       'Trattoria/Osteria', 'Food', 'Drugstore', 'Hotel', 'Metro Station',
       'Tram Station', 'Light Rail Station', 'Burrito Place',
       'Ice Cream Shop', 'Italian Restaurant', 'Fast Food Restaurant',
       'Gym', 'Park', 'Big Box Store', 'American Restaurant', 'Motel',
       'Sandwich Place', 'Gastropub', 'Hardware Store', 'Gas Station',
       'Museum', 'River', 'Turkish Restaurant', 'Restaurant', 'Wine Shop',
       'BBQ Joint', 'Greek Restaurant', 'Pharmacy', 'Indian Restaurant',
       'Vietnamese Restaurant', 'Japanese Restaurant', 'Café',
       'Sporting Goods Shop', 'Bavarian Restaurant', 'Lounge',
       'Liquor Store', 'Office', 'IT Services', 'Gym / Fitness Center',
       'Hookah Bar', 'Shipping Store', 'Burger Joint', 'Pizza Place',
       'Taverna', 'Tennis Court', 'Ramen Restaurant', 'Pool',
       'Farmers Market', 'Market', 'Doner Restau

We see that one of the venue categories in the remaining neighborhoods is **tennis court**. This is extremely convenient, since our padel court will attract a similar type of user. Next, we look at the neighborhoods that have tennis courts:

In [23]:
munich_venues_merged[munich_venues_merged["Venue Category"]=="Tennis Court"]

Unnamed: 0,Bezirksteil,Bezirksteil Latitude,Bezirksteil Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels
2629,Harlaching,48.088293,11.565078,Tennis Kail Harlaching,48.088041,11.569694,Tennis Court,1
2823,Obermenzing,48.16257,11.469372,Tennis-Club Blutenburg e.V.,48.162969,11.467291,Tennis Court,1
2881,Lerchenau West,48.196093,11.550703,MSC Hockeyclub,48.197683,11.556074,Tennis Court,1


Since all three tennis court neighborhoods fall in the same cluster (cluster 1), we can expect them to share a similar set of venues. We see for instance that **in the proximity of tennis courts, users can find bus stops and small food and drink stores like bakeries and cafés**. This feature could be essential to attract new users to our padel courts:

In [41]:
tennisbezirkelist = munich_venues_merged[munich_venues_merged["Venue Category"]=="Tennis Court"]["Bezirksteil"].values.tolist()
# DataFrame.append was removed in pandas 2.0, so collect the per-neighborhood slices and concatenate them
dft = pd.concat(
    [munich_venues_merged[munich_venues_merged["Bezirksteil"]==bezirksteil] for bezirksteil in tennisbezirkelist]
)
dft

Unnamed: 0,Bezirksteil,Bezirksteil Latitude,Bezirksteil Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels
2628,Harlaching,48.088293,11.565078,Der Brotladen,48.091002,11.563352,Bakery,1
2629,Harlaching,48.088293,11.565078,Tennis Kail Harlaching,48.088041,11.569694,Tennis Court,1
2630,Harlaching,48.088293,11.565078,Authariplatz,48.091771,11.562718,Plaza,1
2631,Harlaching,48.088293,11.565078,Café Hölzl,48.091418,11.561277,Café,1
2819,Obermenzing,48.16257,11.469372,Limoncello Ristorante,48.164885,11.4683,Italian Restaurant,1
2820,Obermenzing,48.16257,11.469372,Rossmann,48.164531,11.470029,Drugstore,1
2821,Obermenzing,48.16257,11.469372,EDEKA,48.164489,11.471864,Supermarket,1
2822,Obermenzing,48.16257,11.469372,Aumüller,48.164656,11.465166,Bakery,1
2823,Obermenzing,48.16257,11.469372,Tennis-Club Blutenburg e.V.,48.162969,11.467291,Tennis Court,1
2824,Obermenzing,48.16257,11.469372,Bäcker Ihle,48.164402,11.470856,Bakery,1
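
One way to put a number on this observation is to count which categories surround the courts. The sketch below is a plain-Python illustration that uses only the rows displayed above, which appear to be a partial view of `dft` (Lerchenau West, for example, is not listed). The names `displayed_rows` and `nearby` are illustrative and do not exist in the notebook:

```python
from collections import Counter

# (neighborhood, venue category) pairs copied from the rows displayed above
displayed_rows = [
    ("Harlaching", "Bakery"), ("Harlaching", "Tennis Court"),
    ("Harlaching", "Plaza"), ("Harlaching", "Café"),
    ("Obermenzing", "Italian Restaurant"), ("Obermenzing", "Drugstore"),
    ("Obermenzing", "Supermarket"), ("Obermenzing", "Bakery"),
    ("Obermenzing", "Tennis Court"), ("Obermenzing", "Bakery"),
]

# Count the venues that surround the courts, leaving out the courts themselves
nearby = Counter(cat for _, cat in displayed_rows if cat != "Tennis Court")

for category, count in nearby.most_common():
    print(f"{category}: {count}")
```

On these rows bakeries come out on top with three occurrences. Running the same count on the full `dft` would be needed to confirm the role of bus stops and cafés.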


For comparison, a neighborhood with a different sports venue (a soccer field) was assigned to a different cluster. Its venues are dominated by transport stops, together with a gym and a restaurant:

In [17]:
munich_venues_merged[munich_venues_merged["Bezirksteil"]==munich_venues_merged[munich_venues_merged["Venue Category"]=="Soccer Field"]["Bezirksteil"].values.tolist()[0]]

Unnamed: 0,Bezirksteil,Bezirksteil Latitude,Bezirksteil Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels
2830,Aubing-Süd,48.152617,11.410323,"Restaurant Apollon in ""Die Lüfte""",48.15396,11.413955,Greek Restaurant,3
2831,Aubing-Süd,48.152617,11.410323,H Neuaubing West,48.152557,11.41021,Bus Stop,3
2832,Aubing-Süd,48.152617,11.410323,Soccerpark,48.155255,11.412073,Soccer Field,3
2833,Aubing-Süd,48.152617,11.410323,S Aubing,48.15607,11.41227,Light Rail Station,3
2834,Aubing-Süd,48.152617,11.410323,clever fit München Aubing,48.155043,11.41451,Gym / Fitness Center,3
2835,Aubing-Süd,48.152617,11.410323,H Riesenburgstraße,48.148762,11.41375,Bus Stop,3


We will keep **exploring the tennis court cluster** and further divide it to obtain the most similar neighborhoods:

In [50]:
munich_grouped_2 = pd.merge(munich_grouped,munich_merged[['Bezirksteil','Cluster Labels']])
munich_grouped_2 = munich_grouped_2[(munich_grouped_2['Cluster Labels']==1)]

# set number of clusters
kclusters = 10

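# the merged 'Cluster Labels' column stays in, but it is constant (all 1) here and does not affect the distances
munich_grouped_clustering_2 = munich_grouped_2.drop(columns='Bezirksteil')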

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(munich_grouped_clustering_2)

# add clustering labels
neighbourhoods_venues_sorted_2 = pd.merge(neighbourhoods_venues_sorted,munich_merged[['Bezirksteil','Cluster Labels']])
neighbourhoods_venues_sorted_2 = neighbourhoods_venues_sorted_2[(neighbourhoods_venues_sorted_2['Cluster Labels']==1)]

del neighbourhoods_venues_sorted_2["Cluster Labels"]
neighbourhoods_venues_sorted_2.insert(0, 'Cluster Labels', kmeans.labels_)

munich_merged_2 = munich_merged[(munich_merged['Cluster Labels']==1)]
del munich_merged_2["Cluster Labels"]

# merge the refined cluster labels with the coordinates and venue rankings of each neighborhood
munich_merged_2 = pd.merge(neighbourhoods_venues_sorted_2[['Bezirksteil','Cluster Labels']],munich_merged_2)
munich_merged_2

Unnamed: 0,Bezirksteil,Cluster Labels,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alt Moosach,1,48.181071,11.516728,Bakery,Italian Restaurant,Asian Restaurant,Fast Food Restaurant,Burrito Place,Food,Light Rail Station,Drugstore,Supermarket,German Restaurant
1,Altbogenhausen,8,48.145785,11.605498,Bakery,Italian Restaurant,Plaza,Indian Restaurant,Restaurant,Japanese Restaurant,Vietnamese Restaurant,Café,Sporting Goods Shop,
2,Am Waldfriedhof,4,48.113498,11.510758,Bus Stop,Bakery,Plaza,Rental Car Location,Supermarket,Asian Restaurant,,,,
3,Balanstraße-West,1,48.104872,11.605344,Drugstore,Bakery,Supermarket,Lounge,IT Services,Asian Restaurant,Café,Office,Bus Stop,Museum
4,Englschalking,0,48.157923,11.64263,Bakery,Bus Stop,Drugstore,Pharmacy,Supermarket,Park,Italian Restaurant,Greek Restaurant,Light Rail Station,
5,Feldmoching,0,48.213804,11.541275,Ice Cream Shop,Supermarket,Bakery,Bus Stop,Greek Restaurant,Farmers Market,Taverna,Bus Line,Lottery Retailer,Plaza
6,Forstenried,3,48.084835,11.494477,Bakery,Greek Restaurant,BBQ Joint,Ramen Restaurant,,,,,,
7,Fürstenried-West,0,48.088366,11.48083,Bakery,Bus Stop,Plaza,Market,Supermarket,Drugstore,Farmers Market,Pool,Metro Station,Pharmacy
8,Gartenstadt Trudering,2,48.113139,11.657732,Bakery,Bus Stop,,,,,,,,
9,Giesing,1,48.11113,11.596084,Bakery,Park,Gym / Fitness Center,Shipping Store,Burger Joint,Drugstore,Supermarket,Greek Restaurant,Ice Cream Shop,Hookah Bar


Of the tennis court neighborhoods, *Obermenzing* is the one with the highest number of similar neighborhoods: three. On the map they appear in red:

In [54]:
# create map
map_clusters = folium.Map(location=[locat.latitude, locat.longitude], zoom_start=11.4)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(munich_merged_2['Latitude'], munich_merged_2['Longitude'], munich_merged_2['Bezirksteil'], munich_merged_2['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    if cluster == 0 or cluster == 5 or cluster == 7:
        varradius=15
    else:
        varradius=5
    folium.CircleMarker(
        [lat, lon],
        radius=varradius,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

If we study Obermenzing's cluster, we see that all of its neighborhoods have **a bakery, a bus stop and a supermarket**:

In [55]:
munich_merged_2[munich_merged_2["Cluster Labels"]==0]

Unnamed: 0,Bezirksteil,Cluster Labels,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Englschalking,0,48.157923,11.64263,Bakery,Bus Stop,Drugstore,Pharmacy,Supermarket,Park,Italian Restaurant,Greek Restaurant,Light Rail Station,
5,Feldmoching,0,48.213804,11.541275,Ice Cream Shop,Supermarket,Bakery,Bus Stop,Greek Restaurant,Farmers Market,Taverna,Bus Line,Lottery Retailer,Plaza
7,Fürstenried-West,0,48.088366,11.48083,Bakery,Bus Stop,Plaza,Market,Supermarket,Drugstore,Farmers Market,Pool,Metro Station,Pharmacy
16,Obermenzing,0,48.16257,11.469372,Bakery,Bus Stop,Drugstore,Discount Store,Italian Restaurant,Supermarket,Tennis Court,Hostel,,
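
The shared venues can also be checked mechanically. The following plain-Python sketch copies the venue rankings from the table above into a dictionary and intersects them. The names `cluster_venues`, `candidates` and `shared` are illustrative and not part of the notebook:

```python
# Most common venue categories of the four cluster-0 neighborhoods,
# copied from the table above (empty cells omitted)
cluster_venues = {
    "Englschalking": ["Bakery", "Bus Stop", "Drugstore", "Pharmacy", "Supermarket",
                      "Park", "Italian Restaurant", "Greek Restaurant", "Light Rail Station"],
    "Feldmoching": ["Ice Cream Shop", "Supermarket", "Bakery", "Bus Stop", "Greek Restaurant",
                    "Farmers Market", "Taverna", "Bus Line", "Lottery Retailer", "Plaza"],
    "Fürstenried-West": ["Bakery", "Bus Stop", "Plaza", "Market", "Supermarket",
                         "Drugstore", "Farmers Market", "Pool", "Metro Station", "Pharmacy"],
    "Obermenzing": ["Bakery", "Bus Stop", "Drugstore", "Discount Store",
                    "Italian Restaurant", "Supermarket", "Tennis Court", "Hostel"],
}
tennis_neighborhood = "Obermenzing"

# Candidates are the cluster members that do not already host the tennis court
candidates = sorted(name for name in cluster_venues if name != tennis_neighborhood)

# Categories that appear in every neighborhood of the cluster
shared = set.intersection(*(set(venues) for venues in cluster_venues.values()))

print(candidates)      # ['Englschalking', 'Feldmoching', 'Fürstenried-West']
print(sorted(shared))  # ['Bakery', 'Bus Stop', 'Supermarket']
```

Drugstores, for instance, drop out of the intersection because Feldmoching's ranking does not include one.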


## Results and Discussion <a name="results"></a>

Our analysis shows that a large number of neighborhoods (14) resemble those that already have tennis courts. They lie outside the city center and offer good transport connections and food stores. Most of them do not even reach 10 venues in Foursquare, so the venue density is low.

To make the evaluation of candidates easier for the investment fund, we refined the clustering and obtained a pool of three candidates (Englschalking, Feldmoching and Fürstenried-West) that are highly similar to Obermenzing, the tennis court neighborhood with the most venue categories. A wider variety of nearby venues could make the location more attractive to our users.

We recommend starting the evaluation of potential locations in these three neighborhoods, as they seem the most promising to host a padel court. This, of course, does not imply that those zones are actually optimal locations. The purpose of this analysis was only to identify areas outside the city center that are attractive enough for potential users because of their transport connections and shops. It is entirely possible that there are good reasons why the suggested neighborhoods have no tennis court yet, and those reasons could make them unsuitable for a tennis or padel court regardless of the lack of competition in the area. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually lead to evaluating other factors and conditions or to exploring other neighborhoods.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Munich outside the city center that resemble neighborhoods already hosting equivalent sports facilities, in order to find a suitable location for a new padel court. By collecting public venue data from Foursquare and feeding it to a k-means clustering algorithm, we first identified a common pattern for downtown neighborhoods, which had to be eliminated from the candidate pool due to project requirements. The same clustering showed that all tennis courts (tennis being a sport similar to padel) were located in neighborhoods with similar characteristics. To reduce the pool of candidates to be evaluated by the managers, we refined the clustering and suggested three possible locations: Englschalking, Feldmoching and Fürstenried-West. They are all neighborhoods on the outskirts of the city, well connected by bus and with bakeries and supermarkets where padel court users can casually shop before or after playing.

The final decision on the optimal location will be made by the managers after a further evaluation of the recommended areas, taking into consideration additional factors such as real estate availability, prices, the social and economic dynamics of each neighborhood and the attractiveness of the area.