# Coursera Capstone - Battle of the Neighborhoods
---
## Clustering neighborhoods in Frankfurt am Main
---

### 1. Introduction : Business Problem

A client wants to open a franchise of their Asian restaurant chain in Frankfurt am Main, preferably close to the city center. It will be their first restaurant in the city, and they have asked us to identify the best neighborhood/district for it.
The clustering results can also help anyone who is moving to Frankfurt and wants to know which cuisines are available in the various districts.

### 2. Data:

The following datasets are used in this project:

1. Street Directory of the city of Frankfurt am Main: https://offenedaten.frankfurt.de/dataset/strassenverzeichnis-der-stadt-frankfurt-am-main
2. Foursquare API to get the most common venues in Frankfurt districts.
3. Demographics of Frankfurt am Main Neighborhoods : https://offenedaten.frankfurt.de/dataset/stadtteilprofile-bevoelkerung
4. Election Atlas 2015 - GeoJSON Frankfurt neighborhoods: https://offenedaten.frankfurt.de/dataset/wahlatlas-2015-geodaten/resource/84dff094-ab75-431f-8c64-39606672f1da

### 2.1 Data Gathering and Analysis:

The districts of the city of Frankfurt am Main will be analyzed in this project.

#### Data 1 : Street directory of Frankfurt am Main:
This dataset will be used to extract the district names and postcodes in Frankfurt. It is available as a CSV file and can be accessed via the link given above. Frankfurt contains 46 city districts.

#### Data 2 : Foursquare API:
The geographical coordinates of the districts are used as input for the Foursquare API, which we use to explore each district in Frankfurt and retrieve its venues.

#### Data 3 : Frankfurt Demographics:
This dataset contains the population of Frankfurt broken down by district. It also includes the percentage of foreigners and, more specifically, the population of various ethnicities in each district.

#### Data 4: Frankfurt neighborhoods GeoJSON:
This dataset contains the GeoJSON file for the Frankfurt districts. It is used to plot the choropleth map of Frankfurt am Main.


### 3. Methodology: 

#### Analytical approach: 
In this project, we first cluster the districts of Frankfurt using k-means. Frankfurt has 46 districts. We use a geocoder to obtain the geographical coordinates of each district, then use the Foursquare API to explore each district around those coordinates and retrieve its most common venues. Based on this information, we cluster the districts using k-means and examine each cluster. We look for clusters with a larger number of Asian and similar-cuisine restaurants, since that suggests there is demand for Asian cuisine in that cluster.
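
To make the clustering step concrete, here is a minimal pure-Python sketch of Lloyd's iteration, the assign-then-update loop that k-means is commonly based on. It is not scikit-learn's implementation (the notebook imports `sklearn.cluster.KMeans` for the real analysis). The function name `lloyd_kmeans`, the toy feature vectors, and the choice of initial centroids are assumptions made purely for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def lloyd_kmeans(points, centroids, iterations=10):
    """Simplified k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical districts described by (share of Asian restaurants, share of cafés)
toy_districts = [(0.10, 0.02), (0.12, 0.03), (0.01, 0.20), (0.00, 0.22)]
labels, centers = lloyd_kmeans(toy_districts, centroids=[toy_districts[0], toy_districts[2]])
print(labels)
```

In this toy setup the two restaurant-heavy districts end up in one cluster and the two café-heavy districts in the other, which is the kind of grouping we later inspect cluster by cluster.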

Then we use the demographics data to find the districts with larger populations and compare them with the cluster data. Districts that have more Asian restaurants as well as a sizeable Asian population are ideal candidates for a new Asian restaurant.
Additionally, we look at nearby districts with fewer Asian restaurants but a sizeable Asian population, since these are also good prospects because there is less competition in the area.
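
The two selection rules described above can be sketched as a small classification function. All names, figures, and thresholds below (`MIN_POPULATION`, `HIGH_SHARE`, the three sample districts) are hypothetical and are not taken from the datasets. They only show how "proven demand" and "low competition" candidates might be separated.

```python
# Hypothetical figures for illustration only; they are not taken from the datasets.
districts = [
    {"name": "District A", "asian_restaurant_share": 0.12, "asian_population": 4000},
    {"name": "District B", "asian_restaurant_share": 0.02, "asian_population": 3500},
    {"name": "District C", "asian_restaurant_share": 0.01, "asian_population": 300},
]

MIN_POPULATION = 1000  # assumed threshold for a "sizeable" population
HIGH_SHARE = 0.05      # assumed threshold for "many" Asian restaurants


def classify(district):
    """Label a district as proven demand, low competition, or not a candidate."""
    if district["asian_population"] < MIN_POPULATION:
        return "not a candidate"
    if district["asian_restaurant_share"] >= HIGH_SHARE:
        return "proven demand"
    return "low competition"


shortlist = {d["name"]: classify(d) for d in districts}
print(shortlist)
```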

#### Exploratory Data Analysis:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from bs4 import BeautifulSoup
import os
print('Libraries imported.')


Libraries imported.


#### Importing the Frankfurt districts dataset:

In [2]:
#importing the dataset for Frankfurt districts
df_frankfurt_streets = pd.read_csv('zprojekteopen-datadatenamt-12strassenverzeichnis2019strassenverzeichnis2019.csv',sep=';')
df_frankfurt_streets.head()

Unnamed: 0,Straßennummer,Straßenname,Folge,Hausnr. (von),Zusatz zur Hausnr (von),Hausnr. (bis),Zusatz zur Hausnr (bis),Ortsbezirk,Stadtbezirksvorst,Stadtbezirk,Polizeirevier,Sozialrathaus Name,Schiedsleutebezirk,Stadtteil Name,"Postleitzahl,"
0,3083,Abflugring*,,,,,,5,5.34,328,19,Sachsenhausen,5A,Flughafen,60549
1,3083,Abflugring*,,,,,,5,5.34,329,19,Sachsenhausen,5A,Flughafen,60549
2,1,Abtsgäßchen,,,,,,5,5.31,300,8,Sachsenhausen,5A,Sachsenhausen-N,60594
3,2,Achenbachstraße,,,,,,5,5.33,322,8,Sachsenhausen,5A,Sachsenhausen-N,60596
4,3,Ackermannstraße,,,,,,1,1.05,154,16,Gallus,1,Gallus,60326


In [3]:
df_frankfurt_streets.shape

(4540, 15)

#### Dropping unnecessary columns:

In [162]:
df_frankfurt_areas=df_frankfurt_streets.drop(df_frankfurt_streets.columns[0:11],axis=1) #dropping unnecessary columns
df_frankfurt_areas= df_frankfurt_areas.drop(columns='Schiedsleutebezirk',axis=1)
df_frankfurt_areas.head(10)

Unnamed: 0,Sozialrathaus Name,Stadtteil Name,"Postleitzahl,"
0,Sachsenhausen,Flughafen,60549
1,Sachsenhausen,Flughafen,60549
2,Sachsenhausen,Sachsenhausen-N,60594
3,Sachsenhausen,Sachsenhausen-N,60596
4,Gallus,Gallus,60326
5,Nord,Ginnheim,60431
6,Dornbusch,Dornbusch,60431
7,Bockenheim,Bockenheim,60486
8,Bockenheim,Bockenheim,60486
9,Nord,Nieder-Eschbach,60437


#### Dropping duplicates so that the rows contain only unique District names

In [14]:
df_frankfurt_areas.drop_duplicates(subset ="Stadtteil Name", inplace=True) #dropping duplicates
df_frankfurt_areas.head()

Unnamed: 0,Sozialrathaus Name,Stadtteil Name,"Postleitzahl,"
0,Sachsenhausen,Flughafen,60549
2,Sachsenhausen,Sachsenhausen-N,60594
4,Gallus,Gallus,60326
5,Nord,Ginnheim,60431
6,Dornbusch,Dornbusch,60431


In [17]:
df_frankfurt_areas = df_frankfurt_areas.sort_values(by='Stadtteil Name').reset_index(drop=True) #sort by districts alphabetically
df_frankfurt_areas.head()

Unnamed: 0,Sozialrathaus Name,Stadtteil Name,"Postleitzahl,"
0,Ost,Altstadt,60311
1,Gallus,Bahnhofsviertel,60329
2,Ost,Bergen-Enkheim,60388
3,Nord,Berkersheim,60435
4,Bockenheim,Bockenheim,60486


In [18]:
df_frankfurt_areas.shape

(46, 3)

#### Getting the geographical coordinates for Frankfurt am Main

In [20]:
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode("Frankfurt am Main")
location

Location(Frankfurt am Main, Hessen, Deutschland, (50.1106444, 8.6820917, 0.0))

#### Getting latitude and longitude for every district

In [21]:
frankfurt_area_data = pd.DataFrame(columns=['District', 'Latitude', 'Longitude'])

# loop over all districts and collect their coordinates
items = []
geolocator = Nominatim(user_agent="ny_explorer") # create the geocoder once and reuse it

for idx, district in enumerate(df_frankfurt_areas['Stadtteil Name']):
    code = df_frankfurt_areas['Postleitzahl,'][idx]
    address = district + ', ' + "Frankfurt" # to get format of address
    
    count = 0
    location = None
    
    # geocode() can return None, so retry up to 15 times
    while location is None and count < 15:
        location = geolocator.geocode(address)
        count += 1
    
    print(location)
    
    if location is None:
        continue # skip a district that could not be geocoded instead of failing below
    
    latitude = location.latitude
    longitude = location.longitude
    items.append({'District': district, 
                  'Postal Code': code,
                  'Latitude': latitude,
                  'Longitude': longitude})

Altstadt, Innenstadt 1, Frankfurt am Main, Hessen, 60311, Deutschland
Bahnhofsviertel, Innenstadt 1, Frankfurt am Main, Hessen, 60329, Deutschland
Bergen-Enkheim, Frankfurt am Main, Hessen, 60388, Deutschland
Berkersheim, Nord-Ost, Frankfurt am Main, Hessen, Deutschland
Bockenheim, Innenstadt 2, Frankfurt am Main, Hessen, Deutschland
Bonames, Nord-Ost, Frankfurt am Main, Hessen, Deutschland
Bornheim, Bornheim/Ostend, Frankfurt am Main, Hessen, 60385, Deutschland
Dornbusch, Mitte-Nord, Frankfurt am Main, Hessen, 60320, Deutschland
Eckenheim, Nord-Ost, Frankfurt am Main, Hessen, Deutschland
Eschersheim, Mitte-Nord, Frankfurt am Main, Hessen, 60433, Deutschland
Fechenheim, Ost, Frankfurt am Main, Hessen, 60386, Deutschland
Flughafen, Süd, Frankfurt am Main, Hessen, 60549, Deutschland
Frankfurter Berg, Nord-Ost, Frankfurt am Main, Hessen, Deutschland
Gallus, Innenstadt 1, Frankfurt am Main, Hessen, Deutschland
Ginnheim, Mitte-Nord, Frankfurt am Main, Hessen, 60431, Deutschland
Griesheim, W

#### Adding the latitude and longitude values to the dataframe:

In [22]:
frankfurt_area_data = pd.DataFrame(items, columns=['District', 'Latitude', 'Longitude', 'Postal Code']) # DataFrame.append was removed in pandas 2.0
frankfurt_area_data.head()

Unnamed: 0,District,Latitude,Longitude,Postal Code
0,Altstadt,50.110442,8.682901,60311
1,Bahnhofsviertel,50.108411,8.668151,60329
2,Bergen-Enkheim,50.158015,8.762039,60388
3,Berkersheim,50.173289,8.697312,60435
4,Bockenheim,50.123311,8.646056,60486


In [163]:
frankfurt_area_data['Postal Code'] = frankfurt_area_data['Postal Code'].str.replace(',','') #removing the comma in the postal code column
frankfurt_area_data.head(10)

Unnamed: 0,District,Latitude,Longitude,Postal Code
0,Altstadt,50.110442,8.682901,60311
1,Bahnhofsviertel,50.108411,8.668151,60329
2,Bergen-Enkheim,50.158015,8.762039,60388
3,Berkersheim,50.173289,8.697312,60435
4,Bockenheim,50.123311,8.646056,60486
5,Bonames,50.181347,8.663331,60437
6,Bornheim,50.129731,8.710612,60385
7,Dornbusch,50.139046,8.675271,60431
8,Eckenheim,50.15171,8.679746,60435
9,Eschersheim,50.158203,8.656212,60439
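
Because the client prefers a location close to the city center, the district coordinates can be turned into a distance from the center. The sketch below uses the haversine formula. The helper name `haversine_km` and the Earth-radius constant are assumptions of this example, while the coordinates are copied from the outputs above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# City-center coordinates as returned by the geocoder for "Frankfurt am Main"
CENTER = (50.1106444, 8.6820917)

# A few district coordinates copied from the table above
districts = {
    "Altstadt": (50.110442, 8.682901),
    "Bahnhofsviertel": (50.108411, 8.668151),
    "Bergen-Enkheim": (50.158015, 8.762039),
}

distances = {name: haversine_km(*CENTER, lat, lon) for name, (lat, lon) in districts.items()}
for name, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {km:.2f} km")
```

Sorting by this distance gives a simple way to restrict the later comparison to central districts.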


#### Visualizing the districts on a map using Folium

In [24]:
map_frankfurt = folium.Map(location=[frankfurt_area_data["Latitude"].iloc[0], frankfurt_area_data["Longitude"].iloc[0]], zoom_start=11)

# add markers to map
for lat, lng, district in zip(frankfurt_area_data['Latitude'], frankfurt_area_data['Longitude'], frankfurt_area_data['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_frankfurt)  
    
map_frankfurt


#### Loading the Foursquare API credentials

In [25]:
CLIENT_ID = 'JF2UHJL3VTZQVRVSXU4IIPJY1UZ1DFEN4F0UBE0MQBTEGTOK' # your Foursquare ID
CLIENT_SECRET = 'MSRCVIJ2YQJKEIEDWT5ZDJTAKGJRR4TR0SK3AYZNEM1S4TWW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JF2UHJL3VTZQVRVSXU4IIPJY1UZ1DFEN4F0UBE0MQBTEGTOK
CLIENT_SECRET:MSRCVIJ2YQJKEIEDWT5ZDJTAKGJRR4TR0SK3AYZNEM1S4TWW
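
The cell above stores the credentials in the notebook and prints them, which makes them easy to leak when the notebook is shared. A common alternative is to read them from environment variables and only ever print a masked form. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`. Those names and the helpers `load_credentials` and `mask` are hypothetical.

```python
import os


def load_credentials(env=None):
    """Read the Foursquare credentials from environment variables (names are assumed)."""
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


def mask(value, visible=4):
    """Show only the first few characters so the full value never appears in output."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Demonstration with placeholder values instead of real credentials
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLE_ID_123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLE_SECRET_456",
}
client_id, client_secret = load_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```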


#### Exploring the district of Altstadt

In [26]:
frankfurt_area_data.loc[0,'District']

'Altstadt'

In [27]:
neighborhood_latitude = frankfurt_area_data.loc[0, 'Latitude'] # latitude of the district
neighborhood_longitude = frankfurt_area_data.loc[0, 'Longitude'] # longitude of the district

neighborhood_name = frankfurt_area_data.loc[0, 'District'] # name of the district


#### Now, we shall take a look at the top 100 venues that are in the Altstadt district within a radius of 500 meters.

In [28]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=JF2UHJL3VTZQVRVSXU4IIPJY1UZ1DFEN4F0UBE0MQBTEGTOK&client_secret=MSRCVIJ2YQJKEIEDWT5ZDJTAKGJRR4TR0SK3AYZNEM1S4TWW&v=20180605&ll=50.11044205,8.682901089428581&radius=500&limit=100'
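
As an alternative to formatting the query string by hand, the URL can be assembled from a dictionary with the standard library's `urlencode`, which takes care of separators and escaping. This is a sketch, not the notebook's code: `build_explore_url` is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # same endpoint as above


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from its parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the "ll" value unescaped, as in the original URL
    return f"{EXPLORE_ENDPOINT}?{urlencode(params, safe=',')}"


# Placeholder credentials; the coordinates are those of Altstadt
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 50.110442, 8.682901)
print(url)
```

With `requests`, the same dictionary can instead be passed directly as `requests.get(EXPLORE_ENDPOINT, params=params)`.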

In [29]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2a83b67455323d64518372'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Altstadt',
  'headerFullLocation': 'Altstadt, Frankfurt am Main',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 95,
  'suggestedBounds': {'ne': {'lat': 50.114942054500005,
    'lng': 8.689904884523033},
   'sw': {'lat': 50.1059420455, 'lng': 8.67589729433413}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c836113d6ebbfb7c04551a4',
       'name': 'Römerberg',
       'location': {'address': 'Römerberg',
        'lat': 50.110488574510946,
        'lng': 8.682130927626094,
        'labeledLatLngs': [{'label': 'display',
          'lat': 50.110488574510

In [34]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
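
To see what this helper returns, it can be applied to a few hand-written rows. The rows below are made up but shaped like the flattened API items. The function is repeated in a compact form so the example runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category of a venue, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    return categories_list[0]['name'] if categories_list else None


# Made-up rows shaped like the flattened API items
flattened_row = {'venue.name': 'Römerberg', 'venue.categories': [{'name': 'Plaza'}]}
plain_row = {'name': 'Main', 'categories': [{'name': 'River'}]}
uncategorised_row = {'venue.name': 'Unknown', 'venue.categories': []}

print(get_category_type(flattened_row))      # Plaza
print(get_category_type(plain_row))          # River
print(get_category_type(uncategorised_row))  # None
```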

#### Checking the nearby venues in the district of Altstadt

In [35]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
 
nearby_venues.head(10)


Unnamed: 0,name,categories,lat,lng
0,Römerberg,Plaza,50.110489,8.682131
1,Hoppenworth & Ploch,Café,50.110891,8.683701
2,Dom Aussichtsplattform,Scenic Lookout,50.110609,8.684908
3,Carhartt,Boutique,50.111929,8.681853
4,Weinterasse Rollanderhof,Wine Bar,50.112473,8.682164
5,Bitter & Zart Chocolaterie,Chocolate Shop,50.111444,8.683904
6,Main,River,50.10839,8.682631
7,SCHIRN Kunsthalle,Art Museum,50.110291,8.683542
8,Kleinmarkthalle,Market,50.112778,8.682958
9,Jamy's Burger,Burger Joint,50.111226,8.681699


#### Let's create a function to repeat the same process for all the districts of Frankfurt am Main

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
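
The function above indexes `categories[0]` and `groups[0]` directly, which assumes every response contains at least one group and every venue at least one category. A more defensive parsing step could look like the sketch below. `extract_venues` is a hypothetical helper, and the sample response is hand-written in the shape shown earlier, with some values made up.

```python
def extract_venues(response_json, district):
    """Pull (district, venue name, lat, lng, category) tuples out of an explore response.

    Venues without any category get the label None instead of raising IndexError.
    """
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((district, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-written response fragment (values partly made up)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Römerberg", "location": {"lat": 50.110489, "lng": 8.682131},
               "categories": [{"name": "Plaza"}]}},
    {"venue": {"name": "Unnamed spot", "location": {"lat": 50.11, "lng": 8.68},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Altstadt")
print(rows)
print(extract_venues({"meta": {"code": 400}}, "Nowhere"))  # [] when no groups are present
```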

In [37]:
frankfurt_venues = getNearbyVenues(names=frankfurt_area_data['District'],
                                   latitudes=frankfurt_area_data['Latitude'],
                                   longitudes=frankfurt_area_data['Longitude']
                                  )

Altstadt
Bahnhofsviertel
Bergen-Enkheim
Berkersheim
Bockenheim
Bonames
Bornheim
Dornbusch
Eckenheim
Eschersheim
Fechenheim
Flughafen
Frankfurter Berg
Gallus
Ginnheim
Griesheim
Gutleutviertel
Harheim
Hausen
Heddernheim
Höchst
Innenstadt
Kalbach-Riedberg
Nied
Nieder-Erlenbach
Nieder-Eschbach
Niederrad
Niederursel
Nordend-Ost
Nordend-West
Oberrad
Ostend
Praunheim
Preungesheim
Riederwald
Rödelheim
Sachsenhausen-N
Sachsenhausen-S
Schwanheim
Seckbach
Sindlingen
Sossenheim
Unterliederbach
Westend-Nord
Westend-Süd
Zeilsheim


In [38]:
print(frankfurt_venues.shape)
frankfurt_venues.head()

(886, 7)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Altstadt,50.110442,8.682901,Römerberg,50.110489,8.682131,Plaza
1,Altstadt,50.110442,8.682901,Hoppenworth & Ploch,50.110891,8.683701,Café
2,Altstadt,50.110442,8.682901,Dom Aussichtsplattform,50.110609,8.684908,Scenic Lookout
3,Altstadt,50.110442,8.682901,Carhartt,50.111929,8.681853,Boutique
4,Altstadt,50.110442,8.682901,Weinterasse Rollanderhof,50.112473,8.682164,Wine Bar


#### Checking how many venues were returned for each district

In [39]:
frankfurt_venues.groupby('District').count()

District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Altstadt,95,95,95,95,95,95
Bahnhofsviertel,100,100,100,100,100,100
Bergen-Enkheim,2,2,2,2,2,2
Berkersheim,4,4,4,4,4,4
Bockenheim,59,59,59,59,59,59
Bonames,9,9,9,9,9,9
Bornheim,41,41,41,41,41,41
Dornbusch,18,18,18,18,18,18
Eckenheim,8,8,8,8,8,8
Eschersheim,11,11,11,11,11,11
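
The counts vary widely: some districts return only a handful of venues, while Bahnhofsviertel returns exactly 100, which equals `LIMIT` and may therefore be capped. Districts with very few venues produce coarse category frequencies, so it can help to flag them before clustering. The cut-off `MIN_VENUES` below is an assumed value. The counts are copied from the table above.

```python
# Venue counts copied from the table above (first ten districts only)
venue_counts = {
    "Altstadt": 95, "Bahnhofsviertel": 100, "Bergen-Enkheim": 2, "Berkersheim": 4,
    "Bockenheim": 59, "Bonames": 9, "Bornheim": 41, "Dornbusch": 18,
    "Eckenheim": 8, "Eschersheim": 11,
}

MIN_VENUES = 10  # assumed cut-off; below this, category frequencies are very coarse

sparse = sorted(name for name, count in venue_counts.items() if count < MIN_VENUES)
print(sparse)

# With n venues, a single venue shifts a category frequency by 1/n
for name in sparse:
    print(f"{name}: one venue = {1 / venue_counts[name]:.3f} of the profile")
```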


In [40]:
print('There are {} unique categories.'.format(len(frankfurt_venues['Venue Category'].unique())))

There are 188 unique categories.


### Analyzing each District

In [52]:
#one-hot encoding
frankfurt_onehot = pd.get_dummies(frankfurt_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
frankfurt_onehot['District'] = frankfurt_venues['District']

# move district column to the first column
frankfurt_onehot = frankfurt_onehot[['District'] + [col for col in frankfurt_onehot.columns if col != 'District']]


frankfurt_onehot.head()

Unnamed: 0,District,African Restaurant,Airport Lounge,Airport Service,American Restaurant,Apple Wine Pub,Argentinian Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Beach Bar,Beer Bar,Beer Garden,Beer Store,Bistro,Board Shop,Boarding House,Bookstore,Boutique,Breakfast Spot,Brewery,Building,Burger Joint,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Castle,Chinese Restaurant,Chocolate Shop,Church,Cigkofte Place,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Residence Hall,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Currywurst Joint,Czech Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Friterie,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hawaiian Restaurant,Health Food Store,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Malga,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Museum,Music Store,Nightclub,Opera House,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoor Supply Store,Outlet Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pharmacy,Photography Studio,Pie Shop,Pizza Place,Platform,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Racetrack,Radio Station,Ramen Restaurant,Restaurant,River,Road,Roof Deck,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Repair,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soup Place,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tea Room,Thai Restaurant,Theater,Trail,Train Station,Tram Station,Transportation Service,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Yoga Studio,Zoo
0,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Altstadt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
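
One-hot encoding turns each venue's category into a row with a single 1 in the column of that category and 0 everywhere else. The pure-Python sketch below reproduces the idea for the first five venues of Altstadt. It only illustrates what the encoded rows mean and is not how `pd.get_dummies` is implemented. The helper name `one_hot` is an assumption.

```python
# First five (district, category) pairs from frankfurt_venues
venues = [
    ("Altstadt", "Plaza"),
    ("Altstadt", "Café"),
    ("Altstadt", "Scenic Lookout"),
    ("Altstadt", "Boutique"),
    ("Altstadt", "Wine Bar"),
]

# Sorted here to mirror the alphabetical column order seen in the table
categories = sorted({category for _, category in venues})


def one_hot(category, categories):
    """Return a 0/1 list with a single 1 at the position of `category`."""
    return [1 if category == c else 0 for c in categories]


encoded = [(district, one_hot(category, categories)) for district, category in venues]

print(categories)
for district, vector in encoded:
    print(district, vector)
```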


Now we group the rows by district and take the mean of each one-hot column, which gives the relative frequency of occurrence of each category in that district.

In [53]:
frankfurt_grouped = frankfurt_onehot.groupby('District').mean().reset_index()
frankfurt_grouped

Unnamed: 0,District,African Restaurant,Airport Lounge,Airport Service,American Restaurant,Apple Wine Pub,Argentinian Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Beach Bar,Beer Bar,Beer Garden,Beer Store,Bistro,Board Shop,Boarding House,Bookstore,Boutique,Breakfast Spot,Brewery,Building,Burger Joint,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Castle,Chinese Restaurant,Chocolate Shop,Church,Cigkofte Place,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Residence Hall,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Currywurst Joint,Czech Restaurant,Dance Studio,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Friterie,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hawaiian Restaurant,Health Food Store,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Malga,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Museum,Music Store,Nightclub,Opera House,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoor Supply Store,Outlet Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pharmacy,Photography Studio,Pie Shop,Pizza Place,Platform,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Racetrack,Radio Station,Ramen Restaurant,Restaurant,River,Road,Roof Deck,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Repair,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soup Place,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Taverna,Tea Room,Thai Restaurant,Theater,Trail,Train Station,Tram Station,Transportation Service,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Yoga Studio,Zoo
0,Altstadt,0.0,0.0,0.0,0.0,0.0,0.0,0.042105,0.0,0.0,0.0,0.010526,0.0,0.0,0.010526,0.0,0.021053,0.010526,0.010526,0.0,0.010526,0.021053,0.010526,0.0,0.0,0.021053,0.010526,0.0,0.0,0.042105,0.0,0.0,0.010526,0.0,0.105263,0.0,0.0,0.010526,0.010526,0.010526,0.0,0.0,0.010526,0.010526,0.031579,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042105,0.010526,0.0,0.0,0.0,0.031579,0.0,0.0,0.0,0.0,0.010526,0.010526,0.0,0.0,0.010526,0.0,0.0,0.010526,0.0,0.010526,0.0,0.0,0.0,0.021053,0.010526,0.0,0.0,0.0,0.0,0.010526,0.0,0.0,0.010526,0.0,0.010526,0.0,0.010526,0.0,0.0,0.0,0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021053,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.052632,0.010526,0.0,0.0,0.0,0.0,0.021053,0.010526,0.0,0.0,0.010526,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010526,0.010526,0.010526,0.010526,0.0,0.010526,0.0,0.0
1,Bahnhofsviertel,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.09,0.01,0.0,0.01,0.06,0.0,0.0,0.01,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.01,0.01,0.0,0.0,0.0
2,Bergen-Enkheim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Berkersheim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bockenheim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.050847,0.016949,0.0,0.0,0.0,0.016949,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.016949,0.0,0.084746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.016949,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.016949,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.084746,0.033898,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.016949,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.016949,0.016949,0.0,0.016949,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.050847,0.0,0.016949,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.016949,0.0,0.0,0.033898,0.0,0.0
5,Bonames,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bornheim,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.073171,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.097561,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.02439,0.0,0.073171,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.073171,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0
7,Dornbusch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Eckenheim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Eschersheim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [54]:
frankfurt_grouped.shape

(46, 189)

#### Printing the top 5 venues for each district:

In [55]:
num_top_venues = 5

for dist in frankfurt_grouped['District']:
    print("----"+dist+"----")
    temp = frankfurt_grouped[frankfurt_grouped['District'] == dist].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Altstadt----
          venue  freq
0          Café  0.11
1    Restaurant  0.05
2         Plaza  0.05
3    Art Museum  0.04
4  Burger Joint  0.04


----Bahnhofsviertel----
                   venue  freq
0                  Hotel  0.09
1      Indian Restaurant  0.06
2                    Bar  0.06
3                   Café  0.05
4  Vietnamese Restaurant  0.04


----Bergen-Enkheim----
         venue  freq
0        Plaza   0.5
1     Bus Stop   0.5
2         Park   0.0
3  Music Store   0.0
4    Nightclub   0.0


----Berkersheim----
                venue  freq
0                Farm  0.25
1      Shipping Store  0.25
2   German Restaurant  0.25
3  Light Rail Station  0.25
4  African Restaurant  0.00


----Bockenheim----
                venue  freq
0  Italian Restaurant  0.08
1                Café  0.08
2    Asian Restaurant  0.05
3         Supermarket  0.05
4  Spanish Restaurant  0.03


----Bonames----
                venue  freq
0       Metro Station  0.22
1                Café  0.22
2      

In [56]:
def return_most_common_venues(row, num_top_venues):  #function to sort venues in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
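
To make the sorting step concrete, here is a plain-Python sketch of what `return_most_common_venues` does for a single row. It deliberately avoids pandas; the helper name `top_categories` is hypothetical, and the frequencies are the five rounded Bockenheim values printed above.

```python
def top_categories(freqs, n):
    """Return the n category names with the highest frequency."""
    # sorted() is stable, so tied categories keep their original order.
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return ranked[:n]


# Rounded frequencies taken from the Bockenheim printout above.
bockenheim = {
    "Italian Restaurant": 0.08,
    "Café": 0.08,
    "Asian Restaurant": 0.05,
    "Supermarket": 0.05,
    "Spanish Restaurant": 0.03,
}

print(top_categories(bockenheim, 3))
```

Note that many categories are tied (often at 0.0), so the order among ties depends on the sorting routine. This is a likely reason why the tail of a top-10 list for a district with few venues looks arbitrary.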

#### Creating a pandas dataframe containing the top 10 venues for each District

In [76]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
frankfurt_venues_sorted = pd.DataFrame(columns=columns)
frankfurt_venues_sorted['District'] = frankfurt_grouped['District']

for ind in np.arange(frankfurt_grouped.shape[0]):
    frankfurt_venues_sorted.iloc[ind, 1:] = return_most_common_venues(frankfurt_grouped.iloc[ind, :], num_top_venues)

frankfurt_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt,Café,Plaza,Restaurant,German Restaurant,Burger Joint,Art Museum,Gym / Fitness Center,Coffee Shop,Pastry Shop,Bar
1,Bahnhofsviertel,Hotel,Bar,Indian Restaurant,Café,Vietnamese Restaurant,Thai Restaurant,Chinese Restaurant,Seafood Restaurant,Theater,Drugstore
2,Bergen-Enkheim,Plaza,Bus Stop,Drugstore,Farmers Market,Farm,Falafel Restaurant,Exhibit,Event Space,Ethiopian Restaurant,Electronics Store
3,Berkersheim,Light Rail Station,Farm,Shipping Store,German Restaurant,Zoo,Farmers Market,Falafel Restaurant,Exhibit,Event Space,Ethiopian Restaurant
4,Bockenheim,Italian Restaurant,Café,Asian Restaurant,Supermarket,Turkish Restaurant,Ice Cream Shop,Japanese Restaurant,Wine Bar,Drugstore,Spanish Restaurant
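
The column names above rely on a three-element suffix list, which works for ranks 1 to 10 but would label rank 21 as "21th". A minimal sketch of a more general suffix rule follows; the helper name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th are exceptions to the st/nd/rd rule.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Rebuild the same column list as in the cell above.
columns = ["District"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[:4])
```

For `num_top_venues = 10` this produces the same headers as the notebook, so it only matters if the number of ranked venues is increased later.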


### Clustering the neighborhoods/districts

Now that we have an overview of the data and have done some initial exploration, we can cluster the districts to get an idea of the types of neighborhoods in the city.


In [78]:
# set number of clusters
kclusters = 5

frankfurt_grouped_clustering = frankfurt_grouped.drop('District', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(frankfurt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 3, 4, 0, 0, 0, 2, 2, 2])
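
For readers unfamiliar with k-means, the following is a minimal pure-Python sketch of the underlying idea (Lloyd's algorithm): assign each row to its nearest centroid, move each centroid to the mean of its members, and repeat. It is not the scikit-learn implementation, which adds smarter initialisation and other refinements. The two-column data (café share, hotel share) and the starting centroids are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on a list of equal-length tuples.

    `centroids` is the initial guess; returns (labels, centroids).
    """
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical (cafe share, hotel share) rows, one per district.
points = [(0.11, 0.00), (0.08, 0.01), (0.05, 0.09), (0.00, 0.12), (0.01, 0.10)]
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels)
```

In the notebook each row has 188 frequency columns instead of two, but the procedure is the same.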

In [79]:
frankfurt_area_data.head()

Unnamed: 0,District,Latitude,Longitude,Postal Code
0,Altstadt,50.110442,8.682901,60311
1,Bahnhofsviertel,50.108411,8.668151,60329
2,Bergen-Enkheim,50.158015,8.762039,60388
3,Berkersheim,50.173289,8.697312,60435
4,Bockenheim,50.123311,8.646056,60486


We shall create a new dataframe that contains the clusters as well as the top 10 venues for that district

In [80]:
# add clustering labels
frankfurt_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

frankfurt_merged_clusters = frankfurt_area_data

# join frankfurt_venues_sorted onto frankfurt_area_data to add latitude/longitude for each district
frankfurt_merged_clusters = frankfurt_merged_clusters.join(frankfurt_venues_sorted.set_index('District'), on='District')

frankfurt_merged_clusters.head()

Unnamed: 0,District,Latitude,Longitude,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstadt,50.110442,8.682901,60311,0,Café,Plaza,Restaurant,German Restaurant,Burger Joint,Art Museum,Gym / Fitness Center,Coffee Shop,Pastry Shop,Bar
1,Bahnhofsviertel,50.108411,8.668151,60329,0,Hotel,Bar,Indian Restaurant,Café,Vietnamese Restaurant,Thai Restaurant,Chinese Restaurant,Seafood Restaurant,Theater,Drugstore
2,Bergen-Enkheim,50.158015,8.762039,60388,3,Plaza,Bus Stop,Drugstore,Farmers Market,Farm,Falafel Restaurant,Exhibit,Event Space,Ethiopian Restaurant,Electronics Store
3,Berkersheim,50.173289,8.697312,60435,4,Light Rail Station,Farm,Shipping Store,German Restaurant,Zoo,Farmers Market,Falafel Restaurant,Exhibit,Event Space,Ethiopian Restaurant
4,Bockenheim,50.123311,8.646056,60486,0,Italian Restaurant,Café,Asian Restaurant,Supermarket,Turkish Restaurant,Ice Cream Shop,Japanese Restaurant,Wine Bar,Drugstore,Spanish Restaurant


In [81]:
frankfurt_merged_clusters.shape

(46, 15)

### Visualizing the clusters on a map

In [164]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11,tiles='stamenterrain')

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.brg(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(frankfurt_merged_clusters['Latitude'], frankfurt_merged_clusters['Longitude'], frankfurt_merged_clusters['District'], frankfurt_merged_clusters['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
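
The marker loop above looks up `rainbow[cluster-1]`, so the colors are shifted by one position relative to the cluster labels: cluster 0 takes the last color in the list (Python's negative indexing) and cluster 1 takes the first. The short sketch below shows that mapping. The `color_i` strings are placeholders standing in for the hex codes that `cm.brg` produces.

```python
kclusters = 5

# Placeholder names (an assumption) instead of the real hex codes.
rainbow = [f"color_{i}" for i in range(kclusters)]

# Same lookup as in the marker loop: cluster label minus one.
color_for_cluster = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}

for cluster, color in color_for_cluster.items():
    print(cluster, color)
```

Every cluster still gets its own color, so the map is unambiguous. The shift only matters when matching cluster numbers to color names by reading the colormap order.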

### Examining the clusters

#### Cluster 0 (Light Green):

In [84]:
cluster0 = frankfurt_merged_clusters.loc[frankfurt_merged_clusters['Cluster Labels'] == 0, frankfurt_merged_clusters.columns[[2] + list(range(5, frankfurt_merged_clusters.shape[1]))]]
print(cluster0['1st Most Common Venue'].value_counts())
print(cluster0['2nd Most Common Venue'].value_counts())

Italian Restaurant    5
Café                  2
Hotel                 2
IT Services           1
Metro Station         1
German Restaurant     1
BBQ Joint             1
Park                  1
Name: 1st Most Common Venue, dtype: int64
Café                  5
Thai Restaurant       1
Bakery                1
Steakhouse            1
Indian Restaurant     1
Airport Lounge        1
Bar                   1
Convenience Store     1
Italian Restaurant    1
Plaza                 1
Name: 2nd Most Common Venue, dtype: int64


As we can see, Italian restaurants are the most common venue in this cluster, but it also contains German, Indian and Thai restaurants. Hence we shall name this the Multicuisine cluster. Note that the districts in this cluster are close to the city centre, which will influence our answer to the business problem.


#### Cluster 1 (Blue):

In [89]:
cluster1 = frankfurt_merged_clusters.loc[frankfurt_merged_clusters['Cluster Labels'] == 1, frankfurt_merged_clusters.columns[[2] + list(range(5, frankfurt_merged_clusters.shape[1]))]]
print(cluster1['1st Most Common Venue'].value_counts())
print(cluster1['2nd Most Common Venue'].value_counts())

German Restaurant    2
Food & Drink Shop    1
Tram Station         1
Name: 1st Most Common Venue, dtype: int64
Supermarket      1
Metro Station    1
Gym Pool         1
Pizza Place      1
Name: 2nd Most Common Venue, dtype: int64


This cluster contains 4 districts, all of which are on the outskirts of the city. There aren't many venues to be found in this cluster, although German restaurants are its most common venue. We shall name this the outskirts cluster.

#### Cluster 2 (Purple):

In [90]:
cluster2 = frankfurt_merged_clusters.loc[frankfurt_merged_clusters['Cluster Labels'] == 2, frankfurt_merged_clusters.columns[[2] + list(range(5, frankfurt_merged_clusters.shape[1]))]]
print(cluster2['1st Most Common Venue'].value_counts())
print(cluster2['2nd Most Common Venue'].value_counts())

Hotel            6
Supermarket      4
Bakery           2
Metro Station    1
Soccer Field     1
Pizza Place      1
Grocery Store    1
Exhibit          1
Name: 1st Most Common Venue, dtype: int64
Bakery                       3
Metro Station                2
Hotel                        1
Park                         1
Cajun / Creole Restaurant    1
River                        1
German Restaurant            1
Light Rail Station           1
Italian Restaurant           1
Market                       1
Bistro                       1
Café                         1
Platform                     1
Automotive Shop              1
Name: 2nd Most Common Venue, dtype: int64


This cluster contains mostly hotels and supermarkets. Hence we shall name this the Hotels cluster. The abundance of hotels suggests that the areas in this cluster attract more tourists. The districts in this cluster are also not too far from the city center, which makes this an important cluster for our decision.

#### Cluster 3 (Red):

In [91]:
cluster3 = frankfurt_merged_clusters.loc[frankfurt_merged_clusters['Cluster Labels'] == 3, frankfurt_merged_clusters.columns[[2] + list(range(5, frankfurt_merged_clusters.shape[1]))]]
print(cluster3['1st Most Common Venue'].value_counts())
print(cluster3['2nd Most Common Venue'].value_counts())

Plaza    1
Name: 1st Most Common Venue, dtype: int64
Bus Stop    1
Name: 2nd Most Common Venue, dtype: int64


This cluster contains only 1 district, which does not have many popular venues as per Foursquare. Hence we shall name this the Bergen-Enkheim cluster.

#### Cluster 4 (Olive Green):

In [92]:
cluster4 = frankfurt_merged_clusters.loc[frankfurt_merged_clusters['Cluster Labels'] == 4, frankfurt_merged_clusters.columns[[2] + list(range(5, frankfurt_merged_clusters.shape[1]))]]
print(cluster4['1st Most Common Venue'].value_counts())
print(cluster4['2nd Most Common Venue'].value_counts())

German Restaurant     6
Plaza                 1
Pizza Place           1
Light Rail Station    1
Supermarket           1
Name: 1st Most Common Venue, dtype: int64
Bakery                2
German Restaurant     2
Italian Restaurant    2
Farm                  1
Thai Restaurant       1
Photography Studio    1
Plaza                 1
Name: 2nd Most Common Venue, dtype: int64


German restaurants are the most common venue in this cluster, so we shall name it the German restaurants cluster. It also contains a few Italian and Thai restaurants.

### Observation from the cluster examinations:

We observe that the purple and light green clusters contain the most districts and the largest number of venues. While the light green cluster contains more restaurants, the purple cluster contains more hotels, which suggests tourists. The light green cluster offers a variety of cuisines, indicating that it caters to a variety of customers, and most of its districts are located close to the city center. These factors make this cluster the strongest candidate for opening a new Asian restaurant. However, the large number of restaurants could also be a drawback, since it means more competition: a new restaurant would be up against established eateries that have been serving customers for years.

The purple cluster, on the other hand, does not contain many restaurants, but it has a lot of hotels and is fairly close to the city center. Fewer restaurants means less competition, and more tourists, some of them Asian, means more prospective customers. With a location not too far from the city centre, an Asian restaurant here could flourish.

To find out which district specifically would be best suited for opening an Asian restaurant, we shall look at the district-wise demographics of Frankfurt am Main, and then explore districts from both the light green and purple clusters.

### Data exploration - Frankfurt demographics

We shall import a dataset of Frankfurt demographics from 2012, which was the latest data available.
We shall then visualize the data on choropleth maps to help us make a decision.

In [116]:
#importing the dataset containing Frankfurt demographics
df_frankfurt_demo = pd.read_excel('bevoelkerung .xls')
df_frankfurt_demo.shape

(46, 164)

This is a large dataset, containing 164 columns. We shall only pick up the columns we need from the dataset.

In [117]:
df_frankfurt_filtered = df_frankfurt_demo.iloc[:,[1,3,15,17,19,55,63,65,73,103]]
df_frankfurt_filtered.head()

Unnamed: 0,Stadtteil,Bevölkerungsstruktur Einwohnerinnen und Einwohner 2012,Bevölkerungsstruktur Ausländerinnen und Ausländer 2012,Bevölkerungsstruktur Ausländerinnen und Ausländer in % 2012,Bevölkerungsstruktur Durchschnittsalter 2012,Bevölkerungsstruktur Junge Erwachsene von 18 bis 29 Jahren 2012,Bevölkerungsstruktur Ausländerinnen und Ausländer von 18 bis 29 Jahren 2012,Bevölkerungsstruktur Einwohnerinnen und Einwohner von 30 bis 64 Jahren 2012,Bevölkerungsstruktur Ausländerinnen und Ausländer von 30 bis 64 Jahren 2012,Bevölkerungsstruktur Ausländerinnen und Ausländer aus Asien und Australien 2012
0,Altstadt,3601,1254,34.82366,43.4,647,280,1981,761,272
1,Innenstadt,6334,2739,43.242817,41.6,1415,567,3433,1727,505
2,Bahnhofsviertel,3117,1630,52.293872,37.5,865,443,1804,960,329
3,Westend-Süd,17076,4053,23.735067,40.7,2597,783,9435,2539,698
4,Westend-Nord,9083,2312,25.454145,40.0,1604,496,4661,1344,467


The dataset is in German, as we can see. We shall now translate the column names to English.

In [118]:
translated_cols = ['District','Population structure Residents 2012','Population structure of foreigners in 2012',
                   'Population structure of foreigners in % 2012','Population structure average age 2012','Population structure Young adults aged 18 to 29 in 2012',
                   'Population structure Foreign nationals from 18 to 29 years 2012','Population structure Residents aged 30 to 64 in 2012',
                  'Population structure Foreigners aged 30 to 64 in 2012','Population structure Foreigners from Asia and Australia 2012']
df_frankfurt_filtered.columns = translated_cols

Checking the names of the districts in the geoJSON file. The names in the dataset must match the names in the geoJSON file for plotting a choropleth map.

In [119]:
import json
communities_geo = r'ffmstadtteilewahlen.geojson'

# open the json file - json.load() methods returns a python dictionary
with open(communities_geo) as communities_file:
    communities_json = json.load(communities_file)
    
# we loop through the dictionary to obtain the name of the communities in the json file
denominations_json = []
for index in range(len(communities_json['features'])):
    denominations_json.append(communities_json['features'][index]['properties']['STTLNAME'])
    
#denominations_json
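
One way to carry out this check is a set difference between the two name lists, which shows at a glance which districts would be left unmatched on the map. The sketch below uses a small illustrative subset of names rather than the full 46-district lists, and the helper name `unmatched_names` is hypothetical.

```python
def unmatched_names(dataset_names, geojson_names):
    """Return (only_in_dataset, only_in_geojson) as sorted lists."""
    dataset, geo = set(dataset_names), set(geojson_names)
    return sorted(dataset - geo), sorted(geo - dataset)


# Illustrative subset only, not the complete lists.
dataset_names = ["Altstadt", "Bahnhofsviertel", "Gutleutviertel", "Bockenheim"]
geojson_names = ["Altstadt", "Gutleut-/Bahnhofsviertel", "Bockenheim"]

only_in_dataset, only_in_geojson = unmatched_names(dataset_names, geojson_names)
print("Only in dataset:", only_in_dataset)
print("Only in geoJSON:", only_in_geojson)
```

With the real data, `denominations_json` and the `District` column of the dataframe would take the place of the two lists.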

The districts of Bahnhofsviertel and Gutleutviertel have been combined into a single district in the geoJSON file. To match that, we shall combine the districts in our dataframe as well.

In [120]:
#combining the 2 rows of Bahnhofsviertel and Gutleutviertel to match the geoJSON file
ser = df_frankfurt_filtered.iloc[2]+df_frankfurt_filtered.iloc[9]  
#adding row to end of dataset (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df_frankfurt_filtered = pd.concat([df_frankfurt_filtered, ser.to_frame().T.infer_objects()], ignore_index=True)

df_frankfurt_filtered.tail()

Unnamed: 0,District,Population structure Residents 2012,Population structure of foreigners in 2012,Population structure of foreigners in % 2012,Population structure average age 2012,Population structure Young adults aged 18 to 29 in 2012,Population structure Foreign nationals from 18 to 29 years 2012,Population structure Residents aged 30 to 64 in 2012,Population structure Foreigners aged 30 to 64 in 2012,Population structure Foreigners from Asia and Australia 2012
42,Nieder-Eschbach,11351,2125,18.720818,42.5,1716,400,5478,1265,465
43,Bergen-Enkheim,17563,2842,16.181746,44.3,2229,532,8942,1763,254
44,Frankfurter Berg,7627,1816,23.810148,39.0,1070,322,3958,1111,370
45,Frankfurt am Main,678691,176935,26.070038,41.3,109209,35121,350862,107255,24511
46,BahnhofsviertelGutleutviertel,9069,3866,89.861077,78.6,2070,885,5005,2272,777


Renaming the district to match the geoJSON district name, and correcting the 'Population structure of foreigners in % 2012' and 'Population structure average age 2012' columns:

In [121]:
df_frankfurt_filtered.iloc[46,0] = 'Gutleut-/Bahnhofsviertel'
df_frankfurt_filtered.iloc[46,3] = ((df_frankfurt_filtered.iloc[46,2]/df_frankfurt_filtered.iloc[46,1])*100)
df_frankfurt_filtered.iloc[46,4] = (df_frankfurt_filtered.iloc[2,4]+df_frankfurt_filtered.iloc[9,4])/2

df_frankfurt_filtered.tail()

Unnamed: 0,District,Population structure Residents 2012,Population structure of foreigners in 2012,Population structure of foreigners in % 2012,Population structure average age 2012,Population structure Young adults aged 18 to 29 in 2012,Population structure Foreign nationals from 18 to 29 years 2012,Population structure Residents aged 30 to 64 in 2012,Population structure Foreigners aged 30 to 64 in 2012,Population structure Foreigners from Asia and Australia 2012
42,Nieder-Eschbach,11351,2125,18.720818,42.5,1716,400,5478,1265,465
43,Bergen-Enkheim,17563,2842,16.181746,44.3,2229,532,8942,1763,254
44,Frankfurter Berg,7627,1816,23.810148,39.0,1070,322,3958,1111,370
45,Frankfurt am Main,678691,176935,26.070038,41.3,109209,35121,350862,107255,24511
46,Gutleut-/Bahnhofsviertel,9069,3866,42.628735,39.3,2070,885,5005,2272,777
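
The correction above recomputes the foreigner percentage from the summed counts and takes the plain average of the two districts' mean ages. Since the two districts differ in size, a population-weighted average is a possible alternative. The sketch below compares both, using only numbers from the tables above: the Gutleutviertel values are derived by subtracting the Bahnhofsviertel row from the summed row.

```python
# Bahnhofsviertel values come from the filtered table above.
pop_bahnhof, age_bahnhof = 3117, 37.5

# Gutleutviertel values derived from the summed row (9069 residents, 78.6).
pop_gutleut = 9069 - pop_bahnhof            # 5952
age_gutleut = round(78.6 - age_bahnhof, 1)  # 41.1

foreigners_total, pop_total = 3866, 9069

foreigner_pct = foreigners_total / pop_total * 100
simple_mean_age = (age_bahnhof + age_gutleut) / 2
weighted_mean_age = (
    pop_bahnhof * age_bahnhof + pop_gutleut * age_gutleut
) / pop_total

print(round(foreigner_pct, 2), round(simple_mean_age, 1), round(weighted_mean_age, 1))
```

The weighted value comes out at roughly 39.9 years against 39.3 for the plain average. The gap is small and does not affect the maps below, which do not use the age column.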


In [122]:
#dropping the Frankfurt am Main total row
df_frankfurt_filtered = df_frankfurt_filtered.drop(45)

#renaming the 2 rows to match the previous dataset so that coordinates can be merged
df_frankfurt_filtered['District']= df_frankfurt_filtered['District'].replace({'Sachsenhausen-Nord':'Sachsenhausen-N',
                                                                              'Sachsenhausen-Süd':'Sachsenhausen-S'})

df_frankfurt_filtered = df_frankfurt_filtered.sort_values('District').reset_index(drop=True)


In [123]:
#merging with the dataset containing the latitude and longitude
df_frankfurt_filtered = df_frankfurt_filtered.join(frankfurt_area_data.set_index('District'), on='District')
#df_frankfurt_filtered

In [74]:
#renaming the districts to match the data in the geoJSON file
df_frankfurt_filtered['District']= df_frankfurt_filtered['District'].replace({'Höchst':'HÃ¶chst','Westend-Süd':'Westend-SÃ¼d',
                                                                             'Rödelheim':'RÃ¶delheim','Sachsenhausen-N':'Sachsenhausen-Nord',
                                                                              'Sachsenhausen-S':'Sachsenhausen-SÃ¼d'}) 
df_frankfurt_filtered.head()

Unnamed: 0,District,Population structure Residents 2012,Population structure of foreigners in 2012,Population structure of foreigners in % 2012,Population structure average age 2012,Population structure Young adults aged 18 to 29 in 2012,Population structure Foreign nationals from 18 to 29 years 2012,Population structure Residents aged 30 to 64 in 2012,Population structure Foreigners aged 30 to 64 in 2012,Population structure Foreigners from Asia and Australia 2012,Latitude,Longitude,Postal Code
0,Altstadt,3601,1254,34.82366,43.4,647,280,1981,761,272,50.110442,8.682901,60311
1,Bahnhofsviertel,3117,1630,52.293872,37.5,865,443,1804,960,329,50.108411,8.668151,60329
2,Bergen-Enkheim,17563,2842,16.181746,44.3,2229,532,8942,1763,254,50.158015,8.762039,60388
3,Berkersheim,3643,611,16.771891,38.9,463,101,1752,376,127,50.173289,8.697312,60435
4,Bockenheim,35789,10170,28.416553,38.9,7489,2546,19357,6048,1586,50.123311,8.646056,60486
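
The odd-looking target names in the mapping above (for example `HÃ¶chst`) have the typical shape of UTF-8 text that was decoded as Latin-1. That is an assumption about how the geoJSON file was read, not something the notebook confirms. The sketch below reproduces the pattern with a hypothetical helper `as_mojibake`.

```python
def as_mojibake(name):
    """Encode as UTF-8, then decode those bytes as Latin-1."""
    return name.encode("utf-8").decode("latin-1")


for name in ["Höchst", "Westend-Süd", "Rödelheim"]:
    print(name, "->", as_mojibake(name))
```

If this is indeed the cause, opening the geoJSON file with an explicit `encoding='utf-8'` might make the renaming step unnecessary. This has not been verified against the actual file.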


### Choropleth maps :

#### 1. Total district-wise population:

In [75]:
communities_geo = r'ffmstadtteilewahlen.geojson' # geojson file

# create a plain map of Frankfurt am Main
map_frankfurt_pop = folium.Map(location=[frankfurt_area_data["Latitude"].iloc[0], frankfurt_area_data["Longitude"].iloc[0]], 
                           zoom_start=11,tiles='stamenterrain')

# generate choropleth map 
map_frankfurt_pop.choropleth(
    geo_data=communities_geo,
    data=df_frankfurt_filtered,
    columns=['District','Population structure Residents 2012'],
    key_on='feature.properties.STTLNAME',
    fill_color='RdPu', 
    fill_opacity=0.8, 
    line_opacity=0.8,
    legend_name='Total population - District',
    smooth_factor=0)

for lat, lon, poi in zip(frankfurt_area_data['Latitude'], frankfurt_area_data['Longitude'], frankfurt_area_data['District']):
        label = folium.Popup(str(poi), parse_html=True)
        folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_frankfurt_pop)
        
# display map
map_frankfurt_pop

#### 2. Population of foreigners:

In [265]:
communities_geo = r'ffmstadtteilewahlen.geojson' # geojson file

# create a plain map of Frankfurt am Main
map_frankfurt_pop = folium.Map(location=[frankfurt_area_data["Latitude"].iloc[0], frankfurt_area_data["Longitude"].iloc[0]], 
                           zoom_start=11,tiles='stamenterrain')

# generate choropleth map 
map_frankfurt_pop.choropleth(
    geo_data=communities_geo,
    data=df_frankfurt_filtered,
    columns=['District','Population structure of foreigners in 2012'],
    key_on='feature.properties.STTLNAME',
    fill_color='RdPu', 
    fill_opacity=0.8, 
    line_opacity=0.8,
    legend_name='Population of foreigners - 2012',
    smooth_factor=0)

for lat, lon, poi in zip(frankfurt_area_data['Latitude'], frankfurt_area_data['Longitude'], frankfurt_area_data['District']):
        label = folium.Popup(str(poi), parse_html=True)
        folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_frankfurt_pop)
        
# display map
map_frankfurt_pop

#### 3. Population of Asians and Australians:

In [217]:
# create a plain map of Frankfurt am Main
map_frankfurt_pop = folium.Map(location=[frankfurt_area_data["Latitude"].iloc[0], frankfurt_area_data["Longitude"].iloc[0]], 
                           zoom_start=11,tiles='stamenterrain')

# generate choropleth map 
map_frankfurt_pop.choropleth(
    geo_data=communities_geo,
    data=df_frankfurt_filtered,
    columns=['District','Population structure Foreigners from Asia and Australia 2012'],
    key_on='feature.properties.STTLNAME',
    fill_color='RdPu', 
    fill_opacity=0.8, 
    line_opacity=0.8,
    legend_name='Population of Asians and Australians',
    smooth_factor=0)

for lat, lon, poi in zip(frankfurt_area_data['Latitude'], frankfurt_area_data['Longitude'], frankfurt_area_data['District']):
        label = folium.Popup(str(poi), parse_html=True)
        folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_frankfurt_pop)

        
# display map
map_frankfurt_pop
    


We can see from the above maps that the districts of Bockenheim and Gallus have the highest populations of Asians and Australians. Of these, Bockenheim belongs to the light green cluster and Gallus to the purple cluster. Let us explore these 2 districts and find out the number of Asian or similar cuisine restaurants in each.

In [145]:
print(frankfurt_area_data[frankfurt_area_data['District']=='Bockenheim'].index.values) #getting the index of the district

[4]


In [146]:
neighborhood_latitude = frankfurt_area_data.loc[4, 'Latitude'] # neighborhoods latitude value
neighborhood_longitude = frankfurt_area_data.loc[4, 'Longitude'] # neighborhoods longitude value

neighborhood_name = frankfurt_area_data.loc[4, 'District'] # neighborhood name
print("{}, Latitude: {}, Longitude: {}".format(neighborhood_name,neighborhood_latitude,neighborhood_longitude))

Bockenheim, Latitude: 50.1233115, Longitude: 8.6460563


In [154]:
index = df_frankfurt_filtered[df_frankfurt_filtered['District']=='Bockenheim'].index.values
pop = df_frankfurt_filtered.loc[index[0],'Population structure Foreigners from Asia and Australia 2012']
print("Asian Population of {} is {}".format(neighborhood_name,pop))

Asian Population of Bockenheim is 1586


#### Let's take a look at the top 50 venues in Bockenheim:

In [95]:
LIMIT = 50 # limit of number of venues returned by Foursquare API

radius = 1500 # define radius

 # create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

results_bock = requests.get(url).json()
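
The request URL above is assembled with string formatting. As an alternative sketch, the same query string can be built with `urllib.parse.urlencode`, which escapes the parameter values. The function name `build_explore_url`, the credentials and the version string below are placeholders, not values from the notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials and version string (assumptions, not real values).
url = build_explore_url("MY_ID", "MY_SECRET", "20180605",
                        50.1233115, 8.6460563, 1500, 50)

# Parse the URL back to confirm the parameters survived the round trip.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Wrapping the URL construction in a function also avoids repeating the same cell for each district that is explored.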


In [96]:
venues_bock = results_bock['response']['groups'][0]['items']
    
nearby_venues_bock = json_normalize(venues_bock) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_bock =nearby_venues_bock.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_bock['venue.categories'] = nearby_venues_bock.apply(get_category_type, axis=1)

# clean columns
nearby_venues_bock.columns = [col.split(".")[-1] for col in nearby_venues_bock.columns]
#print(nearby_venues_bock['categories'])
nearby_venues_bock.head()



Unnamed: 0,name,categories,lat,lng
0,Ponte,Portuguese Restaurant,50.122648,8.64672
1,Radio-X,Radio Station,50.121992,8.643615
2,da Cimino,Pizza Place,50.119496,8.645682
3,Hake Cafe Im Hinterhof,Arts & Crafts Store,50.121691,8.646722
4,Kochhaus,Gourmet Shop,50.12236,8.646052


In [97]:
cat_asian = ['Asian Restaurant','Korean Restaurant','Vietnamese Restaurant','Thai Restaurant','Chinese Restaurant',
             'Japanese Restaurant']
nearby_venues_bock_asian = nearby_venues_bock[nearby_venues_bock.categories.isin(cat_asian)]

nearby_venues_bock_asian

Unnamed: 0,name,categories,lat,lng
11,T.Style,Japanese Restaurant,50.11927,8.64498
12,Lhamo Bistro,Asian Restaurant,50.123901,8.641923
17,Ban Thai,Thai Restaurant,50.12144,8.648446
19,Mai Vien,Asian Restaurant,50.118236,8.644628
21,Hama Sushi,Japanese Restaurant,50.119954,8.648582
34,Mangetsu,Japanese Restaurant,50.115902,8.646909
43,Wayang Indonesische Spezialitäten,Asian Restaurant,50.123712,8.641863


In [98]:
print("There are {} Asian or similar restaurants in {}".format(nearby_venues_bock_asian.shape[0],neighborhood_name))

There are 7 Asian or similar restaurants in Bockenheim


We can see that there are 7 Asian or similar restaurants among the top 50 venues in Bockenheim. This suggests that, as expected from the demographics, there is good demand for Asian food in this district. However, this many restaurants also means more competition.

We shall now explore the district of Gallus:

In [155]:
print(frankfurt_area_data[frankfurt_area_data['District']=='Gallus'].index.values)

[13]


In [156]:
neighborhood_latitude = frankfurt_area_data.loc[13, 'Latitude'] # neighborhoods latitude value
neighborhood_longitude = frankfurt_area_data.loc[13, 'Longitude'] # neighborhoods longitude value

neighborhood_name = frankfurt_area_data.loc[13, 'District'] # neighborhood name
print("{}, Latitude: {}, Longitude: {}".format(neighborhood_name,neighborhood_latitude,neighborhood_longitude))

Gallus, Latitude: 50.1038405, Longitude: 8.6431009


In [157]:
index = df_frankfurt_filtered[df_frankfurt_filtered['District']=='Gallus'].index.values
pop = df_frankfurt_filtered.loc[index[0],'Population structure Foreigners from Asia and Australia 2012']
print("Asian Population of {} is {}".format(neighborhood_name,pop))


Asian Population of Gallus is 1512


#### Let's take a look at the top 50 venues in Gallus

In [101]:
LIMIT = 50 # limit of number of venues returned by Foursquare API

radius = 1500 # define radius

 # create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

results_gall = requests.get(url).json()
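
The same request URL can also be assembled from a dictionary of parameters, which keeps the query string consistently encoded. The following is a minimal sketch using only the standard library, not the notebook's own code: the helper name `build_explore_url` is hypothetical, and the credentials and version string are placeholders. The endpoint, the radius of 1500 and the limit of 50 are taken from the cell above.

```python
from urllib.parse import urlencode

# Endpoint taken from the cell above
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1500, limit=50):
    """Return the explore URL for one district center (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


# Placeholder credentials and version; coordinates are those printed for Gallus
example_url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                                "YYYYMMDD", 50.1038405, 8.6431009)
print(example_url)
```

The resulting string could then be passed to `requests.get(...)` exactly as in the cell above.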

In [102]:
venues_gall = results_gall['response']['groups'][0]['items']
    
nearby_venues_gall = json_normalize(venues_gall) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_gall = nearby_venues_gall.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_gall['venue.categories'] = nearby_venues_gall.apply(get_category_type, axis=1)

# clean columns
nearby_venues_gall.columns = [col.split(".")[-1] for col in nearby_venues_gall.columns]
#print(nearby_venues_bock['categories'])
nearby_venues_gall.head()


Unnamed: 0,name,categories,lat,lng
0,DORMERO Hotel,Hotel,50.108106,8.644107
1,Capri by Fraser Frankfurt,Hotel,50.108896,8.64879
2,Melody Karaoke,Karaoke Bar,50.10106,8.634424
3,Engel Pizza & Pasta,Italian Restaurant,50.103868,8.636109
4,Premier Inn Frankfurt Messe,Hotel,50.108972,8.647594


In [103]:
cat_asian = ['Asian Restaurant','Korean Restaurant','Vietnamese Restaurant','Thai Restaurant','Chinese Restaurant',
             'Japanese Restaurant']
nearby_venues_gall_asian = nearby_venues_gall[nearby_venues_gall.categories.isin(cat_asian)]

nearby_venues_gall_asian

Unnamed: 0,name,categories,lat,lng
21,Konamon,Japanese Restaurant,50.103886,8.63654
35,Mr. Lee,Korean Restaurant,50.102042,8.660384
38,Coa,Asian Restaurant,50.10835,8.654117
47,Mangetsu,Japanese Restaurant,50.115902,8.646909
49,Mei Mei Chinapoint,Asian Restaurant,50.102821,8.641377


In [104]:
print("There are {} Asian or similar restaurants in {}".format(nearby_venues_gall_asian.shape[0],neighborhood_name))

There are 5 Asian or similar restaurants in Gallus


Gallus also has a good number of Asian or similar restaurants, which suggests high demand for Asian cuisine, as expected. Here too, the high number means more competition. However, because Gallus is in the purple cluster, we know that the area also has more hotels, which is a plus. Nevertheless, we shall explore another nearby district that is in the purple cluster and also has a sizeable Asian population: Niederrad.

In [159]:
print(frankfurt_area_data[frankfurt_area_data['District']=='Niederrad'].index.values)

[26]


In [160]:
neighborhood_latitude = frankfurt_area_data.loc[26, 'Latitude'] # neighborhoods latitude value
neighborhood_longitude = frankfurt_area_data.loc[26, 'Longitude'] # neighborhoods longitude value

neighborhood_name = frankfurt_area_data.loc[26, 'District'] # neighborhood name
print("{}, Latitude: {}, Longitude: {}".format(neighborhood_name,neighborhood_latitude,neighborhood_longitude))

Niederrad, Latitude: 50.0882911, Longitude: 8.6427072


In [161]:
index = df_frankfurt_filtered[df_frankfurt_filtered['District']=='Niederrad'].index.values
pop = df_frankfurt_filtered.loc[index[0],'Population structure Foreigners from Asia and Australia 2012']
print("Asian Population of {} is {}".format(neighborhood_name,pop))


Asian Population of Niederrad is 929


#### Let's take a look at the top 50 venues in Niederrad

In [109]:
LIMIT = 50 # limit of number of venues returned by Foursquare API

radius = 1500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

results_nied = requests.get(url).json()

In [110]:
venues_nied = results_nied['response']['groups'][0]['items']
    
nearby_venues_nied = json_normalize(venues_nied) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues_nied = nearby_venues_nied.loc[:, filtered_columns]

# filter the category for each row
nearby_venues_nied['venue.categories'] = nearby_venues_nied.apply(get_category_type, axis=1)

# clean columns
nearby_venues_nied.columns = [col.split(".")[-1] for col in nearby_venues_nied.columns]
#print(nearby_venues_bock['categories'])
nearby_venues_nied.head()


Unnamed: 0,name,categories,lat,lng
0,Alexis Sorbas,Greek Restaurant,50.082212,8.646586
1,Orange Beach,Beer Garden,50.089988,8.630624
2,Cono Cimino,Italian Restaurant,50.081595,8.638967
3,Licht- und Luftbad,Park,50.093798,8.646702
4,Ponton Lilu,Beer Garden,50.093792,8.646791


In [112]:
cat_asian = ['Asian Restaurant','Korean Restaurant','Vietnamese Restaurant','Thai Restaurant','Chinese Restaurant',
             'Japanese Restaurant']
nearby_venues_nied_asian = nearby_venues_nied[nearby_venues_nied.categories.isin(cat_asian)]

nearby_venues_nied_asian

Unnamed: 0,name,categories,lat,lng
36,Bambus Haus,Asian Restaurant,50.086535,8.642155


We can see that there is only 1 Asian restaurant in Niederrad. Let us check the number of hotels in the district.

In [114]:
cat_hotel = ['Hotel']
nearby_venues_nied_hotel = nearby_venues_nied[nearby_venues_nied.categories.isin(cat_hotel)]

nearby_venues_nied_hotel

Unnamed: 0,name,categories,lat,lng
32,Dorint Hotel Frankfurt Niederrad,Hotel,50.085524,8.632752
33,INNSIDE Frankfurt Niederrad,Hotel,50.080489,8.628484
35,Hotel NH Frankfurt Niederrad,Hotel,50.084805,8.629488


There are 3 hotels in Niederrad. That gives 3 hotels and only 1 Asian restaurant in a district with a sizeable Asian population. The district is also close to the city center. Therefore, Niederrad also appears to be a good prospect for opening an Asian restaurant.
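
The filter-and-count step that is repeated for each district can be sketched as a single helper. This is a plain-Python illustration rather than the notebook's pandas code: the function name `count_by_category` is hypothetical, and the sample list contains only the Niederrad rows visible in the outputs above, not the full set of 50 venues.

```python
from collections import Counter

# Same categories as cat_asian in the cells above
ASIAN_CATEGORIES = {
    "Asian Restaurant", "Korean Restaurant", "Vietnamese Restaurant",
    "Thai Restaurant", "Chinese Restaurant", "Japanese Restaurant",
}


def count_by_category(venues, wanted):
    """Count (name, category) pairs whose category is in `wanted` (hypothetical helper)."""
    return Counter(category for _name, category in venues if category in wanted)


# Only the Niederrad rows shown in the outputs above
niederrad_sample = [
    ("Alexis Sorbas", "Greek Restaurant"),
    ("Orange Beach", "Beer Garden"),
    ("Cono Cimino", "Italian Restaurant"),
    ("Licht- und Luftbad", "Park"),
    ("Ponton Lilu", "Beer Garden"),
    ("Bambus Haus", "Asian Restaurant"),
    ("Dorint Hotel Frankfurt Niederrad", "Hotel"),
    ("INNSIDE Frankfurt Niederrad", "Hotel"),
    ("Hotel NH Frankfurt Niederrad", "Hotel"),
]

asian = count_by_category(niederrad_sample, ASIAN_CATEGORIES)
hotels = count_by_category(niederrad_sample, {"Hotel"})
print("Asian restaurants:", sum(asian.values()))
print("Hotels:", sum(hotels.values()))
```

Using a set for the wanted categories keeps the membership check cheap and makes it easy to reuse the helper for other venue types.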

### Results and Discussion:

By clustering the districts in Frankfurt, analysing the district-wise demographics of the city, and then combining the two findings, we arrived at 3 prospective neighborhoods that appear well suited for opening an Asian restaurant in the city.

#### 1. Bockenheim:
Bockenheim falls in the light green cluster and is very close to the city center. It has 7 Asian restaurants, which suggests that there is a lot of demand for Asian cuisine in the area. It also has the highest population of Asians in the city at 1586. However, there is a lot of competition too.
Hence, it is a good option for opening an Asian restaurant, provided that the client is ready to accept the competition and establish themselves.

#### 2. Gallus:
Gallus is in the purple cluster, which contains a greater number of hotels. It is not far from the city center and has 5 Asian restaurants, indicating that there is demand here as well. It has the second highest population of Asians in the city at 1512.
Five restaurants still represent considerable competition, but the district is also likely to see many tourists, which means more potential customers.
Hence, this seems like a better option than Bockenheim for opening an Asian restaurant, owing to less competition, a similar Asian population and more prospective customers in the form of tourists.

#### 3. Niederrad:
Niederrad is also in the purple cluster, which has more hotels. It is also not far from the city center, but has only 1 Asian restaurant - far fewer than both Bockenheim and Gallus. Niederrad also has a sizeable Asian population at 929, although this is lower than in the other 2 districts in contention. Since it is in the purple cluster, we can expect more tourists in this district, and we see that there are 3 hotels in the area. This translates to more prospective customers.
Hence, this also seems like a good alternative to Gallus, owing to much less competition, proximity to the city center and more tourists.
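
One rough way to put the three districts side by side is the number of Asian residents per existing Asian restaurant, using only the figures reported above. The sketch below is a back-of-the-envelope illustration and was not part of the original analysis. It assumes the population figures (from the "Foreigners from Asia and Australia 2012" column) and the restaurant counts (venues within the 1500 m search radius) are comparable across districts, and it ignores tourists and restaurant size.

```python
# (Asian population, number of Asian or similar restaurants) as reported above
districts = {
    "Bockenheim": (1586, 7),
    "Gallus": (1512, 5),
    "Niederrad": (929, 1),
}

# Residents per restaurant: a higher value hints at less local competition
residents_per_restaurant = {
    name: population / restaurants
    for name, (population, restaurants) in districts.items()
}

# Rank districts from least to most competition by this rough measure
ranked = sorted(residents_per_restaurant.items(),
                key=lambda item: item[1], reverse=True)

for name, ratio in ranked:
    print("{}: {:.1f} Asian residents per Asian restaurant".format(name, ratio))
```

By this measure alone, Niederrad ranks first, followed by Gallus and then Bockenheim, which is in line with the competition argument made in the discussion above.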



### Conclusion:

The neighborhoods in Frankfurt am Main were clustered and the results were displayed on a map. The demographics were then studied and, based on the findings, 3 districts were identified as suitable answers to the business problem of opening an Asian restaurant. The client can choose any of the 3 neighborhoods, based on their preferences, confidence and appetite for risk.


### References:

[1]. Street Directory of the city of Frankfurt am Main: https://offenedaten.frankfurt.de/dataset/strassenverzeichnis-der-stadt-frankfurt-am-main

[2]. Foursquare API: https://developer.foursquare.com

[3]. Demographics of Frankfurt am Main Neighborhoods : https://offenedaten.frankfurt.de/dataset/stadtteilprofile-bevoelkerung
