# Capstone Project - Hamburg Analysis Week Two
### Applied Data Science by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this capstone project we look for a good location for a restaurant. Our stakeholder, the international Mama’s Pizza Chain, wants to open its first restaurant in Hamburg and needs an overview of how Hamburg’s boroughs differ in terms of going out to eat. This report is therefore aimed at stakeholders interested in opening an **Italian restaurant** in **Hamburg**, Germany.

Since Hamburg already has many restaurants, we will try to detect **locations that are not already crowded with Italian restaurants**. We are also particularly interested in **areas where potential customers with a high income live**. Finally, we would **prefer locations as close to the city center as possible**, assuming that the first two conditions are met.

We will use data science methods to find promising boroughs based on these criteria. The advantages of each area will then be stated clearly so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our business problem, the factors that will influence our decision are:
* characteristics of the boroughs in Hamburg (population, density, size)
* number and type of existing restaurants in the boroughs
* number of Italian restaurants in potentially good places
* income in the boroughs and their neighborhoods
* distance of each neighborhood from the city center
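The last factor, distance from the city center, can be quantified with a great-circle distance. The following is a minimal sketch and not part of the original notebook: `haversine_km` is a helper introduced here for illustration, the city-center coordinates are the ones used for the map later in this notebook, and the two borough coordinates are taken from the borough coordinate table.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


city_center = (53.550556, 9.993333)
boroughs = {
    "Altona": (53.55, 9.933333),
    "Bergedorf": (53.483333, 10.216667),
}

# Distance of each borough coordinate from the city center, nearest first
distances = {name: haversine_km(*city_center, *coords) for name, coords in boroughs.items()}
for name, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name}: {km:.1f} km from the city center")
```

Because a borough is represented by a single coordinate, the result is only a rough proxy for how central the borough as a whole is.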

We decided to use the official coordinates of the boroughs and neighborhoods (Wikipedia and GeoJSON sources).

The following data sources will be needed to extract or generate the required information:
* information about the boroughs will be obtained from **Wikipedia**
* the number of restaurants and their type and location in every borough will be obtained using the **Foursquare API**
* **geo data from GitHub** will be used to visualize the neighborhoods of Hamburg
* **income data** published by STATISTIKAMT NORD (**Statistical Office of Northern Germany**) will be used

In [112]:
# Install and import the needed libraries

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy
!pip install folium

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0; the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import numpy as np

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library



### Borough Candidates

Let's create latitude & longitude coordinates for centroids of our candidate boroughs and get some additional insights about these boroughs.

In [49]:
# I extracted the coordinates for the boroughs manually from https://de.wikipedia.org - did not find another more structured source

cord_hh = pd.DataFrame([['Hamburg-Mitte', 53.550278, 9.994167], ['Altona', 53.55, 9.933333], ['Eimsbütttel', 53.575833, 9.951944], 
 ['Hamburg-Nord', 53.593611, 10],['Wandsbek',53.582033, 10.084261], ['Bergedorf',53.483333,10.216667], ['Harburg',53.459259, 9.982672]], columns=['Borough','Latitude','Longitude'])   
cord_hh 

Unnamed: 0,Borough,Latitude,Longitude
0,Hamburg-Mitte,53.550278,9.994167
1,Altona,53.55,9.933333
2,Eimsbütttel,53.575833,9.951944
3,Hamburg-Nord,53.593611,10.0
4,Wandsbek,53.582033,10.084261
5,Bergedorf,53.483333,10.216667
6,Harburg,53.459259,9.982672


In [50]:
# Wikipedia also provides some additional characteristics of these boroughs

url = 'https://en.wikipedia.org/wiki/Boroughs_and_quarters_of_Hamburg#Boroughs'
html = requests.get(url).content
df_list = pd.read_html(html)

type(df_list)
df_hh = df_list[1]
df_hh.head(10)

Unnamed: 0,Borough,Population,Area (km²),Density
0,Hamburg-Mitte,"233,114[5]",107.1 km²,2177
1,Altona,"243,972[5]",78.3 km²,3149
2,Eimsbüttel,"246,087[5]",50.1 km²,4915
3,Hamburg-Nord,"279,498[5]",57.8 km²,4838
4,Wandsbek,"409,771[5]",147.5 km²,2777
5,Bergedorf,"118,942[5]",154.8 km²,769
6,Harburg,"201,119[5]","125,4 km²",1253
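The scraped table stores `Population` with thousands separators and a footnote marker (`233,114[5]`) and `Area (km²)` with a unit suffix, in one row even with a decimal comma (`125,4 km²`). Both columns are therefore text and cannot be used in calculations as they are. The helpers below are a hedged sketch of one way to clean them; the function names are made up for this example and do not appear in the original notebook.

```python
import re


def parse_population(raw):
    """'233,114[5]' -> 233114 (drop the footnote marker and thousands separators)."""
    without_footnote = re.sub(r"\[\d+\]", "", raw)
    return int(without_footnote.replace(",", "").strip())


def parse_area_km2(raw):
    """'107.1 km²' or '125,4 km²' -> float in square kilometers."""
    number = raw.replace("km²", "").strip()
    # Assumption: a comma in this column is a decimal comma, not a thousands separator
    return float(number.replace(",", "."))


print(parse_population("233,114[5]"))
print(parse_area_km2("125,4 km²"))
```

With the dataframe from the cell above, the helpers could be applied as `df_hh['Population'].map(parse_population)` and `df_hh['Area (km²)'].map(parse_area_km2)`. The decimal-comma assumption holds for the seven rows shown here but would misread an area written as `1,250 km²`.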


In [51]:
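# Merge df_hh and cord_hh to get boroughs and coordinates for the further steps
# Note: the inner merge keeps only exact name matches, so Eimsbüttel is dropped here
# because cord_hh spells it 'Eimsbütttel'; this is why the result has 6 rows instead of 7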
 
df_hh_base = pd.merge(df_hh, cord_hh, on='Borough')
print(df_hh_base.shape)
df_hh_base

(6, 6)


Unnamed: 0,Borough,Population,Area (km²),Density,Latitude,Longitude
0,Hamburg-Mitte,"233,114[5]",107.1 km²,2177,53.550278,9.994167
1,Altona,"243,972[5]",78.3 km²,3149,53.55,9.933333
2,Hamburg-Nord,"279,498[5]",57.8 km²,4838,53.593611,10.0
3,Wandsbek,"409,771[5]",147.5 km²,2777,53.582033,10.084261
4,Bergedorf,"118,942[5]",154.8 km²,769,53.483333,10.216667
5,Harburg,"201,119[5]","125,4 km²",1253,53.459259,9.982672
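The merged frame above has six rows although both inputs list seven boroughs. An inner merge silently drops keys that do not match exactly, so it can help to compare the key sets before merging. The sketch below is an addition for illustration, using the borough names exactly as they appear in the two tables above.

```python
# Borough names as they appear in the Wikipedia table (df_hh)
wiki_boroughs = ["Hamburg-Mitte", "Altona", "Eimsbüttel", "Hamburg-Nord",
                 "Wandsbek", "Bergedorf", "Harburg"]

# Borough names as typed into the coordinate table (cord_hh)
coord_boroughs = ["Hamburg-Mitte", "Altona", "Eimsbütttel", "Hamburg-Nord",
                  "Wandsbek", "Bergedorf", "Harburg"]

# Keys present on one side only would be lost in an inner merge
only_in_wiki = sorted(set(wiki_boroughs) - set(coord_boroughs))
only_in_coords = sorted(set(coord_boroughs) - set(wiki_boroughs))

print("Missing coordinates for:", only_in_wiki)
print("No Wikipedia row for:", only_in_coords)
```

On the dataframes themselves, `pd.merge(df_hh, cord_hh, on='Borough', how='outer', indicator=True)` gives the same information through the `_merge` column.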


Let's visualize the data we have so far, the locations of the candidate boroughs:

In [52]:
# create map of Hamburg using latitude and longitude values of boroughs

latitude = 53.550556
longitude = 9.993333

map_hh = folium.Map(location=[latitude,longitude], zoom_start=12)


# add markers to map
for lat, lng, borough in zip(df_hh_base['Latitude'], df_hh_base['Longitude'], df_hh_base['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hh)  
    
map_hh

### Restaurant data for Borough Candidates (Foursquare API)

Let's use the Foursquare API to get info on restaurants in each borough.

In [53]:
#Define Foursquare Credentials and Version
CLIENT_ID = '4UIOJQDCN5IWVX4VK4RWNSKTCYBO5LSJWJDPJAWHLECKGJK5' # your Foursquare ID
CLIENT_SECRET = '4RBRPJNGE5XGR1F2GYFWZ04BNZ2EDZU4MNQ44I33VC0FXKFS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 4UIOJQDCN5IWVX4VK4RWNSKTCYBO5LSJWJDPJAWHLECKGJK5
CLIENT_SECRET:4RBRPJNGE5XGR1F2GYFWZ04BNZ2EDZU4MNQ44I33VC0FXKFS
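Credentials that are hard-coded and printed end up in every shared copy of a notebook. A common alternative is to read them from environment variables and to print only a masked version. The following is a sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are invented for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are an assumption made for this example.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


def mask(secret, visible=4):
    """Show only the first few characters so that output does not leak the full value."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with placeholder values instead of real credentials
demo_env = {"FOURSQUARE_CLIENT_ID": "my-client-id",
            "FOURSQUARE_CLIENT_SECRET": "my-client-secret"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

In the notebook, calling `load_foursquare_credentials()` without arguments would read from the real process environment.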


In [54]:
# Explore boroughs in Hamburg

# Let's create a function to repeat the same process to all the boroughs in Hamburg
# We will use a radius of 2500 meters for each borough

LIMIT = 100 # limit of number of venues returned by Foursquare API

def getNearbyVenues(names, latitudes, longitudes, radius=2500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
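The function above formats the query string by hand and indexes directly into the JSON response, so an error response or a missing key would raise an exception in the middle of the loop. The sketch below shows one possible, more defensive variant of those two steps using only the standard library. `build_explore_url` and `extract_venues` are hypothetical helpers, not part of the original notebook; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=2500, limit=100):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


def extract_venues(payload):
    """Return the venue items of a response, or [] if the expected keys are missing."""
    groups = payload.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []


url = build_explore_url("ID", "SECRET", "20180605", 53.550278, 9.994167)
print(url)
print(extract_venues({"response": {}}))  # an empty response yields an empty list
```

When `requests` is used, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` performs the encoding automatically. Note also that `v['venue']['categories'][0]` in the original function assumes every venue has at least one category.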

In [55]:
# Run the above function on each borough and store the result in a new dataframe called hh_venues

hh_venues = getNearbyVenues(names= df_hh_base['Borough'],
                                   latitudes= df_hh_base['Latitude'],
                                   longitudes= df_hh_base['Longitude']
                                  )

Hamburg-Mitte
Altona
Hamburg-Nord
Wandsbek
Bergedorf
Harburg


In [56]:
# Let's check the size of the resulting dataframe

print(hh_venues.shape)
hh_venues.head()

(530, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hamburg-Mitte,53.550278,9.994167,estancia steaks,53.548581,9.995539,Steakhouse
1,Hamburg-Mitte,53.550278,9.994167,Rathausmarkt,53.550737,9.993503,Plaza
2,Hamburg-Mitte,53.550278,9.994167,Picasso,53.549934,9.995627,Spanish Restaurant
3,Hamburg-Mitte,53.550278,9.994167,Le Lion,53.550125,9.994436,Cocktail Bar
4,Hamburg-Mitte,53.550278,9.994167,Jungfernstieg,53.552862,9.993174,Plaza


In [57]:
hh_venues.groupby('Borough').count()

Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Altona,100,100,100,100,100,100
Bergedorf,56,56,56,56,56,56
Hamburg-Mitte,100,100,100,100,100,100
Hamburg-Nord,100,100,100,100,100,100
Harburg,74,74,74,74,74,74
Wandsbek,100,100,100,100,100,100


In [58]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(hh_venues['Venue Category'].unique())))

There are 144 unique categories.


### Income data for the neighborhoods of Hamburg

We use public geo data from GitHub and income data published by STATISTIKAMT NORD (Statistical Office of Northern Germany).

In [None]:
# Load geo data for Hamburg
hh_geo = requests.get('https://github.com/codeforamerica/click_that_hood/raw/master/public/data/hamburg.geojson').json()
hh_geo

In [60]:
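# Flatten the GeoJSON features into a dataframe with one row per neighborhood
hh_geo_feat = hh_geo['features']
hh_list = json_normalize(hh_geo_feat)
# placeholder column of random values; it is dropped again after the merge with the income data
hh_list['e'] = pd.Series(np.random.randn(len(hh_list)), index=hh_list.index)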
hh_list.drop(['geometry.coordinates', 'geometry.type','properties.cartodb_id', 'properties.created_at' ,'type','properties.updated_at'], axis=1, inplace=True)
print(hh_list.dtypes)
hh_list.head()

properties.name     object
e                  float64
dtype: object


Unnamed: 0,properties.name,e
0,Spadenland,-0.118928
1,Rothenburgsort,-0.024198
2,Neuenfelde,-2.211668
3,Cranz,-0.39811
4,Finkenwerder,-0.771531


In [61]:
# Load and clean income data published by STATISTIKAMT NORD (Statistical Office of Northern Germany)

url = 'https://www.shz.de/regionales/hamburg/mehr-steuerpflichtige-in-hamburg-durchschnittseinkommen-bei-39-054-euro-id18048351.html'
html = requests.get(url).content
df_list1 = pd.read_html(html)

income= df_list1[0]
income.columns = ['properties.name', '1', '2','3','4','income']
income.drop([0, 1], axis=0, inplace=True)
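income.replace(regex=[r'\.'], value='', inplace=True)  # remove the dots used as thousands separators
income = income.astype({'income': 'str'})  # astype returns a new dataframe; it has no inplace option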
income['income'] = income['income'].str.replace(' ','')
income['income'] = pd.to_numeric(income['income'])
income.head(120)

Unnamed: 0,properties.name,1,2,3,4,income
2,Bezirk Hamburg-Mitte,134169,3493867,550324,26041,18808
3,Hamburg-Altstadt,1952,61168,11577,31336,10811
4,HafenCity,1255,116973,34051,93206,57913
5,Neustadt,7015,242164,46861,34521,24715
6,St Pauli,11066,309596,55589,27977,19399
7,St Georg,5683,250742,58371,44121,27161
8,Hammerbrook,1199,23342,3731,19468,10502
9,Borgfelde,3643,95508,15100,26217,21584
10,Hamm,21255,547216,83252,25745,21932
11,Horn,17772,385386,48253,21685,18299
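The source table uses German number formatting, where a dot or a space separates thousands (the article's headline figure is written `39 054`). The helper below is a small sketch of the conversion that the cleaning cell performs; `parse_german_int` is a name made up for this example.

```python
def parse_german_int(text):
    """'34.544' or '18 808' -> int (dots and spaces are thousands separators)."""
    cleaned = text.replace(".", "").replace("\u00a0", "").replace(" ", "")
    return int(cleaned)


print(parse_german_int("34.544"))
print(parse_german_int("18 808"))
```

The same pitfall applies when such values are typed by hand: as a Python literal, `34.544` is a float of roughly 34.5, not the integer 34544.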


In [62]:
result = pd.merge(hh_list, income, on='properties.name', how='left')

# Manually set the income for individual neighborhoods (the index is the row position in `result`)
result.at[4, 'income'] = 31649
result.at[5, 'income'] = 31649
result.at[7, 'income'] = 34544  # was 34.544, a German-style thousands separator read as a decimal point
result.at[101, 'income'] = 34544
result.at[100, 'income'] = 13777
result.at[84, 'income'] = 32295
result.at[28, 'income'] = 32295
result.at[15, 'income'] = 13777
result.at[16, 'income'] = 27977
result.at[35, 'income'] = 44121

# drop the placeholder column and the unused income-table columns
result.drop(columns=['e', '1', '2', '3', '4'], inplace=True)
result.rename(columns = {'properties.name':'neigborhoods'}, inplace=True)

result.head(120)


Unnamed: 0,neigborhoods,income
0,Spadenland,27451.0
1,Rothenburgsort,15460.0
2,Neuenfelde,22909.0
3,Cranz,22852.0
4,Finkenwerder,31649.0
5,Waltershof,31649.0
6,Othmarschen,46527.0
7,Altenwerder,34544.0
8,Altona-Altstadt,23000.0
9,Hausbruch,21355.0


In [63]:
# Map.choropleth is deprecated in newer folium releases; folium.Choropleth is the replacement
folium.Choropleth(
    geo_data=hh_geo,
    data=result,
    columns=['neigborhoods', 'income'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Income level in HH'
).add_to(map_hh)

# display map
map_hh

## Methodology <a name="methodology"></a>

In this project we focus on detecting areas of Hamburg that are close to the center, have only a few Italian restaurants, and are home to wealthy citizens.

For the Foursquare data we limit the analysis to the level of boroughs. For the income data we go down to the level of neighborhoods.

In the first step we collected the required data: the location and type (category) of the venues, restaurants among them, within 2.5 km of each borough's coordinates in Hamburg (at most 100 venues per borough).

In the second step we will determine the seven most common venue categories for each borough and cluster boroughs with similar characteristics. Based on this information we will select the cluster or boroughs closest to the city center with few Italian restaurants.
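One way to make "few Italian restaurants" measurable is to count, per borough, the venues whose category is Italian-like and relate that count to all venues returned. The sketch below illustrates the idea on made-up rows. Treating `Italian Restaurant`, `Pizza Place` and `Trattoria/Osteria` as competition is an assumption of this example, not a decision taken in the notebook, and the borough names and venues are purely illustrative.

```python
from collections import Counter

# Assumption: Foursquare categories that count as "Italian" competition
ITALIAN_CATEGORIES = {"Italian Restaurant", "Pizza Place", "Trattoria/Osteria"}

# Made-up (borough, venue category) pairs, for illustration only
sample_venues = [
    ("Borough A", "Italian Restaurant"),
    ("Borough A", "Pizza Place"),
    ("Borough A", "Café"),
    ("Borough A", "Supermarket"),
    ("Borough B", "Hotel"),
    ("Borough B", "Trattoria/Osteria"),
    ("Borough B", "Park"),
    ("Borough B", "Bakery"),
]

# Total venues and Italian-like venues per borough
total = Counter(borough for borough, _ in sample_venues)
italian = Counter(borough for borough, category in sample_venues
                  if category in ITALIAN_CATEGORIES)

# Share of Italian-like venues, lowest (least crowded) first
italian_share = {borough: italian[borough] / total[borough] for borough in total}
for borough, share in sorted(italian_share.items(), key=lambda item: item[1]):
    print(f"{borough}: {italian[borough]} of {total[borough]} venues ({share:.0%})")
```

On the real data, `hh_venues[hh_venues['Venue Category'].isin(ITALIAN_CATEGORIES)].groupby('Borough').size()` would give the counts. Boroughs without any Italian-like venue do not appear in that result.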

In the third step we will use a map to visualize income at the neighborhood level for the selected borough in order to identify an area where potentially wealthy customers live. Additionally, we will analyze the locations of the existing Italian restaurants so that the suggested neighborhood is not too close to them.


## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional info from our raw Foursquare data.

In [96]:
# Analyze each Borough

# one hot encoding
hh_onehot = pd.get_dummies(hh_venues[['Venue Category']], prefix="", prefix_sep="")

# add Borough column back to dataframe
hh_onehot['Borough'] = hh_venues['Borough'] 

# move Borough column to the first column
fixed_columns = [hh_onehot.columns[-1]] + list(hh_onehot.columns[:-1])

hh_onehot = hh_onehot[fixed_columns]

print(hh_onehot.shape)
hh_onehot.head()

(530, 145)


Unnamed: 0,Borough,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,BBQ Joint,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beach Bar,Beer Bar,Beer Store,Big Box Store,Bistro,Bookstore,Botanical Garden,Brazilian Restaurant,Bridge,Burger Joint,Bus Stop,Café,Canal,Candy Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comfort Food Restaurant,Concert Hall,Cooking School,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Discount Store,Doner Restaurant,Donut Shop,Drugstore,Electronics Store,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Food & Drink Shop,Fountain,French Restaurant,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Hawaiian Restaurant,History Museum,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Light Rail Station,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,Opera House,Organic Grocery,Palace,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Restaurant,River,Rock Club,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Schnitzel Restaurant,Seafood Restaurant,Shopping Mall,Soup Place,Spa,Spanish Restaurant,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swabian Restaurant,Tapas Restaurant,Taverna,Tennis Court,Thai Restaurant,Theater,Track,Train Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Volleyball Court,Water Park,Wine Bar
0,Hamburg-Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Hamburg-Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Hamburg-Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Hamburg-Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Hamburg-Mitte,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [97]:
# Next, let's group rows by Borough and take the mean of each one-hot column, i.e. the relative frequency of each category

hh_grouped = hh_onehot.groupby('Borough').mean().reset_index()
print(hh_grouped.shape)
hh_grouped

(6, 145)


Unnamed: 0,Borough,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,BBQ Joint,Bakery,Bank,Bar,Bavarian Restaurant,Beach,Beach Bar,Beer Bar,Beer Store,Big Box Store,Bistro,Bookstore,Botanical Garden,Brazilian Restaurant,Bridge,Burger Joint,Bus Stop,Café,Canal,Candy Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comfort Food Restaurant,Concert Hall,Cooking School,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Discount Store,Doner Restaurant,Donut Shop,Drugstore,Electronics Store,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish Market,Food & Drink Shop,Fountain,French Restaurant,Fruit & Vegetable Store,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware Store,Hawaiian Restaurant,History Museum,Hookah Bar,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Insurance Office,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Light Rail Station,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,Opera House,Organic Grocery,Palace,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Restaurant,River,Rock Club,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Schnitzel Restaurant,Seafood Restaurant,Shopping Mall,Soup Place,Spa,Spanish Restaurant,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swabian Restaurant,Tapas Restaurant,Taverna,Tennis Court,Thai Restaurant,Theater,Track,Train Station,Trattoria/Osteria,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Volleyball Court,Water Park,Wine Bar
0,Altona,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.03,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.08,0.0,0.01,0.0,0.01,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.05,0.01,0.01,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.02,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.07,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01
1,Bergedorf,0.0,0.0,0.017857,0.0,0.0,0.017857,0.035714,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.053571,0.017857,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.053571,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.053571,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.017857,0.178571,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Hamburg-Mitte,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.05,0.0,0.0,0.0,0.02,0.02,0.07,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.13,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.02,0.0,0.0,0.0
3,Hamburg-Nord,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.14,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.07,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.06,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01
4,Harburg,0.013514,0.0,0.0,0.0,0.0,0.0,0.067568,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.040541,0.013514,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.054054,0.0,0.013514,0.0,0.013514,0.0,0.0,0.013514,0.0,0.013514,0.0,0.027027,0.0,0.0,0.0,0.0,0.013514,0.054054,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.013514,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.027027,0.162162,0.013514,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0
5,Wandsbek,0.0,0.0,0.0,0.02,0.01,0.01,0.04,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.03,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.04,0.04,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.17,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0
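Each value in this table is the mean of a one-hot column, i.e. the share of a borough's returned venues that fall into a category. Because the boroughs returned different numbers of venues (for example 100 for Hamburg-Nord and 74 for Harburg), shares are not directly comparable as counts. The short sketch below, an addition for illustration, converts the `Italian Restaurant` shares of two boroughs back into absolute numbers using values from the tables above.

```python
# Venues returned per borough (from the groupby count shown earlier)
venue_counts = {"Hamburg-Nord": 100, "Harburg": 74}

# Mean one-hot value of 'Italian Restaurant' per borough (from hh_grouped)
italian_freq = {"Hamburg-Nord": 0.04, "Harburg": 0.054054}

# share * number of venues = absolute count (rounded to undo the display rounding)
italian_counts = {
    borough: round(italian_freq[borough] * venue_counts[borough])
    for borough in venue_counts
}
print(italian_counts)
```

Both boroughs turn out to have the same number of Italian restaurants in the sample; Harburg's higher share only reflects its smaller number of returned venues. Since the API call was capped at 100 venues, the counts for the boroughs that reached the cap are a lower bound.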


In [98]:
# Let's print each Borough along with the top 5 most common venues

num_top_venues = 5

for hood in hh_grouped['Borough']:
    print("----"+hood+"----")
    temp = hh_grouped[hh_grouped['Borough'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Altona----
                venue  freq
0                Café  0.08
1  Seafood Restaurant  0.07
2                Park  0.05
3   German Restaurant  0.05
4      Ice Cream Shop  0.05


----Bergedorf----
            venue  freq
0     Supermarket  0.18
1       Drugstore  0.05
2           Hotel  0.05
3            Park  0.05
4  Clothing Store  0.04


----Hamburg-Mitte----
               venue  freq
0              Hotel  0.13
1        Coffee Shop  0.07
2               Café  0.05
3         Restaurant  0.03
4  French Restaurant  0.03


----Hamburg-Nord----
                venue  freq
0                Café  0.14
1                Park  0.07
2         Supermarket  0.06
3              Bakery  0.05
4  Italian Restaurant  0.04


----Harburg----
                venue  freq
0         Supermarket  0.16
1              Bakery  0.07
2            Bus Stop  0.05
3  Italian Restaurant  0.05
4   German Restaurant  0.05


----Wandsbek----
                venue  freq
0         Supermarket  0.17
1      Ice Crea

In [99]:
# Let's put that into a pandas dataframe
# First, let's write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]   

In [100]:
# Now let's create the new dataframe and display the top 7 venues for each Borough.

import numpy as np

num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = hh_grouped['Borough']

for ind in np.arange(hh_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hh_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(18)

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Altona,Café,Seafood Restaurant,Park,German Restaurant,Ice Cream Shop,Bakery,Cocktail Bar
1,Bergedorf,Supermarket,Hotel,Drugstore,Park,Hardware Store,Burger Joint,Big Box Store
2,Hamburg-Mitte,Hotel,Coffee Shop,Café,Steakhouse,French Restaurant,Restaurant,Plaza
3,Hamburg-Nord,Café,Park,Supermarket,Bakery,Ice Cream Shop,Italian Restaurant,Sushi Restaurant
4,Harburg,Supermarket,Bakery,German Restaurant,Bus Stop,Italian Restaurant,Drugstore,Restaurant
5,Wandsbek,Supermarket,Ice Cream Shop,Hotel,Bakery,Drugstore,Café,Greek Restaurant


Let's cluster the boroughs to see which of them share similar characteristics.

In [101]:
# Cluster Boroughs

# import k-means from clustering stage
from sklearn.cluster import KMeans

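# Run k-means to cluster the boroughs into 3 clusters.

# set number of clusters
kclusters = 3

# axis must be passed by keyword; the positional form drop('Borough', 1) no longer works in pandas 2
hh_grouped_clustering = hh_grouped.drop('Borough', axis=1)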

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:18] 

array([1, 0, 2, 1, 0, 0], dtype=int32)
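The label array follows the row order of `hh_grouped`, which is sorted alphabetically by borough. A small sketch, added here for illustration, makes the assignment explicit by grouping the borough names by the labels printed above.

```python
from collections import defaultdict

# Boroughs in the row order of hh_grouped, and the labels printed above
boroughs = ["Altona", "Bergedorf", "Hamburg-Mitte", "Hamburg-Nord", "Harburg", "Wandsbek"]
labels = [1, 0, 2, 1, 0, 0]

# Collect the boroughs that share a cluster label
clusters = defaultdict(list)
for borough, label in zip(boroughs, labels):
    clusters[label].append(borough)

for label in sorted(clusters):
    print(label, clusters[label])
```

The label numbers themselves carry no meaning and may change with a different `random_state`. With only six boroughs and 144 category columns, the grouping should be read as a rough orientation rather than a robust segmentation.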

In [102]:
#check data types
print(neighborhoods_venues_sorted.dtypes)
df_hh_base.dtypes

Borough                  object
1st Most Common Venue    object
2nd Most Common Venue    object
3rd Most Common Venue    object
4th Most Common Venue    object
5th Most Common Venue    object
6th Most Common Venue    object
7th Most Common Venue    object
dtype: object


Borough        object
Population     object
Area (km²)     object
Density         int64
Latitude      float64
Longitude     float64
dtype: object

In [103]:
# Let's create a new dataframe that includes the cluster as well as the top 7 venues for each Borough.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

hh_merged = df_hh_base

# join the sorted venues and cluster labels onto df_hh_base to add latitude/longitude for each Borough
hh_merged = hh_merged.join(neighborhoods_venues_sorted.set_index('Borough'), on='Borough')

print(hh_merged.shape)
hh_merged.head() 

(6, 14)


Unnamed: 0,Borough,Population,Area (km²),Density,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Hamburg-Mitte,"233,114[5]",107.1 km²,2177,53.550278,9.994167,2,Hotel,Coffee Shop,Café,Steakhouse,French Restaurant,Restaurant,Plaza
1,Altona,"243,972[5]",78.3 km²,3149,53.55,9.933333,1,Café,Seafood Restaurant,Park,German Restaurant,Ice Cream Shop,Bakery,Cocktail Bar
2,Hamburg-Nord,"279,498[5]",57.8 km²,4838,53.593611,10.0,1,Café,Park,Supermarket,Bakery,Ice Cream Shop,Italian Restaurant,Sushi Restaurant
3,Wandsbek,"409,771[5]",147.5 km²,2777,53.582033,10.084261,0,Supermarket,Ice Cream Shop,Hotel,Bakery,Drugstore,Café,Greek Restaurant
4,Bergedorf,"118,942[5]",154.8 km²,769,53.483333,10.216667,0,Supermarket,Hotel,Drugstore,Park,Hardware Store,Burger Joint,Big Box Store


In [104]:
# Examine Clusters 
# Examine each cluster and determine the venue categories that distinguish it. Based on the defining categories, each cluster can then be given a name.

hh_merged.rename(columns={"Cluster Labels": "ClusterLabels"}, inplace=True)

# Cluster 0 - boroughs where supermarkets are the most common venue
print('Cluster 0')
hh_merged[hh_merged.ClusterLabels == 0]


Cluster 0


Unnamed: 0,Borough,Population,Area (km²),Density,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
3,Wandsbek,"409,771[5]",147.5 km²,2777,53.582033,10.084261,0,Supermarket,Ice Cream Shop,Hotel,Bakery,Drugstore,Café,Greek Restaurant
4,Bergedorf,"118,942[5]",154.8 km²,769,53.483333,10.216667,0,Supermarket,Hotel,Drugstore,Park,Hardware Store,Burger Joint,Big Box Store
5,Harburg,"201,119[5]","125,4 km²",1253,53.459259,9.982672,0,Supermarket,Bakery,German Restaurant,Bus Stop,Italian Restaurant,Drugstore,Restaurant


In [105]:
print('Cluster 1')
hh_merged[hh_merged.ClusterLabels == 1]


Cluster 1


Unnamed: 0,Borough,Population,Area (km²),Density,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
1,Altona,"243,972[5]",78.3 km²,3149,53.55,9.933333,1,Café,Seafood Restaurant,Park,German Restaurant,Ice Cream Shop,Bakery,Cocktail Bar
2,Hamburg-Nord,"279,498[5]",57.8 km²,4838,53.593611,10.0,1,Café,Park,Supermarket,Bakery,Ice Cream Shop,Italian Restaurant,Sushi Restaurant


In [106]:
print('Cluster 2')
hh_merged[hh_merged.ClusterLabels == 2]


Cluster 2


Unnamed: 0,Borough,Population,Area (km²),Density,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue
0,Hamburg-Mitte,"233,114[5]",107.1 km²,2177,53.550278,9.994167,2,Hotel,Coffee Shop,Café,Steakhouse,French Restaurant,Restaurant,Plaza


In [107]:
hh_venues.groupby('Borough').count()

Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Altona,100,100,100,100,100,100
Bergedorf,56,56,56,56,56,56
Hamburg-Mitte,100,100,100,100,100,100
Hamburg-Nord,100,100,100,100,100,100
Harburg,74,74,74,74,74,74
Wandsbek,100,100,100,100,100,100
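
Four boroughs return exactly 100 venues, which equals the `limit=100` that appears in the Foursquare search URL later in this notebook, so those counts may be capped rather than complete. The following is a minimal sketch that flags such boroughs. It copies the counts from the table above and assumes (this is not confirmed by the notebook) that the venue query used the same limit of 100; `boroughs_at_limit` is a hypothetical helper name.

```python
# Venue counts per borough, copied from the groupby output above.
venue_counts = {
    "Altona": 100,
    "Bergedorf": 56,
    "Hamburg-Mitte": 100,
    "Hamburg-Nord": 100,
    "Harburg": 74,
    "Wandsbek": 100,
}

# Assumption: the venue query used the same limit as the search URL (limit=100).
LIMIT = 100


def boroughs_at_limit(counts, limit):
    """Return the boroughs whose venue count reached the API limit."""
    return sorted(name for name, n in counts.items() if n >= limit)


capped = boroughs_at_limit(venue_counts, LIMIT)
print(capped)
```

If the assumption holds, a count of 100 only says "at least 100 venues", so only Bergedorf and Harburg can be compared directly by venue count.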


In [108]:
# create map

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hh_merged['Latitude'], hh_merged['Longitude'], hh_merged['Borough'], hh_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**Borough Cluster Analysis**

When you look at the clusters and boroughs data result above, you see that

* Cluster 0 (red dots on the map: Wandsbek, Bergedorf and Harburg) is a more suburban area, where the most common venue is the supermarket, followed by hotels. These boroughs are also quite far from the city center. In addition, Bergedorf and Harburg return fewer than 100 venues, which indicates that there are only a few restaurants in these boroughs. This makes sense given the low population density of these two boroughs.
* Cluster 1 (purple dots on the map: Altona and Hamburg-Nord) covers boroughs that are closer to the center and have a high population density. The most common venue is the café. These are urban parts of Hamburg.
* Cluster 2 (light green dot on the map: Hamburg-Mitte) is a borough where the most common venue is the hotel, followed by coffee shops and cafés. This borough is the center of Hamburg, with a high density of people living there.
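
The statement that Cluster 0 lies further from the center than Cluster 1 can be checked with the borough coordinates from the tables above. The sketch below uses the haversine formula and treats the Hamburg-Mitte coordinates as the city center, which is an assumption made for illustration; `haversine_km` is a hypothetical helper, not part of the notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# (latitude, longitude, cluster label) copied from the borough tables above.
boroughs = {
    "Hamburg-Mitte": (53.550278, 9.994167, 2),
    "Altona": (53.55, 9.933333, 1),
    "Hamburg-Nord": (53.593611, 10.0, 1),
    "Wandsbek": (53.582033, 10.084261, 0),
    "Bergedorf": (53.483333, 10.216667, 0),
    "Harburg": (53.459259, 9.982672, 0),
}

# Assumption: the Hamburg-Mitte coordinates stand in for the city center.
center_lat, center_lon, _ = boroughs["Hamburg-Mitte"]

distances = {
    name: haversine_km(center_lat, center_lon, lat, lon)
    for name, (lat, lon, _) in boroughs.items()
}

# Print boroughs from nearest to farthest.
for name, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name} (cluster {boroughs[name][2]}): {dist:.1f} km")
```

With these coordinates, both Cluster 1 boroughs come out closer to the center than any Cluster 0 borough, which matches the reading of the map.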

**We continue the analysis with Cluster 2, because it is the only cluster without Italian restaurants among the top 7 venues.**
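
The claim above can be verified directly from the cluster tables. The sketch below is written in plain Python so that it runs without the notebook state; the venue lists are copied from the tables above, and `clusters_with_category` is a hypothetical helper name.

```python
# Cluster label and top-7 venue categories per borough, copied from the tables above.
top_venues = {
    "Wandsbek": (0, ["Supermarket", "Ice Cream Shop", "Hotel", "Bakery",
                     "Drugstore", "Café", "Greek Restaurant"]),
    "Bergedorf": (0, ["Supermarket", "Hotel", "Drugstore", "Park",
                      "Hardware Store", "Burger Joint", "Big Box Store"]),
    "Harburg": (0, ["Supermarket", "Bakery", "German Restaurant", "Bus Stop",
                    "Italian Restaurant", "Drugstore", "Restaurant"]),
    "Altona": (1, ["Café", "Seafood Restaurant", "Park", "German Restaurant",
                   "Ice Cream Shop", "Bakery", "Cocktail Bar"]),
    "Hamburg-Nord": (1, ["Café", "Park", "Supermarket", "Bakery",
                         "Ice Cream Shop", "Italian Restaurant", "Sushi Restaurant"]),
    "Hamburg-Mitte": (2, ["Hotel", "Coffee Shop", "Café", "Steakhouse",
                          "French Restaurant", "Restaurant", "Plaza"]),
}


def clusters_with_category(rows, category):
    """Map each cluster label to the boroughs that list `category` in their top venues."""
    hits = {}
    for borough, (cluster, venues) in rows.items():
        hits.setdefault(cluster, [])
        if category in venues:
            hits[cluster].append(borough)
    return hits


hits = clusters_with_category(top_venues, "Italian Restaurant")
for cluster in sorted(hits):
    print(cluster, hits[cluster])
```

Harburg (Cluster 0) and Hamburg-Nord (Cluster 1) list Italian restaurants among their top 7 venues; Hamburg-Mitte (Cluster 2) does not.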

So, let's look at the income distribution for Cluster 2


In [109]:
# Map.choropleth() is deprecated in newer folium releases; folium.Choropleth replaces it
folium.Choropleth(
    geo_data=hh_geo,
    data=result,
    # first column holds the key matched against key_on, second column the value to colour by
    columns=['neigborhoods','income'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Income Level in HH'
).add_to(map_clusters)

# display map

map_clusters
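
A choropleth only colours an area when the name in the GeoJSON (`feature.properties.name`) matches a value in the key column of the data exactly. The sketch below shows one way to list mismatches before plotting. The toy GeoJSON and the name list are hypothetical placeholders, not the notebook's `hh_geo` or `result`, and `unmatched_names` is an illustrative helper.

```python
def unmatched_names(geojson, data_names, prop="name"):
    """Return (names only in the GeoJSON, names only in the data), both sorted."""
    geo_names = {feature["properties"][prop] for feature in geojson["features"]}
    data_names = set(data_names)
    return sorted(geo_names - data_names), sorted(data_names - geo_names)


# Hypothetical toy input: the two sources spell one neighborhood differently.
toy_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "HafenCity"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "St. Pauli"}, "geometry": None},
    ],
}
toy_names = ["Hafencity", "St. Pauli"]

only_geo, only_data = unmatched_names(toy_geo, toy_names)
print("only in GeoJSON:", only_geo)
print("only in data:   ", only_data)
```

In the notebook, the same check could be run with the loaded GeoJSON and `result['neigborhoods']`, assuming `hh_geo` is a parsed dictionary rather than a file path.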



**Income Analysis of Neighborhoods**

Looking at Cluster 2, we see that the neighborhood "Hafencity" is a place where wealthy people live and which is quite close to the city center. So let's select this neighborhood for further analysis.
Let's check whether there are Italian restaurants near this neighborhood in the borough Hamburg-Mitte.



In [117]:
# This is the center of the Borough Hamburg-Mitte
address = 'Rathausmarkt 11, Hamburg, Germany'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

53.5507842 9.99309983077213


In [118]:
search_query = 'Italian'
radius = 500

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=53.5507842,9.99309983077213&v=20180605&query=Italian&radius=500&limit=100'
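
Formatting the credentials directly into the URL string makes them easy to leak whenever the URL is displayed. A minimal sketch of an alternative, using only the standard library: the query string is built with `urlencode`, the credentials are read from environment variables, and a masked copy is used for display. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions for illustration.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lon, query, radius, limit, version="20180605"):
    """Build the venue-search URL with credentials taken from the environment."""
    params = {
        # Assumption: credentials are stored in these (hypothetical) variables.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": f"{lat},{lon}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


def mask_credentials(url):
    """Return a copy of `url` that is safe to print, with credentials replaced."""
    parts = urlsplit(url)
    masked = [
        (key, "REDACTED" if key in ("client_id", "client_secret") else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(masked)))


url = build_search_url(53.5507842, 9.99309983077213, "Italian", 500, 100)
print(mask_credentials(url))
```

The unmasked `url` is what would be passed to `requests.get`; only the masked version is shown in the output cell.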

In [119]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5db1768ef9dbde0038b62ff0'},
 'response': {'venues': [{'id': '4c73c8fe9e706dcb8a464e62',
    'name': "L'Italiana Gelateria",
    'location': {'address': 'Ballindamm 40',
     'lat': 53.55192574233942,
     'lng': 9.994833353388508,
     'labeledLatLngs': [{'label': 'display',
       'lat': 53.55192574233942,
       'lng': 9.994833353388508}],
     'distance': 171,
     'postalCode': '20095',
     'cc': 'DE',
     'neighborhood': 'Hamburg-Altstadt',
     'city': 'Hamburg',
     'state': 'Hamburg',
     'country': 'Deutschland',
     'formattedAddress': ['Ballindamm 40', '20095 Hamburg', 'Deutschland']},
    'categories': [{'id': '4bf58dd8d48988d1c9941735',
      'name': 'Ice Cream Shop',
      'pluralName': 'Ice Cream Shops',
      'shortName': 'Ice Cream',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/icecream_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1571911310',
    'hasPerk': False}]}}

In [120]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId
0,"[{'id': '4bf58dd8d48988d1c9941735', 'name': 'I...",False,4c73c8fe9e706dcb8a464e62,Ballindamm 40,DE,Hamburg,Deutschland,171,"[Ballindamm 40, 20095 Hamburg, Deutschland]","[{'label': 'display', 'lat': 53.55192574233942...",53.551926,9.994833,Hamburg-Altstadt,20095,Hamburg,L'Italiana Gelateria,v-1571911310


In [121]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
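# take a copy so that the column assignment below does not trigger a SettingWithCopyWarning
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    # some responses keep the list under 'categories', others under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']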

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,L'Italiana Gelateria,Ice Cream Shop,Ballindamm 40,DE,Hamburg,Deutschland,171,"[Ballindamm 40, 20095 Hamburg, Deutschland]","[{'label': 'display', 'lat': 53.55192574233942...",53.551926,9.994833,Hamburg-Altstadt,20095,Hamburg,4c73c8fe9e706dcb8a464e62
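
The query `Italian` matches venue names as well as categories, so a hit is not necessarily a restaurant. The sketch below filters a venue list by its primary category. The single venue is a trimmed copy of the response shown above, and `restaurants_only` is a hypothetical helper.

```python
# Trimmed copy of the single venue returned by the search above.
venues = [
    {
        "name": "L'Italiana Gelateria",
        "categories": [{"name": "Ice Cream Shop", "primary": True}],
        "location": {"distance": 171},
    },
]


def restaurants_only(venues):
    """Return the names of venues whose first listed category contains 'Restaurant'."""
    kept = []
    for venue in venues:
        categories = venue.get("categories", [])
        primary = categories[0]["name"] if categories else ""
        if "restaurant" in primary.lower():
            kept.append(venue["name"])
    return kept


italian_restaurants = restaurants_only(venues)
print(italian_restaurants)
```

For the response above, the list is empty: the only hit within 500 m of the center is an ice cream shop.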


**Final Analysis result**

Within Cluster 2, the search returns only one Italian venue close to the center, and it is an ice cream shop (L'Italiana Gelateria) rather than a restaurant. It is also already quite far away from our selected neighborhood Hafencity.

So, the conclusion of this analysis is that we suggest our stakeholder open their first restaurant in the neighborhood Hafencity, a location that satisfies the defined requirements:
*	a location that is not already crowded with Italian restaurants
*	an area where potential customers with a high income live
*	a location close to the city center, provided the first two conditions are met


## Results and Discussion <a name="results"></a>

The Borough Cluster Analysis gives us good insight into the different areas of Hamburg. We found a larger suburban area, Cluster 0 (with the boroughs Wandsbek, Bergedorf and Harburg), where there are not so many restaurants (the most common venues are supermarkets and hotels) and a lower density of people living there. Another cluster, Cluster 1 (with the boroughs Altona and Hamburg-Nord), is a bit closer to the city center and has a higher density of people living there.

Finally, there is Cluster 2 with the borough Hamburg-Mitte, which includes the city center of Hamburg. This cluster has a high density of people living there, and its most common venues are hotels, followed by coffee shops and cafés. Additionally, this cluster is the only one that has no Italian restaurants among the top 7 venues.

Looking at the map with the income visualization, we see that in Cluster 2 the neighborhood "Hafencity" is a place where wealthy people live. Additionally, it is quite close to the city center and currently has no Italian restaurant listed in the Foursquare data.

So, the conclusion of this analysis is that we suggest our stakeholder open their first restaurant in the neighborhood “Hafencity”, a location that satisfies the defined requirements of the business problem:
*	a location that is not already crowded with Italian restaurants
*	an area where potential customers with a high income live
*	a location close to the city center, provided the first two conditions are met


## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify an area in Hamburg that is close to the city center, has only a few Italian restaurants and is home to wealthy residents. Based on our data and analysis, we identified such a neighborhood: the “Hafencity”.

The final decision on the optimal restaurant location within the neighborhood “Hafencity” will be made by the stakeholders, based on which premises are available for rent.

Looking back at this analysis, we could get even better data and probably more insights if we analyzed the Foursquare data at neighborhood level. This was not needed for the current business problem. However, if the Mama’s Pizza chain wants to expand its footprint in Hamburg, this analysis would be the next step to find several locations for new Italian restaurants.

A further analysis could test whether the Foursquare data for Hamburg is well maintained. My gut feeling is that there are more Italian restaurants in Hamburg than Foursquare reports. We could try other sources such as Yelp.

