## Introduction

My project is about opening a new fitness studio in Hamburg, Germany. Going to fitness studios and building muscle is very popular, which makes opening a fitness studio a trending business.
Hamburg has 104 areas (Stadtteile) in 7 regions (Bezirke). Based on the Foursquare dataset and the frequency of existing studios, we will decide where to open the new studio.

## Data Description

As datasets, I am going to use Wikipedia and Foursquare. The following Wikipedia page lists the areas and their coordinates:
https://de.wikipedia.org/wiki/Liste_der_Bezirke_und_Stadtteile_Hamburgs
To obtain latitude and longitude values, the geopy `Nominatim` geolocator will be used.
For the fitness studio locations, I am going to use Foursquare data.

## Methodology

In [1]:
```python
import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files

#!conda install -c conda-forge geopy --yes  # uncomment this line if geopy is not installed
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes  # uncomment this line if folium is not installed
import folium  # map rendering library

print('Libraries imported.')
```

Libraries imported.


Data Gathering:

In [35]:
```python
df = pd.read_html('https://de.wikipedia.org/wiki/Liste_der_Bezirke_und_Stadtteile_Hamburgs')[1]
```

In [36]:
```python
df.head()
```

| | Stadtteil | Ortsteile | Bezirk | Fläche(km²) | Einwohner | Bevölkerungsdichte(Einwohner/km²) | Koordinaten | Karte |
|---|---|---|---|---|---|---|---|---|
| 0 | Hamburg-Altstadt | | Hamburg-Mitte | | 2350.0 | 979 | 53° 33′ 0″ N, 10° 0′ 0″ O | |
| 1 | HafenCity | | Hamburg-Mitte | | 4925.0 | 2239 | 53° 32′ 28″ N, 10° 0′ 1″ O | |
| 2 | Neustadt | | Hamburg-Mitte | | 12.762 | 5549 | 53° 33′ 7″ N, 9° 59′ 8″ O | |
| 3 | St. Pauli | | Hamburg-Mitte | | 22.097 | 8839 | 53° 33′ 25″ N, 9° 57′ 50″ O | |
| 4 | St. Georg | | Hamburg-Mitte | | 11.358 | 4733 | 53° 33′ 18″ N, 10° 0′ 44″ O | |
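The `Einwohner` (population) values above mix formats: Hamburg-Altstadt shows 2350.0 while Neustadt shows 12.762. The German Wikipedia table writes thousands with a dot, so `12.762` means 12,762 inhabitants, but `pd.read_html` parses it with its defaults `thousands=','` and `decimal='.'`. Passing `thousands='.'` and `decimal=','` should fix this. The sketch below demonstrates the two keywords with `pd.read_csv` on a small inline sample copied from the rows above, so it runs offline; the sample string is ours, not the live page.

```python
from io import StringIO

import pandas as pd

# Inline sample mirroring the first rows of the Wikipedia table (German number format).
sample = StringIO(
    "Stadtteil;Einwohner\n"
    "Hamburg-Altstadt;2350\n"
    "HafenCity;4925\n"
    "Neustadt;12.762\n"
    "St. Pauli;22.097\n"
    "St. Georg;11.358\n"
)

# thousands='.' strips the German thousands separator; decimal=',' marks the decimal comma.
population = pd.read_csv(sample, sep=';', thousands='.', decimal=',')
print(population)

# pd.read_html accepts the same two keywords, e.g.:
# df = pd.read_html('https://de.wikipedia.org/wiki/Liste_der_Bezirke_und_Stadtteile_Hamburgs',
#                   thousands='.', decimal=',')[1]
```

With these settings the population column becomes integer inhabitants, so areas can be compared by population directly.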


Getting rid of unnecessary columns:

In [37]:
```python
df = df.drop(columns=['Ortsteile', 'Fläche(km²)', 'Bevölkerungsdichte(Einwohner/km²)', 'Karte', 'Koordinaten'])
```

In [38]:
```python
df.head()
```

| | Stadtteil | Bezirk | Einwohner |
|---|---|---|---|
| 0 | Hamburg-Altstadt | Hamburg-Mitte | 2350.0 |
| 1 | HafenCity | Hamburg-Mitte | 4925.0 |
| 2 | Neustadt | Hamburg-Mitte | 12.762 |
| 3 | St. Pauli | Hamburg-Mitte | 22.097 |
| 4 | St. Georg | Hamburg-Mitte | 11.358 |


In [39]:
```python
df.shape
```

(104, 3)

Renaming the columns:

In [40]:
```python
df.rename(columns={"Stadtteil": "Areas", "Bezirk": "Region", "Einwohner": "Population"}, inplace=True)
```

In [44]:
```python
df.head()
```

| | Areas | Region | Population |
|---|---|---|---|
| 0 | Hamburg-Altstadt | Hamburg-Mitte | 2350.0 |
| 1 | HafenCity | Hamburg-Mitte | 4925.0 |
| 2 | Neustadt | Hamburg-Mitte | 12.762 |
| 3 | St. Pauli | Hamburg-Mitte | 22.097 |
| 4 | St. Georg | Hamburg-Mitte | 11.358 |


Obtaining latitude and longitude values for each area:

In [65]:
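```python
from geopy.extra.rate_limiter import RateLimiter

# create the geocoder once; Nominatim's usage policy allows at most 1 request per second
geolocator = Nominatim(user_agent="hamburg_explorer")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

Latitude = []
Longitude = []
for area, region in zip(df['Areas'], df['Region']):
    location = geocode(area + ', ' + region)
    Latitude.append(location.latitude)
    Longitude.append(location.longitude)
```

In [66]:
```python
df['Latitude'] = Latitude
df['Longitude'] = Longitude
```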
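Nominatim returns `None` when it cannot resolve a query, and the loop above would then stop with an `AttributeError` on `location.latitude`. A minimal sketch of a more defensive version, assuming a hypothetical helper `geocode_areas` that takes any geocoding callable (for example `geolocator.geocode`, or a `RateLimiter` wrapped around it) and records `NaN` for unresolved areas instead of failing. The dictionary lookup below is an offline stand-in for the real geocoder, and "Nowhere" is a made-up area name.

```python
import math
from collections import namedtuple


def geocode_areas(areas, regions, geocode):
    """Hypothetical helper: look up 'area, region' strings, using NaN for misses.

    `geocode` is any callable returning an object with .latitude/.longitude,
    or None when the query cannot be resolved (as geopy geocoders do).
    """
    latitudes, longitudes, missing = [], [], []
    for area, region in zip(areas, regions):
        location = geocode(area + ', ' + region)
        if location is None:
            # keep the list aligned with the DataFrame rows, but mark the area as unresolved
            latitudes.append(math.nan)
            longitudes.append(math.nan)
            missing.append(area)
        else:
            latitudes.append(location.latitude)
            longitudes.append(location.longitude)
    return latitudes, longitudes, missing


# Offline stand-in for geolocator.geocode, for illustration only.
Point = namedtuple('Point', ['latitude', 'longitude'])
fake_lookup = {'Neustadt, Hamburg-Mitte': Point(53.549881, 9.979048)}

lats, lngs, missing = geocode_areas(
    ['Neustadt', 'Nowhere'], ['Hamburg-Mitte', 'Hamburg-Mitte'], fake_lookup.get
)
print(lats, lngs, missing)
```

With the real geocoder the call would be `geocode_areas(df['Areas'], df['Region'], geolocator.geocode)`; the `missing` list then shows which area names need a manually adjusted query.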


In [67]:
```python
df.head()
```

| | Areas | Region | Population | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Hamburg-Altstadt | Hamburg-Mitte | 2350.0 | 53.550468 | 9.99464 |
| 1 | HafenCity | Hamburg-Mitte | 4925.0 | 53.542913 | 9.995835 |
| 2 | Neustadt | Hamburg-Mitte | 12.762 | 53.549881 | 9.979048 |
| 3 | St. Pauli | Hamburg-Mitte | 22.097 | 53.550796 | 9.970075 |
| 4 | St. Georg | Hamburg-Mitte | 11.358 | 53.556993 | 10.014162 |


In [68]:
```python
df.tail()
```

| | Areas | Region | Population | Latitude | Longitude |
|---|---|---|---|---|---|
| 99 | Hausbruch | Harburg | 17.036 | 53.471441 | 9.880114 |
| 100 | Neugraben-Fischbek | Harburg | 31.589 | 53.473958 | 9.842648 |
| 101 | Francop | Harburg | 715.0 | 53.504095 | 9.877607 |
| 102 | Neuenfelde | Harburg | 4927.0 | 53.518241 | 9.807916 |
| 103 | Cranz | Harburg | 804.0 | 53.537285 | 9.777929 |


In [69]:
```python
address = 'Hamburg'

geolocator = Nominatim(user_agent="hamburg_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hamburg are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of Hamburg are 53.5437641, 10.0099133.


Visualisation of the Areas on the Map

In [70]:
```python
# create map of Hamburg using latitude and longitude values
map_hamburg = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker for each area
for lat, lng, area, region in zip(df['Latitude'], df['Longitude'], df['Areas'], df['Region']):
    label = '{}, {}'.format(region, area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hamburg)

map_hamburg
```

Foursquare API Call:

In [71]:
```python
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']  # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)  # the secret is deliberately not printed
```


In [145]:
```python
LIMIT = 100    # maximum number of venues returned per area
radius = 1000  # search radius in meters
```

In [146]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each area and collect the venues."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Areas',
                             'Areas Latitude',
                             'Areas Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```
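The list comprehension inside `getNearbyVenues` assumes every venue has at least one category; `v['venue']['categories'][0]` raises an `IndexError` otherwise. A sketch of the parsing step as a separate function, assuming the response layout the code above already relies on (`response` → `groups[0]` → `items` → `venue`). The function name `parse_explore_items` and the sample payload are hypothetical, built only to exercise the logic offline; they are not real API output.

```python
def parse_explore_items(payload, area, lat, lng):
    """Hypothetical helper: turn one explore response into venue tuples.

    Venues without any category are skipped instead of raising IndexError.
    """
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        if not venue.get('categories'):
            continue  # no category to classify the venue by
        rows.append((
            area,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            venue['categories'][0]['name'],
        ))
    return rows


# Hand-made payload in the shape the code above expects (not real API output).
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Studio A', 'location': {'lat': 53.55, 'lng': 9.98},
               'categories': [{'name': 'Gym'}]}},
    {'venue': {'name': 'Unlabelled place', 'location': {'lat': 53.56, 'lng': 9.99},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample_payload, 'Neustadt', 53.549881, 9.979048)
print(rows)
```

Inside the loop, `venues_list.append(parse_explore_items(requests.get(url).json(), name, lat, lng))` could then replace the comprehension.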

Getting Information for Venues:

In [147]:
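```python
# radius is not passed here, so the function default of 500 m is used
# (the global radius = 1000 defined above is not applied)
df_venues = getNearbyVenues(names=df['Areas'],
                            latitudes=df['Latitude'],
                            longitudes=df['Longitude'])
```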

```
Hamburg-Altstadt
HafenCity
Neustadt
St. Pauli
St. Georg
Hammerbrook
Borgfelde
Hamm
Horn
Billstedt
Billbrook
Rothenburgsort
Veddel
Wilhelmsburg
Kleiner Grasbrook
Steinwerder
Waltershof
Finkenwerder
Neuwerk
Altona-Altstadt
Sternschanze
Altona-Nord
Ottensen
Bahrenfeld
Groß Flottbek
Othmarschen
Lurup
Osdorf
Nienstedten
Blankenese
Iserbrook
Sülldorf
Rissen
Eimsbüttel
Rotherbaum
Harvestehude
Hoheluft-West
Lokstedt
Niendorf
Schnelsen
Eidelstedt
Stellingen
Hoheluft-Ost
Eppendorf
Groß Borstel
Alsterdorf
Winterhude
Uhlenhorst
Hohenfelde
Barmbek-Süd
Dulsberg
Barmbek-Nord
Ohlsdorf
Fuhlsbüttel
Langenhorn
Eilbek
Wandsbek
Marienthal
Jenfeld
Tonndorf
Farmsen-Berne
Bramfeld
Steilshoop
Wellingsbüttel
Sasel
Poppenbüttel
Hummelsbüttel
Lemsahl-Mellingstedt
Duvenstedt
Wohldorf-Ohlstedt
Bergstedt
Volksdorf
Rahlstedt
Lohbrügge
Bergedorf
Curslack
Altengamme
Neuengamme
Kirchwerder
Ochsenwerder
Reitbrook
Allermöhe
Billwerder
Moorfleet
Tatenberg
Spadenland
Neuallermöhe
Harburg
Neuland
Gut Moor
Wilstorf
Rönneburg
```


In [148]:
```python
df_venues.head()
```

| | Areas | Areas Latitude | Areas Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Hamburg-Altstadt | 53.550468 | 9.99464 | Le Lion | 53.550125 | 9.994436 | Cocktail Bar |
| 1 | Hamburg-Altstadt | 53.550468 | 9.99464 | Rathausmarkt | 53.550737 | 9.993503 | Plaza |
| 2 | Hamburg-Altstadt | 53.550468 | 9.99464 | Picasso | 53.549934 | 9.995627 | Spanish Restaurant |
| 3 | Hamburg-Altstadt | 53.550468 | 9.99464 | estancia steaks | 53.548581 | 9.995539 | Steakhouse |
| 4 | Hamburg-Altstadt | 53.550468 | 9.99464 | Le Plat du Jour | 53.548773 | 9.994295 | French Restaurant |


In [149]:
```python
print('There are {} unique categories.'.format(len(df_venues['Venue Category'].unique())))
```

There are 258 unique categories.


In [150]:
```python
df_venues.groupby('Venue Category').count()
```

| Venue Category | Areas | Areas Latitude | Areas Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Accessories Store | 1 | 1 | 1 | 1 | 1 | 1 |
| Adult Boutique | 1 | 1 | 1 | 1 | 1 | 1 |
| Afghan Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |
| American Restaurant | 2 | 2 | 2 | 2 | 2 | 2 |
| Antique Shop | 1 | 1 | 1 | 1 | 1 | 1 |
| Arepa Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |
| Art Gallery | 3 | 3 | 3 | 3 | 3 | 3 |
| Art Museum | 1 | 1 | 1 | 1 | 1 | 1 |
| Arts & Crafts Store | 3 | 3 | 3 | 3 | 3 | 3 |
| Asian Restaurant | 24 | 24 | 24 | 24 | 24 | 24 |
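`groupby(...).count()` repeats the same number in every column. When only the number of venues per category is needed, `value_counts()` on the category column gives a single Series sorted by frequency, and `nunique()` gives the number of distinct categories. A small sketch on a hand-made frame; the rows are illustrative, not the Foursquare results.

```python
import pandas as pd

# Illustrative rows in the same layout as df_venues (not real results).
venues = pd.DataFrame({
    'Areas': ['Neustadt', 'Neustadt', 'St. Georg', 'Hammerbrook'],
    'Venue Category': ['Gym', 'Asian Restaurant', 'Plaza', 'Gym'],
})

# One count per category, most frequent first.
category_counts = venues['Venue Category'].value_counts()
print(category_counts)

# Number of distinct categories, equivalent to len(unique()).
n_categories = venues['Venue Category'].nunique()
print(n_categories)
```

On the real data, `df_venues['Venue Category'].value_counts()` would list the most common categories first instead of alphabetically.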


Foursquare uses several different category names for fitness venues, which would split them across categories in our analysis, so we combine them into a single category:

In [151]:
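```python
# Foursquare categories that we treat as fitness venues
fitness_categories = ['Gym', 'Yoga Studio', 'Climbing Gym', 'Gym Pool', 'Gym / Fitness Center']
# a single replace on the category column is enough; no loop is needed
df_venues['Venue Category'] = df_venues['Venue Category'].replace(fitness_categories, 'Fitness Studio')
```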


In [117]:
```python
df_venues.groupby('Venue Category').count()
```

| Venue Category | Area | Area Latitude | Area Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Accessories Store | 1 | 1 | 1 | 1 | 1 | 1 |
| Adult Boutique | 1 | 1 | 1 | 1 | 1 | 1 |
| Afghan Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |
| American Restaurant | 2 | 2 | 2 | 2 | 2 | 2 |
| Antique Shop | 1 | 1 | 1 | 1 | 1 | 1 |
| Arepa Restaurant | 1 | 1 | 1 | 1 | 1 | 1 |
| Art Gallery | 3 | 3 | 3 | 3 | 3 | 3 |
| Art Museum | 1 | 1 | 1 | 1 | 1 | 1 |
| Arts & Crafts Store | 3 | 3 | 3 | 3 | 3 | 3 |
| Asian Restaurant | 24 | 24 | 24 | 24 | 24 | 24 |


In [199]:
```python
fitness = df_venues['Venue Category'] == 'Fitness Studio'
df_fitness = df_venues[fitness].copy()  # copy so later column inserts don't act on a slice of df_venues
```

In [168]:
```python
df_fitness.head()
```

| | Areas | Areas Latitude | Areas Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 185 | Neustadt | 53.549881 | 9.979048 | MeridianSpa | 53.546301 | 9.980929 | Fitness Studio |
| 392 | St. Georg | 53.556993 | 10.014162 | Le Royal Meridien Hamburg Fitness Centre | 53.5586 | 10.007644 | Fitness Studio |
| 416 | Hammerbrook | 53.546815 | 10.026559 | Betriebssportverband Hamburg | 53.54783 | 10.03042 | Fitness Studio |
| 420 | Hammerbrook | 53.546815 | 10.026559 | InnoFit | 53.543685 | 10.029404 | Fitness Studio |
| 488 | Wilhelmsburg | 53.498473 | 10.006859 | Schwimmhalle Inselpark | 53.495517 | 10.001777 | Fitness Studio |


In [154]:
```python
df_fitness.shape
```

(27, 7)
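The introduction proposes choosing a location by how frequent fitness studios are. A minimal sketch of that count, assuming `df` and `df_fitness` as built above: count the studios per area, then reindex against all areas so that areas without a studio show an explicit 0. The two small stand-in frames reuse rows from the tables above and are only illustrative.

```python
import pandas as pd

# Stand-ins for df (all areas) and df_fitness (fitness venues only).
areas = pd.DataFrame({'Areas': ['Hamburg-Altstadt', 'Neustadt', 'St. Georg', 'Hammerbrook']})
fitness = pd.DataFrame({
    'Areas': ['Neustadt', 'St. Georg', 'Hammerbrook', 'Hammerbrook'],
    'Venue': ['MeridianSpa', 'Le Royal Meridien Hamburg Fitness Centre',
              'Betriebssportverband Hamburg', 'InnoFit'],
})

# Studios per area; areas missing from the fitness data get 0, fewest studios first.
studios_per_area = (
    fitness.groupby('Areas').size()
    .reindex(areas['Areas'], fill_value=0)
    .sort_values()
)
print(studios_per_area)
```

On the real data this would be `df_fitness.groupby('Areas').size().reindex(df['Areas'], fill_value=0)`.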

## Clustering Fitness Studios

In [200]:
```python
# set number of clusters
kclusters = 3

df_grouped_clustering = df_fitness[['Venue Latitude', 'Venue Longitude']]

# run k-means clustering; n_init=10 pins the pre-1.4 scikit-learn default explicitly
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(df_grouped_clustering)

# check the cluster labels generated for the first rows of the dataframe
kmeans.labels_[0:10]
```

array([1, 0, 0, 0, 0, 1, 1, 1, 1, 1])
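k-means uses Euclidean distance, so clustering raw latitude/longitude treats one degree of longitude like one degree of latitude. At Hamburg's latitude (about 53.54°N, from the geocoding above) a degree of longitude spans only about cos(53.54°) ≈ 0.59 of a degree of latitude. A sketch of an equirectangular projection to kilometers before clustering, assuming a mean Earth radius of 6371 km; this is a swapped-in preprocessing step, not what the notebook does, and the helper name `to_local_km` is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)


def to_local_km(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Equirectangular projection of lat/lon (degrees) to x/y in km around a reference point."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    ref_lat = np.radians(ref_lat_deg)
    ref_lon = np.radians(ref_lon_deg)
    x = EARTH_RADIUS_KM * (lon - ref_lon) * np.cos(ref_lat)  # east-west, shrunk by cos(latitude)
    y = EARTH_RADIUS_KM * (lat - ref_lat)                    # north-south
    return np.column_stack([x, y])


# One degree east and one degree north of Hamburg's center (53.5437641, 10.0099133).
points = to_local_km([53.5437641, 54.5437641], [11.0099133, 10.0099133], 53.5437641, 10.0099133)
print(points.round(1))
```

The projected array could then be passed to `KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(...)` instead of the raw degrees, for example `to_local_km(df_fitness['Venue Latitude'], df_fitness['Venue Longitude'], latitude, longitude)`. Over an area the size of Hamburg the approximation should be adequate, but the resulting cluster labels may differ from the ones reported here.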

Let's attach the cluster labels to the 27 fitness venues and join them onto the areas dataframe, so that each area carries the cluster of the studios found around it.

In [201]:
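```python
cluster_labels = kmeans.labels_.astype(int)  # avoid shadowing the built-in name `list`
```

In [202]:
```python
# add the k-means label as the first column of the fitness venues
df_fitness.insert(0, 'Cluster Labels', cluster_labels)

# left-join the fitness venues onto all areas; areas without a studio get NaN
hamburg_merged = df.join(df_fitness.set_index('Areas'), on='Areas')

hamburg_merged.head()
```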

| | Areas | Region | Population | Latitude | Longitude | Cluster Labels | Areas Latitude | Areas Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hamburg-Altstadt | Hamburg-Mitte | 2350.0 | 53.550468 | 9.99464 | | | | | | | |
| 1 | HafenCity | Hamburg-Mitte | 4925.0 | 53.542913 | 9.995835 | | | | | | | |
| 2 | Neustadt | Hamburg-Mitte | 12.762 | 53.549881 | 9.979048 | 1.0 | 53.549881 | 9.979048 | MeridianSpa | 53.546301 | 9.980929 | Fitness Studio |
| 3 | St. Pauli | Hamburg-Mitte | 22.097 | 53.550796 | 9.970075 | | | | | | | |
| 4 | St. Georg | Hamburg-Mitte | 11.358 | 53.556993 | 10.014162 | 0.0 | 53.556993 | 10.014162 | Le Royal Meridien Hamburg Fitness Centre | 53.5586 | 10.007644 | Fitness Studio |


In [203]:
```python
# areas without any fitness studio get the sentinel label 4
hamburg_merged['Cluster Labels'] = hamburg_merged['Cluster Labels'].fillna(4)
```

In [204]:
```python
hamburg_merged.head()
```

| | Areas | Region | Population | Latitude | Longitude | Cluster Labels | Areas Latitude | Areas Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hamburg-Altstadt | Hamburg-Mitte | 2350.0 | 53.550468 | 9.99464 | 4.0 | | | | | | |
| 1 | HafenCity | Hamburg-Mitte | 4925.0 | 53.542913 | 9.995835 | 4.0 | | | | | | |
| 2 | Neustadt | Hamburg-Mitte | 12.762 | 53.549881 | 9.979048 | 1.0 | 53.549881 | 9.979048 | MeridianSpa | 53.546301 | 9.980929 | Fitness Studio |
| 3 | St. Pauli | Hamburg-Mitte | 22.097 | 53.550796 | 9.970075 | 4.0 | | | | | | |
| 4 | St. Georg | Hamburg-Mitte | 11.358 | 53.556993 | 10.014162 | 0.0 | 53.556993 | 10.014162 | Le Royal Meridien Hamburg Fitness Centre | 53.5586 | 10.007644 | Fitness Studio |
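The sentinel label 4 marks the areas for which no fitness venue was found, which are the candidate locations for a new studio. Those areas can also be listed directly with `isin`, without the join and the sentinel. A sketch with illustrative stand-in frames built from the rows shown above:

```python
import pandas as pd

# Stand-ins for df and df_fitness (illustrative rows taken from the tables above).
areas = pd.DataFrame({
    'Areas': ['Hamburg-Altstadt', 'HafenCity', 'Neustadt', 'St. Pauli', 'St. Georg'],
    'Region': ['Hamburg-Mitte'] * 5,
})
fitness = pd.DataFrame({'Areas': ['Neustadt', 'St. Georg']})

# Areas that do not appear among the fitness venues at all.
candidates = areas.loc[~areas['Areas'].isin(fitness['Areas']), ['Areas', 'Region']]
print(candidates)
```

On the real data, `df.loc[~df['Areas'].isin(df_fitness['Areas'])]` would give the full list of areas without a studio.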


## Visualisation of the Clustered Fitness Studios & Areas Without Fitness Studios

In [207]:
```python
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: kclusters + 2 = 5 evenly spaced colors from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters + 2))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; each area is colored by its cluster label
for lat, lon, poi, cluster in zip(hamburg_merged['Latitude'], hamburg_merged['Longitude'], hamburg_merged['Areas'], hamburg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[int(cluster - 1)],
        fill=True,
        fill_color=rainbow[int(cluster - 1)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```
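The expression `int(cluster - 1)` relies on Python's negative indexing: with five colors, label 0 selects `rainbow[-1]`, the last color of the palette. A short check of which palette position each label ends up with, in plain Python without plotting:

```python
kclusters = 3
n_colors = kclusters + 2  # same palette size as the map cell above

# Labels used on the map: k-means clusters 0..2 plus the sentinel 4 for "no studio".
labels = [0, 1, 2, 4]

# Position actually picked by rainbow[int(cluster - 1)], normalised to 0..n_colors-1.
palette_position = {label: int(label - 1) % n_colors for label in labels}
print(palette_position)
```

Position 2 is never used. In matplotlib's `rainbow` colormap, positions 0, 1, 3 and 4 fall in the violet-blue, light blue, orange and red parts of the scale, which appears to match the colors named in the Results section.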

## Results

Based on the analysis, the orange-colored areas are candidates for opening a new fitness studio: they are the areas where no fitness studio was found, so they could not be assigned to any cluster.
On the other hand, the dark blue areas lie in the city center and, as expected, contain many fitness studios. After dark blue, the red and light blue areas also appear to have relatively high numbers of fitness studios.

## Discussion

When we evaluate the results, we have to consider the reliability of the venue data. As someone living in Hamburg, I would expect the city to have considerably more fitness studios than the 27 found here. Still, the results are plausible in that the density of fitness studios increases toward the city center.
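One concrete reason for the low count follows from the query settings: `getNearbyVenues` searches a 500 m circle around a single point per area (with at most `LIMIT = 100` venues), while the areas themselves are larger. Using the population and density columns of the Wikipedia table (reading Neustadt's `12.762` as 12,762 inhabitants, since the table uses a dot as the thousands separator), a rough estimate of the share of each area covered by the search circle, assuming the circle lies entirely inside the area:

```python
import math

RADIUS_KM = 0.5  # default radius of getNearbyVenues

# (population, density in inhabitants/km²) from the Wikipedia table above
areas = {
    'Hamburg-Altstadt': (2350, 979),
    'Neustadt': (12762, 5549),
    'St. Pauli': (22097, 8839),
}

circle_km2 = math.pi * RADIUS_KM ** 2  # about 0.785 km²
coverage = {}
for name, (population, density) in areas.items():
    area_km2 = population / density  # area implied by population / density
    coverage[name] = circle_km2 / area_km2

for name, share in coverage.items():
    print(f'{name}: {share:.0%} of the area lies inside the search circle')
```

For these three central areas only about a third of the area is searched, so studios near the area borders are likely missed.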