# Capstone Project - Dance Studio in Budapest

## Table of contents

* [Business Problem and Background](#introduction)
* [Data](#data)
* [Methodology and analysis](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Business Problem and Background <a name="introduction"></a>

This project aims to identify a suitable location for a dance studio in Budapest. The topic comes from my brother, a dance teacher who was looking for a place to open his own studio some years ago. At that time I had nothing to do with data science, but transposing the situation to the present, I wonder whether and how data and I could help him make a close-to-best decision. 

So my stakeholder is my brother as he was back then, but the results of the analysis could interest anyone with similar plans in the capital of Hungary today (well, once the current COVID situation is over, as dancing is a very contact-intensive hobby). 

According to my brother, competition has always been "strong" in the dancing business. He claims there were, and still are, many such studios in Budapest relative to the population of the city. I will check whether the data confirms his statement, and I will search for areas that have no or only a few dance studios nearby, but which are easily accessible to many people without a car, as parking is almost impossible in Budapest.

Finding the single best spot is not my intention, as there are many other factors to take into account, some of them less data driven. Instead, I will attempt to locate the top 3 neighborhoods to help narrow down the list of potential areas for a new dance studio. 

## Data <a name="data"></a>

As per the problem setting above, I will focus on the below aspects: 

What data would be useful/needed?   
1. The number and location of existing dance studios in Budapest as of today.   
2. The distance of the studios from the centre of the city.  
3. The average ratings the studios received from the users.  
4. Crime data of the neighborhoods/districts.  
5. Entry fees or hourly rates of the studios.           
       
       
What data are available?   
1. With a bit of probing I found that Foursquare holds varied but limited information about places, such as location (address, geo coordinates), tips, reviews and ratings. The useful items for my purpose are the coordinates and the ratings, so I will concentrate on retrieving those via the Foursquare API. 

2. A crime heatmap of the districts would help to find a reasonably secure location. Some venues, such as night clubs, may attract crime to a specific area to a certain degree. Although the opening hours of dance studios (mainly evenings) and night clubs (late night) do not overlap, this would be worth looking into. Unfortunately, I did not find any available dataset of criminal records, either on Foursquare or on any other website, so I put this aspect aside for the current analysis. 
3. As for the prices, I have no information about the fees and rates of the studios that Foursquare lists. Hence, I postpone this aspect too for a later study.  

So, as mentioned above, the source of the data will be Foursquare. I will first capture the coordinates of Budapest, then pull the list of venues in the city, transform the JSON response into a dataframe and clean it for the purpose (filter on the dance studios, drop the unnecessary columns, etc.). I skip a separate feature selection step in this project, since the clustering uses only the coordinates, and concentrate on the pre-processing instead. 

Then I will visualize the clusters on a folium map to gain a deeper understanding of the distribution of the studios and potentially find further aspects to investigate. 

#### Importing the packages needed for the project

In [2]:
import pandas as pd 
import numpy as np

# !conda install -c conda-forge folium=0.5.0 --yes
import folium 
from folium import plugins
from folium.plugins import HeatMap

import requests

from sklearn.cluster import KMeans
# json_normalize is called as pd.json_normalize further down, so no separate import is needed

import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install geopy
# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

print('Libraries imported.')

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/ab/97/25def417bf5db4cc6b89b47a56961b893d4ee4fec0c335f5b9476a8ff153/geopy-1.22.0-py2.py3-none-any.whl (113kB)
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-1.22.0
Libraries imported.


#### Obtaining the latitude and longitude coordinates of Budapest 

In [3]:
address = 'Budapest'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

47.4983815 19.0404707


#### Defining the Foursquare credentials and version for the API call

In [4]:
CLIENT_ID = '200MTYYTLYAZ1KQSP2HEL0XSOTY3H2W1V5IGZ4AKJSNAS2SH'
CLIENT_SECRET = 'A45KMHQR5A1S1RYP0FN4WUS5Y1J3JSGA2E3WYB1Y1U5XTFVC' 
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 200MTYYTLYAZ1KQSP2HEL0XSOTY3H2W1V5IGZ4AKJSNAS2SH
CLIENT_SECRET: A45KMHQR5A1S1RYP0FN4WUS5Y1J3JSGA2E3WYB1Y1U5XTFVC


#### Building the URL (filtering the search for Dance studios and setting the radius) for the API call

In [5]:
search_query = 'Dance studio '
radius = 5000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=200MTYYTLYAZ1KQSP2HEL0XSOTY3H2W1V5IGZ4AKJSNAS2SH&client_secret=A45KMHQR5A1S1RYP0FN4WUS5Y1J3JSGA2E3WYB1Y1U5XTFVC&ll=47.4983815,19.0404707&v=20180604&query=Dance studio &radius=5000&limit=100'
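
The URL above is assembled with plain string formatting, so the space inside the query is left unescaped. As a minimal sketch (not the approach used in this notebook), the same request could be built with `urllib.parse.urlencode`, which percent-encodes every value. The helper name `build_search_url` and the placeholder credentials are assumptions made for illustration only.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a venue search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),   # "lat,lng" pair expected by the endpoint
        'v': version,
        'query': query.strip(),           # drop stray leading/trailing spaces
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


# Placeholder credentials (hypothetical), real coordinates from the cell above
example_url = build_search_url(
    'YOUR_ID', 'YOUR_SECRET', 47.4983815, 19.0404707,
    '20180604', 'Dance studio ', 5000, 100,
)
print(example_url)
```

An alternative with the same effect is to pass the dictionary to `requests.get(base_url, params=params)` and let `requests` do the encoding.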

#### Executing the API call and storing the results in a variable

In [6]:
results = requests.get(url).json()
# results

### Pre-processing and cleaning

#### Slice the venue part from the JSON file and transform it into a pandas dataframe

In [7]:
venues = results['response']['venues']
df_raw = pd.json_normalize(venues)
df_raw.head(3)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.city,location.state,location.postalCode,location.crossStreet,venuePage.id,location.neighborhood
0,53bd64d7498e6200782fb06c,Aerialarts Pole dance studio,"[{'id': '52e81612bcbc57f1066b7a2e', 'name': 'S...",v-1589313429,False,47.494974,19.021891,"[{'label': 'display', 'lat': 47.49497421290476...",1447,HU,Magyarország,[Magyarország],,,,,,,
1,50c2397ce4b0dbacdb37117f,MIRAVOS Dance Studio,"[{'id': '4bf58dd8d48988d134941735', 'name': 'D...",v-1589313429,False,47.510802,19.033793,"[{'label': 'display', 'lat': 47.51080193056089...",1470,HU,Magyarország,"[Bajcsy-Zsilinszky út 66, Magyarország]",Bajcsy-Zsilinszky út 66,,,,,,
2,52d25d9511d2066ed9545fc3,Professional Pole Dance Stúdió,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",v-1589313429,False,47.507802,19.05834,"[{'label': 'display', 'lat': 47.50780189592716...",1704,HU,Magyarország,"[1066. Budapest, Jókai utca 26., Magyarország]","1066. Budapest, Jókai utca 26.",,,,,,


#### Tidy up the category so that it displays the relevant part, the name only

In [8]:
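def get_category_type(row):
    # The search results keep the categories under 'categories';
    # fall back to 'venue.categories' for results that nest them under 'venue'.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']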

# apply the category cut for all rows
df_raw['categories'] = df_raw.apply(get_category_type, axis=1)

df_raw.head(3)

Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.city,location.state,location.postalCode,location.crossStreet,venuePage.id,location.neighborhood
0,53bd64d7498e6200782fb06c,Aerialarts Pole dance studio,Sports Club,v-1589313429,False,47.494974,19.021891,"[{'label': 'display', 'lat': 47.49497421290476...",1447,HU,Magyarország,[Magyarország],,,,,,,
1,50c2397ce4b0dbacdb37117f,MIRAVOS Dance Studio,Dance Studio,v-1589313429,False,47.510802,19.033793,"[{'label': 'display', 'lat': 47.51080193056089...",1470,HU,Magyarország,"[Bajcsy-Zsilinszky út 66, Magyarország]",Bajcsy-Zsilinszky út 66,,,,,,
2,52d25d9511d2066ed9545fc3,Professional Pole Dance Stúdió,Gym,v-1589313429,False,47.507802,19.05834,"[{'label': 'display', 'lat': 47.50780189592716...",1704,HU,Magyarország,"[1066. Budapest, Jókai utca 26., Magyarország]","1066. Budapest, Jókai utca 26.",,,,,,


#### Filter out the needed columns and rename them

In [30]:
# Take an explicit copy so that the in-place rename does not act on a view of df_raw
df_col_filtered = df_raw[['name','location.lat', 'location.lng','location.distance','location.address','id','categories']].copy()
df_col_filtered.rename(columns={'location.lat':'lat','location.lng':'lng','location.distance':'dist','location.address':'addr','categories':'cat'}, inplace=True)

In [29]:
df_col_filtered.head(3)

Unnamed: 0,name,lat,lng,dist,addr,id,cat
0,Aerialarts Pole dance studio,47.494974,19.021891,1447,,53bd64d7498e6200782fb06c,Sports Club
1,MIRAVOS Dance Studio,47.510802,19.033793,1470,Bajcsy-Zsilinszky út 66,50c2397ce4b0dbacdb37117f,Dance Studio
2,Professional Pole Dance Stúdió,47.507802,19.05834,1704,"1066. Budapest, Jókai utca 26.",52d25d9511d2066ed9545fc3,Gym


#### See the size of the dataframe

In [10]:
df_col_filtered.shape

(50, 7)

#### Retrieve the ratings for the dance studios

In [12]:
ids = []
for venue_id in df_col_filtered['id']:
    # One venue-details call per studio
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

    result = requests.get(url).json()
    try:
        print(result['response']['venue']['rating'])
    except KeyError:
        # Either the venue has no rating or the response carries no venue at all
        print('No rating.')

    ids.append(venue_id)

ids

No rating.
No rating.
No rating.
...
(the same line is printed for each of the 50 venues)


['53bd64d7498e6200782fb06c',
 '50c2397ce4b0dbacdb37117f',
 '52d25d9511d2066ed9545fc3',
 '540f3e5a498e888bed60c920',
 '5c7f5fcd3e6741002c9cd073',
 '5ba3f822dd70c5002c50b72a',
 '4d4b0c8d2c42a0935222f0e6',
 '53569e25498e03ae3876d311',
 '4f063f85722e1319aa7165af',
 '50b8965729a65951ae8d9012',
 '513cc51d8acab2c9e2bd007c',
 '5a1313ebdff81533d1dcbc1d',
 '4e6e29c9b61cf2b03c3b570d',
 '4f763edae4b0ca8cfbdbf4de',
 '55cc75e6498ef9c69312e360',
 '5609a037498e2f64e30524ce',
 '56b1205d237c532a7b98e1b7',
 '4bdaecce3c4fef3ba5b4f4bf',
 '5e1f235816790d0007cb92a1',
 '4e60f01b091afcffd98424c3',
 '53dffa7e498e875cb3614f0d',
 '4ee0ffcaf7903c92cdf6006a',
 '4da891be1e72d9bb47492407',
 '5227780211d2c6dbd6ebdf6e',
 '4e5faa50d164ed2b48f1ae93',
 '4f4b7a86e4b0cccede68db9e',
 '575fd62c498ec8a99ced2788',
 '4fba7078e4b0be7f03b9bc7b',
 '505c8180e4b0b55be988e721',
 '4bb06d37f964a52026453ce3',
 '5613fc37498ee51d7d91c4f7',
 '57cf0aef498eb312d0037eee',
 '526838f811d2f7cb427dba47',
 '4d8bd8d7809e6ea8fd869323',
 '4f3953a3e4b0

So it seems I am not going to be able to use ratings as there are none for the selected/filtered studios. 
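
One caveat on the loop above: it prints "No rating." both when a venue has no rating and when the response contains no venue at all. As a hedged sketch, the two cases could be told apart by also looking at the `meta` code of the response. The helper name `extract_rating` and the three sample payloads are made up for illustration; they only mimic the shape of the responses used above, and the code 429 is just an example of a non-200 status.

```python
def extract_rating(payload):
    """Return the venue rating from a venue-details response, or None if absent."""
    venue = payload.get('response', {}).get('venue', {})
    return venue.get('rating')


# Hypothetical payloads shaped like the JSON handled in the loop above
with_rating = {'meta': {'code': 200}, 'response': {'venue': {'id': 'abc', 'rating': 8.1}}}
without_rating = {'meta': {'code': 200}, 'response': {'venue': {'id': 'abc'}}}
failed_call = {'meta': {'code': 429}, 'response': {}}

for payload in (with_rating, without_rating, failed_call):
    # A code other than 200 means the call itself did not succeed
    print(payload['meta']['code'], extract_rating(payload))
```

If every call came back with a non-200 code, the conclusion about missing ratings would need to be revisited; with code 200 and no `rating` key, the venue simply has no rating.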

## Methodology and analysis <a name="methodology"></a>

In this project I focus on analysing the density of dance studios. I will first use folium to visualize their spread with a heatmap, looking for the sparsely covered areas. Then, as we are dealing with location data, I will use the k-means clustering machine learning technique to group the studios by location and see whether further patterns or insights become visible that help in exploring the optimal location. 

### Exploratory data analysis

#### How many of those 50 places have "dance" in their names?  
Going through the lines I saw dance being spelled with both capital and lower case "d", so let's filter on both!

In [13]:
condition1 = df_col_filtered['name'].str.contains('Dance')    
df_dance1 = df_col_filtered[condition1]

condition2 = df_col_filtered['name'].str.contains('dance')    
df_dance2 = df_col_filtered[condition2]

print (df_dance1.shape)
print (df_dance2.shape)

(19, 7)
(4, 7)


Let's join them into one dataframe and take another look!

In [14]:
df_dance = pd.concat([df_dance1, df_dance2])
df_dance

Unnamed: 0,name,lat,lng,dist,addr,id,cat
1,MIRAVOS Dance Studio,47.510802,19.033793,1470,Bajcsy-Zsilinszky út 66,50c2397ce4b0dbacdb37117f,Dance Studio
2,Professional Pole Dance Stúdió,47.507802,19.05834,1704,"1066. Budapest, Jókai utca 26.",52d25d9511d2066ed9545fc3,Gym
3,Polex Pole Dance Studio,47.514809,19.050858,1988,Hollán Ernő utca 11,540f3e5a498e888bed60c920,Gym / Fitness Center
4,V25 Dance Studio,47.515266,19.056236,2222,Visegrádi u. 25,5c7f5fcd3e6741002c9cd073,Dance Studio
5,R3D ONE Dance Studio,47.507559,19.066498,2207,68 Andrássy út,5ba3f822dd70c5002c50b72a,Dance Studio
6,Urban Dance Studio,47.515741,19.056367,2272,Őr u. 1.,4d4b0c8d2c42a0935222f0e6,Gym / Fitness Center
7,Master Dance Studio,47.517904,19.057819,2534,,53569e25498e03ae3876d311,Athletics & Sports
8,"Dance Studio, Rokk Szelard 21",47.491835,19.109429,5237,"Rokk Szelard, 21",4f063f85722e1319aa7165af,Nightclub
9,MayaDance Studio,47.495777,19.065013,1868,Rákóczi út 20.,50b8965729a65951ae8d9012,Dance Studio
11,Quality Dance TSE,47.508871,19.056647,1686,,5a1313ebdff81533d1dcbc1d,Dance Studio
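
The two filters and the concatenation above can also be expressed as a single case-insensitive match. This is a minimal sketch on a toy dataframe, not a replacement for the cells above: three of the names are taken from the rows shown earlier, and 'Example Yoga Room' is an invented non-matching entry.

```python
import pandas as pd

# Toy data: three real names from the output above plus one invented non-match
toy = pd.DataFrame({
    'name': [
        'MIRAVOS Dance Studio',
        'Aerialarts Pole dance studio',
        'MayaDance Studio',
        'Example Yoga Room',
    ]
})

# case=False matches "Dance" and "dance" alike; na=False treats missing names as non-matches
mask = toy['name'].str.contains('dance', case=False, na=False)
toy_dance = toy[mask]
print(toy_dance)
```

A single mask also rules out duplicated rows, which the two-step approach would produce for a name containing both spellings.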


#### Performing a sanity check on the categories to see if there is any to close out

In [15]:
df_dance['cat'].value_counts()

Dance Studio            7
Gym / Fitness Center    4
Nightclub               3
Gym                     3
Sports Club             2
Hookah Bar              1
Athletics & Sports      1
Salsa Club              1
Strip Club              1
Name: cat, dtype: int64

The categories vary, yet the names of the places suggest a similar scope, so many of those not labelled as dance studios might still represent competition. Even pole dance venues could be "dangerous", as they can be an alternative for the female target audience. Let's remove only the night club-like places. Those are: Nightclub (3), Strip Club (1), Hookah Bar (1)  

In [16]:
# Drop the night club-like categories in one pass;
# .copy() gives an independent dataframe for the column changes made later
nightlife_categories = ['Nightclub', 'Strip Club', 'Hookah Bar']
df_dance5 = df_dance[~df_dance['cat'].isin(nightlife_categories)].copy()

df_dance5

Unnamed: 0,name,lat,lng,dist,addr,id,cat
1,MIRAVOS Dance Studio,47.510802,19.033793,1470,Bajcsy-Zsilinszky út 66,50c2397ce4b0dbacdb37117f,Dance Studio
2,Professional Pole Dance Stúdió,47.507802,19.05834,1704,"1066. Budapest, Jókai utca 26.",52d25d9511d2066ed9545fc3,Gym
3,Polex Pole Dance Studio,47.514809,19.050858,1988,Hollán Ernő utca 11,540f3e5a498e888bed60c920,Gym / Fitness Center
4,V25 Dance Studio,47.515266,19.056236,2222,Visegrádi u. 25,5c7f5fcd3e6741002c9cd073,Dance Studio
5,R3D ONE Dance Studio,47.507559,19.066498,2207,68 Andrássy út,5ba3f822dd70c5002c50b72a,Dance Studio
6,Urban Dance Studio,47.515741,19.056367,2272,Őr u. 1.,4d4b0c8d2c42a0935222f0e6,Gym / Fitness Center
7,Master Dance Studio,47.517904,19.057819,2534,,53569e25498e03ae3876d311,Athletics & Sports
9,MayaDance Studio,47.495777,19.065013,1868,Rákóczi út 20.,50b8965729a65951ae8d9012,Dance Studio
11,Quality Dance TSE,47.508871,19.056647,1686,,5a1313ebdff81533d1dcbc1d,Dance Studio
12,Astoria Green Dance Centrum,47.491454,19.060629,1701,Magyar utca 36.,4e6e29c9b61cf2b03c3b570d,Gym


#### See some main stats about the distance

In [17]:
df_dance5.describe()

Unnamed: 0,lat,lng,dist
count,18.0,18.0,18.0
mean,47.505164,19.050118,1776.388889
std,0.008564,0.015751,361.047681
min,47.491454,19.021891,1275.0
25%,47.495577,19.038059,1488.75
50%,47.507109,19.056507,1693.5
75%,47.510714,19.060162,2015.0
max,47.517904,19.066498,2534.0


So the studios are on average 1776 metres from the city centre, with a standard deviation of 361 metres. This indicates that the dance studios are not right in the so-called centre, but a bit further away. I'll use the folium map to visualize them and see if there is any noticeable pattern in their location.   
The maximum of 2534 metres hints at the radius we could have chosen earlier on, and suggests that the 5000-metre radius did not exclude any of the remaining studios. 
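
The `dist` values come straight from Foursquare and appear to be measured from the search coordinates. As a sanity check, the distance can be recomputed from the latitude and longitude with the haversine formula. This is a minimal sketch: the function name `haversine_m` and the Earth radius of 6,371 km are my own choices, and the two coordinate pairs are the city centre and MIRAVOS Dance Studio from the outputs above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


centre = (47.4983815, 19.0404707)    # coordinates returned by the geocoder above
miravos = (47.510802, 19.033793)     # MIRAVOS Dance Studio, dist = 1470 in the table

d = haversine_m(*centre, *miravos)
print(round(d))
```

The result should land within a few metres of the 1470 reported by Foursquare, which supports reading `dist` as the straight-line distance from the centre point.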

#### See the dance studios on a map

In [18]:
map_bp = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, in zip(df_dance5['lat'], df_dance5['lng']):
    folium.Circle([lat, lng], radius=50, color='blue', fill=False).add_to(map_bp)
    
map_bp

#### Now, see them on a heatmap to better illustrate their density

In [19]:
map_bp = folium.Map(location=[latitude, longitude], zoom_start = 13) 

heat_df = df_dance5[['lat', 'lng']]
heat_df = heat_df.dropna(axis=0, subset=['lat','lng'])

# List comprehension to make out list of lists
heat_data = [[row['lat'],row['lng']] for index, row in heat_df.iterrows()]

# Plot it on the map
HeatMap(heat_data).add_to(map_bp)

map_bp

### Clustering dance studios in Budapest

As the number of studios is not that high, I'll run the clustering twice: first with 3 clusters and then, depending on how that works, with a slightly higher number.

In [20]:
# First, let's try with 3 clusters
kclusters = 3

dance_clustering = df_dance5[['lat','lng']]
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dance_clustering)

kmeans.labels_[0:10] 

array([0, 1, 1, 1, 1, 1, 1, 2, 1, 2], dtype=int32)

In [21]:
df_dance5.insert(0,'Clusters3',kmeans.labels_)

In [22]:
df_dance5.head(5)

Unnamed: 0,Clusters3,name,lat,lng,dist,addr,id,cat
1,0,MIRAVOS Dance Studio,47.510802,19.033793,1470,Bajcsy-Zsilinszky út 66,50c2397ce4b0dbacdb37117f,Dance Studio
2,1,Professional Pole Dance Stúdió,47.507802,19.05834,1704,"1066. Budapest, Jókai utca 26.",52d25d9511d2066ed9545fc3,Gym
3,1,Polex Pole Dance Studio,47.514809,19.050858,1988,Hollán Ernő utca 11,540f3e5a498e888bed60c920,Gym / Fitness Center
4,1,V25 Dance Studio,47.515266,19.056236,2222,Visegrádi u. 25,5c7f5fcd3e6741002c9cd073,Dance Studio
5,1,R3D ONE Dance Studio,47.507559,19.066498,2207,68 Andrássy út,5ba3f822dd70c5002c50b72a,Dance Studio


In [23]:
# Visualize the clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers
markers_colors = []
for lat, lng, cluster in zip(df_dance5['lat'], df_dance5['lng'],df_dance5['Clusters3']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Those to the right of the river Danube seem to be grouped fine, but those to the left (Buda side) look odd. It is evident that the top ones and the bottom ones are quite far from each other, so they were probably grouped together because of the low number of clusters. To find out, I increase the number of clusters a bit, say to 5, but first delete the existing cluster column, which is not going to be used any more. 

In [31]:
df_dance5.drop(['Clusters3'], axis=1, inplace=True)

In [25]:
kclusters = 5

dance_clustering = df_dance5[['lat','lng']]
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dance_clustering)

kmeans.labels_[0:10] 

array([2, 1, 4, 4, 1, 4, 4, 3, 1, 3], dtype=int32)

In [26]:
df_dance5.insert(0,'Clusters5',kmeans.labels_)

In [27]:
df_dance5.head(5)

Unnamed: 0,Clusters5,name,lat,lng,dist,addr,id,cat
1,2,MIRAVOS Dance Studio,47.510802,19.033793,1470,Bajcsy-Zsilinszky út 66,50c2397ce4b0dbacdb37117f,Dance Studio
2,1,Professional Pole Dance Stúdió,47.507802,19.05834,1704,"1066. Budapest, Jókai utca 26.",52d25d9511d2066ed9545fc3,Gym
3,4,Polex Pole Dance Studio,47.514809,19.050858,1988,Hollán Ernő utca 11,540f3e5a498e888bed60c920,Gym / Fitness Center
4,4,V25 Dance Studio,47.515266,19.056236,2222,Visegrádi u. 25,5c7f5fcd3e6741002c9cd073,Dance Studio
5,1,R3D ONE Dance Studio,47.507559,19.066498,2207,68 Andrássy út,5ba3f822dd70c5002c50b72a,Dance Studio


In [28]:
# Visualize the clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers
markers_colors = []
for lat, lng, cluster in zip(df_dance5['lat'], df_dance5['lng'],df_dance5['Clusters5']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
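
The two cluster counts above were chosen by eye. A common complementary check is to compare the within-cluster sum of squares (`inertia_` in scikit-learn) across several values of k and look for the point where the improvement flattens. The sketch below uses synthetic coordinates, NOT the studio data, and the group centres, spread and range of k are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic points (not the studios): three tight groups of six points each
rng = np.random.RandomState(0)
centres = np.array([[47.49, 19.02], [47.51, 19.06], [47.52, 19.03]])
points = np.vstack([c + rng.normal(scale=0.002, size=(6, 2)) for c in centres])

# Fit k-means for several k and record the within-cluster sum of squares
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 6))
```

With only 18 studios such a curve would be a rough guide at best, so the visual inspection on the map remains the main argument here.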

## Results and Discussion <a name="results"></a>

The map above offers a number of insights.  

First, although what counts as "many" is relative, and my biased brother would certainly dispute this statement, I find the number of dance studios in Budapest (at least of those listed by Foursquare, 18 after the filtering) quite reasonable, and definitely not excessive. 

The relatively large distance from the city centre seems confirmed: the centre point sits around the letter "a" of the word "Budapest" in the middle of the map, and the studios are not located in the deepest downtown. This makes sense, bearing in mind that those areas consist mainly of very narrow streets, cater mostly to tourists, and/or have limited and difficult public transport for large numbers of people. The green area just below the word "Budapest" (called Tabán) is a park, and above it there is a castle where locals do not go unless they live there. The situation is the same on the Pest side: the Parliament is there, along with the shopping streets, which we locals like to avoid.  

It is also eye-catching that the existing studios tend to be on, or very close to, the inner and middle boulevards of the city. As can be seen, they lie more or less along a circle-shaped curve. These are the main streets of the city, with decent public transport (trams, buses, underground stations), and are still considered downtown-ish. It is interesting to see that further out there are again no studios. 

The Buda side (left of the river Danube) is less "crowded" with dance schools. Although this is not part of the analysis due to the lack of a dataset, real estate prices, rents, etc. are much higher there than on the Pest side, so anyone planning to open any kind of place there must generate decent revenue to make the investment profitable. This makes me concentrate on the Pest side (right of the river).  
The orange and purple ones (and even the red ones in Buda) are relatively close to railway stations. These are the areas of the city where commuters arrive from and leave for their homes in the agglomeration.  

These above observations help me narrow down the list of potential areas to the following: 
1. Rákóczi út between East Railway station and Blaha Lujza square, 
2. Károly boulevard and Bajcsy-Zsilinszky út between Astoria and West Railway station, 
3. Elisabeth boulevard. 




## Conclusion <a name="conclusion"></a>

The purpose of this project was to find an ideal location for a new dance studio in Budapest. By ideal I meant being in an attractive neighborhood while also being away from the rest of the schools. I pulled and wrangled the Foursquare data about the currently existing (and listed) dance studios: their names, locations, categories, etc. Unfortunately, no ratings could be retrieved for these studios, so I could not analyze the average ratings per neighborhood. 
Considering the nature of the data at hand (locations), I used clustering and complemented the analysis with a heatmap for density visualization. The results showed not only where the studios are, and hence the areas where there are few or none, but also highlighted some key indicators to take into account from the location perspective (proximity to railway stations, etc.).  

As indicated in the beginning, my goal was not to find the "best" place. Many other factors influence what the best is (rent prices, crime rates, class fees, socializing potential, etc., which could be subject to further analysis if and when sufficient data is available), so I wanted to offer the top 3 location options based on the locations of the other schools.  

The final decision is to be made by the stakeholder(s), based on the specific nature of the neighborhoods and locations along each suggested street (I intentionally do not name a specific address on any street).