## Introduction
Madrid is a city that receives a lot of tourism, so let's analyze its hotel supply: where the hotels are located and whether they can be grouped into clusters of nearby hotels.

## Data requirements
For the data, I call the Foursquare API to get the main hotels in the city.

## Methodology
Once I have the data, I cluster the hotels by location with the K-Means model in scikit-learn.

In [1]:
# Import the needed libraries
import folium
import requests
import pandas as pd
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

I start by loading the Madrid map:

In [2]:
latitude = 40.4168
longitude = -3.7038

map_madrid = folium.Map(location=[latitude, longitude], zoom_start=15)

In [3]:
map_madrid

After that, I set my credentials to run the Foursquare query:

In [4]:
CLIENT_ID = 'VQUC23XR54K05DSOIIRFUJTN0SDL1REBPONIAM1LDITOUIRI' # your Foursquare ID
CLIENT_SECRET = '5IDPFEMDTLDTSTBXDPF105TW2EPYKHWR4VU3J0D0C1VUEU5Y' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: VQUC23XR54K05DSOIIRFUJTN0SDL1REBPONIAM1LDITOUIRI
CLIENT_SECRET:5IDPFEMDTLDTSTBXDPF105TW2EPYKHWR4VU3J0D0C1VUEU5Y
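
The keys above are hardcoded in the notebook and printed in its output, so anyone who reads it can reuse them. A minimal sketch of one alternative, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical; they are not part of the original notebook or of the Foursquare API.

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare keys from environment variables (hypothetical names)."""
    try:
        client_id = env['FOURSQUARE_CLIENT_ID']
        client_secret = env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Missing environment variable: {missing}') from None
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep only the first few characters so a key can be checked without leaking it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

With this in place, the cell could call `CLIENT_ID, CLIENT_SECRET = load_credentials()` and print `mask(CLIENT_ID)` instead of the full value.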


In [5]:
radius = 10000
limit = 3000
search_query = 'hotel'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, limit)
url

'https://api.foursquare.com/v2/venues/search?client_id=VQUC23XR54K05DSOIIRFUJTN0SDL1REBPONIAM1LDITOUIRI&client_secret=5IDPFEMDTLDTSTBXDPF105TW2EPYKHWR4VU3J0D0C1VUEU5Y&ll=40.4168,-3.7038&v=20180604&query=hotel&radius=10000&limit=3000'
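
Building the URL with a long `format` call makes it easy to swap two arguments by accident. A sketch of the same request assembled from named parameters with the standard library. The function name `build_search_url` is an assumption for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the venue search URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable; other special characters are encoded
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')
```

An equivalent option is to hand the same dictionary to `requests.get(SEARCH_ENDPOINT, params=params)` and let `requests` do the encoding.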

I fetch the results of the query and then convert the relevant information (name and location) into a pandas DataFrame.

In [6]:
results = requests.get(url).json()
results = results['response']['venues']

In [7]:
# Collect the name and coordinates of every venue returned by the query
dicc = {'name': [], 'lat': [], 'lng': []}
for result in results:
    dicc['name'].append(result['name'])
    dicc['lat'].append(result['location']['lat'])
    dicc['lng'].append(result['location']['lng'])
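
The loop above assumes every venue carries both `lat` and `lng`. A minimal sketch of a more defensive version that skips venues without coordinates and builds one record per venue. The helper name `venues_to_records` and the sample data are made up for illustration and do not come from the actual query.

```python
def venues_to_records(venues):
    """Return one dict per venue with name, lat and lng, skipping incomplete entries."""
    records = []
    for venue in venues:
        location = venue.get('location', {})
        lat, lng = location.get('lat'), location.get('lng')
        if lat is None or lng is None:
            continue  # a venue without coordinates cannot be mapped or clustered
        records.append({'name': venue.get('name', ''), 'lat': lat, 'lng': lng})
    return records


# Made-up sample shaped like the entries of results['response']['venues']
sample = [
    {'name': 'Hotel A', 'location': {'lat': 40.41, 'lng': -3.70}},
    {'name': 'Hotel B', 'location': {}},
]
print(venues_to_records(sample))
```

A list of records like this can be passed directly to `pd.DataFrame(records)`.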

In [8]:
df = pd.DataFrame()
df['name'] = dicc['name']
df['lat'] = dicc['lat']
df['lng'] = dicc['lng']
df.head()

|   | name | lat | lng |
|---|------|-----|-----|
| 0 | Hotel Europa Madrid*** | 40.417503 | -3.703612 |
| 1 | Hotel Moderno | 40.417104 | -3.704828 |
| 2 | Cafetería-Restaurante Hotel Europa | 40.417388 | -3.703623 |
| 3 | Hotel Eden Paraiso Neptuno | 40.41746 | -3.702777 |
| 4 | Hotel Victoria 4 | 40.416237 | -3.701691 |
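
Row 2 of the output above is a café-restaurant rather than a hotel, because the search matches the word "hotel" in the venue name. One possible way to filter out such entries, assuming each venue in the response carries a `categories` list whose items have a `name` field (worth checking against the actual payload). The helper `is_hotel`, the keyword tuple, and the sample venues are hypothetical.

```python
HOTEL_KEYWORDS = ('hotel', 'hostel')  # hypothetical list; adjust to the real category names


def is_hotel(venue, keywords=HOTEL_KEYWORDS):
    """True if any category name of the venue contains one of the keywords."""
    names = [cat.get('name', '').lower() for cat in venue.get('categories', [])]
    return any(keyword in name for name in names for keyword in keywords)


# Made-up venues shaped like the API response; the categories are invented
sample = [
    {'name': 'Venue A', 'categories': [{'name': 'Hotel'}]},
    {'name': 'Venue B', 'categories': [{'name': 'Café'}]},
    {'name': 'Venue C', 'categories': []},
]
hotels = [venue['name'] for venue in sample if is_hotel(venue)]
print(hotels)
```

Applied before building the DataFrame, a filter like this would keep restaurants and bars inside hotels from influencing the clusters.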


Now, let's add the markers of the venues to the Madrid map:

In [9]:
# Draw one blue circle per venue, with the venue name as popup
for lat, lng, name in zip(df['lat'], df['lng'], df['name']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_madrid)

In [10]:
map_madrid

Now let's cluster the hotels.

In [11]:
kclusters=3
clustdf = df.drop(columns='name')  # keep only lat and lng as features
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(clustdf)
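
K-Means minimizes squared Euclidean distance, and here it runs on raw degrees, where a degree of longitude covers less ground than a degree of latitude. The sketch below quantifies that for Madrid with a simple equirectangular approximation. The helper `to_local_km` and the constant are assumptions added for illustration; the original notebook clusters the unprojected coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def to_local_km(lat, lng, ref_lat, ref_lng):
    """Project (lat, lng) to approximate east/north offsets in km from a reference point."""
    north = math.radians(lat - ref_lat) * EARTH_RADIUS_KM
    east = math.radians(lng - ref_lng) * EARTH_RADIUS_KM * math.cos(math.radians(ref_lat))
    return east, north


# 0.01 degrees north vs. 0.01 degrees east of the map center used in this notebook
east0, north_step = to_local_km(40.4268, -3.7038, 40.4168, -3.7038)
east_step, north0 = to_local_km(40.4168, -3.6938, 40.4168, -3.7038)
print(round(north_step, 2), round(east_step, 2))
```

At this latitude, 0.01 degrees is roughly 1.11 km north-south but only about 0.85 km east-west, so clustering on degrees overstates east-west distances. For a single city the effect is modest, but projecting to kilometers first is a cheap way to rule it out.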

In [12]:
df.insert(0, 'Cluster labels', kmeans.labels_)

In [13]:
# One color per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Redraw the map with each hotel colored by its cluster label
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=15)
for lat, lon, name, cluster in zip(df['lat'], df['lng'], df['name'], df['Cluster labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_madrid)

In [14]:
map_madrid

## Results and conclusion
Setting n_clusters to 3 seems reasonable: visual exploration of the map shows three distinct groups, a very dense one close to Santo Domingo square, another around Sol square, and some hotels in the east.
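
The visual check could be complemented with a numeric one. A minimal sketch using the silhouette score from scikit-learn to compare several values of k. The points below are synthetic blobs generated for illustration, not the Foursquare data, so the result says nothing about Madrid's hotels; with the notebook's data, `clustdf` would take the place of `points`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the lat/lng columns: three made-up blobs, NOT the real hotels
rng = np.random.default_rng(0)
centers = np.array([[40.42, -3.71], [40.45, -3.69], [40.40, -3.65]])
points = np.vstack([c + rng.normal(scale=0.002, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)  # in [-1, 1]; higher means better separated

best_k = max(scores, key=scores.get)
print(best_k)
```

If the best-scoring k on the real data also turned out to be 3, that would support the visual impression; a different value would suggest looking at the map again.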