# Capstone Project Notebook

## *Opening a new shopping mall in Madrid, Spain*

In this notebooks a data analysis procedure is performed to answer the following business question:

**In the city of Madrid, Spain, if a property developer is looking to open a new shopping mall, where would you recommend that they open it?**

The followed steps are the following:
1. Load and clean a dataframe of neighborhoods in Madrid, Spain downloaded from a public government data source.
2. Get the geographical coordinates (latitude and longitude) of the neighborhoods.
3. Obtain the venue data for the neighborhoods from Foursquare API.
4. Explore and cluster the neighborhoods.
5. Select the best neighborhood to open a new Shopping Mall in Madrid, Spain.

-------------------------------------------------------------------------------------------------------------------------------------------------------------

### 0. Import the required libraries

In [2]:
#import required libraries
import numpy as np
import pandas as pd

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import folium #to plot a map

import requests #to make requests to the Foresquare API

from sklearn.cluster import KMeans #for clustering stage

import matplotlib.cm as cm
import matplotlib.colors as colors

print("Libraries imported!!")

Libraries imported!!


### 1. Load and clean the dataframe of neighborhoods in Madrid.

In [6]:
#read the dataframe
df = pd.read_csv('Data/madrid_neighborhoods.csv', sep=';', error_bad_lines=False)
df.head(10)

Unnamed: 0,OBJECTID_1,NOMDIS,NOMBRE,Shape_Leng,Shape_Area,COD_DIS,COD_DIS_TX,BARRIO_MAY,COD_DISBAR,COD_BAR,NUM_BAR,BARRIO_MT,COD_DISB
0,60,Centro,Palacio,5754822748,1469905684,1,1,PALACIO,11,11,1,PALACIO,1_1
1,50,Centro,Embajadores,4275227681,1033724698,1,1,EMBAJADORES,12,12,2,EMBAJADORES,1_2
2,55,Centro,Cortes,373107903,5918741219,1,1,CORTES,13,13,3,CORTES,1_3
3,64,Centro,Justicia,3597421427,739414338,1,1,JUSTICIA,14,14,4,JUSTICIA,1_4
4,66,Centro,Universidad,4060075813,9480270773,1,1,UNIVERSIDAD,15,15,5,UNIVERSIDAD,1_5
5,56,Centro,Sol,2719287883,4453008221,1,1,SOL,16,16,6,SOL,1_6
6,49,Arganzuela,Imperial,4557937642,967678602,2,2,IMPERIAL,21,21,1,IMPERIAL,2_1
7,40,Arganzuela,Acacias,3950326025,1073437937,2,2,ACACIAS,22,22,2,ACACIAS,2_2
8,31,Arganzuela,Chopera,3203407974,5677865291,2,2,CHOPERA,23,23,3,CHOPERA,2_3
9,24,Arganzuela,Legazpi,5141642671,1414470497,2,2,LEGAZPI,24,24,4,LEGAZPI,2_4


In [7]:
#remove unnecesary columns from the datarame
df = df.drop(['NOMDIS','OBJECTID_1', 'Shape_Leng','Shape_Area','COD_DIS','COD_DIS_TX','BARRIO_MAY','COD_DISBAR','COD_BAR','NUM_BAR','BARRIO_MT','COD_DISB'], axis=1)
#replace the NOMBRE columname
df = df.rename(columns={'NOMBRE': 'Neighborhood'})
df.head(10)

Unnamed: 0,Neighborhood
0,Palacio
1,Embajadores
2,Cortes
3,Justicia
4,Universidad
5,Sol
6,Imperial
7,Acacias
8,Chopera
9,Legazpi


In [8]:
#verify if there are nan values
df.isnull().values.any()

False

There are no nan values.

In [10]:
#print the shape of the dataset
df.shape

(131, 1)

### 2. Get the geographical coordinates of the neighborhoods

In [12]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Madrid, Spain'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [13]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]
#print the coordinates
coords

In [14]:
#create a temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']
print(df.shape)
df.head(10)

(131, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Palacio,40.41517,-3.71273
1,Embajadores,40.40803,-3.70067
2,Cortes,40.41589,-3.69636
3,Justicia,40.42479,-3.69308
4,Universidad,40.42565,-3.70726
5,Sol,40.41802,-3.70577
6,Imperial,40.40833,-3.71865
7,Acacias,40.40137,-3.70669
8,Chopera,40.39536,-3.69833
9,Legazpi,40.38702,-3.6899


#### 2.1. Create a map of Madrid with neighborhoods superimposed on top.

In [16]:
# get the coordinates of madrid
address = 'Madrid, Spain'
geolocator = Nominatim(user_agent="http")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Madrid, Spain {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Madrid, Spain 40.4167047, -3.7035825.


In [17]:
# create map of Madrid using latitude and longitude values
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_madrid) 
#save the map as html file
map_madrid.save('map_neighborhoods_madrid.html')
#show the map
map_madrid

### 3. Obtain the venue data for the neighborhoods from Foursquare API

In [18]:
#define Foursquare Credentials and Version
CLIENT_ID = 'NO4T3FDTKKILCD5WRRWBPOEV1N1OIZXE2TXG3N0T2PQXWJCB' # your Foursquare ID
CLIENT_SECRET = 'BPKSPLKTCDMQDJYMNNKDISUEFCZAUQVRFA4CE1JK33ZGJ33T' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: NO4T3FDTKKILCD5WRRWBPOEV1N1OIZXE2TXG3N0T2PQXWJCB
CLIENT_SECRET:BPKSPLKTCDMQDJYMNNKDISUEFCZAUQVRFA4CE1JK33ZGJ33T


#### 3.1. Get the top 100 venues that are within a radius of 2000 meters

In [20]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)
# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head(10)

(11003, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Palacio,40.41517,-3.71273,Zuccaru,40.417179,-3.711674,Ice Cream Shop
1,Palacio,40.41517,-3.71273,Santa Iglesia Catedral de Santa María la Real ...,40.415767,-3.714516,Church
2,Palacio,40.41517,-3.71273,Plaza de la Villa,40.415409,-3.710391,Historic Site
3,Palacio,40.41517,-3.71273,Plaza de la Almudena,40.41632,-3.713777,Plaza
4,Palacio,40.41517,-3.71273,la gastroteca de santiago,40.416639,-3.710944,Restaurant
5,Palacio,40.41517,-3.71273,Palacio Real de Madrid,40.41794,-3.714259,Palace
6,Palacio,40.41517,-3.71273,Plaza de Oriente,40.418326,-3.712196,Plaza
7,Palacio,40.41517,-3.71273,Teatro Real de Madrid,40.418226,-3.711064,Opera House
8,Palacio,40.41517,-3.71273,El Landó,40.4119,-3.715076,Spanish Restaurant
9,Palacio,40.41517,-3.71273,Mercado de San Miguel,40.415443,-3.708943,Market


In [24]:
venues_df.groupby(["NOMBRE"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
NOMBRE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abrantes,98,98,98,98,98,98
Acacias,100,100,100,100,100,100
Adelfas,100,100,100,100,100,100
Aeropuerto,26,26,26,26,26,26
Alameda de Osuna,100,100,100,100,100,100
...,...,...,...,...,...,...
"Villaverde Alto, Casco Hist�rico de Villaverde",31,31,31,31,31,31
Vinateros,75,75,75,75,75,75
Vista Alegre,100,100,100,100,100,100
Zof�o,100,100,100,100,100,100


In [25]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 276 uniques categories.


In [27]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Ice Cream Shop', 'Church', 'Historic Site', 'Plaza', 'Restaurant',
       'Palace', 'Opera House', 'Spanish Restaurant', 'Market',
       'Other Nightlife', 'Hotel', 'Café', 'Tapas Restaurant', 'Pie Shop',
       'Dumpling Restaurant', 'Pastry Shop', 'Garden', 'Park',
       'Food & Drink Shop', 'Peruvian Restaurant', 'Bookstore',
       'History Museum', 'Coffee Shop', 'Bar', 'American Restaurant',
       'Mediterranean Restaurant', 'Indie Movie Theater', 'Bistro',
       'Gourmet Shop', 'Hostel', 'Japanese Restaurant', 'Cosmetics Shop',
       'Vegetarian / Vegan Restaurant', 'Gym', 'Italian Restaurant',
       'Theater', 'Electronics Store', 'Miscellaneous Shop',
       'Mexican Restaurant', 'Seafood Restaurant', 'Monument / Landmark',
       'Cocktail Bar', 'Art Museum', 'Argentinian Restaurant',
       'Performing Arts Venue', 'Art Gallery', 'Pub', 'Beer Garden',
       'Circus', 'Event Space', 'Pizza Place', 'Wine Shop',
       'Liquor Store', 'Sushi Restaurant', 'Gymnast

In [28]:
# check if the results contain "Shopping Mall"
"Neighborhood" in venues_df['VenueCategory'].unique()

True

In [31]:
# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['NOMBRE'] = venues_df['NOMBRE'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot

(11039, 277)


Unnamed: 0,NOMBRE,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11034,Corralejos,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11035,Corralejos,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11036,Corralejos,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11037,Corralejos,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [33]:
kl_grouped = kl_onehot.groupby(["NOMBRE"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped

(130, 277)


Unnamed: 0,NOMBRE,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Abrantes,0.00,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.00,0.0,0.0,0.0,0.00,0.00,0.0,0.0,0.0,0.0
1,Acacias,0.00,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.02,0.0,0.0,0.0,0.00,0.01,0.0,0.0,0.0,0.0
2,Adelfas,0.00,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.01,0.0,0.0,0.0,0.01,0.00,0.0,0.0,0.0,0.0
3,Aeropuerto,0.00,0.038462,0.0,0.0,0.192308,0.115385,0.0,0.000000,0.0,...,0.00,0.0,0.0,0.0,0.00,0.00,0.0,0.0,0.0,0.0
4,Alameda de Osuna,0.01,0.000000,0.0,0.0,0.000000,0.010000,0.0,0.000000,0.0,...,0.00,0.0,0.0,0.0,0.01,0.00,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125,"Villaverde Alto, Casco Hist�rico de Villaverde",0.00,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.00,0.0,0.0,0.0,0.00,0.00,0.0,0.0,0.0,0.0
126,Vinateros,0.00,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.026667,0.0,...,0.00,0.0,0.0,0.0,0.00,0.00,0.0,0.0,0.0,0.0
127,Vista Alegre,0.00,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.00,0.0,0.0,0.0,0.00,0.00,0.0,0.0,0.0,0.0
128,Zof�o,0.01,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.00,0.0,0.0,0.0,0.00,0.00,0.0,0.0,0.0,0.0


In [109]:
kl_grouped.columns.tolist()

['NOMBRE',
 'Accessories Store',
 'Airport',
 'Airport Food Court',
 'Airport Gate',
 'Airport Lounge',
 'Airport Service',
 'Airport Terminal',
 'American Restaurant',
 'Aquarium',
 'Arcade',
 'Arepa Restaurant',
 'Argentinian Restaurant',
 'Art Gallery',
 'Art Museum',
 'Art Studio',
 'Asian Restaurant',
 'Athletics & Sports',
 'Auto Garage',
 'BBQ Joint',
 'Bakery',
 'Bar',
 'Basketball Court',
 'Basketball Stadium',
 'Beach',
 'Bed & Breakfast',
 'Beer Bar',
 'Beer Garden',
 'Beer Store',
 'Big Box Store',
 'Bike Trail',
 'Bistro',
 'Board Shop',
 'Boarding House',
 'Bookstore',
 'Boutique',
 'Bowling Alley',
 'Boxing Gym',
 'Brazilian Restaurant',
 'Breakfast Spot',
 'Brewery',
 'Bridge',
 'Bubble Tea Shop',
 'Buffet',
 'Building',
 'Burger Joint',
 'Burrito Place',
 'Bus Station',
 'Cable Car',
 'Cafeteria',
 'Café',
 'Camera Store',
 'Campground',
 'Candy Store',
 'Cheese Shop',
 'Chinese Restaurant',
 'Chocolate Shop',
 'Church',
 'Circus',
 'Clothing Store',
 'Cocktail Bar',
 

In [110]:
len(kl_grouped[kl_grouped["Hotel"] > 0])

96

In [111]:
kl_mall = kl_grouped[["NOMBRE","Hotel"]]
kl_mall

Unnamed: 0,NOMBRE,Hotel
0,Abrantes,0.010204
1,Acacias,0.020000
2,Adelfas,0.020000
3,Aeropuerto,0.038462
4,Alameda de Osuna,0.130000
...,...,...
125,"Villaverde Alto, Casco Hist�rico de Villaverde",0.000000
126,Vinateros,0.000000
127,Vista Alegre,0.010000
128,Zof�o,0.010000


In [132]:
# set number of clusters
kclusters = 2

kl_clustering = kl_mall.drop(["NOMBRE"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 0, 0, 1, 0, 1, 1, 0])

In [133]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
kl_merged = kl_mall.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [134]:
kl_merged

Unnamed: 0,NOMBRE,Hotel,Cluster Labels
0,Abrantes,0.010204,1
1,Acacias,0.020000,1
2,Adelfas,0.020000,1
3,Aeropuerto,0.038462,0
4,Alameda de Osuna,0.130000,0
...,...,...,...
125,"Villaverde Alto, Casco Hist�rico de Villaverde",0.000000,1
126,Vinateros,0.000000,1
127,Vista Alegre,0.010000,1
128,Zof�o,0.010000,1


In [135]:
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
kl_merged = kl_merged.join(df.set_index("NOMBRE"), on="NOMBRE")

print(kl_merged.shape)
kl_merged.head() # check the last columns!
kl_merged = kl_merged.drop(['NOMDIS'], axis=1)
kl_merged

(130, 6)


Unnamed: 0,NOMBRE,Hotel,Cluster Labels,Latitude,Longitude
0,Abrantes,0.010204,1,40.379800,-3.726360
1,Acacias,0.020000,1,40.401370,-3.706690
2,Adelfas,0.020000,1,40.401730,-3.672880
3,Aeropuerto,0.038462,0,40.483370,-3.559490
4,Alameda de Osuna,0.130000,0,40.458180,-3.589530
...,...,...,...,...,...
125,"Villaverde Alto, Casco Hist�rico de Villaverde",0.000000,1,40.350000,-3.700000
126,Vinateros,0.000000,1,40.404440,-3.640290
127,Vista Alegre,0.010000,1,40.384920,-3.746350
128,Zof�o,0.010000,1,40.379870,-3.714950


In [136]:
# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(130, 5)


Unnamed: 0,NOMBRE,Hotel,Cluster Labels,Latitude,Longitude
64,Justicia,0.060000,0,40.424790,-3.693080
112,Sol,0.090000,0,40.418020,-3.705770
111,Simancas,0.060000,0,40.435770,-3.624880
36,Concepci�n,0.050000,0,40.439730,-3.648500
37,Corralejos,0.036145,0,40.465400,-3.611640
...,...,...,...,...,...
52,Fontarr�n,0.010526,1,40.403010,-3.649100
51,Estrella,0.030000,1,40.411170,-3.665930
50,Entrev�as,0.010101,1,40.379250,-3.672120
47,El Viso,0.020000,1,40.447460,-3.685430


In [138]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
print(colors_array)
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['NOMBRE'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

[[5.0000000e-01 0.0000000e+00 1.0000000e+00 1.0000000e+00]
 [1.0000000e+00 1.2246468e-16 6.1232340e-17 1.0000000e+00]]


In [139]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

In [140]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,NOMBRE,Hotel,Cluster Labels,Latitude,Longitude
64,Justicia,0.06,0,40.42479,-3.69308
112,Sol,0.09,0,40.41802,-3.70577
111,Simancas,0.06,0,40.43577,-3.62488
36,Concepci�n,0.05,0,40.43973,-3.6485
37,Corralejos,0.036145,0,40.4654,-3.61164
38,Cortes,0.08,0,40.41589,-3.69636
82,Palacio,0.07,0,40.41517,-3.71273
40,Cuatro Caminos,0.05,0,40.45297,-3.69727
109,San Pascual,0.06,0,40.4455,-3.65181
108,San Juan Bautista,0.06,0,40.45159,-3.65591


In [141]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,NOMBRE,Hotel,Cluster Labels,Latitude,Longitude
93,Pradolongo,0.000000,1,40.382970,-3.708650
123,Valverde,0.010000,1,40.499910,-3.686080
125,"Villaverde Alto, Casco Hist�rico de Villaverde",0.000000,1,40.350000,-3.700000
124,Ventas,0.010000,1,40.422380,-3.650200
88,Pe�agrande,0.014706,1,40.478390,-3.727350
...,...,...,...,...,...
52,Fontarr�n,0.010526,1,40.403010,-3.649100
51,Estrella,0.030000,1,40.411170,-3.665930
50,Entrev�as,0.010101,1,40.379250,-3.672120
47,El Viso,0.020000,1,40.447460,-3.685430


In [123]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,NOMBRE,Hotel,Cluster Labels,Latitude,Longitude
115,Universidad,0.08,2,40.42565,-3.70726
38,Cortes,0.08,2,40.41589,-3.69636
58,Hell�n,0.07,2,40.43164,-3.6155
29,Castilla,0.07,2,40.47131,-3.676
82,Palacio,0.07,2,40.41517,-3.71273
76,Nueva Espa�a,0.07,2,40.46412,-3.6798
63,Jer�nimos,0.09,2,40.41729,-3.69223
6,Almenara,0.07,2,40.47114,-3.69581
4,Alameda de Osuna,0.13,2,40.45818,-3.58953
112,Sol,0.09,2,40.41802,-3.70577
