# Capstone Project Notebook

## *Opening a new shopping mall in Madrid, Spain*

In this notebook, a data analysis procedure is carried out to answer the following business question:

**In the city of Madrid, Spain, if a property developer is looking to open a new shopping mall, where would you recommend that they open it?**

The steps are as follows:
1. Load and clean a dataframe of neighborhoods in Madrid, Spain downloaded from a public government data source.
2. Get the geographical coordinates (latitude and longitude) of the neighborhoods.
3. Obtain the venue data for the neighborhoods from the Foursquare API.
4. Explore and cluster the neighborhoods.
5. Explore the clusters and select the best neighborhood to open a new Shopping Mall in Madrid, Spain.

-------------------------------------------------------------------------------------------------------------------------------------------------------------

### 0. Import the required libraries

In [1]:
#import required libraries
import numpy as np
import pandas as pd

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import folium #to plot a map

import requests #to make requests to the Foursquare API

from sklearn.cluster import KMeans #for clustering stage

import matplotlib.cm as cm
import matplotlib.colors as colors

print("Libraries imported!!")

Libraries imported!!


### 1. Load and clean the dataframe of neighborhoods in Madrid.

In [2]:
#read the dataframe
# on_bad_lines='skip' replaces the deprecated error_bad_lines=False (pandas >= 1.3)
df = pd.read_csv('Data/madrid_neighborhoods.csv', sep=';', on_bad_lines='skip')
df.head(10)

Unnamed: 0,OBJECTID_1,NOMDIS,NOMBRE,Shape_Leng,Shape_Area,COD_DIS,COD_DIS_TX,BARRIO_MAY,COD_DISBAR,COD_BAR,NUM_BAR,BARRIO_MT,COD_DISB
0,60,Centro,Palacio,5754822748,1469905684,1,1,PALACIO,11,11,1,PALACIO,1_1
1,50,Centro,Embajadores,4275227681,1033724698,1,1,EMBAJADORES,12,12,2,EMBAJADORES,1_2
2,55,Centro,Cortes,373107903,5918741219,1,1,CORTES,13,13,3,CORTES,1_3
3,64,Centro,Justicia,3597421427,739414338,1,1,JUSTICIA,14,14,4,JUSTICIA,1_4
4,66,Centro,Universidad,4060075813,9480270773,1,1,UNIVERSIDAD,15,15,5,UNIVERSIDAD,1_5
5,56,Centro,Sol,2719287883,4453008221,1,1,SOL,16,16,6,SOL,1_6
6,49,Arganzuela,Imperial,4557937642,967678602,2,2,IMPERIAL,21,21,1,IMPERIAL,2_1
7,40,Arganzuela,Acacias,3950326025,1073437937,2,2,ACACIAS,22,22,2,ACACIAS,2_2
8,31,Arganzuela,Chopera,3203407974,5677865291,2,2,CHOPERA,23,23,3,CHOPERA,2_3
9,24,Arganzuela,Legazpi,5141642671,1414470497,2,2,LEGAZPI,24,24,4,LEGAZPI,2_4
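
Spanish neighborhood names contain accented characters, so the encoding used when reading the CSV matters. The sketch below is only an illustration under stated assumptions: the two-row sample is hypothetical, and Latin-1 is an assumed encoding, not something confirmed for the real file. It shows that passing a matching `encoding` to `pd.read_csv` keeps characters such as `ñ` and `ú` intact.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same ';'-separated layout, encoded as Latin-1
# (the encoding of the real file is an assumption here).
raw = "NOMDIS;NOMBRE\nRetiro;Niño Jesús\nCarabanchel;Opañel\n".encode("latin-1")

# Passing the matching encoding keeps the accented characters intact.
sample = pd.read_csv(io.BytesIO(raw), sep=";", encoding="latin-1")
names = sample["NOMBRE"].tolist()
print(names)
```

If accented names come out garbled in a real run, trying `encoding="latin-1"` or `encoding="utf-8"` in the `read_csv` call is a reasonable first check.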


In [3]:
#remove unnecessary columns from the dataframe
df = df.drop(['NOMDIS','OBJECTID_1', 'Shape_Leng','Shape_Area','COD_DIS','COD_DIS_TX','BARRIO_MAY','COD_DISBAR','COD_BAR','NUM_BAR','BARRIO_MT','COD_DISB'], axis=1)
#rename the NOMBRE column
df = df.rename(columns={'NOMBRE': 'Neighborhood'})
df.head(10)

Unnamed: 0,Neighborhood
0,Palacio
1,Embajadores
2,Cortes
3,Justicia
4,Universidad
5,Sol
6,Imperial
7,Acacias
8,Chopera
9,Legazpi


In [4]:
#check whether there are any NaN values
df.isnull().values.any()

False

There are no NaN values.

In [5]:
#print the shape of the dataset
df.shape

(131, 1)

### 2. Get the geographical coordinates of the neighborhoods

In [6]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Madrid, Spain'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
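
The `while` loop above keeps retrying until the geocoder answers, so it never terminates if a name cannot be resolved. A minimal sketch of a bounded alternative follows. The helper `get_latlng_with_retries`, its parameters, and the stand-in `fake_geocode` are hypothetical names introduced for illustration; the geocoding call is passed in as a function so the example runs without network access.

```python
import time

def get_latlng_with_retries(neighborhood, geocode, max_attempts=5, delay=0.0):
    """Return [lat, lng], or None if every attempt fails."""
    query = '{}, Madrid, Spain'.format(neighborhood)
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # optional pause between attempts
    return None

# Stand-in geocoder for illustration: fails twice, then succeeds.
calls = {"n": 0}

def fake_geocode(query):
    calls["n"] += 1
    return [40.41517, -3.71273] if calls["n"] >= 3 else None

result = get_latlng_with_retries("Palacio", fake_geocode)
missing = get_latlng_with_retries("Nowhere", lambda q: None, max_attempts=2)
print(result, missing)
```

In the notebook, `geocode` could be something like `lambda q: geocoder.arcgis(q).latlng`, and any neighborhood that comes back as `None` can then be inspected by hand.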

In [7]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]
#print the coordinates
coords

[[40.415170000000046, -3.712729999999965],
 [40.40803000000005, -3.7006699999999455],
 [40.41589000000005, -3.69635999999997],
 [40.42479000000003, -3.693079999999952],
 [40.425650000000076, -3.7072599999999625],
 [40.41802000000007, -3.7057699999999727],
 [40.408330000000035, -3.7186499999999683],
 [40.40137000000004, -3.7066899999999805],
 [40.39536000000004, -3.6983299999999417],
 [40.387020000000064, -3.689899999999966],
 [40.39613000000003, -3.68945999999994],
 [40.40301000000005, -3.6935799999999404],
 [40.400540000000035, -3.6839199999999437],
 [40.40191000000004, -3.676029999999969],
 [40.40173000000004, -3.6728799999999637],
 [40.41117000000003, -3.665929999999946],
 [40.417940000000044, -3.6762599999999566],
 [40.41729000000004, -3.692229999999938],
 [40.356811479732706, -3.7835577629141532],
 [40.42530000000005, -3.6865099999999416],
 [40.425470000000075, -3.6741799999999785],
 [40.42376000000007, -3.667539999999974],
 [40.43241000000006, -3.66599999999994],
 [40.43231000000

In [8]:
#create a temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']
print(df.shape)
df.head(10)

(131, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Palacio,40.41517,-3.71273
1,Embajadores,40.40803,-3.70067
2,Cortes,40.41589,-3.69636
3,Justicia,40.42479,-3.69308
4,Universidad,40.42565,-3.70726
5,Sol,40.41802,-3.70577
6,Imperial,40.40833,-3.71865
7,Acacias,40.40137,-3.70669
8,Chopera,40.39536,-3.69833
9,Legazpi,40.38702,-3.6899
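
Geocoders can return a fallback location when a name is not recognized, so a quick sanity check on the coordinates may be worthwhile before using them. The sketch below flags rows outside a rough bounding box. The box limits and the helper `flag_outside_bbox` are assumptions made for illustration and are not part of the original analysis.

```python
import pandas as pd

# Rough bounding box around the municipality of Madrid.
# These limits are assumed for illustration, not taken from the notebook.
LAT_RANGE = (40.30, 40.65)
LNG_RANGE = (-3.90, -3.50)

def flag_outside_bbox(frame, lat_range=LAT_RANGE, lng_range=LNG_RANGE):
    """Return the rows whose coordinates fall outside the bounding box."""
    inside = (
        frame["Latitude"].between(*lat_range)
        & frame["Longitude"].between(*lng_range)
    )
    return frame[~inside]

# Two rows from the table above plus one made-up outlier.
sample = pd.DataFrame({
    "Neighborhood": ["Palacio", "Sol", "Hypothetical outlier"],
    "Latitude": [40.41517, 40.41802, 41.38],
    "Longitude": [-3.71273, -3.70577, 2.17],
})
suspect = flag_outside_bbox(sample)
print(suspect)
```

A box check only catches gross errors; a point that lands inside Madrid but in the wrong district would still need a manual look on the map.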


#### 2.1. Create a map of Madrid with neighborhoods superimposed on top.

In [9]:
# get the coordinates of madrid
address = 'Madrid, Spain'
geolocator = Nominatim(user_agent="http")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Madrid, Spain are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Madrid, Spain are 40.4167047, -3.7035825.


In [10]:
# create map of Madrid using latitude and longitude values
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_madrid) 
#save the map as html file
map_madrid.save('map_neighborhoods_madrid.html')
#show the map
map_madrid

### 3. Obtain the venue data for the neighborhoods from Foursquare API

In [11]:
#define Foursquare credentials and version
import os

# read the credentials from environment variables instead of hard-coding them
# (the variable names below are an assumption; use whatever your setup defines)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


#### 3.1. Get the top 100 venues that are within a radius of 2000 meters

In [12]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
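
The loop above indexes straight into the JSON, so a response without `groups`, or a venue without any category, would raise an exception. The following is a minimal sketch of a more defensive parser, assuming the same response layout that the loop reads. The function `extract_venues` and the sample payload are hypothetical and hand-written, not real API output.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-written stand-in for a response (not real API output).
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 40.0, "lng": -3.0},
               "categories": [{"name": "Plaza"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 40.1, "lng": -3.1},
               "categories": []}},
]}]}}

rows = extract_venues(fake_payload, "Palacio", 40.41517, -3.71273)
empty = extract_venues({"response": {}}, "Palacio", 40.41517, -3.71273)
print(rows, empty)
```

Inside the loop, the result of `requests.get(url).json()` could be passed to such a function and the returned rows added with `venues.extend(...)`.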

In [13]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)
# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head(10)

(11024, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Palacio,40.41517,-3.71273,Zuccaru,40.417179,-3.711674,Ice Cream Shop
1,Palacio,40.41517,-3.71273,Santa Iglesia Catedral de Santa María la Real ...,40.415767,-3.714516,Church
2,Palacio,40.41517,-3.71273,Plaza de la Villa,40.415409,-3.710391,Historic Site
3,Palacio,40.41517,-3.71273,Plaza de la Almudena,40.41632,-3.713777,Plaza
4,Palacio,40.41517,-3.71273,la gastroteca de santiago,40.416639,-3.710944,Restaurant
5,Palacio,40.41517,-3.71273,Palacio Real de Madrid,40.41794,-3.714259,Palace
6,Palacio,40.41517,-3.71273,Plaza de Oriente,40.418326,-3.712196,Plaza
7,Palacio,40.41517,-3.71273,Teatro Real de Madrid,40.418226,-3.711064,Opera House
8,Palacio,40.41517,-3.71273,El Landó,40.4119,-3.715076,Spanish Restaurant
9,Palacio,40.41517,-3.71273,Mercado de San Miguel,40.415443,-3.708943,Market


#### 3.2. Check how many venues have been returned for each neighborhood

In [14]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Abrantes,99,99,99,99,99,99
Acacias,100,100,100,100,100,100
Adelfas,100,100,100,100,100,100
Aeropuerto,26,26,26,26,26,26
Alameda de Osuna,100,100,100,100,100,100
...,...,...,...,...,...,...
"Villaverde Alto, Casco Histórico de Villaverde",34,34,34,34,34,34
Vinateros,78,78,78,78,78,78
Vista Alegre,100,100,100,100,100,100
Zofío,100,100,100,100,100,100


#### 3.3. Check how many unique categories have been obtained

In [16]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 278 unique categories.


In [17]:
#print out the list of categories
venues_df['VenueCategory'].unique()

array(['Ice Cream Shop', 'Church', 'Historic Site', 'Plaza', 'Restaurant',
       'Palace', 'Opera House', 'Spanish Restaurant', 'Market',
       'Other Nightlife', 'Hotel', 'Café', 'Tapas Restaurant',
       'Dumpling Restaurant', 'Pastry Shop', 'Pie Shop', 'Garden', 'Park',
       'Peruvian Restaurant', 'Food & Drink Shop', 'Mexican Restaurant',
       'Bookstore', 'History Museum', 'Coffee Shop', 'Bar',
       'American Restaurant', 'Mediterranean Restaurant',
       'Indie Movie Theater', 'Chocolate Shop', 'Bistro', 'Gourmet Shop',
       'Hostel', 'Cosmetics Shop', 'Gym', 'Vegetarian / Vegan Restaurant',
       'Italian Restaurant', 'Theater', 'Electronics Store',
       'Miscellaneous Shop', 'Monument / Landmark', 'Cocktail Bar',
       'Performing Arts Venue', 'Art Museum', 'Argentinian Restaurant',
       'Art Gallery', 'Seafood Restaurant', 'Circus', 'Event Space',
       'Pizza Place', 'Wine Shop', 'Liquor Store', 'Sushi Restaurant',
       'Gymnastics Gym', 'Movie Theater', 

In [18]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### 4. Analyse and cluster the neighborhoods

#### 4.1. Analyse each neighborhood

In [31]:
#one hot encoding the venue categories
madrid_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
madrid_onehot['Neighborhood'] = venues_df['Neighborhood'] 

#group rows by neighborhood and take the mean of the frequency of occurrence of each category
madrid_grouped = madrid_onehot.groupby(["Neighborhood"]).mean().reset_index()

print(madrid_grouped.shape)
madrid_grouped.head(10)

(130, 278)


Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Abrantes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0
1,Acacias,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
2,Adelfas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
3,Aeropuerto,0.0,0.038462,0.0,0.0,0.192308,0.115385,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Alameda de Osuna,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
5,Almagro,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Almenara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Almendrales,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
8,Aluche,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04
9,Amposta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
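
To make the numbers in this table concrete: after one-hot encoding, the per-neighborhood mean of a category column is the share of that neighborhood's venues that fall in the category, so 0.01 means 1 venue out of 100. The toy example below, with made-up data, illustrates this. It uses `DataFrame.insert`, which raises an error if a column with the same name already exists, rather than plain assignment, which would silently overwrite a category that happened to be called `Neighborhood`.

```python
import pandas as pd

# Made-up venues for two hypothetical neighborhoods, A and B.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Bar", "Bar", "Shopping Mall", "Park", "Bar", "Park"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="", dtype=float)

# insert() fails loudly on a name collision instead of overwriting a column.
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of each 0/1 column is the relative frequency of that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
mall_share_a = grouped.set_index("Neighborhood").loc["A", "Shopping Mall"]
print(grouped)
```

Here neighborhood A has one shopping mall among four venues, so its frequency is 0.25, while B has none and gets 0.0.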


#### 4.2. Create a dataframe for Shopping Mall data only

In [33]:
madrid_mall = madrid_grouped[["Neighborhood","Shopping Mall"]]
madrid_mall.head(10)

Unnamed: 0,Neighborhood,Shopping Mall
0,Abrantes,0.0
1,Acacias,0.0
2,Adelfas,0.0
3,Aeropuerto,0.0
4,Alameda de Osuna,0.01
5,Almagro,0.0
6,Almenara,0.01
7,Almendrales,0.01
8,Aluche,0.01
9,Amposta,0.0


In [34]:
#get the number of neighborhoods with a Shopping Mall frequency of occurrence higher than 0
len(madrid_mall[madrid_mall["Shopping Mall"] > 0])

42

#### 4.3. Cluster the neighborhoods

##### Run k-means to cluster the neighborhoods in Madrid into 3 clusters

In [37]:
#set number of clusters
kclusters = 3

#delete neighborhood column
madrid_clustering = madrid_mall.drop(["Neighborhood"], axis=1)

#run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_clustering)

#check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 2,
       1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 2, 1, 1,
       1, 1, 2, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 0, 2, 1, 1, 2, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1])
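
The integer labels returned by k-means are arbitrary: nothing guarantees that label 0 is the cluster with the lowest frequency. A minimal sketch of one way to make the labels easier to read is to re-rank them by centroid. The toy frequencies below are illustrative values, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy shopping-mall frequencies forming three well-separated groups (illustrative values).
freq = np.array([0.0, 0.0, 0.0, 0.010, 0.010, 0.012, 0.040, 0.042, 0.045]).reshape(-1, 1)

kmeans_toy = KMeans(n_clusters=3, n_init=10, random_state=0).fit(freq)

# Sort the cluster ids by centroid value, then map each id to its rank,
# so that 0 = lowest frequency and 2 = highest.
order = np.argsort(kmeans_toy.cluster_centers_.ravel())
rank_of_label = np.empty_like(order)
rank_of_label[order] = np.arange(len(order))
ordered_labels = rank_of_label[kmeans_toy.labels_]
print(ordered_labels)
```

With ranked labels, the descriptions "none", "moderate" and "high" line up with 0, 1 and 2 regardless of how the algorithm happened to number the clusters.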

In [38]:
#create a new dataframe that includes the cluster label for each neighborhood.
madrid_merged = madrid_mall.copy()

#add clustering labels
madrid_merged["Cluster Labels"] = kmeans.labels_
madrid_merged.head(10)

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Abrantes,0.0,1
1,Acacias,0.0,1
2,Adelfas,0.0,1
3,Aeropuerto,0.0,1
4,Alameda de Osuna,0.01,0
5,Almagro,0.0,1
6,Almenara,0.01,0
7,Almendrales,0.01,0
8,Aluche,0.01,0
9,Amposta,0.0,1


In [39]:
#add latitude/longitude for each neighborhood
madrid_merged = madrid_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(madrid_merged.shape)
madrid_merged.head(10) 

(130, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Abrantes,0.0,1,40.3798,-3.72636
1,Acacias,0.0,1,40.40137,-3.70669
2,Adelfas,0.0,1,40.40173,-3.67288
3,Aeropuerto,0.0,1,40.48337,-3.55949
4,Alameda de Osuna,0.01,0,40.45818,-3.58953
5,Almagro,0.0,1,40.43296,-3.69153
6,Almenara,0.01,0,40.47114,-3.69581
7,Almendrales,0.01,0,40.38431,-3.69992
8,Aluche,0.01,0,40.39271,-3.76032
9,Amposta,0.0,1,40.42643,-3.62131


In [40]:
# sort the results by Cluster Labels
print(madrid_merged.shape)
madrid_merged.sort_values(["Cluster Labels"], inplace=True)
madrid_merged

(130, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
37,Corralejos,0.012048,0,40.465400,-3.611640
107,San Isidro,0.010000,0,40.395880,-3.730480
31,Chopera,0.010000,0,40.395360,-3.698330
35,Comillas,0.010000,0,40.394950,-3.709760
73,Mirasierra,0.010000,0,40.494270,-3.717690
...,...,...,...,...,...
46,El Plantío,0.033333,2,40.472885,-3.827385
41,Cuatro Vientos,0.045455,2,40.365080,-3.775300
78,Opañel,0.023810,2,40.389150,-3.723750
21,Campamento,0.024691,2,40.396660,-3.774080


##### Visualize the resulting cluster in a map

In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(madrid_merged['Latitude'], madrid_merged['Longitude'], madrid_merged['Neighborhood'], madrid_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
#save the map as HTML file
map_clusters.save('map_clusters.html')
#show the map
map_clusters

### 5. Examine the clusters and select the best neighborhood to open a new Shopping Mall in Madrid, Spain

#### 5.1. Examine the clusters

##### Cluster 0

In [46]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
37,Corralejos,0.012048,0,40.4654,-3.61164
107,San Isidro,0.01,0,40.39588,-3.73048
31,Chopera,0.01,0,40.39536,-3.69833
35,Comillas,0.01,0,40.39495,-3.70976
73,Mirasierra,0.01,0,40.49427,-3.71769
100,Rejas,0.011765,0,40.44629,-3.57489
96,Puerta Bonita,0.01,0,40.37982,-3.74046
93,Pradolongo,0.01,0,40.38297,-3.70865
92,Portazgo,0.01,0,40.39087,-3.6481
49,Ensanche de Vallecas,0.012195,0,40.37896,-3.61384


This cluster is composed of neighborhoods with a moderate concentration of shopping malls (around 0.01 in the rows shown, that is, about one mall per 100 returned venues).

##### Cluster 1

In [49]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
87,Pavones,0.0,1,40.40004,-3.63300
86,Palos de Moguer,0.0,1,40.40301,-3.69358
85,Palomeras Sureste,0.0,1,40.38537,-3.63530
122,Vallehermoso,0.0,1,40.44480,-3.71483
123,Valverde,0.0,1,40.49991,-3.68608
...,...,...,...,...,...
45,El Goloso,0.0,1,40.54357,-3.69662
44,El Cañaveral,0.0,1,40.40728,-3.56454
43,Delicias,0.0,1,40.39613,-3.68946
42,Cármenes,0.0,1,40.40280,-3.73178


This cluster is composed of neighborhoods with no shopping malls among the returned venues.

##### Cluster 2

In [48]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
46,El Plantío,0.033333,2,40.472885,-3.827385
41,Cuatro Vientos,0.045455,2,40.36508,-3.7753
78,Opañel,0.02381,2,40.38915,-3.72375
21,Campamento,0.024691,2,40.39666,-3.77408
75,Niño Jesús,0.041667,2,40.356811,-3.783558


This cluster is composed of neighborhoods with a high concentration of shopping malls.

#### 5.2. Select the best neighborhood to open a new Shopping Mall in Madrid, Spain

The clustering results show that the neighborhoods in Cluster 1 represent a strong opportunity for opening new shopping malls: none of the venues returned for them is a shopping mall, so there is no competition from existing malls. Meanwhile, shopping malls in Cluster 2 are likely to face intense competition because of the high concentration of malls there. Property developers with a unique selling proposition that stands out from the competition could also open new shopping malls in the Cluster 0 neighborhoods, where competition is moderate.
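
The cluster descriptions can be checked numerically by summarizing the shopping-mall frequency per cluster. The sketch below does this on six rows copied from the tables above. It is only an illustration on a small sample; in the notebook the same aggregation could be applied to `madrid_merged`.

```python
import pandas as pd

# Six rows taken from the cluster tables above (two per cluster).
sample = pd.DataFrame({
    "Neighborhood": ["Corralejos", "San Isidro", "Pavones",
                     "Palos de Moguer", "Cuatro Vientos", "Campamento"],
    "Shopping Mall": [0.012048, 0.010000, 0.0, 0.0, 0.045455, 0.024691],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
})

# Count, minimum, mean and maximum frequency for each cluster.
summary = sample.groupby("Cluster Labels")["Shopping Mall"].agg(["count", "min", "mean", "max"])
print(summary)
```

A maximum of 0.0 for Cluster 1 and non-overlapping ranges for Clusters 0 and 2 would support the "none", "moderate" and "high" reading.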

##### Analyse the ideal neighborhoods to open a new Shopping Mall in Madrid, Spain

In [54]:
#select neighborhoods of Cluster 1
potential_neighborhoods = madrid_merged.loc[madrid_merged['Cluster Labels'] == 1]

#count the number of potential neighborhoods
print('There are ', potential_neighborhoods.shape[0], ' potential neighborhoods to open a new Shopping Mall in Madrid.')

There are  88  potential neighborhoods to open a new Shopping Mall in Madrid.


In [55]:
#get a list of the potential neighborhoods
print('The potential neighborhoods to open a new Shopping Mall in Madrid are the following: ')
for n in potential_neighborhoods['Neighborhood'].tolist() :
    print('- ', n)

The potential neighborhoods to open a new Shopping Mall in Madrid are the following: 
-  Pavones
-  Palos de Moguer
-  Palomeras Sureste
-  Vallehermoso
-  Valverde
-  Sol
-  Pacífico
-  Orcasur
-  Ventas
-  Numancia
-  Villaverde Alto, Casco Histórico de Villaverde
-  Vinateros
-  Nueva España
-  Palacio
-  Pinar del Rey
-  Valderrivas
-  Simancas
-  San Pascual
-  San Juan Bautista
-  Timón
-  Trafalgar
-  San Fermín
-  San Diego
-  San Cristóbal
-  Salvador
-  Rosas
-  Rios Rosas
-  Universidad
-  Recoletos
-  Quintana
-  Puerta del Ángel
-  Valdeacederas
-  Pueblo Nuevo
-  Prosperidad
-  Valdebernardo
-  Valdemarín
-  Piovera
-  Abrantes
-  Justicia
-  Media Legua
-  Ciudad Universitaria
-  Ciudad Jardín
-  Castillejos
-  Castilla
-  Castellana
-  Casco Histórico de Vicálvaro
-  Casco Histórico de Barajas
-  Casa de Campo
-  Canillas
-  Berruguete
-  Colina
-  Bellas Vistas
-  Atalaya
-  Argüelles
-  Arcos
-  Arapiles
-  Apostol Santiago
-  Amposta
-  Almagro
-  Aeropuerto
-  Adelf

In [56]:
#visualize a map with the potential neighborhoods to open a Shopping Mall in Madrid.
map_madrid = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(potential_neighborhoods['Latitude'], potential_neighborhoods['Longitude'], potential_neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_madrid) 
#save the map as html file
map_madrid.save('map_potential_neighborhoods_madrid.html')
#show the map
map_madrid

From this map, it can be concluded that the best neighborhoods in which to open a shopping mall in Madrid are those located in the city centre rather than in the suburbs.
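
One possible way to put a number on "city centre versus suburbs" is to compute each candidate's great-circle distance from the Madrid coordinates obtained earlier. The sketch below uses the haversine formula on two Cluster 1 neighborhoods from the tables above. The helper `haversine_km` is a hypothetical name, and using distance as a measure of centrality is an assumption added for illustration, not part of the original analysis.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Madrid coordinates as returned by the geolocator earlier in the notebook.
centre = (40.4167047, -3.7035825)

# Two Cluster 1 neighborhoods, with coordinates taken from the tables above.
d_sol = haversine_km(*centre, 40.41802, -3.70577)
d_goloso = haversine_km(*centre, 40.54357, -3.69662)
print(round(d_sol, 2), round(d_goloso, 2))
```

Sorting `potential_neighborhoods` by such a distance would give a simple ranking from most to least central, which could then be weighed against other factors.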