# Capstone Project - The Battle of Neighborhoods

### Retrieve the list of Barrios from Wikipedia and create a dataframe

Scrape the Wikipedia page with BeautifulSoup and turn the HTML table into a list of rows. <br>
Check that every row carries all of its fields. <br><br>
Turn the list of rows into a DataFrame and remove the extra columns.

In [1]:
import requests
from bs4 import BeautifulSoup

# Use BeautifulSoup to extract the html code of the page
wiki_url = requests.get('https://es.wikipedia.org/wiki/Anexo:Barrios_administrativos_de_Madrid').text
soup = BeautifulSoup(wiki_url,'html.parser')

# Find the code of the main table
My_table = soup.find('table',{'class':'wikitable sortable'})

In [2]:
# Turn the table contents into an array
words = []

for items in My_table.find_all("tr"):
    data = [' '.join(item.text.split()) for item in items.find_all(['th','td'])]
    words.append(data)

# Due to the table formatting, only the first row of each district contains the district name
words[0:5]

[['Distrito', 'Número', 'Nombre', 'Superficie (km²)[2]\u200b', 'Imagen'],
 ['Centro', '11', 'Palacio', '1,471 km²', ''],
 ['12', 'Embajadores', '1,032 km²', ''],
 ['13', 'Cortes', '0,592 km²', ''],
 ['14', 'Justicia', '0,742 km²', '']]

In [3]:
# Copy respective district into all rows
distr = None

for word in words:
    if len(word) == 5:
        distr = word[0]
    else:
        word.insert(0, distr)

words[0:5]

[['Distrito', 'Número', 'Nombre', 'Superficie (km²)[2]\u200b', 'Imagen'],
 ['Centro', '11', 'Palacio', '1,471 km²', ''],
 ['Centro', '12', 'Embajadores', '1,032 km²', ''],
 ['Centro', '13', 'Cortes', '0,592 km²', ''],
 ['Centro', '14', 'Justicia', '0,742 km²', '']]

In [4]:
# Turn the array into a Dataframe
import pandas as pd
from pandas import DataFrame

barrios_madrid_surface = DataFrame.from_records(words[1:], columns=words[0])
barrios_madrid_surface.drop(barrios_madrid_surface.columns[[1,4]], axis = 1, inplace = True)
barrios_madrid_surface.head()

Unnamed: 0,Distrito,Nombre,Superficie (km²)[2]​
0,Centro,Palacio,"1,471 km²"
1,Centro,Embajadores,"1,032 km²"
2,Centro,Cortes,"0,592 km²"
3,Centro,Justicia,"0,742 km²"
4,Centro,Universidad,"0,947 km²"
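
The `Superficie` values scraped above are strings with a Spanish decimal comma and a unit suffix (for example `'1,471 km²'`), while the later steps treat the area as a number. A minimal sketch of that conversion, assuming every value follows the same `<number> km²` pattern and contains no thousands separator; `parse_area_km2` is a hypothetical helper, not part of the original notebook:

```python
def parse_area_km2(text):
    """Convert a string such as '1,471 km²' to a float in km²."""
    number = text.replace('km²', '').strip()
    # Spanish notation uses a comma as the decimal separator
    return float(number.replace(',', '.'))

# Sample strings taken from the table output above
areas = [parse_area_km2(s) for s in ['1,471 km²', '1,032 km²', '0,592 km²']]
print(areas)
```

On the DataFrame, the same helper could be applied with `barrios_madrid_surface[column].map(parse_area_km2)`, where `column` is the area column name.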


### Select only from desired Districts

Let's limit our search to four central Madrid districts.

In [5]:
# Filter data for the desired districts
desired = ['Centro', 'Retiro', 'Salamanca', 'Chamartín']

barrios_filtered = barrios_madrid_surface[barrios_madrid_surface['Distrito'].isin(desired)].reset_index(drop=True)
barrios_filtered.columns = ['Distrito', 'Barrio', 'Superficie']
barrios_filtered.head()

Unnamed: 0,Distrito,Barrio,Superficie
0,Centro,Palacio,"1,471 km²"
1,Centro,Embajadores,"1,032 km²"
2,Centro,Cortes,"0,592 km²"
3,Centro,Justicia,"0,742 km²"
4,Centro,Universidad,"0,947 km²"


### Get coordinates for each Barrio

Let's use geopy to convert the address of each Barrio into geographic coordinates.

In [6]:
# Get the geo location for each Barrio
'''
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

latitude = []
longitude = []

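# user_agent is required by recent geopy versions; the name here is a placeholder
geolocator = Nominatim(user_agent='madrid-barrios')

for _, row in barrios_filtered.iterrows():

    location = None
    while location is None:  # retries until the geocoder returns a result
        address = row['Barrio'] + ', ' + row['Distrito'] + ', Madrid, Spain'
        location = geolocator.geocode(address)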
    
    latitude.append(location.latitude)
    longitude.append(location.longitude)


barrios_filtered['latitude'] = latitude
barrios_filtered['longitude'] = longitude
barrios_filtered.head()
'''

# The geocoding service behind geopy limits the number of requests, so the coordinates are loaded from a prepared file instead
import pandas as pd

barrios_filtered = pd.read_excel (r'https://github.com/filipe-afonso-carvalho/coursera-capstone/blob/master/madrid%20lat%20long.xlsx?raw=true')
barrios_filtered.drop(['Compose'], axis=1, inplace=True)
barrios_filtered.head()

Unnamed: 0,Barrio,Distrito,Superficie,Latitude,Longitude
0,Palacio,Centro,1.471,40.415129,-3.715618
1,Embajadores,Centro,1.032,40.409681,-3.701644
2,Cortes,Centro,0.592,40.414348,-3.698525
3,Justicia,Centro,0.742,40.423957,-3.695747
4,Universidad,Centro,0.947,40.425264,-3.706606
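
The request limit is the reason the coordinates above come from a prepared file. If live geocoding were needed, one hedged approach is to pause between attempts and cap the number of retries instead of looping indefinitely. The sketch below accepts any `geocode` callable (for example the `geocode` method of a `Nominatim` instance); `geocode_with_retry`, its parameters, and the stand-in `fake_geocode` are illustrative names, not part of the notebook, and exceptions raised by a real geocoder are not handled here:

```python
import time

def geocode_with_retry(geocode, address, max_tries=3, pause=1.0):
    """Call geocode(address) up to max_tries times, sleeping between failed tries."""
    for attempt in range(max_tries):
        location = geocode(address)
        if location is not None:
            return location
        if attempt < max_tries - 1:
            time.sleep(pause)
    return None  # the caller decides what to do with an unresolved address

# Stand-in geocoder for illustration: fails once, then returns fixed coordinates
calls = {'n': 0}

def fake_geocode(address):
    calls['n'] += 1
    return None if calls['n'] == 1 else (40.415129, -3.715618)

result = geocode_with_retry(fake_geocode, 'Palacio, Centro, Madrid, Spain', pause=0.0)
print(result)
```

geopy also ships a `RateLimiter` helper in `geopy.extra.rate_limiter` that may cover the same need.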


### Display in a Map

We create a Folium map centered on Madrid, with a marker at the location of each Barrio.

In [7]:
# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

address = 'Madrid, Spain'

geolocator = Nominatim(user_agent='madrid-barrios')  # placeholder name; recent geopy versions require a user_agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Create map of Madrid using latitude and longitude values
!conda install -c conda-forge folium=0.5.0 --yes
import folium

map_madrid = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, label in zip(barrios_filtered['Latitude'], barrios_filtered['Longitude'], barrios_filtered['Barrio']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)  
    
map_madrid

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge


### Get Restaurants for each Barrio

Using the Foursquare API, we will retrieve up to 100 food venues within 500 m of each Barrio's coordinates.

In [8]:
# Foursquare ID data
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (do not publish real credentials)
VERSION = '20180605' # Foursquare API version
category = '4d4b7105d754a06374d81259' # Food category

# Create function to get venues of a desired category in a radius of a geolocation
def getNearbyRestaurants(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            category)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_restaurants = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_restaurants.columns = ['Barrio', 
                  'Barrio Latitude', 
                  'Barrio Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_restaurants)
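
`getNearbyRestaurants` indexes straight into the JSON response, so a failed request or a venue without categories would raise an exception. Below is a defensive sketch of the row-extraction step only, assuming the response layout read above (`response`, then `groups`, then `items`). `extract_venue_rows` and the second venue in the sample payload are hypothetical:

```python
def extract_venue_rows(payload, barrio, lat, lng):
    """Return one tuple per venue, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Fall back to a None category when the list is missing or empty
        categories = venue.get('categories') or [{'name': None}]
        rows.append((barrio, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-written payload mimicking the structure read by the function above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Taberna Rayuela',
               'location': {'lat': 40.413179, 'lng': -3.713496},
               'categories': [{'name': 'Tapas Restaurant'}]}},
    {'venue': {'name': 'Venue without category',  # invented for the example
               'location': {'lat': 40.41, 'lng': -3.71},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample, 'Palacio', 40.415129, -3.715618)
print(rows)
```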

In [9]:
# Run the function above for desired barrios
madrid_restaurants = getNearbyRestaurants(names=barrios_filtered['Barrio'],
                                   latitudes=barrios_filtered['Latitude'],
                                   longitudes=barrios_filtered['Longitude']
                                  )

Palacio
Embajadores
Cortes
Justicia
Universidad
Sol
Pacífico
Adelfas
Estrella
Ibiza
Jerónimos
Niño Jesús
Recoletos
Goya
Fuente del Berro
Guindalera
Lista
Castellana
El Viso
Prosperidad
Ciudad Jardín
Hispanoamérica
Nueva España
Castilla


In [10]:
# Check venue data for each barrio
print(madrid_restaurants.shape)
madrid_restaurants.head()

(1301, 7)


Unnamed: 0,Barrio,Barrio Latitude,Barrio Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Palacio,40.415129,-3.715618,Taberna Rayuela,40.413179,-3.713496,Tapas Restaurant
1,Palacio,40.415129,-3.715618,El Landó,40.4119,-3.715076,Spanish Restaurant
2,Palacio,40.415129,-3.715618,Charlie Champagne,40.413936,-3.712647,Restaurant
3,Palacio,40.415129,-3.715618,Pizzeria Mayor,40.412789,-3.717474,Pizza Place
4,Palacio,40.415129,-3.715618,Taquería del Alamillo,40.413697,-3.712549,Mexican Restaurant


### Get Restaurant frequency by Barrio

Let's one-hot encode the venue categories and group the venues by Barrio to obtain the frequency of each category in each Barrio. <br>
We will then calculate the number of restaurants per km². <br><br>
Finally, we will create a new dataframe with the 20 most common restaurant categories per Barrio.
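
As a toy illustration of the first step, assume a hand-made table of five venues in two barrios (the rows are invented for the example). One-hot encoding turns each category into a 0/1 column, and the per-barrio mean of those columns is the share of venues in each category:

```python
import pandas as pd

# Invented sample: three venues in Palacio, two in Cortes
toy = pd.DataFrame({
    'Barrio': ['Palacio', 'Palacio', 'Palacio', 'Cortes', 'Cortes'],
    'Venue Category': ['Tapas Restaurant', 'Spanish Restaurant',
                       'Tapas Restaurant', 'Pizza Place', 'Tapas Restaurant'],
})

# One 0/1 column per category
toy_onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Barrio', toy['Barrio'])

# Mean of each column per barrio = frequency of that category
toy_freq = toy_onehot.groupby('Barrio').mean()
print(toy_freq)
```

Here two of the three Palacio venues are tapas restaurants, so the `Tapas Restaurant` frequency for Palacio is 2/3.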

In [11]:
# One hot encoding
madrid_onehot = pd.get_dummies(madrid_restaurants[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
madrid_onehot['Barrio'] = madrid_restaurants['Barrio'] 

# Move neighborhood column to the first column
fixed_columns = [madrid_onehot.columns[-1]] + list(madrid_onehot.columns[:-1])
madrid_onehot = madrid_onehot[fixed_columns]

# Add area column
madrid_onehot = madrid_onehot.join(barrios_filtered.set_index('Barrio'), on='Barrio').drop(['Distrito', 'Latitude', 'Longitude'], axis=1)

madrid_onehot.head()

Unnamed: 0,Barrio,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,...,Swiss Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Superficie
0,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,1.471
1,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1.471
2,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1.471
3,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1.471
4,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1.471


In [12]:
# Add column to calculate restaurant per km2
madrid_onehot['Rest km2'] = madrid_onehot.groupby(['Barrio'])['Barrio'].transform('count') / madrid_onehot['Superficie']

# Drop area column
madrid_onehot.drop(['Superficie'], axis=1, inplace=True)

madrid_onehot.head()

Unnamed: 0,Barrio,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,...,Swiss Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Rest km2
0,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,23.113528
1,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,23.113528
2,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,23.113528
3,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,23.113528
4,Palacio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,23.113528


In [13]:
# Average the one-hot columns per Barrio to get the frequency of each category
madrid_grouped = madrid_onehot.groupby('Barrio').mean().reset_index()

# Normalize restaurant per km2 column
from sklearn import preprocessing
x = madrid_grouped[['Rest km2']]
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
madrid_grouped['Rest km2'] = x_scaled

# Multiply restaurant per km2 column to give it 50% weight
# madrid_grouped['Rest km2'] = madrid_grouped['Rest km2'] * 76

madrid_grouped.head()

Unnamed: 0,Barrio,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,...,Swiss Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Rest km2
0,Adelfas,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,...,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.159952
1,Castellana,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.02,0.04,...,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.572153
2,Castilla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Ciudad Jardín,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.030303,0.0,...,0.0,0.0,0.151515,0.0,0.030303,0.0,0.0,0.0,0.0,0.186009
4,Cortes,0.0,0.0,0.0,0.03,0.01,0.02,0.0,0.0,0.0,...,0.0,0.0,0.13,0.0,0.0,0.0,0.0,0.01,0.01,0.749626
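
Min-max scaling maps each value x to (x - min) / (max - min), so the least dense barrio becomes 0 and the densest becomes 1. The sketch below reproduces two of the scaled values by hand, assuming the minimum is Castilla's 1.851852 restaurants per km² and the maximum is the 224.719101 that appears in the cluster tables later in the notebook; `min_max_scale` is a hypothetical helper:

```python
def min_max_scale(value, lowest, highest):
    """Rescale value linearly so that lowest maps to 0 and highest maps to 1."""
    return (value - lowest) / (highest - lowest)

# Densities (restaurants per km²) as they appear in the notebook outputs
lowest, highest = 1.851852, 224.719101
cortes = min_max_scale(168.918919, lowest, highest)
adelfas = min_max_scale(37.5, lowest, highest)
print(round(cortes, 6), round(adelfas, 6))
```

The two results agree with the `Rest km2` values shown for Cortes and Adelfas in the table above.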


In [14]:
import numpy as np

# Create function to return most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_restaurants = 20

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Barrio']
for ind in np.arange(num_top_restaurants):
    try:
        columns.append('{}{} Most Common Restaurant'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Restaurant'.format(ind+1))

# Create a new dataframe (excluding restaurant per km2 column)
barrios_restaurants_sorted = pd.DataFrame(columns=columns)
barrios_restaurants_sorted['Barrio'] = madrid_grouped['Barrio']

for ind in np.arange(madrid_grouped.shape[0]):
    barrios_restaurants_sorted.iloc[ind, 1:] = return_most_common_venues(madrid_grouped.drop('Rest km2', axis=1).iloc[ind, :], num_top_restaurants)

barrios_restaurants_sorted.head()

Unnamed: 0,Barrio,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,...,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant,16th Most Common Restaurant,17th Most Common Restaurant,18th Most Common Restaurant,19th Most Common Restaurant,20th Most Common Restaurant
0,Adelfas,Spanish Restaurant,Café,Diner,Asian Restaurant,Tapas Restaurant,Fast Food Restaurant,Bakery,Sandwich Place,Peruvian Restaurant,...,Food,Restaurant,Breakfast Spot,Cuban Restaurant,Deli / Bodega,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Comfort Food Restaurant,Falafel Restaurant
1,Castellana,Spanish Restaurant,Restaurant,Tapas Restaurant,Mediterranean Restaurant,Gastropub,Burger Joint,Café,Bistro,Italian Restaurant,...,Deli / Bodega,Diner,Burrito Place,Breakfast Spot,Salad Place,Japanese Restaurant,Bakery,Bagel Shop,French Restaurant,Argentinian Restaurant
2,Castilla,Chinese Restaurant,Café,Bakery,Pizza Place,Vietnamese Restaurant,Falafel Restaurant,Deli / Bodega,Diner,Donut Shop,...,Ethiopian Restaurant,Fast Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Gastropub,Greek Restaurant,Grilled Meat Restaurant,Cuban Restaurant
3,Ciudad Jardín,Spanish Restaurant,Restaurant,Tapas Restaurant,Café,Pizza Place,Diner,Gastropub,Middle Eastern Restaurant,Mediterranean Restaurant,...,Italian Restaurant,Bakery,BBQ Joint,Food,Theme Restaurant,American Restaurant,Cuban Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
4,Cortes,Spanish Restaurant,Tapas Restaurant,Restaurant,Mediterranean Restaurant,Café,Japanese Restaurant,Pizza Place,Argentinian Restaurant,Deli / Bodega,...,Seafood Restaurant,Cuban Restaurant,Sushi Restaurant,BBQ Joint,Mexican Restaurant,Italian Restaurant,Molecular Gastronomy Restaurant,Indian Restaurant,Venezuelan Restaurant,Modern European Restaurant


### Cluster Neighborhoods

Let's use K-means to cluster our neighborhoods according to the typology and concentration of restaurants.

In [15]:
from sklearn.cluster import KMeans

# Set number of clusters
kclusters = 3

madrid_grouped_clustering = madrid_grouped.drop('Barrio', axis=1)

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(madrid_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 2, 1, 0, 1, 1, 1, 1, 0], dtype=int32)
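
The notebook fixes `kclusters = 3` without showing how that value was chosen. One common sanity check, not used in the original analysis, is to compare silhouette scores across several values of k. The sketch below runs on synthetic, well-separated blobs rather than on `madrid_grouped_clustering`, so its result says nothing about the Madrid data; `best_k_by_silhouette` is a hypothetical helper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, candidates):
    """Return the k with the highest mean silhouette score, plus all scores."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

# Synthetic data: three tight blobs of 20 points each
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(X, range(2, 6))
print(best_k)
```

Applied to the real feature matrix, the same helper would take `madrid_grouped_clustering` in place of `X`.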

In [16]:
madrid_merged = barrios_filtered.copy()  # copy so that the in-place sort below does not alter barrios_filtered
madrid_merged.sort_values(by='Barrio', ascending=True ,inplace=True)
madrid_merged.reset_index(drop=True, inplace=True)

# Add clustering labels
madrid_merged['Cluster Labels'] = kmeans.labels_

# Merge barrios_restaurants_sorted with madrid_merged to add Area, latitude and longitude for each neighborhood
madrid_merged = madrid_merged.join(barrios_restaurants_sorted.set_index('Barrio'), on='Barrio')
madrid_onehot_area = madrid_onehot[['Barrio', 'Rest km2']].drop_duplicates()
madrid_merged = madrid_merged.join(madrid_onehot_area.set_index('Barrio'), on='Barrio')

madrid_merged.head() # check the last columns!

Unnamed: 0,Barrio,Distrito,Superficie,Latitude,Longitude,Cluster Labels,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,...,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant,16th Most Common Restaurant,17th Most Common Restaurant,18th Most Common Restaurant,19th Most Common Restaurant,20th Most Common Restaurant,Rest km2
0,Adelfas,Retiro,0.64,40.40028,-3.671767,1,Spanish Restaurant,Café,Diner,Asian Restaurant,...,Restaurant,Breakfast Spot,Cuban Restaurant,Deli / Bodega,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Comfort Food Restaurant,Falafel Restaurant,37.5
1,Castellana,Salamanca,0.773,40.433823,-3.684004,0,Spanish Restaurant,Restaurant,Tapas Restaurant,Mediterranean Restaurant,...,Diner,Burrito Place,Breakfast Spot,Salad Place,Japanese Restaurant,Bakery,Bagel Shop,French Restaurant,Argentinian Restaurant,129.366106
2,Castilla,Chamartín,2.16,40.475094,-3.696343,2,Chinese Restaurant,Café,Bakery,Pizza Place,...,Fast Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Gastropub,Greek Restaurant,Grilled Meat Restaurant,Cuban Restaurant,1.851852
3,Ciudad Jardín,Chamartín,0.762,40.44695,-3.680483,1,Spanish Restaurant,Restaurant,Tapas Restaurant,Café,...,Bakery,BBQ Joint,Food,Theme Restaurant,American Restaurant,Cuban Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,43.307087
4,Cortes,Centro,0.592,40.414348,-3.698525,0,Spanish Restaurant,Tapas Restaurant,Restaurant,Mediterranean Restaurant,...,Cuban Restaurant,Sushi Restaurant,BBQ Joint,Mexican Restaurant,Italian Restaurant,Molecular Gastronomy Restaurant,Indian Restaurant,Venezuelan Restaurant,Modern European Restaurant,168.918919
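
Assigning `kmeans.labels_` by position is correct only while `madrid_merged` and `madrid_grouped` list the same barrios in the same order: a different order would mislabel rows without any error, and a barrio with no venues would make the lengths differ. A hedged alternative is to pair the labels with the barrio names first and join by name. The frames below are small stand-ins that reuse names and labels from the outputs above, plus `'Ejemplo'` as an invented barrio without venues:

```python
import pandas as pd

# Stand-ins for madrid_grouped['Barrio'] and kmeans.labels_
grouped_names = pd.Series(['Adelfas', 'Castellana', 'Castilla'])
labels = [1, 0, 2]

# Stand-in for barrios_filtered, in a different order and with one extra row
barrios = pd.DataFrame({
    'Barrio': ['Castilla', 'Ejemplo', 'Adelfas', 'Castellana'],  # 'Ejemplo' is invented
    'Distrito': ['Chamartín', 'Centro', 'Retiro', 'Salamanca'],
})

# Pair each label with its barrio name, then join on the name
label_table = pd.DataFrame({'Barrio': grouped_names, 'Cluster Labels': labels})
merged = barrios.merge(label_table, on='Barrio', how='left')
print(merged)
```

Barrios missing from the clustering input end up with a missing label instead of shifting the others.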


### Create Map of Clusters

We will display our clusters on a map, with a color code and useful information in each marker.

In [17]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster, fir, sec, thi, fou, fif, km2 in zip(madrid_merged['Latitude'], madrid_merged['Longitude'], madrid_merged['Barrio'], madrid_merged['Cluster Labels'], \
                                                                madrid_merged['1st Most Common Restaurant'], madrid_merged['2nd Most Common Restaurant'], madrid_merged['3rd Most Common Restaurant'], \
                                                                madrid_merged['4th Most Common Restaurant'], madrid_merged['5th Most Common Restaurant'], madrid_merged['Rest km2']):
    label = folium.Popup((
        "<b>Barrio: {barrio}</b><br>"
        "Cluster: {cluster}<br>"
        "Restaurants per km2: {rest}<br>"
        "1st Most Common Restaurant: {first}<br>"
        "2nd Most Common Restaurant: {second}<br>"
        "3rd Most Common Restaurant: {third}<br>"
        "4th Most Common Restaurant: {fourth}<br>"
        "5th Most Common Restaurant: {fifth}<br>"
        ).format(barrio=str(poi), cluster=str(cluster), first=str(fir), second=str(sec), third=str(thi), fourth=str(fou), fifth=str(fif), rest=str(round(km2,2))))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster Analysis

The central Madrid cluster is characterized by a high concentration of restaurants. The typology seems fairly generic, led by categories such as Spanish Restaurant, Restaurant and Tapas Restaurant. A good way to refine this analysis would be to add price and socioeconomic data to our features.

In [18]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 0, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Distrito,Cluster Labels,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,...,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant,16th Most Common Restaurant,17th Most Common Restaurant,18th Most Common Restaurant,19th Most Common Restaurant,20th Most Common Restaurant,Rest km2
1,Salamanca,0,Spanish Restaurant,Restaurant,Tapas Restaurant,Mediterranean Restaurant,Gastropub,Burger Joint,Café,Bistro,...,Diner,Burrito Place,Breakfast Spot,Salad Place,Japanese Restaurant,Bakery,Bagel Shop,French Restaurant,Argentinian Restaurant,129.366106
4,Centro,0,Spanish Restaurant,Tapas Restaurant,Restaurant,Mediterranean Restaurant,Café,Japanese Restaurant,Pizza Place,Argentinian Restaurant,...,Cuban Restaurant,Sushi Restaurant,BBQ Joint,Mexican Restaurant,Italian Restaurant,Molecular Gastronomy Restaurant,Indian Restaurant,Venezuelan Restaurant,Modern European Restaurant,168.918919
9,Salamanca,0,Spanish Restaurant,Tapas Restaurant,Restaurant,Japanese Restaurant,Italian Restaurant,Café,Bakery,Snack Place,...,Burger Joint,Thai Restaurant,Steakhouse,Mediterranean Restaurant,American Restaurant,Mexican Restaurant,Deli / Bodega,Indian Restaurant,North Indian Restaurant,129.701686
12,Retiro,0,Spanish Restaurant,Tapas Restaurant,Italian Restaurant,Seafood Restaurant,Café,Restaurant,Gastropub,Indian Restaurant,...,Falafel Restaurant,Burger Joint,Sandwich Place,Pizza Place,Breakfast Spot,Vegetarian / Vegan Restaurant,Argentinian Restaurant,Brazilian Restaurant,Food Truck,114.285714
14,Centro,0,Spanish Restaurant,Restaurant,Bakery,Italian Restaurant,Gastropub,Tapas Restaurant,Asian Restaurant,Mediterranean Restaurant,...,Vegetarian / Vegan Restaurant,American Restaurant,Burger Joint,Japanese Restaurant,Mexican Restaurant,Moroccan Restaurant,Café,Latin American Restaurant,Indian Restaurant,134.770889
15,Salamanca,0,Spanish Restaurant,Restaurant,Tapas Restaurant,Seafood Restaurant,Chinese Restaurant,Japanese Restaurant,Bakery,Mediterranean Restaurant,...,Mexican Restaurant,Greek Restaurant,Indian Restaurant,Paella Restaurant,Italian Restaurant,Breakfast Spot,Ramen Restaurant,Diner,South American Restaurant,121.153846
21,Salamanca,0,Restaurant,Spanish Restaurant,Japanese Restaurant,Italian Restaurant,Mexican Restaurant,Bakery,Tapas Restaurant,Café,...,Mediterranean Restaurant,Seafood Restaurant,Breakfast Spot,Salad Place,Diner,Moroccan Restaurant,Sandwich Place,Indian Restaurant,Thai Restaurant,114.942529
22,Centro,0,Spanish Restaurant,Tapas Restaurant,Restaurant,Italian Restaurant,Argentinian Restaurant,Bistro,Mexican Restaurant,Mediterranean Restaurant,...,Burger Joint,Sandwich Place,Café,Donut Shop,Noodle House,Comfort Food Restaurant,Creperie,Indian Restaurant,Cuban Restaurant,224.719101
23,Centro,0,Tapas Restaurant,Spanish Restaurant,Café,Gastropub,Restaurant,Italian Restaurant,Mediterranean Restaurant,Pizza Place,...,Vegetarian / Vegan Restaurant,Breakfast Spot,Peruvian Restaurant,Chinese Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Sandwich Place,Seafood Restaurant,Indian Restaurant,105.596621


The outer Madrid cluster is characterized by a medium concentration of restaurants. It also includes many ethnic food places.

In [19]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 1, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Distrito,Cluster Labels,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,...,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant,16th Most Common Restaurant,17th Most Common Restaurant,18th Most Common Restaurant,19th Most Common Restaurant,20th Most Common Restaurant,Rest km2
0,Retiro,1,Spanish Restaurant,Café,Diner,Asian Restaurant,Tapas Restaurant,Fast Food Restaurant,Bakery,Sandwich Place,...,Restaurant,Breakfast Spot,Cuban Restaurant,Deli / Bodega,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Comfort Food Restaurant,Falafel Restaurant,37.5
3,Chamartín,1,Spanish Restaurant,Restaurant,Tapas Restaurant,Café,Pizza Place,Diner,Gastropub,Middle Eastern Restaurant,...,Bakery,BBQ Joint,Food,Theme Restaurant,American Restaurant,Cuban Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,43.307087
5,Chamartín,1,Restaurant,Italian Restaurant,Tapas Restaurant,Spanish Restaurant,Café,Bakery,Japanese Restaurant,Asian Restaurant,...,Snack Place,Deli / Bodega,Chinese Restaurant,BBQ Joint,Mediterranean Restaurant,Cafeteria,Burrito Place,Diner,Food Court,45.081967
6,Centro,1,Spanish Restaurant,Café,Tapas Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Argentinian Restaurant,Mexican Restaurant,Restaurant,...,Breakfast Spot,Sushi Restaurant,Mediterranean Restaurant,Seafood Restaurant,Bakery,Grilled Meat Restaurant,Venezuelan Restaurant,Deli / Bodega,Moroccan Restaurant,71.705426
7,Retiro,1,Spanish Restaurant,Café,Chinese Restaurant,Italian Restaurant,Asian Restaurant,Diner,Pizza Place,Ethiopian Restaurant,...,Dumpling Restaurant,Fast Food Restaurant,Falafel Restaurant,Food,Food Court,Food Truck,French Restaurant,Gastropub,Greek Restaurant,9.756098
8,Salamanca,1,Restaurant,Tapas Restaurant,Mediterranean Restaurant,Bakery,Spanish Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Cuban Restaurant,...,Dumpling Restaurant,Fast Food Restaurant,Falafel Restaurant,Comfort Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Gastropub,7.042254
10,Salamanca,1,Restaurant,Tapas Restaurant,Spanish Restaurant,Bakery,Breakfast Spot,Café,Diner,Sandwich Place,...,Dumpling Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Comfort Food Restaurant,Falafel Restaurant,Fast Food Restaurant,Food,Food Court,Food Truck,8.760951
11,Chamartín,1,Spanish Restaurant,Restaurant,Italian Restaurant,Mediterranean Restaurant,Japanese Restaurant,Paella Restaurant,Café,Fast Food Restaurant,...,Burger Joint,Korean Restaurant,Pizza Place,Comfort Food Restaurant,Bakery,Steakhouse,Argentinian Restaurant,Turkish Restaurant,Donut Shop,25.190393
13,Retiro,1,Diner,Café,Snack Place,Breakfast Spot,Mediterranean Restaurant,Paella Restaurant,Restaurant,Bistro,...,Food,Fast Food Restaurant,Cuban Restaurant,Falafel Restaurant,Ethiopian Restaurant,Food Court,Dumpling Restaurant,Donut Shop,Food Truck,7.368421
16,Retiro,1,Spanish Restaurant,Café,Asian Restaurant,Tapas Restaurant,Restaurant,Mediterranean Restaurant,Chinese Restaurant,Pizza Place,...,Dumpling Restaurant,Deli / Bodega,Diner,Donut Shop,Vietnamese Restaurant,Ethiopian Restaurant,Creperie,Falafel Restaurant,Food,34.214619


The suburbs cluster is characterized by a low concentration of restaurants. The typology seems to be of low-cost places, but we need more data to confirm this.

In [20]:
madrid_merged.loc[madrid_merged['Cluster Labels'] == 2, madrid_merged.columns[[1] + list(range(5, madrid_merged.shape[1]))]]

Unnamed: 0,Distrito,Cluster Labels,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,...,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant,16th Most Common Restaurant,17th Most Common Restaurant,18th Most Common Restaurant,19th Most Common Restaurant,20th Most Common Restaurant,Rest km2
2,Chamartín,2,Chinese Restaurant,Café,Bakery,Pizza Place,Vietnamese Restaurant,Falafel Restaurant,Deli / Bodega,Diner,...,Fast Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Gastropub,Greek Restaurant,Grilled Meat Restaurant,Cuban Restaurant,1.851852
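
The density differences described for the three clusters can be checked by aggregating `Rest km2` per cluster. A minimal sketch using only the five rows visible in the `madrid_merged.head()` output, so the averages are illustrative and not the full-cluster figures:

```python
import pandas as pd

# First five rows of madrid_merged as shown earlier
sample = pd.DataFrame({
    'Barrio': ['Adelfas', 'Castellana', 'Castilla', 'Ciudad Jardín', 'Cortes'],
    'Cluster Labels': [1, 0, 2, 1, 0],
    'Rest km2': [37.5, 129.366106, 1.851852, 43.307087, 168.918919],
})

# Number of barrios and mean/min/max density per cluster
summary = sample.groupby('Cluster Labels')['Rest km2'].agg(['count', 'mean', 'min', 'max'])
print(summary.round(2))
```

On the full table, the same aggregation would start from `madrid_merged.groupby('Cluster Labels')['Rest km2']`.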
