<h1 align=center><font size = 5>Restaurants in Barcelona Neighbourhoods</font></h1>

The aim of this notebook is to classify Barcelona neighbourhoods by the types of restaurants found in each of them.

Let's import the required libraries:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import re #for regular expressions

from bs4 import BeautifulSoup #for web scraping

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Scraping coordinates of Barcelona neighbourhoods

We will start by building a Pandas dataframe that lists all Barcelona neighbourhoods and their coordinates. We will do so by scraping Wikipedia with BeautifulSoup. First, we get the neighbourhood links from the Wikipedia category page that lists them: https://es.wikipedia.org/wiki/Categor%C3%ADa:Barrios_de_Barcelona

In [2]:
website_url = requests.get("https://es.wikipedia.org/wiki/Categor%C3%ADa:Barrios_de_Barcelona").text

soup = BeautifulSoup(website_url,"lxml")
main_tag = soup.findAll('div',{'id':'mw-content-text'})[0]

LinkList = list()
for a in main_tag.find_all('a', href=True):
    LinkList.append(a['href'])

r = re.compile("^/wiki")
barris = list(filter(r.match, LinkList))

LinkTotal = list()
for link in barris:
    LinkTotal.append("https://es.wikipedia.org" + link)

The full URLs of the linked Wikipedia pages are stored in the **LinkTotal** list. Here is the first value:

In [3]:
LinkTotal[0]

'https://es.wikipedia.org/wiki/Distritos_de_Barcelona'
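The link filtering step can be tried offline on a handful of `href` values. The sketch below is only an illustration: the `sample_hrefs` list is made up for the example (only the first entry appears in the scraped output above), and `urljoin` is used as an alternative to string concatenation.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://es.wikipedia.org"

# Illustrative href values; only the first one comes from the real page.
sample_hrefs = [
    "/wiki/Distritos_de_Barcelona",
    "#mw-head",
    "/w/index.php?title=X",
    "/wiki/La_Barceloneta",
    "https://example.org/",
]

# Keep only internal article links, i.e. hrefs starting with /wiki
wiki_pattern = re.compile(r"^/wiki")
sample_links = [urljoin(BASE_URL, h) for h in sample_hrefs if wiki_pattern.match(h)]

print(sample_links)
```

Anchors, `/w/index.php` links and external URLs are dropped, and the two remaining paths become absolute URLs.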

Now let's separate the links to district pages from the links to neighbourhood pages, using their positions in the list:

In [4]:
posDistrictes = [1, 6, 13, 24, 28, 35, 41, 53, 67, 75]
LinkDistrictes = list()
for i in posDistrictes:
    LinkDistrictes.append(LinkTotal[i])

posBarris = list(range(2,6)) + list(range(7, 13)) + list(range(14, 24)) + list(range(25, 28)) + \
list(range(29, 35)) + list(range(36,41)) + list(range(42,53)) + list(range(54,67)) + list(range(68,75)) + \
list(range(76,86))
LinkBarris = list()
for i in posBarris:
    LinkBarris.append(LinkTotal[i])

print("In Barcelona there are", len(LinkDistrictes), "districts and", len(LinkBarris), "neighbourhoods.")

In Barcelona there are 10 districts and 75 neighbourhoods.
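In the positions above, each district link is followed by the links of its neighbourhoods, so the neighbourhood positions can also be derived from the district positions instead of typing every range. This is a minimal sketch; `last_index` and `posBarris_derived` are names introduced here, and the value 85 is taken from the final `range(76,86)` above.

```python
posDistrictes = [1, 6, 13, 24, 28, 35, 41, 53, 67, 75]
last_index = 85  # position of the last neighbourhood link, from range(76, 86)

# Each district position opens a block that ends just before the next district.
boundaries = posDistrictes + [last_index + 1]
posBarris_derived = [
    i
    for start, end in zip(boundaries, boundaries[1:])
    for i in range(start + 1, end)
]

print(len(posBarris_derived))
```

The derived list has the same 75 positions as the hand-written ranges, and it stays in sync if a district position is corrected.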


I will define a function that takes the URL of a neighbourhood's Wikipedia page as input and returns the neighbourhood **name**, **latitude** and **longitude** registered in Wikipedia:

In [5]:
def NameAndLocation(link):
    # download and parse the neighbourhood page
    barri_url = requests.get(link).text
    soupBarri = BeautifulSoup(barri_url, "lxml")
    name = str(soupBarri.findAll("h1", {"id": "firstHeading"})[0].contents[0])

    # coordinates are read from the first <span class="geo"> element, if any
    latlon = soupBarri.findAll("span", {"class": "geo"})
    if len(latlon) == 0:
        # no coordinates registered for this page
        latitude = np.nan
        longitude = np.nan
    else:
        latlon = latlon[0].contents
        latitude = float(latlon[0].contents[0].replace(", ", ""))
        longitude = float(latlon[1].contents[0])

    return [name, latitude, longitude]

I will apply that function to all neighbourhood links to obtain the data frame. One of the entries has no registered coordinates. It is not strictly a neighbourhood but the city port, so I remove it from the list:

In [6]:
Locations = pd.DataFrame(columns = ['Neighbourhood', 'latitude', 'longitude'])

for i in range(len(LinkBarris)):
    r = NameAndLocation(LinkBarris[i])
    Locations.loc[i] = r

Locations.dropna(axis=0, inplace=True)
Locations.reset_index(drop=True, inplace=True)

Locations.shape

(74, 3)

In [14]:
Locations.head()

Unnamed: 0,Neighbourhood,latitude,longitude
0,La Barceloneta,41.37944,2.18917
1,Barrio Gótico de Barcelona,41.382778,2.176944
2,El Raval,41.38,2.16861
3,"Sant Pere, Santa Caterina i la Ribera",41.3847,2.1826
4,La Antigua Izquierda del Ensanche,41.390061,2.155061


I will store the data frame into a csv file for convenience:

In [15]:
Locations.to_csv("barrisBCN.csv", index=False)

## 2. Using Foursquare to obtain Barcelona venues

Once we have the position of Barcelona neighbourhoods, we need to obtain Barcelona restaurants. I will explore all Barcelona neighbourhoods, and then filter the venues that are restaurants.

Let's start by defining a function to explore a specific location (I have masked the Foursquare credentials for privacy):

In [23]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# function that explores a location
def explore_location(LocationName, LocationLatitude, LocationLongitude, radius, limit):
    CLIENT_ID = '---------------------' # your Foursquare ID
    CLIENT_SECRET = '----------------------' # your Foursquare Secret
    VERSION = '20190528'
    LIMIT = limit
    
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, LocationLatitude, LocationLongitude, VERSION, radius, LIMIT)
    results = requests.get(url).json()
    items = results['response']['groups'][0]['items']

    dataframe = json_normalize(items) # flatten JSON

    # filter columns
    filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
    dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so later assignments do not act on a view

    # filter the category for each row
    dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean columns
    dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]
    
    # tag every venue with the neighbourhood it was retrieved for
    dataframe_filtered["Neighbourhood"] = LocationName
    
    return dataframe_filtered
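The request URL in `explore_location` is assembled with `str.format`. As an alternative sketch, the same query string can be built with `urllib.parse.urlencode`, reading the credentials from environment variables so they do not have to be masked in the notebook. The helper name `build_explore_url` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this example, not part of the Foursquare API.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(latitude, longitude, radius, limit,
                      client_id=None, client_secret=None, version="20190528"):
    """Hypothetical helper: build the explore URL with the same parameters as above."""
    params = {
        # fall back to (assumed) environment variables when no credentials are passed
        "client_id": client_id or os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": client_secret or os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": "{},{}".format(latitude, longitude),
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes special characters, e.g. the comma in the ll parameter
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

example_url = build_explore_url(41.37944, 2.18917, radius=500, limit=10,
                                client_id="ID", client_secret="SECRET")
print(example_url)
```

The resulting string can be passed to `requests.get` in the same way as the formatted URL.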

Let's test how `explore_location` works on one location:

In [24]:
testPlace = Locations.loc[0]
resultTestPlace = explore_location(testPlace[0], testPlace[1], testPlace[2], radius=500, limit=10)
resultTestPlace.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id,Neighbourhood
0,Baluard Barceloneta,Bakery,"C. del Baluard, 38-40",ES,Barcelona,España,C. d'Escuder,67,"[C. del Baluard, 38-40 (C. d'Escuder), 08003 B...","[{'label': 'display', 'lat': 41.38004687981699...",41.380047,2.18925,8003.0,Cataluña,4c5144e3375c0f475ed8ae93,La Barceloneta
1,Rumbanroll,Mediterranean Restaurant,"C. de la Maquinista, 6",ES,Barcelona,España,,171,"[C. de la Maquinista, 6, 08003 Barcelona Catal...","[{'label': 'display', 'lat': 41.38059695009252...",41.380597,2.187807,8003.0,Cataluña,53249684498ee054d4599642,La Barceloneta
2,La Cova Fumada,Tapas Restaurant,"Baluard, 56",ES,Barcelona,España,,21,"[Baluard, 56, 08003 Barcelona Cataluña, España]","[{'label': 'display', 'lat': 41.37925390526866...",41.379254,2.189254,8003.0,Cataluña,4b7a8de2f964a520fa302fe3,La Barceloneta
3,Plaça de la Barceloneta,Plaza,Plaça de la Barceloneta,ES,Barcelona,España,,92,"[Plaça de la Barceloneta, Barcelona Cataluña, ...","[{'label': 'display', 'lat': 41.3797389508097,...",41.379739,2.188135,,Cataluña,4c7a77822d3ba1432b4b91d0,La Barceloneta
4,BRO,Burger Joint,"C. de Baluard, 34",ES,Barcelona,España,entre Maquinista i Escuder,87,"[C. de Baluard, 34 (entre Maquinista i Escuder...","[{'label': 'display', 'lat': 41.38021387542235...",41.380214,2.189007,8003.0,Cataluña,535ea7db498e522c6b801a4f,La Barceloneta


With this testing, I have obtained the column names of the returned data frame, so I can explore all neighbourhoods:

In [25]:
venues = pd.DataFrame(columns=resultTestPlace.columns)

for i in range(len(Locations)):
    barri = Locations.loc[i]
    venuesBarri = explore_location(barri[0],  barri[1], barri[2], radius=1000, limit=400)
    venues = pd.concat([venues, venuesBarri], sort=False)
    print(barri[0])

La Barceloneta
Barrio Gótico de Barcelona
El Raval
Sant Pere, Santa Caterina i la Ribera
La Antigua Izquierda del Ensanche
La Nueva Izquierda del Ensanche
La Dreta de l'Eixample
El Fort Pienc
Sagrada Familia (barrio de Barcelona)
Barrio de Sant Antoni (Barcelona)
La Bordeta (Barcelona)
La Font de la Guatlla
Hostafrancs
La Marina del Prat Vermell
La Marina de Port
Pueblo Seco
Sants
Sants-Badal
Montjuic (Barcelona)
Barrio de Les Corts
La Maternidad y San Ramón
Pedralbes
Putget i Farró
Sant Gervasi-La Bonanova
Sant Gervasi-Galvany
Sarriá
Las Tres Torres
Vallvidrera, el Tibidabo i les Planes
Camp d'en Grassot i Gràcia Nova
El Coll
Villa de Gracia
La Salud (Barcelona)
Vallcarca y los Penitentes
El Baix Guinardó
Can Baró
El Carmelo
La Font d'en Fargues
El Guinardó
Horta (Barcelona)
La Clota
Montbau
Sant Genís dels Agudells
La Teixonera
El Valle de Hebrón
Can Peguera
Canyelles (barrio)
Ciudad Meridiana
La Guineueta
Porta (Barcelona)
La Prosperitat
Les Roquetes
Torre Baró
La Trinitat Nova
El T

Let's look at the dimensions of `venues` and at its first rows:

In [21]:
venues.shape

(6026, 17)

In [26]:
venues.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,postalCode,state,id,Neighbourhood,neighborhood
0,Baluard Barceloneta,Bakery,"C. del Baluard, 38-40",ES,Barcelona,España,C. d'Escuder,67,"[C. del Baluard, 38-40 (C. d'Escuder), 08003 B...","[{'label': 'display', 'lat': 41.38004687981699...",41.380047,2.18925,8003.0,Cataluña,4c5144e3375c0f475ed8ae93,La Barceloneta,
1,Rumbanroll,Mediterranean Restaurant,"C. de la Maquinista, 6",ES,Barcelona,España,,171,"[C. de la Maquinista, 6, 08003 Barcelona Catal...","[{'label': 'display', 'lat': 41.38059695009252...",41.380597,2.187807,8003.0,Cataluña,53249684498ee054d4599642,La Barceloneta,
2,La Cova Fumada,Tapas Restaurant,"Baluard, 56",ES,Barcelona,España,,21,"[Baluard, 56, 08003 Barcelona Cataluña, España]","[{'label': 'display', 'lat': 41.37925390526866...",41.379254,2.189254,8003.0,Cataluña,4b7a8de2f964a520fa302fe3,La Barceloneta,
3,Plaça de la Barceloneta,Plaza,Plaça de la Barceloneta,ES,Barcelona,España,,92,"[Plaça de la Barceloneta, Barcelona Cataluña, ...","[{'label': 'display', 'lat': 41.3797389508097,...",41.379739,2.188135,,Cataluña,4c7a77822d3ba1432b4b91d0,La Barceloneta,
4,BRO,Burger Joint,"C. de Baluard, 34",ES,Barcelona,España,entre Maquinista i Escuder,87,"[C. de Baluard, 34 (entre Maquinista i Escuder...","[{'label': 'display', 'lat': 41.38021387542235...",41.380214,2.189007,8003.0,Cataluña,535ea7db498e522c6b801a4f,La Barceloneta,


Let's save the results for convenience:

In [27]:
venues.to_csv("llocsBCN.csv", index=False)

## 3. Obtaining Barcelona restaurants

I have used Foursquare location data to obtain Barcelona venues, exploring each of its neighbourhoods. Now I need to filter these venues to retain only restaurants. Let's look at the venue categories:

In [30]:
categories = venues["categories"].unique()
print("There are", len(categories), "different categories in BCN venues.")

There are 286 different categories in BCN venues.


Using regular expressions, we retain the categories related to restaurants:

In [32]:
r = re.compile('Restaurant|restaurant')
restaurants = list(filter(r.search, categories))
print("There are", len(restaurants), "different restaurant categories in BCN venues.")

There are 57 different restaurant categories in BCN venues.
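To see what this filter keeps and what it drops, here is a small self-contained sketch on a few category names taken from the test output above. It uses `re.IGNORECASE` instead of listing both spellings, which is a stylistic assumption rather than the notebook's own pattern.

```python
import re

# A few categories that appear in the explore results above
sample_categories = [
    "Bakery",
    "Tapas Restaurant",
    "Plaza",
    "Burger Joint",
    "Mediterranean Restaurant",
]

# Match the word "restaurant" anywhere in the name, in any letter case
pattern = re.compile("restaurant", re.IGNORECASE)
sample_restaurants = [c for c in sample_categories if pattern.search(c)]

print(sample_restaurants)
```

Note that food venues whose category name does not contain the word, such as "Burger Joint" or "Bakery", are excluded by this approach.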


From all obtained venues, we need to retain the restaurants, filtering by the selected categories:

In [35]:
venuesBCNrestaurants = venues.loc[venues['categories'].isin(restaurants)]
venuesBCNrestaurants.shape

(1908, 17)

## 4. Clustering the neighbourhoods of Barcelona

To cluster the neighbourhoods, we start by generating dummy variables (one column per restaurant category) for each venue:

In [37]:
BCNrestaurants_onehot = pd.get_dummies(venuesBCNrestaurants[['categories']], prefix="", prefix_sep="")
BCNrestaurants_onehot['Neighbourhood'] = venuesBCNrestaurants['Neighbourhood']

fixed_columns = [BCNrestaurants_onehot.columns[-1]] + list(BCNrestaurants_onehot.columns[:-1])
BCNrestaurants_onehot = BCNrestaurants_onehot[fixed_columns]

BCNrestaurants_onehot.head()

Unnamed: 0,Neighbourhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Fondue Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hungarian Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Paella Restaurant,Peruvian Restaurant,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
1,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
6,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
8,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


The size of the resulting data frame:

In [38]:
BCNrestaurants_onehot.shape

(1908, 58)

Now we obtain the average value of each category for each neighbourhood, that is, the share of the neighbourhood's restaurants that fall in that category. These shares are the data I will use for clustering:

In [39]:
BCNrestaurants_grouped = BCNrestaurants_onehot.groupby('Neighbourhood').mean().reset_index()
BCNrestaurants_grouped.head()

Unnamed: 0,Neighbourhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Fondue Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hungarian Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Paella Restaurant,Peruvian Restaurant,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
0,Barrio Gótico de Barcelona,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.121212,0.090909,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.030303,0.0,0.181818,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.030303,0.0,0.0
1,Barrio de Les Corts,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.027778,0.0,0.0,0.305556,0.0,0.0,0.0,0.0,0.138889,0.027778,0.0,0.0,0.055556,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0
2,Barrio de Sant Antoni (Barcelona),0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.075,0.075,0.0,0.0,0.0,0.125,0.025,0.025,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.1,0.0,0.025,0.025,0.0,0.075,0.0,0.0,0.0,0.225,0.025,0.0,0.0,0.0,0.025,0.025,0.0
3,Barón de Viver,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.129032,0.064516,0.0,0.0,0.0,0.096774,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.129032,0.0,0.0,0.0,0.0,0.16129,0.0,0.0,0.0,0.16129,0.0,0.0,0.0,0.032258,0.0,0.0,0.0
4,Camp d'en Grassot i Gràcia Nova,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.121951,0.097561,0.0,0.02439,0.02439,0.073171,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.121951,0.0,0.0,0.02439,0.0,0.073171,0.04878,0.0,0.0,0.146341,0.0,0.0,0.0,0.0,0.097561,0.02439,0.02439
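The mean of a 0/1 dummy column within a group is the fraction of that group's rows with value 1, so the table above holds category shares per neighbourhood. The sketch below reproduces the same calculation in plain Python on a toy list of venues (the rows are invented for illustration and are not the real data).

```python
from collections import Counter, defaultdict

# Toy (neighbourhood, category) pairs, invented for this example
toy_venues = [
    ("La Barceloneta", "Tapas Restaurant"),
    ("La Barceloneta", "Paella Restaurant"),
    ("La Barceloneta", "Tapas Restaurant"),
    ("El Raval", "Spanish Restaurant"),
    ("El Raval", "Tapas Restaurant"),
]

# Count categories per neighbourhood
counts = defaultdict(Counter)
for neighbourhood, category in toy_venues:
    counts[neighbourhood][category] += 1

# Divide each count by the neighbourhood total to get shares
shares = {
    neighbourhood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for neighbourhood, c in counts.items()
}

for neighbourhood, row in shares.items():
    print(neighbourhood, row)
```

Within each neighbourhood the shares add up to 1, which makes neighbourhoods with very different numbers of restaurants comparable.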


To evaluate the characteristics of each neighbourhood, I will obtain its ten most common restaurant categories:

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe

neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = BCNrestaurants_grouped['Neighbourhood']

for ind in np.arange(BCNrestaurants_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(BCNrestaurants_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barrio Gótico de Barcelona,Tapas Restaurant,Spanish Restaurant,Italian Restaurant,Japanese Restaurant,Mediterranean Restaurant,Falafel Restaurant,Greek Restaurant,Ramen Restaurant,Seafood Restaurant,Restaurant
1,Barrio de Les Corts,Restaurant,Spanish Restaurant,Japanese Restaurant,Paella Restaurant,Tapas Restaurant,Middle Eastern Restaurant,Mediterranean Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Hawaiian Restaurant
2,Barrio de Sant Antoni (Barcelona),Tapas Restaurant,Mediterranean Restaurant,Restaurant,Japanese Restaurant,Spanish Restaurant,Italian Restaurant,Argentinian Restaurant,Peruvian Restaurant,Thai Restaurant,Gluten-free Restaurant
3,Barón de Viver,Tapas Restaurant,Spanish Restaurant,Restaurant,Italian Restaurant,Mediterranean Restaurant,Fast Food Restaurant,Japanese Restaurant,American Restaurant,Asian Restaurant,Molecular Gastronomy Restaurant
4,Camp d'en Grassot i Gràcia Nova,Tapas Restaurant,Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant,Japanese Restaurant,Mediterranean Restaurant,Spanish Restaurant,Sushi Restaurant,Vietnamese Restaurant,Korean Restaurant
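One caveat when reading these rankings: `return_most_common_venues` always returns `num_top_venues` names, so for a neighbourhood with fewer distinct categories the tail of the ranking is filled with categories whose share is zero. A possible variant, sketched here in plain Python with a hypothetical helper `top_categories` and an invented toy row, keeps only categories that are actually present.

```python
def top_categories(shares, num_top=10):
    """Hypothetical helper: rank categories by share, ignoring zero shares."""
    present = [(cat, share) for cat, share in shares.items() if share > 0]
    # Sort by descending share; break ties alphabetically so the order is reproducible
    present.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in present[:num_top]]

# Invented shares for one neighbourhood
toy_row = {
    "Tapas Restaurant": 0.5,
    "Spanish Restaurant": 0.25,
    "Italian Restaurant": 0.25,
    "Greek Restaurant": 0.0,
    "Hawaiian Restaurant": 0.0,
}

print(top_categories(toy_row, num_top=4))
```

With this variant the ranking for the toy row has three entries rather than four, because only three categories have a non-zero share.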


Now that we have the information needed, I will proceed to cluster the neighbourhoods. Note that the study is exploratory, so I cannot evaluate model accuracy. I have chosen to use three clusters:

In [54]:
# set number of clusters
kclusters = 3

BCNrestaurants_grouped_clustering = BCNrestaurants_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BCNrestaurants_grouped_clustering)

# check how many elements fall into each cluster
import collections
collections.Counter(kmeans.labels_)

Counter({0: 23, 2: 46, 1: 5})
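Although there is no ground truth to measure accuracy against, candidate values of `kclusters` can still be compared with an internal measure. The sketch below is one possible heuristic, not the procedure used in this notebook: it runs k-means for several values of k on synthetic data (a stand-in for `BCNrestaurants_grouped_clustering`) and reports the silhouette score and the cluster sizes for each k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the neighbourhood-by-category matrix (NOT the real data):
# three groups of 20 points each in two dimensions
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
toy_data = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centres])

scores = {}
sizes = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(toy_data)
    scores[k] = silhouette_score(toy_data, labels)  # between -1 and 1, higher is better
    sizes[k] = np.bincount(labels).tolist()         # number of points in each cluster

for k in scores:
    print(k, round(scores[k], 3), sizes[k])
```

Looking at the cluster sizes next to the score helps spot values of k that produce very small or single-element clusters.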

For these data, some neighbourhoods end up isolated in their own cluster if we use four or more clusters, so I have decided to keep three clusters. To map the resulting clusters, we need to add the cluster labels and the coordinates for each neighbourhood.

In [55]:
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [56]:
BCNrestaurants_merged = Locations

BCNrestaurants_merged = BCNrestaurants_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), \
                                                   on='Neighbourhood')

BCNrestaurants_merged.head()

Unnamed: 0,Neighbourhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,La Barceloneta,41.37944,2.18917,0,Tapas Restaurant,Paella Restaurant,Mediterranean Restaurant,Spanish Restaurant,Restaurant,Seafood Restaurant,Mexican Restaurant,Indian Restaurant,Hawaiian Restaurant,Comfort Food Restaurant
1,Barrio Gótico de Barcelona,41.382778,2.176944,0,Tapas Restaurant,Spanish Restaurant,Italian Restaurant,Japanese Restaurant,Mediterranean Restaurant,Falafel Restaurant,Greek Restaurant,Ramen Restaurant,Seafood Restaurant,Restaurant
2,El Raval,41.38,2.16861,2,Tapas Restaurant,Spanish Restaurant,Mediterranean Restaurant,Restaurant,Italian Restaurant,Argentinian Restaurant,Empanada Restaurant,Japanese Restaurant,Indian Restaurant,Vegetarian / Vegan Restaurant
3,"Sant Pere, Santa Caterina i la Ribera",41.3847,2.1826,0,Tapas Restaurant,Spanish Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Seafood Restaurant,Asian Restaurant,Restaurant,Mediterranean Restaurant,Arepa Restaurant
4,La Antigua Izquierda del Ensanche,41.390061,2.155061,2,Spanish Restaurant,Mediterranean Restaurant,Japanese Restaurant,Tapas Restaurant,Restaurant,Peruvian Restaurant,Mexican Restaurant,Molecular Gastronomy Restaurant,Russian Restaurant,Seafood Restaurant


The most visual way to show the results is mapping them using Folium. Here is the result:

In [57]:
# create map
BCNlatitude = 41.3887901
BCNlongitude = 2.1589899
map_clusters = folium.Map(location=[BCNlatitude, BCNlongitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(BCNrestaurants_merged['latitude'], BCNrestaurants_merged['longitude'], BCNrestaurants_merged['Neighbourhood'], BCNrestaurants_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Let's review each of the resulting clusters:

In [61]:
BCNrestaurants_merged.loc[BCNrestaurants_merged['Cluster Labels'] == 0, \
                          BCNrestaurants_merged.columns[[0] + list(range(4,14))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,La Barceloneta,Tapas Restaurant,Paella Restaurant,Mediterranean Restaurant,Spanish Restaurant,Restaurant,Seafood Restaurant,Mexican Restaurant,Indian Restaurant,Hawaiian Restaurant,Comfort Food Restaurant
1,Barrio Gótico de Barcelona,Tapas Restaurant,Spanish Restaurant,Italian Restaurant,Japanese Restaurant,Mediterranean Restaurant,Falafel Restaurant,Greek Restaurant,Ramen Restaurant,Seafood Restaurant,Restaurant
3,"Sant Pere, Santa Caterina i la Ribera",Tapas Restaurant,Spanish Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Seafood Restaurant,Asian Restaurant,Restaurant,Mediterranean Restaurant,Arepa Restaurant
6,La Dreta de l'Eixample,Tapas Restaurant,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Seafood Restaurant,Japanese Restaurant,Hungarian Restaurant,Argentinian Restaurant,Italian Restaurant,Ramen Restaurant
10,La Bordeta (Barcelona),Tapas Restaurant,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Italian Restaurant,Peruvian Restaurant,Ethiopian Restaurant,Mexican Restaurant
15,Pueblo Seco,Tapas Restaurant,Mediterranean Restaurant,Italian Restaurant,Spanish Restaurant,Peruvian Restaurant,Restaurant,Japanese Restaurant,Seafood Restaurant,Mexican Restaurant,Asian Restaurant
30,Villa de Gracia,Tapas Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Sushi Restaurant,Spanish Restaurant,Mexican Restaurant,Seafood Restaurant
36,La Font d'en Fargues,Tapas Restaurant,Restaurant,Spanish Restaurant,Italian Restaurant,Mediterranean Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,German Restaurant,Chinese Restaurant,Greek Restaurant
38,Horta (Barcelona),Tapas Restaurant,Spanish Restaurant,Restaurant,Italian Restaurant,Mediterranean Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,German Restaurant,Chinese Restaurant,Greek Restaurant
41,Sant Genís dels Agudells,Mediterranean Restaurant,Argentinian Restaurant,Tapas Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant


This cluster includes 23 neighbourhoods (in red on the map). In these neighbourhoods, tapas restaurants are among the most common venues. Tapas restaurants offer small plates (tapas, in Spanish) of Spanish food (e.g., patatas bravas). Tapas are usually eaten in a leisure context, either by tourists (in the city centre) or by locals (in the northern neighbourhoods). Tourists can visit these northern neighbourhoods (e.g. Horta or Turo de la Peira) for a more authentic tapas experience.

In [62]:
BCNrestaurants_merged.loc[BCNrestaurants_merged['Cluster Labels'] == 1, \
                          BCNrestaurants_merged.columns[[0] + list(range(4,14))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,La Marina del Prat Vermell,Spanish Restaurant,Mediterranean Restaurant,Vietnamese Restaurant,Falafel Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant,Greek Restaurant
18,Montjuic (Barcelona),Mediterranean Restaurant,Spanish Restaurant,Restaurant,Italian Restaurant,Asian Restaurant,Vietnamese Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant
46,Ciudad Meridiana,Mediterranean Restaurant,Vietnamese Restaurant,Fast Food Restaurant,Japanese Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant,Greek Restaurant
51,Torre Baró,Mediterranean Restaurant,Spanish Restaurant,Vietnamese Restaurant,Falafel Restaurant,Italian Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant,Greek Restaurant
67,Diagonal Mar i el Front Marítim del Poblenou,Mediterranean Restaurant,Restaurant,Spanish Restaurant,Italian Restaurant,Vietnamese Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant,Greek Restaurant


The second cluster is the smallest, with only five neighbourhoods, shown in purple on the map. Here the predominant restaurant category is the Mediterranean restaurant. These restaurants offer pizza and pasta, together with some Spanish specialities. They can be visited either for leisure or during the midday work break. These neighbourhoods also have a varied offer of international cuisine.

In [63]:
BCNrestaurants_merged.loc[BCNrestaurants_merged['Cluster Labels'] == 2, \
                          BCNrestaurants_merged.columns[[0] + list(range(4,14))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,El Raval,Tapas Restaurant,Spanish Restaurant,Mediterranean Restaurant,Restaurant,Italian Restaurant,Argentinian Restaurant,Empanada Restaurant,Japanese Restaurant,Indian Restaurant,Vegetarian / Vegan Restaurant
4,La Antigua Izquierda del Ensanche,Spanish Restaurant,Mediterranean Restaurant,Japanese Restaurant,Tapas Restaurant,Restaurant,Peruvian Restaurant,Mexican Restaurant,Molecular Gastronomy Restaurant,Russian Restaurant,Seafood Restaurant
5,La Nueva Izquierda del Ensanche,Mediterranean Restaurant,Japanese Restaurant,Tapas Restaurant,Spanish Restaurant,Restaurant,Indian Restaurant,Italian Restaurant,Thai Restaurant,Seafood Restaurant,Gluten-free Restaurant
7,El Fort Pienc,Spanish Restaurant,Restaurant,Mediterranean Restaurant,Chinese Restaurant,French Restaurant,Ramen Restaurant,Szechuan Restaurant,Mexican Restaurant,Brazilian Restaurant,Tapas Restaurant
8,Sagrada Familia (barrio de Barcelona),Restaurant,Italian Restaurant,Japanese Restaurant,Latin American Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,Mexican Restaurant,Spanish Restaurant,Seafood Restaurant,Asian Restaurant
9,Barrio de Sant Antoni (Barcelona),Tapas Restaurant,Mediterranean Restaurant,Restaurant,Japanese Restaurant,Spanish Restaurant,Italian Restaurant,Argentinian Restaurant,Peruvian Restaurant,Thai Restaurant,Gluten-free Restaurant
11,La Font de la Guatlla,Tapas Restaurant,Mediterranean Restaurant,Spanish Restaurant,Restaurant,Italian Restaurant,Peruvian Restaurant,Seafood Restaurant,Asian Restaurant,Ramen Restaurant,Paella Restaurant
12,Hostafrancs,Restaurant,Tapas Restaurant,Mediterranean Restaurant,Spanish Restaurant,Middle Eastern Restaurant,Peruvian Restaurant,Italian Restaurant,Vietnamese Restaurant,Japanese Restaurant,Asian Restaurant
14,La Marina de Port,Spanish Restaurant,Restaurant,Italian Restaurant,Mediterranean Restaurant,Latin American Restaurant,Asian Restaurant,Indonesian Restaurant,Indian Restaurant,Hungarian Restaurant,Hawaiian Restaurant
16,Sants,Japanese Restaurant,Mediterranean Restaurant,Tapas Restaurant,Spanish Restaurant,Indian Restaurant,Italian Restaurant,Restaurant,Seafood Restaurant,Korean Restaurant,Venezuelan Restaurant


The last cluster covers 46 neighbourhoods, more than half of those in Barcelona. They are shown in light blue on the map. In these neighbourhoods the predominant categories are Restaurant and Spanish Restaurant, followed by Mediterranean Restaurant. A plausible reading is that people go to these restaurants during the work break. In Spain there is a tradition of taking a long midday break (sometimes from 14:00 to 16:00) to have a substantial lunch, with two courses and dessert. This is offered in Spanish Restaurants and Restaurants, and sometimes in Mediterranean Restaurants, as the "Menu del dia". So we can expect most of the revenue of these restaurants to come at midday on working days.