# Battle of neighborhoods - Barcelona 

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### Introduction: Business Problem <a name="introduction"></a> 
    
Barcelona, the capital of Catalonia, has a population of 1.6M people and is at the heart of
a metropolitan region of 5M inhabitants.

The cosmopolitan, diverse and intercultural spirit of Barcelona can be seen in the fact that
18.5% of the city’s residents are foreign, exceeding 300,000 residents.

Despite the large and ethnically diverse population, there are only a handful of Greek
restaurants that offer a high-quality menu for a mid-range target audience.

The stakeholder wants to fill this gap by opening a Greek restaurant. Criteria to be considered:

- Density of other restaurants
- Other Greek or similar-cuisine (Spanish, Mediterranean) restaurants in the neighborhood
- Population density
- Distance from city centre 
  


### Data <a name="data"></a> 

For this project I use data from two sources:

- the Wikipedia page for the districts/neighborhoods of the city
- https://www.bcn.cat/estadistica/, the official page of the city of Barcelona, for population data

 

### Methodology <a name="methodology"></a> 
     
We use the *Beautiful Soup* package to scrape the tables with neighborhood/district data.
To find the corresponding latitudes/longitudes we use the *Nominatim* geocoder from the *geopy* package.
For visualization, we use the *Folium* package. The machine learning technique used is *k-means* clustering from the *sklearn* package.
The rest of the analysis relies on the pandas, numpy and matplotlib packages.
       

### Analysis <a name="analysis"></a> 



The first step is to create a dataframe with all
the neighborhoods and districts in Barcelona.  
I can do this by scraping the names from the Wikipedia page
and using the Nominatim module to find the corresponding latitudes/longitudes.


First, we import the libraries.

In [6]:
# Import libraries 
import requests
import lxml.html as lh
import pandas as pd 
from bs4 import BeautifulSoup

# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# transforming a json file into a pandas dataframe
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Libraries imported.')

Libraries imported.


In [7]:
import numpy as np 

Now I extract the district/neighborhood data from the Wikipedia page using the **Beautiful Soup** package.

In [8]:
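# extract data using Beautiful Soup
from io import StringIO

url = 'https://en.wikipedia.org/wiki/Districts_of_Barcelona'
res = requests.get(url)

soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')
# recent pandas versions expect a file-like object rather than a raw HTML string
data = pd.read_html(StringIO(str(table)))
# the districts/neighbourhoods table was at position 7 when this notebook was run;
# the position may differ if the page layout changes
df = data[7]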

The cells of the resulting dataframe are very long, so I change the default display options to view the dataframe properly.

In [9]:
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)  # None means no truncation (-1 is no longer accepted by recent pandas)

In [10]:
df 

Unnamed: 0,vteDistricts and neighbourhoods of Barcelona,vteDistricts and neighbourhoods of Barcelona.1
0,Ciutat Vella,"La Barceloneta Gothic Quarter El Raval Sant Pere, Santa Caterina i la Ribera"
1,L'Eixample,L'Antiga Esquerra de l'Eixample La Nova Esquerra de l'Eixample Dreta de l'Eixample Fort Pienc Sagrada Família Sant Antoni
2,Sants-Montjuïc,La Bordeta La Font de la Guatlla Hostafrancs La Marina de Port La Marina del Prat Vermell El Poble-sec Sants Sants-Badal Montjuïc Zona Franca – Port
3,Les Corts,Les Corts La Maternitat i Sant Ramon Pedralbes
4,Sarrià-Sant Gervasi,"El Putget i Farró Sarrià Sant Gervasi – la Bonanova Sant Gervasi – Galvany les Tres Torres Vallvidrera, el Tibidabo i les Planes"
5,Gràcia,Vila de Gràcia Camp d'en Grassot i Gràcia Nova La Salut El Coll Vallcarca i els Penitents
6,Horta-Guinardó,El Baix Guinardó El Guinardó Can Baró El Carmel La Font d'en Fargues Horta La Clota Montbau Sant Genís dels Agudells La Teixonera Vall d'Hebron
7,Nou Barris,Can Peguera Canyelles Ciutat Meridiana La Guineueta Porta La Prosperitat Roquetes Torre Baró La Trinitat Nova El Turó de la Peira Vallbona Verdum Vilapicina i la Torre Llobeta
8,Sant Andreu,Baró de Viver Bon Pastor El Congrés i els Indians Navas Sant Andreu de Palomar La Sagrera Trinitat Vella
9,Sant Martí,El Besòs i el Maresme El Clot El Camp de l'Arpa del Clot Diagonal Mar i el Front Marítim del Poblenou El Parc i la Llacuna del Poblenou El Poblenou Provençals del Poblenou Sant Martí de Provençals La Verneda i la Pau La Vila Olímpica del Poblenou


Some data cleaning is necessary: the columns need proper names, and the neighborhood names must be separated by commas (","), since the separators between them were lost during web scraping.

In [11]:
df.rename(columns={'vteDistricts and neighbourhoods of Barcelona':'Districts'}, inplace=True)

In [12]:
df.rename(columns={'vteDistricts and neighbourhoods of Barcelona.1':'Neighborhoods'}, inplace=True)

In [13]:
# Replace the character " " with a comma in neighborhood  
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (' ',', ',100) )

In [14]:
df 

Unnamed: 0,Districts,Neighborhoods
0,Ciutat Vella,"La, Barceloneta, Gothic, Quarter, El, Raval, Sant, Pere,, Santa, Caterina, i, la, Ribera"
1,L'Eixample,"L'Antiga, Esquerra, de, l'Eixample, La, Nova, Esquerra, de, l'Eixample, Dreta, de, l'Eixample, Fort, Pienc, Sagrada, Família, Sant, Antoni"
2,Sants-Montjuïc,"La, Bordeta, La, Font, de, la, Guatlla, Hostafrancs, La, Marina, de, Port, La, Marina, del, Prat, Vermell, El, Poble-sec, Sants, Sants-Badal, Montjuïc, Zona, Franca, –, Port"
3,Les Corts,"Les, Corts, La, Maternitat, i, Sant, Ramon, Pedralbes"
4,Sarrià-Sant Gervasi,"El, Putget, i, Farró, Sarrià, Sant, Gervasi, –, la, Bonanova, Sant, Gervasi, –, Galvany, les, Tres, Torres, Vallvidrera,, el, Tibidabo, i, les, Planes"
5,Gràcia,"Vila, de, Gràcia, Camp, d'en, Grassot, i, Gràcia, Nova, La, Salut, El, Coll, Vallcarca, i, els, Penitents"
6,Horta-Guinardó,"El, Baix, Guinardó, El, Guinardó, Can, Baró, El, Carmel, La, Font, d'en, Fargues, Horta, La, Clota, Montbau, Sant, Genís, dels, Agudells, La, Teixonera, Vall, d'Hebron"
7,Nou Barris,"Can, Peguera, Canyelles, Ciutat, Meridiana, La, Guineueta, Porta, La, Prosperitat, Roquetes, Torre, Baró, La, Trinitat, Nova, El, Turó, de, la, Peira, Vallbona, Verdum, Vilapicina, i, la, Torre, Llobeta"
8,Sant Andreu,"Baró, de, Viver, Bon, Pastor, El, Congrés, i, els, Indians, Navas, Sant, Andreu, de, Palomar, La, Sagrera, Trinitat, Vella"
9,Sant Martí,"El, Besòs, i, el, Maresme, El, Clot, El, Camp, de, l'Arpa, del, Clot, Diagonal, Mar, i, el, Front, Marítim, del, Poblenou, El, Parc, i, la, Llacuna, del, Poblenou, El, Poblenou, Provençals, del, Poblenou, Sant, Martí, de, Provençals, La, Verneda, i, la, Pau, La, Vila, Olímpica, del, Poblenou"


Some neighborhood names consist of more than one word, and these words are now separated by commas as well. Here I repair such names.

In [15]:
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('La,','La',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('de,','de',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('del,','del',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (', de',' de',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('El,','El',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('el,','el',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Can,','Can',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('i,','i',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (', i',' i',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Sant,','Sant',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Santa,','Santa',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('la,','la',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('les,','les',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Les,','Les',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (',,',',',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Zona, Franca, –, Port','Zona Franca-Port',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('La Trinitat, Nova,','La Trinitat Nova,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Sagrada, Familia','Sagrada Familia',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Gràcia, Nova,','Gràcia Nova,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('en,','en',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Nova, Esquerra','Nova Esquerra',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Sant Gervasi –,','Sant Gervasi ',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Fort,','Fort',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (' les Tres, Torres',' les Tres Torres,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('la Guatlla Hostafrancs,','la Guatlla, Hostafrancs,,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (", d'en "," d'en ",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Baix, Guinardó","Baix Guinardó",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Carmel","Carmel,",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Vall, ","Vall ",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("els,","els",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Torre, Baró","Torre Baró",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Bon,","Bon",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Canyelles","Canyelles,",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Ciutat, Meridiana,","Ciutat Meridiana,",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Quarter, El Raval","Raval",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (",,",",",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (' Sagrada, Família',' Sagrada Família',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("L'Antiga, Esquerra ","L'Antiga Esquerra ",100))

# 
df

Unnamed: 0,Districts,Neighborhoods
0,Ciutat Vella,"La Barceloneta, Gothic, Raval, Sant Pere, Santa Caterina i la Ribera"
1,L'Eixample,"L'Antiga Esquerra de l'Eixample, La Nova Esquerra de l'Eixample, Dreta de l'Eixample, Fort Pienc, Sagrada Família, Sant Antoni"
2,Sants-Montjuïc,"La Bordeta, La Font de la Guatlla, Hostafrancs, La Marina de Port, La Marina del Prat, Vermell, El Poble-sec, Sants, Sants-Badal, Montjuïc, Zona Franca-Port"
3,Les Corts,"Les Corts, La Maternitat i Sant Ramon, Pedralbes"
4,Sarrià-Sant Gervasi,"El Putget i Farró, Sarrià, Sant Gervasi la Bonanova, Sant Gervasi Galvany, les Tres Torres, Vallvidrera, el Tibidabo i les Planes"
5,Gràcia,"Vila de Gràcia, Camp d'en Grassot i Gràcia Nova, La Salut, El Coll, Vallcarca i els Penitents"
6,Horta-Guinardó,"El Baix Guinardó, El Guinardó, Can Baró, El Carmel, La Font d'en Fargues, Horta, La Clota, Montbau, Sant Genís dels Agudells, La Teixonera, Vall d'Hebron"
7,Nou Barris,"Can Peguera, Canyelles, Ciutat Meridiana, La Guineueta, Porta, La Prosperitat, Roquetes, Torre Baró, La Trinitat Nova, El Turó de la Peira, Vallbona, Verdum, Vilapicina i la Torre, Llobeta"
8,Sant Andreu,"Baró de Viver, Bon Pastor, El Congrés i els Indians, Navas, Sant Andreu de Palomar, La Sagrera, Trinitat, Vella"
9,Sant Martí,"El Besòs i el Maresme, El Clot, El Camp de l'Arpa del Clot, Diagonal, Mar i el Front, Marítim del Poblenou, El Parc i la Llacuna del Poblenou, El Poblenou, Provençals del Poblenou, Sant Martí de Provençals, La Verneda i la Pau, La Vila Olímpica del Poblenou"
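
The chain of replacements above still leaves a few names split or shortened in the output (for example "La Marina del Prat, Vermell", "Trinitat, Vella" and "Gothic"). One possible alternative, sketched here under the assumption that each neighborhood sits in its own `<li>` element of the scraped table, is to collect the list items directly so that multi-word names never have to be re-joined. The HTML fragment and the `ListItemCollector` class are hypothetical stand-ins for illustration, not the actual Wikipedia markup.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of every <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []        # finished list-item texts
        self._inside = False   # True while the parser is inside an <li>
        self._chunks = []      # text fragments of the current item

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'li' and self._inside:
            self._inside = False
            # normalise whitespace inside the item
            text = ' '.join(''.join(self._chunks).split())
            if text:
                self.items.append(text)


# Simplified, made-up fragment in the spirit of the scraped table
sample_html = """
<ul>
  <li><a href="#">La Marina del Prat Vermell</a></li>
  <li><a href="#">Sant Pere, Santa Caterina i la Ribera</a></li>
  <li><a href="#">Trinitat Vella</a></li>
</ul>
"""

collector = ListItemCollector()
collector.feed(sample_html)
neighborhood_names = collector.items
print(neighborhood_names)
```

Because each name is read as one unit, commas that belong to a name (as in "Sant Pere, Santa Caterina i la Ribera") are preserved rather than treated as separators.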


The comma-separated neighborhoods are all stacked in the same row if they belong to the same district. Here I split them so that each neighborhood gets its own row.

In [16]:
# Step 1 
new_df = pd.DataFrame(df.Neighborhoods.str.split(',').tolist(), index=df.Districts).stack()
# Step 2 
new_df = new_df.reset_index([0, 'Districts'])
# Step 3 
new_df.columns = ['Districts', 'Neighborhoods']
new_df.tail ()  


Unnamed: 0,Districts,Neighborhoods
77,Sant Martí,El Poblenou
78,Sant Martí,Provençals del Poblenou
79,Sant Martí,Sant Martí de Provençals
80,Sant Martí,La Verneda i la Pau
81,Sant Martí,La Vila Olímpica del Poblenou
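
Splitting on "," keeps the space that follows each comma, so most names start with a blank. A minimal sketch of the same split-into-rows step in plain Python, with the whitespace stripped, could look as follows; the `districts` dictionary is a hand-copied one-row sample of the dataframe above, and the variable names are illustrative.

```python
# One district from the cleaned dataframe, copied by hand for illustration
districts = {
    'Les Corts': 'Les Corts, La Maternitat i Sant Ramon, Pedralbes',
}

# One (district, neighborhood) pair per neighborhood, without stray whitespace
rows = [
    (district, name.strip())
    for district, names in districts.items()
    for name in names.split(',')
    if name.strip()
]

for row in rows:
    print(row)
```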


How many neighborhoods are there in Barcelona? 

In [17]:
new_df.shape 

(82, 2)

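So there are 82 entries we can choose from. Note that the cleaning step split a few names in two (for example "La Marina del Prat" and "Vermell"), so this overstates the number of distinct neighborhoods.  
I want to extract the latitude/longitude of every neighborhood. I will do this using the **Nominatim** geocoder.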

In [18]:
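# address = 'Sant Andreu de Palomar,'
def find_lon_lat(address):
    """Geocode an address and return [latitude, longitude] (note the order)."""
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address, timeout=10000)
    if location is None:
        # Nominatim returns None when it cannot find the address
        raise ValueError('Could not geocode: {}'.format(address))
    return [location.latitude, location.longitude]
# test it
find_lon_lat('El Coll Barcelona, Spain')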


[41.6512892, 1.9584116]
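
The longitude returned by this test (about 1.96) lies west of the city, which suggests that the geocoder matched a different place with a similar name. A simple sanity check, sketched below, compares each result against a rough bounding box around Barcelona. The box limits and the helper name `looks_like_barcelona` are illustrative assumptions, not official boundaries or part of geopy.

```python
# Rough bounding box around the city of Barcelona.
# These limits are illustrative assumptions, not official boundaries.
BCN_BOUNDS = {'lat_min': 41.30, 'lat_max': 41.48,
              'lon_min': 2.05, 'lon_max': 2.25}


def looks_like_barcelona(lat, lon, bounds=BCN_BOUNDS):
    """Return True if the coordinates fall inside the assumed bounding box."""
    return (bounds['lat_min'] <= lat <= bounds['lat_max']
            and bounds['lon_min'] <= lon <= bounds['lon_max'])


# The geocoded 'El Coll' result from the test above
el_coll_ok = looks_like_barcelona(41.6512892, 1.9584116)
# A coordinate pair on the Barcelona seafront, for comparison
seafront_ok = looks_like_barcelona(41.380653, 2.189927)
print(el_coll_ok, seafront_ok)
```

Flagging suspicious results this way makes it explicit which neighborhoods were geocoded to the wrong place, instead of relying on a single longitude threshold.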

In [19]:
new_df.head ()

Unnamed: 0,Districts,Neighborhoods
0,Ciutat Vella,La Barceloneta
1,Ciutat Vella,Gothic
2,Ciutat Vella,Raval
3,Ciutat Vella,Sant Pere
4,Ciutat Vella,Santa Caterina i la Ribera


In [20]:
new_df.shape[0]

82

In [21]:
new_df.iloc[80:87,1]

80     La Verneda i la Pau          
81     La Vila Olímpica del Poblenou
Name: Neighborhoods, dtype: object

I extract the latitude and longitude of each neighborhood.

In [22]:
lat = []
lon = []
for name in new_df['Neighborhoods']:
    # geocode each neighborhood once and reuse the result for both coordinates
    coords = find_lon_lat(name + " , Barcelona, Spain")
    lat.append(coords[0])
    lon.append(coords[1])


I append longitude and latitude to the dataframe 

In [23]:
new_df["Latitude"] = lat 
new_df["Longitude"] = lon 

In [24]:
new_df.tail ()

Unnamed: 0,Districts,Neighborhoods,Latitude,Longitude
77,Sant Martí,El Poblenou,41.400527,2.201729
78,Sant Martí,Provençals del Poblenou,41.41236,2.204885
79,Sant Martí,Sant Martí de Provençals,41.416519,2.198968
80,Sant Martí,La Verneda i la Pau,41.42322,2.20294
81,Sant Martí,La Vila Olímpica del Poblenou,41.389868,2.196846


I drop the neighborhoods whose geocoded position lies far from the city centre (longitude below 2°); such positions most likely mean that the geocoder matched a place outside Barcelona.

In [25]:
new_df.drop ( new_df[new_df['Longitude'] < 2.].index, inplace=True )
new_df.shape


(73, 4)

I visualize the neighborhoods using a folium map 

In [26]:
# a single geocoding request returns both coordinates
lat_bcn, lon_bcn = find_lon_lat('Barcelona, Spain')

In [27]:
# create map of Barcelona using latitude and longitude values
map_bcn = folium.Map(location=[lat_bcn, lon_bcn], zoom_start=10)

# add markers to map
for lat, lng, districts, neighborhoods in zip(new_df['Latitude'], new_df['Longitude'], new_df['Districts'], new_df['Neighborhoods']):
    label = '{}, {}'.format(neighborhoods, districts)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bcn)  
    
map_bcn

### Foursquare

I use the Foursquare API to get information on the venues in each neighborhood. First we extract all venues; later on we filter for restaurants.


In [28]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180604'
LIMIT = 100
print('Foursquare credentials set.')


Foursquare credentials set.


I create a function to get the venues in the neighborhoods from Foursquare  

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
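
The function above indexes straight into the JSON reply, so a venue without any category would raise an `IndexError`. The sketch below isolates the flattening step and adds a guard for that case. The `fake_response` dictionary is made up and mirrors only the keys that `getNearbyVenues` reads; `flatten_items` is a hypothetical helper, not part of the Foursquare API.

```python
def flatten_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # guard: fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Made-up payload mirroring only the keys that getNearbyVenues accesses
fake_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Somorrostro',
               'location': {'lat': 41.379156, 'lng': 2.1891},
               'categories': [{'name': 'Spanish Restaurant'}]}},
    {'venue': {'name': 'Venue without category',
               'location': {'lat': 41.38, 'lng': 2.19},
               'categories': []}},
]}]}}

items = fake_response['response']['groups'][0]['items']
rows = flatten_items('La Barceloneta', 41.380653, 2.189927, items)
for row in rows:
    print(row)
```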

In [31]:


bcn_venues = getNearbyVenues(names=new_df['Neighborhoods'],
                                   latitudes=new_df['Latitude'],
                                   longitudes=new_df['Longitude']
                                  )



La Barceloneta
 Gothic
 Raval
 Santa Caterina i la Ribera
L'Antiga Esquerra de l'Eixample
 La Nova Esquerra de l'Eixample
 Dreta de l'Eixample
 Fort Pienc
 Sagrada Família
 Sant Antoni
La Bordeta
 La Font de la Guatlla
 Hostafrancs
 La Marina de Port
 La Marina del Prat
 El Poble-sec
 Sants
 Sants-Badal
 Zona Franca-Port
Les Corts
 La Maternitat i Sant Ramon
 Pedralbes
El Putget i Farró
 Sarrià
 Sant Gervasi  la Bonanova
 Sant Gervasi  Galvany
 les Tres Torres
 Vallvidrera
 el Tibidabo i les Planes
Vila de Gràcia
 Camp d'en Grassot i Gràcia Nova
 La Salut
 Vallcarca i els Penitents
El Baix Guinardó
 El Guinardó
 Can Baró
 El Carmel
 La Font d'en Fargues
 Horta
 La Clota
 Montbau
 Sant Genís dels Agudells
 La Teixonera
 Vall d'Hebron
Can Peguera
 Ciutat Meridiana
 La Guineueta
 Porta
 La Prosperitat
 Roquetes
 Torre Baró
 La Trinitat Nova
 El Turó de la Peira
 Verdum
 Vilapicina i la Torre
 Llobeta
Baró de Viver
 Bon Pastor
 El Congrés i els Indians
 Sant Andreu de Palomar
 La Sagrera
E

In [32]:
print(bcn_venues.shape)
bcn_venues.head()

(2955, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,La Barceloneta,41.380653,2.189927,Baluard Barceloneta,41.380047,2.18925,Bakery
1,La Barceloneta,41.380653,2.189927,BRO,41.380214,2.189007,Burger Joint
2,La Barceloneta,41.380653,2.189927,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
4,La Barceloneta,41.380653,2.189927,Plaça de la Barceloneta,41.379739,2.188135,Plaza


How many venues are there per neighborhood? 

In [33]:
bcn_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bon Pastor,6,6,6,6,6,6
Camp d'en Grassot i Gràcia Nova,22,22,22,22,22,22
Can Baró,27,27,27,27,27,27
Ciutat Meridiana,7,7,7,7,7,7
Diagonal,56,56,56,56,56,56
...,...,...,...,...,...,...
L'Antiga Esquerra de l'Eixample,100,100,100,100,100,100
La Barceloneta,100,100,100,100,100,100
La Bordeta,29,29,29,29,29,29
Les Corts,75,75,75,75,75,75


In [34]:
print('There are {} unique categories.'.format(len(bcn_venues['Venue Category'].unique())))

There are 280 unique categories.


In [35]:
# I print the categories to see which correspond to restaurants 
bcn_venues['Venue Category'].unique()

array(['Bakery', 'Burger Joint', 'Spanish Restaurant', 'Tapas Restaurant',
       'Plaza', 'Mediterranean Restaurant', 'Wine Shop', 'Restaurant',
       'Salon / Barbershop', 'Paella Restaurant', 'Beer Bar',
       'Pizza Place', 'Beach', 'Sushi Restaurant',
       'Argentinian Restaurant', 'Market', 'Fish & Chips Shop', 'Bar',
       'Italian Restaurant', 'Food & Drink Shop', 'Steakhouse',
       'Brazilian Restaurant', 'Cocktail Bar', 'Coffee Shop', 'Juice Bar',
       'South American Restaurant', 'Ice Cream Shop', 'BBQ Joint',
       'Hotel', 'College Residence Hall', 'History Museum',
       'Hawaiian Restaurant', 'Vegetarian / Vegan Restaurant',
       'Board Shop', 'Circus', 'Athletics & Sports', 'Surf Spot',
       'Seafood Restaurant', 'Breakfast Spot', 'Soccer Field', 'Food',
       'Fast Food Restaurant', 'Café', 'Museum', 'Deli / Bodega',
       'Turkish Restaurant', 'Park', 'Brewery', 'Hot Dog Joint',
       'Wine Bar', 'Neighborhood', 'Bridge', 'Snack Place',
       'Desse

We want to keep only **restaurants** in the resulting dataframe, so let's drop the rows whose venue category does not contain Restaurant, Bodega or Food.

In [36]:
# keep food-related venues; .copy() gives an independent dataframe that can be modified later
bcn_restaurants = bcn_venues[bcn_venues['Venue Category'].str.contains("Restaurant|Bodega|Food", case=False)].copy()

In [37]:
bcn_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,La Barceloneta,41.380653,2.189927,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
5,La Barceloneta,41.380653,2.189927,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant
7,La Barceloneta,41.380653,2.189927,La Bombeta,41.380521,2.187573,Tapas Restaurant
8,La Barceloneta,41.380653,2.189927,La Barra Carles Abellan,41.379838,2.187712,Restaurant


In [38]:
# How many restaurants are in the neighborhoods?
bcn_restaurants.shape

(979, 7)

There are 979 restaurants.

In [39]:
# Let's print the categories 
bcn_restaurants ['Venue Category'].unique()

array(['Spanish Restaurant', 'Tapas Restaurant',
       'Mediterranean Restaurant', 'Restaurant', 'Paella Restaurant',
       'Sushi Restaurant', 'Argentinian Restaurant', 'Italian Restaurant',
       'Food & Drink Shop', 'Brazilian Restaurant',
       'South American Restaurant', 'Hawaiian Restaurant',
       'Vegetarian / Vegan Restaurant', 'Seafood Restaurant', 'Food',
       'Fast Food Restaurant', 'Deli / Bodega', 'Turkish Restaurant',
       'Greek Restaurant', 'Ramen Restaurant', 'Mexican Restaurant',
       'Asian Restaurant', 'Portuguese Restaurant', 'Japanese Restaurant',
       'Empanada Restaurant', 'Falafel Restaurant', 'Food Court',
       'Russian Restaurant', 'Molecular Gastronomy Restaurant',
       'Latin American Restaurant', 'Gluten-free Restaurant',
       'Indian Restaurant', 'Thai Restaurant', 'Peruvian Restaurant',
       'Korean Restaurant', 'Chinese Restaurant',
       'Eastern European Restaurant', 'Health Food Store',
       'Szechuan Restaurant', 'Food Truc

In [40]:
# How many Greek restaurants?
bcn_restaurants[bcn_restaurants['Venue Category'].str.contains("greek", case=False)] 


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
116,Gothic,41.381505,2.177418,Dionisos Quick Greek,41.380538,2.177297,Greek Restaurant


Apparently there is only one Greek restaurant in the data, so there is hardly any direct competition!

This implies we need different criteria to choose a location.
One important criterion is the density of restaurants. Another is the type of cuisine: Italian, Spanish and Mediterranean are closest to Greek, so the restaurant
would ideally be better off in a neighborhood with fewer Mediterranean-type restaurants, i.e. with less competition.

First, let's filter out the neighborhoods that are far from the city centre.

In [41]:
# a single geocoding request returns both coordinates
lat_bcn_centre, lon_bcn_centre = find_lon_lat('Pl Catalunya, Barcelona, Spain')

print (lat_bcn_centre)
print (lon_bcn_centre)

41.3868794
2.170067825120773


Next, I define a function that computes the haversine (great-circle) distance in km from this centre.

In [42]:
def haversine_distance(lat1, lon1):
    lat2=lat_bcn_centre
    lon2=lon_bcn_centre
    r = 6371
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) *   np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)
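
The function implements the haversine formula: with a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2), the distance is d = 2r · atan2(√a, √(1 − a)), where r = 6371 km is the mean Earth radius. As a quick plausibility check, the sketch below restates the formula with only the standard library and takes both points as arguments; `haversine_km` is an illustrative name, not a function from the notebook.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return radius_km * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# One degree of latitude should be roughly 111 km everywhere on the sphere
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
# The distance from a point to itself must be zero
zero = haversine_km(41.3868794, 2.170067825120773,
                    41.3868794, 2.170067825120773)
print(round(one_degree, 2), zero)
```

Checks of this kind catch the most common mistakes, such as forgetting to convert degrees to radians.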

Let's test this for one neighborhood.

In [43]:
test_neighborhood = bcn_restaurants.iloc[1,0]
test_lat = bcn_restaurants.iloc[1,1]
test_lon = bcn_restaurants.iloc[1,2]
print('Test the distance from neighborhood {} with lat={} and lon={}'.format(test_neighborhood, test_lat, test_lon))

Test the distance from neighborhood La Barceloneta with lat=41.3806533 and lon=2.1899274


In [44]:
dis = haversine_distance (test_lat, test_lon)
print('The distance of the neighborhood {} from centre is {} km'.format(  test_neighborhood, dis))

The distance of the neighborhood La Barceloneta from centre is 1.8 km


In [45]:
# Calculate the distance of each row's neighborhood from the centre.
# haversine_distance relies on numpy, so it can be applied to whole columns at once.
distance = haversine_distance(bcn_restaurants['Neighborhood Latitude'],
                              bcn_restaurants['Neighborhood Longitude'])
    

In [46]:
# Insert the distance column into bcn_restaurants
bcn_restaurants.insert(3, 'Distance from centre', distance)
bcn_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue,Venue Latitude,Venue Longitude,Venue Category
2,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
5,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant
7,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant
8,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant


Let's get rid of the neighborhoods further than 4 km from the city centre.

In [47]:
# keep only rows within 4 km of the centre; .copy() avoids pandas' SettingWithCopyWarning
bcn_restaurants = bcn_restaurants[bcn_restaurants['Distance from centre'] <= 4.].copy()
bcn_restaurants.head()


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue,Venue Latitude,Venue Longitude,Venue Category
2,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
5,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant
7,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant
8,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant


In [48]:
bcn_restaurants.shape

(744, 8)

From 979, we are down to 744 restaurants.

### Analyze Neighborhoods 

In [49]:
# one hot encoding
bcn_onehot = pd.get_dummies(bcn_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bcn_onehot['Neighborhood'] = bcn_restaurants['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [bcn_onehot.columns[-1]] + list(bcn_onehot.columns[:-1])
bcn_onehot = bcn_onehot[fixed_columns]


print (bcn_onehot.shape)

(744, 60)


In [50]:
bcn_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Deli / Bodega,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Health Food Store,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Paella Restaurant,Peruvian Restaurant,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
2,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
5,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
8,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group the rows by neighborhood, taking the mean of the frequency of occurrence of each category
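
Taking the mean of a 0/1 column within a group gives the share of rows in that group that belong to the category. The plain-Python sketch below reproduces this idea on a handful of made-up (neighborhood, category) pairs; the data and variable names are purely illustrative.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, category) pairs for illustration
venues = [
    ('A', 'Tapas Restaurant'),
    ('A', 'Tapas Restaurant'),
    ('A', 'Spanish Restaurant'),
    ('A', 'Italian Restaurant'),
    ('B', 'Spanish Restaurant'),
    ('B', 'Greek Restaurant'),
]

# Count how often each category occurs in each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Share of each category within its neighborhood; this is what the mean
# of the one-hot (0/1) columns computes per group
shares = {
    neighborhood: {category: n / sum(counter.values())
                   for category, n in counter.items()}
    for neighborhood, counter in counts.items()
}

for neighborhood, row in shares.items():
    print(neighborhood, row)
```

Unlike this sketch, the pandas version also reports 0.0 for categories that do not occur in a neighborhood, because every category has its own column.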

In [51]:
bcn_grouped = bcn_onehot.groupby('Neighborhood').mean().reset_index()
bcn_grouped

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Deli / Bodega,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Health Food Store,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Paella Restaurant,Peruvian Restaurant,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
0,Camp d'en Grassot i Gràcia Nova,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.090909,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.090909,0.0,0.0,0.181818,0.0,0.0,0.090909,0.0,0.0
1,Can Baró,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.363636,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0
2,Diagonal,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.2,0.0,0.0,0.066667,0.0,0.0
3,Dreta de l'Eixample,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.05,0.0,0.2,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,El Camp de l'Arpa del Clot,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.04,0.0,0.12,0.0,0.08,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.04,0.2,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0
5,El Clot,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.133333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.033333
6,El Guinardó,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0
7,El Parc i la Llacuna del Poblenou,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.185185,0.0,0.0,0.0,0.111111,0.037037,0.0,0.037037,0.0,0.0
8,El Poble-sec,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.086957,0.043478,0.0,0.0,0.0,0.0,0.086957,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.086957,0.0,0.0,0.0,0.434783,0.0,0.0,0.0,0.0,0.0
9,El Poblenou,0.0,0.0,0.0,0.040816,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.061224,0.102041,0.020408,0.020408,0.0,0.0,0.020408,0.142857,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.122449,0.0,0.0,0.040816,0.0,0.142857,0.020408,0.0,0.0,0.081633,0.0,0.020408,0.020408,0.0,0.0
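
To see why the group mean of one-hot columns can be read as a category frequency, here is a minimal sketch on made-up data. The venue rows and the `toy_` names are illustrative assumptions, not values from the Barcelona dataset.

```python
import pandas as pd

# Hypothetical venues: one row per venue, in the same spirit as bcn_restaurants
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Tapas Restaurant", "Tapas Restaurant",
                       "Greek Restaurant", "Spanish Restaurant",
                       "Spanish Restaurant", "Spanish Restaurant"],
})

# One-hot encode the category, then put the neighborhood back as the first column
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column within a group is the share of venues in that category
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped table sums to 1, which is a quick sanity check that can also be applied to `bcn_grouped`.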


### Cluster Neighborhoods 

In [52]:
bcn_grouped.shape 

(33, 60)

In [53]:
# import k-means for the clustering stage
from sklearn.cluster import KMeans


In [54]:
# set number of clusters
kclusters = 7

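# keep only the numeric frequency columns (the positional axis argument is not accepted by recent pandas)
bcn_grouped_clustering = bcn_grouped.drop(columns='Neighborhood')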

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bcn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([2, 0, 2, 6, 6, 6, 5, 6, 1, 2], dtype=int32)
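
When reading the label array, keep in mind that the cluster numbers are arbitrary identifiers; only the grouping carries meaning. A minimal sketch on a made-up frequency matrix (the values and the `toy_` names are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows: two tapas-heavy and two Asian-heavy neighborhoods
toy_freq = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# n_init is set explicitly because its default differs between scikit-learn versions
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_freq)
toy_labels = toy_kmeans.labels_

# Rows 0 and 1 share one label, rows 2 and 3 share the other
print(toy_labels)
```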

In [55]:
len(kmeans.labels_)

33

In [56]:
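# average the numeric columns per neighborhood; numeric_only=True skips the text columns (Venue, Venue Category)
bcn_grouped = bcn_restaurants.groupby('Neighborhood').mean(numeric_only=True).reset_index()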
bcn_grouped.shape

(33, 6)

In [57]:
bcn_grouped.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue Latitude,Venue Longitude
0,Camp d'en Grassot i Gràcia Nova,41.406706,2.165419,2.24,41.406125,2.16432
1,Can Baró,41.416092,2.162402,3.31,41.414885,2.160461
2,Diagonal,41.395291,2.159959,1.26,41.396491,2.160473
3,Dreta de l'Eixample,41.394124,2.166471,0.86,41.394208,2.166176
4,El Camp de l'Arpa del Clot,41.410754,2.182816,2.86,41.41049,2.182214


In [58]:
# add clustering labels
bcn_grouped.insert(1, 'Cluster Labels', kmeans.labels_)
bcn_grouped.head()


Unnamed: 0,Neighborhood,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue Latitude,Venue Longitude
0,Camp d'en Grassot i Gràcia Nova,2,41.406706,2.165419,2.24,41.406125,2.16432
1,Can Baró,0,41.416092,2.162402,3.31,41.414885,2.160461
2,Diagonal,2,41.395291,2.159959,1.26,41.396491,2.160473
3,Dreta de l'Eixample,6,41.394124,2.166471,0.86,41.394208,2.166176
4,El Camp de l'Arpa del Clot,6,41.410754,2.182816,2.86,41.41049,2.182214
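
Inserting `kmeans.labels_` by position works here because both grouped tables are sorted by neighborhood. A slightly more defensive variant, sketched below on made-up data, attaches the labels by name instead; `toy_freq`, `toy_coords` and `toy_labels` are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-ins for the frequency table and the coordinates table
toy_freq = pd.DataFrame({"Neighborhood": ["A", "B", "C"],
                         "Tapas Restaurant": [0.5, 0.0, 0.4]})
toy_coords = pd.DataFrame({"Neighborhood": ["C", "A", "B"],
                           "Distance from centre": [3.1, 1.2, 2.0]})

# Stand-in for kmeans.labels_, one label per row of toy_freq
toy_labels = [2, 0, 2]

# Build a name -> label lookup from the table that was actually clustered
label_by_name = dict(zip(toy_freq["Neighborhood"], toy_labels))

# Map by name, so the row order of toy_coords does not matter
toy_coords["Cluster Labels"] = toy_coords["Neighborhood"].map(label_by_name)
print(toy_coords)
```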


In [59]:


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors



In [60]:
# create map
map_clusters = folium.Map(location=[lat_bcn, lon_bcn], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bcn_grouped['Neighborhood Latitude'], \
                                  bcn_grouped['Neighborhood Longitude'], \
                                  bcn_grouped['Neighborhood'], \
                                  bcn_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Analysis of the Clusters

We merge the two dataframes 

In [61]:
bcn_merged = pd.merge(bcn_restaurants, bcn_grouped, on='Neighborhood')
bcn_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
0,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
1,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
2,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
3,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
4,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619


Let's see the venue categories for each cluster and single out the cluster with the only Greek restaurant in town.

In [200]:
for cluster in range(8):
    print('cluster number {}'.format(cluster)) 
    print(bcn_merged.loc[bcn_merged['Cluster Labels'] == cluster]['Venue Category'].unique())
    print ("Does this cluster contain the only BCN Greek venue?")
    print(bcn_merged.loc[bcn_merged['Cluster Labels'] == cluster]['Venue Category']\
          .str.contains("greek", case=False).unique())
           

cluster number 0
['Spanish Restaurant' 'Restaurant' 'Italian Restaurant' 'Tapas Restaurant'
 'Chinese Restaurant' 'Cambodian Restaurant']
Does this cluster contain the only BCN Greek venue?
[False]
cluster number 1
['Italian Restaurant' 'Spanish Restaurant' 'Tapas Restaurant'
 'Greek Restaurant' 'Ramen Restaurant' 'Mediterranean Restaurant'
 'Mexican Restaurant' 'Restaurant' 'Asian Restaurant'
 'Argentinian Restaurant' 'Seafood Restaurant' 'Food & Drink Shop'
 'Food Court' 'Japanese Restaurant' 'German Restaurant'
 'Peruvian Restaurant' 'Vietnamese Restaurant' 'Turkish Restaurant'
 'Chinese Restaurant' 'Health Food Store' 'Halal Restaurant'
 'Middle Eastern Restaurant' 'Ethiopian Restaurant' 'Fast Food Restaurant'
 'Molecular Gastronomy Restaurant']
Does this cluster contain the only BCN Greek venue?
[False  True]
cluster number 2
['Spanish Restaurant' 'Tapas Restaurant' 'Mediterranean Restaurant'
 'Restaurant' 'Paella Restaurant' 'Sushi Restaurant'
 'Argentinian Restaurant' 'Italian R

The loop runs over `range(8)`, but k-means was fitted with 7 clusters (labels 0 to 6), so cluster 7 is empty and we exclude it from now on. The only Greek venue is in Cluster 1. 
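
The same check can be phrased as a single yes/no answer per cluster. The sketch below uses a made-up table (`toy_merged` is a hypothetical stand-in for `bcn_merged`) and iterates only over labels that actually occur.

```python
import pandas as pd

# Hypothetical merged table with one row per venue
toy_merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 1, 2],
    "Venue Category": ["Spanish Restaurant", "Tapas Restaurant",
                       "Greek Restaurant", "Italian Restaurant",
                       "Spanish Restaurant", "Sushi Restaurant"],
})

# Flag Greek venues once, then ask per cluster whether any row is flagged
is_greek = toy_merged["Venue Category"].str.contains("greek", case=False)
has_greek = is_greek.groupby(toy_merged["Cluster Labels"]).any()

# Iterating over the labels that occur avoids probing a label that was never assigned
for cluster in sorted(toy_merged["Cluster Labels"].unique()):
    print("cluster", cluster, "contains a Greek venue:", bool(has_greek.loc[cluster]))
```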

#### We want to see the frequency of venues similar to Greek (Italian, tapas, Spanish, Mediterranean) in the different clusters (excluding the 7th cluster, which is empty). Density here is the number of Greek-similar restaurants in each neighborhood of the cluster divided by the total number of restaurants in the same neighborhood.  

#### Cluster 0

In [140]:
cluster0 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 0]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster0[cluster0['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster0.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Can Baró,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364


#### Cluster 1 

In [141]:
cluster1 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 1]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster1[cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster1.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
El Poble-sec,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652,0.695652
Gothic,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182
Hostafrancs,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217,0.565217
Santa Caterina i la Ribera,0.625,0.625,0.625,0.625,0.625,0.625,0.625,0.625,0.625,0.625,0.625,0.625,0.625
Sants,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55
El Putget i Farró,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385,0.615385


#### Cluster 2

In [254]:
cluster2 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 2]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster2[cluster2['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster2.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Camp d'en Grassot i Gràcia Nova,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545,0.454545
Diagonal,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333,0.533333
El Poblenou,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388,0.469388
Fort Pienc,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846,0.153846
La Nova Esquerra de l'Eixample,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333
La Vila Olímpica del Poblenou,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35
Sagrada Família,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095,0.238095
Sant Antoni,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519,0.518519
Sant Gervasi Galvany,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143,0.457143
Sants-Badal,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333,0.333333


#### Cluster 3

In [144]:
cluster3 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 3]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster3[cluster3['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster3.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
La Salut,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2


#### Cluster 4

In [145]:
cluster4 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 4]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster4[cluster4['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster4.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Vallcarca i els Penitents,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5


#### Cluster 5

In [146]:
cluster5 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 5]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster5[cluster5['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster5.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
El Guinardó,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6
La Bordeta,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444,0.444444


#### Cluster 6

In [147]:
cluster6 = bcn_merged.loc[bcn_merged['Cluster Labels'] == 6]

# similar-to-Greek venues, counted per neighborhood
t1 = (cluster6[cluster6['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)]
      .groupby('Neighborhood').count())

# all venues, counted per neighborhood
t2 = cluster6.groupby('Neighborhood').count()

# t1/t2 is the ratio of similar-to-Greek cuisine restaurants to total restaurants in each neighborhood
t1 / t2

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Dreta de l'Eixample,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75
El Camp de l'Arpa del Clot,0.52,0.52,0.52,0.52,0.52,0.52,0.52,0.52,0.52,0.52,0.52,0.52,0.52
El Clot,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6
El Parc i la Llacuna del Poblenou,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593,0.592593
La Font de la Guatlla,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059,0.647059
Raval,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364,0.636364
Les Corts,0.44,0.44,0.44,0.44,0.44,0.44,0.44,0.44,0.44,0.44,0.44,0.44,0.44
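
The seven per-cluster cells repeat the same computation. As a hedged alternative, the ratio can be computed for all clusters in one pass. The helper name `similar_share` and the toy data are assumptions for illustration; the column names match `bcn_merged`.

```python
import pandas as pd

SIMILAR = "greek|italian|tapas|mediterranean|spanish"

def similar_share(merged: pd.DataFrame) -> pd.Series:
    """Share of similar-to-Greek venues per (cluster, neighborhood)."""
    is_similar = merged["Venue Category"].str.contains(SIMILAR, case=False)
    # The mean of a boolean column is the fraction of True values
    return is_similar.groupby(
        [merged["Cluster Labels"], merged["Neighborhood"]]
    ).mean()

# Hypothetical venues, standing in for bcn_merged
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Cluster Labels": [0, 0, 0, 0, 1, 1],
    "Venue Category": ["Tapas Restaurant", "Sushi Restaurant",
                       "Spanish Restaurant", "Ramen Restaurant",
                       "Italian Restaurant", "Greek Restaurant"],
})

shares = similar_share(toy_merged)
print(shares)
```

One difference from `t1/t2`: a neighborhood with no similar venues gets 0.0 here, whereas it is missing from `t1` and therefore shows up as NaN in the ratio of counts.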


Most of the other clusters have a high density of similar-to-Greek venues, so we keep clusters 2 and 6 as candidates (7 is empty). 
What about Cluster 3, whose single neighborhood has a ratio of only 0.2?

In [206]:
cluster3.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y,Population
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
La Salut,5,5,5,5,5,5,5,5,5,5,5,5,5,5


With only 5 venues it looks like a residential area with few restaurants. Let's confirm this by comparing with the other clusters:  

In [207]:
cluster2.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y,Population
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Camp d'en Grassot i Gràcia Nova,11,11,11,11,11,11,11,11,11,11,11,11,11,11
Sant Antoni,27,27,27,27,27,27,27,27,27,27,27,27,27,27
El Baix Guinardó,21,21,21,21,21,21,21,21,21,21,21,21,21,21
La Barceloneta,51,51,51,51,51,51,51,51,51,51,51,51,51,51
Vila de Gràcia,31,31,31,31,31,31,31,31,31,31,31,31,31,31


In [208]:
cluster6.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Dreta de l'Eixample,20,20,20,20,20,20,20,20,20,20,20,20,20
El Camp de l'Arpa del Clot,25,25,25,25,25,25,25,25,25,25,25,25,25
El Clot,30,30,30,30,30,30,30,30,30,30,30,30,30
El Parc i la Llacuna del Poblenou,27,27,27,27,27,27,27,27,27,27,27,27,27
La Font de la Guatlla,17,17,17,17,17,17,17,17,17,17,17,17,17
Raval,33,33,33,33,33,33,33,33,33,33,33,33,33
Les Corts,25,25,25,25,25,25,25,25,25,25,25,25,25


La Salut, with only 5 restaurants, seems to be a residential area, so we exclude it as well.
We are left with clusters 2 and 6. Cluster 6 seems to have a higher density of similar-to-Greek venues per neighborhood than cluster 2, so we leave it out of the analysis and keep cluster 2. 

#### Let's visualize the remaining cluster 2   

In [222]:
cluster2['Neighborhood'].unique()

array(['La Barceloneta', ' Sant Antoni', 'Vila de Gràcia',
       " Camp d'en Grassot i Gràcia Nova", 'El Baix Guinardó'],
      dtype=object)

In [221]:
# create map
map_clusters = folium.Map(location=[lat_bcn, lon_bcn], zoom_start=11)

# every remaining neighborhood belongs to cluster 2, so one color is enough;
# it is taken from the same rainbow palette (and the same cluster-1 offset) as the first map
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
cluster2_color = colors.rgb2hex(colors_array[2 - 1])

# add one marker per neighborhood (cluster2 holds one row per venue)
cluster2_locations = cluster2.drop_duplicates(subset='Neighborhood')
for lat, lon, poi in zip(cluster2_locations['Neighborhood Latitude_x'],
                         cluster2_locations['Neighborhood Longitude_x'],
                         cluster2_locations['Neighborhood']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=cluster2_color,
        fill=True,
        fill_color=cluster2_color,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Results and Discussion <a name="results"></a> 

### We further refine the search using two additional criteria: 
1. The population of each neighborhood/cluster
2. The total number of restaurants in each neighborhood/cluster    

Ideally we should opt for the neighborhood/cluster where the ratio of restaurants to population is small, since that leaves more potential customers per restaurant. 

Let's take the population data from https://www.bcn.cat/estadistica/angles/dades/tpob/pad/padro/evo/t3.htm 
I downloaded it into a CSV file, cleaned it and saved it locally.

In [223]:
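# the cleaned file is comma-separated; delimiter is an alias for sep, so only one of them should be given
pop = pd.read_csv("Population_barrios_Bcn.txt", sep=',')

# trim stray whitespace around the names and the figures (Population is kept as text here)
pop['Neighborhood'] = pop["Neighborhood"].str.strip()
pop['Population'] = pop["Population"].str.strip()
pop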

Unnamed: 0,Neighborhood,Population
0,el Raval,48.297
1,el Barri Gòtic,19.180
2,la Barceloneta,15.173
3,Sant Pere Santa Caterina i la Ribera,23.170
4,el Fort Pienc,32.649
...,...,...
68,Diagonal Mar i el Front Marítim del Poblenou,13.625
69,el Besòs i el Maresme,24.660
70,Provençals del Poblenou,21.303
71,Sant Martí de Provençals,26.168
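
The figures above appear to use a dot as a thousands separator (48.297 standing for 48,297 residents); that reading is an assumption based on the size of the neighborhoods. Under that assumption, a minimal sketch for turning the text column into head counts could look like this (the two sample rows are copied from the table, the `toy_` names are illustrative):

```python
import io
import pandas as pd

# Small excerpt in the same layout as the cleaned population file
raw = io.StringIO(
    "Neighborhood,Population\n"
    "el Raval,48.297\n"
    "la Barceloneta,15.173\n"
)

# Read Population as text so that trailing zeros such as 25.990 are not lost
toy_pop = pd.read_csv(raw, sep=",", dtype={"Population": str})

# Remove the thousands separator and convert to integers
toy_pop["Population"] = (
    toy_pop["Population"].str.strip().str.replace(".", "", regex=False).astype(int)
)
print(toy_pop)
```

The notebook itself keeps the values in thousands, so the ratios computed later are venues per thousand inhabitants.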


In [224]:
# align spellings with the neighborhood names used in the venue data
pop['Neighborhood'] = pop['Neighborhood'].str.replace('el Barri Gòtic', 'Gothic', regex=False)
pop['Neighborhood'] = pop['Neighborhood'].str.replace('Putxet', 'Putget', regex=False)
pop['Neighborhood'] = pop['Neighborhood'].str.replace('la Font de la Guatlla', 'Font de  Guatl', regex=False)

# remove la Clota from the table
pop.drop(pop[pop['Neighborhood']=="la Clota"].index, inplace= True)
pop


Unnamed: 0,Neighborhood,Population
0,el Raval,48.297
1,Gothic,19.180
2,la Barceloneta,15.173
3,Sant Pere Santa Caterina i la Ribera,23.170
4,el Fort Pienc,32.649
...,...,...
68,Diagonal Mar i el Front Marítim del Poblenou,13.625
69,el Besòs i el Maresme,24.660
70,Provençals del Poblenou,21.303
71,Sant Martí de Provençals,26.168


In [225]:
neighborhoods_reduced  = cluster2['Neighborhood'].unique()
print(len(neighborhoods_reduced))
neighborhoods_reduced

5


array(['La Barceloneta', ' Sant Antoni', 'Vila de Gràcia',
       " Camp d'en Grassot i Gràcia Nova", 'El Baix Guinardó'],
      dtype=object)

In [227]:

# strip leading/trailing whitespace from the neighborhood names
neighborhoods_reduced = [s.strip() for s in neighborhoods_reduced]

So we have 5 neighborhoods as candidates to choose from. 

In [228]:
# regex alternation that matches any of the candidate neighborhood names
pattern = '|'.join(neighborhoods_reduced)
pattern

"La Barceloneta|Sant Antoni|Vila de Gràcia|Camp d'en Grassot i Gràcia Nova|El Baix Guinardó"

In [229]:
pop['Neighborhood'].unique()

array(['el Raval', 'Gothic', 'la Barceloneta',
       'Sant Pere Santa Caterina i la Ribera', 'el Fort Pienc',
       'la Sagrada Família', "la Dreta de l'Eixample",
       "l'Antiga Esquerra de l'Eixample",
       "la Nova Esquerra de l'Eixample", 'Sant Antoni',
       'el Poble Sec - AEI Parc Montjuïc',
       'la Marina del Prat Vermell - AEI Zona Franca',
       'la Marina de Port', 'Font de  Guatl', 'Hostafrancs', 'la Bordeta',
       'Sants - Badal', 'Sants', 'les Corts',
       'la Maternitat i Sant Ramon', 'Pedralbes',
       'Vallvidrera el Tibidabo i les Plane', 'Sarrià', 'les Tres Torres',
       'Sant Gervasi - la Bonanova', 'Sant Gervasi - Galvany',
       'el Putget i el Farró', 'Vallcarca i els Penitents', 'el Coll',
       'la Salut', 'la Vila de Gràcia',
       "el Camp d'en Grassot i Gràcia Nova", 'el Baix Guinardó',
       'Can Baró', 'el Guinardó', "la Font d'en Fargues", 'el Carmel',
       'la Teixonera', 'Sant Genís dels Agudells', 'Montbau',
       "la Vall d'He

In [230]:
pop_reduced = pop[pop['Neighborhood'].str.contains(pattern, case=False)] 
pop_reduced.shape


(5, 2)
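
The pattern is used as a regular expression, which is fine for these five names. If a name ever contained characters such as parentheses or a plus sign, escaping each name first would keep the match literal. A small sketch with made-up names (`toy_names` and `toy_pop` are hypothetical):

```python
import re
import pandas as pd

# Hypothetical names; the second contains regex metacharacters
toy_names = ["Sant Antoni", "Sants (Badal)"]
toy_pop = pd.DataFrame({"Neighborhood": ["Sant Antoni", "Sants (Badal)", "el Raval"]})

# re.escape makes every name match literally inside the alternation
safe_pattern = "|".join(re.escape(name) for name in toy_names)
matches = toy_pop[toy_pop["Neighborhood"].str.contains(safe_pattern, case=False)]
print(matches)
```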

In [231]:
pop_reduced

Unnamed: 0,Neighborhood,Population
2,la Barceloneta,15.173
9,Sant Antoni,38.566
30,la Vila de Gràcia,50.803
31,el Camp d'en Grassot i Gràcia Nova,35.199
32,el Baix Guinardó,25.99


In [236]:
cluster2['Neighborhood'].unique()

array(['La Barceloneta', ' Sant Antoni', 'Vila de Gràcia',
       " Camp d'en Grassot i Gràcia Nova", 'El Baix Guinardó'],
      dtype=object)

In [241]:
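# work on an explicit copy so the rename is not applied to a slice of pop
pop_reduced = pop_reduced.copy()

# rename to the exact spellings used in cluster2 (including the leading spaces)
pop_reduced["Neighborhood"] = pop_reduced["Neighborhood"].replace({
    "la Barceloneta": "La Barceloneta",
    'Sant Antoni': ' Sant Antoni',
    'la Vila de Gràcia': "Vila de Gràcia",
    "el Camp d'en Grassot i Gràcia Nova": " Camp d'en Grassot i Gràcia Nova",
    'el Baix Guinardó': 'El Baix Guinardó'})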


In [256]:
cluster2_pop = pd.merge(cluster2, pop_reduced, on='Neighborhood')
cluster2_pop.shape

(141, 15)
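
Renaming by hand works for five names. A more general approach, sketched here as an assumption rather than as the method used above, is to merge on a normalized key and let `indicator=True` reveal any rows that failed to match. The rule in `normalize_name` (trim, lower-case, drop one leading Catalan article) is illustrative only.

```python
import pandas as pd

ARTICLES = ("el ", "la ", "les ", "els ", "l'")

def normalize_name(name: str) -> str:
    """Lower-case, trim, and drop one leading Catalan article (illustrative rule)."""
    key = name.strip().casefold()
    for article in ARTICLES:
        if key.startswith(article):
            return key[len(article):]
    return key

# Spellings as they appear in the venue data and in the population table
venues = pd.DataFrame({"Neighborhood": ["La Barceloneta", " Sant Antoni", "Vila de Gràcia"]})
population = pd.DataFrame({
    "Neighborhood": ["la Barceloneta", "Sant Antoni", "la Vila de Gràcia"],
    "Population": ["15.173", "38.566", "50.803"],
})

venues["key"] = venues["Neighborhood"].map(normalize_name)
population["key"] = population["Neighborhood"].map(normalize_name)

# indicator=True adds a _merge column that exposes rows without a partner
joined = venues.merge(population[["key", "Population"]], on="key", how="left", indicator=True)
print(joined[["Neighborhood", "Population", "_merge"]])
```

Any row whose `_merge` value is `left_only` would point to a spelling that still needs attention.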

In [255]:
cluster2.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
0,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
1,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
2,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
3,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619
4,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant,2,41.380653,2.189927,1.8,41.379777,2.188619


Let's examine the population per neighborhood, starting with the number of venues in each one

In [257]:
# number of venues per neighborhood in cluster 2, now including the Population column
# (count() repeats the same number in every column)
t1 = cluster2_pop.groupby('Neighborhood').count()
t1

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y,Population
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Camp d'en Grassot i Gràcia Nova,11,11,11,11,11,11,11,11,11,11,11,11,11,11
Sant Antoni,27,27,27,27,27,27,27,27,27,27,27,27,27,27
El Baix Guinardó,21,21,21,21,21,21,21,21,21,21,21,21,21,21
La Barceloneta,51,51,51,51,51,51,51,51,51,51,51,51,51,51
Vila de Gràcia,31,31,31,31,31,31,31,31,31,31,31,31,31,31


In [278]:
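# each neighborhood has its own population value, so grouping by Population
# gives the number of venues per neighborhood (rows are sorted by population)
testdf = cluster2_pop.groupby('Population').count().reset_index()
testdf1 = testdf['Population'].astype(float)   # population in thousands
testdf2 = testdf['Neighborhood']               # number of venues
testdf2 / testdf1                              # venues per thousand inhabitants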

0    3.361234
1    0.808003
2    0.312509
3    0.700099
4    0.610200
dtype: float64

The neighborhoods with the lowest density (venues per thousand inhabitants) are rows 2, 4 and 3. Which neighborhoods are those? 

In [288]:

pop_reduced[pop_reduced['Population'].astype(float)==testdf1[2]]         

Unnamed: 0,Neighborhood,Population
31,Camp d'en Grassot i Gràcia Nova,35.199


In [289]:
pop_reduced[pop_reduced['Population'].astype(float)==testdf1[4]]     

Unnamed: 0,Neighborhood,Population
30,Vila de Gràcia,50.803


In [290]:
pop_reduced[pop_reduced['Population'].astype(float)==testdf1[3]]     

Unnamed: 0,Neighborhood,Population
9,Sant Antoni,38.566
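
The lookups above recover the names by comparing float values. A sketch that keeps the name next to the ratio avoids that step; it uses the venue counts and populations reported in the tables above, while the column names `Venues`, `Population (thousands)` and `Venues per 1000` are introduced here for illustration.

```python
import pandas as pd

# Venue counts and populations (in thousands) as reported in the tables above
summary = pd.DataFrame({
    "Neighborhood": ["Camp d'en Grassot i Gràcia Nova", "Sant Antoni",
                     "El Baix Guinardó", "La Barceloneta", "Vila de Gràcia"],
    "Venues": [11, 27, 21, 51, 31],
    "Population (thousands)": [35.199, 38.566, 25.99, 15.173, 50.803],
})

# Venues per thousand inhabitants, kept next to the neighborhood name
summary["Venues per 1000"] = summary["Venues"] / summary["Population (thousands)"]
ranked = summary.sort_values("Venues per 1000").reset_index(drop=True)
print(ranked[["Neighborhood", "Venues per 1000"]].round(3))
```

The three lowest ratios belong to Camp d'en Grassot i Gràcia Nova, Vila de Gràcia and Sant Antoni, in agreement with the lookups above.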


### Conclusion <a name="conclusion"></a> 

After the final refinement, based on the number of restaurants relative to population, our final conclusion consists of 3 neighborhoods:    
- **Camp d'en Grassot i Gràcia Nova** 
- **Vila de Gràcia** 
- **Sant Antoni** 

Of course, this analysis did not take other factors into account, such as tourist movement, the number of hotels in the area, rental and purchase prices for property, public transport and accessibility, or crime rate. The conclusion above, based on machine learning techniques, can be a first step towards a more thorough analysis.    