### Neighborhoods in Barcelona  
My first step is to create a dataframe with all the neighborhoods and districts in Barcelona.  I can do this by scraping the names from the wikipedia page, and using the Nominatim module to find the corresponding latitudes/longitudes 

### Table of contents

Introduction where you discuss the business problem and who would be interested in this project.

Data where you describe the data that will be used to solve the problem and the source of the data.

Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learnings were used and why.

Results section where you discuss the results.

Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.

Conclusion section where you conclude the report.

First I import limbraries 

In [102]:
# Import libraries 
import requests
import lxml.html as lh
import pandas as pd 
from bs4 import BeautifulSoup

# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

# !conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Libraries imported.')

Folium installed
Libraries imported.


In [142]:
import numpy as np 

Now I extract district/neighbohood data from the wikipedia page using the Beautiful Soup package 

In [103]:
# extract data using Beautiful Soup 
url='https://en.wikipedia.org/wiki/Districts_of_Barcelona'
res = requests.get(url)

soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')
data = pd.read_html(str(table))
df = pd.DataFrame(data[7])

The resulting dataframe is quite large: I change the default display  so I can visualize the dataframe properly

In [104]:
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', -1)

In [105]:
df 

Unnamed: 0,vteDistricts and neighbourhoods of Barcelona,vteDistricts and neighbourhoods of Barcelona.1
0,Ciutat Vella,"La Barceloneta Gothic Quarter El Raval Sant Pere, Santa Caterina i la Ribera"
1,L'Eixample,L'Antiga Esquerra de l'Eixample La Nova Esquerra de l'Eixample Dreta de l'Eixample Fort Pienc Sagrada Família Sant Antoni
2,Sants-Montjuïc,La Bordeta La Font de la Guatlla Hostafrancs La Marina de Port La Marina del Prat Vermell El Poble-sec Sants Sants-Badal Montjuïc Zona Franca – Port
3,Les Corts,Les Corts La Maternitat i Sant Ramon Pedralbes
4,Sarrià-Sant Gervasi,"El Putget i Farró Sarrià Sant Gervasi – la Bonanova Sant Gervasi – Galvany les Tres Torres Vallvidrera, el Tibidabo i les Planes"
5,Gràcia,Vila de Gràcia Camp d'en Grassot i Gràcia Nova La Salut El Coll Vallcarca i els Penitents
6,Horta-Guinardó,El Baix Guinardó El Guinardó Can Baró El Carmel La Font d'en Fargues Horta La Clota Montbau Sant Genís dels Agudells La Teixonera Vall d'Hebron
7,Nou Barris,Can Peguera Canyelles Ciutat Meridiana La Guineueta Porta La Prosperitat Roquetes Torre Baró La Trinitat Nova El Turó de la Peira Vallbona Verdum Vilapicina i la Torre Llobeta
8,Sant Andreu,Baró de Viver Bon Pastor El Congrés i els Indians Navas Sant Andreu de Palomar La Sagrera Trinitat Vella
9,Sant Martí,El Besòs i el Maresme El Clot El Camp de l'Arpa del Clot Diagonal Mar i el Front Marítim del Poblenou El Parc i la Llacuna del Poblenou El Poblenou Provençals del Poblenou Sant Martí de Provençals La Verneda i la Pau La Vila Olímpica del Poblenou


Here I change the column names to "Districts" and "Neighborhoods" 

In [106]:
df.rename(columns={'vteDistricts and neighbourhoods of Barcelona':'Districts'}, inplace=True)

In [107]:
df.rename(columns={'vteDistricts and neighbourhoods of Barcelona.1':'Neighborhoods'}, inplace=True)

Here I separate the neighborhood values with commas (",") since while web scraping those commas were lost 

In [108]:
# Replace the character " " with a comma in neighborhood  
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (' ',', ',100) )

In [109]:
df 

Unnamed: 0,Districts,Neighborhoods
0,Ciutat Vella,"La, Barceloneta, Gothic, Quarter, El, Raval, Sant, Pere,, Santa, Caterina, i, la, Ribera"
1,L'Eixample,"L'Antiga, Esquerra, de, l'Eixample, La, Nova, Esquerra, de, l'Eixample, Dreta, de, l'Eixample, Fort, Pienc, Sagrada, Família, Sant, Antoni"
2,Sants-Montjuïc,"La, Bordeta, La, Font, de, la, Guatlla, Hostafrancs, La, Marina, de, Port, La, Marina, del, Prat, Vermell, El, Poble-sec, Sants, Sants-Badal, Montjuïc, Zona, Franca, –, Port"
3,Les Corts,"Les, Corts, La, Maternitat, i, Sant, Ramon, Pedralbes"
4,Sarrià-Sant Gervasi,"El, Putget, i, Farró, Sarrià, Sant, Gervasi, –, la, Bonanova, Sant, Gervasi, –, Galvany, les, Tres, Torres, Vallvidrera,, el, Tibidabo, i, les, Planes"
5,Gràcia,"Vila, de, Gràcia, Camp, d'en, Grassot, i, Gràcia, Nova, La, Salut, El, Coll, Vallcarca, i, els, Penitents"
6,Horta-Guinardó,"El, Baix, Guinardó, El, Guinardó, Can, Baró, El, Carmel, La, Font, d'en, Fargues, Horta, La, Clota, Montbau, Sant, Genís, dels, Agudells, La, Teixonera, Vall, d'Hebron"
7,Nou Barris,"Can, Peguera, Canyelles, Ciutat, Meridiana, La, Guineueta, Porta, La, Prosperitat, Roquetes, Torre, Baró, La, Trinitat, Nova, El, Turó, de, la, Peira, Vallbona, Verdum, Vilapicina, i, la, Torre, Llobeta"
8,Sant Andreu,"Baró, de, Viver, Bon, Pastor, El, Congrés, i, els, Indians, Navas, Sant, Andreu, de, Palomar, La, Sagrera, Trinitat, Vella"
9,Sant Martí,"El, Besòs, i, el, Maresme, El, Clot, El, Camp, de, l'Arpa, del, Clot, Diagonal, Mar, i, el, Front, Marítim, del, Poblenou, El, Parc, i, la, Llacuna, del, Poblenou, El, Poblenou, Provençals, del, Poblenou, Sant, Martí, de, Provençals, La, Verneda, i, la, Pau, La, Vila, Olímpica, del, Poblenou"


Some neighborhood names contain more than a word, which now are separated by commas. Here I fix such neighborhoods names 

In [110]:
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('La,','La',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('de,','de',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('del,','del',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (', de',' de',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('El,','El',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('el,','el',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Can,','Can',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('i,','i',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (', i',' i',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Sant,','Sant',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Santa,','Santa',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('la,','la',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('les,','les',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Les,','Les',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (',,',',',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Zona, Franca, –, Port','Zona Franca-Port',100) )
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('La Trinitat, Nova,','La Trinitat Nova,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Sagrada, Familia','Sagrada Familia',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Gràcia, Nova,','Gràcia Nova,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('en,','en',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Nova, Esquerra','Nova Esquerra',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Sant Gervasi –,','Sant Gervasi ',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('Fort,','Fort',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (' les Tres, Torres',' les Tres Torres,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ('la Guatlla Hostafrancs,','la Guatlla, Hostafrancs,,',100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (", d'en "," d'en ",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Baix, Guinardó","Baix Guinardó",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Carmel","Carmel,",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Vall, ","Vall ",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("els,","els",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Torre, Baró","Torre Baró",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Bon,","Bon",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Canyelles","Canyelles,",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Ciutat, Meridiana,","Ciutat Meridiana,",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace ("Quarter, El Raval","Raval",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (",,",",",100))
df['Neighborhoods'] = df['Neighborhoods'].apply(lambda x : x.replace (' Sagrada, Família',' Sagrada Família',100))
 
# 
df

Unnamed: 0,Districts,Neighborhoods
0,Ciutat Vella,"La Barceloneta, Gothic, Raval, Sant Pere, Santa Caterina i la Ribera"
1,L'Eixample,"L'Antiga, Esquerra de l'Eixample, La Nova Esquerra de l'Eixample, Dreta de l'Eixample, Fort Pienc, Sagrada Família, Sant Antoni"
2,Sants-Montjuïc,"La Bordeta, La Font de la Guatlla, Hostafrancs, La Marina de Port, La Marina del Prat, Vermell, El Poble-sec, Sants, Sants-Badal, Montjuïc, Zona Franca-Port"
3,Les Corts,"Les Corts, La Maternitat i Sant Ramon, Pedralbes"
4,Sarrià-Sant Gervasi,"El Putget i Farró, Sarrià, Sant Gervasi la Bonanova, Sant Gervasi Galvany, les Tres Torres, Vallvidrera, el Tibidabo i les Planes"
5,Gràcia,"Vila de Gràcia, Camp d'en Grassot i Gràcia Nova, La Salut, El Coll, Vallcarca i els Penitents"
6,Horta-Guinardó,"El Baix Guinardó, El Guinardó, Can Baró, El Carmel, La Font d'en Fargues, Horta, La Clota, Montbau, Sant Genís dels Agudells, La Teixonera, Vall d'Hebron"
7,Nou Barris,"Can Peguera, Canyelles, Ciutat Meridiana, La Guineueta, Porta, La Prosperitat, Roquetes, Torre Baró, La Trinitat Nova, El Turó de la Peira, Vallbona, Verdum, Vilapicina i la Torre, Llobeta"
8,Sant Andreu,"Baró de Viver, Bon Pastor, El Congrés i els Indians, Navas, Sant Andreu de Palomar, La Sagrera, Trinitat, Vella"
9,Sant Martí,"El Besòs i el Maresme, El Clot, El Camp de l'Arpa del Clot, Diagonal, Mar i el Front, Marítim del Poblenou, El Parc i la Llacuna del Poblenou, El Poblenou, Provençals del Poblenou, Sant Martí de Provençals, La Verneda i la Pau, La Vila Olímpica del Poblenou"


The comma separated neighborhoods are all stacked in the same rowns if they belong to the same distirct. Here I separate all distinct neighborhoods into different rows  

In [111]:
# Step 1 
new_df = pd.DataFrame(df.Neighborhoods.str.split(',').tolist(), index=df.Districts).stack()
# Step 2 
new_df = new_df.reset_index([0, 'Districts'])
# Step 3 
new_df.columns = ['Districts', 'Neighborhoods']
new_df.tail ()  


Unnamed: 0,Districts,Neighborhoods
78,Sant Martí,El Poblenou
79,Sant Martí,Provençals del Poblenou
80,Sant Martí,Sant Martí de Provençals
81,Sant Martí,La Verneda i la Pau
82,Sant Martí,La Vila Olímpica del Poblenou


How many neighborhoods are there in Barcelona? 

In [112]:
new_df.shape 

(83, 2)

So there are 84 beughborhoods we can choose from.     
I want to extract latitude/longitude for all the neighborhoods. I will do this using the Nominatim module 

In [113]:
# address = 'Sant Andreu de Palomar,'
def find_lon_lat(address): 
    geolocator = Nominatim(user_agent="ny_explorer ")
    location = geolocator.geocode(address,timeout=10000)
    latitude = location.latitude
    longitude = location.longitude
    return [latitude, longitude]

find_lon_lat('El Coll Barcelona, Spain')


[41.6512892, 1.9584116]

In [114]:
new_df.head ()

Unnamed: 0,Districts,Neighborhoods
0,Ciutat Vella,La Barceloneta
1,Ciutat Vella,Gothic
2,Ciutat Vella,Raval
3,Ciutat Vella,Sant Pere
4,Ciutat Vella,Santa Caterina i la Ribera


In [115]:
new_df.shape[0]

83

In [116]:
new_df.iloc[80:87,1]

80     Sant Martí de Provençals     
81     La Verneda i la Pau          
82     La Vila Olímpica del Poblenou
Name: Neighborhoods, dtype: object

I extract the neighborhoods lat/lons 

In [117]:
lon = []
lat = []
for index in range(new_df.shape[0]):
# for index in range(10):
    # print (new_df.iloc[index,1]+" , Barcelona, Spain")
    # print (find_lon_lat(new_df.iloc[index,1]+" , Barcelona, Spain" )[1] )
    lat.append( find_lon_lat( new_df.iloc[index,1]+" , Barcelona, Spain" )  [0] ) 
    lon.append( find_lon_lat( new_df.iloc[index,1]+" , Barcelona, Spain" )  [1] )   


I append longitude and latitude to the dataframe 

In [118]:
new_df["Latitude"] = lat 
new_df["Longitude"] = lon 

In [119]:
new_df.tail ()

Unnamed: 0,Districts,Neighborhoods,Latitude,Longitude
78,Sant Martí,El Poblenou,41.400527,2.201729
79,Sant Martí,Provençals del Poblenou,41.41236,2.204885
80,Sant Martí,Sant Martí de Provençals,41.416519,2.198968
81,Sant Martí,La Verneda i la Pau,41.42322,2.20294
82,Sant Martí,La Vila Olímpica del Poblenou,41.389868,2.196846


I drop neighborhoods that are far from the city centre 

In [120]:
new_df.drop ( new_df[new_df['Longitude'] < 2.].index, inplace=True )
new_df.shape


(74, 4)

I visualize the neighborhoods using a folium map 

In [121]:
lat_bcn = find_lon_lat('Barcelona, Spain')[0]
lon_bcn = find_lon_lat('Barcelona, Spain')[1]

In [122]:
# create map of Barcelona using latitude and longitude values
map_bcn = folium.Map(location=[lat_bcn, lon_bcn], zoom_start=10)

# add markers to map
for lat, lng, districts, neighborhoods in zip(new_df['Latitude'], new_df['Longitude'], new_df['Districts'], new_df['Neighborhoods']):
    label = '{}, {}'.format(neighborhoods, districts)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bcn)  
    
map_bcn

### Foursquare

I use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in 'food' category, but only those that are proper restaurants - coffe shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will include in out list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific 'Italian restaurant' category, as we need info on Italian restaurants in the neighborhood.



In [123]:
CLIENT_ID = '02G45DAR5A4SZEQXV5ZJ5EKTZCEBQSTAETCINO5OSI231FE5' # your Foursquare ID
CLIENT_SECRET = '4VCMNSOP3VFRO5JTURCHXL4TDY3TGM0C2S1GCU5Y3FMUXPZE' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentails:
CLIENT_ID: 02G45DAR5A4SZEQXV5ZJ5EKTZCEBQSTAETCINO5OSI231FE5
CLIENT_SECRET:4VCMNSOP3VFRO5JTURCHXL4TDY3TGM0C2S1GCU5Y3FMUXPZE


I create a function to get the venues in the neighborhoodsfrom Foursquare  

In [124]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [125]:
# type your answer here

bcn_venues = getNearbyVenues(names=new_df['Neighborhoods'],
                                   latitudes=new_df['Latitude'],
                                   longitudes=new_df['Longitude']
                                  )



La Barceloneta
 Gothic
 Raval
 Santa Caterina i la Ribera
L'Antiga
 Esquerra de l'Eixample
 La Nova Esquerra de l'Eixample
 Dreta de l'Eixample
 Fort Pienc
 Sagrada Família
 Sant Antoni
La Bordeta
 La Font de la Guatlla
 Hostafrancs
 La Marina de Port
 La Marina del Prat
 El Poble-sec
 Sants
 Sants-Badal
 Zona Franca-Port
Les Corts
 La Maternitat i Sant Ramon
 Pedralbes
El Putget i Farró
 Sarrià
 Sant Gervasi  la Bonanova
 Sant Gervasi  Galvany
 les Tres Torres
 Vallvidrera
 el Tibidabo i les Planes
Vila de Gràcia
 Camp d'en Grassot i Gràcia Nova
 La Salut
 Vallcarca i els Penitents
El Baix Guinardó
 El Guinardó
 Can Baró
 El Carmel
 La Font d'en Fargues
 Horta
 La Clota
 Montbau
 Sant Genís dels Agudells
 La Teixonera
 Vall d'Hebron
Can Peguera
 Ciutat Meridiana
 La Guineueta
 Porta
 La Prosperitat
 Roquetes
 Torre Baró
 La Trinitat Nova
 El Turó de la Peira
 Verdum
 Vilapicina i la Torre
 Llobeta
Baró de Viver
 Bon Pastor
 El Congrés i els Indians
 Sant Andreu de Palomar
 La Sagrera


In [126]:
print(bcn_venues.shape)
bcn_venues.head()

(3040, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,La Barceloneta,41.380653,2.189927,Baluard Barceloneta,41.380047,2.18925,Bakery
1,La Barceloneta,41.380653,2.189927,BRO,41.380214,2.189007,Burger Joint
2,La Barceloneta,41.380653,2.189927,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
4,La Barceloneta,41.380653,2.189927,Plaça de la Barceloneta,41.379739,2.188135,Plaza


How many venues per neighborhood? 

In [127]:
bcn_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bon Pastor,5,5,5,5,5,5
Camp d'en Grassot i Gràcia Nova,18,18,18,18,18,18
Can Baró,28,28,28,28,28,28
Ciutat Meridiana,7,7,7,7,7,7
Diagonal,58,58,58,58,58,58
...,...,...,...,...,...,...
L'Antiga,100,100,100,100,100,100
La Barceloneta,100,100,100,100,100,100
La Bordeta,30,30,30,30,30,30
Les Corts,74,74,74,74,74,74


In [128]:
print('There are {} uniques categories.'.format(len(bcn_venues['Venue Category'].unique())))

There are 282 uniques categories.


In [129]:
# I print the categories to see which correspond to restaurants 
bcn_venues['Venue Category'].unique()

array(['Bakery', 'Burger Joint', 'Spanish Restaurant', 'Tapas Restaurant',
       'Plaza', 'Mediterranean Restaurant', 'Wine Shop', 'Restaurant',
       'Salon / Barbershop', 'Paella Restaurant', 'Beer Bar',
       'Pizza Place', 'Beach', 'Sushi Restaurant',
       'Argentinian Restaurant', 'Market', 'Fish & Chips Shop', 'Bar',
       'Italian Restaurant', 'Food & Drink Shop', 'Steakhouse',
       'Brazilian Restaurant', 'Cocktail Bar', 'Coffee Shop', 'Juice Bar',
       'South American Restaurant', 'Ice Cream Shop', 'BBQ Joint',
       'Hotel', 'College Residence Hall', 'History Museum',
       'Hawaiian Restaurant', 'Vegetarian / Vegan Restaurant',
       'Board Shop', 'Circus', 'Athletics & Sports', 'Surf Spot',
       'Seafood Restaurant', 'Breakfast Spot', 'Soccer Field', 'Food',
       'Fast Food Restaurant', 'Café', 'Museum', 'Deli / Bodega',
       'Turkish Restaurant', 'Park', 'Brewery', 'Hot Dog Joint',
       'Wine Bar', 'Neighborhood', 'Bridge', 'Dessert Shop',
       'Gree

We want to keep retaurants into the resuting datafraome, so lets drop rows that do not contain Restaurant/Bodega in the venue category   

In [130]:
bcn_restaurants = bcn_venues[bcn_venues['Venue Category'].str.contains("Restaurant", case=False)|bcn_venues['Venue Category'].str.contains("Bodega", case=False)|bcn_venues['Venue Category'].str.contains("Food", case=False)] 

In [131]:
bcn_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,La Barceloneta,41.380653,2.189927,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
5,La Barceloneta,41.380653,2.189927,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant
7,La Barceloneta,41.380653,2.189927,La Bombeta,41.380521,2.187573,Tapas Restaurant
8,La Barceloneta,41.380653,2.189927,La Barra Carles Abellan,41.379838,2.187712,Restaurant


In [132]:
# How many restaurants are in the city? 
bcn_restaurants.shape

(1002, 7)

In [133]:
# Let's print the categories 
bcn_restaurants ['Venue Category'].unique()

array(['Spanish Restaurant', 'Tapas Restaurant',
       'Mediterranean Restaurant', 'Restaurant', 'Paella Restaurant',
       'Sushi Restaurant', 'Argentinian Restaurant', 'Italian Restaurant',
       'Food & Drink Shop', 'Brazilian Restaurant',
       'South American Restaurant', 'Hawaiian Restaurant',
       'Vegetarian / Vegan Restaurant', 'Seafood Restaurant', 'Food',
       'Fast Food Restaurant', 'Deli / Bodega', 'Turkish Restaurant',
       'Greek Restaurant', 'Ramen Restaurant', 'Mexican Restaurant',
       'Asian Restaurant', 'Portuguese Restaurant', 'Japanese Restaurant',
       'Empanada Restaurant', 'Food Court', 'Russian Restaurant',
       'Molecular Gastronomy Restaurant', 'Latin American Restaurant',
       'Gluten-free Restaurant', 'Indian Restaurant', 'Thai Restaurant',
       'Peruvian Restaurant', 'Falafel Restaurant', 'Korean Restaurant',
       'Chinese Restaurant', 'Eastern European Restaurant',
       'Health Food Store', 'Szechuan Restaurant', 'Food Truck',
   

In [134]:
# How many greek restaurants? 
bcn_restaurants[bcn_restaurants['Venue Category'].str.contains("greek", case=False)] 


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
114,Gothic,41.381505,2.177418,Dionisos Quick Greek,41.380538,2.177297,Greek Restaurant


Apparently there is only 1 restaurant, so there is hardly any competition!  

The above implies we might need different criteria to choose. One criteria would be density of restaurants. Another wold be type of cousine: Mediterannean is closest to greek, so ideally the restaurant would be better in a neighborhood with less mediterranean restaurants.   

First, lets filter the neighborhoods which are further away from city centre 

In [135]:
lat_bcn_centre = find_lon_lat('Pl Catalunya, Barcelona, Spain')[0]
lon_bcn_centre = find_lon_lat('Pl Catalunya, Barcelona, Spain')[1]

print (lat_bcn_centre)
print (lon_bcn_centre)

41.3868794
2.170067825120773


And define a function to find the distance from the above centre 

In [143]:
def haversine_distance(lat1, lon1):
    lat2=lat_bcn_centre
    lon2=lon_bcn_centre
    r = 6371
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) *   np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)

Lets test this for one neighborhood 

In [140]:
test_neighborhood = bcn_restaurants.iloc[1,0]
test_lat = bcn_restaurants.iloc[1,1]
test_lon = bcn_restaurants.iloc[1,2]
print('Test the distance from neighborhood {} with lat={} and lon={}'.format(test_neighborhood, test_lat, test_lon))

Test the distance from neighborhood La Barceloneta with lat=41.3806533 and lon=2.1899274


In [145]:
dis = haversine_distance (test_lat, test_lon)
print('The distance of the neighborhood {} from centre is {} km'.format(  test_neighborhood, dis))

The distance of the neighborhood La Barceloneta from centre is 1.8 km


In [146]:
# Calculate distances 

distance = []
for index in range(bcn_restaurants.shape[0]):
# for index in range(10):
    # print (new_df.iloc[index,1]+" , Barcelona, Spain")
    # print (find_lon_lat(new_df.iloc[index,1]+" , Barcelona, Spain" )[1] )
    distance.append( haversine_distance( bcn_restaurants.iloc[index,1], bcn_restaurants.iloc[index,2]  )) 
    

In [147]:
# Append distance to bcn_restaursnts   
bcn_restaurants.insert(3, 'Distance from centre', distance)
bcn_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue,Venue Latitude,Venue Longitude,Venue Category
2,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
5,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant
7,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant
8,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant


Lets get rid off neighborhoods further than 4 km from city centre 

In [152]:
bcn_restaurants.drop ( bcn_restaurants[bcn_restaurants['Distance from centre'] > 4.].index, inplace=True )
bcn_restaurants.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue,Venue Latitude,Venue Longitude,Venue Category
2,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant
3,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
5,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant
7,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant
8,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant


In [153]:
bcn_restaurants.shape

(770, 8)

### Analyze Neighborhoods 

In [154]:
# one hot encoding
bcn_onehot = pd.get_dummies(bcn_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bcn_onehot['Neighborhood'] = bcn_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bcn_onehot.columns[-1]] + list(bcn_onehot.columns[:-1])
bcn_onehot = bcn_onehot[fixed_columns]


print (bcn_onehot.shape)

(770, 60)


In [155]:
bcn_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Deli / Bodega,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Health Food Store,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Paella Restaurant,Peruvian Restaurant,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
2,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
5,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
8,La Barceloneta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [156]:
bcn_grouped = bcn_onehot.groupby('Neighborhood').mean().reset_index()
bcn_grouped

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Deli / Bodega,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Health Food Store,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Paella Restaurant,Peruvian Restaurant,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
0,Camp d'en Grassot i Gràcia Nova,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.25,0.0,0.0,0.125,0.0,0.0
1,Can Baró,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.363636,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0
2,Diagonal,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.2,0.0,0.0,0.066667,0.0,0.0
3,Dreta de l'Eixample,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.047619,0.0,0.238095,0.0,0.0,0.0,0.238095,0.0,0.0,0.0,0.0,0.0
4,El Camp de l'Arpa del Clot,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.038462,0.0,0.153846,0.0,0.076923,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.038462,0.153846,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0
5,El Clot,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.068966,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.137931,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.241379,0.0,0.0,0.0,0.172414,0.0,0.0,0.0,0.0,0.034483
6,El Guinardó,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0
7,El Parc i la Llacuna del Poblenou,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.173913,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.217391,0.0,0.0,0.0,0.0,0.173913,0.0,0.0,0.0,0.130435,0.043478,0.0,0.043478,0.0,0.0
8,El Poble-sec,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
9,El Poblenou,0.0,0.0,0.0,0.040816,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.061224,0.102041,0.020408,0.020408,0.0,0.0,0.020408,0.142857,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.122449,0.0,0.0,0.040816,0.0,0.142857,0.020408,0.0,0.0,0.081633,0.0,0.020408,0.020408,0.0,0.0


### Cluster Neighborhoods 

In [157]:
bcn_grouped.shape 

(34, 60)

In [158]:
# import k-means from clustering stage
from sklearn.cluster import KMeans


In [159]:
# set number of clusters
kclusters = 7

bcn_grouped_clustering = bcn_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bcn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([1, 4, 5, 4, 0, 4, 3, 0, 6, 0], dtype=int32)

In [160]:
len(kmeans.labels_)

34

In [161]:
bcn_grouped = bcn_restaurants.groupby('Neighborhood').mean().reset_index()
bcn_grouped.shape

(34, 6)

In [162]:
bcn_grouped.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue Latitude,Venue Longitude
0,Camp d'en Grassot i Gràcia Nova,41.406706,2.165419,2.24,41.405701,2.16418
1,Can Baró,41.416092,2.162402,3.31,41.414885,2.160461
2,Diagonal,41.395291,2.159959,1.26,41.396491,2.160473
3,Dreta de l'Eixample,41.394124,2.166471,0.86,41.393734,2.165533
4,El Camp de l'Arpa del Clot,41.410754,2.182816,2.86,41.410706,2.18205


In [163]:
# add clustering labels
bcn_grouped.insert(1, 'Cluster Labels', kmeans.labels_)
bcn_grouped.head()


Unnamed: 0,Neighborhood,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue Latitude,Venue Longitude
0,Camp d'en Grassot i Gràcia Nova,1,41.406706,2.165419,2.24,41.405701,2.16418
1,Can Baró,4,41.416092,2.162402,3.31,41.414885,2.160461
2,Diagonal,5,41.395291,2.159959,1.26,41.396491,2.160473
3,Dreta de l'Eixample,4,41.394124,2.166471,0.86,41.393734,2.165533
4,El Camp de l'Arpa del Clot,0,41.410754,2.182816,2.86,41.410706,2.18205


In [164]:


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors



In [165]:
# create map
map_clusters = folium.Map(location=[lat_bcn, lon_bcn], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bcn_grouped['Neighborhood Latitude'], bcn_grouped['Neighborhood Longitude'], bcn_grouped['Neighborhood'], bcn_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Analysis of the Clusters

We merge the two dataframes 

In [166]:
bcn_merged = pd.merge(bcn_restaurants, bcn_grouped, on='Neighborhood')
bcn_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
0,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
1,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
2,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
3,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
4,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619


Lets see the venues categories for each cluster and single out the cluster with the only greek restyaurant in town

In [201]:
for cluster in range(7):
    print('cluster number {}'.format(cluster)) 
    print(bcn_merged.loc[bcn_merged['Cluster Labels'] == cluster]['Venue Category'].unique())
    print ("Does this cluster contain the only BCN Greek venue?")
    print(bcn_merged.loc[bcn_merged['Cluster Labels'] == cluster]['Venue Category']\
          .str.contains("greek", case=False).unique())
           

cluster number 0
['Health Food Store' 'Restaurant' 'Szechuan Restaurant' 'Food Truck'
 'Chinese Restaurant' 'Food & Drink Shop' 'Spanish Restaurant'
 'Mediterranean Restaurant' 'Deli / Bodega' 'Sushi Restaurant'
 'Mexican Restaurant' 'Portuguese Restaurant' 'Italian Restaurant'
 'Seafood Restaurant' 'Vegetarian / Vegan Restaurant'
 'Vietnamese Restaurant' 'Latin American Restaurant' 'Japanese Restaurant'
 'Asian Restaurant' 'Ramen Restaurant' 'Tapas Restaurant'
 'Thai Restaurant' 'Polish Restaurant' 'Paella Restaurant'
 'Cambodian Restaurant' 'South American Restaurant'
 'Middle Eastern Restaurant' 'Moroccan Restaurant'
 'Gluten-free Restaurant' 'Food Court' 'Fast Food Restaurant'
 'Argentinian Restaurant' 'Empanada Restaurant' 'Indian Restaurant'
 'Kebab Restaurant' 'Lebanese Restaurant' 'Turkish Restaurant'
 'American Restaurant']
Does this cluster contain the only BCN Greek venue?
[False]
cluster number 1
['Italian Restaurant' 'Spanish Restaurant' 'Tapas Restaurant'
 'Greek Restaura

Cluster 7 is empty so we exclude it from no on . The only greek venue is in Cluster 1. 

### We want to see the frequency of venues similar to Greek at the different clusters (excluding the 7th) 

#### Cluster 0

In [302]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 0]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
El Camp de l'Arpa del Clot,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348
El Parc i la Llacuna del Poblenou,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824,0.058824
El Poblenou,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072,0.104072
Fort Pienc,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575
La Salut,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575,0.013575
La Vila Olímpica del Poblenou,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348,0.063348
Sagrada Família,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624,0.022624
les Tres Torres,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181
La Bordeta,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181,0.0181
Les Corts,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774,0.049774


#### Cluster 1 

In [303]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 1]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Camp d'en Grassot i Gràcia Nova,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03
Gothic,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28,0.28
Hostafrancs,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13
Sants-Badal,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05
El Baix Guinardó,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1


#### Cluster 2

In [304]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 2]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Vallcarca i els Penitents,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5


#### Cluster 3

In [305]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 3]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
El Guinardó,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6


#### Cluster 4

In [307]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 4]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Can Baró,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176,0.035176
Dreta de l'Eixample,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352,0.070352
El Clot,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402,0.080402
Esquerra de l'Eixample,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503
La Font de la Guatlla,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251,0.050251
Raval,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528,0.105528
L'Antiga,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503,0.100503


#### Cluster 5

In [308]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 5]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Diagonal,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278
La Nova Esquerra de l'Eixample,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708,0.028708
Sant Antoni,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986,0.066986
Sant Gervasi Galvany,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555,0.076555
Sants,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847,0.047847
El Putget i Farró,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278,0.038278
La Barceloneta,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833,0.114833
Vila de Gràcia,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124,0.086124


#### Cluster 6

In [309]:
cluster1= bcn_merged.loc[bcn_merged['Cluster Labels'] == 6]
cluster1.shape 
test = cluster1.groupby('Venue Category').count().sum()['Neighborhood']
# print (test['Neighborhood'])

cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|italian", case=False)].\
groupby('Neighborhood').count ()
cluster1[ cluster1['Venue Category'].str.contains("greek|italian|tapas|mediterranean|spanish", case=False)].\
groupby('Neighborhood').count ()/test 

Unnamed: 0_level_0,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
El Poble-sec,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176,0.441176
Santa Caterina i la Ribera,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706,0.264706


### We can already exclude clusters 2,3 & 6 that have a high frequency of venues similar to greek venues. We are left with clusters 0, 1, 4 & 5 and need now to further refine our search 

#### Let's visualize the remaining clusters 0,1,4 & 5  

In [342]:
bcn_grouped_reduced = bcn_grouped.drop ( bcn_grouped[bcn_grouped['Cluster Labels'] ==2.].index)
bcn_grouped.drop( bcn_grouped[bcn_grouped['Cluster Labels'] ==3.].index, inplace = True)
bcn_grouped.drop( bcn_grouped[bcn_grouped['Cluster Labels'] ==6.].index, inplace = True)
bcn_grouped_reduced.head ()

Unnamed: 0,Neighborhood,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Distance from centre,Venue Latitude,Venue Longitude
0,Camp d'en Grassot i Gràcia Nova,1,41.406706,2.165419,2.24,41.405701,2.16418
1,Can Baró,4,41.416092,2.162402,3.31,41.414885,2.160461
2,Diagonal,5,41.395291,2.159959,1.26,41.396491,2.160473
3,Dreta de l'Eixample,4,41.394124,2.166471,0.86,41.393734,2.165533
4,El Camp de l'Arpa del Clot,0,41.410754,2.182816,2.86,41.410706,2.18205


In [344]:
# create map
map_clusters = folium.Map(location=[lat_bcn, lon_bcn], zoom_start=11)

kclusters_reduced = 4 
# set color scheme for the clusters
x = np.arange(kclusters_reduced+1)
ys = [i + x + (i*x)**2 for i in range(kclusters_reduced+1)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bcn_grouped_reduced['Neighborhood Latitude'],\
                                  bcn_grouped_reduced['Neighborhood Longitude'],\
                                  bcn_grouped_reduced['Neighborhood'], \
                                  bcn_grouped_reduced['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

###  What We can do from now on is to refine the search by using 2 additional criteria: 
1. The population in each neighborhood/cluster
2. The total number of restaurants in each neighborhood/cluster    

Ideally we should opt for the cluster where restaurants/population is small. 

Lets take population data from https://www.bcn.cat/estadistica/angles/dades/tpob/pad/padro/evo/t3.htm 
I downloaded them into an csv file, cleaned it and saved it locally

In [400]:
pop = pd.read_csv("Population_barrios_Bcn.txt", sep=',\t+',delimiter=',')
# pop[['Neighborhood']]

pop['Neighborhood'] = pop["Neighborhood"].str.strip()
pop['Population'] = pop["Population"].str.strip()
pop

Unnamed: 0,Neighborhood,Population
0,el Raval,48.297
1,el Barri Gòtic,19.180
2,la Barceloneta,15.173
3,Sant Pere Santa Caterina i la Ribera,23.170
4,el Fort Pienc,32.649
...,...,...
68,Diagonal Mar i el Front Marítim del Poblenou,13.625
69,el Besòs i el Maresme,24.660
70,Provençals del Poblenou,21.303
71,Sant Martí de Provençals,26.168


In [404]:
bcn_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
0,La Barceloneta,41.380653,2.189927,1.8,Somorrostro,41.379156,2.1891,Spanish Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
1,La Barceloneta,41.380653,2.189927,1.8,La Cova Fumada,41.379254,2.189254,Tapas Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
2,La Barceloneta,41.380653,2.189927,1.8,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
3,La Barceloneta,41.380653,2.189927,1.8,La Bombeta,41.380521,2.187573,Tapas Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619
4,La Barceloneta,41.380653,2.189927,1.8,La Barra Carles Abellan,41.379838,2.187712,Restaurant,5,41.380653,2.189927,1.8,41.379777,2.188619


In [355]:
# extract data using Beautiful Soup 

# import requests
# from requests.adapters import HTTPAdapter
# from requests.packages.urllib3.util.retry import Retry
# url='https://www.bcn.cat/estadistica/angles/dades/tpob/pad/padro/evo/t3.htm'

# session = requests.Session()
# retry = Retry(connect=3, backoff_factor=0.5)
# adapter = HTTPAdapter(max_retries=retry)
# session.mount('http://', adapter)
# session.mount('https://', adapter)

# session.get(url)

res = requests.get(url, verify=False)
soup = BeautifulSoup(res.content,'lxml')
# print( soup.prettify())
table = soup.find_all('table')
data = pd.read_html(str(table))
df = pd.DataFrame(data[0])
df 



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población,1. Evolución de la población
1,"document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')"
2,3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73),3. Barrios (73)
3,"document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')"
4,Dto. Barrios,Dto. Barrios,Dto. Barrios,Dto. Barrios,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82,No consta,No consta,No consta,No consta,720,952,499,251,70,145,0,1,1,0,0,0,0
83,"document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')","document.write('<img src=""' + whpath + 'images/cpbk.gif"" border=""0"" width=""100%"" height=""2"">')"
84,,,,,,,,,,,,,,,,,
85,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.,Fuente: Ajuntament de Barcelona. Departament d'Estadística i Difusió de Dades. Lecturas del Padrón Municipal de Habitantes a 30 junio del 2007 al 2015 y a 1 enero del 2016 al 2019.


In [242]:
bcn_merged.loc[bcn_merged['Cluster Labels'] == 6]['Venue Category'].unique()

array(['Mexican Restaurant', 'Food & Drink Shop', 'Asian Restaurant',
       'Tapas Restaurant', 'Mediterranean Restaurant',
       'Italian Restaurant', 'Food Court', 'Restaurant',
       'Spanish Restaurant', 'Seafood Restaurant', 'Halal Restaurant'],
      dtype=object)

In [163]:
len(bcn_merged.loc[bcn_merged['Cluster Labels'] == 1] )

88

### Cluster 2

In [294]:
bcn_merged.loc[bcn_merged['Cluster Labels'] == 2 ] 

Unnamed: 0,Neighborhood,Neighborhood Latitude_x,Neighborhood Longitude_x,Distance from centre_x,Venue,Venue Latitude_x,Venue Longitude_x,Venue Category,Cluster Labels,Neighborhood Latitude_y,Neighborhood Longitude_y,Distance from centre_y,Venue Latitude_y,Venue Longitude_y
549,Vallcarca i els Penitents,41.415712,2.141469,4.0,Granja Bar Antonio,41.41317,2.139023,Spanish Restaurant,2,41.415712,2.141469,4.0,41.412711,2.138632
550,Vallcarca i els Penitents,41.415712,2.141469,4.0,Koh-Ndal,41.412251,2.13824,Thai Restaurant,2,41.415712,2.141469,4.0,41.412711,2.138632
