### Data Sources



1. Number of tourists in Catalunya per month from 2015 to 2019:
https://datos.gob.es/es/catalogo/ea0010587-numero-de-turistas-segun-comunidad-autonoma-de-destino-principal-mensual-comunidades-autonomas-movimientos-turisticos-en-fronteras-identificador-api-108231

2. Barcelona list of tourist attractions and number of annual visitors from 1994 to 2019: Observatorio de Turismo de Barcelona

3. List of restaurants and other gastronomic facilities in Barcelona: https://opendata-ajuntament.barcelona.cat/data/es/dataset/equipament-restaurants

4. Barcelona tourist attractions geolocation: https://opendata-ajuntament.barcelona.cat/data/es/dataset/punts-informacio-turistica

5. Lonely Planet list of Barcelona tourist attractions with features, obtained by web scraping:
https://www.lonelyplanet.com/spain/barcelona/attractions?page=1

6. TripAdvisor Restaurants Info for 31 Euro-Cities: https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw

7. Barcelona neighbourhoods and districts geodata: https://github.com/martgnz/bcn-geodata



### Package import

In [2]:
import pandas as pd
from re import search
from math import sqrt, radians
from sklearn.neighbors import DistanceMetric
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#### Loading datasets

In [2]:
#od is OpenData, pits is Points of Interest
pits_od=pd.read_excel('pits_opendata.xls')

#ob is Observatorio de Turismo de Barcelona, pits is Points of Interest
pits_ob=pd.read_excel('barcelona-cultura-y-ocio-1994-2019.xlsx')

#TA is TripAdvisor
TA_data=pd.read_csv('TA_restaurants_curated.csv')

#lp is Lonely Planet
pits_lp=pd.read_excel('pits_lp.xlsx', sheet_name='Curado')

restaurants_op=pd.read_csv('restaurants_opendata.csv')

#cat is Catalunya
cat_visitors=pd.read_excel('catalunya_visitors2015-2019.xlsx')

#POI_barcelona
poi_bcn=pd.read_excel('POI_barcelona.xlsx')

#### Formatting, dropping columns, identifying missing values

##### Catalunya visitors dataset

In [3]:
cat_visitors.head(2)

Unnamed: 0,Año,Mes,Mes2,Dato base,Tasa de variación anual,Acumulado en lo que va de año,Tasa de variación acumulada,Total por año,Porcentaje
0,2019,12,December,993169,2.46,19375152,0.93,19375153,0.05126
1,2019,11,November,1047153,1.21,18381984,0.85,19375153,0.054046


In [4]:
cat_visitors.shape

(51, 9)

In [5]:
cat_visitors.dtypes

Año                                        int64
Mes                                        int64
Mes2                                      object
        Dato base                          int64
        Tasa de variación anual          float64
        Acumulado en lo que va de año     object
        Tasa de variación acumulada       object
Total por año                              int64
Porcentaje                               float64
dtype: object

In [6]:
cat_visitors.columns

Index(['Año', 'Mes', 'Mes2', '        Dato base',
       '        Tasa de variación anual',
       '        Acumulado en lo que va de año',
       '        Tasa de variación acumulada', 'Total por año', 'Porcentaje'],
      dtype='object')

In [7]:
#Removing whitespace from the beginning and end of column names
cat_visitors.columns = cat_visitors.columns.str.strip()

In [8]:
cat_visitors.columns

Index(['Año', 'Mes', 'Mes2', 'Dato base', 'Tasa de variación anual',
       'Acumulado en lo que va de año', 'Tasa de variación acumulada',
       'Total por año', 'Porcentaje'],
      dtype='object')

In [9]:
#Dropping unnecessary columns
cat_visitors=cat_visitors.drop(['Mes2', 'Tasa de variación anual', 'Acumulado en lo que va de año', 'Tasa de variación acumulada'], axis=1)

In [10]:
cat_visitors.head()

Unnamed: 0,Año,Mes,Dato base,Total por año,Porcentaje
0,2019,12,993169,19375153,0.05126
1,2019,11,1047153,19375153,0.054046
2,2019,10,1676430,19375153,0.086525
3,2019,9,2011704,19375153,0.103829
4,2019,8,2363331,19375153,0.121977


##### Points of interest: Observatorio de Turismo de Barcelona

In [11]:
pits_ob.head(2)

Unnamed: 0,numero,Visitantes de museos y colecciones (MC),Type,1994,2000,2011,2014,2015,2016,2017,2018,2019
0,1,Museu FC Barcelona,Museums and Collections,538077,1156090,0,1530484,1785903,1947014,1848198,1730335,1661156
1,2,L'Aquàrium de Barcelona,Museums and Collections,0,1563493,0,1590420,1549480,1587828,1626193,1631108,1609373


In [12]:
pits_ob.dtypes

numero                                      int64
Visitantes de museos y colecciones (MC)    object
Type                                       object
1994                                        int64
2000                                        int64
2011                                        int64
2014                                        int64
2015                                       object
2016                                       object
2017                                        int64
2018                                        int64
2019                                        int64
dtype: object

In [13]:
#Dropping unnecessary columns with index number
pits_ob=pits_ob.drop('numero', axis=1)

In [14]:
#Replace placeholder strings with 0 in columns 2015 and 2016
pits_ob[2015]=pits_ob[2015].replace(['Cerrado', 'nd'], 0)
pits_ob[2016]=pits_ob[2016].replace(['Cerrado', 'nd'], 0)

In [15]:
#Changing columns 2015 and 2016 to int type
pits_ob[2015]=pits_ob[2015].astype(int)
pits_ob[2016]=pits_ob[2016].astype(int)
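
As an alternative to listing each placeholder string by hand, the replacement and cast in the two cells above could be sketched with `pd.to_numeric`. The sample series below is hypothetical and only mimics the shape of the yearly columns; it is a minimal sketch, not the notebook's own method.

```python
import pandas as pd

# Hypothetical sample mimicking a yearly visitor column with text placeholders
visitors = pd.Series([1785903, "Cerrado", "nd", 1549480], name=2015)

# Coerce anything non-numeric to NaN, then fill with 0 and cast to int
clean = pd.to_numeric(visitors, errors="coerce").fillna(0).astype(int)
print(clean.tolist())
```

This approach also catches placeholder strings that were not anticipated, at the cost of silently turning them into zeros.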

In [16]:
#Renaming columns
pits_ob.rename(columns={'Visitantes de museos y colecciones (MC)':'name', 'Type':'type'}, inplace=True)

In [17]:
#Apply upper case for names
pits_ob['name'] = pits_ob['name'].str.upper()
pits_ob['type']= pits_ob['type'].str.upper()

#Remove white space at the beginning and end of the name
pits_ob['name'] = pits_ob['name'].str.strip()
pits_ob['type']= pits_ob['type'].str.strip()

In [18]:
#Check for duplicate names
pits_ob.name.value_counts()

MUSEU TORRE BALLDOVINA - SANTA COLOMA DE GRAMANET                     1
MUSEU D’IDEES I INVENTS DE BARCELONA. MIBA (4)                        1
LA PEDRERA                                                            1
IMAX(1)                                                               1
CENTRE D'INTERPRETACIÓ DEL CAVA-CIC FASSINA (SANT SADURNÍ D'ANOIA)    1
                                                                     ..
PARC NATURAL DE LA SERRA DE COLLSEROLA                                1
PALAU GÜELL                                                           1
MUSEU DE LA MOTO                                                      1
PARC D'ATRACCIONS TIBIDABO                                            1
MUSEU DE LA COLÒNIA VIDAL (PUIG-REIG)                                 1
Name: name, Length: 134, dtype: int64

In [19]:
#Checking every column has a value
pits_ob.count()

name    134
type    134
1994    134
2000    134
2011    134
2014    134
2015    134
2016    134
2017    134
2018    134
2019    134
dtype: int64

##### Points of interest Open Data Barcelona

In [20]:
pits_od.dtypes

pos                    int64
num                    int64
address               object
city                  object
code_url              object
district              object
atencio_eq            object
gmapx                float64
gmapy                float64
id                     int64
name                  object
phonenumber           object
type                  object
tp                    object
title                 object
excerpt               object
content               object
date                  object
author                object
categories           float64
tags                 float64
language              object
originalpost           int64
related_post         float64
attachments          float64
vignette             float64
moreinfo             float64
usergroup             object
post_modified         object
original_modified     object
wt                    object
sigla                 object
sectionname           object
dtype: object

In [21]:
pits_od.shape

(851, 33)

In [22]:
#Drop unnecessary columns
pits_od=pits_od.drop(['originalpost','categories','atencio_eq', 'phonenumber','type','tp','date','author','tags','language','pos', 'num', 'city', 'address', 'code_url', 'related_post', 'attachments', 'vignette', 'moreinfo', 'usergroup', 'post_modified', 'original_modified', 'wt', 'sigla', 'sectionname'], axis=1)

In [23]:
pits_od.head(2)

Unnamed: 0,district,gmapx,gmapy,id,name,title,excerpt,content
0,Sants-Montjuïc,41.361374,2.159711,190829,Parc de Montjuïc,El parc de Montjuïc,<p>El gran espai natural del parc de Montjuïc ...,<h3>Un turó amb història</h3>\r\n<p>Assentat s...
1,Sants-Montjuïc,41.361374,2.159711,190829,Parc de Montjuïc,El parc de Montjuïc,<p>El gran espai natural del parc de Montjuïc ...,<h3>Un turó amb història</h3>\r\n<p>Assentat s...


In [24]:
#Apply upper case for names
pits_od['name'] = pits_od['name'].str.upper()

#Remove white space at the beginning and end of the name
pits_od['name'] = pits_od['name'].str.strip()


In [25]:
#Check for duplicates
pits_od.id.value_counts()

190829    2
286714    1
285228    1
191031    1
190989    1
         ..
541754    1
284089    1
541751    1
285749    1
541205    1
Name: id, Length: 850, dtype: int64

In [26]:
#Drop duplicate values
pits_od=pits_od.drop_duplicates('id')
pits_od.head(2)

Unnamed: 0,district,gmapx,gmapy,id,name,title,excerpt,content
0,Sants-Montjuïc,41.361374,2.159711,190829,PARC DE MONTJUÏC,El parc de Montjuïc,<p>El gran espai natural del parc de Montjuïc ...,<h3>Un turó amb història</h3>\r\n<p>Assentat s...
2,Sant Martí,41.411034,2.221698,190952,EL PARC DEL FÒRUM,El parc del Fòrum,<p>El recinte projectat arran de mar per acoll...,<h3>Un gran esdeveniment</h3>\r\n<p>El parc de...


In [27]:
#Removing noise strings from excerpt
pits_od['excerpt'] = pits_od['excerpt'].str.replace('<p>', '')
pits_od['excerpt'] = pits_od['excerpt'].str.replace('</p>', '')


#Removing noise strings from content
pits_od['content']=pits_od['content'].str.replace('<h3>', '' )
pits_od['content']=pits_od['content'].str.replace('</h3>', '' )
pits_od['content']=pits_od['content'].str.replace('\r\n', '' )
pits_od['content']=pits_od['content'].str.replace('<p>', '' )
pits_od['content']=pits_od['content'].str.replace('<strong>', '' )
pits_od['content']=pits_od['content'].str.replace('</strong>', '' )
pits_od['content']=pits_od['content'].str.replace('</p>', '' )
pits_od['content']=pits_od['content'].str.replace('<!-- .photo-galleria -->', '' )

print(pits_od.excerpt[0])
print(" ")
print(pits_od.content[0])

El gran espai natural del parc de Montjuïc és el millor lloc per gaudir de la natura i la cultura alhora, perquè és ple de magnífics jardins i d’instal·lacions culturals.
 
Un turó amb històriaAssentat sobre el turó que recorre el barri de Sants i mira cap al mar, Montjuïc ha estat testimoni i escenari de múltiples fets transcendentals en la història de Barcelona. Es va començar a urbanitzar a partir de l’Exposició Universal del 1929. Després dels successos dramàtics de la Guerra Civil, en què el castell va funcionar com a presó, l’indret va canviar i, amb els Jocs Olímpics de 1992, es va renovar completament i va tornar a adquirir un caràcter festiu i alegre per als barcelonins.El nom del turó, de 177 metres d’alçària, ha estat un tema de controvèrsia, ja que Montjuïc en català medieval podria traduir-se com a &#8216;Mont dels jueus&#8216;, la qual cosa està avalada per l’existència d’un cementiri jueu a la muntanya.Natura, cultura i esportEn aquest gran pulmó verd de la ciutat s’hi 
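
The chained `str.replace` calls above remove a fixed set of tags, and the printed text still contains an HTML entity (`&#8216;`). A more general clean-up could look like the following sketch. `strip_html` is a hypothetical helper, not part of the notebook, and it assumes the input is a plain string rather than a missing value.

```python
import html
import re

def strip_html(text):
    """Remove HTML comments and tags, decode entities, and collapse whitespace."""
    no_comments = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    no_tags = re.sub(r"<[^>]+>", " ", no_comments)
    decoded = html.unescape(no_tags)
    return re.sub(r"\s+", " ", decoded).strip()

sample = ("<h3>Un turó amb història</h3>\r\n"
          "<p>Assentat sobre el <strong>turó</strong></p><!-- .photo-galleria -->")
print(strip_html(sample))
```

Replacing each tag with a space before collapsing whitespace keeps headings and paragraphs from running together, which the fixed replacements above do not do.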

In [28]:
pits_od.district.value_counts()

Gràcia                 134
Sarrià-Sant Gervasi    119
Ciutat Vella           116
Sant Andreu             95
Sants-Montjuïc          88
Eixample                85
Les Corts               75
Sant Martí              57
Horta-Guinardó          52
Nou Barris              27
Name: district, dtype: int64

In [29]:
pits_od.count()

district    848
gmapx       850
gmapy       850
id          850
name        850
title       850
excerpt     850
content     850
dtype: int64

##### TripAdvisor dataset

In [30]:
TA_data.head(2)

Unnamed: 0.1,Unnamed: 0,Name,City,Cuisine Style,Ranking,Rating,Price Range,Number of Reviews,Reviews,URL_TA,ID_TA
0,0,Martine of Martine's Table,Amsterdam,"['French', 'Dutch', 'European']",1.0,5.0,$$ - $$$,136.0,"[['Just like home', 'A Warm Welcome to Wintry ...",/Restaurant_Review-g188590-d11752080-Reviews-M...,d11752080
1,1,De Silveren Spiegel,Amsterdam,"['Dutch', 'European', 'Vegetarian Friendly', '...",2.0,4.5,$$$$,812.0,"[['Great food and staff', 'just perfect'], ['0...",/Restaurant_Review-g188590-d693419-Reviews-De_...,d693419


In [31]:
TA_data.shape

(125527, 11)

In [32]:
TA_data.dtypes

Unnamed: 0             int64
Name                  object
City                  object
Cuisine Style         object
Ranking              float64
Rating               float64
Price Range           object
Number of Reviews    float64
Reviews               object
URL_TA                object
ID_TA                 object
dtype: object

In [33]:
TA_data_bcn=TA_data[TA_data['City']=='Barcelona']

In [34]:
TA_data_bcn.shape

(8425, 11)

In [35]:
TA_data_bcn.head(2)
TA_data_bcn=TA_data_bcn.drop(['Unnamed: 0', 'URL_TA', 'ID_TA', 'City'], axis=1)

In [36]:
TA_data_bcn.replace('NaN', np.nan, inplace=True)
TA_data_bcn.columns= TA_data_bcn.columns.str.lower()

In [37]:
TA_data_bcn.head(2)

Unnamed: 0,name,cuisine style,ranking,rating,price range,number of reviews,reviews
5372,Uma,"['International', 'Mediterranean', 'Fusion', '...",1.0,5.0,$$$$,792.0,"[['Perfect atmosphere and location', 'Perfecti..."
5373,Viana,"['International', 'Mediterranean', 'Spanish', ...",2.0,5.0,$$ - $$$,2707.0,"[['Wow! Best ever!', 'Small menu-- GET A RESER..."


In [38]:
TA_data_bcn=TA_data_bcn.reset_index().drop('index', axis=1)

In [39]:
TA_data_bcn['price range'].value_counts().sum()

5407

In [40]:
TA_data_bcn.count()

name                 8425
cuisine style        6388
ranking              7795
rating               7793
price range          5407
number of reviews    7264
reviews              7793
dtype: int64

In [41]:
TA_data_bcn.shape

(8425, 7)

In [42]:
pits_od=pits_od.drop_duplicates()

In [43]:
#Names were converted to upper case earlier, so this mixed-case lookup returns no rows
pits_od[pits_od['name']=='El Parc del Fòrum']['title']

Series([], Name: title, dtype: object)

## Modeling

In [57]:
#Loading data set with points of interest, categories and geolocation
poi_dataset=pd.read_excel('POI_barcelona.xlsx', sheet_name='POI base')


In [59]:
poi_dataset.tipo.value_counts()

Museums and Collections            82
Natural spaces                     14
Areas of architectural interest    11
Exposition Centers                  9
Leisure Spaces                      4
Cableway                            3
Missing                             1
Name: tipo, dtype: int64

In [45]:
poi_dataset.drop('name', axis=1, inplace=True)
poi_dataset=poi_dataset.rename(columns={'name (upercase)':'name', 'tipo1':'category', 'longitud2':'longitud', 'latitud2':'latitud'})
poi_dataset.drop('part name', axis=1, inplace=True)


In [46]:
pits_lp['Title']=pits_lp['Title'].str.upper()
pits_lp.rename(columns={'Title':'name'}, inplace=True)

In [47]:
pits_lp

Unnamed: 0,name,POPULARITY,CLASSIFICATION,SUBCLASSIFICATION1,SUBCLASSIFICATION2,SUBCLASSIFICATION3
0,LA SAGRADA FAMÍLIA,TOP CHOICE,BASILICA IN L'EIXAMPLE,BASILICA,BASILICA,L'EIXAMPLE
1,PARK GÜELL,TOP CHOICE,PARK IN GRÀCIA & PARK GÜELL,PARK,PARK,GRÀCIA&PARKGÜELL
2,LA PEDRERA,TOP CHOICE,ARCHITECTURE IN L'EIXAMPLE,ARCHITECTURE,ARCHITECTURE,L'EIXAMPLE
3,CASA BATLLÓ,TOP CHOICE,ARCHITECTURE IN L'EIXAMPLE,ARCHITECTURE,ARCHITECTURE,L'EIXAMPLE
4,MUSEU PICASSO,TOP CHOICE,MUSEUM IN LA RIBERA,MUSEUM,MUSEUM,LARIBERA
...,...,...,...,...,...,...
217,PLATJA DEL LLEVANT,,"BEACH IN BARCELONETA, THE WATERFRONT & EL POBL...",BEACH,BEACH,"BARCELONETA,THEWATERFRONT&ELPOBLENOU"
218,PLAÇA DEL SOL,,SQUARE IN GRÀCIA & PARK GÜELL,SQUARE,SQUARE,GRÀCIA&PARKGÜELL
219,PLAÇA DE LA LLIBERTAT,,SQUARE IN GRÀCIA & PARK GÜELL,SQUARE,SQUARE,GRÀCIA&PARKGÜELL
220,PLAÇA DE LA REVOLUCIÓ DE SETEMBRE DE 1868,,SQUARE IN GRÀCIA & PARK GÜELL,SQUARE,SQUARE,GRÀCIA&PARKGÜELL


In [48]:
poi_dataset=pd.merge(poi_dataset, pits_lp, on='name')
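
`pd.merge` performs an inner join by default, so any attraction whose `name` is spelled differently in the two tables is dropped without warning. One way to see which rows would be lost is a left join with `indicator=True`. The two small frames below are hypothetical stand-ins for `poi_dataset` and `pits_lp`.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables being joined
left = pd.DataFrame({"name": ["MUSEU PICASSO", "PARK GÜELL", "PALAU ROBERT"]})
right = pd.DataFrame({"name": ["MUSEU PICASSO", "PARK GÜELL"],
                      "POPULARITY": ["TOP CHOICE", "TOP CHOICE"]})

# A left join with indicator=True flags rows that found no partner
checked = pd.merge(left, right, on="name", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

Inspecting the unmatched names before the inner join makes it easier to decide whether they are genuine gaps or spelling variants worth normalising.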

#### Distance Matrix

In [50]:
distance_matrix=poi_dataset.copy()

In [51]:
#Build a distance matrix so that a distance feature can be derived between the searched attraction and the rest
distance_matrix=distance_matrix.drop(['Average',1994, 2000, 2011, 2014, 2015, 2016, 2017, 2018, 2019], axis=1)
distance_matrix=distance_matrix.rename(columns={'name (upercase)':'name'})

In [52]:
distance_matrix.head(2)

Unnamed: 0,latitud,longitud,name,category,1,2,3,4,5,6,...,8,9,10,11,12,POPULARITY,CLASSIFICATION,SUBCLASSIFICATION1,SUBCLASSIFICATION2,SUBCLASSIFICATION3
0,41.364433,2.167106,CASTELL DE MONTJUÏC,Museums and Collections,37473.901759,43391.508523,52225.124478,71440.552584,80648.358602,90169.512164,...,105277.04102,86043.943424,69172.296055,43791.287896,39817.19644,,"FORTRESS IN MONTJUÏC, POBLE SEC & SANT ANTONI",FORTRESS,FORTRESS,"MONTJUÏC,POBLESEC&SANTANTONI"
1,41.383884,2.166795,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,Exposition Centers,20682.337228,23948.341912,28823.730264,39428.976726,44510.885475,49765.734838,...,58103.777897,47488.779412,38177.096237,24168.985385,21975.632256,,GALLERY IN EL RAVAL,GALLERY,GALLERY,ELRAVAL


In [53]:
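#Importing packages to calculate distances between geolocations
import haversine as hs
from haversine import Unit

#Coordinate tuples used for the distance calculation
#Note: haversine expects (latitude, longitude) tuples, whereas these are built as (longitud, latitud)
distance_matrix['coor']=list(zip(distance_matrix.longitud, distance_matrix.latitud))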

In [54]:
#Function that calculates the distance in miles between two geolocations
def distance_from(loc1,loc2): 
    dist=hs.haversine(loc1,loc2, unit=Unit.MILES)
    return round(dist,4)

In [55]:
#Create a matrix that calculates distances between each point of interest
for _,row in distance_matrix.iterrows():
    distance_matrix[row['name']]=distance_matrix['coor'].apply(lambda x: distance_from(row.coor,x))
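
The `haversine` package takes each point as a (latitude, longitude) tuple, and the cell above requests miles through `Unit.MILES`. As a cross-check, the pairwise matrix could also be computed directly with NumPy. The sketch below is an alternative under stated assumptions, not the notebook's method: `haversine_matrix` is a hypothetical helper, it returns kilometres, and it uses a mean Earth radius of 6371.0088 km.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption of this sketch)

def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in km for arrays of latitudes and longitudes."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting builds every pairwise difference at once
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Coordinates from this notebook's data: Castell de Montjuïc and Museu Picasso
lats = [41.364433, 41.38515]
lons = [2.167106, 2.180835]
dist_km = haversine_matrix(lats, lons)
print(np.round(dist_km, 2))
```

The vectorised form avoids the row-by-row `apply` and yields a symmetric matrix with zeros on the diagonal.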

In [56]:
#Creating months and years lists to facilitate manipulation
months=list(range(1,13))
years=[1994, 2000, 2011, 2014, 2015, 2016, 2017, 2018, 2019]

In [57]:
distance_matrix.head()

Unnamed: 0,latitud,longitud,name,category,1,2,3,4,5,6,...,MUSEU DE LA XOCOLATA,REIAL MONESTIR DE SANTA MARIA DE PEDRALBES,MUSEU DE LA MÚSICA,COL·LECCIÓ DE CARROSSES FÚNEBRES,LA CAPELLA,ARXIU FOTOGRÀFIC DE BARCELONA,FABRA I COATS,MIRADOR DE COLOM,PARK GÜELL,PAVELLÓ MIES VAN DER ROHE
0,41.364433,2.167106,CASTELL DE MONTJUÏC,Museums and Collections,37473.901759,43391.508523,52225.124478,71440.552584,80648.358602,90169.512164,...,1.942,4.3745,2.7629,0.9905,1.1699,1.9207,5.1797,1.1689,3.654,1.2894
1,41.383884,2.166795,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,Exposition Centers,20682.337228,23948.341912,28823.730264,39428.976726,44510.885475,49765.734838,...,1.0946,3.8512,1.703,2.1298,0.3229,1.0551,3.9452,0.9508,2.3787,1.4664
2,41.36882,2.163033,TELEFÈRIC DE MONTJUÏC,Cableway,69316.84448,80262.857799,96602.7198,132146.198837,149178.213861,166789.83928,...,1.8775,3.9803,2.6518,1.0678,0.9885,1.8491,4.9993,1.2108,3.2922,0.9341
3,41.403942,2.17497,LA SAGRADA FAMÍLIA,Areas of architectural interest,210637.369153,243899.694719,293552.64084,401560.802076,453317.036262,506834.568294,...,1.1862,4.3468,0.7753,3.6245,1.6116,1.1696,2.4546,1.8883,1.7165,2.8521
4,41.38515,2.180835,MUSEU PICASSO,Museums and Collections,46916.101072,54324.751467,65384.150129,89441.238525,100969.110925,112889.284216,...,0.2308,4.7861,1.0729,2.687,0.7686,0.2199,3.5429,0.594,2.8433,2.3534


In [58]:
#Dropping unnecessary columns for the distance matrix and formatting
distance_matrix=distance_matrix.drop(months, axis=1)
distance_matrix=distance_matrix.drop(['latitud', 'longitud', 'category', 'coor'], axis=1)
#distance_matrix.set_index('name', inplace=True)

In [59]:
#Verifying shape of the matrix
distance_matrix.shape

(30, 36)

In [60]:
#Dropping unnecessary columns and formatting
poi_dataset.drop(['latitud', 'longitud', 'Average']+years+months ,axis=1, inplace=True)

#Creating one hot encode variable for categories
poi_dataset=pd.concat([poi_dataset, pd.get_dummies(poi_dataset['category'])], axis=1)
poi_dataset.drop('category', axis=1, inplace=True)
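
Several categorical columns are one-hot encoded in this notebook, one `pd.concat` at a time. When more than one column is encoded, `pd.get_dummies` can also handle them in a single call, and a prefix per column keeps the origin of each indicator visible. The frame and the `kind` column below are hypothetical.

```python
import pandas as pd

# Hypothetical rows with two categorical columns
pois = pd.DataFrame({
    "name": ["MUSEU PICASSO", "PARK GÜELL"],
    "category": ["Museums and Collections", "Natural spaces"],
    "kind": ["MUSEUM", "PARK"],
})

# Encode both columns at once; the originals are dropped automatically
encoded = pd.get_dummies(pois, columns=["category", "kind"],
                         prefix=["cat", "kind"], dtype=int)
print(encoded.columns.tolist())
```

Prefixes also prevent clashes if two source columns happen to share a value.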

In [61]:
poi_dataset.columns

Index(['name', 'POPULARITY', 'CLASSIFICATION', 'SUBCLASSIFICATION1',
       'SUBCLASSIFICATION2', 'SUBCLASSIFICATION3',
       'Areas of architectural interest', 'Cableway', 'Exposition Centers',
       'Leisure Spaces', 'Museums and Collections'],
      dtype='object')

In [62]:
poi_dataset=pd.concat([poi_dataset, pd.get_dummies(poi_dataset['SUBCLASSIFICATION2'])], axis=1)
poi_dataset['POPULARITY']=poi_dataset['POPULARITY'].replace([np.nan, 'TOP CHOICE'], [0,1])

In [63]:
poi_dataset=pd.concat([poi_dataset, pd.get_dummies(poi_dataset['SUBCLASSIFICATION3'])], axis=1)

In [64]:
poi_dataset.head(2)

Unnamed: 0,name,POPULARITY,CLASSIFICATION,SUBCLASSIFICATION1,SUBCLASSIFICATION2,SUBCLASSIFICATION3,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,...,ZOO,BARCELONA,"BARCELONETA,THEWATERFRONT&ELPOBLENOU","CAMPNOU,PEDRALBES&LAZONAALTA",ELRAVAL,GRÀCIA&PARKGÜELL,L'EIXAMPLE,LARAMBLA&BARRIGÒTIC,LARIBERA,"MONTJUÏC,POBLESEC&SANTANTONI"
0,CASTELL DE MONTJUÏC,0,"FORTRESS IN MONTJUÏC, POBLE SEC & SANT ANTONI",FORTRESS,FORTRESS,"MONTJUÏC,POBLESEC&SANTANTONI",0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,0,GALLERY IN EL RAVAL,GALLERY,GALLERY,ELRAVAL,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0


In [65]:
poi_dataset.drop(['CLASSIFICATION','SUBCLASSIFICATION1' ,'SUBCLASSIFICATION2', 'SUBCLASSIFICATION3'], axis=1, inplace=True)

In [66]:
poi_dataset.head()

Unnamed: 0,name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,...,ZOO,BARCELONA,"BARCELONETA,THEWATERFRONT&ELPOBLENOU","CAMPNOU,PEDRALBES&LAZONAALTA",ELRAVAL,GRÀCIA&PARKGÜELL,L'EIXAMPLE,LARAMBLA&BARRIGÒTIC,LARIBERA,"MONTJUÏC,POBLESEC&SANTANTONI"
0,CASTELL DE MONTJUÏC,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,0,0,0,1,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,TELEFÈRIC DE MONTJUÏC,0,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,1
3,LA SAGRADA FAMÍLIA,1,1,0,0,0,0,0,1,0,...,0,0,0,0,0,0,1,0,0,0
4,MUSEU PICASSO,1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,0


In [67]:
#Simulation of a random search of a user

In [68]:
df_user=poi_dataset.sample(2)
df_user

Unnamed: 0,name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,...,ZOO,BARCELONA,"BARCELONETA,THEWATERFRONT&ELPOBLENOU","CAMPNOU,PEDRALBES&LAZONAALTA",ELRAVAL,GRÀCIA&PARKGÜELL,L'EIXAMPLE,LARAMBLA&BARRIGÒTIC,LARIBERA,"MONTJUÏC,POBLESEC&SANTANTONI"
4,MUSEU PICASSO,1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,0
10,FUNDACIÓ JOAN MIRÓ,1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [69]:
#Obtaining the poi name that the user searched
df_user.index.values[0]
user_poi_name=df_user.loc[df_user.index.values[0],'name']

In [70]:
user_poi_name

'MUSEU PICASSO'

In [71]:
#Adding a distance column with the distance of every POI relative to the POI the user searched for
poi_dataset['distance']=distance_matrix[user_poi_name]

In [72]:
poi_dataset.head()

Unnamed: 0,name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,...,BARCELONA,"BARCELONETA,THEWATERFRONT&ELPOBLENOU","CAMPNOU,PEDRALBES&LAZONAALTA",ELRAVAL,GRÀCIA&PARKGÜELL,L'EIXAMPLE,LARAMBLA&BARRIGÒTIC,LARIBERA,"MONTJUÏC,POBLESEC&SANTANTONI",distance
0,CASTELL DE MONTJUÏC,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,1.7163
1,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,0,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0.974
2,TELEFÈRIC DE MONTJUÏC,0,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,1,1.6685
3,LA SAGRADA FAMÍLIA,1,1,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,1.3593
4,MUSEU PICASSO,1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,1,0,0.0


In [73]:
#Creating bins for the distances and adding a categorical variable
bins=[0, 2, 4, 6, 10, 20, 30, 40, 50, 80]
labels=['<2', '<4', '<6', '<10', '<20', '<30', '<40', '<50', '<80']
poi_dataset['distancia']=pd.cut(poi_dataset['distance'], bins, include_lowest=True,labels=labels)
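
By default `pd.cut` builds right-closed intervals, so with these bins a distance of exactly 2 falls under the label `<2`. The short sketch below, on made-up distances, contrasts that default with `right=False`, which makes each label a strict upper bound.

```python
import pandas as pd

distances = pd.Series([0.0, 1.7163, 2.0, 3.5])  # sample values for illustration
bins = [0, 2, 4, 6]
labels = ["<2", "<4", "<6"]

# Default: intervals are (0, 2], (2, 4], (4, 6]; include_lowest also admits 0
binned = pd.cut(distances, bins, include_lowest=True, labels=labels)

# right=False: intervals are [0, 2), [2, 4), [4, 6)
strict = pd.cut(distances, bins, right=False, labels=labels)

print(binned.tolist())
print(strict.tolist())
```

Which convention is preferable depends on how boundary values should be treated; the notebook's own cell uses the default.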

In [74]:
poi_dataset.head()

Unnamed: 0,name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,...,"BARCELONETA,THEWATERFRONT&ELPOBLENOU","CAMPNOU,PEDRALBES&LAZONAALTA",ELRAVAL,GRÀCIA&PARKGÜELL,L'EIXAMPLE,LARAMBLA&BARRIGÒTIC,LARIBERA,"MONTJUÏC,POBLESEC&SANTANTONI",distance,distancia
0,CASTELL DE MONTJUÏC,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,1,1.7163,<2
1,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,0,0,0,1,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0.974,<2
2,TELEFÈRIC DE MONTJUÏC,0,0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,0,1,1.6685,<2
3,LA SAGRADA FAMÍLIA,1,1,0,0,0,0,0,1,0,...,0,0,0,0,1,0,0,0,1.3593,<2
4,MUSEU PICASSO,1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,1,0,0.0,<2


In [75]:
#Dropping unnecessary columns
poi_dataset.drop('distance', axis=1, inplace=True)

#Creating categorical variables for distances
poi_dataset=pd.concat([poi_dataset, pd.get_dummies(poi_dataset['distancia'])], axis=1)

#Dropping unnecessary column
poi_dataset.drop('distancia', axis=1, inplace=True)

In [76]:
poi_dataset.head()

Unnamed: 0,name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,...,"MONTJUÏC,POBLESEC&SANTANTONI",<2,<4,<6,<10,<20,<30,<40,<50,<80
0,CASTELL DE MONTJUÏC,0,0,0,0,0,1,0,0,0,...,1,1,0,0,0,0,0,0,0,0
1,CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,0,0,0,1,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,TELEFÈRIC DE MONTJUÏC,0,0,1,0,0,0,0,0,1,...,1,1,0,0,0,0,0,0,0,0
3,LA SAGRADA FAMÍLIA,1,1,0,0,0,0,0,1,0,...,0,1,0,0,0,0,0,0,0,0
4,MUSEU PICASSO,1,0,0,0,0,1,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [77]:
df_user_profile=poi_dataset[poi_dataset['name'].isin(df_user['name'].tolist())]

In [78]:
df_user_profile=df_user_profile.set_index('name')

In [79]:
df_user_profile

name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,FORTRESS,...,"MONTJUÏC,POBLESEC&SANTANTONI",<2,<4,<6,<10,<20,<30,<40,<50,<80
MUSEU PICASSO,1,0,0,0,0,1,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
FUNDACIÓ JOAN MIRÓ,1,0,0,0,0,1,0,0,0,0,...,1,1,0,0,0,0,0,0,0,0


In [80]:
userProfile=df_user_profile.transpose().sum(axis=1)

In [85]:
userProfile.to_frame().transpose()
userProfile

POPULARITY                              2
Areas of architectural interest         0
Cableway                                0
Exposition Centers                      0
Leisure Spaces                          0
Museums and Collections                 2
ARCHITECTURE                            0
BASILICA                                0
CABLE CAR                               0
FORTRESS                                0
GALLERY                                 1
MONASTERY                               0
MUSEUM                                  1
PALACE                                  0
PARK                                    0
VIEWPOINT                               0
ZOO                                     0
BARCELONA                               0
BARCELONETA,THEWATERFRONT&ELPOBLENOU    0
CAMPNOU,PEDRALBES&LAZONAALTA            0
ELRAVAL                                 0
GRÀCIA&PARKGÜELL                        0
L'EIXAMPLE                              0
LARAMBLA&BARRIGÒTIC               

In [86]:
df_user_profile.index.values[0]

'MUSEU PICASSO'

In [87]:
df_user_profile.columns.values[0]

'POPULARITY'

In [128]:
poi_dataset

name,POPULARITY,Areas of architectural interest,Cableway,Exposition Centers,Leisure Spaces,Museums and Collections,ARCHITECTURE,BASILICA,CABLE CAR,FORTRESS,...,"MONTJUÏC,POBLESEC&SANTANTONI",<2,<4,<6,<10,<20,<30,<40,<50,<80
CASTELL DE MONTJUÏC,0,0,0,0,0,1,0,0,0,1,...,1,1,0,0,0,0,0,0,0,0
CENTRE DE CULTURA CONTEMPORÀNIA DE BARCELONA,0,0,0,1,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
TELEFÈRIC DE MONTJUÏC,0,0,1,0,0,0,0,0,1,0,...,1,1,0,0,0,0,0,0,0,0
LA SAGRADA FAMÍLIA,1,1,0,0,0,0,0,1,0,0,...,0,1,0,0,0,0,0,0,0,0
MUSEU PICASSO,1,0,0,0,0,1,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
MUSEU FREDERIC MARÈS,1,0,0,0,0,1,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
ZOO DE BARCELONA,0,0,0,0,0,1,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
FUNDACIÓ ANTONI TÀPIES,1,0,0,0,0,1,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
PALAU ROBERT,0,0,0,1,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
PALAU GÜELL,1,1,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [92]:
recommendationTable_df = ((poi_dataset*userProfile).sum(axis=1))/(userProfile.sum())

In [93]:
recommendationTable_df = recommendationTable_df.sort_values(ascending=False)
recommendationTable_df.head(2)

name
FUNDACIÓ JOAN MIRÓ    0.8
MUSEU PICASSO         0.8
dtype: float64
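
The score computed above is a weighted feature overlap: each attraction's one-hot features are multiplied by the user profile, summed, and divided by the total profile weight. The toy example below reproduces that calculation on a hypothetical three-item feature table so the arithmetic can be followed by hand.

```python
import pandas as pd

# Hypothetical one-hot feature table indexed by attraction name
features = pd.DataFrame(
    {"MUSEUM": [1, 1, 0], "PARK": [0, 0, 1], "TOP": [1, 0, 1]},
    index=["A", "B", "C"],
)

# The user profile is the column-wise sum of the items the user looked at
viewed = ["A"]
profile = features.loc[viewed].sum(axis=0)

# Score = weighted feature overlap, normalised by the total profile weight
scores = features.mul(profile, axis=1).sum(axis=1) / profile.sum()
print(scores.sort_values(ascending=False))
```

An item the user has already viewed always shares its own features with the profile, which is why the searched attractions appear at the top of the ranking.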

In [94]:
from datetime import date

today = date.today()
print("Today's date:", today)

Today's date: 2021-04-29


In [95]:
today.month

4

In [96]:
rm_df=recommendationTable_df.to_frame()

In [97]:

rm_df=rm_df.reset_index()

In [98]:
rm_df=rm_df.rename(columns={0:'score'})

In [99]:
df=pd.read_excel('POI_barcelona.xlsx',sheet_name='POI base')
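#Assign XP by binning the value in column 3 (March): the lower the value, the higher the XP
xp_bins=[0, 1000, 2000, 5000, 10000, 20000, 50000, 75000, 100000, 200000, 600000]
xp_labels=['1500', '1400', '1250', '1000', '750', '600', '500', '400', '300', '150']
df['XP']=pd.cut(df[3], bins=xp_bins, labels=xp_labels, include_lowest=True)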
df.drop('name', axis=1, inplace=True)
df.rename(columns={'name (upercase)':'name'}, inplace=True)


In [100]:
pd.merge(rm_df, df, on='name').head()

Unnamed: 0,name,score,latitud2,longitud2,part name,tipo1,1994,2000,2011,2014,...,4,5,6,7,8,9,10,11,12,XP
0,FUNDACIÓ JOAN MIRÓ,0.8,41.36826,2.160131,,Museums and Collections,236196,497295,0,489928,...,31623.181735,35699.019792,39913.561233,46596.591328,46600.913351,38087.37701,30619.137311,19384.226542,17625.093783,600
1,MUSEU PICASSO,0.8,41.38515,2.180835,,Museums and Collections,711103,1026549,0,919814,...,89441.238525,100969.110925,112889.284216,131791.192752,131803.416919,107724.206898,86601.455434,54825.262185,49849.829479,500
2,FUNDACIÓ ANTONI TÀPIES,0.7,41.39152,2.163757,,Museums and Collections,55338,79783,0,66058,...,4121.291589,4652.475239,5201.73541,6072.701398,6073.264665,4963.737926,3990.439486,2526.249587,2296.990587,1250
3,MUSEU FREDERIC MARÈS,0.7,41.384228,2.17677,,Museums and Collections,26682,23470,0,38811,...,4644.06875,5242.631928,5861.564595,6843.010785,6843.645502,5593.377632,4496.618333,2846.699029,2588.359007,1250
4,MUSEU DE LA XOCOLATA,0.6,41.388276,2.182018,,Museums and Collections,0,0,0,136384,...,11500.520052,12982.795229,14515.513181,16945.955578,16947.527384,13851.377977,11135.375484,7049.533722,6409.783372,1000


In [114]:
final_rm=pd.merge(rm_df, df, on='name')
final_rm.head()

Unnamed: 0,name,score,latitud2,longitud2,part name,tipo1,1994,2000,2011,2014,...,4,5,6,7,8,9,10,11,12,XP
0,FUNDACIÓ JOAN MIRÓ,0.8,41.36826,2.160131,,Museums and Collections,236196,497295,0,489928,...,31623.181735,35699.019792,39913.561233,46596.591328,46600.913351,38087.37701,30619.137311,19384.226542,17625.093783,600
1,MUSEU PICASSO,0.8,41.38515,2.180835,,Museums and Collections,711103,1026549,0,919814,...,89441.238525,100969.110925,112889.284216,131791.192752,131803.416919,107724.206898,86601.455434,54825.262185,49849.829479,500
2,FUNDACIÓ ANTONI TÀPIES,0.7,41.39152,2.163757,,Museums and Collections,55338,79783,0,66058,...,4121.291589,4652.475239,5201.73541,6072.701398,6073.264665,4963.737926,3990.439486,2526.249587,2296.990587,1250
3,MUSEU FREDERIC MARÈS,0.7,41.384228,2.17677,,Museums and Collections,26682,23470,0,38811,...,4644.06875,5242.631928,5861.564595,6843.010785,6843.645502,5593.377632,4496.618333,2846.699029,2588.359007,1250
4,MUSEU DE LA XOCOLATA,0.6,41.388276,2.182018,,Museums and Collections,0,0,0,136384,...,11500.520052,12982.795229,14515.513181,16945.955578,16947.527384,13851.377977,11135.375484,7049.533722,6409.783372,1000


In [122]:
final_rm=pd.merge(rm_df, df, on='name')

final_rm=final_rm.head(10)

# use the current month's column instead of a hard-coded month number
month=today.month
final_rm=final_rm[['name', 'score', 'XP', month]].sort_values(month, ascending=False)
final_rm.rename(columns={month:'tourist pressure'}, inplace=True)
final_rm

Unnamed: 0,name,score,XP,tourist pressure
1,MUSEU PICASSO,0.8,500,89441.238525
9,CASTELL DE MONTJUÏC,0.5,500,71440.552584
8,ZOO DE BARCELONA,0.5,600,67650.800755
0,FUNDACIÓ JOAN MIRÓ,0.8,600,31623.181735
6,MUSEU DEL DISSENY DE BARCELONA,0.5,750,15884.225275
4,MUSEU DE LA XOCOLATA,0.6,1000,11500.520052
3,MUSEU FREDERIC MARÈS,0.7,1250,4644.06875
2,FUNDACIÓ ANTONI TÀPIES,0.7,1250,4121.291589
5,MUSEU D'ARQUEOLOGIA DE CATALUNYA,0.6,1250,3240.785278
7,MUSEU DEL PERFUM,0.5,1500,250.343993


In [123]:
final_rm.sort_values(['tourist pressure'], ascending=True, inplace=True)
final_rm['score']=final_rm['score'].round(2)
final_rm=final_rm.drop('tourist pressure', axis=1)
recommendations=final_rm.copy()

In [124]:
recommendations=recommendations.reset_index(drop=True)
recommendations

Unnamed: 0,name,score,XP
0,MUSEU DEL PERFUM,0.5,1500
1,MUSEU D'ARQUEOLOGIA DE CATALUNYA,0.6,1250
2,FUNDACIÓ ANTONI TÀPIES,0.7,1250
3,MUSEU FREDERIC MARÈS,0.7,1250
4,MUSEU DE LA XOCOLATA,0.6,1000
5,MUSEU DEL DISSENY DE BARCELONA,0.5,750
6,FUNDACIÓ JOAN MIRÓ,0.8,600
7,ZOO DE BARCELONA,0.5,600
8,CASTELL DE MONTJUÏC,0.5,500
9,MUSEU PICASSO,0.8,500
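
The ranking above (pick one month's column, call it tourist pressure, sort so the least crowded places come first) can be wrapped in a small helper. This is a minimal sketch on a hypothetical table; `rank_by_pressure` and the `toy` frame are illustrative names, and the integer column names mirror the month columns of the merged table.

```python
import pandas as pd

# Hypothetical mini-table; integer column names stand for months
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "score": [0.8, 0.5, 0.7],
    4: [89441.2, 250.3, 4121.3],
    5: [100969.1, 300.0, 4652.5],
})

def rank_by_pressure(frame, month):
    """Return name, score and pressure for one month, least crowded first."""
    out = frame[["name", "score", month]].rename(columns={month: "tourist pressure"})
    return out.sort_values("tourist pressure", ascending=True).reset_index(drop=True)

ranked = rank_by_pressure(toy, 4)
print(ranked)
```

Passing `datetime.date.today().month` as `month` would select the column for the current month.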


In [2095]:
pip install dijkstra

Collecting dijkstra
  Downloading dijkstra-0.2.1-py3-none-any.whl (5.5 kB)
Installing collected packages: dijkstra
Successfully installed dijkstra-0.2.1
Note: you may need to restart the kernel to use updated packages.


In [2146]:
# take the first four recommended places as the itinerary
recommended_itinerary=recommendations['name'].head(4).tolist()

In [2147]:
recommended_itinerary

['FUNDACIÓ ANTONI TÀPIES', 'LA CAPELLA', 'CASA VICENS', 'PALAU GÜELL']

In [2105]:
import dijkstra

In [2148]:
graph={recommended_itinerary[0]:{recommended_itinerary[1]:6, recommended_itinerary[2]:2}, 
      recommended_itinerary[1]:{recommended_itinerary[0]:6,recommended_itinerary[2]:2, recommended_itinerary[3]:1},
       recommended_itinerary[2]:{recommended_itinerary[1]:2, recommended_itinerary[3]:7},
      recommended_itinerary[3]:{recommended_itinerary[1]:1, recommended_itinerary[2]:7}}
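
The edge weights in the graph above are entered by hand. One possible alternative, sketched here as an assumption rather than the notebook's method, is to derive the weights from great-circle distances between the coordinates in the merged table. `haversine_km` and `build_graph` are hypothetical helpers; the coordinates are the `latitud2`/`longitud2` values shown earlier.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Coordinates taken from the merged table shown earlier
coords = {
    "FUNDACIÓ JOAN MIRÓ": (41.36826, 2.160131),
    "MUSEU PICASSO": (41.38515, 2.180835),
    "FUNDACIÓ ANTONI TÀPIES": (41.39152, 2.163757),
    "MUSEU FREDERIC MARÈS": (41.384228, 2.17677),
}

def build_graph(points):
    """Fully connected graph {node: {neighbor: distance in km}}."""
    return {
        a: {b: haversine_km(*pa, *pb) for b, pb in points.items() if b != a}
        for a, pa in points.items()
    }

distance_graph = build_graph(coords)
print(round(distance_graph["MUSEU PICASSO"]["MUSEU FREDERIC MARÈS"], 2))
```

Straight-line distance ignores streets and walking time, so these weights are only a rough proxy for the effort of moving between two places.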


In [2149]:
def dijkstra(graph, src, dest, visited=None, distances=None, predecessors=None):
    """Print the shortest path from src to dest in a weighted graph.

    graph maps each node to a dict {neighbor: edge cost}. The visited,
    distances and predecessors arguments carry state between recursive calls.
    """
    # create fresh containers on the initial call; mutable default arguments
    # would be shared between separate calls and leak state
    if visited is None:
        visited = []
    if distances is None:
        distances = {}
    if predecessors is None:
        predecessors = {}
    # a few sanity checks
    if src not in graph:
        raise TypeError('The root of the shortest path tree cannot be found')
    if dest not in graph:
        raise TypeError('The target of the shortest path cannot be found')
    # ending condition
    if src == dest:
        # build the shortest path by walking the predecessors back from dest
        path = []
        pred = dest
        while pred is not None:
            path.append(pred)
            pred = predecessors.get(pred, None)
        # join the nodes from source to destination for display
        readable = path[0]
        for index in range(1, len(path)):
            readable = path[index] + '--->' + readable
        print('shortest path - array: ' + str(path))
        print("path: " + readable + ",   cost=" + str(distances.get(dest, 0)))
    else:
        # on the initial run, initialize the cost of the source
        if not visited:
            distances[src] = 0
        # relax the edges to the unvisited neighbors
        for neighbor in graph[src]:
            if neighbor not in visited:
                new_distance = distances[src] + graph[src][neighbor]
                if new_distance < distances.get(neighbor, float('inf')):
                    distances[neighbor] = new_distance
                    predecessors[neighbor] = src
        # mark as visited
        visited.append(src)
        # select the unvisited node with the lowest distance and run
        # Dijkstra again from it
        unvisited = {}
        for k in graph:
            if k not in visited:
                unvisited[k] = distances.get(k, float('inf'))
        x = min(unvisited, key=unvisited.get)
        dijkstra(graph, x, dest, visited, distances, predecessors)
        


if __name__ == "__main__":
    # shortest route from the first to the last stop of the itinerary
    dijkstra(graph, recommended_itinerary[0], recommended_itinerary[3])

shortest path - array: ['PALAU GÜELL', 'LA CAPELLA', 'CASA VICENS', 'FUNDACIÓ ANTONI TÀPIES']
path: FUNDACIÓ ANTONI TÀPIES--->CASA VICENS--->LA CAPELLA--->PALAU GÜELL,   cost=5
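
The recursive function above prints its result. As a sketch of an alternative (not the notebook's implementation), the same search can be written iteratively with a priority queue so that it returns the path and cost for later use. `shortest_path` and `toy_graph` are hypothetical names; the toy graph uses the same edge weights as the itinerary graph, with short node labels.

```python
import heapq

def shortest_path(graph, src, dest):
    """Return (path, cost) from src to dest, or (None, inf) if unreachable."""
    if src not in graph or dest not in graph:
        raise KeyError("src and dest must both be nodes of the graph")
    distances = {src: 0}
    predecessors = {}
    visited = set()
    heap = [(0, src)]  # (distance so far, node)
    while heap:
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == dest:
            break
        for neighbor, weight in graph[node].items():
            new_distance = dist + weight
            if new_distance < distances.get(neighbor, float("inf")):
                distances[neighbor] = new_distance
                predecessors[neighbor] = node
                heapq.heappush(heap, (new_distance, neighbor))
    if dest not in distances:
        return None, float("inf")
    # walk the predecessors back from dest, then reverse
    path = [dest]
    while path[-1] != src:
        path.append(predecessors[path[-1]])
    path.reverse()
    return path, distances[dest]

# Same weights as the itinerary graph, with short labels
toy_graph = {
    "A": {"B": 6, "C": 2},
    "B": {"A": 6, "C": 2, "D": 1},
    "C": {"B": 2, "D": 7},
    "D": {"B": 1, "C": 7},
}
path, cost = shortest_path(toy_graph, "A", "D")
print(path, cost)
```

Returning the result instead of printing it makes it straightforward to feed the ordered stops into a later step, such as plotting the route.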


In [3]:
df=pd.read_json('google_bcn_poi_data.json')

In [13]:
df[['title']]

Unnamed: 0,title
0,Montjuïc Castle
1,CosmoCaixa Barcelona
2,Museu Nacional d'Art de Catalunya
3,Centre de Cultura Contemporània de Barcelona
4,Parc del Laberint d'Horta
...,...
118,Espai Natural de les Guilleries-Savassona
119,Pantà de Foix
120,Parc de la Serralada Litoral
121,Museu de Ciències Naturals de Barcelona


In [33]:
df.columns

Index(['title', 'subTitle', 'categoryName', 'address', 'locatedIn', 'plusCode',
       'website', 'phone', 'temporarilyClosed', 'permanentlyClosed',
       'totalScore', 'placeId', 'url', 'searchString', 'location', 'scrapedAt',
       'popularTimesLiveText', 'popularTimesLivePercent',
       'popularTimesHistogram', 'reviewsCount', 'reviewsDistribution',
       'reviews', 'imageUrls'],
      dtype='object')

In [55]:
df['popularTimesHistogram'][0]

{'Su': [{'hour': 6, 'occupancyPercent': 0},
  {'hour': 7, 'occupancyPercent': 0},
  {'hour': 8, 'occupancyPercent': 0},
  {'hour': 9, 'occupancyPercent': 0},
  {'hour': 10, 'occupancyPercent': 28},
  {'hour': 11, 'occupancyPercent': 68},
  {'hour': 12, 'occupancyPercent': 100},
  {'hour': 13, 'occupancyPercent': 89},
  {'hour': 14, 'occupancyPercent': 58},
  {'hour': 15, 'occupancyPercent': 51},
  {'hour': 16, 'occupancyPercent': 68},
  {'hour': 17, 'occupancyPercent': 77},
  {'hour': 18, 'occupancyPercent': 60},
  {'hour': 19, 'occupancyPercent': 32},
  {'hour': 20, 'occupancyPercent': 0},
  {'hour': 21, 'occupancyPercent': 0},
  {'hour': 22, 'occupancyPercent': 0},
  {'hour': 23, 'occupancyPercent': 0}],
 'Mo': [{'hour': 6, 'occupancyPercent': 0},
  {'hour': 7, 'occupancyPercent': 0},
  {'hour': 8, 'occupancyPercent': 0},
  {'hour': 9, 'occupancyPercent': 0},
  {'hour': 10, 'occupancyPercent': 14},
  {'hour': 11, 'occupancyPercent': 19},
  {'hour': 12, 'occupancyPercent': 21},
  {'ho
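
The nested histogram is easier to analyse once flattened into one row per day and hour. The sketch below assumes the structure shown in the output above and uses only the Sunday values, which are copied from that output; `histogram_to_frame` is a hypothetical helper.

```python
import pandas as pd

# Sunday occupancy for hours 6 to 23, copied from the output above
sunday = [0, 0, 0, 0, 28, 68, 100, 89, 58, 51, 68, 77, 60, 32, 0, 0, 0, 0]
histogram = {
    "Su": [{"hour": h, "occupancyPercent": p} for h, p in zip(range(6, 24), sunday)]
}

def histogram_to_frame(hist):
    """Flatten {day: [{'hour': ..., 'occupancyPercent': ...}, ...]} into rows."""
    rows = [
        {"day": day, **entry}
        for day, entries in hist.items()
        for entry in entries
    ]
    return pd.DataFrame(rows, columns=["day", "hour", "occupancyPercent"])

frame = histogram_to_frame(histogram)

# Busiest hour in the flattened table
peak = frame.loc[frame["occupancyPercent"].idxmax()]
print(peak["day"], peak["hour"], peak["occupancyPercent"])
```

With all seven days present, `frame.groupby("day")["occupancyPercent"].idxmax()` would give the busiest hour per day.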

In [53]:
df[df['popularTimesHistogram'].isnull()]

Unnamed: 0,title,subTitle,categoryName,address,locatedIn,plusCode,website,phone,temporarilyClosed,permanentlyClosed,...,searchString,location,scrapedAt,popularTimesLiveText,popularTimesLivePercent,popularTimesHistogram,reviewsCount,reviewsDistribution,reviews,imageUrls
7,La Sagrada Familia,,Basilica,"Carrer de Mallorca, 401, 08013 Barcelona, Spain",,"C53F+FP Barcelona, Spain",sagradafamilia.org,+34 932 08 04 14,True,False,...,,"{'lat': 41.4036299, 'lng': 2.1743558}",2021-04-30T13:26:04.823Z,,,,154790,"{'oneStar': 1727, 'twoStar': 1387, 'threeStar'...",[],[https://lh5.googleusercontent.com/p/AF1QipPXz...
12,Tibidabo Amusement Park,,Amusement park,"Plaça del Tibidabo, 3, 4, 08035 Barcelona, Spain",,"C4C9+JJ Barcelona, Spain",tibidabo.cat,+34 932 11 79 42,True,False,...,,"{'lat': 41.4215056, 'lng': 2.1190917}",2021-04-30T13:26:47.676Z,,,,29062,"{'oneStar': 684, 'twoStar': 708, 'threeStar': ...",[],[https://lh5.googleusercontent.com/p/AF1QipMw5...
20,Poble Espanyol,,Museum,"Av. Francesc Ferrer i Guàrdia, 13, 08038 Barce...",,"949X+G8 Barcelona, Spain",poble-espanyol.com,+34 935 08 63 00,False,False,...,,"{'lat': 41.3688085, 'lng': 2.1482732}",2021-04-30T13:28:47.873Z,,,,21091,"{'oneStar': 863, 'twoStar': 820, 'threeStar': ...",[],[https://lh5.googleusercontent.com/p/AF1QipOL3...
23,Museu de Cera de Barcelona,,Wax museum,"Passatge de la Banca, 7, 08002 Barcelona, Spain",,"95GG+RQ Barcelona, Spain",museocerabcn.com,+34 933 17 26 49,False,False,...,,"{'lat': 41.3771152, 'lng': 2.1769948}",2021-04-30T13:29:25.812Z,,,,2788,"{'oneStar': 170, 'twoStar': 212, 'threeStar': ...",[],[https://lh5.googleusercontent.com/p/AF1QipPDm...
24,Museu d'Autòmats,,Museum,"Parc d'Atraccions del Tibidabo, 3-4, 08035 Bar...",Tibidabo Amusement Park,"C4C9+JV Barcelona, Spain",tibidabo.cat,+34 932 11 79 42,True,False,...,,"{'lat': 41.4215107, 'lng': 2.1196355}",2021-04-30T13:29:36.141Z,,,,49,"{'oneStar': 5, 'twoStar': 1, 'threeStar': 4, '...",[],[https://lh5.googleusercontent.com/p/AF1QipNYs...
25,LAS GOLONDRINAS / Port de Barcelona,,Tourist attraction,"Moll de les Drassanes, s/n, 08039 Barcelona, S...",,"95GH+4G Barcelona, Spain",lasgolondrinas.com,+34 934 42 31 06,False,False,...,,"{'lat': 41.3753031, 'lng': 2.1788549}",2021-04-30T13:29:50.252Z,,,,725,"{'oneStar': 31, 'twoStar': 33, 'threeStar': 10...",[],[https://lh5.googleusercontent.com/p/AF1QipOcy...
28,Fundació Vila Casas,,Foundation,"Carrer d'Ausiàs Marc, 20, 08010 Barcelona, Spain",,"95RG+84 Barcelona, Spain",fundaciovilacasas.com,+34 934 81 79 80,False,False,...,,"{'lat': 41.3908126, 'lng': 2.1752295999999998}",2021-04-30T13:30:24.004Z,,,,182,"{'oneStar': 3, 'twoStar': 1, 'threeStar': 9, '...",[],[https://lh5.googleusercontent.com/p/AF1QipO4s...
29,Fundació Suñol,,Modern art museum,"Carrer de Mejía Lequerica, 14, 08028 Barcelona...",,"94MG+3J Barcelona, Spain",fundaciosunol.org,+34 934 96 10 32,False,False,...,,"{'lat': 41.382647, 'lng': 2.1264088}",2021-04-30T13:30:35.134Z,,,,81,"{'oneStar': 6, 'twoStar': 0, 'threeStar': 12, ...",[],[https://lh5.googleusercontent.com/p/AF1QipNqE...
31,Torre de Collserola,,Tower,"Ctra. de Vallvidrera al Tibidabo, S/N, 08017 B...",,"C487+WP Barcelona, Spain",torredecollserola.com,+34 934 06 93 54,True,False,...,,"{'lat': 41.4172581, 'lng': 2.1142762}",2021-04-30T13:31:06.617Z,,,,391,"{'oneStar': 8, 'twoStar': 8, 'threeStar': 30, ...",[],[https://lh5.googleusercontent.com/p/AF1QipPVT...
33,Tibidabo Funicular,,Tourist attraction,"Plaça del Doctor Andreu, s/n, 08035 Barcelona,...",,"C48J+HF Barcelona, Spain",,+34 932 11 79 42,True,False,...,,"{'lat': 41.4163981, 'lng': 2.131217}",2021-04-30T13:32:05.083Z,,,,3958,"{'oneStar': 343, 'twoStar': 164, 'threeStar': ...",[],[https://lh5.googleusercontent.com/p/AF1QipNmP...
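
Several of the places without a histogram are flagged as temporarily closed, so it may help to separate the rows that can be used from those that cannot. This is a minimal sketch on a hypothetical three-row frame; the `temporarilyClosed` flag for Montjuïc Castle and the placeholder histogram are assumptions made for the example.

```python
import pandas as pd

# Hypothetical subset; the first row's flag and histogram are placeholders
toy = pd.DataFrame({
    "title": ["Montjuïc Castle", "La Sagrada Familia", "Poble Espanyol"],
    "temporarilyClosed": [False, True, False],
    "popularTimesHistogram": [{"Su": []}, None, None],
})

# Rows with no popular-times data
missing = toy[toy["popularTimesHistogram"].isna()]

# Share of the missing rows that are flagged as temporarily closed
closed_share = missing["temporarilyClosed"].mean()

# Rows that can be used for occupancy analysis
usable = toy.dropna(subset=["popularTimesHistogram"])

print(closed_share, usable["title"].tolist())
```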


In [34]:
df[df['title'] == 'Montjuïc Castle'].popularTimesHistogram[0]

{'Su': [{'hour': 6, 'occupancyPercent': 0},
  {'hour': 7, 'occupancyPercent': 0},
  {'hour': 8, 'occupancyPercent': 0},
  {'hour': 9, 'occupancyPercent': 0},
  {'hour': 10, 'occupancyPercent': 28},
  {'hour': 11, 'occupancyPercent': 68},
  {'hour': 12, 'occupancyPercent': 100},
  {'hour': 13, 'occupancyPercent': 89},
  {'hour': 14, 'occupancyPercent': 58},
  {'hour': 15, 'occupancyPercent': 51},
  {'hour': 16, 'occupancyPercent': 68},
  {'hour': 17, 'occupancyPercent': 77},
  {'hour': 18, 'occupancyPercent': 60},
  {'hour': 19, 'occupancyPercent': 32},
  {'hour': 20, 'occupancyPercent': 0},
  {'hour': 21, 'occupancyPercent': 0},
  {'hour': 22, 'occupancyPercent': 0},
  {'hour': 23, 'occupancyPercent': 0}],
 'Mo': [{'hour': 6, 'occupancyPercent': 0},
  {'hour': 7, 'occupancyPercent': 0},
  {'hour': 8, 'occupancyPercent': 0},
  {'hour': 9, 'occupancyPercent': 0},
  {'hour': 10, 'occupancyPercent': 14},
  {'hour': 11, 'occupancyPercent': 19},
  {'hour': 12, 'occupancyPercent': 21},
  {'ho