# What can affect how you rate a beer?

When we taste a beer and rate it, we might think that our rating is based solely on how much we liked or disliked the beer. However, we may not be completely rational when it comes to our most liked or disliked beers and breweries: we might rate them the way we do because of underlying intrinsic and extrinsic factors that we are not consciously aware of.
For instance, you might really like Pale Ale because it was the first beer you tried in college in the US, but if you had gone to college in Germany, your palate might have been more accustomed to Lager and you would have rated it higher than Pale Ale style beers.
With this project, we wanted to dig deeper into the ratings and reviews given by users to highlight factors that might create a conscious or subconscious bias in how you rate a beer. The list of such biases is practically endless, so we decided to focus on a few main biases that could affect how you form your opinion, and explored how they do so. <br>
So without further ado, here's a data exploration of how your senses, your location and the political events your country has witnessed can shape the way you rate a beer.

## Table of Contents
- [Imports and paths](#imports)
- [Data extraction](#dataextraction)
    - [BeerAdvocate data](#beeradvocate)
    - [RateBeer data](#ratebeer)
    - [Matched data](#matched_data)
    - [All reviews](#all_reviews)
- [Data description](#datadescription)
    - [Beers](#beer_data)
    - [Breweries](#brewery_data)
    - [Users](#users)
    - [Reviews](#reviews)
- [Data pre-processing](#data_preprocessing)
    - [Users](#users_processing)
    - [Breweries](#breweries_processing)
    - [Reviews](#reviews_processing)
- [Data exploration](#data_exploration)
- [Sensorial evaluation](#sensory)
- [Location bias](#location)
- [Political/Economic climate effect](#political)

## Imports and paths<a class="anchor" id="imports"></a>

In [1]:
#reading data
import gzip #to read gzip files

#manipulating data 
import pandas as pd
import numpy as np
import datetime

#plotting
import matplotlib.pyplot as plt

#serializing data
import pickle

#statistics
from scipy.stats import linregress

#for machine learning algorithms 
import torch

#set the seed for reproducibility 
np.random.seed(4)

In [2]:
PATH = '../'
DATA_PATH = './data/' 
DEFAULT_ENCODING = 'UTF8'
DEFAULT_COMPRESSION = 'gzip'

## Data extraction <a class="anchor" id="dataextraction"></a>

We start by extracting the data for beers, breweries and users from the two rating websites BeerAdvocate and RateBeer as well as the matched data. <br>
We view the first few rows of each dataframe to get a first glance at what the data contains before pre-processing it:

### BeerAdvocate data <a class="anchor" id="beeradvocate"></a>

In [3]:
BA_beers = pd.read_csv(PATH+'BeerAdvocate/beers.csv', index_col='beer_id')
BA_beers.head(1)

| beer_id | beer_name | brewery_id | brewery_name | style | nbr_ratings | nbr_reviews | avg | ba_score | bros_score | abv | avg_computed | zscore | nbr_matched_valid_ratings | avg_matched_valid_ratings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 166064 | Nashe Moskovskoe | 39912 | Abdysh-Ata (Абдыш Ата) | Euro Pale Lager | 0 | 0 | NaN | NaN | NaN | 4.7 | NaN | NaN | 0 | NaN |


In [4]:
BA_breweries = pd.read_csv(PATH+'BeerAdvocate/breweries.csv', index_col='id')
BA_breweries.head(1)

| id | location | name | nbr_beers |
|---|---|---|---|
| 39912 | Kyrgyzstan | Abdysh-Ata (Абдыш Ата) | 5 |


In [5]:
BA_users = pd.read_csv(PATH+'BeerAdvocate/users.csv', index_col= 'user_id')
BA_users.head(1)

| user_id | nbr_ratings | nbr_reviews | user_name | joined | location |
|---|---|---|---|---|---|
| nmann08.184925 | 7820 | 465 | nmann08 | 1199704000.0 | United States, Washington |


The BeerAdvocate text reviews come as text files in a custom format. We extracted them in the notebook 'TransformTextfileToCsv.ipynb' ([here](TransformTextfileToCsv.ipynb)) and saved them as csv files, which we load below to take a look at:

In [6]:
#open the csv file
df_BA_reviews = pd.read_csv('../DataframeStorage/df_BA_reviews.csv')
df_BA_reviews.head(1)


### RateBeer data <a class="anchor" id="ratebeer"></a>

In [7]:
RB_beers = pd.read_csv(PATH+'RateBeer/beers.csv', index_col='beer_id')
RB_beers.head(1)

| beer_id | beer_name | brewery_id | brewery_name | style | nbr_ratings | overall_score | style_score | avg | abv | avg_computed | zscore | nbr_matched_valid_ratings | avg_matched_valid_ratings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 410549 | 33 Export (Gabon) | 3198 | Sobraga | Pale Lager | 1 | NaN | NaN | 2.72 | 5.0 | 2.0 | NaN | 0 | NaN |


In [8]:
RB_breweries = pd.read_csv(PATH+'RateBeer/breweries.csv', index_col= 'id')
RB_breweries.head(1)


| id | location | name | nbr_beers |
|---|---|---|---|
| 3198 | Gabon | Sobraga | 3 |


In [9]:
RB_users = pd.read_csv(PATH+'RateBeer/users.csv', index_col= 'user_id')
RB_users.head(1)

| user_id | nbr_ratings | user_name | joined | location |
|---|---|---|---|---|
| 175852 | 1890 | Manslow | 1337508000.0 | Poland |


The RateBeer text reviews come as text files in the same kind of custom format. We extracted them in the notebook 'TransformTextfileToCsv.ipynb' ([here](TransformTextfileToCsv.ipynb)) and saved them as csv files:

In [29]:
#open the csv file
df_RB_reviews = pd.read_csv('../DataframeStorage/df_RB_reviews.csv')
df_RB_reviews.head(1)

|  | beer_name | beer_id | brewery_name | brewery_id | style | abv | date | user_name | user_id | appearance | aroma | palate | taste | overall | rating | text | brewery_location | brewery_merged_location | user_location | user_merged_location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33 Export (Gabon) | 410549 | Sobraga | 3198 | Pale Lager | 5.0 | 2016-04-26 10:00:00 | Manslow | 175852 | 2 | 4 | 2 | 4 | 8 | 2.0 | Puszka 0,33l dzięki Christoph . Kolor jasnozło... | Gabon | Gabon | Poland | Poland |
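
The extraction notebook is not reproduced here. As a rough sketch of what such a conversion could look like, assume (this layout is an assumption, it is not shown in this notebook) that each review in the text file is a block of `key: value` lines and that blocks are separated by blank lines. The function name `parse_review_blocks` and the sample values are illustrative only:

```python
import io


def parse_review_blocks(lines):
    """Yield one dict per review from an iterable of text lines.

    Assumed (hypothetical) layout: 'key: value' lines, with a blank line
    between two consecutive reviews.
    """
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current review, if any.
            if record:
                yield record
                record = {}
            continue
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            record[key] = value
    # Emit the last review when the file does not end with a blank line.
    if record:
        yield record


# Toy input standing in for an opened text file.
sample = io.StringIO(
    "beer_name: 33 Export (Gabon)\n"
    "beer_id: 410549\n"
    "rating: 2.0\n"
    "\n"
    "beer_name: Legbiter\n"
    "beer_id: 19827\n"
)
reviews = list(parse_review_blocks(sample))
```

Passing such a generator to `pd.DataFrame(...)` would give one row per review, and the file could be opened with `gzip.open(path, 'rt')` if it is compressed.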


### Matched data <a class="anchor" id="matched_data"></a>

In [23]:
matched_beers = pd.read_csv(PATH+'matched_beer_data/beers.csv', header=1)
matched_beers.head(1)

|  | abv | avg | avg_computed | avg_matched_valid_ratings | ba_score | beer_id | beer_name | beer_wout_brewery_name | brewery_id | brewery_name | ... | brewery_id.1 | brewery_name.1 | nbr_matched_valid_ratings.1 | nbr_ratings.1 | overall_score | style.1 | style_score | zscore.1 | diff | sim |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.8 | 3.45 | 3.439867 | 3.504068 | 80.0 | 19827 | Legbiter | Legbiter | 10093 | Strangford Lough Brewing Company Ltd | ... | 4959 | Strangford Lough | 89 | 89 | 23.0 | Golden Ale/Blond Ale | 27.0 | -0.698304 | 1.0 | 1.0 |


In [24]:
matched_breweries = pd.read_csv(PATH+'matched_beer_data/breweries.csv', header=1)
matched_breweries.head(1)

|  | id | location | name | nbr_beers | id.1 | location.1 | name.1 | nbr_beers.1 | diff | sim |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10093 | Northern Ireland | Strangford Lough Brewing Company Ltd | 5 | 4959 | Northern Ireland | Strangford Lough | 5 | 0.431275 | 0.889062 |


In [33]:
matched_users = pd.read_csv(PATH+'matched_beer_data/users.csv', header=1)
matched_users.head(1)

|  | joined | location | nbr_ratings | nbr_reviews | user_id | user_name | user_name_lower | joined.1 | location.1 | nbr_ratings.1 | user_id.1 | user_name.1 | user_name_lower.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1220868000.0 | Germany | 6 | 6 | erzengel.248045 | Erzengel | erzengel | 1224324000.0 | Germany | 8781 | 83106 | Erzengel | erzengel |


In [40]:
matched_ratings = pd.read_csv(PATH+'matched_beer_data/ratings.csv', encoding = "ISO-8859-1", header=1)
matched_ratings.head(1)

|  | abv | appearance | aroma | beer_id | beer_name | brewery_id | brewery_name | date | overall | palate | ... | brewery_name.1 | date.1 | overall.1 | palate.1 | rating.1 | style.1 | taste.1 | text.1 | user_id.1 | user_name.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.3 | 4.5 | 4.5 | 645 | Trappistes Rochefort 10 | 207 | Brasserie de Rochefort | 1324810800 | 5.0 | 4.5 | ... | Brasserie Rochefort | 1387710000 | 19.0 | 4.0 | 4.6 | Abt/Quadrupel | 9.0 | a) Geruch malzig-schwer-süß. Riecht sc... | 83106 | Erzengel |


### All reviews <a class="anchor" id="all_reviews"></a>

In [10]:
df_reviews_all = pd.read_csv(PATH+'/DataframeStorage/df_reviews_all.csv')



## Data description  <a class="anchor" id="datadescription"></a>

### Beers  <a class="anchor" id="beer_data"></a>

Both rating websites use a similar structure for their beer datasets.

The variables in the BeerAdvocate beer data are:
- *beer_id*(we set it as index) and *beer_name* : the id and name of the beer (which are also used in the reviews)
- *brewery_id* : the id of the brewery (which is also used in the reviews)
- *nbr_ratings* and *nbr_reviews*: the number of ratings and reviews a beer received
- *avg* : the average rating of a beer (from 0 to 5)
- *ba_score* : the beer's overall score based on its ranking within its style category. It's based on the beer's truncated (trimmed) mean and a custom Bayesian (weighted rank) formula that takes the beer's style into consideration. Its purpose is to provide consumers with a quick reference using a format that's familiar to the wine and liquor worlds.<br> 
The score is out of 100 with the following range: <br>
95-100 = world-class <br>
90-94 = outstanding <br>
85-89 = very good <br>
80-84 = good <br>
70-79 = okay <br>
60-69 = poor <br>
< 60 = awful <br>
- *bros_score* : the score given by the two brothers running the BeerAdvocate website
- *abv* : is the alcohol content of the beer (%)
- *avg_computed* : 
- *zscore* : 
- *nbr_matched_valid_ratings* : 
- *avg_matched_valid_ratings* : <br>
reference [here](https://www.beeradvocate.com/community/threads/beeradvocate-ratings-explained.184726/)
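
The score ranges listed above can be turned into a small lookup, for example to label beers in a plot. This is only a sketch: the function name `ba_score_label` is hypothetical, and treating each band as "greater than or equal to its lower bound" (so that a non-integer score such as 94.5 falls into the lower band) is an assumption.

```python
def ba_score_label(score):
    """Map a BeerAdvocate score (out of 100) to the label listed above.

    Returns None for missing scores (None or NaN).
    """
    # NaN is the only value that is not equal to itself.
    if score is None or score != score:
        return None
    bands = [
        (95, "world-class"),
        (90, "outstanding"),
        (85, "very good"),
        (80, "good"),
        (70, "okay"),
        (60, "poor"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "awful"


label = ba_score_label(80.0)
```

With pandas, `BA_beers['ba_score'].map(ba_score_label)` would apply the same lookup to the whole column.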

The RateBeer beer dataframe contains all the variables listed above except for *nbr_reviews*, *ba_score* and *bros_score* (which are specific to BeerAdvocate). Instead, the RateBeer website has two different scores: <br>
- *overall_score* : a score (out of 100) that reflects the ratings given by RateBeer users and how this beer compares to all other beers on RateBeer.
- *style_score* : a score that ranks this beer against all beers within its own style category.
<br>
Those two scores are calculated only from ratings that are accompanied by a written review of 75 or more characters. A rating doesn't count toward the final score if the rater has left fewer than 10 ratings, if it is deemed unauthentic, derogatory or abusive, or if it was made by a brewer or brewer affiliate. <br>
reference [here](https://www.ratebeer.com/our-scores)
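
The eligibility rules described above can be written as a simple predicate. This is a sketch of the stated rules only, not RateBeer's actual implementation; the function and argument names are hypothetical, and the two boolean flags stand in for moderation decisions that are not available in our data.

```python
MIN_REVIEW_CHARS = 75   # minimum review length stated above
MIN_RATER_RATINGS = 10  # minimum number of ratings left by the rater


def counts_toward_score(review_text, rater_nbr_ratings,
                        flagged=False, brewer_affiliated=False):
    """Return True if a rating would count toward the RateBeer scores.

    flagged: rating deemed unauthentic, derogatory or abusive.
    brewer_affiliated: rating made by a brewer or brewer affiliate.
    """
    has_long_review = len(review_text or "") >= MIN_REVIEW_CHARS
    experienced_rater = rater_nbr_ratings >= MIN_RATER_RATINGS
    return (has_long_review and experienced_rater
            and not flagged and not brewer_affiliated)


eligible = counts_toward_score("x" * 75, rater_nbr_ratings=1890)
```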

We also have a dataset of matched data points available. <br>
This is useful if we want to look at a certain effect and compare it across both rating websites. <br>
The matching was done in a project that investigated the herding effect, in particular how the first rating influences the final score of a beer. <br>
The rationale behind matching is to avoid biases due to differences between the two populations. For example, individuals in one group could be wealthier, which would introduce "income" as a hidden variable. <br>
To address this, a propensity score is used: the available features predict the probability that an individual belongs to one of the two groups, for example via logistic regression. <br>
The matched dataset then contains pairs of individuals, one from each group, with the most similar scores. <br>
This reduces the risk of confounding and ensures that we have the same number of individuals from both groups. <br>
The latter is important to avoid weighting one group more heavily because it is overrepresented.

For our project the matched dataset is not of primary interest, as we investigate global effects where comparison within the group is not required.
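
To make the idea of pairing individuals with similar scores concrete, here is a minimal sketch of greedy nearest-neighbour matching on scores that have already been computed. It illustrates the general technique only and is not the procedure used to build the matched dataset; the function name, the `caliper` threshold and the toy scores are all assumptions.

```python
def greedy_match(scores_a, scores_b, caliper=0.05):
    """Pair each unit of group A with the closest unused unit of group B.

    scores_a, scores_b: dicts mapping a unit id to its score.
    caliper: largest score difference accepted for a pair (assumed value).
    Returns a list of (id_a, id_b) pairs; unmatched units are dropped.
    """
    available = dict(scores_b)
    pairs = []
    # Process group A in order of increasing score.
    for id_a, score_a in sorted(scores_a.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining unit of group B.
        id_b = min(available, key=lambda k: abs(available[k] - score_a))
        if abs(available[id_b] - score_a) <= caliper:
            pairs.append((id_a, id_b))
            del available[id_b]  # each unit is used at most once
    return pairs


# Toy scores, for illustration only.
group_a = {"a1": 0.20, "a2": 0.55, "a3": 0.90}
group_b = {"b1": 0.22, "b2": 0.50, "b3": 0.52}
pairs = greedy_match(group_a, group_b)
```

Here `a3` stays unmatched because no remaining unit of group B is within the caliper, which is how matching ends up with equally sized groups.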

### Breweries  <a class="anchor" id="brewery_data"></a>

On both rating websites the breweries data contain the same variables: 
- *id* (set as index) : the brewery's id, which corresponds to *brewery_id* in the beer dataset
- *location* : the location of the brewery
- *name* : the name of the brewery
- *nbr_beers* : the number of beers that brewery produces

##DESCRIBE MATCHED BREWERIES

### Users  <a class="anchor" id="users"></a>

On both rating websites the user data contains the same variables (except for *nbr_reviews* which is unique to BeerAdvocate):
- *user_id* (set as index) : the user_id 
- *nbr_ratings* : the number of ratings the reviewer has put on the website
- *nbr_reviews* (unique to BeerAdvocate) : the number of reviews a user has given on the website
- *user_name* : the username
- *joined* : the date when the user joined the website
- *location* : the user's location

#DESCRIBE MATCHED USERS

### Reviews  <a class="anchor" id="reviews"></a>

The reviews from both websites have the same structure. They contain the following elements:
- *beer_name* and *beer_id* : the name and id of the corresponding beer in the beer dataset
- *brewery_name* and *brewery_id* : the brewery name and id of the corresponding brewery in the brewery dataset
- *style* : corresponds to one out of 104 (BeerAdvocate) or 93 (RateBeer) different beer styles
- *abv* : the alcohol content of the beer (%)
- *date* : the date of the review
- *user_name* and *user_id* : the username and the id of the reviewer (which correspond to the ones in user)
- *text* : the text review

For the rating itself, both websites do similar things: the user's beer rating is composed of five related attributes (represented as features in the dataframe), *appearance*, *aroma*, *palate*, *taste* and *overall*, which contribute to the final *rating*. The details differ slightly:

* BeerAdvocate:
each of the five attributes is given a point value on a 1–5 point scale with 0.25 increments, and the final user rating is then calculated using BeerAdvocate's weighted rating system, in which certain attributes carry more weight (6% for appearance, 24% for smell, 10% for palate, 40% for taste and 20% for overall). reference [here](https://www.beeradvocate.com/community/threads/how-to-review-a-beer.241156/)

* RateBeer:
Appearance and Mouthfeel (the *palate* column) are each scored out of 5, Aroma and Taste are scored out of 10, and Overall is scored out of 20. These combine into a total score out of 50, which is then divided by 10 and displayed as a score out of 5 for each rating. reference [here](https://www.ratebeer.com/our-scores).
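
Both rating schemes can be sketched as small functions. Two assumptions are made here: "smell" and "Mouthfeel" are mapped to the *aroma* and *palate* columns of the dataframes, and the BeerAdvocate weight for *overall* is taken to be the remaining 20%, since the other four weights sum to 80%. The function names are hypothetical and no rounding is applied, as the rounding used by the websites is not described.

```python
# BeerAdvocate attribute weights (overall = 20% is assumed as the remainder).
BA_WEIGHTS = {
    "appearance": 0.06,
    "aroma": 0.24,
    "palate": 0.10,
    "taste": 0.40,
    "overall": 0.20,
}


def ba_rating(appearance, aroma, palate, taste, overall):
    """Weighted mean of the five BeerAdvocate attributes (each on a 1-5 scale)."""
    values = {
        "appearance": appearance,
        "aroma": aroma,
        "palate": palate,
        "taste": taste,
        "overall": overall,
    }
    return sum(BA_WEIGHTS[name] * values[name] for name in BA_WEIGHTS)


def rb_rating(appearance, aroma, palate, taste, overall):
    """Sum of the RateBeer attributes (maximum 5+10+5+10+20 = 50), rescaled to 0-5."""
    return (appearance + aroma + palate + taste + overall) / 10


# Sub-scores of the RateBeer review shown earlier (33 Export, user Manslow).
example_rb = rb_rating(appearance=2, aroma=4, palate=2, taste=4, overall=8)
```

The RateBeer example reproduces the *rating* of 2.0 stored with that review, which supports the "sum divided by 10" reading.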

## Data pre-processing  <a class="anchor" id="data_preprocessing"></a>

The data pre-processing done in this section generates three dataframes: *df_BA_reviews* and *df_RB_reviews*, which contain the processed reviews from each website, and *df_reviews_all*, which is the concatenation of the two.
We save these dataframes into a folder called DataframeStorage, as these computations take a very long time.
This pre-processing was therefore run only once, and in the following sections we just read the dataframes from that folder.

### Users  <a class="anchor" id="users_processing"></a>

For both websites :
- We convert the *joined* column from seconds into a datetime object 
- We add a column called *merged_location* where all the states of the United States are just defined as United States (this will be useful for analyses where we only care about comparing countries)

In [46]:
BA_users['joined'] = pd.to_datetime(BA_users['joined'],unit = 's')

# Add a column where all Users from the united states get 'United States' as location
BA_users['location'] = BA_users['location'].fillna('Unknown')
BA_users['merged_location'] = BA_users['location'].copy()
BA_users.loc[(BA_users['merged_location'].str.startswith('United States')), 'merged_location'] = 'United States'

In [47]:
RB_users['joined'] = pd.to_datetime(RB_users['joined'] ,unit = 's')

# Add a column where all Users from the united states get 'United States' as location
RB_users['location'] = RB_users['location'].fillna('Unknown')
RB_users['merged_location'] = RB_users['location'].copy()
RB_users.loc[(RB_users['merged_location'].str.startswith('United States')), 'merged_location'] = 'United States'
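
The same "collapse US states" pattern is repeated for users, breweries and reviews. A small helper could factor it out; the sketch below uses a hypothetical function name and toy data. Unlike the cells above, it returns a copy and leaves the source column untouched, writing 'Unknown' only into the new column.

```python
import pandas as pd


def add_merged_location(df, source_col="location", target_col="merged_location"):
    """Return a copy of df with all US states collapsed into 'United States'."""
    out = df.copy()
    # Missing locations become 'Unknown' so that the string test is well defined.
    locations = out[source_col].fillna("Unknown")
    is_us = locations.str.startswith("United States")
    # Keep the original value unless the location is a US state.
    out[target_col] = locations.where(~is_us, "United States")
    return out


# Toy data, for illustration only.
toy_users = pd.DataFrame({"location": ["United States, Washington", "Poland", None]})
toy_users = add_merged_location(toy_users)
```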

For the matched data we just convert the *joined* columns to datetime objects, as we will primarily use the two other dataframes when it comes to analyzing locations:

In [48]:
matched_users['joined'] = pd.to_datetime(matched_users['joined'],unit = 's')
matched_users['joined.1'] = pd.to_datetime(matched_users['joined.1'],unit = 's')

### Breweries  <a class="anchor" id="breweries_processing"></a>

We proceed similarly with the two website breweries:

In [None]:
# Add a column where all breweries from the United States get 'United States' as location
RB_breweries['merged_location'] = RB_breweries['location'].copy()
RB_breweries.loc[(RB_breweries['merged_location'].str.startswith('United States')), 'merged_location'] = 'United States'

In [None]:
BA_breweries['merged_location'] = BA_breweries['location'].copy()
BA_breweries.loc[(BA_breweries['merged_location'].str.startswith('United States')), 'merged_location'] = 'United States'

### Reviews  <a class="anchor" id="reviews_processing"></a>

To facilitate subsequent analyses, we want both the user location and the brewery location as columns in the review data. <br>
To merge the brewery location into df_reviews on the brewery name, we first need to align the column names. <br>
To do so, we rename the column "name" of the breweries dataframe to "brewery_name" (the name of the column in df_reviews). <br>

We then use df.merge to add the "location" of the corresponding brewery to each review in df_reviews. <br>
By default the new column would keep the name it has in the breweries dataframe ('location'), so we rename it to "brewery_location" to prevent confusion. <br>

After the merge, we rename the column of the breweries dataframe back to its initial name.

In [None]:
#Rename the column 'name' to 'brewery_name' to allow merging with the reviews
BA_breweries.rename(columns = {'name':'brewery_name'}, inplace = True)
RB_breweries.rename(columns = {'name':'brewery_name'}, inplace = True)


#Add the brewery location to the reviews dataframes
df_BA_reviews = (df_BA_reviews.merge(BA_breweries[['location', 'brewery_name']], on=['brewery_name'], how='left')).rename(columns = {'location':'brewery_location'})
df_RB_reviews = (df_RB_reviews.merge(RB_breweries[['location', 'brewery_name']], on=['brewery_name'], how='left')).rename(columns = {'location':'brewery_location'})


#Name the columns back: 
BA_breweries.rename(columns = {'brewery_name':'name'}, inplace = True)
RB_breweries.rename(columns = {'brewery_name':'name'}, inplace = True)
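
Merging on the brewery name implicitly assumes that names are unique: if two breweries shared a name, a left merge would duplicate the corresponding review rows. As an alternative sketch (not what the cell above does), the merge could be keyed on the brewery id, which is the index of the breweries dataframes, and `validate='many_to_one'` makes pandas raise an error if the key is not unique on the brewery side. The toy dataframes and rating values below are illustrative only.

```python
import pandas as pd

# Toy stand-ins for a reviews dataframe and a breweries dataframe indexed by id.
toy_reviews = pd.DataFrame(
    {"brewery_id": [3198, 3198, 4959], "rating": [2.0, 2.5, 4.6]}
)
toy_breweries = pd.DataFrame(
    {"location": ["Gabon", "Northern Ireland"]},
    index=pd.Index([3198, 4959], name="id"),
)

merged = toy_reviews.merge(
    # Rename before merging so the new column arrives with its final name.
    toy_breweries[["location"]].rename(columns={"location": "brewery_location"}),
    left_on="brewery_id",
    right_index=True,
    how="left",
    validate="many_to_one",  # fails loudly if a brewery id appears twice
)
```

Comparing `len(merged)` with `len(toy_reviews)` is a cheap sanity check that the merge did not add rows.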

The different states of the US are listed as separate locations, but sometimes we want to group the beers by nation rather than by state. <br>
To make this possible, we add a new column. <br>
This column is called "brewery_merged_location" and is identical to "brewery_location", except that all US states take the value "United States".

In [None]:
# Add a column where all breweries from the united states get 'United States' as location
df_RB_reviews['brewery_merged_location'] = df_RB_reviews['brewery_location'].copy()
df_RB_reviews.loc[(df_RB_reviews['brewery_merged_location'].str.startswith('United States')), 'brewery_merged_location'] = 'United States'


df_BA_reviews['brewery_merged_location'] = df_BA_reviews['brewery_location'].copy()
df_BA_reviews.loc[(df_BA_reviews['brewery_merged_location'].str.startswith('United States')), 'brewery_merged_location'] = 'United States'

The same approach as for the brewery locations is used to add the country of origin of the users. <br>
To group them by nation, we again add a column that collapses all US states into the value "United States".

In [None]:
#Add the country of origin of the reviewer to the review
df_RB_reviews = (df_RB_reviews.merge(RB_users[['location', 'user_name']], on=['user_name'], how='left')).rename(columns={'location':'user_location'})
df_BA_reviews = (df_BA_reviews.merge(BA_users[['location', 'user_name']], on=['user_name'], how='left')).rename(columns={'location':'user_location'})

# Add a column where all users from the united states get 'United States' as location
df_RB_reviews['user_location'] = df_RB_reviews['user_location'].fillna('Unknown')
df_RB_reviews['user_merged_location'] = df_RB_reviews['user_location'].copy()
df_RB_reviews.loc[(df_RB_reviews['user_merged_location'].str.startswith('United States')), 'user_merged_location'] = 'United States'

#For BA we have to replace the NaN values
df_BA_reviews['user_location'] = df_BA_reviews['user_location'].fillna('Unknown')
df_BA_reviews['user_merged_location'] = df_BA_reviews['user_location'].copy()
df_BA_reviews.loc[(df_BA_reviews['user_merged_location'].str.startswith('United States')), 'user_merged_location'] = 'United States'

In [None]:
#Transform the dates of the reviews from string to datetime
df_BA_reviews['date'] = pd.to_datetime(df_BA_reviews['date'], format="%Y-%m-%d %H:%M:%S")
df_RB_reviews['date'] = pd.to_datetime(df_RB_reviews['date'], format="%Y-%m-%d %H:%M:%S")

To avoid repeating these heavy computations (each takes a long time), we save those dataframes as mentioned earlier: <br>

In [None]:
#Do that only once. Stores the dataframes so that we can access them more easily next time and have to do the data preprocessing only once
df_BA_reviews.to_csv(PATH+'/DataframeStorage/df_BA_reviews.csv', columns=['beer_name', 'beer_id', 'brewery_name', 'brewery_id', 'style', 'abv', 'date', 'user_name', 'user_id', 'appearance', 'aroma', 'palate', 'taste', 'overall', 'rating', 'text', 'brewery_location', 'brewery_merged_location', 'user_location', 'user_merged_location'], index=False)

In [None]:
df_RB_reviews.to_csv(PATH+'/DataframeStorage/df_RB_reviews.csv', columns=['beer_name', 'beer_id', 'brewery_name', 'brewery_id', 'style', 'abv', 'date', 'user_name', 'user_id', 'appearance', 'aroma', 'palate', 'taste', 'overall', 'rating', 'text', 'brewery_location', 'brewery_merged_location', 'user_location', 'user_merged_location'], index=False)

In [74]:
df_reviews_all = pd.concat([df_BA_reviews, df_RB_reviews], axis=0)

df_reviews_all.to_csv(PATH+'/DataframeStorage/df_reviews_all.csv', columns=['beer_name', 'beer_id', 'brewery_name', 'brewery_id', 'style', 'abv', 'date', 'user_name', 'user_id', 'appearance', 'aroma', 'palate', 'taste', 'overall', 'rating', 'text', 'brewery_location', 'brewery_merged_location', 'user_location', 'user_merged_location'], index=False)
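
Once the two dataframes are concatenated, nothing in a row says which website it came from, even though ids are site-specific and the sub-scores use different scales. A sketch of one way to keep track of the origin, assuming a hypothetical extra column named `source` (it is not part of the stored csv files) and toy values:

```python
import pandas as pd

# Toy stand-ins for the two review dataframes.
toy_ba = pd.DataFrame({"beer_id": [645], "rating": [4.5]})
toy_rb = pd.DataFrame({"beer_id": [410549], "rating": [2.0]})

toy_all = pd.concat(
    [
        toy_ba.assign(source="BeerAdvocate"),  # tag each row with its website
        toy_rb.assign(source="RateBeer"),
    ],
    axis=0,
    ignore_index=True,  # avoid duplicated index values after stacking
)
```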

## Data exploration  <a class="anchor" id="data_exploration"></a>

We have our review dataframes stored in DataframeStorage, so we just start by reading those below (in different cells because this is computationally expensive):

In [52]:
df_BA_reviews = pd.read_csv('../DataframeStorage/df_BA_reviews.csv')

In [53]:
df_RB_reviews = pd.read_csv('../DataframeStorage/df_RB_reviews.csv')

In [3]:
df_reviews_all = pd.read_csv(PATH+'/DataframeStorage/df_reviews_all.csv')



The dates in the review dataframes are stored as strings in the csv files. <br>
A first step (which will be important later) is to convert them to datetime. <br>
This step has to be repeated every time we load the csv files.

In [16]:
#Transform the dates of the reviews from string to datetime
df_BA_reviews['date'] = pd.to_datetime(df_BA_reviews['date'], format="%Y-%m-%d %H:%M:%S")
df_RB_reviews['date'] = pd.to_datetime(df_RB_reviews['date'], format="%Y-%m-%d %H:%M:%S")
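
Since the conversion is needed on every load, it could also be requested directly from `pd.read_csv` through its `parse_dates` argument. A minimal sketch on an in-memory toy csv (the real files are much larger, and the toy content is illustrative only):

```python
import io

import pandas as pd

# Toy csv with the same date format as the stored review files.
toy_csv = io.StringIO(
    "beer_id,date,rating\n"
    "410549,2016-04-26 10:00:00,2.0\n"
)

# parse_dates converts the listed columns to datetime while reading.
toy_reviews = pd.read_csv(toy_csv, parse_dates=["date"])
```

The explicit `pd.to_datetime(..., format=...)` call used above remains the stricter option, because it fails on rows that do not match the expected format.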

```
HERE WE CAN INCLUDE A FEW NUMBERS AND PLOTS RELEVANT TO OUR STORY

```

## Sensorial evaluation  <a class="anchor" id="sensory"></a>

## Location bias  <a class="anchor" id="location"></a>

## Political/Economic climate effect  <a class="anchor" id="political"></a>