In [17]:
import re
import os.path
import numpy as np
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

# DATA Path for BeerAdvocate
DATA_FOLDER = 'Data/BeerAdvocate/'
BEER_BA_DATA = DATA_FOLDER+"beers.csv"
BREWERY_BA_DATA = DATA_FOLDER+"breweries.csv"
USERS_BA_DATA = DATA_FOLDER+"users.csv"
REVIEWS_BA_DATA = DATA_FOLDER+"reviews.txt.gz"
RATINGS_BA_DATA = DATA_FOLDER+"ratings.txt.gz"

COMPRESSION = 'gzip'


In [18]:
labels = ['beer_name','beer_id','brewery_name','brewery_id','style','abv','date',
          'user_name','user_id','appearance','aroma','palate','taste','overall','rating','text']

In [19]:
review_ba = pd.read_pickle(DATA_FOLDER + "review_ba.pkl")

## Sentiment analysis

In <strong>Milestone 2</strong>, we observed that some beers have a marked seasonality. Many reviews contain words such as "summer", "winter", "autumn" or "spring", and we would like to push our analysis further:
* Are those reviews more positive or more negative towards the mentioned seasons?
* Can we identify which words are used to qualify the beers in question, e.g. does the beer feel warm, sour or bitter?


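We are looking for words that frequently appear in the descriptions of a given beer, for example one that is popular around Christmas. Since we care about what is said about each characteristic of the beer, and not only about the overall tone of the review, we will conduct an <strong>aspect-based sentiment analysis</strong>.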

To do so, we look at adjectives and the nouns they modify, which highlight the beer's characteristics. In spaCy's dependency labels, this mainly corresponds to <strong>adjectival modifiers</strong> (`amod`).
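To make the idea concrete, here is a minimal sketch of the `amod` rule on a single sentence. The `Tok` class and its labels are hypothetical and annotated by hand for illustration: they mimic spaCy's `pos_`, `dep_` and `head` attributes but are not the output of an actual parse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tok:
    """Hypothetical stand-in for a parsed token, annotated by hand."""
    text: str
    pos: str   # coarse part of speech, e.g. "ADJ", "NOUN"
    dep: str   # dependency label, e.g. "amod"
    head: int  # index of the syntactic head within the sentence


def adjectival_modifiers(tokens):
    """Return (noun, adjective) pairs for every ADJ attached to its head as 'amod'."""
    return [
        (tokens[t.head].text, t.text)
        for t in tokens
        if t.dep == "amod" and t.pos == "ADJ"
    ]


# "A smooth dark beer for cold nights", hand-annotated for illustration only
sentence = [
    Tok("A", "DET", "det", 3),
    Tok("smooth", "ADJ", "amod", 3),
    Tok("dark", "ADJ", "amod", 3),
    Tok("beer", "NOUN", "ROOT", 3),
    Tok("for", "ADP", "prep", 3),
    Tok("cold", "ADJ", "amod", 6),
    Tok("nights", "NOUN", "pobj", 4),
]
print(adjectival_modifiers(sentence))
# [('beer', 'smooth'), ('beer', 'dark'), ('nights', 'cold')]
```

Each pair links a characteristic ("beer", "nights") to the word describing it, which is exactly the (aspect, description) structure we want to collect from the reviews.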

In [7]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
# import warnings
# warnings.filterwarnings("ignore")

In [8]:
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

In [218]:
def analyze_sentiments(text: str) -> tuple[bool, list[str], list[str]]:
    """Analyze the review to determine whether it is positive or negative.

    Args:
        text (str): The text to be analyzed

    Returns:
        sentiment: True if the TextBlob polarity of the review is strictly positive, False otherwise
        positive_words: the expressions assessed as positive by the nlp pipeline
        negative_words: the expressions assessed as negative by the nlp pipeline
    """
    doc = nlp(text)
    polarity = doc._.blob.polarity  # float in [-1, 1]

    positive_words, negative_words = [], []
    # Each assessment is a tuple (words, polarity, subjectivity, ...); `words` may hold several
    # tokens (e.g. ['not', 'bad']), so we join them instead of keeping only the first one
    for words, word_polarity, *_ in doc._.blob.sentiment_assessments.assessments:
        expression = " ".join(words)
        if word_polarity > 0:
            positive_words.append(expression)
        elif word_polarity < 0:
            negative_words.append(expression)

    return polarity > 0, positive_words, negative_words
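`analyze_sentiments` reduces the polarity score (a float in [-1, 1]) to a boolean with `> 0`, so a review with a polarity of exactly 0 counts as not positive. If neutral reviews turn out to be frequent, a three-way label may be more informative. A minimal sketch, where the `margin` dead zone around 0 is an arbitrary assumption that we have not tuned on the data:

```python
def polarity_label(polarity: float, margin: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to 'positive', 'neutral' or 'negative'.

    `margin` is a hypothetical dead zone around 0, not a tuned value.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > margin:
        return "positive"
    if polarity < -margin:
        return "negative"
    return "neutral"


print([polarity_label(p) for p in (0.4, 0.02, 0.0, -0.3)])
# ['positive', 'neutral', 'neutral', 'negative']
```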


In [219]:
def selected(token) -> bool:
    """Decide whether a token is kept as the description of an aspect: an adjective
    or adverb attached to its head as an adjectival modifier or a compound."""
    return token.dep_ in ("amod", "compound") and token.pos_ in ("ADJ", "ADV")

def get_aspects(text):
    """From several sentences, detect key aspects and the modifiers qualifying them

    Args:
        text (str): the text to be analyzed

    Returns:
        list: a list of {'aspect': head word (str), 'description': modifier (spaCy Token)} dicts
    """
    aspects = []
    # naive sentence split on periods; nlp.pipe parses the chunks in batches
    for doc in nlp.pipe(text.split(".")):
        for token in doc:
            # if (token.dep_ == "amod" or token.dep_=="compound") and (token.pos_ == "ADJ" or token.pos_=="NOUN"):
            if selected(token):
                aspects.append({'aspect': token.head.text, 'description': token})
    return aspects
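`get_aspects` splits the review only on periods, so a passage such as "stands out from the crowd!This had a nice mahogany-oak hue" (from review 76, printed further down) stays in a single chunk. A minimal sketch of a slightly broader splitter, assuming that sentence boundaries in the reviews are marked by `.`, `!` or `?`; spaCy's own `doc.sents` would be the alternative if its segmentation holds up on informal text. Like the period split, it also cuts decimal numbers such as an ABV of "5.5%".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on runs of '.', '!' or '?' and drop empty or whitespace-only pieces."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


print(split_sentences("It stands out from the crowd!This had a nice hue. Great choice..."))
# ['It stands out from the crowd', 'This had a nice hue', 'Great choice']
```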

For now let's just focus on the reviews containing "winter".

In [214]:
# 1 if the lower-cased review contains the substring "winter" (so "winters" matches too), 0 otherwise
review_ba["winter"] = review_ba["text"].str.lower().str.contains("winter", regex=False, na=False).astype(int)
beer_winter_style = review_ba.loc[review_ba["winter"] == 1].reset_index(drop=True)
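The same flag can later be built for the other seasons named in the introduction. A minimal sketch, assuming a DataFrame with a `text` column of strings as in `review_ba`; the `SEASON_PATTERNS` regexes (including "fall" as an alias for autumn) are our own assumption and would need checking against the data, since "fall" is also a common verb.

```python
import pandas as pd

# Hypothetical keyword patterns: \b anchors the start of the word so that
# "offspring" does not count as "spring", while "winters" still matches "winter".
SEASON_PATTERNS = {
    "winter": r"\bwinter",
    "summer": r"\bsummer",
    "autumn": r"\b(?:autumn|fall)\b",
    "spring": r"\bspring",
}


def add_season_flags(df: pd.DataFrame, text_col: str = "text") -> pd.DataFrame:
    """Return a copy of df with one 0/1 column per season keyword."""
    out = df.copy()
    text = out[text_col].fillna("").str.lower()
    for season, pattern in SEASON_PATTERNS.items():
        out[season] = text.str.contains(pattern, regex=True).astype(int)
    return out


sample = pd.DataFrame({"text": [
    "A great winter warmer.",
    "Perfect for a hot Summer day",
    "Nice offspring of the IPA family",
    "Leaves fall, beer flows",
]})
print(add_season_flags(sample)[list(SEASON_PATTERNS)])
```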

In [220]:
print(analyze_sentiments(beer_winter_style.text[76])[0])
get_aspects(beer_winter_style.text[76]) # get_aspects splits the review into sentences on periods

True


[{'aspect': 'bottles', 'description': other},
 {'aspect': 'beers', 'description': other},
 {'aspect': 'winters', 'description': smoothest},
 {'aspect': 'beers', 'description': other},
 {'aspect': 'hue', 'description': nice},
 {'aspect': 'nothing', 'description': strong},
 {'aspect': 'beer', 'description': strong},
 {'aspect': 'night', 'description': cold},
 {'aspect': 'choice', 'description': great},
 {'aspect': 'offerings', 'description': other},
 {'aspect': 'sip', 'description': enjoyable},
 {'aspect': 'sip', 'description': mellow}]
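The output above covers a single review. To find which descriptions appear frequently across all the winter reviews, the (aspect, description) pairs can be tallied. A minimal sketch, assuming one list of `get_aspects`-style dicts per review; the `reviews` lists below are hand-written examples, not results from the dataset. Descriptions go through `str()` because `get_aspects` stores spaCy tokens, whose string form is their text.

```python
from collections import Counter


def top_descriptions(aspect_lists, n=5):
    """Count (aspect, description) pairs over many reviews, case-insensitively.

    `aspect_lists` holds one list per review; each item is a dict with
    'aspect' and 'description' keys.
    """
    counts = Counter(
        (item["aspect"].lower(), str(item["description"]).lower())
        for aspects in aspect_lists
        for item in aspects
    )
    return counts.most_common(n)


# Hand-written examples in the same shape as the output above
reviews = [
    [{"aspect": "night", "description": "cold"}, {"aspect": "sip", "description": "mellow"}],
    [{"aspect": "Night", "description": "cold"}, {"aspect": "beer", "description": "strong"}],
    [{"aspect": "beer", "description": "strong"}, {"aspect": "night", "description": "cold"}],
]
print(top_descriptions(reviews, n=2))
# [(('night', 'cold'), 3), (('beer', 'strong'), 2)]
```

On the real data, the input would be something like `beer_winter_style.text.apply(get_aspects)`, which parses every winter review and may take a while.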

In [221]:
# just checking the text to see if the aspects and sentiments found above seem coherent
beer_winter_style.text[76]

" Purchased as part of the gift pack along with the two other bottles and glass, and now that the nights are cold, this fit in well with the season! This wasn't as dark or heavy as other winter beers that I've had but this had to be one of the smoothest winters I've ever tasted and like the other Innis and Gunn beers that I tried, the carbonation was so low that I wondered if this was beer or not. It's good to have something that stands out from the crowd!This had a nice mahogany-oak hue to it with not a lot of head or lacing to be seen. The aroma wasn't strong but there was some oak, vanilla, and booze in the smell and all of these were present in the taste as well. Just a hint of peat and charcoal too but nothing so strong as to be offensive. For a beer this strong, this went down well and had a bit of a bourbon feel in the aftertaste, since it was more sticky and not spicy like some rums that I've had. This was thick like rum but had a bit more of a warming effect than I anticipated

Let's build a dataframe gathering the results of the functions above for all the "winter" reviews. For each review, this dataframe should contain:
* *beer_id* and *style*: the reviewed beer and its style
* *is_positive*: 1 if the review shows appreciation towards the beer, 0 otherwise
* *aspects*: a list of aspects characterising the beer, or the feeling of the reviewer
* *describers*: for each aspect, the adjectives describing it
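One way to assemble such a frame is sketched below. It is a minimal sketch under assumptions: the input has `beer_id`, `style` and `text` columns, and the two analysis functions are passed in as parameters (in this notebook they would be `analyze_sentiments` and `get_aspects`), which also lets us try it with simple stand-ins. Describers are grouped per aspect in order of first appearance.

```python
from collections import defaultdict

import pandas as pd

COLUMNS = ["beer_id", "style", "is_positive", "aspects", "describers"]


def build_aspect_frame(reviews: pd.DataFrame, sentiment_fn, aspects_fn) -> pd.DataFrame:
    """Build one row per review with the columns listed above.

    sentiment_fn(text) must return a tuple whose first item is a bool;
    aspects_fn(text) must return a list of {'aspect', 'description'} dicts.
    """
    rows = []
    for beer_id, style, text in reviews[["beer_id", "style", "text"]].itertuples(index=False):
        is_positive = int(sentiment_fn(text)[0])
        grouped = defaultdict(list)  # aspect -> describers, in order of appearance
        for item in aspects_fn(text):
            grouped[item["aspect"]].append(str(item["description"]))
        rows.append([beer_id, style, is_positive, list(grouped), list(grouped.values())])
    return pd.DataFrame(rows, columns=COLUMNS)
```

On the real data this would be `build_aspect_frame(beer_winter_style, analyze_sentiments, get_aspects)`. Each review is then parsed twice (once per function), so on the full dataset it may be worth merging the two passes.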

In [223]:
columns = ["beer_id", "style", "is_positive", "aspects", "describers"]