# Finding season-dependent keywords

In this notebook, we investigate how the vocabulary of the reviews depends on the season in which they were written. For each beer style, we look for words that are used more often in winter (January through March) and words that are used more often in summer (July through September).
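To make "more commonly used" concrete, the comparison cell further down flags a word for a season when its share of that season's tokens exceeds its share of the other season's tokens by more than a relative threshold. The following is a minimal standard-library sketch of that criterion on toy data. The function name `seasonal_keywords` and the sample token lists are hypothetical and only serve as an illustration; the defaults `thresh=0.2` and `n=100` mirror the values used in the notebook.

```python
from collections import Counter

def seasonal_keywords(season_tokens, other_tokens, thresh=0.2, n=100):
    """Return the top-n words of season_tokens whose relative frequency
    exceeds their frequency in other_tokens by more than thresh."""
    season_total = len(season_tokens)
    other_total = len(other_tokens)
    if season_total == 0 or other_total == 0:
        return {}  # nothing to compare against
    season_counts = Counter(season_tokens)
    other_counts = Counter(other_tokens)
    flagged = {}
    for word, count in season_counts.most_common(n):
        season_freq = count / season_total
        # Counter returns 0 for words that never occur in the other season
        other_freq = other_counts[word] / other_total
        if (season_freq - other_freq) / season_freq > thresh:
            flagged[word] = (season_freq, other_freq)
    return flagged

# Toy token lists (hypothetical, not taken from the dataset)
winter = "dark roasted coffee warming dark malt".split()
summer = "light crisp refreshing lemon light malt".split()
print(sorted(seasonal_keywords(winter, summer)))
```

In this toy example `malt` is not flagged, because it has the same relative frequency in both lists, while words that occur only in the winter list are.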

In [8]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import scipy.stats
from nltk.tokenize import RegexpTokenizer

In [9]:
DATA_PATH = './datasets/BeerAdvocate/'

In [10]:
# load Beer Advocate data from pickles
reviews = pd.read_pickle(DATA_PATH + 'reviews.pkl')
reviews.head()

Unnamed: 0,Beer Name,Beer Id,Brewery Name,Brewery Id,Style,Abv,Date,Username,User Id,Appearance,Aroma,Palate,Taste,Overall,Rating,Text,Review
0,Régab,142544,Societe des Brasseries du Gabon (SOBRAGA),37262,Euro Pale Lager,4.5,2015-08-20 12:00:00,nmann08,nmann08.184925,3.25,2.75,3.25,2.75,3.0,2.88,"From a bottle, pours a piss yellow color with ...",
1,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2009-02-20 12:00:00,StJamesGate,stjamesgate.163714,3.0,3.5,3.5,4.0,3.5,3.67,Pours pale copper with a thin head that quickl...,
2,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2006-03-13 12:00:00,mdagnew,mdagnew.19527,4.0,3.5,3.5,4.0,3.5,3.73,"500ml Bottle bought from The Vintage, Antrim.....",
3,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2004-12-01 12:00:00,helloloser12345,helloloser12345.10867,4.0,3.5,4.0,4.0,4.5,3.98,Serving: 500ml brown bottlePour: Good head wit...,
4,Barelegs Brew,19590,Strangford Lough Brewing Company Ltd,10093,English Pale Ale,4.5,2004-08-30 12:00:00,cypressbob,cypressbob.3708,4.0,4.0,4.0,4.0,4.0,4.0,"500ml bottlePours with a light, slightly hazy ...",


In [11]:
# Extract the month in which each review was written
# (pd.to_datetime leaves an existing datetime column unchanged)
reviews['Month'] = pd.to_datetime(reviews['Date']).dt.month
styles = reviews['Style'].unique()

In [12]:
# Build the tokenizer once; it keeps only runs of ASCII letters
word_tokenizer = RegexpTokenizer(r'[a-zA-Z]+')

def get_words(text):
    """Return the lowercase alphabetic tokens of a review text."""
    return [word.lower() for word in word_tokenizer.tokenize(text)]
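The pattern `[a-zA-Z]+` keeps only runs of ASCII letters, so apostrophes, digits, punctuation, and accented characters all act as separators. The sketch below illustrates this with the standard library's `re.findall` as a stand-in for `RegexpTokenizer`. This substitution is an assumption made so the example runs without NLTK (for a pattern without capture groups, both should produce the same tokens), and the name `get_words_re` is hypothetical.

```python
import re

def get_words_re(text):
    """Lowercase ASCII-letter tokens, using re.findall instead of NLTK."""
    return [word.lower() for word in re.findall(r'[a-zA-Z]+', text)]

sample = "Don't pour it &quot;cold&quot;; I'm sure I'd like 2 more."
print(get_words_re(sample))
# -> ['don', 't', 'pour', 'it', 'quot', 'cold', 'quot', 'i', 'm', 'sure', 'i', 'd', 'like', 'more']
```

Contractions are split at the apostrophe and HTML entities such as `&quot;` lose their delimiters, which plausibly explains tokens like `t`, `m`, `d`, and `quot` in the output of the seasonal comparison.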

In [17]:
thresh = 0.2  # minimum relative frequency difference for a word to be reported
n = 100       # number of most frequent words checked per season

winter_mask = reviews.Month.between(1, 3)  # January through March
summer_mask = reviews.Month.between(7, 9)  # July through September

def print_seasonal_words(counts, total, other_counts, other_total, season, other):
    """Print the top-n words of `season` that are relatively more frequent than in `other`."""
    for word, count in counts.head(n).items():
        freq = count / total
        # .get avoids a KeyError for words that never occur in the other season
        other_freq = other_counts.get(word, 0) / other_total
        if (freq - other_freq) / freq > thresh:
            print("\t {} appears {:.5f}% in {} and {:.5f}% in {}".format(
                word, 100 * freq, season, 100 * other_freq, other))

for style in styles:
    style_mask = reviews.Style == style
    # One token per row; dropna() discards the NaN left by reviews without tokens
    winter_words = reviews.loc[winter_mask & style_mask, 'Text'].apply(get_words).explode().dropna()
    summer_words = reviews.loc[summer_mask & style_mask, 'Text'].apply(get_words).explode().dropna()
    winter_total = len(winter_words)
    summer_total = len(summer_words)
    # value_counts() sorts by descending count, so head(n) gives the top-n words
    winter_counts = winter_words.value_counts()
    summer_counts = summer_words.value_counts()
    print('Style: {}'.format(style))
    if winter_total == 0 or summer_total == 0:
        continue  # no reviews in one of the seasons, so nothing to compare
    print_seasonal_words(winter_counts, winter_total, summer_counts, summer_total, 'winter', 'summer')
    print_seasonal_words(summer_counts, summer_total, winter_counts, winter_total, 'summer', 'winter')

Style: Euro Pale Lager
Style: English Pale Ale
Style: English Bitter
Style: American Pale Wheat Ale
	 summer appears 0.32045% in summer and 0.15715% in winter
	 refreshing appears 0.23396% in summer and 0.18504% in winter
Style: Irish Red Ale
	 clear appears 0.23734% in winter and 0.18887% in summer
	 great appears 0.20511% in winter and 0.15127% in summer
	 brown appears 0.31378% in summer and 0.22299% in winter
	 roasted appears 0.19493% in summer and 0.15078% in winter
Style: American Stout
Style: Milk / Sweet Stout
Style: Irish Dry Stout
Style: Munich Helles Lager
	 quot appears 0.25418% in winter and 0.19718% in summer
Style: English Brown Ale
Style: English India Pale Ale (IPA)
Style: English Porter
Style: American IPA
Style: American Pale Ale (APA)
Style: Foreign / Export Stout
Style: American Double / Imperial Stout
Style: Berliner Weissbier
	 peach appears 0.28780% in summer and 0.16116% in winter
Style: American Blonde Ale
	 summer appears 0.25144% in summer and 0.10044% in w

Style: Euro Strong Lager
	 thin appears 0.17089% in winter and 0.13511% in summer
	 malts appears 0.19428% in summer and 0.14930% in winter
Style: American Strong Ale
Style: Dubbel
Style: Cream Ale
	 coffee appears 0.36129% in winter and 0.19591% in summer
	 vanilla appears 0.26165% in summer and 0.18191% in winter
Style: Vienna Lager
	 style appears 0.24535% in summer and 0.19269% in winter
Style: Light Lager
	 lime appears 0.23239% in summer and 0.12772% in winter
Style: Quadrupel (Quad)
Style: American Pale Lager
Style: Munich Dunkel Lager
Style: Kellerbier / Zwickelbier
	 citrus appears 0.36084% in winter and 0.18966% in summer
	 orange appears 0.28728% in winter and 0.19495% in summer
	 quot appears 0.25050% in winter and 0.18437% in summer
	 lemon appears 0.23908% in winter and 0.16775% in summer
	 pretty appears 0.20674% in winter and 0.16170% in summer
	 honey appears 0.20230% in winter and 0.14130% in summer
	 or appears 0.18771% in winter and 0.14961% in summer
	 would appear

Style: Bière de Champagne / Bière Brut
	 not appears 0.76838% in winter and 0.55902% in summer
	 t appears 0.61355% in winter and 0.46404% in summer
	 taste appears 0.48182% in winter and 0.37449% in summer
	 m appears 0.28886% in winter and 0.23066% in summer
	 malt appears 0.28655% in winter and 0.22795% in summer
	 into appears 0.28078% in winter and 0.21438% in summer
	 color appears 0.27962% in winter and 0.21167% in summer
	 much appears 0.26344% in winter and 0.19810% in summer
	 body appears 0.25189% in winter and 0.19810% in summer
	 golden appears 0.24611% in winter and 0.16011% in summer
	 or appears 0.24496% in winter and 0.19539% in summer
	 poured appears 0.23918% in winter and 0.16825% in summer
	 smell appears 0.21491% in winter and 0.16825% in summer
	 quot appears 0.21145% in winter and 0.14111% in summer
	 fruity appears 0.20220% in winter and 0.15197% in summer
	 d appears 0.16061% in winter and 0.11126% in summer
	 high appears 0.15945% in winter and 0.12212% in su