# <center> **Beer Data Analysis** </center>


# **Problem Statement**

Analyse the provided beer data and answer the following questions:

Questions:
1. Rank top 3 Breweries which produce the strongest beers?
2. Which year did beers enjoy the highest ratings?
3. Based on the user’s ratings which factors are important among taste, aroma, appearance, and palette?
4. If you were to recommend 3 beers to your friends based on this data which ones will you recommend?
5. Which Beer style seems to be the favorite based on Reviews written by users? How does written review compare to overall review score for the beer style?

# 1. **Data Exploration**

1.1   Loading data from CSV file



In [1]:
#loading required packages
import pandas as pd
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from sklearn import linear_model
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor
from textblob import TextBlob
from scipy import stats
import plotly.express as px
from sklearn.model_selection import train_test_split
import numpy as np
import nltk


Converting the review time from seconds (Unix epoch) to datetime values

In [2]:
def dateparse(time_in_secs):
    """
    Convert a time in seconds (Unix epoch) to a datetime.

    Parameters:
    time_in_secs (int): time in seconds

    Returns:
    datetime: converted datetime object (in the local time zone)
    """
    return datetime.datetime.fromtimestamp(float(time_in_secs))

#columns to be considered from CSV file
cols_names = ['beer_ABV', 'beer_beerId', 'beer_brewerId', 'beer_name', 'beer_style',
              'review_appearance', 'review_palette', 'review_overall', 'review_taste',
              'review_profileName', 'review_aroma', 'review_text', 'review_time']

#loading CSV file into the beer_data dataframe
beer_data = pd.read_csv('/content/BeerDataScienceProject.csv', sep=',', names=cols_names,
                        encoding='latin-1', header=0,
                        date_parser=dateparse, parse_dates=['review_time'])
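
As an alternative to a row-wise parser, the epoch seconds can be converted in one vectorized call after loading. This is a minimal sketch, not the notebook's method; the `raw` frame and its two sample values are made up for illustration. Note that `pd.to_datetime(..., unit="s")` interprets the values as UTC, whereas `datetime.fromtimestamp` uses the machine's local time zone, so the two can differ by the local UTC offset.

```python
import pandas as pd

# Hypothetical sample: review_time stored as Unix epoch seconds
raw = pd.DataFrame({"review_time": [1234817823, 1235915097]})

# Vectorized conversion; values are read as seconds since 1970-01-01 UTC
raw["review_time"] = pd.to_datetime(raw["review_time"], unit="s")

print(raw["review_time"].dt.year.tolist())
```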

In [3]:
beer_data.head(5)

Unnamed: 0,beer_ABV,beer_beerId,beer_brewerId,beer_name,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time
0,5.0,47986,10325,Sausa Weizen,Hefeweizen,2.5,2.0,1.5,1.5,stcules,1.5,A lot of foam. But a lot. In the smell some ba...,2009-02-16 20:57:03
1,6.2,48213,10325,Red Moon,English Strong Ale,3.0,2.5,3.0,3.0,stcules,3.0,"Dark red color, light beige foam, average. In ...",2009-03-01 13:44:57
2,6.5,48215,10325,Black Horse Black Beer,Foreign / Export Stout,3.0,2.5,3.0,3.0,stcules,3.0,"Almost totally black. Beige foam, quite compac...",2009-03-01 14:10:04
3,5.0,47969,10325,Sausa Pils,German Pilsener,3.5,3.0,3.0,2.5,stcules,3.0,"Golden yellow color. White, compact foam, quit...",2009-02-15 19:12:25
4,7.7,64883,1075,Cauldron DIPA,American Double / Imperial IPA,4.0,4.5,4.0,4.0,johnmichaelsen,4.5,"According to the website, the style for the Ca...",2010-12-30 18:53:26


Observing the shape of the dataset



In [4]:
beer_data.shape

(528870, 13)

**13 columns and 528870 rows**

ABV stands for alcohol by volume. It measures how strong a beer is and is expressed as a percentage.

In [5]:
#counting the  null values that can skew results
beer_data.isna().sum()

beer_ABV              20280
beer_beerId               0
beer_brewerId             0
beer_name                 0
beer_style                0
review_appearance         0
review_palette            0
review_overall            0
review_taste              0
review_profileName      115
review_aroma              0
review_text             119
review_time               0
dtype: int64

beer_ABV has many null values (20,280). Since ABV is crucial for determining how strong a beer is, these values need to be imputed. To decide how to replace them, it helps to look at the distribution of the ABV values first.

In [6]:
distribution_beer_ABV = beer_data['beer_ABV']
fig = px.histogram(distribution_beer_ABV, x="beer_ABV")
fig.show()


39.184k reviews are of beers with beer_ABV in the range **4.98-5.02**. The data is positively skewed.
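
One way to back the skewness statement with a number is the sample skewness from `Series.skew()`, where a positive value indicates a longer right tail. The sketch below uses a small made-up series (`abv`) rather than the beer data; on the real frame the same call would be applied to `beer_data["beer_ABV"]`.

```python
import pandas as pd

# Made-up ABV values with a long right tail (illustration only)
abv = pd.Series([4.5, 5.0, 5.0, 5.2, 5.5, 6.0, 6.5, 8.0, 12.0, 27.0])

skewness = abv.skew()  # positive for a right-skewed distribution

# In a right-skewed sample the mean is pulled above the median
print(round(skewness, 2), abv.mean() > abv.median())
```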

In [7]:

#filling missing ABV values with the column mean
abv_mean = beer_data["beer_ABV"].mean()
beer_data["beer_ABV"] = beer_data["beer_ABV"].fillna(abv_mean)

#copy used only to plot the distribution after imputation
beer_data_transform = beer_data.copy()

distribution_beer_ABV_transform = beer_data_transform['beer_ABV']
fig = px.histogram(distribution_beer_ABV_transform, x="beer_ABV")
fig.show()



In [8]:
#replacing blank profile names with No_Name
beer_data["review_profileName"] = beer_data["review_profileName"].fillna("No_Name")

#replacing blank review text with "No review"
beer_data['review_text'] = beer_data['review_text'].fillna("No review")

Checking for remaining nulls after the replacement

In [9]:
beer_data.isna().sum()

beer_ABV              0
beer_beerId           0
beer_brewerId         0
beer_name             0
beer_style            0
review_appearance     0
review_palette        0
review_overall        0
review_taste          0
review_profileName    0
review_aroma          0
review_text           0
review_time           0
dtype: int64

The shape of the distribution has not changed much after filling the null values with the mean of beer_ABV. The 20,280 imputed rows all sit at the mean (about 7.02), and the overall shape is otherwise preserved.
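
To check this numerically rather than by eye, the summary statistics before and after imputation can be compared. The sketch below uses a small made-up series (`abv`), not the beer data. It shows that mean imputation leaves the mean unchanged but shrinks the standard deviation, because every filled value sits exactly at the mean.

```python
import numpy as np
import pandas as pd

# Made-up ABV values with two missing entries (illustration only)
abv = pd.Series([4.5, 5.0, 5.0, 6.5, 9.0, np.nan, np.nan])

# Mean imputation, as in the cell above
filled = abv.fillna(abv.mean())

# count/mean/std skip NaN, so "before" describes only the observed values
summary = pd.DataFrame({
    "before": abv.agg(["count", "mean", "std"]),
    "after": filled.agg(["count", "mean", "std"]),
})
print(summary)
```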

1.2.   Checking out type of data, summary statistics and reformatting the columns.



In [10]:
pd.set_option('display.max_columns', None) #show all columns instead of truncating with an ellipsis
pd.set_option('display.max_rows', None)
print(beer_data.describe())

            beer_ABV    beer_beerId  beer_brewerId  review_appearance  \
count  528870.000000  528870.000000  528870.000000      528870.000000   
mean        7.017442   22098.466016    2598.423429           3.864522   
std         2.161781   22158.284352    5281.805350           0.604010   
min         0.010000       3.000000       1.000000           0.000000   
25%         5.300000    1745.000000     132.000000           3.500000   
50%         6.500000   14368.000000     394.000000           4.000000   
75%         8.500000   40528.000000    1475.000000           4.000000   
max        57.700000   77310.000000   27980.000000           5.000000   

       review_palette  review_overall   review_taste   review_aroma  
count   528870.000000   528870.000000  528870.000000  528870.000000  
mean         3.758926        3.833197       3.765993       3.817350  
std          0.685335        0.709962       0.669018       0.718903  
min          1.000000        0.000000       1.000000       1.0

On average, the ABV is about 7%, and the maximum observed ABV is 57.7%, a very strong beer. The mean scores of the review columns range from about 3.76 to 3.86.

In [11]:
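#numeric_only=True keeps the mean from failing on text columns in recent pandas versions
avg_ABV=beer_data.groupby(['beer_name','beer_brewerId','beer_beerId']).mean(numeric_only=True)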

avg_ABV_sorted = avg_ABV.sort_values('beer_ABV',ascending=False)

avg_ABV_sorted.head(5)

beer_name,beer_brewerId,beer_beerId,beer_ABV,review_appearance,review_palette,review_overall,review_taste,review_aroma
Schorschbräu Schorschbock 57%,6513,73368,57.7,4.0,4.0,4.0,4.0,3.5
Schorschbräu Schorschbock 43%,6513,57856,43.0,3.75,4.0,3.75,4.0,4.25
Schorschbräu Schorschbock 40%,6513,55712,39.44,3.666667,3.666667,3.333333,3.166667,3.666667
Schorschbräu Schorschbock 31%,6513,51466,30.86,4.0,4.0,3.5,4.0,4.5
Samuel Adams Utopias,35,25759,27.0,4.198592,4.453521,4.066197,4.246479,4.467606


**Schorschbräu Schorschbock 57%** has the highest ABV on average.

In [12]:
#grouping on the basis of beer name for review counts
beer_groupedby_beer_name=beer_data.groupby(beer_data['beer_name']).count()

In [13]:
beer_groupedby_beer_name.head(5)

beer_name,beer_ABV,beer_beerId,beer_brewerId,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time
"""100"" Pale Ale",1,1,1,1,1,1,1,1,1,1,1,1
"""33"" Export",3,3,3,3,3,3,3,3,3,3,3,3
"""76"" Anniversary Ale",3,3,3,3,3,3,3,3,3,3,3,3
"""76"" Anniversary Ale With English Hops",1,1,1,1,1,1,1,1,1,1,1,1
"""Fade To Black"" Porter",1,1,1,1,1,1,1,1,1,1,1,1


In [14]:
beer_groupedby_beer_name['review_appearance']

beer_name
"100" Pale Ale                                                                    1
"33" Export                                                                       3
"76" Anniversary Ale                                                              3
"76" Anniversary Ale With English Hops                                            1
"Fade To Black" Porter                                                            1
"Great Satchmo" Stout                                                             1
"Jessica & Brendan's" Bridal Ale                                                  6
"Just One More" Scotch Ale                                                        4
"Nein Toll Bier" German Alt                                                       1
"O Smokey Night"                                                                  1
"Old Yeltsin" Imperial Stout                                                     55
"Requisite" Imperialistic Stout                                   

Sierra Nevada Celebration Ale, Sierra Nevada Pale Ale and Founders Breakfast Stout are the most reviewed beers.
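
The count table above is ordered alphabetically by beer name, so it does not show the ranking directly. One way to list the most-reviewed beers is `value_counts()`, which counts rows per value and sorts in descending order. This is a minimal sketch on a made-up frame (`reviews`); on the real data the same call would be applied to `beer_data["beer_name"]`.

```python
import pandas as pd

# Made-up reviews (illustration only); one row per review
reviews = pd.DataFrame({
    "beer_name": ["Pale Ale", "Stout", "Pale Ale", "Porter", "Pale Ale", "Stout"],
})

# Count reviews per beer, sorted from most to least reviewed
most_reviewed = reviews["beer_name"].value_counts().head(3)
print(most_reviewed)
```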

In [15]:
mostreviewed_by_id=beer_data.groupby(beer_data['beer_brewerId']).count()

sort_by_reviewed_brewerid = mostreviewed_by_id.sort_values('review_appearance',ascending=False)


In [16]:
print(sort_by_reviewed_brewerid)

               beer_ABV  beer_beerId  beer_name  beer_style  \
beer_brewerId                                                 
35                39444        39444      39444       39444   
140               28751        28751      28751       28751   
132               24083        24083      24083       24083   
1199              20004        20004      20004       20004   
3818              15868        15868      15868       15868   
158               14935        14935      14935       14935   
22                13921        13921      13921       13921   
192               13410        13410      13410       13410   
392               12248        12248      12248       12248   
694               11842        11842      11842       11842   
68                11697        11697      11697       11697   
590               11172        11172      11172       11172   
73                10943        10943      10943       10943   
113               10292        10292      10292       1

Brewer IDs **35, 140 and 132** are the most reviewed breweries.

In [17]:
beer_groupedby_beer_style=beer_data.groupby(beer_data['beer_style']).count()

sort_by_most_reviewed_style = beer_groupedby_beer_style.sort_values('review_appearance',ascending=False)

In [18]:
sort_by_most_reviewed_style.head(5)


beer_style,beer_ABV,beer_beerId,beer_brewerId,beer_name,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time
American IPA,43369,43369,43369,43369,43369,43369,43369,43369,43369,43369,43369,43369
American Double / Imperial IPA,26106,26106,26106,26106,26106,26106,26106,26106,26106,26106,26106,26106
American Double / Imperial Stout,23354,23354,23354,23354,23354,23354,23354,23354,23354,23354,23354,23354
American Pale Ale (APA),20520,20520,20520,20520,20520,20520,20520,20520,20520,20520,20520,20520
American Amber / Red Ale,18731,18731,18731,18731,18731,18731,18731,18731,18731,18731,18731,18731


**American IPA, American Double / Imperial IPA and American Double / Imperial Stout** are the most reviewed styles.

# **Q1. Rank top 3 Breweries which produce the strongest beers?**

In [19]:
strongest_breweries = beer_data.sort_values('beer_ABV',ascending=False)
strongest_breweries['beer_brewerId'].unique()

array([ 6513,    35, 16866, ..., 22698, 21983, 24676])

Again, the brewer with brewer ID **6513** is on top, producing the strongest beer, **Schorschbräu Schorschbock 57%**, of beer style **Eisbock**, with an ABV of **57.7%**.

It is followed by brewer ID **35** with **Samuel Adams Utopias**, of style **American Strong Ale**.

Third is brewer ID **16866**.
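
The same ranking can be made explicit by taking the strongest beer of each brewery and keeping the three largest values. This is a minimal sketch on a made-up frame (`beers`), not the notebook's code; on the real data the grouping would be applied to `beer_data`.

```python
import pandas as pd

# Made-up rows (illustration only)
beers = pd.DataFrame({
    "beer_brewerId": [1, 1, 2, 3, 3],
    "beer_ABV": [5.0, 12.0, 27.0, 57.7, 43.0],
})

# Strongest beer per brewery, then the three breweries with the highest values
top3 = beers.groupby("beer_brewerId")["beer_ABV"].max().nlargest(3)
print(top3)
```

Using `mean()` instead of `max()` would rank breweries by average strength, which is a different reading of the question.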

# **Q2. Which year did beers enjoy the highest ratings?**

In [20]:
beer_data['year'] = pd.DatetimeIndex(beer_data['review_time']).year
beer_data.head()

Unnamed: 0,beer_ABV,beer_beerId,beer_brewerId,beer_name,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time,year
0,5.0,47986,10325,Sausa Weizen,Hefeweizen,2.5,2.0,1.5,1.5,stcules,1.5,A lot of foam. But a lot. In the smell some ba...,2009-02-16 20:57:03,2009
1,6.2,48213,10325,Red Moon,English Strong Ale,3.0,2.5,3.0,3.0,stcules,3.0,"Dark red color, light beige foam, average. In ...",2009-03-01 13:44:57,2009
2,6.5,48215,10325,Black Horse Black Beer,Foreign / Export Stout,3.0,2.5,3.0,3.0,stcules,3.0,"Almost totally black. Beige foam, quite compac...",2009-03-01 14:10:04,2009
3,5.0,47969,10325,Sausa Pils,German Pilsener,3.5,3.0,3.0,2.5,stcules,3.0,"Golden yellow color. White, compact foam, quit...",2009-02-15 19:12:25,2009
4,7.7,64883,1075,Cauldron DIPA,American Double / Imperial IPA,4.0,4.5,4.0,4.0,johnmichaelsen,4.5,"According to the website, the style for the Ca...",2010-12-30 18:53:26,2010


In [21]:
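#numeric_only=True keeps the mean from failing on text columns in recent pandas versions
avg_overall_by_year=beer_data.groupby(['year']).mean(numeric_only=True)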
avg_overall_by_year.head(5)

year,beer_ABV,beer_beerId,beer_brewerId,review_appearance,review_palette,review_overall,review_taste,review_aroma
1998,7.065622,881.26087,481.478261,3.369565,3.565217,3.891304,3.695652,3.956522
1999,6.946465,872.04,419.36,3.62,3.76,4.0,3.82,3.96
2000,5.99704,674.30303,1122.272727,3.909091,3.939394,4.181818,3.984848,4.19697
2001,6.131655,2460.234219,420.167774,3.879568,3.699336,3.927741,3.768272,3.922757
2002,6.137907,2940.856483,496.237304,3.799894,3.666469,3.798905,3.684145,3.761311


In [22]:
avg_overall_by_year_sorted = avg_overall_by_year.sort_values('review_overall',ascending=False)

avg_overall_by_year_sorted

year,beer_ABV,beer_beerId,beer_brewerId,review_appearance,review_palette,review_overall,review_taste,review_aroma
2000,5.99704,674.30303,1122.272727,3.909091,3.939394,4.181818,3.984848,4.19697
1999,6.946465,872.04,419.36,3.62,3.76,4.0,3.82,3.96
2001,6.131655,2460.234219,420.167774,3.879568,3.699336,3.927741,3.768272,3.922757
1998,7.065622,881.26087,481.478261,3.369565,3.565217,3.891304,3.695652,3.956522
2010,7.246612,29473.009434,3351.398252,3.897788,3.798502,3.866139,3.808075,3.849259
2009,7.164053,23027.01498,2431.437053,3.893453,3.791261,3.86439,3.79966,3.849518
2008,6.953825,17386.410539,1829.006355,3.8565,3.755255,3.833939,3.760227,3.821613
2005,6.662427,8762.6014,974.799103,3.845938,3.737387,3.832042,3.750875,3.807903
2012,7.534651,40113.407547,5334.993711,3.896226,3.79717,3.829717,3.795283,3.837579
2011,7.331372,36387.152541,4749.639738,3.891231,3.790176,3.828093,3.786184,3.827497


**On average, the year 2000 enjoyed the highest overall rating (a mean review_overall of about 4.18).**

Let's further explore the year-wise ratings.

In [23]:
overall_no_of_reviews=beer_data.groupby(['year']).count()
overall_no_of_reviews.sort_values('review_overall',ascending=False)

year,beer_ABV,beer_beerId,beer_brewerId,beer_name,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time
2011,110836,110836,110836,110836,110836,110836,110836,110836,110836,110836,110836,110836,110836
2010,93810,93810,93810,93810,93810,93810,93810,93810,93810,93810,93810,93810,93810
2009,83578,83578,83578,83578,83578,83578,83578,83578,83578,83578,83578,83578,83578
2008,69080,69080,69080,69080,69080,69080,69080,69080,69080,69080,69080,69080,69080
2007,46514,46514,46514,46514,46514,46514,46514,46514,46514,46514,46514,46514,46514
2006,43083,43083,43083,43083,43083,43083,43083,43083,43083,43083,43083,43083,43083
2005,29433,29433,29433,29433,29433,29433,29433,29433,29433,29433,29433,29433,29433
2004,22905,22905,22905,22905,22905,22905,22905,22905,22905,22905,22905,22905,22905
2003,18187,18187,18187,18187,18187,18187,18187,18187,18187,18187,18187,18187,18187
2002,7581,7581,7581,7581,7581,7581,7581,7581,7581,7581,7581,7581,7581


In 2011, beers received the highest number of reviews/ratings (110,836).
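
The year 2000 does not appear in the portion of the count table shown above, which suggests its average rests on far fewer reviews than the busiest years. A hedged way to account for this is to look at the mean rating and the review count side by side and ignore years below a minimum count. The sketch below uses made-up rows, and the threshold `min_reviews` is a hypothetical choice, not a value from the notebook.

```python
import pandas as pd

# Made-up reviews (illustration only)
df = pd.DataFrame({
    "year": [2000, 2000, 2010, 2010, 2010, 2011, 2011, 2011],
    "review_overall": [4.5, 4.0, 4.0, 3.5, 4.0, 3.5, 4.0, 3.5],
})

# Mean rating and number of reviews per year, side by side
per_year = df.groupby("year")["review_overall"].agg(["mean", "count"])

# Hypothetical threshold: ignore years with too few reviews
min_reviews = 3
ranked = per_year[per_year["count"] >= min_reviews].sort_values("mean", ascending=False)
print(ranked)
```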

# **Q3. Based on the user’s ratings which factors are important among taste, aroma, appearance, and palette?**

In [24]:
beer_data.shape

(528870, 14)

In [25]:
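#restrict to numeric columns so corr() does not fail on text columns in recent pandas versions
corr_beer = beer_data.select_dtypes('number').corr()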
corr_beer.style.background_gradient(cmap='coolwarm')

Unnamed: 0,beer_ABV,beer_beerId,beer_brewerId,review_appearance,review_palette,review_overall,review_taste,review_aroma,year
beer_ABV,1.0,0.213536,0.076368,0.246777,0.311951,0.116704,0.262634,0.265081,0.140895
beer_beerId,0.213536,1.0,0.462537,0.050345,0.061283,-0.010388,0.036456,0.015569,0.459316
beer_brewerId,0.076368,0.462537,1.0,-0.008476,0.013062,-0.016511,-0.005957,-0.013141,0.241397
review_appearance,0.246777,0.050345,-0.008476,1.0,0.547691,0.486687,0.554775,0.534244,0.054289
review_palette,0.311951,0.061283,0.013062,0.547691,1.0,0.601971,0.604271,0.706156,0.058763
review_overall,0.116704,-0.010388,-0.016511,0.486687,0.601971,1.0,0.692454,0.783002,0.02555
review_taste,0.262634,0.036456,-0.005957,0.554775,0.604271,0.692454,1.0,0.725273,0.05282
review_aroma,0.265081,0.015569,-0.013141,0.534244,0.706156,0.783002,0.725273,1.0,0.037958
year,0.140895,0.459316,0.241397,0.054289,0.058763,0.02555,0.05282,0.037958,1.0


All four ratings (**review_palette, review_taste, review_aroma and review_appearance**) are positively correlated with review_overall. **review_aroma seems the most important feature**, with the highest correlation (0.78).

All ratings are left skewed with mean value near 4.

From the correlation heatmap, we can see that all ratings are positively correlated with each other. review_overall has its highest correlation with review_aroma.

Let's find out the important factors for the overall rating.

According to the correlation matrix, the important factors are in the following order: review_aroma > review_taste > review_palette > review_appearance.
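
The ordering can be read off programmatically by sorting the correlations with review_overall. The values below are copied from the correlation matrix above; on the full frame, the equivalent would be to sort the `review_overall` column of `corr_beer`.

```python
import pandas as pd

# Correlations with review_overall, copied from the matrix above
corr_with_overall = pd.Series({
    "review_appearance": 0.486687,
    "review_palette": 0.601971,
    "review_taste": 0.692454,
    "review_aroma": 0.783002,
})

# Strongest correlation first
ranking = corr_with_overall.sort_values(ascending=False)
print(ranking.index.tolist())
```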

Let's cross-verify the above result with a Random Forest.
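
The cell that produced the Random Forest result is not included here. The following is a minimal sketch of how such a check could look, assuming scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute. The synthetic data, the weights and the hyperparameters are made up for illustration and are not the notebook's.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
features = ["review_aroma", "review_taste", "review_palette", "review_appearance"]

# Synthetic ratings (illustration only): aroma gets the largest weight by construction
X = pd.DataFrame(rng.uniform(1, 5, size=(n, 4)), columns=features)
y = (0.6 * X["review_aroma"] + 0.3 * X["review_taste"]
     + 0.1 * X["review_palette"] + rng.normal(0, 0.1, n))

# Hypothetical hyperparameters; random_state fixed for reproducibility
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```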

From the Random Forest feature-importance graph, we can conclude that the order of importance among taste, aroma, appearance, and palette is:

1. Aroma.
2. Taste.
3. Palette.
4. Appearance.

Since we are performing linear regression, we keep dfy as the value to be predicted, which is the overall rating.

These are the predictor variables, but they are positively correlated with one another.
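
The regression cell itself is not part of this export. As a hedged illustration of the setup described, the sketch below fits ordinary least squares with NumPy on synthetic, positively correlated predictors. The variable `dfy` mirrors the name used in the text; everything else (data, weights, noise levels) is made up. Unlike pairwise correlations, the fitted coefficients describe each predictor's contribution with the other predictor held fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic, positively correlated predictors (illustration only)
base = rng.normal(size=n)
aroma = base + 0.5 * rng.normal(size=n)
taste = base + 0.5 * rng.normal(size=n)

# Hypothetical target playing the role of dfy (overall rating)
dfy = 2.0 * aroma + 1.0 * taste + 0.1 * rng.normal(size=n)

# Design matrix with an intercept column, solved by ordinary least squares
X = np.column_stack([np.ones(n), aroma, taste])
coef, *_ = np.linalg.lstsq(X, dfy, rcond=None)

print("intercept, aroma, taste:", np.round(coef, 2))
print("corr(aroma, taste):", round(np.corrcoef(aroma, taste)[0, 1], 2))
```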

# **Q4. If you were to recommend 3 beers to your friends based on this data which ones will you recommend?**

Q4.1 Let's first determine the user's sentiment from the review text.

In [81]:
review_blob = [TextBlob(review) for review in beer_data['review_text']]
#add the sentiment metrics to the dataframe
beer_data['tb_Pol'] = [b.sentiment.polarity for b in review_blob]
beer_data['tb_Subj'] = [b.sentiment.subjectivity for b in review_blob]
#show dataframe
beer_data.head(3)

Unnamed: 0,beer_ABV,beer_beerId,beer_brewerId,beer_name,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time,year,tb_Pol,tb_Subj
0,5.0,47986,10325,Sausa Weizen,Hefeweizen,2.5,2.0,1.5,1.5,stcules,1.5,A lot of foam. But a lot. In the smell some ba...,2009-02-16 20:57:03,2009,-0.090909,0.40625
1,6.2,48213,10325,Red Moon,English Strong Ale,3.0,2.5,3.0,3.0,stcules,3.0,"Dark red color, light beige foam, average. In ...",2009-03-01 13:44:57,2009,0.147436,0.487179
2,6.5,48215,10325,Black Horse Black Beer,Foreign / Export Stout,3.0,2.5,3.0,3.0,stcules,3.0,"Almost totally black. Beige foam, quite compac...",2009-03-01 14:10:04,2009,0.338333,0.693333


In [99]:
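#note: & binds tighter than >=, so this is evaluated as
#tb_Pol >= (1 & (review_overall >= 5)); the output below reflects that reading
good_reviews=beer_data[beer_data['tb_Pol']>=1 & (beer_data['review_overall']>=5)]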

good_reviews_sorted = good_reviews.sort_values('review_overall',ascending=False)

good_reviews_sorted.head(10)

Unnamed: 0,beer_ABV,beer_beerId,beer_brewerId,beer_name,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time,year,tb_Pol,tb_Subj
286902,5.4,846,35,Samuel Adams Scotch Ale,Scotch Ale / Wee Heavy,5.0,5.0,5.0,4.0,John,5.0,One of the best scotch ales ever. Sorely missed.,2001-10-15 15:11:51,2001,1.0,0.3
374476,8.0,33,22,Maudite,Belgian Strong Dark Ale,5.0,5.0,5.0,5.0,John,5.0,Superb! Hail Unibroue!,2001-10-08 16:54:50,2001,1.0,1.0
391936,7.017442,42682,3,Abita Select Four Grain,American Pale Ale (APA),4.0,4.0,4.5,3.5,acrawf6,4.0,This is their select beer that came out in Apr...,2008-05-15 15:39:13,2008,0.285529,0.50164
414112,5.8,228,73,Great Lakes Dortmunder Gold,Dortmunder / Export Lager,3.5,4.0,4.5,4.5,micromaniac129,4.5,Deep golden color with a low foamy head but ma...,2010-10-10 12:06:24,2010,0.172727,0.518182
100986,8.5,56600,675,Adriaen Brouwer Dark Gold Ale,Belgian Strong Dark Ale,4.0,4.0,4.5,4.0,dstc,4.0,"Pours a deep, dark color with a ruby tint to i...",2010-09-23 12:19:51,2010,0.213636,0.477273
296744,5.8,20564,35,Samuel Adams Holiday Porter,American Porter,4.0,4.0,4.5,4.5,DIM,4.5,"a: This was dark brown, bordering on black. It...",2010-11-19 16:47:43,2010,0.14256,0.516071
414088,5.8,228,73,Great Lakes Dortmunder Gold,Dortmunder / Export Lager,4.0,4.0,4.5,4.5,Swedes21,4.0,12 fl oz served in a chilled glass Appearance ...,2010-12-08 19:26:52,2010,0.380741,0.644444
296742,5.8,20564,35,Samuel Adams Holiday Porter,American Porter,4.0,3.5,4.5,4.0,drtth,4.5,Poured into an imperial pint glass. Glass of c...,2010-11-20 02:08:08,2010,0.187667,0.554667
191418,7.0,76323,2743,Rayon Vert,Belgian Pale Ale,5.0,4.0,4.5,5.0,BB1313,4.0,Rayon Vert is bottle-conditioned. 12oz bottle ...,2011-12-29 19:57:27,2011,0.283383,0.541228
414106,5.8,228,73,Great Lakes Dortmunder Gold,Dortmunder / Export Lager,4.0,4.0,4.5,4.0,wmtxbb,4.0,Bottle poured into pint glass Appearance: Clea...,2010-11-01 17:14:03,2010,0.350157,0.599534
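
Several rows in this output have a review_overall of 4.5 and a polarity well below 1. This follows from operator precedence: in Python, `&` binds tighter than `>=`, so the filter is evaluated as `tb_Pol >= (1 & (review_overall >= 5))`. The sketch below, on made-up rows, contrasts that reading with a fully parenthesized filter. The explicit `astype(int)` form is used only to spell out what the unparenthesized expression amounts to.

```python
import pandas as pd

# Made-up rows (illustration only)
df = pd.DataFrame({
    "tb_Pol": [1.0, 0.3, 1.0, -0.2],
    "review_overall": [5.0, 4.5, 4.0, 5.0],
})

# Strict filter: each comparison is parenthesized before combining with &
strict = df[(df["tb_Pol"] >= 1) & (df["review_overall"] >= 5)]

# Unparenthesized reading: the polarity threshold is 1 where
# review_overall >= 5 and 0 everywhere else
threshold = (df["review_overall"] >= 5).astype(int)
loose = df[df["tb_Pol"] >= threshold]

print(len(strict), len(loose))
```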


In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [100]:
good_reviews_sorted2 = good_reviews.sort_values('tb_Pol',ascending=False)

good_reviews_sorted2.head(10)

Unnamed: 0,beer_ABV,beer_beerId,beer_brewerId,beer_name,beer_style,review_appearance,review_palette,review_overall,review_taste,review_profileName,review_aroma,review_text,review_time,year,tb_Pol,tb_Subj
320187,4.3,3628,665,Kilkenny Irish Cream Ale,Irish Red Ale,4.5,4.0,4.5,4.0,BIGMURK,5.0,appearance: Copper smell: malt/cookie/slight c...,2008-07-03 22:38:43,2008,1.0,1.0
155816,11.0,40058,3818,Choklat,American Double / Imperial Stout,4.0,4.0,4.0,3.5,thequeen711,4.0,What a wonderful chocolaty stout! Nose is roma...,2008-09-22 03:58:23,2008,1.0,1.0
120840,4.5,832,302,Dundee Original Honey Brown Lager,American Amber / Red Lager,3.0,4.0,4.0,5.0,jetpilots1,5.0,This is an awesome session beer.,2001-07-06 10:35:04,2001,1.0,1.0
277079,8.7,1704,83,Hoegaarden Grand Cru,Belgian Strong Pale Ale,3.0,4.0,4.5,4.0,Clubchat,4.0,Dguste la Taverne Irlandaise Le Trfle de Trois...,2008-11-21 20:49:43,2008,1.0,1.0
339934,9.2,1696,207,Trappistes Rochefort 8,Dubbel,4.0,3.0,4.0,4.0,Clubchat,4.5,Dguste la Tarverne Irlandaise Le Trfle de Troi...,2008-11-21 20:34:33,2008,1.0,1.0
406291,4.4,3677,694,Tröegs Rugged Trail Nut Brown Ale,English Brown Ale,4.0,4.0,4.5,4.0,Jarod,4.0,From the bottle it pours at a bronze-ish color...,2002-11-19 09:04:49,2002,1.0,1.0
340143,9.2,1696,207,Trappistes Rochefort 8,Dubbel,4.5,4.5,4.5,4.5,AlexF,5.0,Bouteille de 33 cl. Bire brune opaque offrant ...,2008-01-03 23:57:39,2008,1.0,1.0
74188,5.8,283,140,Sierra Nevada Stout,American Stout,5.0,4.0,4.5,5.0,Jason,5.0,"One of my all time favourite stouts, I could d...",2002-03-04 18:05:51,2002,1.0,1.0
374476,8.0,33,22,Maudite,Belgian Strong Dark Ale,5.0,5.0,5.0,5.0,John,5.0,Superb! Hail Unibroue!,2001-10-08 16:54:50,2001,1.0,1.0
286902,5.4,846,35,Samuel Adams Scotch Ale,Scotch Ale / Wee Heavy,5.0,5.0,5.0,4.0,John,5.0,One of the best scotch ales ever. Sorely missed.,2001-10-15 15:11:51,2001,1.0,0.3


Sentiment analysis with **TextBlob** suggests **Samuel Adams Scotch Ale, Maudite and Abita Select Four Grain** as the beers to recommend. They lead the filtered list sorted by overall rating, and their reviews have positive polarity (1.0 for the first two, about 0.29 for the third).

# **Q5. Which Beer style seems to be the favorite based on Reviews written by users? How does written review compare to overall review score for the beer style?**

Beer style **Scotch Ale / Wee Heavy** is the favorite based on reviews written by users.

The overall rating is positively correlated with the written review sentiment.

The overall rating is left skewed, while the sentiment score is centered near 0.3.
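
The cells behind these statements are not included in this export. One hedged way to reproduce such a comparison is to average the TextBlob polarity (`tb_Pol`) and the overall rating per beer style and then compare the two. The sketch below uses made-up rows and hypothetical output column names (`mean_polarity`, `mean_overall`, `n_reviews`); it is not the notebook's code.

```python
import pandas as pd

# Made-up reviews with precomputed polarity (illustration only)
df = pd.DataFrame({
    "beer_style": ["IPA", "IPA", "Stout", "Stout", "Lager", "Lager"],
    "tb_Pol": [0.40, 0.30, 0.50, 0.45, 0.10, 0.20],
    "review_overall": [4.0, 4.5, 4.5, 5.0, 3.0, 3.5],
})

# Mean sentiment, mean rating and review count per style
per_style = df.groupby("beer_style").agg(
    mean_polarity=("tb_Pol", "mean"),
    mean_overall=("review_overall", "mean"),
    n_reviews=("tb_Pol", "size"),
)

# Style with the most positive written reviews on average
favorite = per_style["mean_polarity"].idxmax()

# Pearson correlation between the two per-style averages
agreement = per_style["mean_polarity"].corr(per_style["mean_overall"])

print(per_style.sort_values("mean_polarity", ascending=False))
print(favorite, round(agreement, 2))
```

Keeping `n_reviews` in the table makes it easy to see whether a style's average rests on only a handful of reviews.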
