## Building a Crowdsourced Recommendation System

In [1]:
# import module and upload file
import pandas as pd
import numpy as np
#from google.colab import files
#df = files.upload()

### High level description: 

The objective of this project is to create the building blocks of a crowdsourced recommendation system. This recommendation system should accept user inputs about desired attributes of a product and come up with 3 recommendations.

Obtain reviews of craft beer from beeradvocate.com. We would suggest using the following link, which shows the top 250 beers sorted by ratings: https://www.beeradvocate.com/beer/top-rated/

The advantage of this link is that it lists all 250 top-rated beers on a single page, so the listing itself needs no pagination handling (pagination is only required where listings span many pages). Beeradvocate.com shows about 25 reviews per page. The output file should have 3 columns: product_name, product_review, and user_rating.

### Task A. Extract about 5-6k reviews. 

In [2]:
beer = pd.read_csv('cleaned_beer.csv',index_col=0)
beer.head()

| | beer | score | style | abv | rating | cleaned_review |
|---|---|---|---|---|---|---|
| 0 | Barrel-Aged Silhouette | 99 | Stout - Russian Imperial | 11% | 4.73 | 2019 vintage. Have had most of Lift Bridge's o... |
| 1 | Lou Pepe - Kriek | 100 | Lambic - Fruit | 5% | 5.0 | Was surprised to be able to find a 2012 750ml ... |
| 2 | Barrel-Aged Sump Coffee Stout | 100 | Stout - American Imperial | 10.5% | 4.49 | 2018 vintage. Released 2 months ago. A: Pours ... |
| 3 | Double Dry Hopped Congress Street | 100 | IPA - New England | 7.2% | 4.46 | Pours a hazy murky pale orange or orange with ... |
| 4 | Great | 100 | Barleywine - American | 14% | 4.46 | 2015 vintage drank on 7/8/2017 Look was syrupy... |


In [3]:
#import module
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
stop = stopwords.words('english')

tokenizer = RegexpTokenizer(r'\w+')

#tokenize the review
beer['tokenized_review'] = beer['cleaned_review'].astype('str').apply(lambda x: tokenizer.tokenize(x))

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\chiay\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\chiay\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [4]:
beer_name = list(beer['beer'].unique())
len(beer_name)

250

In [5]:
len(beer)

5977

### Task B. Specify 3 attributes in a product.

Assume that a customer, who will be using this recommender system, has specified 3 attributes in a product. E.g., one website describes multiple attributes of beer:
https://www.dummies.com/food-drink/drinks/beer/beer-for-dummies-cheat-sheet/

- Aggressive: Boldly assertive aroma and/or taste
- Balanced: Malt and hops in similar proportions; equal representation of malt sweetness and hop bitterness in the flavor — especially at the finish
- Complex: Multidimensional; many flavors and sensations on the palate
- Crisp: Highly carbonated; effervescent
- Fruity: Flavors reminiscent of various fruits
- Hoppy: Herbal, earthy, spicy, or citric aromas and flavors of hops
- Malty: Grainy, caramel-like; can be sweet or dry
- Robust: Rich and full-bodied

A word frequency analysis of beer reviews may be a better way to find important attributes.

Assume that a customer has specified three attributes of the product as being important to him or her.
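As a rough sketch of the word frequency analysis suggested above, the snippet below counts the most common non-stopword tokens in a few made-up reviews. The sample reviews, the tiny stopword set, and the helper name `top_words` are assumptions for illustration only; in this notebook the input would be `beer['cleaned_review']` and the NLTK English stopword list.

```python
import re
from collections import Counter

# Hypothetical mini stopword list; the notebook uses NLTK's English list instead.
STOPWORDS = {"a", "an", "and", "the", "with", "is", "of", "very", "but"}

def top_words(reviews, n=5):
    """Return the n most frequent lowercase tokens that are not stopwords."""
    counts = Counter()
    for review in reviews:
        tokens = re.findall(r"\w+", review.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Made-up reviews, not taken from the scraped data.
sample_reviews = [
    "Balanced and fruity with a complex finish",
    "Very balanced, a crisp and fruity beer",
    "Complex malt profile but balanced",
]
top = top_words(sample_reviews, n=3)
print(top)
```

Words that rank high in such a list and describe taste or aroma are natural candidates for the attributes a customer might ask for.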

In [6]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\chiay\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\chiay\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [7]:
tokens = beer['cleaned_review'].astype('str').apply(nltk.word_tokenize)
from itertools import chain
words = list(chain(*tokens))

from collections import Counter
attribute = ['aggressive', 'balanced', 'complex', 'crisp', 'fruity', 'hoppy', 'malty', 'robust']
words = [word.lower() for word in words]
count_tuple = Counter(words)

for char, count in count_tuple.items():
    if char in attribute:
        print(char, count)
# The three most frequent attributes are used as the customer's preferences: balanced, complex, fruity

fruity 390
malty 154
complex 471
hoppy 213
balanced 658
crisp 241
robust 70
aggressive 79


### Task C. Similarity Analysis.

Perform a similarity analysis using cosine similarity (without word embeddings) with the 3 attributes specified by the customer and the reviews. From the output file, calculate the average similarity between each product and the preferred attributes.

For similarity analysis, use cosine similarity with bag of words. The script should accept as input a file with the product attributes, and calculate similarity scores (between 0 and 1) between these attributes and each review. That is, the output file should have 3 columns – product_name (for each product, the product_name will repeat as many times as there are reviews of the product), product_review and similarity_score. 
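A minimal sketch of the bag-of-words cosine similarity described above, on two made-up reviews. The function name `bow_similarity` and the sample reviews are assumptions for illustration. Unlike the per-review approach used later in this notebook, the sketch fits the vectorizer once on all reviews plus the attribute string; the cosine values are the same, because words that appear in neither the review nor the attributes contribute zero to both the dot product and the norms.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def bow_similarity(reviews, attributes):
    """Cosine similarity between each review and the attribute words (bag of words)."""
    query = " ".join(attributes)
    vectorizer = CountVectorizer(stop_words="english")
    # The last row of the matrix is the attribute "document".
    matrix = vectorizer.fit_transform(list(reviews) + [query])
    return cosine_similarity(matrix[:-1], matrix[-1]).ravel()

# Made-up reviews, not taken from the scraped data.
reviews = ["fruity and balanced beer", "dark roasted coffee stout"]
sims = bow_similarity(reviews, ["fruity", "complex", "balanced"])
print(sims)
```

For the first review, two of its three counted words match two of the three attributes, so the score is 2 / (sqrt(3) * sqrt(3)) = 2/3; the second review shares no word with the attributes and scores 0.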

In [8]:
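# The customer's three attributes are: balanced, complex, fruity
# Drop stopwords and lowercase the remaining tokens.
# Note: the stopword test runs on the original casing, so capitalized stopwords
# such as "Have" or "Was" are kept (see the first tokens in the output below).
beer['tokenized_review'] = beer['tokenized_review'].apply(lambda x: [word.lower() for word in x if word not in stop])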
beer.head()

| | beer | score | style | abv | rating | cleaned_review | tokenized_review |
|---|---|---|---|---|---|---|---|
| 0 | Barrel-Aged Silhouette | 99 | Stout - Russian Imperial | 11% | 4.73 | 2019 vintage. Have had most of Lift Bridge's o... | [2019, vintage, have, lift, bridge, offering, ... |
| 1 | Lou Pepe - Kriek | 100 | Lambic - Fruit | 5% | 5.0 | Was surprised to be able to find a 2012 750ml ... | [was, surprised, able, find, 2012, 750ml, coul... |
| 2 | Barrel-Aged Sump Coffee Stout | 100 | Stout - American Imperial | 10.5% | 4.49 | 2018 vintage. Released 2 months ago. A: Pours ... | [2018, vintage, released, 2, months, ago, a, p... |
| 3 | Double Dry Hopped Congress Street | 100 | IPA - New England | 7.2% | 4.46 | Pours a hazy murky pale orange or orange with ... | [pours, hazy, murky, pale, orange, orange, yel... |
| 4 | Great | 100 | Barleywine - American | 14% | 4.46 | 2015 vintage drank on 7/8/2017 Look was syrupy... | [2015, vintage, drank, 7, 8, 2017, look, syrup... |


In [9]:
beer.isna().sum()

beer                 0
score                0
style                0
abv                 24
rating               0
cleaned_review       3
tokenized_review     0
dtype: int64

In [10]:
# remove null value
beer.dropna(inplace=True)

In [11]:
#Calculate cosine similarity using Bag-of-Words

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
def cos(x):
    # Compare one review against the customer's three attributes
    documents = [x, 'fruity complex balanced']
    count_vectorizer = CountVectorizer(stop_words='english')
    sparse_matrix = count_vectorizer.fit_transform(documents)
    # Row 0 is the review, row 1 is the attribute string
    return cosine_similarity(sparse_matrix[0], sparse_matrix[1])[0, 0]

# .copy() makes result an independent DataFrame, so adding columns
# does not raise a SettingWithCopyWarning
result = beer[['beer', 'cleaned_review']].copy()
result['cos_score'] = beer['cleaned_review'].map(cos)
result

| | beer | cleaned_review | cos_score |
|---|---|---|---|
| 0 | Barrel-Aged Silhouette | 2019 vintage. Have had most of Lift Bridge's o... | 0.000000 |
| 1 | Lou Pepe - Kriek | Was surprised to be able to find a 2012 750ml ... | 0.000000 |
| 2 | Barrel-Aged Sump Coffee Stout | 2018 vintage. Released 2 months ago. A: Pours ... | 0.000000 |
| 3 | Double Dry Hopped Congress Street | Pours a hazy murky pale orange or orange with ... | 0.046984 |
| 4 | Great | 2015 vintage drank on 7/8/2017 Look was syrupy... | 0.000000 |
| ... | ... | ... | ... |
| 5972 | Very Green | Great densely hopped IPA. Really intriguing to... | 0.000000 |
| 5973 | Foggier Window | L - Golden orange with murky body and less tha... | 0.000000 |
| 5974 | Pliny The Younger | 500ml bottle into a tulip. Huge thanks to my f... | 0.039014 |
| 5975 | Fou' Foune | Clear and gold. Small but satisfactory white h... | 0.000000 |


### Task D. For every review, perform a sentiment analysis. 

In [12]:
#! pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [13]:
analyzer = SentimentIntensityAnalyzer()

def sentimentScore(x):
    return analyzer.polarity_scores(x)['compound']

result['Sentiment_Score'] = result['cleaned_review'].astype('str').apply(sentimentScore)
result[:5]


| | beer | cleaned_review | cos_score | Sentiment_Score |
|---|---|---|---|---|
| 0 | Barrel-Aged Silhouette | 2019 vintage. Have had most of Lift Bridge's o... | 0.0 | 0.9924 |
| 1 | Lou Pepe - Kriek | Was surprised to be able to find a 2012 750ml ... | 0.0 | 0.9771 |
| 2 | Barrel-Aged Sump Coffee Stout | 2018 vintage. Released 2 months ago. A: Pours ... | 0.0 | 0.9766 |
| 3 | Double Dry Hopped Congress Street | Pours a hazy murky pale orange or orange with ... | 0.046984 | 0.9861 |
| 4 | Great | 2015 vintage drank on 7/8/2017 Look was syrupy... | 0.0 | 0.9487 |


### Task E. Recommend 3 products to the customer.

Assume an evaluation score for each beer = average similarity score + average sentiment score.

Now recommend 3 products to the customer. 
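The evaluation score above can be worked through on a toy table before applying it to the real data. The sketch below uses hypothetical per-review scores for three made-up products ("A", "B", "C"); only the column names follow this notebook.

```python
import pandas as pd

# Hypothetical per-review scores for three made-up products.
scores = pd.DataFrame({
    "beer": ["A", "A", "B", "B", "C", "C"],
    "cos_score": [0.10, 0.00, 0.05, 0.05, 0.00, 0.02],
    "Sentiment_Score": [0.90, 0.70, 0.60, 0.80, 0.95, 0.99],
})

# Average both scores per product, then add the two averages.
per_beer = scores.groupby("beer")[["cos_score", "Sentiment_Score"]].mean()
per_beer["evaluation_score"] = per_beer["cos_score"] + per_beer["Sentiment_Score"]

# Keep the two best products by evaluation score.
top = per_beer.nlargest(2, "evaluation_score")
print(top)
```

Product "C" ranks first here despite the lowest similarity, because its sentiment average is the highest. The same effect is visible in this notebook's data: average `cos_score` values stay around 0.00 to 0.05 while average `Sentiment_Score` values range from about 0.5 to 0.9, so the sentiment term largely determines the ranking.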

In [14]:
#Assume an evaluation score for each beer = average similarity score + average sentiment score. 
recommend1 = result.groupby(['beer'])[['cos_score','Sentiment_Score']].mean()
recommend1['evaluation_score'] = recommend1['cos_score'] + recommend1['Sentiment_Score']
recommend1

| beer | cos_score | Sentiment_Score | evaluation_score |
|---|---|---|---|
| A Deal With The Devil | 0.013941 | 0.671513 | 0.685453 |
| A Deal With The Devil - Double Oak-Aged | 0.002304 | 0.810612 | 0.812917 |
| Aaron | 0.016805 | 0.711383 | 0.728188 |
| Abner | 0.026227 | 0.785871 | 0.812098 |
| Abrasive Ale | 0.026489 | 0.628767 | 0.655256 |
| ... | ... | ... | ... |
| Westly | 0.037191 | 0.707462 | 0.744654 |
| Wide Awake It's Morning | 0.020429 | 0.527721 | 0.548150 |
| Zenne Y Frontera | 0.054565 | 0.782942 | 0.837507 |
| Zombie Dust | 0.019477 | 0.692867 | 0.712344 |


In [15]:
recommend1.sort_values(by='evaluation_score', ascending=False, inplace=True)
recommend1[:5]

| beer | cos_score | Sentiment_Score | evaluation_score |
|---|---|---|---|
| Flora Plum | 0.037117 | 0.889908 | 0.927026 |
| Genealogy Of Morals - Bourbon Barrel-Aged | 0.01693 | 0.896771 | 0.913701 |
| Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel | 0.022262 | 0.836883 | 0.859145 |
| Dorothy (Wine Barrel Aged) | 0.054924 | 0.789938 | 0.844862 |
| Saison Bernice | 0.041146 | 0.797133 | 0.83828 |


### Task F. Use word vectors to recommend

How would our recommendation change if we use word vectors (the spaCy package would be the easiest to use with pretrained word vectors) instead of plain vanilla bag-of-words cosine similarity? One way to analyze the difference would be to consider the % of reviews that mention a preferred attribute. E.g., if we recommend a product, what % of its reviews mention an attribute specified by the customer? Any difference across bag-of-words and word vector approaches? This article may be useful: https://medium.com/swlh/word-embeddings-versus-bag-of-words-the-curious-case-of-recommender-systems-6ac1604d4424?source=friends_link&sk=d746da9f094d1222a35519387afc6338

Note that the article doesn’t claim that bag-of-words will always be better than word embeddings for recommender systems. It lays out conditions under which it is likely to be the case. That is, depending on the attributes we use, we may or may not see the same effect. 
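To see why word vectors can score a review highly even when it never mentions an attribute, the sketch below imitates what spaCy's default `Doc.similarity` does: it averages the word vectors of each text and takes the cosine of the two averages. The 3-dimensional vectors and the helper names `doc_vector` and `cosine` are made up for illustration; real pretrained vectors have hundreds of dimensions.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented for this example.
toy_vectors = {
    "fruity":   np.array([0.9, 0.1, 0.0]),
    "citrus":   np.array([0.8, 0.2, 0.1]),
    "balanced": np.array([0.1, 0.9, 0.0]),
    "stout":    np.array([0.0, 0.2, 0.9]),
}

def doc_vector(tokens):
    """Average the vectors of the tokens that have an embedding (at least one must)."""
    known = [toy_vectors[t] for t in tokens if t in toy_vectors]
    return np.mean(known, axis=0)

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

query_tokens = ["fruity", "balanced"]
review_tokens = ["citrus", "stout"]  # shares no word with the query

embedding_score = cosine(doc_vector(query_tokens), doc_vector(review_tokens))
exact_overlap = len(set(query_tokens) & set(review_tokens))
print(round(embedding_score, 3), exact_overlap)
```

The review has zero exact word overlap with the query, so its bag-of-words cosine would be 0, yet the averaged-vector score is well above 0 because "citrus" lies close to "fruity" in the toy embedding space.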

In [16]:
#!python -m spacy download en_core_web_md
import spacy
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
nlp = spacy.load('en_core_web_md') 

In [17]:
def spacy_cos(x):
    #spaCy uses word vectors for medium (md) and large (lg)
    text1 = x
    text2 = 'fruity complex balanced'
    
    #Calculates spaCy similarity between texts 1 and 2
    doc1 = nlp(text1)
    doc2 = nlp(text2)
    return doc1.similarity(doc2)

result['spacy_cos_score'] = beer['cleaned_review'].map(spacy_cos)
result


| | beer | cleaned_review | cos_score | Sentiment_Score | spacy_cos_score |
|---|---|---|---|---|---|
| 0 | Barrel-Aged Silhouette | 2019 vintage. Have had most of Lift Bridge's o... | 0.000000 | 0.9924 | 0.566167 |
| 1 | Lou Pepe - Kriek | Was surprised to be able to find a 2012 750ml ... | 0.000000 | 0.9771 | 0.596417 |
| 2 | Barrel-Aged Sump Coffee Stout | 2018 vintage. Released 2 months ago. A: Pours ... | 0.000000 | 0.9766 | 0.636524 |
| 3 | Double Dry Hopped Congress Street | Pours a hazy murky pale orange or orange with ... | 0.046984 | 0.9861 | 0.622003 |
| 4 | Great | 2015 vintage drank on 7/8/2017 Look was syrupy... | 0.000000 | 0.9487 | 0.601244 |
| ... | ... | ... | ... | ... | ... |
| 5972 | Very Green | Great densely hopped IPA. Really intriguing to... | 0.000000 | 0.8641 | 0.583042 |
| 5973 | Foggier Window | L - Golden orange with murky body and less tha... | 0.000000 | 0.8374 | 0.608687 |
| 5974 | Pliny The Younger | 500ml bottle into a tulip. Huge thanks to my f... | 0.039014 | 0.9987 | 0.573531 |
| 5975 | Fou' Foune | Clear and gold. Small but satisfactory white h... | 0.000000 | 0.9588 | 0.632907 |


In [18]:
#Recommend 3 products to the customer.
#Assume an evaluation score for each beer = average similarity score + average sentiment score. 
recommend2 = result.groupby(['beer'])[['spacy_cos_score','Sentiment_Score']].mean()
recommend2['evaluation_score'] = recommend2['spacy_cos_score'] + recommend2['Sentiment_Score']
recommend2

| beer | spacy_cos_score | Sentiment_Score | evaluation_score |
|---|---|---|---|
| A Deal With The Devil | 0.582215 | 0.671513 | 1.253728 |
| A Deal With The Devil - Double Oak-Aged | 0.596159 | 0.810612 | 1.406772 |
| Aaron | 0.591487 | 0.711383 | 1.302870 |
| Abner | 0.590446 | 0.785871 | 1.376317 |
| Abrasive Ale | 0.575655 | 0.628767 | 1.204421 |
| ... | ... | ... | ... |
| Westly | 0.608842 | 0.707462 | 1.316304 |
| Wide Awake It's Morning | 0.580946 | 0.527721 | 1.108667 |
| Zenne Y Frontera | 0.581765 | 0.782942 | 1.364707 |
| Zombie Dust | 0.581891 | 0.692867 | 1.274757 |


In [19]:
recommend2.sort_values(by='evaluation_score', ascending=False, inplace=True)
recommend2[:5]

| beer | spacy_cos_score | Sentiment_Score | evaluation_score |
|---|---|---|---|
| Flora Plum | 0.628026 | 0.889908 | 1.517935 |
| Genealogy Of Morals - Bourbon Barrel-Aged | 0.612697 | 0.896771 | 1.509468 |
| Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel | 0.595648 | 0.836883 | 1.432531 |
| All That Is And All That Ever Will Be | 0.612824 | 0.811775 | 1.424599 |
| JJJuliusss! | 0.602487 | 0.808254 | 1.410742 |


In [21]:
for i in ['Flora Plum', 'Genealogy Of Morals - Bourbon Barrel-Aged',
          "Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel"]:
    # Note: a is set to 0 once per beer and is not reset between attributes,
    # so each printed value is a running total over the attributes checked so far.
    a = 0
    df = beer[beer['beer']==i]['tokenized_review']
    for j in ['fruity', 'balanced', 'complex']:
        for k in range(len(df)):
            if j in df.iloc[k]:
                a = a+1
        print('Beer:',i, 'Attribute:',j, 'Perc_of_reviews:',a/len(df))
    print()

Beer: Flora Plum Attribute: fruity Perc_of_reviews: 0.20833333333333334
Beer: Flora Plum Attribute: balanced Perc_of_reviews: 0.375
Beer: Flora Plum Attribute: complex Perc_of_reviews: 0.5

Beer: Genealogy Of Morals - Bourbon Barrel-Aged Attribute: fruity Perc_of_reviews: 0.041666666666666664
Beer: Genealogy Of Morals - Bourbon Barrel-Aged Attribute: balanced Perc_of_reviews: 0.16666666666666666
Beer: Genealogy Of Morals - Bourbon Barrel-Aged Attribute: complex Perc_of_reviews: 0.20833333333333334

Beer: Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel Attribute: fruity Perc_of_reviews: 0.08333333333333333
Beer: Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel Attribute: balanced Perc_of_reviews: 0.16666666666666666
Beer: Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel Attribute: complex Perc_of_reviews: 0.3333333333333333
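In the loop above, the counter `a` is shared by all attributes of a beer, so the printed values accumulate from one attribute to the next. If a separate rate per attribute is wanted, plus the share of reviews that mention at least one attribute, a helper along the following lines could be used. The name `mention_rates` and the toy token lists are assumptions for illustration; in this notebook the input would be the `tokenized_review` lists of one beer.

```python
def mention_rates(token_lists, attributes):
    """Share of reviews containing each attribute, plus the share containing any of them."""
    reviews = [set(tokens) for tokens in token_lists]
    n = len(reviews)
    if n == 0:
        return {}
    # One independent count per attribute (no carry-over between attributes).
    rates = {attr: sum(attr in r for r in reviews) / n for attr in attributes}
    # A review is counted once here even if it mentions several attributes.
    rates["any"] = sum(any(a in r for a in attributes) for r in reviews) / n
    return rates

# Made-up tokenized reviews for a single product.
toy_reviews = [
    ["fruity", "hazy"],
    ["complex", "fruity"],
    ["dark"],
    ["balanced"],
]
rates = mention_rates(toy_reviews, ["fruity", "complex", "balanced"])
print(rates)
```

The "any" entry counts each review at most once, which matches the question of what share of a product's reviews mention an attribute the customer specified.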



In [22]:
recom_beer_1 = ['Dorothy (Wine Barrel Aged)', 'Saison Bernice']
recom_beer_2 = ['All That Is And All That Ever Will Be', 'JJJuliusss!']
attributes = ['fruity', 'complex', 'balanced']

In [23]:
for i in recom_beer_1:
    # Note: a is not reset between attributes, so the printed values are running totals.
    a = 0
    df = beer[beer['beer']==i]['tokenized_review']
    for j in attributes:
        for k in range(len(df)):
            if j in df.iloc[k]:
                a = a+1
        print('Beer:',i, 'Attribute:',j, 'Perc_of_reviews:',a/len(df))
    print()

Beer: Dorothy (Wine Barrel Aged) Attribute: fruity Perc_of_reviews: 0.125
Beer: Dorothy (Wine Barrel Aged) Attribute: complex Perc_of_reviews: 0.4166666666666667
Beer: Dorothy (Wine Barrel Aged) Attribute: balanced Perc_of_reviews: 0.625

Beer: Saison Bernice Attribute: fruity Perc_of_reviews: 0.16666666666666666
Beer: Saison Bernice Attribute: complex Perc_of_reviews: 0.3333333333333333
Beer: Saison Bernice Attribute: balanced Perc_of_reviews: 0.4583333333333333



In [24]:
for i in recom_beer_2:
    # Note: a is not reset between attributes, so the printed values are running totals.
    a = 0
    df = beer[beer['beer']==i]['tokenized_review']
    for j in attributes:
        for k in range(len(df)):
            if j in df.iloc[k]:
                a = a+1
        print('Beer:',i, 'Attribute:',j, 'Perc_of_reviews:',a/len(df))
    print()

Beer: All That Is And All That Ever Will Be Attribute: fruity Perc_of_reviews: 0.08333333333333333
Beer: All That Is And All That Ever Will Be Attribute: complex Perc_of_reviews: 0.08333333333333333
Beer: All That Is And All That Ever Will Be Attribute: balanced Perc_of_reviews: 0.125

Beer: JJJuliusss! Attribute: fruity Perc_of_reviews: 0.08333333333333333
Beer: JJJuliusss! Attribute: complex Perc_of_reviews: 0.08333333333333333
Beer: JJJuliusss! Attribute: balanced Perc_of_reviews: 0.20833333333333334



**Conclusions:** 


Because the bag-of-words and word-vector approaches give the same top three beers, we look at the top 5 and compare the fourth and fifth recommendations. Bag-of-words recommends 'Dorothy (Wine Barrel Aged)' and 'Saison Bernice'; word vectors recommend 'All That Is And All That Ever Will Be' and 'JJJuliusss!'.

We then calculate the percentage of reviews that mention the preferred attributes for these four beers. The two beers recommended by bag-of-words have more reviews mentioning one or more of the customer's attributes than the two recommended by word vectors.

The bag-of-words approach gives lower cosine similarity scores but a higher review mention rate, while the word-vector approach gives higher cosine similarity scores but a lower mention rate. This is because bag-of-words looks for exact word matches, whereas word vectors also reward reviews that are only semantically close to the attributes (the spaCy similarity stays between roughly 0.57 and 0.64 for every review shown above). We feel more comfortable recommending a product when as many of its reviews as possible mention a feature that the shopper considers important.

### Task G. Simply chose the 3 highest rated products.

How would our recommendations differ if we ignored the similarity and feature sentiment scores and simply chose the 3 highest rated products from the entire dataset? Would these products meet the requirements of the user looking for recommendations? Why or why not? Use the similarity and sentiment scores as well as overall ratings to think of this question.

Here is a sample web implementation of a recommender system based on the same principles (runningshoe4you.com). But in this assignment, we will not build this type of full automation.

In [25]:
# Mean user rating per beer
avg_rating = beer.groupby('beer')['rating'].mean()

# .copy() avoids a SettingWithCopyWarning when columns are added below
combine = recommend1[['cos_score', 'evaluation_score']].copy()
combine['Sentiment_Score'] = recommend1['Sentiment_Score']
combine[['spacy_cos_score', 'spacy_evaluation_score']] = recommend2[['spacy_cos_score', 'evaluation_score']]
combine['overall_rating'] = avg_rating

In [26]:
combine.sort_values(by='overall_rating', ascending=False)

| beer | cos_score | evaluation_score | Sentiment_Score | spacy_cos_score | spacy_evaluation_score | overall_rating |
|---|---|---|---|---|---|---|
| Chemtrailmix | 0.022072 | 0.799685 | 0.777613 | 0.590003 | 1.367616 | 4.762083 |
| Blessed | 0.024476 | 0.800676 | 0.776200 | 0.596942 | 1.373142 | 4.751667 |
| SR-71 | 0.006014 | 0.579177 | 0.573163 | 0.525364 | 1.098526 | 4.745833 |
| Vanilla Bean Assassin | 0.006804 | 0.797537 | 0.790733 | 0.577582 | 1.368315 | 4.736667 |
| Ann | 0.020369 | 0.701523 | 0.681154 | 0.552803 | 1.233957 | 4.736250 |
| ... | ... | ... | ... | ... | ... | ... |
| La Fosse | 0.009678 | 0.637215 | 0.627537 | 0.592732 | 1.220269 | 4.278333 |
| Fort Point Pale Ale - Mosaic Dry Hopped | 0.042015 | 0.717882 | 0.675867 | 0.598913 | 1.274779 | 4.243333 |
| Second Fiddle | 0.015454 | 0.656083 | 0.640629 | 0.579161 | 1.219790 | 4.230000 |
| Last Snow | 0.006075 | 0.737137 | 0.731063 | 0.591708 | 1.322771 | 4.221667 |


In [27]:
for i in ['Chemtrailmix', 'Blessed', 'SR-71']:
    # Note: a is not reset between attributes, so the printed values are running totals.
    a = 0
    df = beer[beer['beer']==i]['tokenized_review']
    for j in attributes:
        for k in range(len(df)):
            if j in df.iloc[k]:
                a = a+1
        print('Beer:',i, 'Attribute:',j, 'Perc_of_reviews:',a/len(df))
    print()

Beer: Chemtrailmix Attribute: fruity Perc_of_reviews: 0.0
Beer: Chemtrailmix Attribute: complex Perc_of_reviews: 0.0
Beer: Chemtrailmix Attribute: balanced Perc_of_reviews: 0.20833333333333334

Beer: Blessed Attribute: fruity Perc_of_reviews: 0.0
Beer: Blessed Attribute: complex Perc_of_reviews: 0.16666666666666666
Beer: Blessed Attribute: balanced Perc_of_reviews: 0.3333333333333333

Beer: SR-71 Attribute: fruity Perc_of_reviews: 0.0
Beer: SR-71 Attribute: complex Perc_of_reviews: 0.0
Beer: SR-71 Attribute: balanced Perc_of_reviews: 0.041666666666666664



In [28]:
combine.loc[['Chemtrailmix', 'Blessed', 'SR-71','Flora Plum', 'Genealogy Of Morals - Bourbon Barrel-Aged',
             "Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel"]]

| beer | cos_score | evaluation_score | Sentiment_Score | spacy_cos_score | spacy_evaluation_score | overall_rating |
|---|---|---|---|---|---|---|
| Chemtrailmix | 0.022072 | 0.799685 | 0.777613 | 0.590003 | 1.367616 | 4.762083 |
| Blessed | 0.024476 | 0.800676 | 0.7762 | 0.596942 | 1.373142 | 4.751667 |
| SR-71 | 0.006014 | 0.579177 | 0.573163 | 0.525364 | 1.098526 | 4.745833 |
| Flora Plum | 0.037117 | 0.927026 | 0.889908 | 0.628026 | 1.517935 | 4.548333 |
| Genealogy Of Morals - Bourbon Barrel-Aged | 0.01693 | 0.913701 | 0.896771 | 0.612697 | 1.509468 | 4.550417 |
| Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel | 0.022262 | 0.859145 | 0.836883 | 0.595648 | 1.432531 | 4.52375 |


**Conclusions:** 

If we recommend by highest rating alone, the top 3 beers are 'Chemtrailmix', 'Blessed', and 'SR-71'. However, reviews of 'Chemtrailmix' don't mention 'fruity' or 'complex', reviews of 'Blessed' don't mention 'fruity', and reviews of 'SR-71' mention almost none of 'fruity', 'complex', and 'balanced'. So we can conclude that these three products don't meet the customer's attribute requirements.

Based on cosine similarity and sentiment analysis, we would recommend 'Flora Plum', 'Genealogy Of Morals - Bourbon Barrel-Aged', and "Hunahpu's Imperial Stout - Laird's Apple Brandy Barrel" to customers who consider 'fruity', 'complex', and 'balanced' important features. Although these three beers have somewhat lower overall ratings (about 4.52 to 4.55, versus 4.75 to 4.76 for the top-rated beers), their reviews more often discuss whether the beers are fruity, complex, and/or balanced. Their average sentiment scores (about 0.84 to 0.90) are also higher than those of the three top-rated beers (about 0.57 to 0.78), which suggests that reviewers praise them more.

Therefore, we would make recommendations to a customer using cosine similarity and sentiment analysis rather than ratings alone.