# Recommendation system - Craft Beers
## Based on scraped data 

This Jupyter notebook shows how to build a simple recommendation system from data scraped from Untappd.com, an online app for discovering and rating craft beers. The notebook uses two different recommendation approaches. The first is based on a weighted average of the ratings of each beer: the user enters the beer style they like most (and an alcohol level), and the system returns the 10 highest-scoring beers of that style. Although the rating on the app is already weighted by the number of reviewers, the notebook computes its own weighted score. The second approach is content-based: it compares beers through a TF-IDF representation of a short description (beer style and ABV) and returns the beers most similar to one the user names.

In [1]:
import pandas as pd
import numpy as np
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [2]:
beers = pd.read_csv('All Beers.csv', names = ['Beer ID','Brewery', 'Beer name', 'Beer Style', 'Beer Description', 'ABV', 'IBU', 'No Raters', 'Rating', 'Date Added'])

In [3]:
brew = pd.read_csv('Breweries.csv',names = ['ID','Brewery', 'Country', 'N Beers', 'No Raters', 'Rating', 'url', 'img_url', 'Brewery type', 'Address'])

In [4]:
brew = brew.drop(columns = ['ID'])

In [5]:
brew.head()

Unnamed: 0,Brewery,Country,N Beers,No Raters,Rating,url,img_url,Brewery type,Address
0,Juguetes Perdidos,Argentina,38,1119,3.961,https://untappd.com/JuguetesPerdidos,https://untappd.akamaized.net/site/brewery_log...,Micro Brewery,"Caseros, Buenos Aires, Buenos Aires Argentina"
1,Manush,Argentina,30,1089,3.614,https://untappd.com/w/manush/96147,https://untappd.akamaized.net/site/brewery_log...,Brew Pub,"San Carlos de Bariloche, Río Negro Argentina"
2,La Zorra,Argentina,26,1008,3.549,https://untappd.com/w/la-zorra/186088,https://untappd.akamaized.net/site/brewery_log...,Micro Brewery,Argentina
3,Cervecería Antares,Argentina,69,9740,3.442,https://untappd.com/CerveceraAntares,https://untappd.akamaized.net/site/brewery_log...,Micro Brewery,"Mar del Plata, Buenos Aires Argentina"
4,Berlina Patagonia Brewery,Argentina,53,2425,3.43,https://untappd.com/w/berlina-patagonia-brewer...,https://untappd.akamaized.net/site/brewery_log...,Micro Brewery,"San Carlos de Bariloche, Río Negro Argentina"


In [6]:
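# remove whitespace at the beginning and the end of the string
# (regex=True is needed in pandas >= 2.0, where str.replace defaults to literal matching)
beers['Brewery'] = beers['Brewery'].str.replace(r'^ ', '', regex=True)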

In [7]:
beers['Brewery'] = beers['Brewery'].str.replace(r' $', '', regex=True)

In [8]:
brew['Brewery'] = brew['Brewery'].str.replace(r'^ ', '', regex=True)
brew['Brewery'] = brew['Brewery'].str.replace(r' $', '', regex=True)

In [10]:
beers1 = pd.merge(beers, brew, on = 'Brewery', how = 'left')
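
A left merge on a free-text key such as the brewery name can silently leave beers unmatched, or duplicate rows if a brewery name appears twice in the brewery table. The sketch below uses made-up toy frames (`beers_toy` and `brew_toy` are hypothetical, not the scraped data) to show how the `indicator` and `validate` parameters of `pd.merge` can surface both problems.

```python
import pandas as pd

# Toy stand-ins for the scraped tables (values are illustrative only)
beers_toy = pd.DataFrame({
    'Brewery': ['Manush', 'Manush', 'Unknown Brewery'],
    'Beer name': ['Pils', 'Stout', 'Mystery Ale'],
    'Rating': [3.6, 3.8, 3.2],
})
brew_toy = pd.DataFrame({
    'Brewery': ['Manush', 'La Zorra'],
    'Country': ['Argentina', 'Argentina'],
    'Rating': [3.614, 3.549],
})

merged = pd.merge(
    beers_toy, brew_toy,
    on='Brewery', how='left',
    validate='many_to_one',   # raises if a brewery name is duplicated in brew_toy
    indicator=True,           # adds a '_merge' column: 'both' or 'left_only'
)

# Beers whose brewery was not found in the brewery table
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['Brewery', 'Beer name']])
```

Columns that exist in both frames, such as `Rating` here, come back with the `_x` and `_y` suffixes, as in the `beers1` output of this notebook.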

In [15]:
beers1.head(1)

Unnamed: 0,Beer ID,Brewery,Beer name,Beer Style,Beer Description,ABV,IBU,No Raters_x,Rating_x,Date Added,Country,N Beers,No Raters_y,Brewery type,Address
0,1649744,Juguetes Perdidos,Jamaica Dubbel,Belgian Dubbel,Clásica Dubbel de Abadía con adición de granos...,7,19,72,4.1,07/17/16,Argentina,38,1119,Micro Brewery,"Caseros, Buenos Aires, Buenos Aires Argentina"


In [14]:
beers1 = beers1.drop(columns = ['Rating_y','url','img_url'])

In [4]:
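# assigning Series.dropna() back to the column is a no-op (index alignment restores the NaNs),
# so drop the rows with a missing rating from the DataFrame itself
beers = beers.dropna(subset=['Rating'])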

In [5]:
beers['Rating'] = beers['Rating'].str.replace('N/A','0')

In [6]:
beers['Rating'] = beers['Rating'].str.replace('  N/A ','0')

In [7]:
beers['Rating'] = beers['Rating'].astype(float)

In [8]:
beers['No Raters'] = beers['No Raters'].str.replace(' Rating','0')

In [9]:
beers['No Raters'] = beers['No Raters'].str.replace(',','')

In [10]:
beers['No Raters'] = beers['No Raters'].astype(int)

In [22]:
# blank out 'N/A' and remove spaces, then convert; values that cannot be parsed become NaN
beers['ABV'] = beers['ABV'].str.replace('N/A','')
beers['ABV'] = beers['ABV'].str.replace(' ','')
beers['ABV'] = pd.to_numeric(beers['ABV'], errors='coerce')

In [11]:
beers = beers[beers['No Raters']>=20]

In [12]:
# calculate C, the mean rating across all beers
C = beers['Rating'].mean()
print('The mean rating across all beers is', str(C)[0:5])

The mean rating across all beers is 3.574


In [13]:
# calculate m, the minimum number of raters a beer needs to be included in the chart
m = beers['No Raters'].quantile(0.50)
print('The minimum number of raters required to be listed is', m)

The minimum number of raters required to be listed is 203.0


In [24]:
# get beers that have at least m raters
SR_data = beers.copy().loc[beers['No Raters'] >= m]
print(str(SR_data.shape[0]) + ' beers can be included in the chart')

16957 beers can be included in the chart


In [25]:
# create a function that calculates the weighted rating for each beer
def weighted_review(x, m=m, C=C):
    # v is the number of raters of a particular beer
    v = x['No Raters']
    # R is the average rating of that beer
    R = x['Rating']
    # weighted rating: R pulled towards the global mean C when v is small relative to m
    WR = (v/(v+m) * R) + (m/(m+v) * C)
    # return weighted rating
    return WR
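
To make the formula concrete, here is a minimal sketch that evaluates it for two beers shown in this notebook's `SR_data.head()` output. The helper name `weighted_rating` is introduced here for illustration only. `m` and `C` are the values as printed in the notebook, and because `C` is truncated to three decimals there, the results only approximately match the `weighted_score` column.

```python
import pandas as pd

def weighted_rating(v, R, m, C):
    """Shrink the rating R (from v raters) towards the global mean C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Values as printed in the notebook (C is truncated to 3 decimals there)
m, C = 203.0, 3.574

# One beer: 219 raters, rating 3.7
wr = weighted_rating(219, 3.7, m, C)
print(round(wr, 3))

# Same formula applied to whole columns at once, without DataFrame.apply
toy = pd.DataFrame({'No Raters': [219, 1287], 'Rating': [3.7, 3.55]})
toy['weighted_score'] = weighted_rating(toy['No Raters'], toy['Rating'], m, C)
print(toy)
```

A beer with few raters (219, close to `m`) is pulled roughly halfway towards the mean, while a beer with 1287 raters keeps a score close to its own rating.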

In [26]:
# create a new column called 'weighted_score' to store this value
SR_data['weighted_score'] = SR_data.apply(weighted_review, axis=1)

In [27]:
SR_data.head()

Unnamed: 0,Beer ID,Brewery,Beer name,Beer Style,Beer Description,ABV,IBU,No Raters,Rating,Date Added,weighted_score
0,1028555,La Zorra,IPA,IPA - American,Read Less,6.8,45,219,3.7,03/29/15,3.639593
0,12382,Cervecería Antares,Barleywine,Barleywine - American,Nuestra cerveza de mayor graduación alcohólica...,10.0,50,1287,3.55,12/02/10,3.553328
1,12375,Cervecería Antares,Scotch,Scotch Ale / Wee Heavy,Read Less,6.0,18,1231,3.33,12/02/10,3.364601
2,12372,Cervecería Antares,Kölsch,Kölsch,Read Less,5.0,22,1236,3.25,12/02/10,3.295767
3,12383,Cervecería Antares,Imperial Stout,Stout - Imperial / Double,Catalina la Grande amaba las emociones fuertes...,8.5,36,1183,3.59,12/02/10,3.587719


In [33]:
# filter beers by style and ABV, then show the 10 best according to the weighted score

# input the beer style to select, or 'all' for every style
b_type = str(input('Insert beer type or all types: '))
# input the minimum alcohol percentage (ABV)
abv = str(input('Insert alcohol percentage : '))
abv = float(abv)
# if the beer style is 'all'
if b_type == 'all':
    # only filter on ABV, keeping beers between abv and abv + 1
    beer_data = SR_data[(SR_data['ABV'] >= abv)&(SR_data['ABV'] <= abv+1)]
else:
    # otherwise filter on both beer style and ABV
    beer_data = SR_data.loc[(SR_data['Beer Style'] == b_type) & (SR_data['ABV'] >= abv) &(SR_data['ABV'] <= abv+1),:]

# sort beers by weighted score
beer_data = beer_data.sort_values('weighted_score', ascending=False)

# show the 10 top-rated beers for that style and ABV range
beer_data[['Brewery','Beer name','Beer Style','Beer Description','ABV','IBU','No Raters','Rating','weighted_score']].head(10)

Insert beer type or all types: Stout - Milk / Sweet
Insert alcohol percentage : 5


Unnamed: 0,Brewery,Beer name,Beer Style,Beer Description,ABV,IBU,No Raters,Rating,weighted_score
10,Tree House Brewing Company,That's What She Said,Stout - Milk / Sweet,"A sneakily complex beer, ""TWSS"" exhibits flavo...",5.6,28.0,16725,4.16,4.152978
5,BKS Artisan Ales,Holstein (Maple),Stout - Milk / Sweet,Read Less,5.64,,328,4.34,4.047322
0,Thirsty Crow,Vanilla Milk Stout,Stout - Milk / Sweet,Arguably Wagga’s most famous export since the ...,5.2,,1202,4.08,4.006952
2,Microbrasserie Vox Populi,Vox Stout (Milkshake),Stout - Milk / Sweet,"Stout au lactose, riche, crémeuse et vanillée ...",5.5,25.0,1658,4.01,3.962487
12,Wiper And True,Hard Shake Milk Stout,Stout - Milk / Sweet,Read Less,6.0,18.0,945,4.02,3.941209
0,Brewlok Craft & Classic Brewery,Baba Yaga,Stout - Milk / Sweet,Milk stout with the addition of large portions...,6.0,,1198,3.96,3.904131
8,8 Wired Brewing Co.,Flat White,Stout - Milk / Sweet,"A rather traditional milk stout, taken up a no...",5.5,,3805,3.92,3.902497
1,Ras L'Bock,Señor Cacao,Stout - Milk / Sweet,Deux péchés mignons rasemblés pour assouvir vo...,6.0,22.0,663,4.0,3.90024
0,Wiper And True,Milk Shake Stout,Stout - Milk / Sweet,A milk stout uses sugar made from cows milk to...,5.6,15.0,8287,3.89,3.882454
3,La Patrona,Sta. Tomasa,Stout - Milk / Sweet,Read Less,6.0,30.0,279,4.09,3.872859
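
The interactive cell above can also be wrapped in a function, which makes the filter easier to reuse and test. This is only a sketch: `top_beers` and its parameters are names introduced here, and the toy frame stands in for `SR_data` with hypothetical values.

```python
import pandas as pd

def top_beers(data, style='all', min_abv=0.0, n=10):
    """Return the n best-scored beers of a style with ABV in [min_abv, min_abv + 1]."""
    mask = data['ABV'].between(min_abv, min_abv + 1)   # inclusive on both ends
    if style != 'all':
        mask &= data['Beer Style'] == style
    return data[mask].sort_values('weighted_score', ascending=False).head(n)

# Toy data (hypothetical values) standing in for SR_data
toy = pd.DataFrame({
    'Beer name': ['A', 'B', 'C', 'D'],
    'Beer Style': ['Stout - Milk / Sweet', 'Stout - Milk / Sweet',
                   'IPA - American', 'Stout - Milk / Sweet'],
    'ABV': [5.6, 6.0, 5.5, 8.0],
    'weighted_score': [3.9, 4.1, 4.3, 4.5],
})

result = top_beers(toy, style='Stout - Milk / Sweet', min_abv=5)
print(result[['Beer name', 'ABV', 'weighted_score']])
```

As in the cell above, the ABV window is inclusive at both ends, so a beer at exactly 6.0% is returned for an input of 5.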


In [37]:
# build a short description by combining the beer style and the ABV

# convert the beer style and ABV columns to strings
cols = ['Beer Style', 'ABV']
for col in cols: 
    beers[col] = beers[col].astype(str)

beers['description'] = beers['Beer Style'] + ' ' + beers['ABV']

In [43]:
beers = beers.reset_index()
beers = beers.drop(columns=['index'])

In [73]:
beers.head(7)

Unnamed: 0,Beer ID,Brewery,Beer name,Beer Style,Beer Description,ABV,IBU,No Raters,Rating,Date Added,description
0,1649744,Juguetes Perdidos,Jamaica Dubbel,Belgian Dubbel,Clásica Dubbel de Abadía con adición de granos...,7.0,19,72,4.1,07/17/16,Belgian Dubbel 7.0
1,1967868,Juguetes Perdidos,Saison Maracuyá,Saison / Farmhouse Ale,Maracuyá / Passion Fruit Infused Read Less,7.0,19,57,3.93,02/25/17,Saison / Farmhouse Ale 7.0
2,1809101,Juguetes Perdidos,Imperial Saison Chardonnay Barrel,Saison / Farmhouse Ale,"Spiced Saison, High Gravity, aged in white win...",11.7,25,60,3.76,11/05/16,Saison / Farmhouse Ale 11.7
3,1277658,Juguetes Perdidos,Good Bye Lenin! - Baltic Porter,Porter - Baltic,"Chocolate, café y un tenor alcohólico importan...",9.0,25,52,3.99,10/17/15,Porter - Baltic 9.0
4,1227273,Juguetes Perdidos,Hop de Lis - Belgian IPA,IPA - Belgian,American IPA fermentada con levadura Belga Rea...,6.6,66,47,4.25,09/06/15,IPA - Belgian 6.6
5,1677114,Juguetes Perdidos,Kill Your IPA,IPA - Imperial / Double,Imperial IPA de Sir Hopper exquisitamente blen...,9.5,90,52,4.09,08/05/16,IPA - Imperial / Double 9.5
6,2138002,Juguetes Perdidos,Monster Ale - Del 1 al 1000,IPA - Imperial / Double,Cerveza para celebrar los 1000 socios de la As...,12.3,1000,50,3.61,06/10/17,IPA - Imperial / Double 12.3


In [107]:
# reduce dataset: keep only beers at or above the 95th percentile of rater counts
m = beers['No Raters'].quantile(0.95)
CR_data = beers.copy().loc[beers['No Raters'] >= m]
#CR_data = beers.copy()
#CR_data = CR_data[CR_data['No Raters']]
CR_data = CR_data.reset_index(drop=True)

# create matrix with descriptions
tfidf = TfidfVectorizer(stop_words='english')
CR_data['description']= CR_data['description'].fillna('')
tfidf_matrix = tfidf.fit_transform(CR_data['description'])

# calculate similarity 
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

In [111]:
def get_recommendations(name, b_type = 'all', cosine_sim=cosine_sim):
    # map each beer name to its row index, keeping the first row for duplicated names
    indices = pd.Series(CR_data.index, index=CR_data['Beer name'])
    indices = indices[~indices.index.duplicated(keep='first')]
    
    # get the index of the beer that matches the name
    idx = indices[name]

    # get the pairwise similarity scores of all beers with that beer
    sim_scores = list(enumerate(cosine_sim[idx]))

    # sort the beers by similarity score, most similar first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    # get the indices of the beers
    res_indices = [i[0] for i in sim_scores]
    
    # get name, style, brewery and description of the beers
    sim_res = CR_data[['Beer name','Beer Style','Brewery','description']].iloc[res_indices]
    
    # if a specific beer style was requested
    if b_type != 'all':
        # only show the beers of that style
        r = sim_res.loc[sim_res['Beer Style'] == b_type, :].head(10)
    else:
        # otherwise show all styles
        r = sim_res[['Beer name','Brewery','description']].head(10)

    # return the 10 most similar beers (the queried beer itself is included in the list)
    return r
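
Because a beer is always maximally similar to itself, the queried beer shows up in its own recommendations. A possible variant, sketched below on toy data, skips the queried beer and resolves duplicated names to their first row. The function `recommend`, the toy frame and the hand-written similarity matrix are all hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def recommend(name, data, sim, n=10):
    """Return the n rows of `data` most similar to `name`, leaving out the beer itself."""
    # Map beer name -> row position, keeping the first row for duplicated names
    positions = pd.Series(np.arange(len(data)), index=data['Beer name'].to_numpy())
    positions = positions[~positions.index.duplicated(keep='first')]
    idx = positions[name]

    # Most similar first; a stable sort keeps the original row order among ties
    order = np.argsort(-sim[idx], kind='stable')
    order = order[order != idx][:n]
    return data.iloc[order]

# Toy data with a duplicated beer name and a hand-written similarity matrix
toy = pd.DataFrame({
    'Beer name': ['XPA', 'Pale One', 'IPA', 'IPA'],
    'Brewery': ['W', 'X', 'Y', 'Z'],
})
sim = np.array([
    [1.0, 0.8, 0.5, 0.5],
    [0.8, 1.0, 0.3, 0.3],
    [0.5, 0.3, 1.0, 1.0],
    [0.5, 0.3, 1.0, 1.0],
])

print(recommend('XPA', toy, sim, n=2))
```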

In [112]:
# get recommendations for beers similar to a given beer
name = str(input('Insert the name of the beer: '))
b_type = str(input('Insert the beer style: '))

print('If you enjoyed ', name, 'and', b_type, 'kind of beers you could try:' )
get_recommendations(name, b_type=b_type)

Insert the name of the beer: XPA
Insert the beer style: all
If you enjoyed  XPA and all kind of beers you could try:


Unnamed: 0,Beer name,Brewery,description
16,XPA,Balter Brewing Company,Pale Ale - International 5.0
899,961 Beer Lebanese Pale Ale,Gravity Brewing Sal,Pale Ale - International 6.3
1134,Lorita Passionfruit Pale Ale,Amundsen Bryggeri,Pale Ale - International 4.7
1439,Shape Shifter,Dugges Bryggeri,Pale Ale - International 6.0
517,New World IPA,Northern Monk,IPA - International 6.2
537,Ramberget IPA,Buxton Brewery,IPA - International 7.2
787,Chieftain,Franciscan Well Brewery,IPA - International 5.5
822,O'Hara's Irish Pale Ale,O'Hara's Brewery (Carlow Brewing Company),IPA - International 5.2
876,Hitachino Nest Japanese Classic Ale,Kiuchi Brewery,IPA - International 7.0
961,Bird of Prey IPA,Het Uiltje,IPA - International 5.8
