### Beer Reviews Dataset

Please download the **[beer reviews](https://s3.amazonaws.com/demo-datasets/beer_reviews.tar.gz) dataset.**

(If unable to download, use the truncated dataset [here](truncated_beer_reviews.csv). Skip cells 2, 3, and 4.)

# Recommending Beers: A Content-Based Recommendation System

### Objective

**To recommend beers based on similarity of user profile to item profile (content-based recommender system)**

### Steps

  1. Map items and users into a feature space
  1. Predict ratings (or likes/dislikes) from the features
  
In this example, we predict which items to recommend (or not) in two ways: with the dot product and with linear regression.
  
**Item Profile**
![](figs/itemprofile.png)

**User Profile**
![](figs/userprofile.png)

**Mapping User Profile (`u`) and Item Profile (`i`) in Feature Space**

![](figs/map.png)

***In this example, we use the dot product instead of cosine similarity.***

![](figs/dotproduct.png)

(Cosine similarity ranges from -1 to 1: positive values mean the vectors point in similar directions, and negative values mean they point in opposite directions, i.e. they are more different from each other.

The dot product is unbounded: the more positive it is, the more similar the vectors; the more negative, the more different. Unlike cosine similarity, it also grows with the lengths of the vectors.)
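To make the difference concrete, here is a small NumPy sketch with hypothetical three-feature vectors (the feature meanings and values are made up for illustration). It computes both measures for a user who likes IPAs and dislikes stouts, and shows that scaling a vector changes the dot product but not the cosine similarity.

```python
import numpy as np

def dot_similarity(u, i):
    """Unnormalized similarity: sum of elementwise products."""
    return float(np.dot(u, i))

def cosine_similarity(u, i):
    """Dot product divided by the product of the vector norms, in [-1, 1]."""
    return float(np.dot(u, i) / (np.linalg.norm(u) * np.linalg.norm(i)))

# Hypothetical user profile over three features (low-abv, IPA, Stout):
# likes IPA, dislikes Stout.
u = np.array([0.0, 1.0, -1.0])
ipa = np.array([1.0, 1.0, 0.0])    # hypothetical low-abv IPA
stout = np.array([0.0, 0.0, 1.0])  # hypothetical stout

print(dot_similarity(u, ipa), dot_similarity(u, stout))  # 1.0 -1.0
print(round(cosine_similarity(u, ipa), 3), round(cosine_similarity(u, stout), 3))  # 0.5 -0.707

# Doubling the user vector doubles the dot product; cosine similarity is unchanged.
print(dot_similarity(2 * u, ipa), round(cosine_similarity(2 * u, ipa), 3))  # 2.0 0.5
```

Both measures rank the IPA above the stout; they differ only in whether vector length matters.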
 
***As an alternative, we also show how to predict ratings with linear regression.***

### Loading libraries and data

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from patsy import dmatrix
import seaborn as sb  # seaborn.apionly is deprecated; import seaborn directly
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
data = pd.read_csv("../recommendersystems_wids_prep/beer_reviews/beer_reviews.csv")

In [None]:
data.info()

In [None]:
print("We have {} reviews for {} beers from {} breweries, by {} drinkers.".format(
    len(data), data.beer_beerid.nunique(), data.brewery_id.nunique(), data.review_profilename.nunique()))

**Truncate data to speed up calculations:**

In [None]:
N = 150000
data = data.iloc[:N]  # keep only the first N reviews
print("We have {} reviews for {} beers from {} breweries, by {} drinkers.".format(
    len(data), data.beer_beerid.nunique(), data.brewery_id.nunique(), data.review_profilename.nunique()))

In [None]:
data.head(3)

#### Data exploration

Let's see how many unique values (categories) each column has.

(Load truncated_beer_reviews.csv here if not able to download the original dataset.)

In [None]:
data.info()

==> `review_profilename` and `beer_abv` have missing values

In [None]:
for col in data:
    print("{:20s}: {:7} uniques".format(col, data[col].nunique()))

**USERS**

`review_profilename` (there are 13964 users or reviewers)

**ITEMS**

`beer_beerid`  (there are 6420 beers reviewed)

**FEATURES**

- `brewery_id` (or `brewery_name`)
- `beer_abv` (% alcohol)
- `beer_style`

**RATINGS**

- `review_overall` (10 unique values: ratings range from 0 to 5, i.e. 0, 1, 1.5, 2, 2.5, etc.; see the rating distribution below)

*Other ratings can be used:*
- `review_aroma`
- `review_appearance`
- `review_palate`
- `review_taste`

Since the beer names are not unique, we will use the beer IDs. 

**Create lookup table from `beer_beerid` to `beer_name` (to easily track the beer's name):**

In [None]:
beer_names = data.groupby('beer_beerid').beer_name.first()  # only one name per ID

In [None]:
beer_names.head()

**Distribution of beer rating:**

In [None]:
sb.countplot(x=data.review_overall, color='steelblue')
sb.despine();

Also note that the beer-drinker pairs are not unique, as some people have filed multiple reviews of the same beer. In such cases, we take the average rating.
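As a minimal sketch of how duplicate reviews can be counted and averaged, here is a toy DataFrame with the notebook's column names (the reviewers and ratings are hypothetical). `groupby(...).size()` counts the reviews per (beer, drinker) pair, and `groupby(...).mean()` collapses each pair to one rating.

```python
import pandas as pd

# Hypothetical toy reviews: 'alice' reviewed beer 1 twice.
reviews = pd.DataFrame({
    "beer_beerid":        [1, 1, 2, 1],
    "review_profilename": ["alice", "alice", "alice", "bob"],
    "review_overall":     [4.0, 3.0, 5.0, 2.0],
})

keys = ["beer_beerid", "review_profilename"]

# Number of reviews per (beer, drinker) pair.
counts = reviews.groupby(keys).size()

# One rating per (beer, drinker) pair: the mean of any duplicates.
mean_ratings = reviews.groupby(keys).review_overall.mean()
print(mean_ratings)
```

Alice's two reviews of beer 1 (4.0 and 3.0) collapse into a single rating of 3.5.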

In [None]:
# number of reviews per (beer, drinker) pair
n_reviews = data.groupby(['beer_beerid', 'review_profilename']).size()
n_reviews.value_counts()

### Create feature matrix (or item profile)

==> `beer_features` matrix

![](figs/beerprofile.png)

#### `beer_abv`

We will categorize the `beer_abv` feature into bins.

Currently, there are 177 unique values for `beer_abv`:

In [None]:
data.beer_abv.nunique()

To categorize the `beer_abv` values:

1. round to whole numbers
2. clip values less than or equal to 4 to 4, and values greater than or equal to 10 to 10
3. leave values between 4 and 10 as they are
4. convert the values to strings (to prepare for patsy)
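The four steps above can be sketched on a handful of hypothetical ABV values. This version uses pandas `Series.clip`, which is equivalent to the two boolean-mask assignments used in the cells below. Note that pandas rounds exact halves to the nearest even number, so the toy values avoid `.5` ties.

```python
import numpy as np
import pandas as pd

min_bin, max_bin = 4, 10

# Hypothetical ABV values (% alcohol), including one missing value.
beer_abv = pd.Series([3.2, 4.6, 5.8, 7.4, 9.9, 12.7, np.nan])

# Steps 1-3: round to whole numbers, then clip to [min_bin, max_bin].
abv = beer_abv.round().clip(lower=min_bin, upper=max_bin)

# Step 4: convert to strings so patsy treats the values as categories.
# Missing values are dropped here, as in the notebook's own cell.
abv_cat = abv.dropna().astype(int).astype(str)
print(abv_cat.tolist())  # ['4', '5', '6', '7', '10', '10']
```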

In [None]:
min_bin, max_bin = 4, 10
abv = data.beer_abv.round()

In [None]:
abv[abv <= min_bin] = min_bin

In [None]:
abv[abv >= max_bin] = max_bin

In [None]:
abv.unique()

**Distribution before binning:**

In [None]:
sb.countplot(x=data.beer_abv, color='blue')
sb.despine();

**Distribution after binning:**

In [None]:
sb.countplot(x=abv, color='blue')
sb.despine();

We convert the numerical values to strings so that patsy treats them as categorical features.

In [None]:
data['beer_abv_cat'] = abv.dropna().astype(int).astype(str)

**`brewery_id`**

In [None]:
data['brewery_id_str'] = data.brewery_id.astype(str)

**Use `dmatrix` to create a design matrix:**

To learn more about patsy.dmatrix, see http://patsy.readthedocs.io/en/latest/
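As a rough illustration of what `dmatrix` produces, here is a sketch on two hypothetical toy columns. By default, patsy adds an `Intercept` column and uses treatment (dummy) coding, so each categorical factor contributes one 0/1 column per level except for a reference level. String levels are sorted as strings, so `'10'` sorts before `'4'` and becomes the reference level here.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data with two categorical (string) columns.
toy = pd.DataFrame({
    "beer_abv_cat":   ["4", "5", "5", "10"],
    "brewery_id_str": ["7", "7", "9", "9"],
})

X_toy = dmatrix("beer_abv_cat + brewery_id_str", data=toy, return_type="dataframe")

# Expect an Intercept column plus one indicator column per non-reference level.
print(X_toy.columns.tolist())
print(X_toy.values)
```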

In [None]:
X_abv_brew = dmatrix('beer_abv_cat + brewery_id_str', data=data.fillna(0), return_type='dataframe')

In [None]:
X_abv_brew.shape

In [None]:
X_abv_brew.head()

**`beer_style`**

Let's use each word in the beer style as a feature as well (e.g., "IPA") using the 'bag-of-words' representation.

In [None]:
max_features = 5000
cv = CountVectorizer(max_features=max_features)
X_style = cv.fit_transform(data.beer_style)

In [None]:
X_style

At this point we have:
- `beer_abv_cat` (7 features, in `X_abv_brew`)
- `brewery_id_str` (589 features, in `X_abv_brew`)
- `beer_style` (120 features, in `X_style` matrix)

==> a total of 716 features!

##### Putting all the features together to create the feature matrix (item profile)

In [None]:
X = np.hstack([X_abv_brew, X_style.toarray()])
y = data.review_overall
n_samples, n_features = X.shape

**Now, we have a feature matrix $X$ with ratings in $y$, containing both beers and drinkers.**

Since the features only describe beer characteristics, all rows for the same beer in $X$ are identical. So we can simply take each beer's first occurrence as its representation.
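Here is a small sketch of the "first occurrence" trick on hypothetical toy data. `np.unique(..., return_index=True)` returns the sorted unique IDs together with the position of each ID's first row, which can then be used to select one feature row per beer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 5 reviews of 3 beers, 2 features per row.
beer_ids = np.array([20, 10, 20, 30, 10])
X_rows = np.array([[1, 0],   # beer 20
                   [0, 1],   # beer 10
                   [1, 0],   # beer 20 again (same features)
                   [1, 1],   # beer 30
                   [0, 1]])  # beer 10 again

# uniques is sorted; idx holds the row index of each ID's first occurrence.
uniques, idx = np.unique(beer_ids, return_index=True)
beer_features_toy = pd.DataFrame(X_rows[idx, :], index=uniques)
print(idx)  # [1 0 3]
print(beer_features_toy)
```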

In [None]:
uniques, idx = np.unique(data.beer_beerid, return_index=True)  # idx: first row of each beer
beer_features = pd.DataFrame(X[idx, :], index=data.beer_beerid.iloc[idx])
print(beer_features.shape)
beer_features.head(2)

(Recall that we have 6420 unique beers; see the data exploration section above.)

**The `beer_features` matrix is sparse (mostly 0's, with 1's marking each beer's features), with 6420 items and 716 features.**

### Create user profile

==> `reviewer_features`

![](figs/beerreviewerprofile.png)

Note that this matrix must use the same features (columns) as the item feature matrix.

**Steps:**

- Take the feature matrix `X` created above and multiply each row by its review's rating (broadcasting the rating across all features).

For example,

            low-abv  high-abv  IPA  Stout  Pilsner  rating  reviewer
    beer 1     1        0       0     0       1       2.0   reviewer1
    beer 2     0        1       1     0       0       4.5   reviewer2
    
...will become...

            low-abv  high-abv  IPA  Stout  Pilsner  reviewer
    beer 1    2.0       0       0     0      2.0    reviewer1
    beer 2     0       4.5     4.5    0       0     reviewer2
    
- Subtract 3 from each rating before broadcasting, so that bad ratings are negative and good ratings are positive. This compensates for all the missing entries, which automatically get a rating of zero (and are thus treated as average instead of terribly bad).
- Average all ratings per user to get a user profile, then normalize each profile (row).
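The steps above can be sketched on a toy version of the example table, using its hypothetical features and reviewers plus one extra made-up review (beer 3) so that `reviewer1` has two reviews to average. The sketch shifts the ratings by 3, broadcasts them across each row, and averages per reviewer. It leaves out the notebook's final normalization step.

```python
import numpy as np
import pandas as pd

features = ["low-abv", "high-abv", "IPA", "Stout", "Pilsner"]

# Hypothetical toy reviews: one row per review, one column per feature.
X_toy = np.array([[1, 0, 0, 0, 1],   # beer 1, reviewed by reviewer1
                  [0, 1, 1, 0, 0],   # beer 2, reviewed by reviewer2
                  [0, 1, 0, 1, 0]])  # beer 3 (made up), reviewed by reviewer1
ratings = np.array([2.0, 4.5, 4.0])
reviewers = ["reviewer1", "reviewer2", "reviewer1"]

# Shift ratings so 3 is neutral, then broadcast each rating across its row.
weighted = X_toy * (ratings - 3).reshape(-1, 1)

# Average the weighted rows per reviewer to get one profile per user.
profiles = (pd.DataFrame(weighted, columns=features)
              .assign(reviewer=reviewers)
              .groupby("reviewer")[features]
              .mean())
print(profiles)
```

`reviewer1` ends up with negative weights for low-abv and Pilsner (from the 2.0 rating) and positive weights for high-abv and Stout (from the 4.0 rating).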

In [None]:
# shift ratings so that 3 is neutral, then scale each row of X by its review's rating
reviewer_features = pd.DataFrame(X * (data.review_overall.values - 3).reshape(n_samples, 1))

In [None]:
reviewer_features.head()

In [None]:
reviewer_features['review_profilename'] = data.review_profilename

In [None]:
reviewer_features.head()

In [None]:
reviewer_features = reviewer_features.groupby('review_profilename')[list(range(n_features))].mean()

In [None]:
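# normalize each profile by the sum of its absolute values; dividing by the plain sum
# would flip the signs of users whose shifted ratings sum to a negative number
reviewer_features = reviewer_features.divide(reviewer_features.abs().sum(axis=1), axis=0)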

In [None]:
reviewer_features.head()

Now that we have `beer_features` and `reviewer_features`, we can compute the similarity between a user's vector and each beer's vector using the dot product.

![](figs/matrices.png)
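On a toy scale, scoring every beer for one user is a single matrix-vector product: row `j` of the item matrix dotted with the user vector gives beer `j`'s score. The beers, features, and profile values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical item profiles (rows = beers) and one user profile over the
# same three features, e.g. low-abv, IPA, Stout.
M = np.array([[1, 1, 0],   # beer "A"
              [0, 0, 1],   # beer "B"
              [1, 0, 0]])  # beer "C"
v = np.array([0.2, 0.5, -0.3])

# One matrix-vector product scores every beer at once: scores[j] = M[j] . v
scores = pd.Series(M.dot(v), index=["A", "B", "C"]).sort_values(ascending=False)
print(scores)
```

Beer A (low-abv IPA) scores highest at about 0.7, and the stout (B) scores lowest at -0.3, so A would be recommended first.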

### Predict recommendation by dot product

Let's pick an arbitrary user.

In [None]:
user = 'WesWes'

Get the user's vector and all the beer vectors, and take their dot products.

In [None]:
v = reviewer_features.loc[user].values

In [None]:
M = beer_features.values

In [None]:
pred = M.dot(v)

In [None]:
pred

In [None]:
pred = pd.Series(pred, index=beer_features.index, name="predictions").sort_values(ascending=False, inplace=False)

Replace the beer IDs (the index) with the beer names:

In [None]:
pred_name = pd.Series(pred.values, beer_names[pred.index], name=pred.name)

In [None]:
print("Top recommendations for {}:".format(user))
print(pred_name.head())

In [None]:
print("Bottom recommendations (don't drink these, {}!)".format(user))
print(pred_name.tail())

How do they compare with his actual reviews? (Note that we take the mean, since some drinker-beer pairs have multiple reviews.)

In [None]:
user_reviews = data[data.review_profilename == user].groupby('beer_beerid') \
                                                    .review_overall.mean() \
                                                    .sort_values(ascending=False, inplace=False)

In [None]:
user_reviews_name = pd.Series(user_reviews.values, index=beer_names[user_reviews.index])

In [None]:
print("Top reviewed by {}:".format(user))
print(user_reviews_name.head())

In [None]:
print("Bottom reviewed by {}:".format(user))
print(user_reviews_name.tail())

Let's compare the predicted scores with the actual ratings for all the beers this user reviewed.

In [None]:
f = sb.regplot(x=user_reviews, y=pred.loc[user_reviews.index], scatter_kws=dict(alpha=.4))

Not bad. Note that, for readability's sake, we completely ignored overfitting and cross-validation.

### Alternatively, predict recommendations using linear regression

(Here we use ridge regression, i.e. linear regression with an L2 penalty on the coefficients.)

Given this feature matrix and the user's reviews, we could also use a simple linear regression to predict the user's rating for a beer. We limit our dataset to the reviews of one user only, and then feed those into the model.
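For intuition about what `Ridge` computes, here is a sketch on synthetic data (the sizes, true weights, and `alpha` are assumptions chosen for illustration). Ridge minimizes $\|y - Xw - b\|^2 + \alpha\|w\|^2$ without penalizing the intercept $b$. For dense inputs, the solution equals the closed form on centered data, which the sketch compares against scikit-learn's fitted coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)

# Hypothetical toy data: 20 "reviews" with 5 binary features each.
X_toy = rng.randint(0, 2, size=(20, 5)).astype(float)
y_toy = X_toy @ np.array([1.0, -0.5, 0.0, 0.8, 0.3]) + 3 + rng.normal(scale=0.1, size=20)

alpha = 1.0
# Center X and y, solve (Xc^T Xc + alpha * I) w = Xc^T yc,
# then recover the unpenalized intercept b = mean(y) - mean(X) . w
X_mean, y_mean = X_toy.mean(axis=0), y_toy.mean()
Xc, yc = X_toy - X_mean, y_toy - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_toy.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w

model_toy = Ridge(alpha=alpha).fit(X_toy, y_toy)
print(np.allclose(w, model_toy.coef_), np.isclose(b, model_toy.intercept_))
```

The `alpha * I` term shrinks the weights toward zero, which helps when a user has reviewed fewer beers than there are features.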

In [None]:
print("Filtering reviews by {}:".format(user))
idx = (data.review_profilename == user).values  # boolean mask: reviews by this user
X_user, y_user = X[idx, :], y[idx]
print(X_user.shape, y_user.shape)

In [None]:
model = Ridge()
model.fit(X_user, y_user)
print(cross_val_score(model, X_user, y_user, scoring='neg_mean_absolute_error'))
print(cross_val_score(model, X_user, y_user, scoring='r2'))

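Pretty bad cross-validation scores, but keep in mind that we have only a handful of reviews. (`neg_mean_absolute_error` is the negated mean absolute error, so values closer to zero are better.)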
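With only a few reviews and hundreds of features, a model can fit its training reviews almost perfectly while still predicting held-out reviews poorly. The sketch below reproduces this on synthetic data. The sizes, `alpha`, and random ratings are assumptions chosen for illustration, and the ratings are pure noise, so any in-sample fit is overfitting.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)

# Hypothetical toy data mimicking one user: few reviews, many binary features,
# and ratings that are pure noise (unrelated to the features).
n_reviews, n_feats = 30, 200
X_toy = rng.randint(0, 2, size=(n_reviews, n_feats)).astype(float)
y_toy = rng.choice([2.0, 3.0, 3.5, 4.0, 4.5], size=n_reviews)

model_toy = Ridge(alpha=1.0).fit(X_toy, y_toy)
train_r2 = model_toy.score(X_toy, y_toy)  # in-sample R^2
cv_r2 = cross_val_score(Ridge(alpha=1.0), X_toy, y_toy, cv=5, scoring="r2").mean()
print("train R^2: {:.2f}, cross-validated R^2: {:.2f}".format(train_r2, cv_r2))
```

The in-sample R² is close to 1 even though there is nothing to learn. This is why a scatter plot of predictions against the very reviews the model was trained on can look much better than the cross-validation scores.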

In [None]:
uniques, idx = np.unique(data.beer_beerid, return_index=True)  # first row of each beer
pred = pd.Series(model.predict(X[idx, :]), index=data.beer_beerid.iloc[idx], name="predictions") \
    .sort_values(ascending=False)
pred_name = pd.Series(pred.values, beer_names[pred.index], name="predictions")
print("Top recommendations for {}:".format(user))
print(pred_name[:5])

Let's see how those relate to his actual reviews.

In [None]:
f = sb.regplot(x=user_reviews, y=pred.loc[user_reviews.index], scatter_kws=dict(alpha=.4))

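Pretty spectacular, but beware: the model was fit on exactly these reviews, so this is an in-sample comparison. The poor cross-validation scores above are a more honest estimate of how well it predicts unseen beers.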

Just for the fun of it, how will this user rate a random beer?

In [None]:
beer = np.random.choice(beer_names.index)
beer_idx = (data.beer_beerid == beer).values
X_beer = X[beer_idx, :][0]  # just take the first entry
print("{} will give beer {} a rating of {:.1f}.".format(user, beer_names[beer], model.predict(X_beer.reshape(1, -1))[0]))