# Wine Reviews Recommendation Systems

**Prepared by Elizabeth Webster**

*November 2022*

## Overview

Create a recommendation system for Wine Enthusiast's tasters using Surprise.

## Business Problem

This project is being prepared for a small winery in Walla Walla.  They are just starting out and currently only producing a few wines. Their wine maker wants to gain insight on how to generate wines that will be rated highly.

In this section of the project, I will create a recommendation system for Wine Enthusiast's tasters

## Dataset

The data that I am using comes from Wine Enthusiast and includes information on 130,000 different wines.  This information includes the description, variety, winery, country, taster name, etc.

# Data Understanding

In [16]:
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from surprise import Reader, Dataset
from surprise.model_selection import cross_validate
from surprise.prediction_algorithms import SVD
from surprise.prediction_algorithms import KNNWithMeans, KNNBasic, KNNBaseline
from surprise.model_selection import GridSearchCV
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('Data/winemag-data-130k-v2.csv.zip', encoding='latin-1', index_col=0)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 129971 entries, 0 to 129970
Data columns (total 13 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   country                129908 non-null  object 
 1   description            129971 non-null  object 
 2   designation            92506 non-null   object 
 3   points                 129971 non-null  int64  
 4   price                  120975 non-null  float64
 5   province               129908 non-null  object 
 6   region_1               108724 non-null  object 
 7   region_2               50511 non-null   object 
 8   taster_name            103727 non-null  object 
 9   taster_twitter_handle  98758 non-null   object 
 10  title                  129971 non-null  object 
 11  variety                129970 non-null  object 
 12  winery                 129971 non-null  object 
dtypes: float64(1), int64(1), object(11)
memory usage: 13.9+ MB


In [8]:
rec_df = df.loc[:, ('points', 'taster_name', 'variety')]
rec_df.head()

Unnamed: 0,points,taster_name,variety
0,87,Kerin OâKeefe,White Blend
1,87,Roger Voss,Portuguese Red
2,87,Paul Gregutt,Pinot Gris
3,87,Alexander Peartree,Riesling
4,87,Paul Gregutt,Pinot Noir


In [32]:
reader = Reader(rating_scale=(80,100))
data = Dataset.load_from_df(rec_df[['taster_name', 'variety', 'points']],reader)

In [33]:
dataset = data.build_full_trainset()
print('Number of users: ', dataset.n_users, '\n')
print('Number of items: ', dataset.n_items)

Number of users:  20 

Number of items:  708


In [34]:
# Perform a gridsearch with SVD
params = {'n_factors': [20, 50, 100],
         'reg_all': [0.02, 0.05, 0.1]}
g_s_svd = GridSearchCV(SVD,param_grid=params,n_jobs=-1)
g_s_svd.fit(data)

In [35]:
print(g_s_svd.best_score)
print(g_s_svd.best_params)

{'rmse': 2.7940448582805972, 'mae': 2.2520283451485734}
{'rmse': {'n_factors': 50, 'reg_all': 0.02}, 'mae': {'n_factors': 50, 'reg_all': 0.02}}


In [36]:
# cross validating with KNNBasic
knn_basic = KNNBasic(sim_options={'name':'pearson', 'user_based':True})
cv_knn_basic = cross_validate(knn_basic, data, n_jobs=-1)

In [37]:
for i in cv_knn_basic.items():
    print(i)
print('-----------------------')
print(np.mean(cv_knn_basic['test_rmse']))

('test_rmse', array([2.81076303, 2.82157005, 2.8279562 , 2.82793036, 2.82681394]))
('test_mae', array([2.27006689, 2.27505696, 2.28426054, 2.2797411 , 2.28109412]))
('fit_time', (8.557446002960205, 7.9456799030303955, 8.928771018981934, 8.28714919090271, 8.065242052078247))
('test_time', (44.363444805145264, 44.0202751159668, 44.713330030441284, 44.152297019958496, 44.80048394203186))
-----------------------
2.823006716046968


In [38]:
# cross validating with KNNBaseline
knn_baseline = KNNBaseline(sim_options={'name':'pearson', 'user_based':True})
cv_knn_baseline = cross_validate(knn_baseline,data)

Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson similarity matrix...
Done computing similarity matrix.


In [39]:
for i in cv_knn_baseline.items():
    print(i)

np.mean(cv_knn_baseline['test_rmse'])

('test_rmse', array([2.76787138, 2.7785566 , 2.75941271, 2.75563302, 2.78211822]))
('test_mae', array([2.21858844, 2.21818447, 2.2059972 , 2.2043417 , 2.22990973]))
('fit_time', (8.567604064941406, 8.741869688034058, 8.596271991729736, 8.580995082855225, 8.775190114974976))
('test_time', (42.5500853061676, 44.39785289764404, 42.733662128448486, 42.80648398399353, 43.22396278381348))


2.768718385883238

## Making Predictions

In [88]:
svd = SVD(n_factors= 50, reg_all=0.02)
svd.fit(dataset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fd5f9fa83d0>

In [66]:
def predict_score(dataset, taster_name, variety):
    inner_uid = dataset.to_inner_uid(ruid=taster_name)
    inner_iid = dataset.to_inner_iid(riid=variety)
    estimated_score = svd.predict(inner_uid, inner_iid)[3]
    print(taster_name, 'would score', variety,':')
    return estimated_score

In [79]:
predict_score(dataset, 'Paul Gregutt', 'Chardonnay')

Paul Gregutt would score Chardonnay :


88.44713820775404

In [81]:
dataset.to_inner_iid(riid='Chardonnay')

10

In [86]:
svd.predict(10, 44)

Prediction(uid=10, iid=44, r_ui=None, est=88.44713820775404, details={'was_impossible': False})

In [68]:
rec_df['taster_name'].value_counts()

Roger Voss            25514
Michael Schachner     15134
Kerin OâKeefe       10776
Virginie Boone         9537
Paul Gregutt           9532
Matt Kettmann          6332
Joe Czerwinski         5147
Sean P. Sullivan       4966
Anna Lee C. Iijima     4415
Jim Gordon             4177
Anne KrebiehlÂ MW      3685
Lauren Buzzeo          1835
Susan Kostrzewa        1085
Mike DeSimone           514
Jeff Jenssen            491
Alexander Peartree      415
Carrie Dykes            139
Fiona Adams              27
Christina Pickard         6
Name: taster_name, dtype: int64

In [76]:
def predict_all_scores(dataset, variety):
    taster_list = ['Roger Voss', 'Michael Schachner', 'Kerin OâKeefe',
                   'Virginie Boone', 'Paul Gregutt', 'Matt Kettmann',
                   'Joe Czerwinski', 'Sean P. Sullivan', 'Anna Lee C. Iijima',
                   'Jim Gordon', 'Lauren Buzzeo','Susan Kostrzewa', 
                   'Mike DeSimone', 'Jeff Jenssen', 'Alexander Peartree', 
                   'Carrie Dykes', 'Fiona Adams', 'Christina Pickard']
    for taster in taster_list:
        inner_uid = dataset.to_inner_uid(ruid=taster)
        inner_iid = dataset.to_inner_iid(riid=variety)
        estimated_score = svd.predict(inner_uid, inner_iid)[3]
        print(taster,'scores',variety,':',estimated_score)

In [84]:
predict_all_scores(dataset, 'Pinot Noir')

Roger Voss scores Pinot Noir : 88.44713820775404
Michael Schachner scores Pinot Noir : 88.44713820775404
Kerin OâKeefe scores Pinot Noir : 88.44713820775404
Virginie Boone scores Pinot Noir : 88.44713820775404
Paul Gregutt scores Pinot Noir : 88.44713820775404
Matt Kettmann scores Pinot Noir : 88.44713820775404
Joe Czerwinski scores Pinot Noir : 88.44713820775404
Sean P. Sullivan scores Pinot Noir : 88.44713820775404
Anna Lee C. Iijima scores Pinot Noir : 88.44713820775404
Jim Gordon scores Pinot Noir : 88.44713820775404
Lauren Buzzeo scores Pinot Noir : 88.44713820775404
Susan Kostrzewa scores Pinot Noir : 88.44713820775404
Mike DeSimone scores Pinot Noir : 88.44713820775404
Jeff Jenssen scores Pinot Noir : 88.44713820775404
Alexander Peartree scores Pinot Noir : 88.44713820775404
Carrie Dykes scores Pinot Noir : 88.44713820775404
Fiona Adams scores Pinot Noir : 88.44713820775404
Christina Pickard scores Pinot Noir : 88.44713820775404
