# Recommender for Amazon Books with the Surprise recommender library
<p>In the following, we evaluate three of the prediction algorithms provided by the SciKit <a href=http://surprise.readthedocs.io/en/stable/index.html>Surprise</a>: SVD, Slope One and Co-Clustering.</p>

## Initialisation


<p>We first install Surprise (http://surpriselib.com/) with pip.</p>

In [1]:
!pip install scikit-surprise



<p>Then we import all the libraries we use</p>

In [2]:
from collections import defaultdict
from surprise import Reader
from surprise import Dataset
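# Note: evaluate and print_perf are only available in older Surprise releases;
# newer releases provide surprise.model_selection.cross_validate instead.
from surprise import evaluate, print_perf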

from surprise import SVD
from surprise import SlopeOne
from surprise import CoClustering

<p>Lastly, we read and split our data</p>

In [3]:
# path to dataset file
file_path = 'data.csv'

# As we're loading a custom dataset, we need to define a reader. In
# our data, each line has the following format:
# 'user item rating timestamp', separated by ',' characters.
reader = Reader(line_format='user item rating timestamp', sep=',')
data = Dataset.load_from_file(file_path, reader=reader)

# Split data into four folds
data.split(n_folds=4)
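
<p>Splitting into four folds means that every rating is used for testing exactly once and for training three times. The idea can be sketched in plain Python as follows; <code>k_fold_indices</code> is a hypothetical helper written for illustration and is not how Surprise implements its split.</p>

```python
import random


def k_fold_indices(n_items, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs for n_folds-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        # Every n_folds-th shuffled index, starting at offset k, is held out.
        test_idx = indices[k::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


# Toy example: 10 ratings split into 4 folds.
folds = list(k_fold_indices(n_items=10, n_folds=4))
for train_idx, test_idx in folds:
    print(len(train_idx), "training ratings,", len(test_idx), "test ratings")
```

<p>Each algorithm is then trained on the training part and scored on the held-out part of every fold, which is what produces the per-fold figures in the outputs.</p>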

### Matrix Factorisation (SVD)
<p>We have already looked at this approach in another notebook. Here we run Surprise's <a href=http://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD>implementation</a> so that we can compare matrix factorisation with the other algorithms in Surprise under the same conditions.</p>

In [4]:
# We'll define the algorithm.
algo = SVD()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE']) #Root Mean Squared Error & Mean Absolute Error.

print_perf(perf)

Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 1.0439
MAE:  0.7364
------------
Fold 2
RMSE: 1.0473
MAE:  0.7381
------------
Fold 3
RMSE: 1.0483
MAE:  0.7393
------------
Fold 4
RMSE: 1.0431
MAE:  0.7367
------------
------------
Mean RMSE: 1.0456
Mean MAE : 0.7376
------------
------------
        Fold 1  Fold 2  Fold 3  Fold 4  Mean    
RMSE    1.0439  1.0473  1.0483  1.0431  1.0456  
MAE     0.7364  0.7381  0.7393  0.7367  0.7376  
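
<p>For reference, the two measures reported above can be computed as in the following minimal sketch. The rating pairs are made up purely for illustration and are not taken from our data.</p>

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimate) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimate) pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)


# Made-up (true rating, estimated rating) pairs, for illustration only.
example = [(4.0, 5.0), (3.0, 3.0), (5.0, 4.0), (1.0, 3.0)]
print("RMSE:", rmse(example))
print("MAE: ", mae(example))
```

<p>Because RMSE squares the errors before averaging, it weights large errors more heavily and is never smaller than MAE, which matches the pattern in the fold results.</p>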


### Slope One
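<p>Slope One is an item-based collaborative filtering algorithm. It predicts a user's rating for an item from the average difference between the ratings of that item and the ratings of the other items the user has already rated (<a href=https://arxiv.org/abs/cs/0702144>Source</a>).</p>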
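
<p>To make the idea concrete, here is a minimal plain-Python sketch of the basic (unweighted) Slope One scheme: the user's mean rating plus the mean deviation between the target item and the items the user has rated. The function names and the toy ratings are assumptions made for illustration; this is not Surprise's code.</p>

```python
from collections import defaultdict


def slope_one_fit(ratings):
    """ratings: dict user -> dict item -> rating.

    Returns dev, where dev[(i, j)] is the average of (r_i - r_j) over all
    users who rated both item i and item j.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[(i, j)] += r_i - r_j
                    count[(i, j)] += 1
    return {pair: diff_sum[pair] / count[pair] for pair in diff_sum}


def slope_one_predict(ratings, dev, user, item):
    """User's mean rating plus the mean deviation of `item` from rated items."""
    user_ratings = ratings[user]
    user_mean = sum(user_ratings.values()) / len(user_ratings)
    # Only items that share at least one rater with `item` are usable.
    relevant = [j for j in user_ratings if (item, j) in dev]
    if not relevant:
        return user_mean
    return user_mean + sum(dev[(item, j)] for j in relevant) / len(relevant)


# Toy ratings, for illustration only.
ratings = {
    "alice": {"a": 5.0, "b": 3.0, "c": 2.0},
    "bob": {"a": 3.0, "b": 4.0},
    "carol": {"b": 2.0, "c": 5.0},
}
dev = slope_one_fit(ratings)
estimate = slope_one_predict(ratings, dev, "bob", "c")
print(estimate)
```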

In [5]:
# We'll define the algorithm.
algo = SlopeOne()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE']) #Root Mean Squared Error & Mean Absolute Error.

print_perf(perf)

Evaluating RMSE, MAE of algorithm SlopeOne.

------------
Fold 1
RMSE: 1.1963
MAE:  0.8606
------------
Fold 2
RMSE: 1.2013
MAE:  0.8634
------------
Fold 3
RMSE: 1.1997
MAE:  0.8634
------------
Fold 4
RMSE: 1.1958
MAE:  0.8608
------------
------------
Mean RMSE: 1.1983
Mean MAE : 0.8621
------------
------------
        Fold 1  Fold 2  Fold 3  Fold 4  Mean    
RMSE    1.1963  1.2013  1.1997  1.1958  1.1983  
MAE     0.8606  0.8634  0.8634  0.8608  0.8621  


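### Co-Clustering
<p>Lastly, we will look at <a href=http://surprise.readthedocs.io/en/stable/co_clustering.html?highlight=co%20clustering>Co-Clustering</a>. Here users and items are each assigned to clusters, and every pair of a user cluster and an item cluster forms a co-cluster. A rating is estimated as the average rating of the co-cluster, adjusted by how far the user's mean and the item's mean deviate from the averages of their respective clusters.</p>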
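
<p>The estimate itself can be sketched as follows, assuming the cluster assignments are already known (the real algorithm learns them iteratively). The helper name, the toy ratings and the cluster labels are assumptions for illustration, and the sketch assumes every average involved has at least one rating behind it.</p>

```python
from statistics import mean


def cocluster_predict(ratings, user_cluster, item_cluster, user, item):
    """Estimate a rating from fixed cluster assignments.

    ratings: dict (user, item) -> rating
    user_cluster / item_cluster: dict id -> cluster label
    """
    cu, ci = user_cluster[user], item_cluster[item]
    # Average rating inside the co-cluster (user cluster x item cluster).
    co_avg = mean(r for (u, i), r in ratings.items()
                  if user_cluster[u] == cu and item_cluster[i] == ci)
    # Average rating of the user's cluster and of the item's cluster.
    ucl_avg = mean(r for (u, _), r in ratings.items() if user_cluster[u] == cu)
    icl_avg = mean(r for (_, i), r in ratings.items() if item_cluster[i] == ci)
    # The user's own mean rating and the item's own mean rating.
    user_mean = mean(r for (u, _), r in ratings.items() if u == user)
    item_mean = mean(r for (_, i), r in ratings.items() if i == item)
    return co_avg + (user_mean - ucl_avg) + (item_mean - icl_avg)


# Toy ratings and cluster labels, for illustration only.
ratings = {
    ("u1", "a"): 5.0, ("u1", "b"): 4.0,
    ("u2", "a"): 4.0, ("u2", "c"): 2.0,
    ("u3", "b"): 1.0, ("u3", "c"): 2.0,
}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
item_cluster = {"a": 0, "b": 0, "c": 1}
estimate = cocluster_predict(ratings, user_cluster, item_cluster, "u2", "b")
print(round(estimate, 4))
```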

In [6]:
# We'll define the algorithm.
algo = CoClustering()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE']) #Root Mean Squared Error & Mean Absolute Error.

print_perf(perf)

Evaluating RMSE, MAE of algorithm CoClustering.

------------
Fold 1
RMSE: 1.1834
MAE:  0.8585
------------
Fold 2
RMSE: 1.1774
MAE:  0.8461
------------
Fold 3
RMSE: 1.1866
MAE:  0.8577
------------
Fold 4
RMSE: 1.1822
MAE:  0.8544
------------
------------
Mean RMSE: 1.1824
Mean MAE : 0.8542
------------
------------
        Fold 1  Fold 2  Fold 3  Fold 4  Mean    
RMSE    1.1834  1.1774  1.1866  1.1822  1.1824  
MAE     0.8585  0.8461  0.8577  0.8544  0.8542  


## Comparison of the Surprise recommenders
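<p>Based on the mean RMSE, the Matrix Factorisation approach works best on our data (1.0456, compared with 1.1824 for Co-Clustering and 1.1983 for Slope One). In the following we predict a few ratings for a single user. Note that <code>algo</code> still refers to the Co-Clustering model from the previous cell, as fitted on the last fold, so the estimates below come from that model; to predict with Matrix Factorisation, <code>algo</code> has to be set to <code>SVD()</code> and fitted again first. This can be expanded to find the items with the highest predicted rating for each user.</p>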

In [7]:
# Prediction
uid = 'A3NIQK6ZLYEP1L'  # raw user id (as in the ratings file).
iid = 'B000IK882Y'  # raw item id (as in the ratings file).

#Get the prediction
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

user: A3NIQK6ZLYEP1L item: B000IK882Y r_ui = 4.00   est = 5.00   {'was_impossible': False}


In [8]:
# Prediction
uid = 'A3NIQK6ZLYEP1L'  # raw user id (as in the ratings file).
iid = 'B000I0FXM2'  # raw item id (as in the ratings file).

#Get the prediction
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

user: A3NIQK6ZLYEP1L item: B000I0FXM2 r_ui = 4.00   est = 4.84   {'was_impossible': False}


In [9]:
# Prediction
uid = 'A3NIQK6ZLYEP1L'  # raw user id (as in the ratings file).
iid = 'B000IXNFHY'  # raw item id (as in the ratings file).

#Get the prediction
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

user: A3NIQK6ZLYEP1L item: B000IXNFHY r_ui = 4.00   est = 4.97   {'was_impossible': False}
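
<p>Ranking such estimates per user gives the actual recommendations. A minimal sketch, assuming the predictions have been collected as <code>(uid, iid, est)</code> tuples; <code>top_n</code> is a hypothetical helper, and the input below simply reuses the three estimates printed above. In practice one would score the items a user has not rated yet.</p>

```python
from collections import defaultdict


def top_n(predictions, n=3):
    """Group (uid, iid, est) tuples by user and keep the n highest estimates."""
    per_user = defaultdict(list)
    for uid, iid, est in predictions:
        per_user[uid].append((iid, est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in per_user.items()
    }


# The three estimates printed above for user A3NIQK6ZLYEP1L.
predictions = [
    ("A3NIQK6ZLYEP1L", "B000IK882Y", 5.00),
    ("A3NIQK6ZLYEP1L", "B000I0FXM2", 4.84),
    ("A3NIQK6ZLYEP1L", "B000IXNFHY", 4.97),
]
best = top_n(predictions, n=2)
for uid, items in best.items():
    print(uid, items)
```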
