# Recommender for Amazon Books with Co-Clustering and More
<p>Based on https://www.analyticsvidhya.com/blog/2016/06/quick-guide-build-recommendation-engine-python/</p>

## Initialisation


<p>We first download and install Surprise from http://surpriselib.com/</p>

In [1]:
!pip install scikit-surprise



<p>Then we import all the libraries we use</p>

In [2]:
from surprise import Reader
from surprise import CoClustering
from surprise import Dataset
from surprise import evaluate, print_perf

<p>Lastly, we read our data</p>

In [3]:
# path to dataset file
file_path = 'data.csv'

# As we're loading a custom dataset, we need to define a reader. In
# our data, each line has the following format:
# 'user item rating timestamp', separated by ',' characters.
reader = Reader(line_format='user item rating timestamp', sep=',')

data = Dataset.load_from_file(file_path, reader=reader)
data.split(n_folds=5)  # TODO: consider the number of folds
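
A note on versions: as far as we know, newer releases of Surprise (1.1 and later) no longer ship `data.split`, `evaluate`, `print_perf`, or `algo.train`, all of which this notebook uses. The sketch below shows what the same workflow might look like with `cross_validate` and `fit`. The function name `cross_validate_and_predict` and its default arguments are our own, chosen for illustration, and the output format of `cross_validate` differs from the tables printed in this notebook.

```python
def cross_validate_and_predict(file_path="data.csv", uid="196", iid="302"):
    """Sketch of this notebook's workflow with the newer Surprise API."""
    # Imported inside the function so that defining it does not by itself
    # require scikit-surprise to be installed.
    from surprise import CoClustering, Dataset, Reader
    from surprise.model_selection import cross_validate

    reader = Reader(line_format="user item rating timestamp", sep=",")
    data = Dataset.load_from_file(file_path, reader=reader)

    # Stands in for data.split(n_folds=5) + evaluate(...) + print_perf(...).
    results = cross_validate(
        CoClustering(), data, measures=["RMSE", "MAE"], cv=5, verbose=True
    )

    # Stands in for algo.train(trainset).
    algo = CoClustering()
    algo.fit(data.build_full_trainset())

    prediction = algo.predict(uid, iid, r_ui=4, verbose=True)
    return results, prediction
```

Calling `cross_validate_and_predict()` should return a dictionary of per-fold scores (keys such as `test_rmse` and `test_mae`) together with the prediction object.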

## Co-Clustering
<p>This is based on http://surprise.readthedocs.io/en/stable/co_clustering.html?highlight=co%20clustering </p>
TODO: describe this algorithm, as well as the Surprise project.

In [5]:
# We'll use the Co Clustering algorithm.
algo = CoClustering()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE']) #Root Mean Squared Error & Mean Absolute Error.

print_perf(perf)

Evaluating RMSE, MAE of algorithm CoClustering.

------------
Fold 1
RMSE: 1.1882
MAE:  0.8555
------------
Fold 2
RMSE: 1.1766
MAE:  0.8458
------------
Fold 3
RMSE: 1.1718
MAE:  0.8420
------------
Fold 4
RMSE: 1.1670
MAE:  0.8381
------------
Fold 5
RMSE: 1.1712
MAE:  0.8425
------------
------------
Mean RMSE: 1.1750
Mean MAE : 0.8448
------------
------------
        Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    
RMSE    1.1882  1.1766  1.1718  1.1670  1.1712  1.1750  
MAE     0.8555  0.8458  0.8420  0.8381  0.8425  0.8448  
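
To make the two measures concrete, here is a minimal pure-Python sketch of RMSE and MAE. The ratings in `true_r` and `pred_r` are made-up values for illustration only. The per-fold numbers are copied from the table above. This is not how Surprise computes the measures internally, only the textbook definitions.

```python
import math

def rmse(true_ratings, predicted_ratings):
    """Root Mean Squared Error: penalises large errors more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared) / len(squared))

def mae(true_ratings, predicted_ratings):
    """Mean Absolute Error: the average size of the error."""
    absolute = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    return sum(absolute) / len(absolute)

# Hypothetical ratings, not taken from the dataset.
true_r = [4.0, 3.0, 5.0, 2.0]
pred_r = [4.0, 5.0, 4.0, 2.0]
print("RMSE: %.4f  MAE: %.4f" % (rmse(true_r, pred_r), mae(true_r, pred_r)))

# The "Mean" column is the plain average of the five fold scores.
fold_rmse = [1.1882, 1.1766, 1.1718, 1.1670, 1.1712]
fold_mae = [0.8555, 0.8458, 0.8420, 0.8381, 0.8425]
mean_rmse = sum(fold_rmse) / len(fold_rmse)
mean_mae = sum(fold_mae) / len(fold_mae)
print("Mean RMSE: %.4f  Mean MAE: %.4f" % (mean_rmse, mean_mae))
```

The last line reproduces the means reported above, 1.1750 and 0.8448.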


<p>Then we perform a few predictions</p>

In [6]:
#Prediction
uid = str(196)  # raw user id (as in the ratings file). They are **strings**!
iid = str(302)  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

user: 196        item: 302        r_ui = 4.00   est = 4.32   {'was_impossible': False}


<p>Just for fun, we also train Co-Clustering on the full dataset instead of the cross-validation folds</p>

In [8]:
# Retrieve the trainset.
trainset = data.build_full_trainset()

# Build an algorithm, and train it.
algo = CoClustering()
algo.train(trainset)

#Prediction
uid = str(196)  # raw user id (as in the ratings file). They are **strings**!
iid = str(302)  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

user: 196        item: 302        r_ui = 4.00   est = 4.32   {'was_impossible': False}


<p>No difference, which seems quite odd</p>

<p>Let's try to run a Matrix Factorization-based algorithm</p>

In [10]:
from surprise import SVD
# We'll use the Matrix Factorization-based algorithm.
algo = SVD()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE']) #Root Mean Squared Error & Mean Absolute Error.

print_perf(perf)

Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 1.0478
MAE:  0.7361
------------
Fold 2
RMSE: 1.0456
MAE:  0.7369
------------
Fold 3
RMSE: 1.0432
MAE:  0.7348
------------
Fold 4
RMSE: 1.0452
MAE:  0.7347
------------
Fold 5
RMSE: 1.0436
MAE:  0.7363
------------
------------
Mean RMSE: 1.0451
Mean MAE : 0.7358
------------
------------
        Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    
RMSE    1.0478  1.0456  1.0432  1.0452  1.0436  1.0451  
MAE     0.7361  0.7369  0.7348  0.7347  0.7363  0.7358  


In [11]:
#Prediction
uid = str(196)  # raw user id (as in the ratings file). They are **strings**!
iid = str(302)  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=4, verbose=True)

user: 196        item: 302        r_ui = 4.00   est = 4.32   {'was_impossible': False}


<p>Odd. We will look into this issue in the future. We get a better RMSE, yet the same recommendation is produced. We will have to look into how the difference is defined in Surprise, as well as print out some more recommendations.</p> - yes, I'm a we atm. :)
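
One possible explanation, which we have not verified: the raw ids 196 and 302 appear to come from the Surprise documentation's MovieLens example and may not exist in our Amazon Books data. As we understand the Surprise documentation, both CoClustering and SVD fall back to the global mean rating of the training set when neither the user nor the item is known, and `was_impossible` stays `False`. That would give the same estimate for every algorithm. The sketch below checks this using only the standard library. The helper name `inspect_ids` and the sample rows are hypothetical, and it assumes the file has no header row.

```python
import csv
import io

def inspect_ids(csv_lines, uid, iid):
    """Return (user_seen, item_seen, global_mean) for 'user,item,rating,timestamp' rows."""
    user_seen = item_seen = False
    total, count = 0.0, 0
    for row in csv.reader(csv_lines):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        user_seen = user_seen or row[0] == uid
        item_seen = item_seen or row[1] == iid
        total += float(row[2])
        count += 1
    global_mean = total / count if count else None
    return user_seen, item_seen, global_mean

# Hypothetical sample rows standing in for data.csv.
sample = "A1,B1,5.0,100\nA2,B1,4.0,101\nA1,B2,3.0,102\n"
print(inspect_ids(io.StringIO(sample), "196", "302"))
```

To run it on the real file, pass an open file handle, for example `with open('data.csv', newline='') as f: print(inspect_ids(f, '196', '302'))`. If both flags come back `False` and the mean is close to 4.32, the identical estimates are simply the global mean, and picking ids that actually occur in the data should show the two algorithms disagreeing.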