# Recommending Top-N Movies - Part 2/2: Testing Recommendation Algorithms

In part 1 of this project, we looked at the MovieLens data set and implemented algorithms to recommend movies based on it. We took a brief look at some of these algorithms' parameters and their differences in runtime. In this part of the project, we will use various metrics to test the accuracy and validity of the recommended movies, using a train-test-validation split of the reduced data set.

To do this, we'll use three new classes, SplitData, Metrics and Tester, which respectively split the ratings pivot data frame, run metrics on the recommendations, and run tests over multiple users. Further documentation for these classes can be found in the MovieLensData.py and Tester.py files, with more information on the algorithms in the Algorithms.py file.

---------------

## Preparing the Data

- Creating the ratings pivot data frame
- Splitting the data for testing
- Building the models

We'll start by importing the required packages and building the ratings pivot data frame from a reduced set of data. As discussed in part 1, we're sampling users with between 200 and 6000 ratings and limiting movies to those with more than 100 ratings, both to avoid biases from outliers and to work within the available computing power. However, as the tests are very computationally intensive, I will initially reduce this further by taking a random sample of 10% of the reduced userIDs.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display
from time import perf_counter
from tqdm.auto import tqdm
import sys
import importlib
import random
import MovieLensData
import Algorithms
import Tester

import warnings
warnings.filterwarnings('ignore')
sns.set()

In [2]:
ml = MovieLensData.MovieLensData()
userIDs = ml.filterIDs('userId', minRatings=200, maxRatings=6000)
movieIDs = ml.filterIDs('movieId', minRatings=100)
userIDs = random.sample(userIDs, int(0.1 * len(userIDs)))
ml.reduce(userIDs, 'userId', 'ratings')
ml.reduce(movieIDs, 'movieId', 'movies')

ratingsData = ml.buildPivot(printStats=True)

Loading data... Done
Building movies/ratings pivot df... Done
3528 / 283228 users retained (1.25%)
10498 / 58098 movies retained (18.07%)
1661576 / 27753444 ratings retained (5.99%)
95.51% sparsity


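MovieLensData.py isn't reproduced in this notebook, so as a rough guide, the filtering, sampling and sparsity figures above can be approximated in plain pandas. This is a sketch assuming a long-format ratings table with userId, movieId and rating columns (as in MovieLens's ratings.csv); filter_ids, reduce_ratings, build_pivot and sparsity are hypothetical helpers, and the inclusive count bounds are an assumption (the text describes the movie cut-off as "more than 100").

```python
import random

import pandas as pd

# Sketch only: hypothetical helpers, not the MovieLensData class.


def filter_ids(ratings: pd.DataFrame, column: str, min_ratings: int = 0,
               max_ratings: float = float("inf")) -> list:
    """IDs in `column` whose number of ratings lies in [min_ratings, max_ratings]."""
    counts = ratings[column].value_counts()
    keep = counts[(counts >= min_ratings) & (counts <= max_ratings)]
    return keep.index.tolist()


def reduce_ratings(ratings: pd.DataFrame, user_min=200, user_max=6000, movie_min=100,
                   user_frac=0.1, seed=None) -> pd.DataFrame:
    """Keep active users and popular movies, then sample a fraction of the users."""
    users = filter_ids(ratings, "userId", user_min, user_max)
    movies = filter_ids(ratings, "movieId", movie_min)
    # A seeded sampler makes the user sample itself reproducible between runs.
    users = random.Random(seed).sample(users, int(user_frac * len(users)))
    mask = ratings["userId"].isin(users) & ratings["movieId"].isin(movies)
    return ratings[mask]


def build_pivot(ratings: pd.DataFrame) -> pd.DataFrame:
    """Users as rows, movies as columns, NaN where a user has not rated a movie."""
    return ratings.pivot_table(index="userId", columns="movieId", values="rating")


def sparsity(pivot: pd.DataFrame) -> float:
    """Fraction of user/movie cells without a rating."""
    return 1.0 - pivot.notna().to_numpy().sum() / pivot.size
```

As a check on the printed figures, 1661576 ratings over 3528 × 10498 = 37036944 cells gives 1 - 1661576 / 37036944 ≈ 0.9551, the 95.51% sparsity reported above.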
The SplitData class splits the ratings pivot data frame into a trainSet and a testSet_full. testSet_full is then split in two ways: a test-validation split, in which a percentage or number of each user's ratings is moved into the validation set, and a leave-one-out (LOO) cross-validation split, in which just one rating is taken from each user. The sizes of testSet_full and the validation set are governed by the testSize and validationSize parameters respectively. As the data set is relatively large, we can afford to keep most users in the train set, so 20% splits will be used for both testSize and validationSize.

The ratings in the test set are used to generate recommendations, with the two validation sets used to check the quality of recommendations against various metrics. We can ensure the same splits occur each time via the randomState parameter for consistency in results reporting.

In [3]:
train, test, validation, LOO = MovieLensData.SplitData(ratingsData).buildAll(testSize=0.2, 
                                                                             validationSize=0.2, randomState=20)

Building train/test/validation split by row... Done
Building LeaveOneOut-CrossValidation data... Done
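The split described above can be sketched as follows. This is a hypothetical re-implementation, not the code in MovieLensData.py: split_data, the use of NumPy's default_rng, and taking the LOO rating from the validation hold-out are all assumptions. The last choice matters if the recommender skips movies a user has already rated, because a left-out rating that stays in the test input could then never be hit.

```python
import numpy as np
import pandas as pd


def split_data(pivot: pd.DataFrame, test_size=0.2, validation_size=0.2, random_state=20):
    """Hypothetical row-wise split of a users x movies pivot (NaN = unrated)
    into train, test, validation and leave-one-out frames."""
    rng = np.random.default_rng(random_state)

    # Whole users go either to the train set or to the test set.
    n_test = int(round(test_size * len(pivot.index)))
    test_users = rng.choice(pivot.index.to_numpy(), size=n_test, replace=False)
    train = pivot.drop(index=test_users)
    test = pivot.loc[test_users].copy()

    # Same shape as the test frame; NaN means "not held out".
    validation = pd.DataFrame(np.nan, index=test.index, columns=test.columns)
    loo = validation.copy()

    for user in test.index:
        rated = test.columns[test.loc[user].notna().to_numpy()].to_numpy()
        if len(rated) == 0:
            continue
        # Move a fraction (at least one) of this user's ratings into validation.
        n_val = max(1, int(round(validation_size * len(rated))))
        held = rng.choice(rated, size=n_val, replace=False)
        validation.loc[user, held] = test.loc[user, held]
        test.loc[user, held] = np.nan
        # One held-out rating doubles as the LOO item; it is already absent from `test`.
        loo.loc[user, held[0]] = validation.loc[user, held[0]]

    return train, test, validation, loo
```

Calling split_data(ratingsData, 0.2, 0.2, 20) would mirror the arguments of the buildAll call above, though not its exact random draws.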


As seen in the previous part, we can easily build each model by calling the buildModel method of the Algorithms class. This time, we initialise the Algorithms object with the train and test sets, so the program knows which set to build the models from and which to use to look up ratings for testing. We also need to call the buildMatrix method to generate the sparse matrix used to build the KNN models.

However, this time we do not need to generate a similarity matrix for the users. As the models are built from the train set and none of its userIDs will be tested, we can skip the user similarity matrix for the CF algorithm and simply pass None as the model parameter when testing it.

In [4]:
algo = Algorithms.Algorithms(ml, train, test)
ratingsSparse = algo.buildMatrix()
printStatus = True

CF_itemModel = algo.buildModel(modelType='CF', matrix='item', printStatus=printStatus)
KNN_itemModel = algo.buildModel(modelType='KNN', matrix=ratingsSparse.transpose(), printStatus=printStatus)
KNN_userModel = algo.buildModel(modelType='KNN', matrix=ratingsSparse, printStatus=printStatus)
#SVD_model = algo.buildModel(modelType='SVD', printStatus=printStatus)

Initialising object... Done
Building sparse matrix... Done

Building CF model
Correlating ratings for all movies... Done
Correlating genres for all movies... Done
Correlating years for all movies... Done
Generating combined correlation... Done
Done. Time: 13.144672100000001s

Building KNN model
Done. Time: 0.024962500000000887s

Building KNN model
Done. Time: 0.007154299999996283s


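The item- and user-based KNN models above differ only in which axis of the ratings matrix is treated as the set of points: ratingsSparse for users and its transpose for movies, assuming ratingsSparse has one row per user. The dense NumPy sketch below shows that idea with cosine similarity; cosine_knn is a hypothetical helper, unrated cells are taken as 0 (as in a sparse matrix), and whether Algorithms.py uses cosine similarity is an assumption.

```python
import numpy as np


def cosine_knn(matrix: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: for each row of `matrix`, the indices of its k most
    cosine-similar other rows (users for a users x movies matrix, movies after .T)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # rows with no ratings stay all-zero instead of NaN
    unit = matrix / norms
    sims = unit @ unit.T              # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)   # a row is never its own neighbour
    return np.argsort(-sims, axis=1)[:, :k]
```

For example, with ratings = np.array([[5, 5, 0], [4, 4, 0], [0, 0, 5]], dtype=float), cosine_knn(ratings, 1) finds neighbours between users and cosine_knn(ratings.T, 1) between movies. On matrices the size of this one, scikit-learn's NearestNeighbors(metric='cosine', algorithm='brute') can fit sparse input directly rather than working on a dense copy.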
## Testing and Metrics

- Adding algorithms for testing
- Running basic and parameter tests
- MAE, RMSE, Coverage, Diversity, Novelty, Hit Rates (Validation, LOO-CV, Cumulative, Mean Reciprocal, Actual Rating)

The Metrics class contains the various metrics we'll use to test the quality of our results. It takes the validation and LOO-CV sets to check the results against, a filename for the csv that stores the results, and various testing parameters (discussed below).

This object is then passed to the Tester class to create our test object. The tester has two primary methods for testing, runBasicTest and runParameterTest, as well as two methods for adding and removing algorithms. To set up tests, we instantiate the tester object and pass it each algorithm's name, the method to call, the model and any other parameters to control for the test. These parameters are discussed in greater detail with the results later in this notebook.

Here we'll also add a random control algorithm, which simply generates random similarity scores and random rating predictions for every user. Its randomRatings parameter determines whether the rating predictions are random numbers between 0 and 5, independent of the similarity scores, or multiples of the randomly generated similarity scores.
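A random baseline of the kind described above might look like the following sketch. random_control is a hypothetical helper, not algo.random; the uniform draws and the choice of 5.0 as the multiplier when randomRatings is False are assumptions.

```python
import numpy as np


def random_control(movie_ids, random_ratings=True, max_rating=5.0, seed=None):
    """Hypothetical random baseline: random similarity scores and rating predictions,
    returned as (movieId, score, predicted rating) sorted by score."""
    rng = np.random.default_rng(seed)
    sims = rng.random(len(movie_ids))  # uniform scores in [0, 1)
    if random_ratings:
        # Predictions drawn independently of the similarity scores.
        preds = rng.uniform(0.0, max_rating, size=len(movie_ids))
    else:
        # Predictions as a multiple of the random similarity scores.
        preds = sims * max_rating
    order = np.argsort(-sims)  # highest random score first
    return [(movie_ids[i], float(sims[i]), float(preds[i])) for i in order]
```

Any real algorithm should beat this baseline on accuracy and hit rates, while its coverage and novelty show what "no preference at all" looks like.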

In [5]:
filename = 'ML_RecommenderMetrics'
evaluator = Tester.Metrics(validation, LOO, topN=10, moviesPerPage=5, thresholdRating=3.0, csvName=filename)
tester = Tester.Tester(evaluator)

neighbours = 100
sample = 100
thresh = 2.0

tester.addAlgorithm('Item-Based CF', algo.itemBased, model=CF_itemModel, modelType='CF', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
tester.addAlgorithm('User-Based CF', algo.userBased, model=None, modelType='CF', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
tester.addAlgorithm('Item-Based KNN', algo.itemBased, model=KNN_itemModel, modelType='KNN', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
tester.addAlgorithm('User-Based KNN', algo.userBased, model=KNN_userModel, modelType='KNN', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
#tester.addAlgorithm('SVD', algo.SVD, model=SVD_model, sample=sample, pred='calc')
tester.addAlgorithm('Random Control', algo.random, randomRatings=True)

ML_RecommenderMetrics.csv already exists. Update this file (u) or overwrite (o)? u
Added Item-Based CF
Added User-Based CF
Added Item-Based KNN
Added User-Based KNN
Added Random Control


With the algorithms loaded, we can now run a simple test using the test set. runBasicTest takes a few key arguments:
- testData - the test set
- testAlgo - the name of the algorithm to test, from the dictionary of loaded algorithms (all loaded algorithms are tested if it is omitted)
- sampleTest - the number of users to sample from the test set
- param, pValue - a parameter and its value, overriding the stored parameters within the object

Here we run a simple test of the stored algorithms on 10 randomly sampled users from the test set.
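The add/run pattern described above, where each algorithm is stored with its default parameters and a single param/pValue pair overrides one of them for a run, might look like this sketch. MiniTester and its snake_case methods are hypothetical stand-ins for Tester, not its actual code.

```python
import random
from typing import Any, Callable, Dict, Optional


class MiniTester:
    """Hypothetical stand-in for Tester: stores algorithms with default keyword
    arguments and can override one parameter for a single run."""

    def __init__(self) -> None:
        self.algorithms: Dict[str, tuple] = {}

    def add_algorithm(self, name: str, func: Callable[..., Any], **params: Any) -> None:
        self.algorithms[name] = (func, params)

    def remove_algorithm(self, name: str) -> None:
        self.algorithms.pop(name, None)

    def run_basic_test(self, user_ids, test_algo: Optional[str] = None,
                       sample_test: Optional[int] = None, param: Optional[str] = None,
                       p_value: Any = None, seed: Optional[int] = None) -> Dict[str, list]:
        users = list(user_ids)
        if sample_test is not None:
            users = random.Random(seed).sample(users, sample_test)
        # Test one named algorithm, or every stored algorithm when none is given.
        names = [test_algo] if test_algo is not None else list(self.algorithms)
        results = {}
        for name in names:
            func, params = self.algorithms[name]
            kwargs = dict(params)  # copy, so the stored defaults are left unchanged
            if param is not None:
                kwargs[param] = p_value
            results[name] = [func(user, **kwargs) for user in users]
        return results
```

Copying the stored keyword arguments before applying the override keeps each run independent, so a parameter test never leaks into later basic tests.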

In [6]:
tester.runBasicTest(test, sampleTest=10)


The runParameterTest method allows us to iterate through a range of values for a given parameter. Each value is passed to runBasicTest, which runs the tests across the users in the test set. This lets us see the effect of changing a parameter within the methods, with the aim of fine-tuning the algorithms. Its arguments are much the same as those of runBasicTest, except that pValue is replaced by pRange, the range of values to test.

Here we run a basic demonstration on the 'pred' parameter, which changes the rating prediction algorithm (discussed in greater detail in the results).
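Conceptually, a parameter test is a loop over pRange that calls the basic test once per value and records the parameter name and value next to each algorithm's result, as in the ParameterTest and ParameterValue columns of the results csv. A minimal sketch, where run_parameter_test and the single "Score" field are hypothetical simplifications of the real per-metric rows:

```python
from typing import Any, Callable, Dict, Iterable, List


def run_parameter_test(run_basic_test: Callable[..., Dict[str, float]], param: str,
                       p_range: Iterable[Any], **kwargs: Any) -> List[dict]:
    """Hypothetical sweep: one basic test per parameter value, one row per
    (value, algorithm) pair."""
    rows = []
    for value in p_range:
        scores = run_basic_test(param=param, p_value=value, **kwargs)
        for algo, score in scores.items():
            rows.append({"Algorithm": algo, "ParameterTest": param,
                         "ParameterValue": value, "Score": score})
    return rows
```

Because every value reruns every algorithm over the sampled users, the cost grows with len(pRange) × algorithms × users, which is why the sample sizes in this notebook are kept small.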

In [7]:
parameter = 'pred'
pRange = ['rand', 'calc', 'sims', 'norm_sims']
sampleTest = 10

tester.runParameterTest(test, param=parameter, pRange=pRange, sampleTest=sampleTest, printResults=False)


By calling the readCSV method of the evaluator (our Metrics object), we can load the results we have just generated for analysis.

In [8]:
evaluator.readCSV(filename).head(10)

| Unnamed: 0 | Algorithm | Top-N | MoviesPerPage | ThresholdRating | TotalIDs | ParameterTest | ParameterValue | OtherParameters | TotalTime/s | MeanTime/s | MAE | RMSE | Coverage | Diversity | Novelty | LOOCV_HR | Validation_HR | Cumulative_HR | MeanReciprocal_HR | ActualRating_HR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Item-Based CF | 10 | 5 | 3.0 | 10 | | | {'modelType': 'CF', 'neighbours': 100, 'sample... | 0.372827 | 0.037283 | 0.817882 | 0.847521 | 0.292219 | 0.303801 | 0.077635 | 0.0 | 0.18 | 0.18 | 0.13 | {2.0: 0.000641025641025641, 3.0: 0.00333333333... |
| 1 | User-Based CF | 10 | 5 | 3.0 | 10 | | | {'modelType': 'CF', 'neighbours': 100, 'sample... | 11.811715 | 1.181171 | 0.256619 | 0.269141 | 0.252552 | 0.430652 | 0.008315 | 0.0 | 0.12 | 0.12 | 0.085 | {3.0: 0.0011494252873563218, 4.0: 0.0042596348... |
| 2 | Item-Based KNN | 10 | 5 | 3.0 | 10 | | | {'modelType': 'KNN', 'neighbours': 100, 'sampl... | 28.573854 | 2.857385 | 0.543997 | 0.593887 | 0.12039 | 0.156988 | 0.003518 | 0.0 | 0.1 | 0.1 | 0.08 | {1.0: 0.001020408163265306, 3.0: 0.00194174757... |
| 3 | User-Based KNN | 10 | 5 | 3.0 | 10 | | | {'modelType': 'KNN', 'neighbours': 100, 'sampl... | 2.816096 | 0.28161 | 0.475467 | 0.536255 | 0.09761 | 0.409357 | 0.010964 | 0.0 | 0.3 | 0.3 | 0.22 | {3.0: 0.005664830841856806, 3.5: 0.00738613271... |
| 4 | Random Control | 10 | 5 | 3.0 | 10 | | | {'randomRatings': True} | 0.017117 | 0.001712 | 0.0 | 0.0 | 1.0 | 0.000539 | 0.56825 | 0.0 | 0.0 | 0.0 | 0.0 | {} |
| 5 | Item-Based CF | 10 | 5 | 3.0 | 10 | pred | rand | {'modelType': 'CF', 'neighbours': 100, 'sample... | 0.111868 | 0.011187 | 1.837523 | 2.022494 | 0.262286 | 0.313915 | 0.061639 | 0.1 | 0.22 | 0.12 | 0.175 | {2.0: 0.0024390243902439024, 2.5: 0.0008130081... |
| 6 | User-Based CF | 10 | 5 | 3.0 | 10 | pred | rand | {'modelType': 'CF', 'neighbours': 100, 'sample... | 11.79968 | 1.179968 | 1.125994 | 1.167066 | 0.25719 | 0.43062 | 0.009057 | 0.0 | 0.13 | 0.06 | 0.095 | {2.5: 0.0013986013986013986, 3.0: 0.0006993006... |
| 7 | Item-Based KNN | 10 | 5 | 3.0 | 10 | pred | rand | {'modelType': 'KNN', 'neighbours': 100, 'sampl... | 28.734914 | 2.873491 | 1.689455 | 1.862067 | 0.09681 | 0.162071 | 0.002309 | 0.0 | 0.14 | 0.07 | 0.105 | {3.0: 0.0018867924528301887, 3.5: 0.0031735985... |
| 8 | User-Based KNN | 10 | 5 | 3.0 | 10 | pred | rand | {'modelType': 'KNN', 'neighbours': 100, 'sampl... | 2.825017 | 0.282502 | 1.85348 | 1.990853 | 0.110933 | 0.40959 | 0.01091 | 0.0 | 0.35 | 0.1 | 0.25 | {3.0: 0.004897959183673469, 3.5: 0.00571428571... |
| 9 | Random Control | 10 | 5 | 3.0 | 10 | pred | rand | {'randomRatings': True} | 0.016694 | 0.001669 | 0.453887 | 0.453887 | 1.0 | 0.000535 | 0.478374 | 0.0 | 0.02 | 0.01 | 0.015 | {4.0: 0.0014492753623188406, 4.5: 0.0005464480... |
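The Unnamed: 0 column in the output above is what pandas produces when a frame saved with to_csv (which writes the index by default) is read back with read_csv without index_col=0. A sketch of appending results without that extra column, assuming readCSV wraps pd.read_csv; append_results and read_results are hypothetical names, not the Metrics class's methods.

```python
import os

import pandas as pd


def append_results(rows: list, csv_name: str) -> None:
    """Hypothetical helper: append result rows, writing the header only for a new file."""
    path = f"{csv_name}.csv"
    pd.DataFrame(rows).to_csv(path, mode="a", header=not os.path.exists(path), index=False)


def read_results(csv_name: str) -> pd.DataFrame:
    """Hypothetical helper: load every result appended so far."""
    return pd.read_csv(f"{csv_name}.csv")
```

Writing with index=False also avoids a repeated 0, 1, 2, ... index being appended with every batch of results; alternatively, passing index_col=0 to pd.read_csv turns the saved index back into the frame's index.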


### Metrics and the Metrics class

The recommendations for each userID are passed to an instantiated Metrics object for evaluation, which runs various tests and saves the results to a csv file for later analysis. It requires a few parameters on initialisation:
- topN: the number of top recommendations to consider (used by all metrics except coverage)
- moviesPerPage: the number of movies that appear per page for the end user (used only by the meanReciprocalHR metric)
- thresholdRating: the minimum rating to be considered by the cumulativeHR metric

Below is a full list and description of the metrics used for testing.

#### MAE
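Mean absolute error is the average absolute difference between predicted and actual ratings, MAE = (1/n) Σ |predicted - actual|, here presumably taken over each test user's held-out validation ratings. A minimal sketch assuming one user's predictions and held-out ratings are dicts keyed by movieId; mae is a hypothetical helper, not the Metrics class's method.

```python
import math


def mae(predicted: dict, actual: dict) -> float:
    """Hypothetical helper: mean absolute error over movies with both a
    prediction and a held-out rating."""
    common = predicted.keys() & actual.keys()
    if not common:
        return math.nan  # nothing to compare for this user
    return sum(abs(predicted[m] - actual[m]) for m in common) / len(common)
```

For example, predictions {1: 4.0, 2: 3.0} against held-out ratings {1: 3.5, 2: 3.0} give an MAE of 0.25.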


#### RMSE
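Root mean squared error squares each error before averaging and then takes the square root, RMSE = sqrt((1/n) Σ (predicted - actual)²), so large errors weigh more heavily and RMSE is never smaller than MAE on the same ratings (as every row of the table above shows). A minimal sketch, assuming per-user dicts keyed by movieId; rmse is a hypothetical helper.

```python
import math


def rmse(predicted: dict, actual: dict) -> float:
    """Hypothetical helper: root mean squared error over movies with both a
    prediction and a held-out rating."""
    common = predicted.keys() & actual.keys()
    if not common:
        return math.nan
    return math.sqrt(sum((predicted[m] - actual[m]) ** 2 for m in common) / len(common))
```

With errors of 1 and 2, MAE is 1.5 while RMSE is sqrt(2.5) ≈ 1.58, the larger error pulling RMSE up.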


#### Coverage
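Coverage is the one metric not restricted to the top-N, and the random control, which scores every movie, reaches 1.0 in the table above. One plausible reading, assumed here rather than confirmed from Tester.py, is catalogue coverage: the share of movies in the catalogue that receive a prediction for at least one user. catalogue_coverage is a hypothetical helper.

```python
def catalogue_coverage(predictions_per_user: dict, catalogue: set) -> float:
    """Hypothetical helper: fraction of the catalogue predicted for at least one user.

    predictions_per_user maps userId -> iterable of movieIds that received a prediction.
    """
    if not catalogue:
        return float("nan")
    covered = set()
    for movies in predictions_per_user.values():
        covered.update(movies)
    return len(covered & catalogue) / len(catalogue)
```

Low coverage means large parts of the catalogue can never be recommended, however accurate the predictions for the covered movies are.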


#### Diversity
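Diversity is commonly measured as 1 minus the average similarity between every pair of movies in a user's top-N list, so a list of near-identical movies scores close to 0. A minimal sketch; the similarity function is passed in (for example a lookup into the CF correlation matrix), diversity is a hypothetical helper, and which similarity the Metrics class actually uses is not shown here.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence


def diversity(top_n: Sequence[Hashable],
              similarity: Callable[[Hashable, Hashable], float]) -> float:
    """Hypothetical helper: 1 minus the mean pairwise similarity of a top-N list."""
    pairs = list(combinations(top_n, 2))
    if not pairs:
        return float("nan")  # fewer than two movies: diversity is undefined
    return 1.0 - sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

The per-user values would then be averaged over all tested users.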


#### Novelty
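Novelty measures how far recommendations stray from the most popular movies. The random control's values near 0.5 in the table above are what a mean popularity rank normalised by catalogue size would give for random picks, so the sketch below assumes that definition; popularity_ranks and novelty are hypothetical helpers.

```python
def popularity_ranks(rating_counts: dict) -> dict:
    """Hypothetical helper: movieId -> popularity rank (1 = most-rated movie)."""
    ordered = sorted(rating_counts, key=rating_counts.get, reverse=True)
    return {movie: rank for rank, movie in enumerate(ordered, start=1)}


def novelty(top_n: list, popularity_rank: dict) -> float:
    """Hypothetical helper: mean popularity rank of the recommended movies,
    divided by the catalogue size so the result lies in (0, 1]."""
    if not top_n:
        return float("nan")
    return sum(popularity_rank[m] for m in top_n) / (len(top_n) * len(popularity_rank))
```

Under this definition, the low novelty of the CF and KNN algorithms means they mostly recommend popular movies.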


#### Validation HitRate
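The validation hit rate checks recommendations against the ratings held out in the validation set. The values in the table above (for example 0.18 over 10 users) are not multiples of 1/10, which suggests a fractional per-user score rather than a yes/no hit per user; the sketch below assumes it is the share of each user's top-N found among their validation ratings, averaged over users. validation_hit_rate is a hypothetical helper.

```python
def validation_hit_rate(top_n_per_user: dict, held_out_per_user: dict, n: int) -> float:
    """Hypothetical helper: mean over users of the fraction of their top-n
    recommendations found in their held-out validation ratings."""
    if not top_n_per_user:
        return float("nan")
    rates = []
    for user, recs in top_n_per_user.items():
        held = set(held_out_per_user.get(user, ()))
        rates.append(sum(movie in held for movie in recs[:n]) / n)
    return sum(rates) / len(rates)
```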


#### LeaveOneOut-CrossValidation HitRate
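In leave-one-out cross-validation, one rating per user is hidden and counts as a hit if it appears in that user's top-N, so the hit rate is the share of users with a hit. For this to be meaningful, the left-out rating must be removed from the ratings the algorithm sees; otherwise an algorithm that skips already-rated movies can never hit it. A minimal sketch; loo_hit_rate is a hypothetical helper.

```python
def loo_hit_rate(top_n_per_user: dict, left_out: dict, n: int) -> float:
    """Hypothetical helper: share of users whose single left-out movie
    appears in their top-n list.

    left_out maps userId -> the movieId that was hidden for that user.
    """
    users = [user for user in top_n_per_user if user in left_out]
    if not users:
        return float("nan")
    hits = sum(left_out[user] in top_n_per_user[user][:n] for user in users)
    return hits / len(users)
```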


#### Cumulative HitRate
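The cumulative hit rate only counts hits on held-out movies the user actually liked, that is, rated at or above thresholdRating. A minimal sketch assuming the same per-user fractional score as a validation hit rate and an inclusive threshold (both assumptions); cumulative_hit_rate is a hypothetical helper.

```python
def cumulative_hit_rate(top_n_per_user: dict, held_out_ratings: dict, n: int,
                        threshold: float = 3.0) -> float:
    """Hypothetical helper: mean over users of the fraction of their top-n found
    among held-out movies they rated >= threshold.

    held_out_ratings maps userId -> {movieId: actual rating}.
    """
    if not top_n_per_user:
        return float("nan")
    rates = []
    for user, recs in top_n_per_user.items():
        liked = {m for m, r in held_out_ratings.get(user, {}).items() if r >= threshold}
        rates.append(sum(movie in liked for movie in recs[:n]) / n)
    return sum(rates) / len(rates)
```

Raising the threshold can only remove hits, so this value never exceeds the unfiltered hit rate on the same held-out set.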


#### Mean Reciprocal HitRate
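A reciprocal hit rate rewards hits near the top of the list by crediting each hit with 1/rank. Since moviesPerPage is used only by this metric, the sketch below assumes the rank is the page a hit appears on (so every hit on the first page scores 1), and, because the table shows non-zero MeanReciprocal_HR where LOOCV_HR is 0, that it is computed on the validation hold-out and scaled by topN. These are assumptions; mean_reciprocal_hit_rate is a hypothetical helper.

```python
def mean_reciprocal_hit_rate(top_n_per_user: dict, held_out_per_user: dict, n: int,
                             movies_per_page: int = 0) -> float:
    """Hypothetical helper: mean over users of sum(1 / rank) over hits in the
    top-n list, divided by n.

    rank is the 1-based list position, or the page number when movies_per_page > 0.
    """
    if not top_n_per_user:
        return float("nan")
    scores = []
    for user, recs in top_n_per_user.items():
        held = set(held_out_per_user.get(user, ()))
        total = 0.0
        for position, movie in enumerate(recs[:n], start=1):
            if movie in held:
                rank = (position - 1) // movies_per_page + 1 if movies_per_page > 0 else position
                total += 1.0 / rank
        scores.append(total / n)
    return sum(scores) / len(scores)
```

With movies_per_page=0 and one left-out movie per user (dividing by the number of users rather than n), this reduces to the usual average reciprocal hit rank.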


#### Actual Rating HitRate
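The ActualRating_HR column above is a dictionary keyed by rating, i.e. the hit rate broken down by the actual rating of the held-out movies. A minimal sketch, assuming each rating's value is the fraction of held-out movies with that rating that appeared in the user's top-N, and, mirroring the empty {} for the random control, that ratings with no hits are omitted; actual_rating_hit_rate is a hypothetical helper.

```python
from collections import defaultdict


def actual_rating_hit_rate(top_n_per_user: dict, held_out_ratings: dict, n: int) -> dict:
    """Hypothetical helper: {rating: hit rate} over held-out movies, grouped by
    the rating the user actually gave.

    held_out_ratings maps userId -> {movieId: actual rating}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for user, ratings in held_out_ratings.items():
        top = set(top_n_per_user.get(user, [])[:n])
        for movie, rating in ratings.items():
            totals[rating] += 1
            hits[rating] += movie in top
    return {r: hits[r] / totals[r] for r in sorted(totals) if hits[r]}
```

A good recommender should show higher hit rates for highly rated movies than for poorly rated ones.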

## Results

- Basic results
- Testing the rating prediction parameter
- Testing the threshold rating parameter
- Testing the sample parameter for item-based algorithms
- Testing the sample parameter for user-based algorithms
- Testing the neighbours parameter for item-based algorithms
- Testing the neighbours parameter for user-based algorithms
- Runtimes

In [11]:
filename = 'ML_RecommenderMetrics'
evaluator = Tester.Metrics(validation, LOO, topN=30, moviesPerPage=5, thresholdRating=3.0, csvName=filename)
tester = Tester.Tester(evaluator)

neighbours = 100
sample = 100
thresh = 2.0

tester.addAlgorithm('Item-Based CF', algo.itemBased, model=CF_itemModel, modelType='CF', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
tester.addAlgorithm('User-Based CF', algo.userBased, model=None, modelType='CF', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
tester.addAlgorithm('Item-Based KNN', algo.itemBased, model=KNN_itemModel, modelType='KNN', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')
tester.addAlgorithm('User-Based KNN', algo.userBased, model=KNN_userModel, modelType='KNN', neighbours=neighbours,
                    sample=sample, threshold=thresh, pred='calc')

Added Item-Based CF
Added User-Based CF
Added Item-Based KNN
Added User-Based KNN


In [8]:
parameter = 'pred'
pRange = ['rand', 'sims', 'norm_sims']

tester.runParameterTest(test, param=parameter, pRange=pRange, printResults=False)

Parameter Testing:   0%|          | 0/3 [00:00<?, ?Parameter/s]

Testing Item-Based CF:   0%|          | 0/2056 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/2056 [00:00<?, ? Users/s]

StatisticsError: mean requires at least one data point
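statistics.mean raises StatisticsError when given an empty sequence, which is the error above. If a per-user list being averaged can be empty (for example when no tested user receives a usable result for some metric), a guard such as the hypothetical safe_mean below returns NaN instead of aborting the whole sweep; where exactly the empty list arises in Tester.py is an assumption.

```python
import math
import statistics


def safe_mean(values) -> float:
    """Hypothetical helper: statistics.mean, but NaN instead of StatisticsError
    for an empty input."""
    values = list(values)
    return statistics.mean(values) if values else math.nan
```

NaN results still show up in the csv, so a failed combination is visible without losing the rest of the parameter test.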

In [12]:
parameter = 'threshold'
pRange = np.arange(2, 5.5, 0.5)

tester.runParameterTest(test, param=parameter, pRange=pRange, printResults=False)

Parameter Testing:   0%|          | 0/7 [00:00<?, ?Parameter/s]

Testing Item-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based KNN:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing Item-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

Testing User-Based CF:   0%|          | 0/706 [00:00<?, ? Users/s]

ZeroDivisionError: division by zero
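A common way to predict a rating from neighbours is a similarity-weighted average, sum(similarity × rating) / sum(similarity). If a high threshold leaves no qualifying ratings or neighbours, that denominator is zero; whether this is the source of the error above is an assumption. A minimal guarded sketch, where weighted_prediction is a hypothetical helper and dropping non-positive similarities is a design choice of this sketch:

```python
import math


def weighted_prediction(neighbour_ratings, similarities, default=math.nan) -> float:
    """Hypothetical helper: similarity-weighted average rating, or `default`
    when there is no positive similarity weight to average over."""
    pairs = [(s, r) for s, r in zip(similarities, neighbour_ratings) if s > 0]
    weight = sum(s for s, _ in pairs)
    if weight == 0:
        return default
    return sum(s * r for s, r in pairs) / weight
```

For example, ratings [4.0, 2.0] with similarities [0.75, 0.25] give (3.0 + 0.5) / 1.0 = 3.5.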

In [None]:
parameter = 'sample'
pRange = np.arange(10, 100, 10)

tester.runParameterTest(test, testAlgo='Item-Based CF', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='Item-Based KNN', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based CF', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based KNN', param=parameter, pRange=pRange, printResults=False)

In [None]:
parameter = 'sample'
pRange = np.arange(100, 6050, 50)

tester.runParameterTest(test, testAlgo='Item-Based CF', param=parameter, pRange=pRange, printResults=False)
#tester.runParameterTest(test, testAlgo='Item-Based KNN', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based CF', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based KNN', param=parameter, pRange=pRange, printResults=False)

In [None]:
parameter = 'neighbours'
pRange = np.arange(10, 100, 10)

tester.runParameterTest(test, testAlgo='Item-Based CF', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='Item-Based KNN', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based CF', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based KNN', param=parameter, pRange=pRange, printResults=False)

In [None]:
parameter = 'neighbours'
pRange = np.arange(100, len(train.columns), 50)

tester.runParameterTest(test, testAlgo='Item-Based CF', param=parameter, pRange=pRange, printResults=False)
#tester.runParameterTest(test, testAlgo='Item-Based KNN', param=parameter, pRange=pRange, printResults=False)

In [None]:
parameter = 'neighbours'
pRange = np.arange(100, len(train.index), 200)

tester.runParameterTest(test, testAlgo='User-Based CF', param=parameter, pRange=pRange, printResults=False)
tester.runParameterTest(test, testAlgo='User-Based KNN', param=parameter, pRange=pRange, printResults=False)

## Conclusion

- InLine Comments
- Discussion
- LOO-CV is invalid: update metrics to test validation OR LOO-CV, then re-run the basic test for LOO-CV at the end.
- README file