
# Introduction

We compare two matrix factorization methods, SVD and SVD++ (`SVDpp` in the Surprise library), against a random recommender that serves as the reference baseline.


# Analysis preparation

In [1]:
import numpy as np
import pandas as pd
import re
import os
import heapq
from surprise import accuracy
from surprise import Dataset
from surprise import Reader
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
from surprise.model_selection import LeaveOneOut
from surprise import NormalPredictor, SVD, SVDpp

In [2]:
from recommender_metrics import RecommenderMetrics
from movie_lens_data import MovieLensData
from evaluator import Evaluator

# Read the data

In [3]:
path = "/kaggle/input/movielens-100k-dataset/ml-100k"
movie_lens_data = MovieLensData(
    users_path = os.path.join(path, "u.user"),
    ratings_path = os.path.join(path, "u.data"), 
    movies_path = os.path.join(path, "u.item"), 
    genre_path = os.path.join(path, "u.genre") 
    )

evaluation_data = movie_lens_data.read_ratings_data()
movie_data = movie_lens_data.read_movies_data()
popularity_rankings = movie_lens_data.get_popularity_ranks()
ratings = movie_lens_data.get_ratings()

# Prepare evaluator

In [4]:
evaluator = Evaluator(evaluation_data, popularity_rankings)

Number of full trainset users: 943
Number of full trainset items: 1682
Number of trainset users: 943
Number of trainset items: 1641
Size of testset: 25000
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
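
The evaluator output above reports a test set of 25000 ratings, which, for the 100,000 ratings in MovieLens 100K, suggests a 25% hold-out. The `Evaluator` internals are not shown here, so the following is only a minimal sketch of such a split in plain Python; `split_ratings`, its parameters, and the toy ratings are hypothetical.

```python
import random

def split_ratings(ratings, test_fraction=0.25, seed=0):
    """Shuffle the ratings and hold out a fraction of them as a test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # First n_test shuffled ratings form the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]

# Toy (user, item, rating) triples, not MovieLens data.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(20) for i in range(10)]
trainset, testset = split_ratings(ratings)
print(len(trainset), len(testset))
```

With a random split, an item whose few ratings all land in the held-out portion disappears from the training set, which would be consistent with the item count dropping from 1682 to 1641 above.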


# Add random recommender to evaluator

In [5]:
algo_np = NormalPredictor()
evaluator.add_algorithm(algo_np, "Random")
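
As a rough illustration of the random baseline: Surprise's `NormalPredictor` draws each prediction from a normal distribution whose mean and standard deviation are estimated from the training ratings. The sketch below mimics that idea in plain Python. The helper names (`fit_normal`, `predict_random`) and the toy ratings are hypothetical, and clipping to a 1 to 5 scale is an assumption that matches MovieLens ratings.

```python
import math
import random

def fit_normal(ratings):
    """Estimate the mean and population standard deviation of the ratings."""
    mu = sum(ratings) / len(ratings)
    sigma = math.sqrt(sum((r - mu) ** 2 for r in ratings) / len(ratings))
    return mu, sigma

def predict_random(mu, sigma, rng, low=1.0, high=5.0):
    """Draw one rating from N(mu, sigma^2) and clip it to the rating scale."""
    return min(high, max(low, rng.gauss(mu, sigma)))

train_ratings = [5, 3, 4, 1, 2, 4, 5, 3]  # toy data, not MovieLens
mu, sigma = fit_normal(train_ratings)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [predict_random(mu, sigma, rng) for _ in range(1000)]
print(mu, sigma, min(samples), max(samples))
```

Because draws above the top of the scale are clipped, many items can tie at the maximum rating. This may be why the random recommender's top-N lists further down show a score of 5 for every item.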

# Add SVD & SVD++

In [6]:
algo_svd = SVD()  # use a separate name so the imported SVD class is not shadowed
evaluator.add_algorithm(algo_svd, "SVD")

In [7]:
SVD_plus_plus = SVDpp()
evaluator.add_algorithm(SVD_plus_plus, "SVD++")
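
To make the difference between the two models concrete, here is a minimal sketch of their prediction rules as described in the Surprise documentation. SVD predicts `mu + b_u + b_i + q_i . p_u`. SVD++ adds an implicit-feedback term built from factor vectors `y_j` of the items the user has rated. The function names and all numeric values are hypothetical toy choices, not fitted parameters.

```python
import math

def predict_svd(mu, b_u, b_i, p_u, q_i):
    """Global mean + user bias + item bias + dot product of the factors."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))

def predict_svdpp(mu, b_u, b_i, p_u, q_i, y_rated):
    """Like SVD, but the user vector is shifted by implicit feedback.

    y_rated holds the implicit factor vectors y_j of the items the user rated.
    """
    norm = 1.0 / math.sqrt(len(y_rated)) if y_rated else 0.0
    implicit = [norm * sum(y[f] for y in y_rated) for f in range(len(p_u))]
    user_vec = [p + z for p, z in zip(p_u, implicit)]
    return mu + b_u + b_i + sum(u * q for u, q in zip(user_vec, q_i))

# Toy parameters with two latent factors.
mu, b_u, b_i = 3.5, 0.2, -0.1
p_u, q_i = [0.5, -0.25], [1.0, 2.0]
y_rated = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]

svd_estimate = predict_svd(mu, b_u, b_i, p_u, q_i)
svdpp_estimate = predict_svdpp(mu, b_u, b_i, p_u, q_i, y_rated)
print(svd_estimate, svdpp_estimate)
```

The sketch leaves estimates unclipped; in practice predictions are usually bounded to the rating scale.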

# Evaluate algorithms

In [8]:
evaluator.evaluate(do_top_n=False)

Evaluating  Random ...
Evaluating accuracy...
Analysis complete.
Evaluating  SVD ...
Evaluating accuracy...
Analysis complete.
Evaluating  SVD++ ...
Evaluating accuracy...
Analysis complete.


Algorithm  RMSE       MAE        FCP       
Random     1.5216     1.2211     0.4971    
SVD        0.9412     0.7411     0.6983    
SVD++      0.9255     0.7247     0.7096    

Legend:

RMSE:      Root Mean Squared Error. Lower values mean better accuracy.
MAE:       Mean Absolute Error. Lower values mean better accuracy.
FCP:       Fraction of Concordant Pairs. Higher values mean better accuracy.
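
The three metrics in the legend can be sketched in a few lines of plain Python. This is an illustrative version on hypothetical toy predictions, not the code the evaluator runs. In particular, the `fcp` function below pools concordant and discordant pairs across users and skips pairs with equal true ratings, whereas Surprise's own implementation averages per-user counts and treats ties in its own way, so values can differ.

```python
import math
from collections import defaultdict
from itertools import combinations

def rmse(predictions):
    """Root mean squared error over (user, true_rating, estimate) triples."""
    return math.sqrt(sum((t - e) ** 2 for _, t, e in predictions) / len(predictions))

def mae(predictions):
    """Mean absolute error over (user, true_rating, estimate) triples."""
    return sum(abs(t - e) for _, t, e in predictions) / len(predictions)

def fcp(predictions):
    """Fraction of same-user item pairs ranked in the same order as the truth."""
    by_user = defaultdict(list)
    for user, true_r, est in predictions:
        by_user[user].append((true_r, est))
    concordant = discordant = 0
    for rated in by_user.values():
        for (t1, e1), (t2, e2) in combinations(rated, 2):
            if t1 == t2:
                continue  # equal true ratings carry no ordering information
            if (t1 - t2) * (e1 - e2) > 0:
                concordant += 1
            else:
                discordant += 1
    total = concordant + discordant
    return concordant / total if total else float("nan")

# Toy (user, true rating, estimated rating) triples.
preds = [("u1", 4, 3.5), ("u1", 2, 3.0), ("u1", 5, 3.2),
         ("u2", 3, 4.0), ("u2", 1, 2.0)]
print(rmse(preds), mae(preds), fcp(preds))
```

RMSE penalizes large errors more heavily than MAE, while FCP ignores the size of the errors and looks only at whether each user's items are ordered correctly.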


# Evaluate topN recommendations
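
A top-N list is typically built by scoring every item the user has not yet rated and keeping the N highest estimates. The internals of `sample_top_n_recs` are not shown in this notebook, so the sketch below only illustrates that general technique; `top_n`, its arguments, and the toy scores are hypothetical.

```python
import heapq

def top_n(estimates, seen_items, k=10):
    """Return the k highest-scoring (item, estimate) pairs among unseen items."""
    candidates = ((item, est) for item, est in estimates.items()
                  if item not in seen_items)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

# Toy estimated ratings for one user; item "C" was already rated.
estimates = {"A": 4.2, "B": 3.1, "C": 4.8, "D": 2.5, "E": 4.5}
recommendations = top_n(estimates, seen_items={"C"}, k=2)
print(recommendations)
```

`heapq.nlargest` avoids sorting the full candidate list, which helps when the catalog is much larger than N.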

In [9]:
evaluator.sample_top_n_recs(movie_lens_data, test_subject=85, k=10)


Using recommender  Random

Building recommendation model...
Computing recommendations...

We recommend:
Casper 5
That Darn Cat! 5
Copycat 5
Bean 5
Swimming with Sharks 5
Bio-Dome 5
Ninotchka 5
Rock, The 5
Dante's Peak 5
Canadian Bacon 5

Using recommender  SVD

Building recommendation model...
Computing recommendations...

We recommend:
Close Shave, A 4.380873975649877
12 Angry Men 4.254184419357254
Rear Window 4.228485769984509
Shall We Dance? 4.214710756013492
Secrets & Lies 4.185902208775218
Pather Panchali 4.182138898820537
Wallace & Gromit: The Best of Aardman Animation 4.180762607839151
Wrong Trousers, The 4.179825326160878
Usual Suspects, The 4.169170276568919
L.A. Confidential 4.134127381465049

Using recommender  SVD++

Building recommendation model...
Computing recommendations...

We recommend:
Close Shave, A 4.3770623837434695
Wrong Trousers, The 4.351609610569908
12 Angry Men 4.245679321845104
Usual Suspects, The 4.2142760904727465
L.A. Confidential 4.21397074183966
Shall 

In [10]:
evaluator.sample_top_n_recs(movie_lens_data, test_subject=314, k=10)


Using recommender  Random

Building recommendation model...
Computing recommendations...

We recommend:
Kolya 5
Man Without a Face, The 5
Curdled 5
Old Yeller 5
That Darn Cat! 5
Muppet Treasure Island 5
Bogus 5
Annie Hall 5
Pump Up the Volume 5
Big Night 5

Using recommender  SVD

Building recommendation model...
Computing recommendations...

We recommend:
Contact 4.935462723167798
As Good As It Gets 4.90519473682251
Hunt for Red October, The 4.80939775497458
Terminator 2: Judgment Day 4.780721483405336
True Lies 4.654551831595679
Sting, The 4.6492974583413735
Air Force One 4.631384795098233
Great Escape, The 4.623977606515087
Highlander 4.5871440357351565
In the Line of Fire 4.5797332412473715

Using recommender  SVD++

Building recommendation model...
Computing recommendations...

We recommend:
Titanic 5
Jurassic Park 4.8996884266291705
Twister 4.7814278005596025
Air Force One 4.74396364819321
Good Will Hunting 4.69300185828042
Game, The 4.691890755295274
Terminator 2: Judgment Day 