<a href="https://www.kaggle.com/code/gpreda/collaborative-filtering-knn-user-item-evaluation?scriptVersionId=128769100" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction


Compare KNN-based collaborative filtering approaches against a random baseline:
* User-based collaborative filtering
* Item-based collaborative filtering
* Randomly generated recommendations (baseline)


# Prepare analysis

In [1]:
import os
import re
import heapq

import numpy as np
import pandas as pd

# scikit-learn pairwise similarity helpers
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

# Surprise: datasets, prediction algorithms, accuracy metrics, and data splitting
from surprise import accuracy
from surprise import Dataset
from surprise import Reader
from surprise import NormalPredictor, KNNBasic
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
from surprise.model_selection import LeaveOneOut

In [2]:
# Helper modules (code not shown in this notebook): metrics, data loading, evaluation
from recommender_metrics import RecommenderMetrics
from movie_lens_data import MovieLensData
from evaluator import Evaluator

# Read the data

In [3]:
path = "/kaggle/input/movielens-100k-dataset/ml-100k"
movie_lens_data = MovieLensData(
    users_path=os.path.join(path, "u.user"),
    ratings_path=os.path.join(path, "u.data"),
    movies_path=os.path.join(path, "u.item"),
    genre_path=os.path.join(path, "u.genre"),
)

# Ratings for evaluation, movie metadata, popularity ranks, and the raw ratings
evaluation_data = movie_lens_data.read_ratings_data()
movie_data = movie_lens_data.read_movies_data()
popularity_rankings = movie_lens_data.get_popularity_ranks()
ratings = movie_lens_data.get_ratings()
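`MovieLensData` comes from a helper module whose code is not shown. To give a rough idea of what loading involves, here is a minimal pandas sketch, assuming that `read_ratings_data` parses the tab-separated `u.data` file (user id, item id, rating, timestamp) and that `get_popularity_ranks` ranks movies by how many ratings they received. The toy rows and the function names `read_ratings` and `popularity_ranks` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical toy rows in the u.data layout (tab-separated):
# user id, item id, rating, timestamp
TOY_U_DATA = (
    "1\t10\t5\t881250949\n"
    "2\t10\t3\t881250950\n"
    "3\t10\t4\t881250951\n"
    "2\t20\t4\t881250952\n"
    "3\t20\t2\t881250953\n"
    "3\t30\t1\t881250954\n"
)


def read_ratings(source) -> pd.DataFrame:
    """Parse u.data-style ratings from a path or file-like object."""
    return pd.read_csv(
        source,
        sep="\t",
        names=["user_id", "item_id", "rating", "timestamp"],
    )


def popularity_ranks(ratings: pd.DataFrame) -> dict:
    """Map item id -> popularity rank (assumed: 1 = item with the most ratings)."""
    counts = ratings["item_id"].value_counts()  # counts sorted in descending order
    return {int(item): rank for rank, item in enumerate(counts.index, start=1)}


ratings_df = read_ratings(io.StringIO(TOY_U_DATA))
ranks = popularity_ranks(ratings_df)
print(ranks)  # {10: 1, 20: 2, 30: 3}
```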

# Prepare evaluator

In [4]:
evaluator = Evaluator(evaluation_data, popularity_rankings)

Number of full trainset users: 943
Number of full trainset items: 1682
Number of trainset users: 943
Number of trainset items: 1641
Size of testset: 25000
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
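The `Evaluator` internals are not shown, but the log above is consistent with holding out 25% of the 100,000 MovieLens ratings (25,000 ratings) as a test set, which is what surprise's `train_test_split(data, test_size=0.25)` would produce. The training split presumably sees only 1,641 of the 1,682 movies because every rating of the remaining movies landed in the test set. A minimal numpy sketch of such a random split (the helper name `split_ratings` and the seed are illustrative):

```python
import numpy as np


def split_ratings(n_ratings: int, test_size: float = 0.25, seed: int = 0):
    """Randomly partition rating indices into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_ratings)  # shuffled 0..n_ratings-1
    n_test = int(np.ceil(n_ratings * test_size))
    return order[n_test:], order[:n_test]


train_idx, test_idx = split_ratings(100_000)
print(len(train_idx), len(test_idx))  # 75000 25000
```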


# Add random recommender to evaluator

In [5]:
algo_np = NormalPredictor()
evaluator.add_algorithm(algo_np, "Random")
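Surprise documents `NormalPredictor` as drawing each prediction from a normal distribution N(μ̂, σ̂²) whose mean and standard deviation are maximum-likelihood estimates from the training ratings; by default, `predict` also clips estimates to the rating scale. A minimal numpy sketch of the same idea (the function names and training ratings are hypothetical):

```python
import numpy as np


def fit_normal(train_ratings):
    """Maximum-likelihood estimates (mean, std) of the training ratings."""
    r = np.asarray(train_ratings, dtype=float)
    mu = r.mean()
    sigma = np.sqrt(np.mean((r - mu) ** 2))  # MLE divides by n, not n - 1
    return mu, sigma


def predict_random(mu, sigma, n, rating_scale=(1, 5), seed=0):
    """Draw n estimates from N(mu, sigma^2) and clip them to the rating scale."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(mu, sigma, size=n)
    return np.clip(draws, rating_scale[0], rating_scale[1])


# Hypothetical training ratings
mu, sigma = fit_normal([5, 3, 4, 4, 2, 1])
estimates = predict_random(mu, sigma, n=10)
print(mu, sigma, estimates)
```

Clipping likely explains why every movie the random recommender suggests in the samples below carries an estimate of 5: any draw above 5 becomes exactly 5, and those are the highest-scoring candidates.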

# Add item-based collaborative filtering RecSys to evaluator


The `sim_options` dictionary selects the similarity measure (here, Pearson correlation) and whether similarities are computed between users or between items. Setting `'user_based': False` makes this an item-based model.

In [6]:
item_KNN = KNNBasic(sim_options={'name': 'pearson', 'user_based': False})
evaluator.add_algorithm(item_KNN, "Item KNN")
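With `'user_based': False`, similarities are computed between items, i.e. between columns of the user-item rating matrix. A minimal numpy sketch of a Pearson similarity between two items, computed only over users who rated both. The toy matrix and the name `item_pearson` are hypothetical, and surprise's own implementation adds options such as `min_support` and may handle edge cases differently.

```python
import numpy as np


def item_pearson(R: np.ndarray, i: int, j: int) -> float:
    """Pearson correlation between item columns i and j over users who rated both.

    R is a users x items matrix with np.nan for missing ratings.
    Returns 0.0 when there is no overlap or either side has zero variance.
    """
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    x, y = R[both, i], R[both, j]
    if x.size == 0:
        return 0.0
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return float(np.sum(dx * dy) / denom) if denom > 0 else 0.0


nan = np.nan
# Hypothetical ratings: 4 users (rows) x 3 movies (columns)
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
    [nan, 3.0, 4.0],
])
print(item_pearson(R, 0, 1))  # positive: movies 0 and 1 are rated alike
print(item_pearson(R, 0, 2))  # about -1: movie 2 is rated opposite to movie 0
```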

# Add user-based collaborative filtering RecSys to evaluator

Here `'user_based': True` computes the Pearson similarities between users instead, giving a user-based model.

In [7]:
user_KNN = KNNBasic(sim_options={'name': 'pearson', 'user_based': True})
evaluator.add_algorithm(user_KNN, "User KNN")
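KNNBasic's documented user-based estimate is a similarity-weighted average of the ratings that the k most similar users gave to the target item:

$$\hat r_{ui} = \frac{\sum_{v \in N_i^k(u)} \operatorname{sim}(u, v)\, r_{vi}}{\sum_{v \in N_i^k(u)} \operatorname{sim}(u, v)}$$

where $N_i^k(u)$ is the set of the k nearest neighbours of u that rated item i (k defaults to 40). The item-based variant swaps the roles and averages u's own ratings of the k items most similar to i. Below is a minimal sketch of the user-based case, assuming a precomputed similarity matrix with hypothetical values. As in surprise's implementation (to our understanding), only positive similarities contribute; `None` stands in for surprise's "prediction impossible" case.

```python
import heapq

import numpy as np


def knn_basic_predict(R, sim, u, i, k=40):
    """User-based k-NN estimate of user u's rating for item i.

    R:   users x items ratings with np.nan for missing values
    sim: users x users similarity matrix
    Only the k most similar users who rated i are considered, and only those
    with positive similarity contribute. Returns None if none qualify.
    """
    raters = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
    neighbours = heapq.nlargest(k, ((sim[u, v], R[v, i]) for v in raters), key=lambda t: t[0])
    num = sum(s * r for s, r in neighbours if s > 0)
    den = sum(s for s, _ in neighbours if s > 0)
    return num / den if den > 0 else None


nan = np.nan
# Hypothetical ratings (users x items); user 0 has not rated item 2
R = np.array([
    [5.0, 4.0, nan],
    [5.0, 4.0, 2.0],
    [4.0, 5.0, 3.0],
    [1.0, 1.0, 5.0],
])
# Hypothetical user-user similarities (e.g. from a Pearson computation)
sim = np.array([
    [1.0, 0.9, 0.6, -0.8],
    [0.9, 1.0, 0.5, -0.7],
    [0.6, 0.5, 1.0, -0.6],
    [-0.8, -0.7, -0.6, 1.0],
])
print(knn_basic_predict(R, sim, u=0, i=2, k=40))  # ≈ 2.4 = (0.9*2 + 0.6*3) / (0.9 + 0.6)
print(knn_basic_predict(R, sim, u=0, i=2, k=1))   # 2.0: only the most similar user counts
```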

# Evaluate algorithms

In [8]:
evaluator.evaluate(do_top_n=False)

Evaluating  Random ...
Evaluating accuracy...
Analysis complete.
Evaluating  Item KNN ...
Evaluating accuracy...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  User KNN ...
Evaluating accuracy...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Analysis complete.


Algorithm  RMSE       MAE        FCP       
Random     1.5251     1.2224     0.4945    
Item KNN   1.0442     0.8338     0.5456    
User KNN   1.0166     0.8059     0.7080    

Legend:

RMSE:      Root Mean Squared Error. Lower values mean better accuracy.
MAE:       Mean Absolute Error. Lower values mean better accuracy.
FCP:       Fraction of Concordant Pairs. Higher values mean better accuracy.
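The three accuracy metrics can be computed directly from (user, true rating, estimate) triples; surprise provides them as `accuracy.rmse`, `accuracy.mae` and `accuracy.fcp`. A minimal sketch on hypothetical predictions. The FCP function follows Koren's definition: a pair of items rated differently by the same user is concordant when the estimates order them the same way, and tied estimates count as discordant. Surprise's aggregation across users may differ slightly.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def rmse(preds):
    """Root mean squared error over (user, true, estimate) triples."""
    errors = np.array([r - est for _, r, est in preds], dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


def mae(preds):
    """Mean absolute error over (user, true, estimate) triples."""
    errors = np.array([r - est for _, r, est in preds], dtype=float)
    return float(np.mean(np.abs(errors)))


def fcp(preds):
    """Fraction of concordant pairs, pooling pair counts over all users."""
    by_user = defaultdict(list)
    for user, r, est in preds:
        by_user[user].append((r, est))
    concordant = discordant = 0
    for pairs in by_user.values():
        for (r1, e1), (r2, e2) in combinations(pairs, 2):
            if r1 == r2:
                continue  # only pairs with different true ratings are ranked
            if (r1 - r2) * (e1 - e2) > 0:
                concordant += 1  # estimates order the pair like the true ratings
            else:
                discordant += 1  # reversed order or tied estimates
    total = concordant + discordant
    return concordant / total if total else 0.0


# Hypothetical predictions: (user, true rating, estimated rating)
preds = [
    ("a", 5, 4.5),
    ("a", 3, 3.0),
    ("a", 1, 3.5),
    ("b", 4, 4.0),
    ("b", 2, 2.0),
]
print(rmse(preds), mae(preds), fcp(preds))  # ≈ 1.1402, 0.6, 0.75
```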


In [9]:
# Time-consuming (adds leave-one-out top-N metrics); comment out to skip
evaluator.evaluate(do_top_n=True)

Evaluating  Random ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  Item KNN ...
Evaluating accuracy...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Evaluating top-N with leave-one-out...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  User KNN ...
Evaluating accuracy...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Evaluating top-N
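In the top-N evaluation, leave-one-out holds out one rating per user (surprise's `LeaveOneOut`, imported above, does this). It then builds each user's top-N list from the remaining data and checks whether the held-out item appears in it. The `RecommenderMetrics` helper is not shown. The sketch below uses the standard hit-rate definition, plus average reciprocal hit rank (ARHR) as one example of a rank metric; the function names and inputs are hypothetical.

```python
def hit_rate(top_n, left_out):
    """Share of users whose held-out item appears in their top-N list.

    top_n:    dict user -> ranked list of recommended item ids
    left_out: dict user -> the single item id held out for that user
    """
    hits = sum(1 for user, item in left_out.items() if item in top_n.get(user, []))
    return hits / len(left_out)


def avg_reciprocal_hit_rank(top_n, left_out):
    """Like hit rate, but a hit at rank r scores 1/r instead of 1."""
    total = 0.0
    for user, item in left_out.items():
        recs = top_n.get(user, [])
        if item in recs:
            total += 1.0 / (recs.index(item) + 1)  # ranks start at 1
    return total / len(left_out)


# Hypothetical top-3 lists and held-out items for three users
top_n = {1: [10, 20, 30], 2: [40, 50, 60], 3: [70, 80, 90]}
left_out = {1: 20, 2: 99, 3: 90}
print(hit_rate(top_n, left_out))                 # 2 hits out of 3 users
print(avg_reciprocal_hit_rank(top_n, left_out))  # (1/2 + 1/3) / 3
```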

# Sample top-N recommendations

In [10]:
evaluator.sample_top_n_recs(movie_lens_data, test_subject=85, k=10)


Using recommender  Random

Building recommendation model...
Computing recommendations...

We recommend:
Die Hard 5
Naked Gun 33 1/3: The Final Insult 5
Executive Decision 5
Swimming with Sharks 5
Devil's Own, The 5
Courage Under Fire 5
Homeward Bound: The Incredible Journey 5
It Could Happen to You 5
GoldenEye 5
Kazaam 5

Using recommender  Item KNN

Building recommendation model...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing recommendations...

We recommend:
Salut cousin! 4.5
To Have, or Not 4.5
Good Man in Africa, A 4.4
Amityville Curse, The 4.148591942911796
Amityville 3-D 4.137656281621579
Inkwell, The 4.0
Castle Freak 4.0
Turning, The 4.0
Commandments 4.0
Letter From Death Row, A 4.0

Using recommender  User KNN

Building recommendation model...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing recommendations...

We recommend:
Prefontaine 5
Santa with Muscles 5
Boys, Les 5
Great Day in Harlem, A 5
Ai
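`sample_top_n_recs` is a helper whose code is not shown. Its log suggests that it fits the model on the full data and then ranks the movies the test subject has not rated by predicted rating. A minimal sketch of that ranking step, with a hypothetical `predict` callable standing in for a fitted model:

```python
import heapq


def top_n_recs(predict, user, all_items, rated_items, n=10):
    """Return up to n (item, estimate) pairs with the highest estimates,
    skipping items the user has already rated."""
    candidates = (
        (item, predict(user, item)) for item in all_items if item not in rated_items
    )
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Hypothetical predicted ratings for one user; movie "C" is already rated
scores = {"A": 3.2, "B": 4.8, "C": 5.0, "D": 4.1, "E": 2.0}
recs = top_n_recs(lambda u, item: scores[item], user=85, all_items=scores, rated_items={"C"}, n=2)
print(recs)  # [('B', 4.8), ('D', 4.1)]
```

`heapq.nlargest` keeps candidates with equal scores in their original order. When many movies share the top estimate (as with the 5s above), which ones make the list therefore depends on iteration order rather than on the model.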

In [11]:
evaluator.sample_top_n_recs(movie_lens_data, test_subject=85, k=5)


Using recommender  Random

Building recommendation model...
Computing recommendations...

We recommend:
Hunt for Red October, The 5
Age of Innocence, The 5
Casper 5
Mr. Holland's Opus 5
Jean de Florette 5
Dangerous Minds 5
Liar Liar 5
Terminator 2: Judgment Day 5
Clockers 5
Chain Reaction 5

Using recommender  Item KNN

Building recommendation model...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing recommendations...

We recommend:
Salut cousin! 4.5
To Have, or Not 4.5
Good Man in Africa, A 4.4
Amityville Curse, The 4.148591942911796
Amityville 3-D 4.137656281621579
Inkwell, The 4.0
Castle Freak 4.0
Turning, The 4.0
Commandments 4.0
Letter From Death Row, A 4.0

Using recommender  User KNN

Building recommendation model...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing recommendations...

We recommend:
Prefontaine 5
Santa with Muscles 5
Boys, Les 5
Great Day in Harlem, A 5
Aiqing wansui 5
For the Moment 5


In [12]:
evaluator.sample_top_n_recs(movie_lens_data, test_subject=314, k=10)


Using recommender  Random

Building recommendation model...
Computing recommendations...

We recommend:
To Wong Foo, Thanks for Everything! Julie Newmar 5
Sabrina 5
Silence of the Lambs, The 5
Rear Window 5
Crumb 5
Tales from the Hood 5
Return of the Jedi 5
Angels and Insects 5
Ben-Hur 5
Batman 5

Using recommender  Item KNN

Building recommendation model...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing recommendations...

We recommend:
Hugo Pool 5
Ill Gotten Gains 5
Hugo Pool 5
Rough Magic 5
Temptress Moon 5
Men of Means 5
Truman Show, The 4.894106578227169
Leading Man, The 4.843018994729905
Relative Fear 4.818263625777179
Simple Wish, A 4.666666666666667

Using recommender  User KNN

Building recommendation model...
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing recommendations...

We recommend:
Prefontaine 5
Great Day in Harlem, A 5
Innocents, The 5
Visitors, The 5
Star Kid 5
Saint of Fort Washington, 