This notebook introduces the KNNBaseline algorithm. User bias (for example, users who consistently rate higher or lower than average) can be an issue
in certain datasets, and KNNBaseline aims to improve predictions by incorporating a baseline rating.
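
To make the idea of a baseline rating concrete, the sketch below follows the prediction rule documented for Surprise's KNNBaseline in the user-based case: the estimate is the baseline b_ui = mu + b_u + b_i plus a similarity-weighted average of how far each neighbour's rating deviates from that neighbour's own baseline. The function name `knn_baseline_predict` and the toy numbers are illustrative assumptions, not part of the library or of this notebook.

```python
def knn_baseline_predict(mu, b_u, b_i, neighbors):
    """Hypothetical helper: KNNBaseline-style estimate for one (user, item) pair.

    mu        -- global mean rating
    b_u, b_i  -- user and item biases
    neighbors -- list of (similarity, neighbour_rating, neighbour_baseline)
                 for the k nearest neighbours that rated the item
    """
    baseline = mu + b_u + b_i
    weight_sum = sum(sim for sim, _, _ in neighbors)
    if weight_sum == 0:
        # No usable neighbours: fall back to the baseline alone.
        return baseline
    weighted_dev = sum(sim * (r - b) for sim, r, b in neighbors)
    return baseline + weighted_dev / weight_sum


# Toy example (made-up numbers): baseline 3.6, neighbours pull it up to about 4.3.
example = knn_baseline_predict(
    mu=3.5, b_u=0.2, b_i=-0.1,
    neighbors=[(0.8, 5.0, 4.0), (0.2, 3.0, 3.5)],
)
print(round(example, 2))
```

A user who rates everything one star above average no longer drags the estimate up, because only the deviation from that user's baseline is averaged.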


In [20]:
import random
import time

# import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from surprise import Dataset, KNNBaseline
from surprise.accuracy import rmse
from surprise.model_selection import GridSearchCV, train_test_split

from own_algorithms.UserItemKNN import UserItemKNN
from own_algorithms.UserItemKNNv2 import UserItemKNNv2



Before running the algorithm, we test for the best parameter setup. Unlike earlier, GridSearchCV is used for the sake of simplicity.
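
Before starting an exhaustive search, it can help to estimate how many model fits it implies. The sketch below counts the combinations in a grid shaped like the one used in this notebook, assuming the nested `sim_options` lists are expanded as a full Cartesian product and that 5-fold cross-validation is used. The helper name `count_combinations` is hypothetical.

```python
# Same shape as the grid passed to GridSearchCV in this notebook.
param_grid = {
    'k': [10, 20, 30, 40, 50, 60],
    'sim_options': {
        'name': ['cosine', 'pearson_baseline'],
        'min_support': [1, 5],
        'user_based': [True, False],
    },
}


def count_combinations(grid):
    """Hypothetical helper: number of parameter combinations in a (nested) grid."""
    total = 1
    for value in grid.values():
        if isinstance(value, dict):
            # Nested option dicts contribute the product of their own lists.
            total *= count_combinations(value)
        else:
            total *= len(value)
    return total


n_combos = count_combinations(param_grid)
n_fits = n_combos * 5  # one fit per combination per fold, assuming cv=5
print(n_combos, n_fits)
```

Each fit also recomputes a similarity matrix, which is consistent with the long runtime noted in the cell.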

In [2]:
# takes a while to run (about 21 minutes)
data = Dataset.load_builtin('ml-100k')
param_grid = {'k': [10, 20, 30, 40, 50, 60],
              'sim_options': {'name': ['cosine', 'pearson_baseline'],
                              'min_support': [1, 5],
                              'user_based': [True, False]}}


# Instantiate the GridSearchCV object and fit the data
gs = GridSearchCV(KNNBaseline, param_grid, measures=['rmse', 'mae'], cv=5)
gs.fit(data)

# Print the best RMSE score and the corresponding parameters
print(gs.best_score['rmse'])
print(gs.best_params['rmse'])

Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
...

In [25]:
algo = gs.best_estimator['rmse']

# test on ml-100k
trainset, testset = train_test_split(data, test_size=0.25)

algo.fit(trainset)

predictions = algo.test(testset)
print("Unbiased accuracy for KNNBaseline on ml100k,", end=" ")
algo_rmse = rmse(predictions)

# test on ml-1m, timing the fit and predict steps (in seconds)
data = Dataset.load_builtin('ml-1m')
trainset, testset = train_test_split(data, test_size=0.25)
fit_start = time.time()
algo.fit(trainset)
fit_baseline = time.time() - fit_start
predict_start = time.time()
predictions = algo.test(testset)
predict_baseline = time.time() - predict_start
print("Unbiased accuracy for KNNBaseline on ml-1m,", end=" ")
algo_rmse_1m = rmse(predictions)
# collect RMSE, fit time and predict time for ml-1m
baseline_stats = np.array([algo_rmse_1m, fit_baseline, predict_baseline])

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Unbiased accuracy for KNNBaseline on ml100k, RMSE: 0.8610
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Unbiased accuracy for KNNBaseline on ml-1m, RMSE: 0.8638
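
For reference, the RMSE values printed above are the square root of the mean squared difference between true and estimated ratings. The sketch below computes the same quantity from plain (true, estimated) pairs. The function name `rmse_from_pairs` and the sample ratings are illustrative assumptions; in the notebook itself the value comes from `surprise.accuracy.rmse`.

```python
import math


def rmse_from_pairs(pairs):
    """Hypothetical helper: RMSE over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one (true, estimated) pair")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings: errors of 1, 0 and 2 stars.
sample = [(4.0, 3.0), (2.0, 2.0), (5.0, 3.0)]
print(rmse_from_pairs(sample))
```

Because the errors are squared before averaging, a single two-star miss weighs as much as four one-star misses.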


In [4]:
KNNbaselineScores = pd.DataFrame(data={"Algo": ["KNNBaseline"],
                                       "100k": [algo_rmse],
                                       "1M": [algo_rmse_1m]})
KNNbaselineScores.to_csv('./algo_data/KNNBaselineOp.csv', index=False)

In [24]:
baseline_stats

array([ 0.85999087, 37.72652411, 78.87895727])
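
The second and third entries of `baseline_stats` are wall-clock times in seconds for fitting and predicting. A small wrapper can make that measurement reusable across algorithms. The sketch below is one possible way to do it, not how this notebook measures time: the helper name `timed` is hypothetical, and it uses `time.perf_counter`, which is generally better suited to measuring intervals than `time.time`.

```python
import time


def timed(fn, *args, **kwargs):
    """Hypothetical helper: call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in the notebook this could wrap algo.fit or algo.test.
total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.4f}s")
```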

In [28]:
# load the KNN scores for ml-1m and append the KNNBaseline results
knn_scores = pd.read_csv('./algo_data/KNN_1m')
new_row = {'Algorithm': 'KNN Baseline', 'RMSE': algo_rmse_1m, 'Fit Time': fit_baseline, 'Predict Time': predict_baseline}
knn_scores.loc[len(knn_scores)] = new_row
knn_scores.to_csv('./algo_data/KNN_1m', index=False)

In [5]:
movies_cols = ['movie_id', 'title', 'genres']
movies_df = pd.read_csv('./ml-1m/movies.dat', sep='::', names=movies_cols, engine='python', encoding='latin-1')

In [6]:
# load dataframes with predictions
user1= pd.read_csv('./predictions/1.csv')
user134= pd.read_csv('./predictions/134.csv')
user398= pd.read_csv('./predictions/398.csv')

In [16]:

from own_algorithms.top_n_list import get_top_n_list

# top-10 list for user 398
movies = get_top_n_list(predictions, 10, '398', movies_df)
user398["KNN Baseline"] = movies
user398.to_csv('./predictions/398.csv', index=False)

In [17]:
# top-10 list for user 1
movies = get_top_n_list(predictions, 10, '1', movies_df)
user1["KNN Baseline"] = movies
user1.to_csv('./predictions/1.csv', index=False)

In [18]:
# top-10 list for user 134
movies = get_top_n_list(predictions, 10, '134', movies_df)
user134["KNN Baseline"] = movies
user134.to_csv('./predictions/134.csv', index=False)
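
The source of `get_top_n_list` is not shown in this notebook. As a rough illustration of what such a helper might do, the sketch below ranks one user's predictions by estimated rating and keeps the first n item ids. Everything here is an assumption: the name `top_n_for_user`, the `Prediction` stand-in (modelled on the `uid`, `iid`, `r_ui`, `est`, `details` fields of Surprise predictions), and the sample data. The real helper also takes `movies_df`, presumably to map ids to titles, which is omitted here.

```python
from collections import namedtuple

# Stand-in with the same field names as Surprise's prediction tuples.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def top_n_for_user(predictions, n, user_id):
    """Hypothetical helper: ids of the n items with the highest estimate for one user."""
    user_preds = [p for p in predictions if p.uid == user_id]
    user_preds.sort(key=lambda p: p.est, reverse=True)
    return [p.iid for p in user_preds[:n]]


# Made-up predictions for two users.
sample_predictions = [
    Prediction("1", "10", 4.0, 3.9, {}),
    Prediction("1", "20", 5.0, 4.7, {}),
    Prediction("1", "30", 3.0, 4.2, {}),
    Prediction("134", "10", 2.0, 2.5, {}),
]
top_two = top_n_for_user(sample_predictions, 2, "1")
print(top_two)
```

If the predictions come from `algo.test(testset)`, a list built this way is limited to items the user rated in the held-out test set rather than every unseen movie.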

In [14]:
user134

Unnamed: 0,Hybrid,KNN Basic,KNN Baseline
0,Braveheart (1995),Braveheart (1995),Rumble in the Bronx (1995)
1,In the Line of Fire (1993),In the Line of Fire (1993),In the Line of Fire (1993)
2,"Last of the Mohicans, The (1992)","Last of the Mohicans, The (1992)",Rudy (1993)
3,Austin Powers: International Man of Mystery (1...,Austin Powers: International Man of Mystery (1...,Grosse Pointe Blank (1997)
4,"Full Monty, The (1997)","Full Monty, The (1997)",Contact (1997)
5,"Mask of Zorro, The (1998)",Office Space (1999),Saving Private Ryan (1998)
6,Being John Malkovich (1999),Being John Malkovich (1999),My Cousin Vinny (1992)
7,Toy Story 2 (1999),Toy Story 2 (1999),Enemy of the State (1998)
8,Chicken Run (2000),Chicken Run (2000),Office Space (1999)
9,Almost Famous (2000),Almost Famous (2000),High Fidelity (2000)


In [15]:
user1

Unnamed: 0,Hybrid,KNN Basic,KNN Baseline
0,Fargo (1996),Fargo (1996),Star Wars: Episode IV - A New Hope (1977)
1,Gigi (1958),Gigi (1958),Schindler's List (1993)
2,Cinderella (1950),Cinderella (1950),My Fair Lady (1964)
3,One Flew Over the Cuckoo's Nest (1975),One Flew Over the Cuckoo's Nest (1975),"Wizard of Oz, The (1939)"
4,Ben-Hur (1959),Ben-Hur (1959),Driving Miss Daisy (1989)
5,Saving Private Ryan (1998),Saving Private Ryan (1998),Saving Private Ryan (1998)
6,"Christmas Story, A (1983)","Christmas Story, A (1983)",Miracle on 34th Street (1947)
7,Ferris Bueller's Day Off (1986),Ferris Bueller's Day Off (1986),Run Lola Run (Lola rennt) (1998)
8,Awakenings (1990),Awakenings (1990),"Sixth Sense, The (1999)"
9,Toy Story 2 (1999),Toy Story 2 (1999),"Christmas Story, A (1983)"
