This project compares several techniques for implementing recommender systems by their error, measured with Root Mean Square Error (RMSE), precision on top K, and Spearman rank correlation. It also compares their overall running time and prediction time.
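As a rough illustration of the three error measures, here is a minimal NumPy sketch. The function names, the relevance `threshold` used for precision, and the no-ties form of the Spearman formula are assumptions made for this example; the scripts in this repository may compute them differently.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equally long rating arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def precision_on_top_k(actual, predicted, k, threshold):
    """Fraction of the k highest-predicted items whose actual rating
    is at least `threshold` (the threshold is an assumed cut-off)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    top = np.argsort(-predicted, kind="stable")[:k]
    return float(np.mean(actual[top] >= threshold))

def spearman_rank_correlation(actual, predicted):
    """1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in
    rank of each item. This form assumes there are no tied values."""
    rank_a = np.argsort(np.argsort(actual))
    rank_p = np.argsort(np.argsort(predicted))
    d = rank_a - rank_p
    n = len(rank_a)
    return float(1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1)))
```

Lower RMSE is better, while precision and rank correlation are better when closer to 1.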
The dataset is MovieLens 1M. It contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000.
The ratings were split into training and testing sets in an 80-20 ratio.
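A minimal sketch of such an 80-20 split, assuming the ratings are held as rows of `(user, movie, rating)` in a NumPy array. The function name, the seed, and the toy data are hypothetical, and `util.py` may split the data differently.

```python
import numpy as np

def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Shuffle the rating rows and cut them into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ratings))
    cut = int(train_fraction * len(ratings))
    return ratings[order[:cut]], ratings[order[cut:]]

# Toy data: ten (user, movie, rating) rows, made up for illustration.
ratings = np.array([
    [1, 10, 5], [1, 20, 3], [2, 10, 4], [2, 30, 1], [3, 20, 2],
    [3, 30, 5], [4, 10, 2], [4, 20, 4], [5, 30, 3], [5, 10, 1],
])
train, test = split_ratings(ratings)
```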
The following techniques were used to implement the movie recommendation system. The first two are variations of collaborative filtering, which is based on the KNN (K Nearest Neighbour) algorithm. The remaining four are based on the idea of utility matrix decomposition.
To account for generous and strict raters, we normalise the ratings by subtracting each user's mean rating from that user's ratings.
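The normalisation can be sketched as below. This assumes a users-by-movies utility matrix in which unrated entries are `NaN`; that representation and the function name are assumptions for the example, not necessarily what the code in this repository uses.

```python
import numpy as np

def normalise_by_user_mean(utility):
    """Subtract each user's mean rating from that user's rated entries.

    Returns the centred matrix (unrated entries set to 0) and the
    per-user means, which are needed to turn predictions back into ratings.
    """
    rated = ~np.isnan(utility)
    counts = rated.sum(axis=1)
    sums = np.where(rated, utility, 0.0).sum(axis=1)
    # Users with no ratings keep a mean of 0 instead of dividing by zero.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    centred = np.where(rated, utility - means[:, None], 0.0)
    return centred, means

utility = np.array([[5.0, 3.0, np.nan],
                    [1.0, np.nan, 2.0]])
centred, means = normalise_by_user_mean(utility)
```

After centring, a generous rater's 3 and a strict rater's 3 no longer look the same: each value says how far the rating sits from that user's own average.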
For user-based collaborative filtering, we follow these steps to predict the rating of an unrated movie for a given user:
- Find the top k most similar users for the given user. The similarity is the cosine similarity between the two users' movie rating vectors.
- For the movie, calculate the average of the ratings given by those top k users.
We used k=15, that is, the top 15 similar users are taken into consideration for calculating the movie ratings.
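The two steps above can be sketched as follows. The sketch assumes a mean-centred users-by-movies matrix in which unrated entries are 0, and it averages over all k neighbours exactly as the steps describe, so neighbours who have not rated the movie contribute 0. The function name is hypothetical.

```python
import numpy as np

def predict_user_based(centred, user, movie, k=15):
    """Predict the mean-centred rating of `movie` for `user`.

    `centred` is a users-by-movies matrix of mean-centred ratings,
    with 0 for unrated entries (an assumption of this sketch).
    """
    norms = np.linalg.norm(centred, axis=1)
    denom = norms * norms[user]
    # Cosine similarity between the given user and every user.
    sims = np.divide(centred @ centred[user], denom,
                     out=np.zeros_like(denom), where=denom > 0)
    sims[user] = -np.inf  # never pick the user as their own neighbour
    top_k = np.argsort(-sims)[:k]
    # Plain average of the neighbours' centred ratings for this movie.
    return float(np.mean(centred[top_k, movie]))

centred = np.array([[1.0, -1.0, 0.0],
                    [1.0, -1.0, 0.5],
                    [-1.0, 1.0, -0.5]])
nearest_only = predict_user_based(centred, user=0, movie=2, k=1)
both_neighbours = predict_user_based(centred, user=0, movie=2, k=2)
```

Adding the user's own mean rating back to the returned value gives a prediction on the original rating scale.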
The rating is predicted by adding a baseline estimate, built from the global mean and the movie's deviation from it, to the collaborative filtering value from above.
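One possible reading of the baseline prediction is sketched below. It assumes the movie deviation is the movie's mean rating minus the global mean, and that the collaborative filtering value is a mean-centred estimate such as the neighbour average. Both are assumptions, as are the function and parameter names; the exact formula used in `main.py` may differ.

```python
import numpy as np

def baseline_prediction(utility, movie, cf_value):
    """Global mean + movie deviation + collaborative filtering value.

    `utility` is a users-by-movies matrix with NaN for unrated entries.
    """
    rated = ~np.isnan(utility)
    global_mean = utility[rated].mean()
    movie_ratings = utility[rated[:, movie], movie]
    # Assumed definition: how far this movie's mean is from the global mean.
    if movie_ratings.size:
        movie_deviation = movie_ratings.mean() - global_mean
    else:
        movie_deviation = 0.0
    return float(global_mean + movie_deviation + cf_value)

utility = np.array([[5.0, 3.0, np.nan],
                    [1.0, np.nan, 2.0],
                    [4.0, 3.0, np.nan]])
predicted = baseline_prediction(utility, movie=0, cf_value=0.5)
```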
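The singular value decomposition decomposes a matrix $A$ into the product of three matrices, $A = U \Sigma V^T$, where $\Sigma$ is a diagonal matrix of singular values.
We retain 90% of the sum of squares of the diagonal matrix $\Sigma$, dropping the smallest singular values.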
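A minimal sketch of SVD with 90% retention, using `numpy.linalg.svd`. The function name and the `energy` parameter are assumptions for this example, and the sketch assumes the matrix is not all zeros.

```python
import numpy as np

def truncated_svd(A, energy=0.9):
    """Keep the fewest singular values whose squares sum to at least
    `energy` times the total sum of squares."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First position where the retained share reaches the target.
    r = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return U[:, :r], s[:r], Vt[:r, :]

A = np.diag([3.0, 1.0, 0.1])
U, s, Vt = truncated_svd(A)
approximation = (U * s) @ Vt  # rank-r reconstruction of A
```

In this toy matrix the largest singular value alone holds just under 90% of the sum of squares, so two values are kept and the smallest is dropped.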
Similar to SVD, the matrix $A$ is decomposed as $A \approx CUR$, where
- $C$ has $r$ randomly selected columns of $A$
- $R$ has $r$ randomly selected rows of $A$
- $U$ is the pseudo-inverse of the intersection of $C$ and $R$ ($= W$)
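To find the pseudo-inverse of $W$, we take its SVD decomposition.
We then take the pseudo-inverse of the resulting diagonal matrix of singular values by replacing each non-zero entry with its reciprocal, and recombine the factors.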
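The CUR construction can be sketched as below. This sketch samples rows and columns uniformly without replacement, which is an assumption: the text only says they are randomly selected, and `CUR_all.py` may weight or scale the selection differently. The function names, the seed, and the tolerance are also hypothetical.

```python
import numpy as np

def pseudo_inverse(W, tol=1e-10):
    """Pseudo-inverse via SVD: invert the non-zero singular values
    and recombine the factors in reverse order."""
    X, s, Yt = np.linalg.svd(W, full_matrices=False)
    s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > tol)
    return (Yt.T * s_inv) @ X.T

def cur_decomposition(A, r, seed=0):
    """Return C, U, R such that C @ U @ R approximates A."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(A.shape[1], size=r, replace=False)
    rows = rng.choice(A.shape[0], size=r, replace=False)
    C = A[:, cols]               # r selected columns of A
    R = A[rows, :]               # r selected rows of A
    W = A[np.ix_(rows, cols)]    # intersection of C and R
    U = pseudo_inverse(W)
    return C, U, R

# A rank-1 matrix is recovered exactly from one row and one column.
A = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])
C, U, R = cur_decomposition(A, r=1)
approximation = C @ U @ R
```

Unlike the SVD factors, `C` and `R` are made of actual rows and columns of the utility matrix, so they stay sparse when the matrix is sparse.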
- Run `python3 util.py` to split the data.
- Run `python3 cur.py` to create the matrix from csv.
- Run `python3 main.py` for collaborative and collaborative with baseline.
- Run `python3 CUR_all.py` for CUR and SVD decompositions.
- Check `main.html` for the documentation of the code.
Technique | RMSE | Precision on top 4 | Spearman Rank Coefficient | Time taken |
---|---|---|---|---|
Collaborative | 2.33 | 0.998 | 0.935 | 24.03 |
Collaborative with baseline | 0.87 | 0.992 | 0.937 | 24.78 |
SVD | 7.24e-15 | 0.999 | 0.999 | 41.34 |
SVD (90%) | 0.8 | 0.905 | 0.999 | 38.25 |
CUR | 0.65 | 0.997 | 0.999 | 20.65 |
CUR (90%) | 3.72 | 0.998 | 0.993 | 20.85 |
Technique | RMSE | Precision on top 4 | Spearman Rank Coefficient |
---|---|---|---|
Collaborative | 2.51 | 0.85 | 0.916 |
Collaborative with baseline | 0.939 | 0.95 | 0.923 |
SVD | 3.75 | 0.978 | 0.999 |
SVD (90%) | 3.66 | 0.983 | 0.999 |
CUR | 3.81 | 0.973 | 0.999 |
CUR (90%) | 3.76 | 0.984 | 0.993 |
Names | ID |
---|---|
Sankalp Kulkarni | 2020A7PS1097H |
Aaditya Rathi | 2020A7PS2191H |
Akshat Oke | 2020A7PS0284H |
Rishi Podda | 2020A7PS1195H |