In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
import utilities # codeTimer context manager and saving/loading utilities.
import data_preparation # Load dataset and build required matrices.
import factorisation # WALS factorisation.
import recommender # Recommender system.

## Loading dataset and creating recommender system

In [3]:
%load_ext autoreload
%autoreload 2

In [None]:
np.random.seed(17)

# Load the movies, the training ratings and the held-out test ratings.
mov, rat, rat_test = data_preparation.importDataset()
k = 100  # Number of latent factors.
rec = recommender.recommenderSystem(mov, rat, rat_test, k)


The dataframe contains 610 users and 9721 items.


A pre-trained recommender system can be loaded using the following cell. It was trained on the whole dataset with $k = 100$ latent factors for 10 iterations.

To save disk space, the saved system contains only the item and user embedding matrices. The previous cell must therefore be run first, so that the remaining components of the system are built.
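
To illustrate the idea of storing only the two embedding matrices, here is a minimal sketch using NumPy's `.npz` format. It is not the code of `utilities.loadRecSys`; the helper names `save_embeddings` and `load_embeddings`, the array keys and the random matrices are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def save_embeddings(path, user_emb, item_emb):
    """Store only the two embedding matrices in a compressed .npz archive."""
    np.savez_compressed(path, user_emb=user_emb, item_emb=item_emb)


def load_embeddings(path):
    """Read the two embedding matrices back from the archive."""
    with np.load(path) as data:
        return data["user_emb"], data["item_emb"]


# Hypothetical embeddings with the shapes used in this notebook
# (610 users, 9721 items, k = 100 latent factors).
rng = np.random.default_rng(17)
U = rng.normal(size=(610, 100))
V = rng.normal(size=(9721, 100))

path = os.path.join(tempfile.mkdtemp(), "rec_demo.npz")
save_embeddings(path, U, V)
U_loaded, V_loaded = load_embeddings(path)
```

Everything else (the ratings matrix, the movie metadata) can be rebuilt from the dataset, which is why only the embeddings need to be persisted.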

In [None]:
utilities.loadRecSys(rec, "rec.npz")

## Similar items
Some suggested movie IDs to try:
* 911: Star Wars Episode VI
* 3638: The Lord of the Rings: The Fellowship of the Ring
* 957: The Shining
* 474: Blade Runner

In [17]:
rec.suggestSimilar(3638)

Unnamed: 0,MovieID,Title,Genres,Similarity
3638,3638,"Lord of the Rings: The Fellowship of the Ring,...",Adventure|Fantasy,1.0
4137,4137,"Lord of the Rings: The Two Towers, The (2002)",Adventure|Fantasy,0.772991
4800,4800,"Lord of the Rings: The Return of the King, The...",Action|Adventure|Drama|Fantasy,0.748294
3568,3568,"Monsters, Inc. (2001)",Adventure|Animation|Children|Comedy|Fantasy,0.748244
659,659,"Godfather, The (1972)",Crime|Drama,0.722716
3194,3194,Shrek (2001),Adventure|Animation|Children|Comedy|Fantasy|Ro...,0.721707
4360,4360,Finding Nemo (2003),Adventure|Animation|Children|Comedy,0.691656
277,277,"Shawshank Redemption, The (1994)",Crime|Drama,0.688652
2078,2078,"Sixth Sense, The (1999)",Drama|Horror|Mystery,0.683175
3141,3141,Memento (2000),Mystery|Thriller,0.677524
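
The similarity of 1.0 between a movie and itself is consistent with cosine similarity between item embeddings. The sketch below shows that computation under this assumption; `most_similar` is a hypothetical helper and the random matrix stands in for the learned item embeddings, so this is not necessarily how `rec.suggestSimilar` is implemented.

```python
import numpy as np


def most_similar(item_emb, item_id, top_n=10):
    """Return indices and cosine similarities of the top_n items closest to item_id."""
    # Normalise every row to unit length (guarding against zero vectors).
    norms = np.linalg.norm(item_emb, axis=1, keepdims=True)
    unit = item_emb / np.clip(norms, 1e-12, None)
    # With unit-length rows, the dot product is the cosine similarity.
    sims = unit @ unit[item_id]
    order = np.argsort(-sims)[:top_n]
    return order, sims[order]


# Hypothetical item embeddings: 50 items, 8 latent factors.
rng = np.random.default_rng(17)
V = rng.normal(size=(50, 8))
idx, sims = most_similar(V, item_id=3, top_n=5)
```

The query item always ranks first, as in the table above, because its similarity with itself is 1.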


## Assessing results

The following two cells may take a few minutes to run. By default there are 10 test observations. In that case, the mean precision at 10 and the mean recall at 10 are expected to be the same.

In [18]:
with utilities.codeTimer("Mean precision"):
    print("Mean precision at 10: {}".format(rec.meanPrecision(10)))

Mean precision at 10: 0.027823240589198002
Executed 'Mean precision'.  Elapsed time: 32.983721s
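
To see why the two metrics can coincide, here is a small sketch of precision and recall at $k$ for a single user. It assumes that "10 test observations" means ten held-out items per user; the function name and the toy lists are illustrative and are not taken from `rec.meanPrecision`.

```python
def precision_recall_at_k(recommended, relevant, k=10):
    """Precision@k and recall@k for one user.

    recommended: ranked list of item ids, best first.
    relevant: held-out item ids of the same user.
    """
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: 10 recommendations, 10 held-out items, 2 of them in common.
recommended = [5, 9, 2, 7, 1, 8, 3, 0, 4, 6]
held_out = [9, 1, 11, 12, 13, 14, 15, 16, 17, 18]
p, r = precision_recall_at_k(recommended, held_out, k=10)
```

Precision divides the number of hits by $k$, while recall divides it by the number of held-out items. When both equal 10, the two denominators are the same and so are the metrics.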


## New user recommendation

In [20]:
np.random.seed(17)

new_user, new_user_id = rec.generateNewUser(50)
np.shape(rec.R)

(611, 9721)

In [21]:
new_user_id
reg_lambda = 0.15

In [22]:
with utilities.codeTimer("New user factorisation"):
    rec.addNewUser(new_user, reg_lambda)
np.shape(rec.R)


Executed 'New user factorisation'.  Elapsed time: 0.358114s


(612, 9721)
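
Adding a user takes well under a second because the item embeddings can be kept fixed and only the new user's embedding has to be solved for. The sketch below shows one common way to do this, a ridge regression over the movies the user has rated. It is an assumption about what `rec.addNewUser` does, not its actual code: a full WALS update usually also includes a weighted term for unobserved entries, which is omitted here. `fold_in_user` and the synthetic data are hypothetical.

```python
import numpy as np


def fold_in_user(item_emb, rated_items, ratings, reg_lambda):
    """Solve for one user's embedding with the item embeddings held fixed.

    Minimises ||V_obs u - ratings||^2 + reg_lambda * ||u||^2 over u,
    where V_obs holds the embeddings of the rated items.
    """
    V_obs = item_emb[rated_items]                     # (n_rated, k)
    k = item_emb.shape[1]
    A = V_obs.T @ V_obs + reg_lambda * np.eye(k)      # (k, k) normal equations
    b = V_obs.T @ ratings                             # (k,)
    return np.linalg.solve(A, b)


# Synthetic check: ratings generated from a known user vector.
rng = np.random.default_rng(17)
V = rng.normal(size=(200, 5))                         # 200 items, 5 factors
true_u = rng.normal(size=5)
rated = rng.choice(200, size=50, replace=False)       # the user rated 50 items
r = V[rated] @ true_u

u = fold_in_user(V, rated, r, reg_lambda=0.15)
scores = V @ u                                        # predictions for every item
```

The linear system is only $k \times k$, so the cost does not depend on the number of existing users.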

In [23]:
recommend(rec, new_user_id).head(10)

Unnamed: 0,MovieID,Prediction,Title,Genres,AVG_Rating
508,511,8.89,Snow White and the Seven Dwarfs (1937),Animation|Children|Drama|Fantasy|Musical,3.515385
3548,3607,7.41,Ocean's Eleven (a.k.a. Ocean's 11) (1960),Comedy|Crime,3.7
1360,1375,7.37,Fear and Loathing in Las Vegas (1998),Adventure|Comedy|Drama,3.944444
156,157,7.33,Nine Months (1995),Comedy|Romance,2.822917
3622,3682,7.22,Metropolis (2001),Animation|Sci-Fi,3.961538
953,960,7.18,Evil Dead II (Dead by Dawn) (1987),Action|Comedy|Fantasy|Horror,4.044118
190,192,6.85,Disclosure (1994),Drama|Thriller,3.538462
2243,2265,6.64,"Insider, The (1999)",Drama|Thriller,3.7
1470,1485,6.62,Metropolis (1927),Drama|Sci-Fi,3.857143
2530,2558,6.59,Do the Right Thing (1989),Drama,4.038462


In [24]:
bestRated(rec, new_user_id).head(10)

Unnamed: 0,MovieID,UserID,Genres,Title,Rating
94806,3787,611,Comedy|Romance,"Sweetest Thing, The (2002)",5.0
94794,1365,611,Comedy,Major League: Back to the Minors (1998),5.0
94809,4302,611,Crime|Drama,Better Luck Tomorrow (2002),5.0
94829,8361,611,Adventure|Comedy|Fantasy,Knights of Badassdom (2013),5.0
94815,5121,611,Action|Adventure|Romance,Captain Blood (1935),5.0
94790,1203,611,Crime|Drama|Film-Noir,Hoodlum (1997),5.0
94823,6911,611,Animation|Comedy,Igor (2008),5.0
94799,2454,611,Action|Crime|Thriller|Western,"Mariachi, El (1992)",4.5
94800,2672,611,Action|Horror|Sci-Fi,"Hidden, The (1987)",4.5
94827,8143,611,Children|Horror|Sci-Fi,Yongary: Monster from the Deep (1967),4.5


## Cold start problem
If a new user has rated fewer than 10 movies, the most popular movies that the user has not yet seen are recommended instead.

In [25]:
np.random.seed(17)

new_user, new_user_id = rec.generateNewUser(8)
np.shape(rec.R)

with utilities.codeTimer("New user factorisation"):
    rec.addNewUser(new_user, reg_lambda)


Executed 'New user factorisation'.  Elapsed time: 0.339126s


In [26]:
recommend(rec, new_user_id).head(10)

Too few movies! Most poular movies will be suggested.


Unnamed: 0,MovieID,Title,Genres,AVG_Rating,Counts
314,314,Forrest Gump (1994),Comedy|Drama|Romance|War,4.173913,322
277,277,"Shawshank Redemption, The (1994)",Crime|Drama,4.431746,315
257,257,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller,4.197068,307
1933,1939,"Matrix, The (1999)",Action|Sci-Fi|Thriller,4.18251,263
224,224,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi,4.231076,251
97,97,Braveheart (1995),Action|Drama|War,4.031646,237
509,510,"Silence of the Lambs, The (1991)",Crime|Horror|Thriller,4.146552,232
418,418,Jurassic Park (1993),Action|Adventure|Sci-Fi|Thriller,3.742009,219
0,0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.92093,215
2216,2226,Fight Club (1999),Action|Crime|Drama|Thriller,4.258216,213
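
The fallback shown above can be sketched as follows. The table is ordered by `Counts`, so popularity is taken here to mean the number of ratings. The function names, the `personalised` callback and the toy ratings are assumptions for illustration; the column names follow the outputs in this notebook.

```python
import pandas as pd


def popular_unseen(ratings, seen_ids, top_n=10):
    """Most-rated movies that are not in seen_ids."""
    stats = (
        ratings.groupby("MovieID")["Rating"]
        .agg(AVG_Rating="mean", Counts="count")
        .reset_index()
    )
    unseen = stats[~stats["MovieID"].isin(seen_ids)]
    return unseen.sort_values("Counts", ascending=False).head(top_n)


def recommend_with_fallback(ratings, user_id, personalised, min_ratings=10):
    """Use the popularity fallback when the user has too few ratings."""
    seen = ratings.loc[ratings["UserID"] == user_id, "MovieID"]
    if len(seen) < min_ratings:
        return popular_unseen(ratings, seen)
    return personalised(user_id)


# Toy ratings: user 4 has rated a single movie (MovieID 20).
ratings = pd.DataFrame({
    "UserID":  [1, 1, 2, 2, 3, 3, 3, 4],
    "MovieID": [10, 20, 10, 30, 10, 20, 30, 20],
    "Rating":  [4.0, 5.0, 3.0, 4.0, 5.0, 4.0, 2.0, 1.0],
})
result = recommend_with_fallback(ratings, 4, personalised=lambda uid: None)
```

In the toy data, user 4 falls below the threshold, so the result lists the unseen movies 10 and 30 ordered by how often they were rated.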
