## Collaborative Filtering Recommender Systems

### Packages


In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
from tensorflow import keras

## Movie ratings dataset

The dataset has $n_u = 443$ users and $n_m = 4778$ movies (released in 2000 or later).

Below, we will load the movie dataset into variables $Y$ and $R$.

The matrix $Y$ ($n_m \times n_u$) stores the ratings $y^{(i,j)}$. The matrix $R$ ($n_m \times n_u$) is a binary-valued indicator matrix, where $R(i, j) = 1$ if user $j$ has rated movie $i$, and $R(i, j) = 0$ otherwise.
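A toy illustration of how $Y$ and $R$ work together, using hypothetical numbers. It assumes that unrated entries of $Y$ are stored as 0, so that $R$ can be derived from $Y$; in this notebook $R$ is loaded from its own file, so the derivation here is only for illustration.

```python
import numpy as np

# Hypothetical toy ratings: 3 movies (rows) x 4 users (columns).
# Assumption for this sketch: an unrated entry is stored as 0 in Y.
Y = np.array([[5.0, 0.0, 4.0, 0.0],
              [0.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 0.0, 0.0]])

# Indicator matrix: R[i, j] = 1 if user j rated movie i, else 0
R = (Y != 0).astype(int)

# Average rating of each movie over only the users who rated it
movie_means = (Y * R).sum(axis=1) / R.sum(axis=1)
print(R)
print(movie_means)
```

The masked mean differs from a plain row mean, which would wrongly count the unrated zeros as ratings.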

Throughout this notebook we will work with $X$, $W$ and $b$. The $i$-th row of $X$ is the feature vector $\mathbf{x}^{(i)}$ of the $i$-th movie, the $j$-th row of $W$ is the parameter vector $\mathbf{w}^{(j)}$ of the $j$-th user, and $b^{(j)}$ is the bias of the $j$-th user.


We'll also load $X$, $W$ and $b$ with pre-computed values. These parameters are learned later in the notebook; the pre-computed values let us develop and test the model first.


In [2]:
def load_precomputed_params(): 
    # Load pre-computed movie features X, user parameters W and user biases b.
    # np.loadtxt opens and closes each file itself, so no handles are leaked.
    X = np.loadtxt('./data/small_movies_X.csv', delimiter=',')
    W = np.loadtxt('./data/small_movies_W.csv', delimiter=',')
    b = np.loadtxt('./data/small_movies_b.csv', delimiter=',')
    b = b.reshape(-1, 1)   # one bias per user, as a column vector
    
    num_movies, num_features = X.shape
    num_users, _ = W.shape
    
    return X, W, b, num_movies, num_features, num_users

def load_ratings(): 
    # R[i, j] = 1 if user j rated movie i; Y[i, j] holds that rating
    R = np.loadtxt('./data/small_movies_R.csv', delimiter=',')
    Y = np.loadtxt('./data/small_movies_Y.csv', delimiter=',')
    return Y, R

X, W, b, num_movies, num_features, num_users = load_precomputed_params()
Y, R = load_ratings()

print('Y: ', Y.shape, 'R: ', R.shape)

print("X: ", X.shape)
print("W: ", W.shape)
print("b: ", b.shape)

print("num_movies: ", num_movies)
print("num_features: ", num_features)
print("num_users: ", num_users)


Y:  (4778, 443) R:  (4778, 443)
X:  (4778, 10)
W:  (443, 10)
b:  (443, 1)
num_movies:  4778
num_features:  10
num_users:  443


In [3]:
# From the matrix, we can compute statistics like average rating.
tsmean = np.mean(Y[0, R[0, :].astype(bool)])

print(f"Average rating for movie 1: {tsmean:0.3f}/5")

Average rating for movie 1: 3.400/5


## Collaborative filtering learning algorithm
### Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$
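Here $n$ is the number of features. The first summation in (1) runs over all pairs $(i, j)$ for which $r(i,j) = 1$, so it can also be written as

$$
J = \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)\,(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$

Because $r(i,j) \in \{0, 1\}$, multiplying the error by $r(i,j)$ before squaring it gives the same result, which is what the vectorized implementation below does.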
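To check that the double-sum form and a matrix form agree, here is a pure-NumPy sketch on hypothetical random data: one function follows equation (1) with explicit loops, the other uses a matrix product masked by $R$, mirroring the TensorFlow version below. The function names are illustrative, not part of the notebook.

```python
import numpy as np

def cofi_cost_loop(X, W, b, Y, R, lambda_):
    """Cost (1) with explicit loops; b has shape (1, num_users)."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            if R[i, j] == 1:
                err = np.dot(W[j], X[i]) + b[0, j] - Y[i, j]
                J += 0.5 * err ** 2
    # Regularization over all user parameters and all movie features
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
    return J

def cofi_cost_vectorized(X, W, b, Y, R, lambda_):
    """Same cost using a matrix product; R zeroes out unrated entries."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

# Hypothetical random data: 5 movies, 4 users, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(4, 3))
b = rng.normal(size=(1, 4))
Y = rng.integers(1, 6, size=(5, 4)).astype(float)
R = (rng.random((5, 4)) > 0.5).astype(float)

J_loop = cofi_cost_loop(X, W, b, Y, R, 1.5)
J_vec = cofi_cost_vectorized(X, W, b, Y, R, 1.5)
print(J_loop, J_vec)
```

The loop version is easy to read but scales poorly; the masked matrix form computes every prediction at once and is what makes training on the full $4778 \times 443$ matrix practical.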

In [4]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_): 
    '''
    Vectorized collaborative filtering cost, equation (1).
    Args: 
        X (ndarray (num_movies, num_features)): matrix of item features
        W (ndarray (num_users, num_features)) : matrix of user parameters
        b (ndarray (1, num_users))            : row vector of user biases
        Y (ndarray (num_movies, num_users))   : matrix of user ratings of movies
        R (ndarray (num_movies, num_users))   : indicator matrix, R(i, j) = 1 if user j has rated movie i
        lambda_ (float): regularization parameter
    Returns: 
        J (tf.Tensor): scalar cost
    '''
    # Prediction errors, zeroed out wherever R(i, j) = 0
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    J = 0.5 * tf.reduce_sum(err ** 2) + lambda_/2 * (tf.reduce_sum(W ** 2) + tf.reduce_sum(X ** 2))
    return J

In [5]:
# Use a small subset of the data so the test runs quickly
num_users_r = 4
num_movies_r = 5
num_features_r = 3

X_r = X[:num_movies_r, :num_features_r]
W_r = W[:num_users_r, :num_features_r]
b_r = b[:num_users_r, 0].reshape(1, -1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")


Cost: 13.67


In [6]:
# Evaluate cost function with regularization 
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost (with regularization): 28.09


## Get movie name list

In [7]:
def load_Movie_List_pd(): 
    df = pd.read_csv('./data/small_movie_list.csv', delimiter=',')
    movie_list = df['title'].to_list()
    return movie_list,df

## Try adding a user rating

In [8]:
movieList, moveList_df = load_Movie_List_pd()
my_ratings = np.zeros(num_movies)
# Rate a handful of movies (the indices are chosen arbitrarily)
my_ratings[2700] = 5 
my_ratings[2609] = 2
my_ratings[929]  = 5   
my_ratings[246]  = 5   
my_ratings[2716] = 3   
my_ratings[1150] = 5   
my_ratings[382]  = 2   
my_ratings[366]  = 5   
my_ratings[622]  = 5   
my_ratings[988]  = 3   
my_ratings[2925] = 1   
my_ratings[2937] = 1   
my_ratings[793]  = 5 
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]
print('my_rated index: ', my_rated)

print('Y and R before adding new user rating', Y.shape, R.shape)

# Add my ratings as a new user in column 0 of Y and R
Y = np.c_[ my_ratings, Y]
R = np.c_[(my_ratings != 0).astype(int), R]

print('Y and R after new user rating', Y.shape, R.shape)

my_rated index:  [246, 366, 382, 622, 793, 929, 988, 1150, 2609, 2700, 2716, 2925, 2937]
Y and R before adding new user rating (4778, 443) (4778, 443)
Y and R after new user rating (4778, 444) (4778, 444)


In [9]:
moveList_df

|      | Unnamed: 0 | mean rating | number of ratings | title |
|------|------------|-------------|-------------------|-------|
| 0    | 0    | 3.400000 | 5   | Yards, The (2000) |
| 1    | 1    | 3.250000 | 6   | Next Friday (2000) |
| 2    | 2    | 2.000000 | 4   | Supernova (2000) |
| 3    | 3    | 2.000000 | 4   | Down to You (2000) |
| 4    | 4    | 2.672414 | 29  | Scream 3 (2000) |
| ...  | ...  | ...      | ... | ... |
| 4773 | 4773 | 3.500000 | 1   | Jon Stewart Has Left the Building (2015) |
| 4774 | 4774 | 4.000000 | 1   | Black Butler: Book of the Atlantic (2017) |
| 4775 | 4775 | 3.500000 | 1   | No Game No Life: Zero (2017) |
| 4776 | 4776 | 3.500000 | 1   | Flint (2017) |


In [10]:
# Element-wise product with a 0/1 mask keeps only the entries where the mask is 1
test = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
testa = np.array([[0, 1, 0, 1], [1, 0, 1, 0]])
test * testa

array([[0, 2, 0, 4],
       [5, 0, 7, 0]])

## Normalizing data

In [11]:

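def normalizeRatings(Y: np.ndarray, R: np.ndarray): 
    # Mean rating of each movie over the users who rated it (1e-12 avoids division by zero)
    Ymean = (np.sum(Y * R, axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1, 1)
    # Subtract the mean only from rated entries; unrated entries are left unchanged
    Ynorm = Y - R * Ymean
    return Ynorm, Ymean

# Normalize the dataset
Ynorm, Ymean = normalizeRatings(Y, R)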
print(Ynorm.shape, Ymean.shape)
print(np.mean(Ynorm[0]))

(4778, 444) (4778, 1)
7.657538267144322e-15
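Mean normalization matters most for users with few or no ratings. The toy sketch below (hypothetical numbers) applies the same formula as `normalizeRatings` and then looks at a user who has rated nothing. For such a user, only the regularization term of cost (1) involves $\mathbf{w}^{(j)}$, so it is pushed toward zero. Assuming $b^{(j)} = 0$ as well (note that $b$ is not regularized in cost (1), so this is an assumption of the sketch), the normalized prediction is 0 and adding the mean back predicts each movie's average rating.

```python
import numpy as np

def normalize_ratings(Y, R):
    """Mean normalization over rated entries only (same formula as normalizeRatings)."""
    Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
    Ynorm = Y - R * Ymean
    return Ynorm, Ymean

# Hypothetical toy data: 2 movies x 3 users; the last user has rated nothing
Y = np.array([[5.0, 3.0, 0.0],
              [2.0, 4.0, 0.0]])
R = np.array([[1, 1, 0],
              [1, 1, 0]])

Ynorm, Ymean = normalize_ratings(Y, R)
print(Ynorm)  # rated entries are now centered on each movie's mean
print(Ymean)

# Hypothetical movie features (2 movies x 2 features) and, by assumption,
# a new user whose learned parameters ended up at w = 0 and b = 0
X = np.array([[0.5, -1.0],
              [1.5,  0.2]])
w_new, b_new = np.zeros(2), 0.0

# The normalized prediction X @ w + b is 0, so adding the mean back gives Ymean
pred_new = X @ w_new + b_new + Ymean.ravel()
print(pred_new)
```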


Let's prepare to train the model: initialize the parameters and select the Adam optimizer.

In [12]:
# Useful variables
num_movies, num_users = Y.shape
num_features = 100

# Set initial parameters (W, X, b); tf.Variable lets TensorFlow track them for gradients
W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64))
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64))
b = tf.Variable(tf.random.normal((1,          num_users), dtype=tf.float64))

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

The operations involved in learning $\mathbf{w}$, $b$ and $\mathbf{x}$ simultaneously do not fit into the standard layers offered by TensorFlow's neural network API (Keras). Consequently, the usual workflow of building a `Model` and calling `compile()`, `fit()` and `predict()` does not apply directly. Instead, we use a custom training loop.

Recall the steps of gradient descent from earlier labs:
- repeat until convergence:
    - compute the forward pass
    - compute the derivatives of the loss with respect to the parameters
    - update the parameters using the learning rate and the computed derivatives
    
Within the `tf.GradientTape()` block, operations on TensorFlow variables are recorded. When `tape.gradient()` is called afterwards, it returns the gradient of the loss with respect to the tracked variables. An optimizer can then apply these gradients to update the parameters.
This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks; for more detail, look up "custom training loops" in the documentation of the framework of interest.
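The gradient descent steps above can be sketched with `tf.GradientTape` on a hypothetical one-parameter problem, minimizing $f(w) = (w - 3)^2$. This sketch updates the variable by hand with `assign_sub` instead of using an optimizer, so each step is plain gradient descent, $w \leftarrow w - \alpha \, \frac{df}{dw}$.

```python
import tensorflow as tf

# Hypothetical one-parameter problem: minimize f(w) = (w - 3)^2
w = tf.Variable(0.0)
learning_rate = 0.1

for step in range(100):
    with tf.GradientTape() as tape:
        loss = (w - 3.0) ** 2               # forward pass, recorded on the tape
    dloss_dw = tape.gradient(loss, w)       # d/dw (w - 3)^2 = 2 (w - 3)
    w.assign_sub(learning_rate * dloss_dw)  # plain gradient descent update

print(w.numpy())
```

Replacing the `assign_sub` line with `optimizer.apply_gradients(zip([dloss_dw], [w]))` for a `keras.optimizers.SGD(learning_rate=0.1)` optimizer should perform the same update; the Adam optimizer used in this notebook additionally rescales each step using running averages of the gradients.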

In [13]:
print(X.shape, W.shape, b.shape, Ynorm.shape, R.shape)

(4778, 100) (444, 100) (1, 444) (4778, 444) (4778, 444)


In [14]:
iterations = 200
lambda_ = 1
for step in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient(cost_value, [X, W, b])

    # Run one optimizer (Adam) step, updating
    # the variables to reduce the loss.
    optimizer.apply_gradients(zip(grads, [X, W, b]))

    # Log periodically.
    if step % 20 == 0:
        print(f"Training loss at iteration {step}: {cost_value:0.1f}")

Training loss at iteration 0: 2267241.4
Training loss at iteration 20: 131608.7
Training loss at iteration 40: 49397.5
Training loss at iteration 60: 23294.4
Training loss at iteration 80: 12894.0
Training loss at iteration 100: 8052.6
Training loss at iteration 120: 5535.7
Training loss at iteration 140: 4132.5
Training loss at iteration 160: 3312.4
Training loss at iteration 180: 2815.0


## Recommendations
Below, we compute the predicted ratings for all movies and users and display the recommended movies, based on the ratings entered in `my_ratings[]` above. The predicted rating of movie $i$ by user $j$ is $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$; because the model was trained on normalized ratings, the movie's mean rating (stored in `Ymean`) must be added back. All predictions can be computed at once with a single matrix multiplication.
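A NumPy sketch of the prediction step on hypothetical random parameters: it checks that one entry of the matrix product matches the per-pair formula, then ranks movies for one user while skipping movies that user has already rated. The sizes, the `already_rated` set and the random values are illustrative assumptions.

```python
import numpy as np

# Hypothetical trained parameters: 6 movies, 2 users, 3 features
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))             # movie features, one row per movie
W = rng.normal(size=(2, 3))             # user parameters, one row per user
b = rng.normal(size=(1, 2))             # user biases, shape (1, num_users)
Ymean = rng.uniform(1, 5, size=(6, 1))  # per-movie means from normalization

# All predictions at once: P[i, j] = w^(j) . x^(i) + b^(j) + mean_i
P = X @ W.T + b + Ymean

# The same entry computed with an explicit dot product
i, j = 4, 1
p_ij = np.dot(W[j], X[i]) + b[0, j] + Ymean[i, 0]
print(np.isclose(P[i, j], p_ij))

# Top-3 recommendations for user 0, skipping movies they already rated
already_rated = {0, 2}                  # hypothetical indices rated by user 0
order = np.argsort(P[:, 0])[::-1]       # movie indices, highest prediction first
top3 = [int(m) for m in order if int(m) not in already_rated][:3]
print(top3)
```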

In [15]:
b.shape

TensorShape([1, 444])

In [16]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# restore the mean
pm = p + Ymean
my_predictions = pm[:, 0]

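# Sort movie indices by predicted rating, highest first
ix = tf.argsort(my_predictions, direction='DESCENDING')

# Among the 18 top-ranked movies, print those I have not already rated
for i in range(18): 
    j = ix[i]
    if j not in my_rated: 
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')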

print('\nOriginal vs Predicted ratings: ')

for i in range(len(my_ratings)): 
    if my_ratings[i] > 0: 
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

Predicting rating 4.70 for movie Lord of the Rings: The Two Towers, The (2002)
Predicting rating 4.66 for movie Harry Potter and the Prisoner of Azkaban (2004)
Predicting rating 4.63 for movie Colourful (Karafuru) (2010)
Predicting rating 4.58 for movie Delirium (2014)
Predicting rating 4.58 for movie One I Love, The (2014)
Predicting rating 4.58 for movie Laggies (2014)
Predicting rating 4.57 for movie Deathgasm (2015)
Predicting rating 4.54 for movie Strictly Sexual (2008)
Predicting rating 4.54 for movie Lost in Translation (2003)
Predicting rating 4.54 for movie Kung Fu Panda: Secrets of the Masters (2011)
Predicting rating 4.54 for movie Won't You Be My Neighbor? (2018)

Original vs Predicted ratings: 
Original 5.0, Predicted 4.91 for Shrek (2001)
Original 5.0, Predicted 4.89 for Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Original 2.0, Predicted 2.22 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Original 5.0, Predicted

In [17]:
ix.shape

TensorShape([4778])

In [18]:
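# Keep only movies with more than 20 ratings, so their mean rating is more reliable
popular = moveList_df['number of ratings'] > 20
moveList_df['pred'] = my_predictions
moveList_df = moveList_df.reindex(columns=['pred', 'mean rating', 'number of ratings', 'title'])

# Among the 300 top-predicted movies, show the popular ones sorted by mean rating
moveList_df.loc[ix[:300]].loc[popular].sort_values('mean rating', ascending=False)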

|      | pred     | mean rating | number of ratings | title |
|------|----------|-------------|-------------------|-------|
| 1743 | 4.128719 | 4.252336 | 107 | Departed, The (2006) |
| 2112 | 4.510491 | 4.238255 | 149 | Dark Knight, The (2008) |
| 929  | 4.893142 | 4.118919 | 185 | Lord of the Rings: The Return of the King, The... |
| 2700 | 4.873835 | 4.109091 | 55  | Toy Story 3 (2010) |
| 848  | 4.543516 | 4.033784 | 74  | Lost in Translation (2003) |
| 2608 | 4.068852 | 4.022388 | 67  | Shutter Island (2010) |
| 653  | 4.703979 | 4.021277 | 188 | Lord of the Rings: The Two Towers, The (2002) |
| 2420 | 4.395332 | 4.004762 | 105 | Up (2009) |
| 3083 | 4.106094 | 3.993421 | 76  | Dark Knight Rises, The (2012) |
| 1051 | 4.655351 | 3.913978 | 93  | Harry Potter and the Prisoner of Azkaban (2004) |
