# Collaborative Filtering Recommender Systems
- We will implement collaborative filtering to build a recommender system for movies.

### Packages
- We will use the familiar NumPy and TensorFlow packages, along with pandas for loading the movie list.

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras




![image.png](attachment:image.png)

## Recommender Systems
- We will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.
- The goal of a collaborative filtering recommender system is to generate two vectors:
    1. For each user, a 'parameter vector' that embodies the movie tastes of that user.
    2. For each movie, a feature vector of the same size that embodies some description of the movie.
- The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.
    ![image.png](attachment:image.png)
- Each user has a parameter vector w^user and a bias. Each movie has a feature vector x^movie. These vectors are learned simultaneously, using the existing user/movie ratings as training data. Note that the feature vector x^movie must satisfy all the users, while the user vector w^user must satisfy all the movies. This is the source of the approach's name: all the users collaborate to generate the rating set.
      ![image-2.png](attachment:image-2.png)
- Once the feature vector and parameters are learned, they can be used to predict how a user might rate an unrated movie. 
- We will implement the function cofi_cost_func, which computes the collaborative filtering objective function.
- After implementing the objective function, we will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. 
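
As a small illustration of the prediction rule described above, the sketch below computes one estimated rating as a dot product plus a bias. The vectors and the bias value are made-up numbers chosen for the example, not learned parameters.

```python
import numpy as np

# Hypothetical 3-feature vectors; real values would come from training.
w_user = np.array([0.5, -1.0, 2.0])   # user parameter vector w^(j)
b_user = 0.25                         # user bias b^(j)
x_movie = np.array([1.0, 0.5, 1.5])   # movie feature vector x^(i)

# Predicted rating: w^(j) . x^(i) + b^(j)
predicted = np.dot(w_user, x_movie) + b_user
print(predicted)
```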

### Movie rating dataset
- This dataset consists of ratings on a scale of 0.5 to 5 in 0.5-step increments. The dataset has n_u = 443 users and n_m = 4778 movies.
- We will load the movie dataset into the variables Y and R.
- The matrix Y (an n_m x n_u matrix) stores the ratings y^(i, j). The matrix R is a binary-valued indicator matrix, where R(i, j) = 1 if user j gave a rating to movie i, and R(i, j) = 0 otherwise.
- We will work with the matrices, X, W and b:
    ![image.png](attachment:image.png)
- The ith row of X corresponds to the feature vector x^(i) for the ith movie, and the jth row of W corresponds to the parameter vector w^(j) for the jth user. Both x^(i) and w^(j) are n-dimensional vectors. We will use n = 10, so x^(i) and w^(j) each have 10 elements. Correspondingly, X is an n_m x 10 matrix and W is an n_u x 10 matrix.
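
To make the shapes concrete, here is a minimal sketch with tiny random matrices. The sizes (4 movies, 3 users, 2 features), the random values, and the names `P` and `P_rated` are illustrative assumptions. It forms every prediction at once as X W^T + b and uses R to keep only the entries that were actually rated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 4, 3, 2            # toy sizes, not the dataset's

X = rng.normal(size=(n_m, n))    # one row per movie
W = rng.normal(size=(n_u, n))    # one row per user
b = rng.normal(size=(1, n_u))    # one bias per user

# R[i, j] = 1 if user j rated movie i, else 0
R = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])

# All predictions at once; b broadcasts across the movie rows.
P = X @ W.T + b                  # shape (n_m, n_u)

# Zero out predictions for pairs with no rating.
P_rated = P * R
print(P.shape, int(R.sum()))
```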

In [2]:
def load_precalc_params_small():
    # np.loadtxt accepts a file path directly, so no file handle is left open
    X = np.loadtxt('small_movies_X.csv', delimiter=",")
    W = np.loadtxt('small_movies_W.csv', delimiter=",")

    b = np.loadtxt('small_movies_b.csv', delimiter=",")
    b = b.reshape(1, -1)

    num_movies, num_features = X.shape
    num_users, _ = W.shape

    return (X, W, b, num_movies, num_features, num_users)

In [3]:
def load_rating_small():
    # np.loadtxt accepts a file path directly, so no file handle is left open
    Y = np.loadtxt('small_movies_Y.csv', delimiter=',')
    R = np.loadtxt('small_movies_R.csv', delimiter=',')

    return (Y, R)

In [4]:
# Load data 
X, W, b, num_movies, num_features, num_users = load_precalc_params_small()
Y, R = load_rating_small()

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (4778, 443) R (4778, 443)
X (4778, 10)
W (443, 10)
b (1, 443)
num_features 10
num_movies 4778
num_users 443


In [5]:
# From the matrix, we can compute statistics like average rating.
tsmean = np.mean(Y[0, R[0, :].astype(bool)])
print(f"Average rating for movie 1 : {tsmean:0.3f} / 5" )

Average rating for movie 1 : 3.400 / 5


### Collaborative filtering learning Algorithm
- We will begin implementing the collaborative filtering learning algorithm, starting with the objective function.
- The collaborative filtering algorithm in the setting of movie recommendations considers a set of n-dimensional parameter vectors x^(0), ..., x^(n_m-1) and w^(0), ..., w^(n_u-1), together with scalar biases b^(0), ..., b^(n_u-1), where the model predicts the rating for movie i by user j as y^(i, j) = w^(j).x^(i) + b^(j). Given a dataset that consists of a set of ratings produced by some users on some movies, we wish to learn the parameters x^(0), ..., x^(n_m-1), w^(0), ..., w^(n_u-1) and b^(0), ..., b^(n_u-1) that produce the best fit (minimize the squared error).
- We will implement cofi_cost_func to compute the cost function for collaborative filtering.

#### Collaborative filtering cost function
- The collaborative filtering cost function is given by
    ![image-2.png](attachment:image-2.png)
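
Written out in LaTeX, and consistent with the prediction rule w^(j).x^(i) + b^(j) stated earlier and with the regularization applied to X and W in the cost code, the objective can be stated as follows. This is a reconstruction from the surrounding text and code, not a transcription of the attached image.

```latex
J = \frac{1}{2} \sum_{(i,j)\,:\,r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2
  + \frac{\lambda}{2} \sum_{j=0}^{n_u-1} \sum_{k=0}^{n-1} \left( w^{(j)}_k \right)^2
  + \frac{\lambda}{2} \sum_{i=0}^{n_m-1} \sum_{k=0}^{n-1} \left( x^{(i)}_k \right)^2
```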


In [6]:
def cofi_cost_func(X, W, b, Y, R, lambda_):
    '''
    Returns the cost for collaborative filtering
    Args : 
        X (ndarray (num_movies, num_features)) : matrix of item features
        W (ndarray (num_users, num_features)) : matrix of user parameters
        b (ndarray (1, num_users)) : vector of user parameters
        Y (ndarray (num_movies, num_users)) : matrix of user ratings of movies
        R (ndarray (num_movies, num_users)) : matrix, where R(i, j) = 1 if the ith movie was rated by the jth user.
        lambda_ (float) : Regularization parameter
    Returns : 
        J (float) : Cost
    '''
    
    nm, nu = Y.shape
    J = 0
    
    # Accumulate the squared error over every rated (movie, user) pair
    for j in range(nu):
        w = W[j, :]
        b_j = b[0, j]
        for i in range(nm):
            x = X[i, :]
            y = Y[i, j]
            r = R[i, j]
            J += np.square(r*(np.dot(w, x) + b_j - y))
    
    # Halve the summed error and add regularization once, after both loops
    J = J/2
    J += (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    
    return J

In [7]:
# Reduce the data set size so that this runs faster

num_users_r = 4
num_movies_r = 5
num_features_r = 3

X_r = X[:num_movies_r, : num_features_r]
W_r = W[:num_users_r, :num_features_r]
b_r = b[0, :num_users_r].reshape(1, -1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization 
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost: 13.67
Cost (with regularization): 28.09


#### Vectorized implementation

In [8]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering, vectorized for speed. Uses TensorFlow
    operations to be compatible with a custom training loop.
    Args :
        X (ndarray (num_movies, num_features)) : Matrix of item features
        W (ndarray (num_users, num_features)) : Matrix of user parameters
        b (ndarray (1, num_users)) : Vector of user parameters
        Y (ndarray (num_movies, num_users)) : Matrix of user ratings of movies
        R (ndarray (num_movies, num_users)) : Matrix, where R(i, j) = 1 if the ith movie was rated by the jth user.
        lambda_ (float) : Regularization parameter
    Returns :
        J (float) : Cost
    """
    
    j = R * (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

In [9]:
# Evaluate cost function
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization 
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost: 13.67
Cost (with regularization): 28.09
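
One way to gain confidence in a vectorized cost is to compare it with an explicit double loop on random inputs. The sketch below does this in plain NumPy rather than TensorFlow. The helper names `cost_loop` and `cost_vec` and the random test data are illustrative and not part of the notebook.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Reference cost using explicit loops over users and movies."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            err = np.dot(W[j], X[i]) + b[0, j] - Y[i, j]
            J += R[i, j] * err ** 2
    return J / 2 + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def cost_vec(X, W, b, Y, R, lambda_):
    """Same cost computed with matrix operations."""
    err = R * (X @ W.T + b - Y)
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Random toy problem: 5 movies, 4 users, 3 features
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(4, 3))
b = rng.normal(size=(1, 4))
Y = rng.integers(1, 6, size=(5, 4)).astype(float)
R = rng.integers(0, 2, size=(5, 4))

print(np.isclose(cost_loop(X, W, b, Y, R, 1.5), cost_vec(X, W, b, Y, R, 1.5)))
```

Because R contains only 0 and 1, multiplying the error by R before squaring gives the same result as multiplying the squared error by R.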


### Learning movie recommendations
- Now that we have implemented the collaborative filtering cost function, we can start training the algorithm to make movie recommendations for ourselves.

In [10]:
def load_Movie_List_pd():
    '''
    Returns the list of movie titles and a DataFrame indexed by movie, in the order the movies appear in the Y matrix
    '''
    df = pd.read_csv('small_movie_list.csv', header=0, index_col=0,  delimiter=',', quotechar='"')
    mlist = df["title"].to_list()
    return (mlist, df)

In [11]:
movieList, movieList_df = load_Movie_List_pd()

my_ratings = np.zeros(num_movies) # Initialize my ratings

# Check the file small_movie_list.csv for the id of each movie in our dataset.
my_ratings[2700] = 5

# Or suppose we did not enjoy Persuasion (2007), we can set
my_ratings[2609] = 2

# We have selected a few movies we liked / did not like and the ratings we
# gave are as follows:
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[246]  = 5   # Shrek (2001)
my_ratings[2716] = 3   # Inception
my_ratings[1150] = 5   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)

# Indices of the movies we rated; kept separate so my_ratings stays a full-length array
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')


In [12]:
def load_ratings_small():
    # np.loadtxt accepts a file path directly, so no file handle is left open
    Y = np.loadtxt('small_movies_Y.csv', delimiter=",")
    R = np.loadtxt('small_movies_R.csv', delimiter=",")
    return (Y, R)


In [14]:
my_rated

[246, 366, 382, 622, 793, 929, 988, 1150, 2609, 2700, 2716, 2925, 2937]

In [16]:
Y.shape

(4778, 443)

In [13]:
# Reload ratings
Y, R = load_ratings_small()

# Add new user ratings to Y 
Y = np.c_[my_ratings, Y]

# Add new user indicator matrix to R
R = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)
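
The cell above calls normalizeRatings, which is not defined anywhere in this notebook. A minimal sketch of such a helper is given below, assuming it performs per-movie mean normalization: each movie's mean rating, computed over rated entries only, is subtracted from that movie's rated entries. The small epsilon that guards against division by zero for unrated movies, and the demo arrays `Y_demo` and `R_demo`, are also assumptions.

```python
import numpy as np

def normalizeRatings(Y, R):
    """
    Assumed behavior: subtract each movie's mean rating, computed only
    over entries where R == 1, and leave unrated entries unchanged.
    Returns (Ynorm, Ymean), with Ymean of shape (num_movies, 1).
    """
    # Epsilon avoids division by zero for movies with no ratings.
    Ymean = (np.sum(Y * R, axis=1) / (np.sum(R, axis=1) + 1e-12)).reshape(-1, 1)
    Ynorm = Y - Ymean * R
    return Ynorm, Ymean

# Tiny example: 2 movies, 3 users; the second movie has no ratings
Y_demo = np.array([[5.0, 0.0, 3.0],
                   [0.0, 0.0, 0.0]])
R_demo = np.array([[1, 0, 1],
                   [0, 0, 0]])
Ynorm_demo, Ymean_demo = normalizeRatings(Y_demo, R_demo)
print(Ymean_demo.ravel())
print(Ynorm_demo)
```

A definition along these lines would need to run before the cell that calls it.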

- Let's prepare to train the model: initialize the parameters and select the Adam optimizer.

In [42]:
num_movies, num_users = Y.shape
num_features = 100

# Set initial parameters (W, X, b); use tf.Variable so TensorFlow tracks them
tf.random.set_seed(1234) # For consistent results
W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features), dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name='b')

# Instantiate an optimizer
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

- Let's now train the collaborative filtering model. This will learn the parameters X, W, and b.
- The operations involved in learning w, b, and x simultaneously do not fall into the typical layers offered in the TensorFlow neural network package. Consequently, the usual flow of Model, compile(), fit(), predict() is not directly applicable. Instead, we can use a custom training loop.
- The steps of gradient descent:
    - Repeat until convergence
        - Compute the forward pass
        - Compute the derivatives of the loss relative to the parameters
        - Update the parameters using the learning rate and the computed derivatives
- TensorFlow can calculate the derivatives for us. Within the tf.GradientTape() section, operations on TensorFlow variables are tracked. When tape.gradient() is called, it returns the gradient of the loss relative to the tracked variables. The gradients can then be applied to the parameters using an optimizer.
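
For intuition about what the tape automates, the three steps above can be sketched in plain NumPy with hand-derived gradients of the regularized squared-error cost. This is an illustrative stand-in that uses plain gradient descent, not tf.GradientTape or Adam. The toy data, the learning rate `alpha`, and the iteration count are arbitrary assumptions.

```python
import numpy as np

def cost(X, W, b, Y, R, lambda_):
    err = R * (X @ W.T + b - Y)
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Toy problem: 6 movies, 4 users, 2 features
rng = np.random.default_rng(0)
nm, nu, n = 6, 4, 2
Y = rng.integers(1, 6, size=(nm, nu)).astype(float)
R = rng.integers(0, 2, size=(nm, nu)).astype(float)
X = rng.normal(size=(nm, n))
W = rng.normal(size=(nu, n))
b = np.zeros((1, nu))

alpha, lambda_ = 0.01, 1.0
costs = []
for _ in range(200):
    # Forward pass: prediction errors on rated entries only
    err = R * (X @ W.T + b - Y)
    costs.append(cost(X, W, b, Y, R, lambda_))

    # Derivatives of the cost with respect to each parameter
    grad_X = err @ W + lambda_ * X
    grad_W = err.T @ X + lambda_ * W
    grad_b = np.sum(err, axis=0, keepdims=True)

    # Update the parameters using the learning rate and the derivatives
    X -= alpha * grad_X
    W -= alpha * grad_W
    b -= alpha * grad_b

print(f"initial cost {costs[0]:.2f}, final cost {costs[-1]:.2f}")
```

With automatic differentiation, the three `grad_` lines are what tape.gradient() would supply, and the update lines are what the optimizer would apply.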

In [None]:
iterations = 200
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow's GradientTape
    # To record the operations used to compute the cost
    with tf.GradientTape() as tape:
        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)