# Recommender Systems
A recommender system is an algorithm that suggests items to users based on data such as their past ratings and behavior. Such systems are widely used on platforms like Netflix, Amazon, YouTube, and Spotify to personalize the user experience.
The goal is to predict and recommend items (products, movies, songs, etc.) that a user is likely to prefer or engage with.
 
## Notation


| **Notation**       | **Description**                                                            | **Python (if any)** |
|--------------------|----------------------------------------------------------------------------|---------------------|
| $r(i,j)$           | scalar; = 1 if user j rated movie i, = 0 otherwise                         |                     |
| $y(i,j)$           | scalar; rating given by user j on movie i (defined only if r(i,j) = 1)     |                     |
| $\mathbf{w}^{(j)}$ | vector; parameters for user j                                              |                     |
| $b^{(j)}$          | scalar; parameter for user j                                               |                     |
| $\mathbf{x}^{(i)}$ | vector; feature ratings for movie i                                        |                     |     
| $n_u$              | number of users                                                            | num_users           |
| $n_m$              | number of movies                                                           | num_movies          |
| $n$                | number of features                                                         | num_features        |
| $\mathbf{X}$       | matrix of vectors $\mathbf{x}^{(i)}$                                       | X                   |
| $\mathbf{W}$       | matrix of vectors $\mathbf{w}^{(j)}$                                       | W                   |
| $\mathbf{b}$       | vector of bias parameters $b^{(j)}$                                        | b                   |
| $\mathbf{R}$       | matrix of elements $r(i,j)$                                                | R                   |



1. Load the movie ratings.
2. Build the cost function without regularization.
3. Add regularization to the cost function to avoid overfitting.
4. Compute the collaborative filtering objective function.
5. Use a TensorFlow custom training loop to learn the parameters.
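
For reference, the cost that the functions in this section compute can be written out as below. This is a reconstruction from the code rather than a formula stated in these notes; $y(i,j)$ denotes the rating user $j$ gave movie $i$, and $\lambda$ corresponds to the `lambda_` argument.

$$
J = \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}\left(\mathbf{w}^{(j)}\cdot\mathbf{x}^{(i)} + b^{(j)} - y(i,j)\right)^2
  + \frac{\lambda}{2}\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}\left(\mathbf{w}^{(j)}_k\right)^2
  + \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}\left(\mathbf{x}^{(i)}_k\right)^2
$$

The first term sums the squared error over rated pairs only, and the last two terms are the regularization on $\mathbf{W}$ and $\mathbf{X}$. The bias $\mathbf{b}$ is not regularized.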



In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from recsys_utils import *

In [8]:
#Load data
X, W, b, num_movies, num_features, num_users = load_precalc_params_small()
Y, R = load_ratings_small()

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (4778, 443) R (4778, 443)
X (4778, 10)
W (443, 10)
b (1, 443)
num_features 10
num_movies 4778
num_users 443


In [9]:
#  From the matrix, we can compute statistics like average rating.
tsmean =  np.mean(Y[0, R[0, :].astype(bool)])
print(f"Average rating for movie 1 : {tsmean:0.3f} / 5" )

Average rating for movie 1 : 3.400 / 5
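
The same masking idea extends to every movie at once. The sketch below uses a small made-up Y and R (not the course data) and assumes unrated entries of Y are stored as 0; movies with no ratings are given NaN rather than a misleading 0.

```python
import numpy as np

# Toy data: 3 movies x 4 users. R[i, j] = 1 where user j rated movie i.
Y = np.array([[5.0, 0.0, 3.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [4.0, 2.0, 0.0, 3.0]])
R = np.array([[1, 0, 1, 0],
              [0, 0, 0, 0],
              [1, 1, 0, 1]])

# Mean for a single movie, using a boolean mask over the users who rated it.
movie0_mean = np.mean(Y[0, R[0, :].astype(bool)])

# Means for all movies: sum of rated entries divided by the number of ratings.
counts = R.sum(axis=1)
sums = (Y * R).sum(axis=1)
# Rows with zero ratings keep the NaN placeholder instead of dividing by zero.
movie_means = np.divide(sums, counts,
                        out=np.full(len(counts), np.nan),
                        where=counts > 0)

print(movie0_mean)
print(movie_means)
```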


In [20]:

def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    
    # Extracts the number of users and movies from Y
    nm, nu = Y.shape
    J = 0
    for j in range(nu):
        # Retrieve the parameter vector and the bias for user j
        w = W[j,:]
        b_j = b[0,j]
        
        for i in range(nm):
            # For each movie, get its feature vector, the rating user j gave movie i,
            # and r, which is 1 if that rating exists and 0 otherwise
            x = X[i,:]
            y = Y[i,j]
            r = R[i,j]
            # Cost calculation:
            # - Calculate the prediction: np.dot(w, x) + b_j
            # - Calculate the error: prediction - actual = (np.dot(w, x) + b_j - y)
            # - Multiply it by r so that only existing ratings are counted
            # - Square the error and add it to the cost
            J += np.square(r * (np.dot(w,x) + b_j - y ) )

    # Divide the summed squared error by 2
    J = J/2

    # Regularization term:
    # - Adds regularization to penalize large values in W and X, which helps prevent overfitting
    J += (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
        
    return J
        

In [21]:
# Reduce the data set size so that this runs faster
num_users_r = 4
num_movies_r = 5 
num_features_r = 3

X_r = X[:num_movies_r, :num_features_r]
W_r = W[:num_users_r,  :num_features_r]
b_r = b[0, :num_users_r].reshape(1,-1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")

Cost: 13.67


**Expected Output (lambda = 0)**:  
$13.67$.

In [22]:
# Evaluate cost function with regularization

# print(f"X_r={X_r}")
# print(f"W_r={W_r}")
# print(f"b_r={b_r}")
# print(f"Y_r={Y_r}")
# print(f"R_r={R_r}")
# print(f"lambda_=1.5")

J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5)

print(f"Cost (with regularization): {J:0.2f}")

Cost (with regularization): 28.09


**Expected Output**:

28.09

In [None]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorized for speed. Uses TensorFlow operations to be compatible with a custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Masked prediction error for every (movie, user) pair; b broadcasts across rows
    err = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(err**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J
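
One way to gain confidence in a vectorized rewrite is to compare it against the loop version on random inputs. The sketch below does this in plain NumPy with made-up data; `cost_loop` and `cost_vectorized` are illustrative stand-ins for the two functions above, not the notebook's own code.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Double-loop cost, mirroring the structure of cofi_cost_func."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            err = np.dot(W[j, :], X[i, :]) + b[0, j] - Y[i, j]
            J += R[i, j] * err ** 2   # R is 0/1, so unrated pairs add nothing
    return J / 2 + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def cost_vectorized(X, W, b, Y, R, lambda_):
    """Same cost with one matrix product; b (1, nu) broadcasts over the rows."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Random toy problem: 6 movies, 4 users, 3 features.
rng = np.random.default_rng(0)
nm, nu, n = 6, 4, 3
X = rng.normal(size=(nm, n))
W = rng.normal(size=(nu, n))
b = rng.normal(size=(1, nu))
R = rng.integers(0, 2, size=(nm, nu))
Y = rng.integers(1, 6, size=(nm, nu)) * R   # ratings 1..5 where rated, else 0

J_loop = cost_loop(X, W, b, Y, R, 1.5)
J_vec = cost_vectorized(X, W, b, Y, R, 1.5)
print(np.isclose(J_loop, J_vec))
```

The two versions should agree up to floating-point rounding, which is why the comparison uses `np.isclose` rather than `==`.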

In [23]:
movieList, movieList_df = load_Movie_List_pd()

my_ratings = np.zeros(num_movies)          #  Initialize my ratings

# Check the file small_movie_list.csv for id of each movie in our dataset
# For example, Toy Story 3 (2010) has ID 2700, so to rate it "5", you can set
my_ratings[2700] = 5 

# Or suppose you did not enjoy Persuasion (2007), you can set
my_ratings[2609] = 2

# We have selected a few movies we liked / did not like and the ratings we
# gave are as follows:
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[246]  = 5   # Shrek (2001)
my_ratings[2716] = 3   # Inception
my_ratings[1150] = 5   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')


New user ratings:

Rated 5.0 for  Shrek (2001)
Rated 5.0 for  Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 2.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Harry Potter and the Chamber of Secrets (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for  Eternal Sunshine of the Spotless Mind (2004)
Rated 5.0 for  Incredibles, The (2004)
Rated 2.0 for  Persuasion (2007)
Rated 5.0 for  Toy Story 3 (2010)
Rated 3.0 for  Inception (2010)
Rated 1.0 for  Louis Theroux: Law & Disorder (2008)
Rated 1.0 for  Nothing to Declare (Rien à déclarer) (2010)


In [24]:
#Add my personal reviews and normalize the ratings
# Reload ratings
Y, R = load_ratings_small()

# Add new user ratings to Y 
Y = np.c_[my_ratings, Y]

# Add new user indicator matrix to R
R = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)
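
The body of `normalizeRatings` is not shown in these notes. A plausible sketch of mean normalization, assuming it subtracts each movie's mean rating from the rated entries only, could look like the following; `normalize_ratings_sketch` is a hypothetical name and the data is made up.

```python
import numpy as np

def normalize_ratings_sketch(Y, R):
    """Subtract each movie's mean rating from its rated entries only (assumed behavior)."""
    counts = np.sum(R, axis=1)
    # A small epsilon avoids dividing by zero for movies nobody rated.
    Ymean = (np.sum(Y * R, axis=1) / (counts + 1e-12)).reshape(-1, 1)
    Ynorm = Y - Ymean * R          # unrated entries stay at 0
    return Ynorm, Ymean

# Toy data: 2 movies x 3 users.
Y = np.array([[5.0, 3.0, 0.0],
              [0.0, 2.0, 4.0]])
R = np.array([[1, 1, 0],
              [0, 1, 1]])

# Prepend a new user as column 0, as np.c_ does in the cell above.
my_ratings = np.array([4.0, 0.0])
Y = np.c_[my_ratings, Y]
R = np.c_[(my_ratings != 0).astype(int), R]

Ynorm, Ymean = normalize_ratings_sketch(Y, R)
print(Ymean.ravel())
print(Ynorm)
```

Because the new user is prepended, their ratings and predictions live in column 0, which is why later cells read `pm[:,0]`.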

In [25]:
# prepare to train the model. Initialize the parameters and select the Adam optimizer.
#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Now let's train the collaborative filtering model. This will learn the parameters X, W, and b.

The operations involved in learning w, b, and x simultaneously do not fall into the typical 'layers' offered in the TensorFlow neural network package.
Consequently, the usual flow of Model, compile(), fit(), predict() is not directly applicable. Instead, we can use a custom training loop.

The steps of gradient descent are as follows:

repeat until convergence:
- compute forward pass
- compute the derivatives of the loss relative to parameters
- update the parameters using the learning rate and the computed derivatives

TensorFlow can calculate the derivatives for you. This is shown below. Within the tf.GradientTape() block, operations on TensorFlow Variables are tracked. When tape.gradient() is later called, it returns the gradient of the loss with respect to the tracked variables. The gradients can then be applied to the parameters using an optimizer. This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks. Further information can be found by investigating "custom training loops" within the framework of interest.
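
For intuition about what tape.gradient() returns for this cost, the derivatives can also be written by hand. The sketch below is a NumPy stand-in under stated assumptions, not the notebook's TensorFlow loop: it uses plain gradient descent rather than Adam, random toy data, and the hypothetical helper names `cost` and `gradients`.

```python
import numpy as np

def cost(X, W, b, Y, R, lambda_):
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

def gradients(X, W, b, Y, R, lambda_):
    """Hand-derived partial derivatives of cost() with respect to X, W and b."""
    err = (X @ W.T + b - Y) * R               # (num_movies, num_users), masked
    dX = err @ W + lambda_ * X                # (num_movies, num_features)
    dW = err.T @ X + lambda_ * W              # (num_users, num_features)
    db = np.sum(err, axis=0, keepdims=True)   # (1, num_users), bias is not regularized
    return dX, dW, db

# Random toy problem (assumed sizes, not the course data).
rng = np.random.default_rng(1234)
num_movies, num_users, num_features = 5, 4, 3
R = rng.integers(0, 2, size=(num_movies, num_users))
Y = rng.integers(1, 6, size=(num_movies, num_users)) * R
X = rng.normal(size=(num_movies, num_features))
W = rng.normal(size=(num_users, num_features))
b = rng.normal(size=(1, num_users))

learning_rate, lambda_ = 0.01, 1.0
initial_cost = cost(X, W, b, Y, R, lambda_)
for step in range(200):
    # Forward pass and derivatives of the loss relative to the parameters
    dX, dW, db = gradients(X, W, b, Y, R, lambda_)
    # Update the parameters using the learning rate and the derivatives
    X -= learning_rate * dX
    W -= learning_rate * dW
    b -= learning_rate * db
final_cost = cost(X, W, b, Y, R, lambda_)
print(f"cost: {initial_cost:0.2f} -> {final_cost:0.2f}")
```

Automatic differentiation removes the need to derive and maintain `gradients` by hand, which matters once the cost function changes.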

In [None]:
iterations = 200
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

In [None]:
# Recommendations
# Below, we compute the ratings for all the movies and users and display the movies that are recommended.
# These are based on the movies and ratings entered as my_ratings[] above.
# To predict the rating of movie i for user j, compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$.
# This can be computed for all ratings using matrix multiplication.

# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

#restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')
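
The prediction and ranking steps can be tried in isolation on toy data. The sketch below uses random stand-ins for the learned X, W, b and for Ymean, plus a hypothetical set `rated` of already-rated movie indices; it only illustrates the matrix form of the prediction and one way to skip rated movies.

```python
import numpy as np

rng = np.random.default_rng(0)
num_movies, num_users, num_features = 6, 3, 2
X = rng.normal(size=(num_movies, num_features))   # stand-in for learned movie features
W = rng.normal(size=(num_users, num_features))    # stand-in for learned user parameters
b = rng.normal(size=(1, num_users))               # stand-in for learned biases
Ymean = rng.uniform(2.0, 4.0, size=(num_movies, 1))

# Predicted ratings for every (movie, user) pair, with the per-movie mean restored.
pm = X @ W.T + b + Ymean

user = 0
rated = {1, 4}                     # hypothetical indices this user already rated
my_predictions = pm[:, user]

# Movie indices ordered by predicted rating, highest first, skipping rated movies.
order = np.argsort(-my_predictions)
top = [int(i) for i in order if int(i) not in rated][:3]
print(top)
```

Using a set for `rated` keeps the membership test cheap when the list of rated movies grows.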

In [None]:
# In practice, additional information can be used to enhance our predictions.
# Above, the predicted ratings for the first few hundred movies lie in a small range.
# We can augment the above by selecting, from those top movies, the ones that have
# high average ratings and more than 20 ratings.
# This section uses a Pandas data frame, which has many handy sorting features.
rating_filter = (movieList_df["number of ratings"] > 20)  # avoid shadowing the built-in filter()
movieList_df["pred"] = my_predictions
movieList_df = movieList_df.reindex(columns=["pred", "mean rating", "number of ratings", "title"])
movieList_df.loc[ix[:300]].loc[rating_filter].sort_values("mean rating", ascending=False)