 # Outline

 - [1-Packages](#1)
 - [2-Load Dataset](#2)
 - [&nbsp;&nbsp; 2.1-Function to load dataset](#2.1)
 - [&nbsp;&nbsp; 2.2 Load & View Dataset size](#2.2)
 - [3-Cost Function Implementation](#3)
 - [&nbsp;&nbsp;3.1-Cost function using for loop](#3.1)
 - [&nbsp;&nbsp;3.2-Cost Function using vectorization with numpy implementation](#3.2)
 - [&nbsp;&nbsp;3.3-ost Function using vectorization with Tensorflow implementation-1](#3.3)
 - [&nbsp;&nbsp;3.4-ost Function using vectorization with Tensorflow implementation-2](#3.4)

<a name="1"></a>
##  1-Packages <img align="left" src="./images/python-logo.png"     style=" width:40px;   " > 

In [1]:
import warnings

# Ignore all future warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:


import numpy as np
from numpy import loadtxt
import pandas as pd
import tensorflow as tf




<a name="2"></a>
#  2- Load dataset <img align="left" src="./images/dataset-logo.png" style="width:50px; ">

<a name='2.1'></a>

### 2.1 Function to load datset

In [3]:
def load_dataset():
    file = open('./data/processed/small_movie_X.csv', 'rb')
    X = loadtxt(file, delimiter=",")
    num_movies, num_features = X.shape
    
    file = open('./data/processed/small_movie_R.csv', 'rb')
    R = loadtxt(file, delimiter=",")
    _,num_users = R.shape
    
    file = open('./data/processed/small_movie_Y.csv', 'rb')
    Y = loadtxt(file, delimiter=",")
    return (X, Y, R, num_movies, num_features, num_users)

def load_movie_list():
    df = pd.read_csv('./data/processed/movie_list_df.csv')
    movie_list = df["title"].to_list()
    return(df, movie_list)
    

<a name="2.2"></a>

### 2.2 Load & View Dataset size

In [4]:
X, Y, R, num_movies, num_features, num_users = load_dataset()

df, movie_list = load_movie_list()

# Initialize parameters
W = np.random.rand(num_users, num_features)
b = np.random.rand(1, num_users)

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (9724, 610) R (9724, 610)
X (9724, 34)
W (610, 34)
b (1, 610)
num_features 34
num_movies 9724
num_users 610


Average rating for movie 1 : 3.921 / 5


<a name="3"></a>

# 3- Cost Function Implementation

The collaborative filtering algorithm in the setting of movie
recommendations considers a set of $n$-dimensional parameter vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$, $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$, where the
model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$ . Given a dataset that consists of
a set of ratings produced by some users on some movies, you wish to
learn the parameter vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},
\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$  and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimizes
the squared error).

#### Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$


Below implementation [3.1-Cost function using for loop](#3.1) & [3.2 -Cost function using numpy](#3.2) are not recommended to train the model using tensorflow

<a name="3.1"></a>

## 3.1- Cost function using for loop

In [None]:
def cofi_cost_func_for_loop(X, W, b, Y, R, lambda_):
    """
    Returns the cost for the content-based filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users)            : vector of user parameters
      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
      R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        
        for i in range(nm):
            x = X[i,:]
            y = Y[i,j]
            r = R[i,j]
            J += np.square(r * (np.dot(w,x) + b_j -y))
    J = J/2
    J += (lambda_ / 2) * (np.sum(sp.square(W)) + np.sum(np.square(X)))
    return J
    
    

<a name="3.2"></a>

##  3.2- Cost Function using vectorization with numpy implementation <img align="left" src="./images/numpy.png" style="width:40px; ">

In [5]:
def cofi_cost_func_numpy(X, W, b, Y, R, lambda_ ):
    """
    Returns the cost for the content-based filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users)            : vector of user parameters
      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
      R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Vectorized computation of cost
    J = (1/2) * np.sum(R * np.square(X @ W.T + b - Y))
    
    # Regularization term
    reg_term = (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    
    # Compute cost with regularization
    J += reg_term
    
    return J
    

<a name="3.3"></a>

## 3.3- Cost Function using vectorization with Tensorflow implementation-1 <img align="left" src="./images/tf.png" style="width:25px; ">


This implementation requires more memory so below Tensorflow implementation-2 is recommended to compute cost for collaborative filtering: Both are same implementation

In [None]:
def cofi_cost_func_tf1(X, W, b, Y, R, lambda_):
    # Tensorflow Variable
    X_tf = tf.Variable(X, dtype=tf.float32)
    W_tf = tf.Variable(W, dtype=tf.float32)
    b_tf = tf.Variable(b, dtype=tf.float32)
    Y_tf = tf.Variable(Y, dtype=tf.float32)
    R_tf = tf.Variable(R, dtype=tf.float32)
    # Cost computation
    J = 0.5 * tf.reduce_sum(R_tf * tf.square(tf.matmul(X_tf, tf.transpose(W_tf))+ b_tf -Y_tf))
    # Regularization term
    reg_term = 0.5 * lambda_ * (tf.reduce_sum(tf.square(W)) + tf.reduce_sum(tf.square(X)))
    # Compute cost with regularization
    J += reg_term
    return J

<a name="3.4"></a>
 
## 3.4- Cost Function using vectorization with Tensorflow implementation-2 <img align="left" src="./images/tf.png" style="width:25px; ">

Recommended Cost Function Implementation if using Tensorflow

In [5]:
def cofi_cost_func_tf2(X, W, b, Y, R, lambda_):
    """
    Returns the cost for the content-based filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users)            : vector of user parameters
      Y (ndarray (num_movies,num_users)    : matrix of user ratings of movies
      R (ndarray (num_movies,num_users)    : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Compute cost
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J
    

<a name="4"></a>

# 4- Learning Recomendation  <img align="left" src="./images/film_rating.png" style="width:40px">

<a name="4.1"> </a>

## 4.1- Initialize my/user ratings for at least 10 movies


You can choose your own ratings here

In [24]:
#---------------------------------------------------------> This part is totally optional and for experimental purpose <------------------------------------------------------

# Set seed to get consistent result for creating movie index
np.random.seed(42)

my_ratings_list_index = np.random.randint(1,9725,size=10).tolist()
my_ratings_list_index

[1290, 7294, 1345, 7292, 9373, 4830, 1521, 9225, 9290, 6401]

In [10]:
my_ratings_list_index = [3569,6726,7571,7750,7752,7784,1182,9173,9288,9417]

In [11]:
# Check the movie names
for i in range(len(my_ratings_list_index)):
    print(f"Index of : {my_ratings_list_index[i]} is movie : {df.loc[my_ratings_list_index[i], 'title']}")

Index of : 3569 is movie : Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Index of : 6726 is movie : Iron Man (2008)
Index of : 7571 is movie : Thor (2011)
Index of : 7750 is movie : Dark Knight Rises, The (2012)
Index of : 7752 is movie : Sherlock Holmes: A Game of Shadows (2011)
Index of : 7784 is movie : Intouchables (2011)
Index of : 1182 is movie : Men in Black (a.k.a. MIB) (1997)
Index of : 9173 is movie : The Devil's Candy (2015)
Index of : 9288 is movie : Now You See Me 2 (2016)
Index of : 9417 is movie : Underworld: Blood Wars (2016)


<a name="4.1.1"> </a>


### 4.1.1- Take movie ratings input

In [12]:
my_ratings = np.zeros(num_movies)

# add user ratings manually
my_ratings[1182] = 2           # Men in Black (a.k.a. MIB) (1997)
my_ratings[3569] = 3           # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[6726] = 5           # Iron Man (2008)
my_ratings[7571] = 4           # Thor (2011)
my_ratings[7750] = 5           # Dark Knight Rises, The (2012)
my_ratings[7752] = 4           # Sherlock Holmes: A Game of Shadows (2011)
my_ratings[7784] = 5           # Intouchables (2011)
my_ratings[9173] = 1           # The Devil's Candy (2015)
my_ratings[9288] = 3           # Now You See Me 2 (2016)
my_ratings[9417] = 3           # Underworld: Blood Wars (2016)

my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]


Below cell to take different ratings inputs for the movies to experiment

In [10]:
#---------------------------------------------------------> This part is totally optional and for experimental purpose <------------------------------------------------------

# Take inputs from user for rating variation:
for i in range(len(my_ratings_list_index)):
    while True:
        try:
            user_rating = int(input(f"Enter your rating between 1 & 5 for the movie:  {df.loc[my_ratings_list_index[i], 'title']} "))
            if 1 <= user_rating <= 5:
                my_ratings[my_ratings_list_index[i]] = user_rating
                break
            else:
                print("Invalid Input! Please enter a number between 1 & 5")

        except ValueError:
            print("Invalid Input! Please enter a Valid Integer")

In [13]:
print("New User Ratings: \n")

for i in range(len(my_ratings)):
    if (my_ratings[i] > 0):
        print(f'Rated {my_ratings[i]}  for movie : {df.loc[i, "title"]}')

New User Ratings: 

Rated 2.0  for movie : Men in Black (a.k.a. MIB) (1997)
Rated 3.0  for movie : Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 5.0  for movie : Iron Man (2008)
Rated 4.0  for movie : Thor (2011)
Rated 5.0  for movie : Dark Knight Rises, The (2012)
Rated 4.0  for movie : Sherlock Holmes: A Game of Shadows (2011)
Rated 5.0  for movie : Intouchables (2011)
Rated 1.0  for movie : The Devil's Candy (2015)
Rated 3.0  for movie : Now You See Me 2 (2016)
Rated 3.0  for movie : Underworld: Blood Wars (2016)


<a name="4.1.2"></a>

### 4.1.2-  Add New Reviews/Ratings and Normalize user Ratings 

Preprocess data by subtracting mean rating for every movie (every row). Only include real ratings $R(i,j)=1$.
 ``[Ynorm, Ymean] = normalizeRatings(Y, R)`` normalized Y so that each movie has a rating of $0$ on average. Unrated moves then have a mean rating (0)
Returns the mean rating in Ymean.

In [14]:
def normalizeRatings(Y,R):
    Ymean = (np.sum(Y *R, axis=1) / (np.sum(R, axis=1)+1e-12)).reshape(-1,1)
    Ynorm = Y - np.multiply(Ymean, R)
    return(Ymean, Ynorm)


In [15]:
# Add new user ratings to Y
Y = np.c_[my_ratings, Y]

# Add new user indiacator to matrix R
R = np.c_[(my_ratings!=0).astype(int), R]

# Normalize Y & R
Ymean, Ynorm = normalizeRatings(Y, R)

In [12]:
R.shape

(9724, 611)

# Numpy Implementation training  <img align="left" src="./images/numpy.png" style="width:45px">

## Gradient Decent

In [16]:
def compute_gradient(X, W, b, Y, R, lambda_):
    
    # Calculate error
    error = (X @ W.T + b - Y) * R

    # Calculate gradient
    X_grad = error @ W + lambda_ * X
    W_grad = error.T @ X + lambda_ * W
    b_grad = np.sum(error)

    return (X_grad, W_grad, b_grad)


### Clip the gradients to avoid exploding gradient

In [17]:
def clip_gradients(gradients, clip_value):
    return [np.clip(grad, -clip_value, clip_value) for grad in gradients]

In [18]:
def update_parameters(X, W, b, Y, R, lambda_, lr, clip_value):

    X_grad, W_grad, b_grad = compute_gradient(X, W, b, Y, R, lambda_)
    # Clip the gradient before updating
    clipped_grad = clip_gradients([X_grad, W_grad, b_grad], clip_value)

    # Update the parameters using clipped Gradients
    X -= lr* clipped_grad[0]
    W -= lr* clipped_grad[1]
    b -= lr* clipped_grad[2]
    

In [58]:
# Useful Values
num_movies, num_users = Y.shape
num_features = 100
# Training Parameters
learning_rate = 0.01
iteration = 500
lambda_value = 0.1
clip_value = 1.0

# Set initial value for the parameters
np.random.seed(1234)   # For consistent results
X = np.random.normal(size=(num_movies, num_features))
W = np.random.normal(size=(num_users, num_features))
b = np.random.normal(size=(1,num_users))

for iter in range(iteration):
    update_parameters(X, W, b, Ynorm, R, lambda_value, learning_rate, clip_value)

    # Compute cost periodically and log it
    if iter % 10 == 0:
        cost_value = cofi_cost_func_numpy(X, W, b, Ynorm, R, lambda_value)
        print(f'Training loss : {cost_value:0.2f} after Iteration: {iter}')
        # Print additional information (optional)
        # print(f'X: {X[:2, :2]}, W: {W[:2, :2]}, b: {b[:, :2]}')

Training loss : 4973681.94 after Iteration: 0
Training loss : 2736560.87 after Iteration: 10
Training loss : 1472146.99 after Iteration: 20
Training loss : 775433.19 after Iteration: 30
Training loss : 401640.11 after Iteration: 40
Training loss : 208604.89 after Iteration: 50
Training loss : 114166.67 after Iteration: 60
Training loss : 71499.96 after Iteration: 70
Training loss : 54424.91 after Iteration: 80
Training loss : 48221.65 after Iteration: 90
Training loss : 45555.49 after Iteration: 100
Training loss : 43945.79 after Iteration: 110
Training loss : 42732.27 after Iteration: 120
Training loss : 41701.72 after Iteration: 130
Training loss : 40771.64 after Iteration: 140
Training loss : 39906.21 after Iteration: 150
Training loss : 39086.50 after Iteration: 160
Training loss : 38302.60 after Iteration: 170
Training loss : 37547.48 after Iteration: 180
Training loss : 36816.98 after Iteration: 190
Training loss : 36108.02 after Iteration: 200
Training loss : 35418.36 after Iter

#  Tensorflow implementation Training  <img align="left" src="./images/tf.png" style="width:45px">

Prepare to train the model

In [118]:
# Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set initial parameters W,X,b and use Tensorflow Variable to track them
tf.random.set_seed(1234)
W = tf.Variable(tf.random.normal((num_users, num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64), name="X")
b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name="b")


#Instantiate Optimizer for gradient descent
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)


Trai the Model

In [119]:
iterations = 200
lambda_ = 1

for iter in range(iterations):

    # Using TensorFlow's GradientTape to record the operations used to compute the cost
    with tf.GradientTape() as tape:
        
        # Compute cost value using cost function
        cost_value = cofi_cost_func_tf2(X, W, b, Ynorm, R, lambda_)

    # Using Gradient tape to automatically retrieve the gradients of the variables with respect to the loss
    grads = tape.gradient(cost_value,[X, W, b])

    # Run One step of gradient descent by updating the value of the variables to minimize the loss
    optimizer.apply_gradients(zip(grads, [X, W, b]))

    # Log the training loss periodically
    if iter % 5 ==0:
        print(f'Training Loss at iteration {iter}: {cost_value:0.2f}')

Training Loss at iteration 0: 5540870.06
Training Loss at iteration 5: 855711.36
Training Loss at iteration 10: 554613.89
Training Loss at iteration 15: 383891.35
Training Loss at iteration 20: 279380.51
Training Loss at iteration 25: 212556.76
Training Loss at iteration 30: 166472.54
Training Loss at iteration 35: 133095.36
Training Loss at iteration 40: 108042.60
Training Loss at iteration 45: 88889.39
Training Loss at iteration 50: 74017.09
Training Loss at iteration 55: 62325.51
Training Loss at iteration 60: 53026.05
Training Loss at iteration 65: 45540.56
Training Loss at iteration 70: 39456.99
Training Loss at iteration 75: 34459.75
Training Loss at iteration 80: 30317.83
Training Loss at iteration 85: 26856.20
Training Loss at iteration 90: 23941.42
Training Loss at iteration 95: 21470.82
Training Loss at iteration 100: 19364.15
Training Loss at iteration 105: 17558.47
Training Loss at iteration 110: 16003.36
Training Loss at iteration 115: 14658.32
Training Loss at iteration 1

# Prediction

In [120]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

#restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movie_list[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movie_list[i]}')

Predicting rating 5.22 for movie Usual Suspects, The (1995)
Predicting rating 4.99 for movie Lesson Faust (1994)
Predicting rating 4.97 for movie Odd Life of Timothy Green, The (2012)
Predicting rating 4.97 for movie Assignment, The (1997)
Predicting rating 4.96 for movie Mephisto (1981)
Predicting rating 4.96 for movie Thin Line Between Love and Hate, A (1996)
Predicting rating 4.96 for movie Advise and Consent (1962)
Predicting rating 4.96 for movie Monster Squad, The (1987)
Predicting rating 4.96 for movie Galaxy of Terror (Quest) (1981)
Predicting rating 4.96 for movie Alien Contamination (1980)
Predicting rating 4.96 for movie Raise Your Voice (2004)
Predicting rating 4.96 for movie Jump In! (2007)
Predicting rating 4.96 for movie Last Hurrah for Chivalry (Hao xia) (1979)
Predicting rating 4.96 for movie Max Manus (2008)
Predicting rating 4.95 for movie The Girl with All the Gifts (2016)
Predicting rating 4.95 for movie Ugly Duckling and Me!, The (2006)
Predicting rating 4.95 for 

#####  Numpy prediction

In [59]:
# Make a prediction using trained weights and biases
p = np.matmul(X, np.transpose(W)) + b

#restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = np.argsort(-my_predictions)

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movie_list[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movie_list[i]}')

Predicting rating 18.79 for movie On the Beach (1959)
Predicting rating 17.39 for movie McCabe & Mrs. Miller (1971)
Predicting rating 17.15 for movie Crime and Punishment in Suburbia (2000)
Predicting rating 16.79 for movie Room for Romeo Brass, A (1999)
Predicting rating 16.60 for movie Bring Me the Head of Alfredo Garcia (1974)
Predicting rating 16.47 for movie Fool's Gold (2008)
Predicting rating 16.35 for movie Funny Games U.S. (2007)
Predicting rating 16.15 for movie Carrie (2002)
Predicting rating 16.05 for movie Digimon: The Movie (2000)
Predicting rating 15.81 for movie Camille (1936)
Predicting rating 15.68 for movie Jimmy Hollywood (1994)
Predicting rating 15.65 for movie Sunshine State (2002)
Predicting rating 15.63 for movie Freshman, The (1925)
Predicting rating 15.53 for movie Last King of Scotland, The (2006)
Predicting rating 15.50 for movie Perfect Sense (2011)
Predicting rating 15.44 for movie Defiance (2008)
Predicting rating 15.24 for movie Side Effects (2013)


Ori