## Overview

**The main objective of this project is to build a collaborative filtering recommender system for movie suggestions. The model suggests movies to a user based on that user's ratings of other movies and on the ratings that other users have given to different movies. The following sections explain the dataset, the mathematics, the model, and the implementation.**

## 1 - Notation

I will be using the following notations for referencing and mathematics.

| General Notation | Description | Python (if any) |
|:------------------|:-----------------------------------------|:--------------|
| $r(i,j)$         | scalar; = 1 if user j rated movie i, 0 otherwise | |
| $y(i,j)$         | scalar; rating given by user j to movie i (defined only if r(i,j) = 1) | |
| $\mathbf{w}^{(j)}$ | vector; parameters for user j | |
| $b^{(j)}$        | scalar; parameter for user j | |
| $\mathbf{x}^{(i)}$ | vector; feature ratings for movie i | |
| $n_u$            | number of users | num_users |
| $n_m$            | number of movies | num_movies |
| $n$              | number of features | num_features |
| $\mathbf{X}$     | matrix of vectors $\mathbf{x}^{(i)}$ | X |
| $\mathbf{W}$     | matrix of vectors $\mathbf{w}^{(j)}$ | W |
| $\mathbf{b}$     | vector of bias parameters $b^{(j)}$ | b |
| $\mathbf{R}$     | matrix of elements $r(i,j)$ | R |


## 2 - Recommender Systems
In this project, I will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.
The goal of a collaborative filtering recommender system is to learn two kinds of vectors: for each user, a 'parameter vector' that captures the user's movie tastes, and for each movie, a feature vector of the same size that captures some description of the movie. The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.

The diagram below details how these vectors are learned.

<figure>
   <img src="ColabFilterLearn.PNG"  style="width:740px;height:250px;" >
</figure>

Existing ratings are provided in matrix form as shown. $Y$ contains the ratings, from 0.5 to 5 inclusive in steps of 0.5, with 0 marking a movie that has not been rated. $R$ has a 1 wherever a movie has been rated. Movies are in rows, users in columns. Each user has a parameter vector $w^{user}$ and a bias. Each movie has a feature vector $x^{movie}$. These vectors are learned simultaneously, using the existing user/movie ratings as training data. One training example is shown above: $\mathbf{w}^{(1)} \cdot \mathbf{x}^{(1)} + b^{(1)} = 4$. It is worth noting that the feature vector $x^{movie}$ must satisfy all the users while the user vector $w^{user}$ must satisfy all the movies. This is the source of the name of this approach - all the users collaborate to generate the rating set.

<figure>
   <img src="ColabFilterUse.PNG"  style="width:640px;height:250px;" >
</figure>

Once the feature vectors and parameters are learned, they can be used to predict how a user might rate an unrated movie. This is shown in the diagram above. The equation is an example of predicting a rating for user one on movie zero.
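As a minimal sketch of that prediction step, the snippet below computes one rating from a user's parameter vector, a movie's feature vector, and the user's bias. The numbers and the names `w_user`, `b_user`, and `x_movie` are made up for illustration; they are not values from the dataset.

```python
import numpy as np

# Hypothetical learned values for one user and one movie (3 features)
w_user = np.array([0.5, -0.2, 0.1])   # user's parameter vector w^(j)
b_user = 0.3                          # user's bias b^(j)
x_movie = np.array([2.0, 1.0, 4.0])   # movie's feature vector x^(i)

# Predicted rating: dot product of the two vectors plus the user's bias
predicted_rating = np.dot(w_user, x_movie) + b_user
print(f"Predicted rating: {predicted_rating:0.2f}")
```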

In this project, I will implement the function `cofi_cost_func`, which computes the collaborative filtering objective function. After implementing the objective function, I will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. The first step is to describe the data set and data structures that will be used in the project.

## 3 - Movie ratings dataset 
The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset.   

The original dataset has roughly 9000 movies rated by roughly 600 users. The dataset has been reduced in size to focus on movies released since 2000. Ratings are on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users and $n_m= 4778$ movies.

Below, I will load the movie dataset into the variables $Y$ and $R$.

The matrix $Y$ (an $n_m \times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is a binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise.
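The relationship between the two matrices can be illustrated on a toy example. This sketch assumes, as described above, that a 0 in $Y$ always means "not rated"; `Y_toy` and `R_toy` are hypothetical names, and in the project itself $R$ is loaded from a file rather than derived.

```python
import numpy as np

# Toy ratings: 3 movies (rows) x 2 users (columns); 0 means "not rated"
Y_toy = np.array([[5.0, 0.0],
                  [0.0, 3.5],
                  [0.0, 0.0]])

# Indicator matrix: 1 where a rating exists, 0 otherwise
R_toy = (Y_toy != 0).astype(int)

print(R_toy)
print(f"Rated entries: {R_toy.sum()} of {R_toy.size}")
```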

Throughout this part of the project, I will also be working with the
matrices, $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{b}$: 

$$\mathbf{X} = 
\begin{bmatrix}
--- (\mathbf{x}^{(0)})^T --- \\
--- (\mathbf{x}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{x}^{(n_m-1)})^T --- \\
\end{bmatrix} , \quad
\mathbf{W} = 
\begin{bmatrix}
--- (\mathbf{w}^{(0)})^T --- \\
--- (\mathbf{w}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{w}^{(n_u-1)})^T --- \\
\end{bmatrix},\quad
\mathbf{ b} = 
\begin{bmatrix}
 b^{(0)}  \\
 b^{(1)} \\
\vdots \\
b^{(n_u-1)} \\
\end{bmatrix}\quad
$$ 

The $i$-th row of $\mathbf{X}$ corresponds to the
feature vector $x^{(i)}$ for the $i$-th movie, and the $j$-th row of
$\mathbf{W}$ corresponds to one parameter vector $\mathbf{w}^{(j)}$, for the
$j$-th user. Both $x^{(i)}$ and $\mathbf{w}^{(j)}$ are $n$-dimensional
vectors. For the purposes of this project, I will use $n=10$, and
therefore, $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ have 10 elements.
Correspondingly, $\mathbf{X}$ is a
$n_m \times 10$ matrix and $\mathbf{W}$ is a $n_u \times 10$ matrix.
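The shapes above determine how all predictions can be formed at once. The following sketch uses small random matrices (toy sizes and hypothetical names, not the project data) to check that entry $(i,j)$ of $\mathbf{X}\mathbf{W}^T + \mathbf{b}$ matches the per-pair formula $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 6, 4, 10             # toy sizes; the project uses 4778, 443 and 10

X_toy = rng.normal(size=(n_m, n))  # one row per movie
W_toy = rng.normal(size=(n_u, n))  # one row per user
b_toy = rng.normal(size=(1, n_u))  # one bias per user

# All predictions at once: (n_m, n) @ (n, n_u) -> (n_m, n_u); b broadcasts over the rows
P = X_toy @ W_toy.T + b_toy

# Entry (i, j) should equal the per-pair formula w^(j) . x^(i) + b^(j)
i, j = 2, 1
single = np.dot(W_toy[j], X_toy[i]) + b_toy[0, j]
print(P.shape, np.isclose(P[i, j], single))
```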

I will start by loading the movie ratings dataset into $Y$ and $R$ to understand the structure of the data.  
I'll also load $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$ with pre-computed values. These parameters will be learned later in the project, but the pre-computed values are useful for developing the cost function.

### Imports

In [44]:
import numpy as np
import pandas as pd
import tensorflow as tf
from keras.optimizers import Adam
import warnings

warnings.filterwarnings('ignore')

In [46]:
#Load data
movie_titles = pd.read_csv('small_movie_list.csv', usecols=[3])

Y = pd.read_csv('small_movies_Y.csv', names=[f'user{x}' for x in range(443)])
Y = movie_titles.join(Y)

R = pd.read_csv('small_movies_R.csv', names=[f'user{x}' for x in range(443)])
R = movie_titles.join(R)

W = pd.read_csv('small_movies_W.csv', names=[f'w_{x}' for x in range(10)])
W.index = [f'user{x}' for x in range(443)]

X = pd.read_csv('small_movies_X.csv', names=[f'feature_{x}' for x in range(10)])
X = movie_titles.join(X)

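# Note: b holds one bias per user; the 'b_movie' column prefix is a misnomer kept to match the outputs below
b = pd.read_csv('small_movies_b.csv', names=[f'b_movie{x}' for x in range(443)])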

print(f'Y: {Y.iloc[:, 1:].shape}')
print(f'R: {R.iloc[:, 1:].shape}')
print(f'W: {W.shape}')
print(f'X: {X.iloc[:, 1:].shape}')
print(f'b: {b.shape}')

print(f"Num of features: {X.iloc[:, 1:].shape[1]}")
print(f"Num of movies: {Y.iloc[:, 1:].shape[0]}")
print(f"Num of users: {W.shape[0]}")

Y: (4778, 443)
R: (4778, 443)
W: (443, 10)
X: (4778, 10)
b: (1, 443)
Num of features: 10
Num of movies: 4778
Num of users: 443


In [47]:
Y.head()

Unnamed: 0,title,user0,user1,user2,user3,user4,user5,user6,user7,user8,...,user433,user434,user435,user436,user437,user438,user439,user440,user441,user442
0,"Yards, The (2000)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Next Friday (2000),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Supernova (2000),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Down to You (2000),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Scream 3 (2000),5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,4.0,3.0,3.0


In [48]:
R.head()

Unnamed: 0,title,user0,user1,user2,user3,user4,user5,user6,user7,user8,...,user433,user434,user435,user436,user437,user438,user439,user440,user441,user442
0,"Yards, The (2000)",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Next Friday (2000),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Supernova (2000),0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
3,Down to You (2000),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Scream 3 (2000),1,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,1,1,1


In [49]:
W.head()

Unnamed: 0,w_0,w_1,w_2,w_3,w_4,w_5,w_6,w_7,w_8,w_9
user0,0.479351,0.4767,0.487941,-0.119804,-0.32864,0.423246,-0.395107,-0.238308,-0.36122,-0.180903
user1,0.036048,-0.381909,0.295857,-0.258234,0.286056,-0.181466,0.293492,0.464079,0.458329,-0.23635
user2,0.055616,-0.058994,-0.245282,0.109871,0.397099,0.363621,0.145976,0.363758,0.018468,0.174881
user3,0.455081,0.159874,-0.452741,0.235758,0.238673,-0.277246,0.367087,-0.327934,0.288718,0.370415
user4,-0.017157,-0.439861,-0.337667,0.183689,-0.458436,0.171238,-0.360265,0.111018,0.048738,-0.439863


In [50]:
X.head()

Unnamed: 0,title,feature_0,feature_1,feature_2,feature_3,feature_4,feature_5,feature_6,feature_7,feature_8,feature_9
0,"Yards, The (2000)",-0.033287,1.166746,-0.50649,0.125096,1.53476,-0.210136,0.104176,-0.349303,0.753852,-0.208109
1,Next Friday (2000),0.404058,2.266492,-0.742824,1.374039,1.145788,0.381959,-0.447495,-1.038103,-2.332266,-2.343191
2,Supernova (2000),0.023478,0.428102,-0.426381,0.107177,0.484193,-0.969298,-0.748563,-0.293784,0.43973,1.823422
3,Down to You (2000),0.511281,1.518797,1.602747,-2.296399,-0.570383,-0.480868,-0.662502,1.965215,0.495176,1.044845
4,Scream 3 (2000),-2.056514,0.458517,0.619371,-0.138456,0.011926,-0.282415,0.027502,-0.535465,0.661082,0.195084


In [51]:
b.head()

Unnamed: 0,b_movie0,b_movie1,b_movie2,b_movie3,b_movie4,b_movie5,b_movie6,b_movie7,b_movie8,b_movie9,...,b_movie433,b_movie434,b_movie435,b_movie436,b_movie437,b_movie438,b_movie439,b_movie440,b_movie441,b_movie442
0,0.236095,0.356534,0.1282,0.315502,0.236364,-0.229178,-0.144015,0.075727,-0.370031,-0.021354,...,-0.176575,-0.473329,-0.361718,0.084787,0.429559,-0.405086,0.45839,0.412608,-0.36224,-0.32014


In [52]:
Y.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4778 entries, 0 to 4777
Columns: 444 entries, title to user442
dtypes: float64(443), object(1)
memory usage: 16.2+ MB


In [53]:
R.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4778 entries, 0 to 4777
Columns: 444 entries, title to user442
dtypes: int64(443), object(1)
memory usage: 16.2+ MB


In [54]:
W.info()

<class 'pandas.core.frame.DataFrame'>
Index: 443 entries, user0 to user442
Data columns (total 10 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   w_0     443 non-null    float64
 1   w_1     443 non-null    float64
 2   w_2     443 non-null    float64
 3   w_3     443 non-null    float64
 4   w_4     443 non-null    float64
 5   w_5     443 non-null    float64
 6   w_6     443 non-null    float64
 7   w_7     443 non-null    float64
 8   w_8     443 non-null    float64
 9   w_9     443 non-null    float64
dtypes: float64(10)
memory usage: 38.1+ KB


In [55]:
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4778 entries, 0 to 4777
Data columns (total 11 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   title      4778 non-null   object 
 1   feature_0  4778 non-null   float64
 2   feature_1  4778 non-null   float64
 3   feature_2  4778 non-null   float64
 4   feature_3  4778 non-null   float64
 5   feature_4  4778 non-null   float64
 6   feature_5  4778 non-null   float64
 7   feature_6  4778 non-null   float64
 8   feature_7  4778 non-null   float64
 9   feature_8  4778 non-null   float64
 10  feature_9  4778 non-null   float64
dtypes: float64(10), object(1)
memory usage: 410.7+ KB


In [56]:
b.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Columns: 443 entries, b_movie0 to b_movie442
dtypes: float64(443)
memory usage: 3.6 KB


In [57]:
print(f'Total number of possible ratings: {R.iloc[:, 1:].shape[0] * R.iloc[:, 1:].shape[1]}')
print(f'Total number of given ratings: {(R == 1).sum().sum()}')
print(f'Percentage of 1s: {((R == 1).sum().sum() / (R.iloc[:, 1:].shape[0] * R.iloc[:, 1:].shape[1])) * 100} %')

Total number of possible ratings: 2116654
Total number of given ratings: 39253
Percentage of 1s: 1.8544835386416485 %


In [58]:
print(f"Average rating for movie 1 : {Y.iloc[0][1:].sum() / R.iloc[0][1:].sum():0.3f} / 5" )

Average rating for movie 1 : 3.400 / 5


In [59]:
print(f"Average rating for all movies : {np.mean(np.sum(Y.iloc[:, 1:], axis=1) / np.sum(R.iloc[:, 1:], axis=1)):0.3f} / 5")

Average rating for all movies : 3.241 / 5


<a name="4"></a>
## 4 - Collaborative filtering learning algorithm 

Now, I will begin implementing the collaborative filtering learning
algorithm. I will start by implementing the objective function. 

In the setting of movie recommendations, the collaborative filtering
algorithm considers a set of $n$-dimensional vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ and $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$, together with scalar biases $b^{(0)},...,b^{(n_u-1)}$. The
model predicts the rating of movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$. Given a dataset that consists of
a set of ratings produced by some users on some movies, the goal is to
learn the vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},
\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and biases $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit, that is, that minimize
the squared error.

I will write the function `cofi_cost_func` to compute the cost
function for collaborative filtering.

<a name="4.1"></a>
### 4.1 Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$
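To make the two parts of the cost concrete, here is a small worked example with two movies, two users, and two features. All values are invented for illustration. Only three of the four entries are rated, with errors of -2.5, -0.5, and -1.5, so the squared-error term is 4.375; with $\lambda = 1.5$ the regularization term is 6.0, giving a total cost of 10.375.

```python
import numpy as np

X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2 movies, 2 features
W_toy = np.array([[1.0, 1.0], [2.0, 0.0]])   # 2 users, 2 features
b_toy = np.array([[0.5, -0.5]])              # one bias per user
Y_toy = np.array([[4.0, 0.0], [2.0, 1.0]])   # ratings (0 = not rated)
R_toy = np.array([[1, 0], [1, 1]])           # rating indicator
lambda_toy = 1.5

# Prediction errors, zeroed out wherever no rating exists
errors = (X_toy @ W_toy.T + b_toy - Y_toy) * R_toy

squared_error_term = 0.5 * np.sum(errors ** 2)
regularization_term = (lambda_toy / 2) * (np.sum(W_toy ** 2) + np.sum(X_toy ** 2))
total_cost = squared_error_term + regularization_term
print(squared_error_term, regularization_term, total_cost)
```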

Next, I will write `cofi_cost_func` (collaborative filtering cost function) to return this cost.

In [60]:
X_feed = X.drop('title', axis=1).to_numpy()
W_feed = W.to_numpy()
b_feed = b.to_numpy()
Y_feed = Y.drop('title', axis=1).to_numpy()
R_feed = R.drop('title', axis=1).to_numpy()

In [61]:
def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user bias parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    for j in range(nu):
        for i in range(nm):
            cost_inner_sum = R[i, j] * ((((np.matmul(W[j], X[i])) + b[0, j]) - Y[i, j]) ** 2)
            J += cost_inner_sum
    J /= 2
    J += (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    return J

In [62]:
# Reduce the data set size so that this runs faster
num_users_r = 4
num_movies_r = 5 
num_features_r = 3

X_r = X_feed[:num_movies_r, :num_features_r]
W_r = W_feed[:num_users_r,  :num_features_r]
b_r = b_feed[0, :num_users_r].reshape(1,-1)
Y_r = Y_feed[:num_movies_r, :num_users_r]
R_r = R_feed[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0);
print(f"Cost: {J:0.2f}")

Cost: 13.67


In [63]:
# Evaluate cost function with regularization 
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5);
print(f"Cost (with regularization): {J:0.2f}")

Cost (with regularization): 28.09


**Vectorized Implementation**

It is important to create a vectorized implementation to compute $J$, since it will later be called many times during optimization.

In [64]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorized for speed. Uses TensorFlow operations to be compatible with the custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user bias parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

In [65]:
# Evaluate cost function
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0);
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization 
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5);
print(f"Cost (with regularization): {J:0.2f}")

Cost: 13.67
Cost (with regularization): 28.09
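A quick way to gain confidence in a vectorized cost is to compare it with the loop version on random inputs. The sketch below does this with NumPy stand-ins for the two functions above. The names `cost_loop` and `cost_vectorized` and the random test data are assumptions made for this check, not part of the project code.

```python
import numpy as np

def cost_loop(X, W, b, Y, R, lambda_):
    """Loop-based cost, mirroring the double sum over users and movies."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            J += R[i, j] * (np.dot(W[j], X[i]) + b[0, j] - Y[i, j]) ** 2
    return J / 2 + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def cost_vectorized(X, W, b, Y, R, lambda_):
    """Same cost computed with matrix operations."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

# Random toy problem: 8 movies, 5 users, 3 features
rng = np.random.default_rng(42)
X_chk = rng.normal(size=(8, 3))
W_chk = rng.normal(size=(5, 3))
b_chk = rng.normal(size=(1, 5))
R_chk = (rng.random((8, 5)) < 0.4).astype(int)            # about 40% of entries rated
Y_chk = rng.integers(1, 11, size=(8, 5)) * 0.5 * R_chk    # ratings 0.5..5.0 where rated

costs_match = np.isclose(cost_loop(X_chk, W_chk, b_chk, Y_chk, R_chk, 1.5),
                         cost_vectorized(X_chk, W_chk, b_chk, Y_chk, R_chk, 1.5))
print(f"Loop and vectorized costs agree: {costs_match}")
```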


<a name="5"></a>
## 5 - Learning movie recommendations 

After I have finished implementing the collaborative filtering cost
function, I can start training my algorithm to make
movie recommendations. 

In the cell below, I can enter my own movie choices. The algorithm will then make recommendations for me!

In [66]:
movie_titles

Unnamed: 0,title
0,"Yards, The (2000)"
1,Next Friday (2000)
2,Supernova (2000)
3,Down to You (2000)
4,Scream 3 (2000)
...,...
4773,Jon Stewart Has Left the Building (2015)
4774,Black Butler: Book of the Atlantic (2017)
4775,No Game No Life: Zero (2017)
4776,Flint (2017)


In [67]:
my_ratings = np.zeros(len(movie_titles))          #  Initialize my ratings

# Check the file small_movie_list.csv for the ID of each movie in the dataset
# For example, Toy Story 3 (2010) has ID 2700, so to rate it "5", I can set
my_ratings[2700] = 5 

# Or, assuming I did not enjoy Persuasion (2007), I can set
my_ratings[2609] = 2

# I have selected a few movies I liked / did not like and the ratings I
# gave are as follows:
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[246]  = 5   # Shrek (2001)
my_ratings[2716] = 3   # Inception
my_ratings[1150] = 5   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movie_titles.loc[i,"title"]}');


New user ratings:

Rated 5.0 for  Shrek (2001)
Rated 5.0 for  Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 2.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Harry Potter and the Chamber of Secrets (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for  Eternal Sunshine of the Spotless Mind (2004)
Rated 5.0 for  Incredibles, The (2004)
Rated 2.0 for  Persuasion (2007)
Rated 5.0 for  Toy Story 3 (2010)
Rated 3.0 for  Inception (2010)
Rated 1.0 for  Louis Theroux: Law & Disorder (2008)
Rated 1.0 for  Nothing to Declare (Rien à déclarer) (2010)


Now, let's add these reviews to $Y$ and $R$ and normalize the ratings.

In [68]:
# Add new user ratings to Y 
Y_feed = np.c_[my_ratings, Y_feed]

# Add new user indicator matrix to R
R_feed = np.c_[(my_ratings != 0).astype(int), R_feed]
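The effect of `np.c_` here is to prepend the new user's ratings as column 0, which is why the new user's predictions are later read from column 0. A toy sketch with invented values and hypothetical names:

```python
import numpy as np

Y_toy = np.array([[4.0, 0.0],
                  [0.0, 2.5],
                  [3.0, 0.0]])             # 3 movies x 2 existing users
new_user = np.array([0.0, 5.0, 1.0])       # new user's ratings, one per movie

# np.c_ stacks column-wise, so the new user becomes column 0
Y_aug = np.c_[new_user, Y_toy]
R_aug = (Y_aug != 0).astype(int)

print(Y_aug.shape)
print(Y_aug[:, 0])
```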

In [69]:
def normalizeRatings(Y, R):
    """
    Preprocess data by subtracting the mean rating of every movie (every row).
    Only real ratings, where R(i,j)=1, are included in the mean.
    Ynorm, Ymean = normalizeRatings(Y, R) normalizes Y so that each movie
    has an average rating of 0. Movies with no ratings get a mean of 0.
    Returns the normalized ratings Ynorm and the mean ratings Ymean.
    """
    Ymean = (np.sum(Y*R,axis=1)/(np.sum(R, axis=1)+1e-12)).reshape(-1,1)
    Ynorm = Y - np.multiply(Ymean, R) 
    return(Ynorm, Ymean)

In [70]:
Ynorm, Ymean = normalizeRatings(Y_feed, R_feed)
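The effect of mean normalization is easiest to see on a tiny matrix. The sketch below repeats the same two lines of arithmetic on invented data, including a movie with no ratings; `Y_toy`, `Y_mean`, and `Y_norm` are hypothetical names.

```python
import numpy as np

Y_toy = np.array([[5.0, 3.0, 0.0],
                  [0.0, 4.0, 2.0],
                  [0.0, 0.0, 0.0]])   # the last movie has no ratings
R_toy = (Y_toy != 0).astype(int)

# Per-movie mean over rated entries only; the small constant avoids division by zero
Y_mean = (np.sum(Y_toy * R_toy, axis=1) / (np.sum(R_toy, axis=1) + 1e-12)).reshape(-1, 1)

# Subtract the mean only where a rating exists, so unrated entries stay at 0
Y_norm = Y_toy - Y_mean * R_toy

print(Y_mean.ravel())
print(Y_norm)
```

The rated entries of each row now average to zero, and the per-movie means are kept so they can be added back to the predictions later.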

Let's prepare to train the model. Initialize the parameters and select the Adam optimizer.

In [71]:
Y.shape

(4778, 444)

In [72]:
#  Useful Values
num_movies, num_users = Y_feed.shape
num_features = 100  # more features than the 10 used for the pre-computed X and W above

# Setting Initial Parameters (W, X), using tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = Adam(learning_rate=1e-1)



In [73]:
R_feed

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

Let's now train the collaborative filtering model. This will learn the parameters $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$. 

The operations involved in learning $w$, $b$, and $x$ simultaneously do not fit the standard layer-based workflow of the TensorFlow neural network package. Consequently, I will use a custom training loop.

The steps of gradient descent are:
- repeat until convergence:
    - compute forward pass
    - compute the derivatives of the loss relative to parameters
    - update the parameters using the learning rate and the computed derivatives 
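
To make these three steps concrete, here is a minimal NumPy sketch that uses hand-derived gradients of the cost in equation (1). It is an illustration only: it uses plain gradient descent rather than Adam, and the toy sizes, learning rate, and variable names are assumptions, not the settings used in this project.

```python
import numpy as np

def cost(X, W, b, Y, R, lambda_):
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 3                                        # toy sizes
R_toy = (rng.random((n_m, n_u)) < 0.6).astype(int)           # which entries are rated
Y_toy = rng.integers(1, 11, size=(n_m, n_u)) * 0.5 * R_toy   # ratings 0.5..5.0 where rated

# Small random initial parameters
X_gd = 0.1 * rng.normal(size=(n_m, n))
W_gd = 0.1 * rng.normal(size=(n_u, n))
b_gd = np.zeros((1, n_u))

lambda_, learning_rate = 1.0, 0.005
initial_cost = cost(X_gd, W_gd, b_gd, Y_toy, R_toy, lambda_)

for step in range(500):
    # Forward pass: masked prediction errors
    err = (X_gd @ W_gd.T + b_gd - Y_toy) * R_toy
    # Derivatives of the cost with respect to each parameter
    grad_X = err @ W_gd + lambda_ * X_gd
    grad_W = err.T @ X_gd + lambda_ * W_gd
    grad_b = err.sum(axis=0, keepdims=True)
    # Update step
    X_gd -= learning_rate * grad_X
    W_gd -= learning_rate * grad_W
    b_gd -= learning_rate * grad_b

final_cost = cost(X_gd, W_gd, b_gd, Y_toy, R_toy, lambda_)
print(f"Cost before: {initial_cost:0.2f}, after: {final_cost:0.2f}")
```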
    
TensorFlow can compute the derivatives automatically. Within a `tf.GradientTape()` block, operations on TensorFlow Variables are tracked. When `tape.gradient()` is later called, it returns the gradient of the loss with respect to the tracked variables. The gradients can then be applied to the parameters using an optimizer.

In [74]:
iterations = 200
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R_feed, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

Training loss at iteration 0: 2321191.3
Training loss at iteration 20: 136169.3
Training loss at iteration 40: 51863.7
Training loss at iteration 60: 24599.0
Training loss at iteration 80: 13630.6
Training loss at iteration 100: 8487.7
Training loss at iteration 120: 5807.8
Training loss at iteration 140: 4311.6
Training loss at iteration 160: 3435.3
Training loss at iteration 180: 2902.1


<a name="6"></a>
## 6 - Recommendations
Below, I compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, I compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication.
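
Before the full computation, the ranking logic can be sketched on toy data: predict every rating, add back the per-movie mean, then sort one user's column and skip movies that user has already rated. All values, the `already_rated` set, and the other names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_u, n = 6, 3, 2
X_toy = rng.normal(size=(n_m, n))
W_toy = rng.normal(size=(n_u, n))
b_toy = rng.normal(size=(1, n_u))
Y_mean = rng.uniform(2.5, 4.5, size=(n_m, 1))   # stand-in for the per-movie means

# Predictions for all movies and users; Y_mean (n_m, 1) broadcasts across the user columns
pred = X_toy @ W_toy.T + b_toy + Y_mean

user = 0
already_rated = {1, 4}                 # hypothetical movie indices rated by this user

# Sort this user's predictions from highest to lowest, then drop rated movies
ranked = np.argsort(-pred[:, user])
top = [int(i) for i in ranked if int(i) not in already_rated][:3]
print(top)
```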

In [75]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

#restore the mean
pm = p + Ymean

my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

sugges_pred = []
for i in range(150):
    j = ix[i]
    j = j.numpy()
    if j not in my_rated:
        print(f"Predicting rating {my_predictions[j]:0.2f} for movie {movie_titles['title'][j]}")
        sugges_pred.append([movie_titles['title'][j], my_predictions[j]])

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f"Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movie_titles['title'][i]}")

Predicting rating 4.49 for movie My Sassy Girl (Yeopgijeogin geunyeo) (2001)
Predicting rating 4.48 for movie Martin Lawrence Live: Runteldat (2002)
Predicting rating 4.48 for movie Memento (2000)
Predicting rating 4.47 for movie Delirium (2014)
Predicting rating 4.47 for movie Laggies (2014)
Predicting rating 4.47 for movie One I Love, The (2014)
Predicting rating 4.46 for movie Particle Fever (2013)
Predicting rating 4.45 for movie Eichmann (2007)
Predicting rating 4.45 for movie Battle Royale 2: Requiem (Batoru rowaiaru II: Chinkonka) (2003)
Predicting rating 4.45 for movie Into the Abyss (2011)
Predicting rating 4.45 for movie Son of the Bride (Hijo de la novia, El) (2001)
Predicting rating 4.44 for movie Rivers and Tides (2001)
Predicting rating 4.44 for movie George Carlin: It's Bad for Ya! (2008)
Predicting rating 4.44 for movie Loving Vincent (2017)
Predicting rating 4.44 for movie PK (2014)
Predicting rating 4.44 for movie Sisters (Syostry) (2001)
Predicting rating 4.44 for mo

In practice, additional information can be used to improve the recommendations. Above, the predicted ratings for the first few hundred movies lie in a narrow range. I can refine the list by selecting, from those top movies, the ones that have high average ratings and more than 20 ratings.

In [76]:
suggest_df = pd.DataFrame(data=sugges_pred, columns=['title', 'prediction'])

In [78]:
movie_titles = pd.read_csv('small_movie_list.csv', usecols=[2, 3])

In [79]:
final_suggestions = pd.merge(suggest_df, movie_titles, on='title')

In [80]:
final_suggestions[final_suggestions['number of ratings'] > 20].sort_values('prediction', ascending=False)

Unnamed: 0,title,prediction,number of ratings
2,Memento (2000),4.477798,159
27,Harry Potter and the Deathly Hallows: Part 1 (...,4.434171,47
137,"Lord of the Rings: The Two Towers, The (2002)",4.357304,188
139,Casino Royale (2006),4.344999,81
141,Finding Nemo (2003),4.289676,141


**Conclusion: The suggestions appear relevant to the ratings I entered as a user; for example, other Harry Potter and Lord of the Rings titles rank highly. The model could be packaged as a module and later exposed through an API for practical applications.**