# Recommender Systems for Movies

I will implement the **collaborative filtering learning algorithm** and apply it to the movie ratings in the [MovieLens 100k Dataset](https://grouplens.org/datasets/movielens/) from GroupLens Research. Ratings are on a scale of 1 to 5, and a movie that a user has not rated is assigned a rating of 0. The dataset has $n_u = 943$ users and $n_m = 1682$ movies.

To complete this task, I will implement the function `cofiCostFunc.m` in **Octave/MATLAB**, which computes the collaborative filtering objective function and its gradient. After implementing the cost function and gradient, I will use `fmincg.m` to learn the parameters for collaborative filtering.

## 1. Movie Ratings Dataset ##

After loading the dataset, two variables are constructed: `Y` and `R`.

The matrix `Y` stores the ratings $y^{(i,j)}$ given by user $j$ to movie $i$; ratings range from 1 to 5, and a value of 0 indicates that the user did not rate the movie. The matrix `R` is a binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise. The goal of collaborative filtering is to predict ratings for the movies that users have not yet rated, that is, the entries with $R(i,j)=0$. We can then recommend to each user the movies with the highest predicted ratings.
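Because a stored value of 0 marks an unrated entry in this dataset, `R` is simply the nonzero pattern of `Y`. A minimal sketch on a hypothetical 3 × 2 toy matrix (not MovieLens data):

```octave
% Hypothetical toy ratings: 3 movies (rows) x 2 users (columns); 0 = not rated
Y = [5 0;
     0 3;
     4 1];

% R(i,j) = 1 exactly where user j rated movie i
R = double(Y ~= 0);

% Number of observed ratings per movie (rows) and per user (columns)
ratings_per_movie = sum(R, 2);   % column vector, one entry per movie
ratings_per_user  = sum(R, 1);   % row vector, one entry per user
```

Here `R` is `[1 0; 0 1; 1 1]`, so only four of the six entries of `Y` contribute to training.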

The matrix `Y` is $n_m \times n_u$ (number of movies $\times$ number of users). We denote by `X` and `Theta` the matrices whose rows are the feature vector $x^{(i)}$ of the $i$-th movie and the parameter vector $\theta^{(j)}$ of the $j$-th user, respectively:

$$ X = \begin{bmatrix} \cdots (x^{(1)})^T  \cdots \\
                       \cdots (x^{(2)})^T \cdots \\
                       \vdots \\
                       \cdots (x^{(n_m)})^T \cdots \end{bmatrix}, \qquad 
   \Theta = \begin{bmatrix} \cdots (\theta^{(1)})^T  \cdots \\
                           \cdots (\theta^{(2)})^T \cdots \\
                           \vdots \\
                           \cdots (\theta^{(n_u)})^T \cdots \end{bmatrix}$$


Both $x^{(i)}$ and $\theta^{(j)}$ are $n$-dimensional vectors. In this work, I use $n=100$, so $x^{(i)}\in \mathbb{R}^{100}$ and $\theta^{(j)} \in \mathbb{R}^{100}$. Correspondingly, `X` is an $n_m \times 100$ matrix and `Theta` is an $n_u \times 100$ matrix.
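Before training, `X` and `Theta` need starting values. A common choice is small random values, which break the symmetry between features (if every feature started identical, the gradient would keep them identical). The sketch below assumes the sizes stated above; the 0.1 scale is an illustrative assumption, not a setting taken from the original run:

```octave
n_m = 1682;   % number of movies
n_u = 943;    % number of users
n   = 100;    % number of features per movie and per user

% Small random initialization (the 0.1 scale is an assumption)
X     = 0.1 * randn(n_m, n);   % row i holds (x^(i))'
Theta = 0.1 * randn(n_u, n);   % row j holds (theta^(j))'

% All predicted ratings at once: P(i,j) = (theta^(j))' * x^(i)
P = X * Theta';                % same shape as Y: n_m x n_u
```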

## 2. Collaborative Filtering Learning Algorithm ##

In this section, I introduce the **collaborative filtering learning algorithm** used to build the movie recommender system.

In the setting of movie recommendations, the collaborative filtering algorithm considers a set of $n$-dimensional parameter vectors $x^{(1)}, \ldots, x^{(n_m)}$ and $\theta^{(1)}, \ldots, \theta^{(n_u)}$, where the model predicts the rating for movie $i$ by user $j$ as $(\theta^{(j)})^T x^{(i)}$, the **predicted rating**. Given a dataset that consists of a set of ratings produced by some users on some movies, the goal is to learn the parameter vectors $x^{(1)}, \ldots, x^{(n_m)}$, $\theta^{(1)}, \ldots, \theta^{(n_u)}$ that yield the best fit, that is, that minimize the **regularized cost function**,

$$
\begin{eqnarray}
 J(x^{(1)}, \ldots, x^{(n_m)}, \theta^{(1)}, \ldots, \theta^{(n_u)}) &=& \frac{1}{2} \sum_{(i,j):r(i,j) = 1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2  \\
      && + \left(\frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^n (\theta_k^{(j)})^2 \right) + \left(\frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^n (x_k^{(i)})^2 \right)
\end{eqnarray}
$$

where the summation $\sum_{(i,j):r(i,j) = 1}$ runs only over pairs $(i,j)$ with $r(i,j)=1$. Recall that $r(i,j)=1$ if user $j$ rated movie $i$, and $r(i,j)=0$ otherwise.
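The restriction to pairs with $r(i,j)=1$ reads naturally as an explicit loop, and multiplying the error matrix elementwise by `R` gives the same value in vectorized form. A sketch on hypothetical toy data with an arbitrary $\lambda$:

```octave
% Hypothetical toy problem: 3 movies, 2 users, n = 2 features
Y = [5 0; 0 3; 4 1];
R = double(Y ~= 0);
X     = [1 0; 0 1; 1 1];
Theta = [2 1; 1 2];
lambda = 0.5;

% Loop form: squared errors summed only over rated (i,j) pairs
J_loop = 0;
for i = 1:size(Y, 1)
  for j = 1:size(Y, 2)
    if R(i, j) == 1
      J_loop = J_loop + (Theta(j, :) * X(i, :)' - Y(i, j))^2 / 2;
    end
  end
end
J_loop = J_loop + lambda / 2 * (sum(Theta(:).^2) + sum(X(:).^2));

% Vectorized form: zero out the errors of unrated entries with R
E = (X * Theta' - Y) .* R;
J_vec = sum(E(:).^2) / 2 + lambda / 2 * (sum(Theta(:).^2) + sum(X(:).^2));
```

Both evaluate to 11 for these values: the squared-error part is $(9 + 1 + 1 + 4)/2 = 7.5$ and the regularization part is $0.25 \times 14 = 3.5$.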

Next, the regularized gradient is given by

$$ 
\begin{eqnarray}
\frac{\partial J}{\partial x_k^{(i)}} &=& \sum_{j:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)\theta_k^{(j)} + \lambda x_k^{(i)}, \\
\frac{\partial J}{\partial \theta_k^{(j)}} &=& \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} -y^{(i,j)} \right)x_k^{(i)} + \lambda \theta_k^{(j)}.
\end{eqnarray}
$$

where $\lambda$ is the regularization parameter.
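One way to validate these gradient formulas is a central finite-difference check: perturb each parameter by a small $h$ and compare $\left(J(p + h e_k) - J(p - h e_k)\right)/(2h)$ with the analytic derivative. The self-contained sketch below inlines the cost as an anonymous function of the unrolled vector `p = [X(:); Theta(:)]`; the toy ratings, $h$, and $\lambda$ are arbitrary assumptions:

```octave
% Hypothetical small problem: 4 movies, 3 users, n = 2 features
nm = 4; nu = 3; n = 2; lambda = 1.5;
Y = [5 0 3; 0 4 0; 2 0 1; 0 5 4];
R = double(Y ~= 0);
X = randn(nm, n);
Theta = randn(nu, n);

% Regularized cost as a function of the unrolled parameter vector p
unX = @(p) reshape(p(1:nm*n), nm, n);
unT = @(p) reshape(p(nm*n+1:end), nu, n);
costf = @(p) sum(sum(((unX(p) * unT(p)' - Y) .* R).^2)) / 2 ...
             + lambda / 2 * sum(p.^2);   % p holds every entry of X and Theta

% Analytic gradient from the formulas above
E = (X * Theta' - Y) .* R;
X_grad     = E * Theta + lambda * X;
Theta_grad = E' * X + lambda * Theta;
grad = [X_grad(:); Theta_grad(:)];

% Numerical gradient by central differences, one coordinate at a time
p = [X(:); Theta(:)];
h = 1e-4;
num_grad = zeros(size(p));
for k = 1:numel(p)
  e = zeros(size(p));
  e(k) = h;
  num_grad(k) = (costf(p + e) - costf(p - e)) / (2 * h);
end

rel_diff = norm(num_grad - grad) / norm(num_grad + grad);
```

A very small `rel_diff` indicates that the analytic gradient agrees with the cost. Because $J$ is quadratic along any single coordinate, the central difference here is accurate up to rounding error.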

I then wrote the function `cofiCostFunc` to compute the cost function and the gradient:

```octave
function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, num_features, lambda)
%COFICOSTFUNC Collaborative filtering cost function
%   [J, grad] = COFICOSTFUNC(params, Y, R, num_users, num_movies, ...
%   num_features, lambda) returns the cost and gradient for the
%   collaborative filtering problem.
%

% Unfold the X and Theta matrices from params
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
                num_users, num_features);

% Prediction errors, kept only where a rating exists (R(i,j) = 1)
E = (X * Theta' - Y) .* R;

% ----------- Cost Function ---------------------------
J = sum(sum(E.^2)) / 2 ...
    + (lambda / 2) * (sum(sum(Theta.^2)) + sum(sum(X.^2)));

% ----------- Gradient ---------------------------------
X_grad     = E * Theta + lambda * X;       % num_movies x num_features
Theta_grad = E' * X + lambda * Theta;      % num_users x num_features

% Unroll the gradients in the same order as params
grad = [X_grad(:); Theta_grad(:)];

end
```

Note that `params` is `[X(:); Theta(:)]` in our case, and `grad` is unrolled in the same order.
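Because the optimizer works on a single vector, `X` and `Theta` are unrolled into `params` before training and reshaped afterwards, using the same column-major order as the `reshape` calls at the top of `cofiCostFunc`. A round-trip sketch with toy sizes chosen only for illustration:

```octave
num_movies = 5; num_users = 4; num_features = 3;   % toy sizes
X     = rand(num_movies, num_features);
Theta = rand(num_users, num_features);

% Unroll: X(:) stacks the columns of X, then Theta(:) follows
params = [X(:); Theta(:)];

% Roll back, exactly as cofiCostFunc does
X2     = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta2 = reshape(params(num_movies*num_features+1:end), num_users, num_features);
```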

With `cofiCostFunc` in place, I use the minimizer `fmincg` to learn `X` and `Theta`. The `fmincg` function can be found [here](https://gist.github.com/Mizzlr/c232dc0d724b3726dba29de5717521d7).
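`fmincg` is not built into Octave, so the sketch below substitutes plain batch gradient descent, a different and simpler optimizer, to show what the training loop does with the cost and gradient on hypothetical toy data. The step size `alpha`, the iteration count, and the 0.1 initialization scale are illustrative assumptions:

```octave
% Hypothetical toy ratings (0 = unrated)
Y = [5 0 3; 0 4 0; 2 0 1; 0 5 4];
R = double(Y ~= 0);
[num_movies, num_users] = size(Y);
num_features = 2;
lambda = 1;

% Small random initialization
X     = 0.1 * randn(num_movies, num_features);
Theta = 0.1 * randn(num_users, num_features);

alpha = 0.01;                 % step size (assumed)
num_iters = 2000;
J_hist = zeros(num_iters, 1);

for it = 1:num_iters
  E = (X * Theta' - Y) .* R;                      % errors on rated entries only
  J_hist(it) = sum(E(:).^2) / 2 ...
               + lambda / 2 * (sum(X(:).^2) + sum(Theta(:).^2));
  X_grad     = E * Theta + lambda * X;            % dJ/dX
  Theta_grad = E' * X + lambda * Theta;           % dJ/dTheta
  % Update both parameter sets simultaneously
  X     = X - alpha * X_grad;
  Theta = Theta - alpha * Theta_grad;
end
```

`J_hist` should fall well below its starting value. A conjugate-gradient method such as `fmincg` typically needs far fewer iterations and no hand-tuned step size, which is the usual reason to prefer it.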

## 3. Learning Movie Recommendations ##

With the collaborative filtering cost function and gradient implemented, the algorithm is ready to be trained to make movie recommendations. I entered my own movie preferences so that, when the algorithm runs, I get my own movie recommendations. The movies I rated are listed below:

```
Rated 4 for Toy Story (1995)
Rated 3 for Twelve Monkeys (1995)
Rated 5 for Usual Suspects, The (1995)
Rated 4 for Outbreak (1995)
Rated 5 for Shawshank Redemption, The (1994)
Rated 3 for While You Were Sleeping (1995)
Rated 5 for Forrest Gump (1994)
Rated 2 for Silence of the Lambs, The (1991)
Rated 4 for Alien (1979)
Rated 5 for Die Hard 2 (1990)
Rated 5 for Sphere (1998)
```
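One way to include a new user's ratings is to add them as an extra column of `Y`, with the matching indicator column in `R`, before training. A sketch with hypothetical toy sizes, movie indices, and rating values (a real run would use a vector of length $n_m = 1682$):

```octave
% Hypothetical existing data: 4 movies x 2 users (0 = unrated)
Y = [5 0; 0 3; 4 1; 0 2];
R = double(Y ~= 0);

% My ratings for the same 4 movies (hypothetical indices and values)
my_ratings = zeros(size(Y, 1), 1);
my_ratings(1) = 4;
my_ratings(3) = 2;

% Prepend as user 1, so column 1 of the prediction matrix belongs to me
Y = [my_ratings, Y];
R = [double(my_ratings ~= 0), R];
```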

When training, $\lambda$ is set to 10.
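After training, recommendations can be read off the new user's column of `X * Theta'` by sorting it in descending order. The sketch below uses a hypothetical prediction matrix, assumes the new user is column 1, and optionally skips movies the user already rated (an assumption; the results below do not state whether rated movies were filtered):

```octave
% Hypothetical learned predictions for 6 movies; column 1 is the new user
P = [4.1 3.0;
     2.5 4.0;
     4.8 1.0;
     3.9 2.0;
     1.2 5.0;
     4.5 3.5];
my_R = [1; 0; 0; 1; 0; 0];          % movies the new user already rated

my_pred = P(:, 1);
my_pred(my_R == 1) = -Inf;          % optional: never recommend rated movies

[vals, ix] = sort(my_pred, 'descend');
top_k = 3;
top_movies = ix(1:top_k);           % indices of the recommended movies
```

For these values `top_movies` is `[3; 6; 2]`, with predicted ratings 4.8, 4.5, and 2.5.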

In the end, the top recommendations for me are listed below:

```
Predicting rating 5.0 for movie Santa with Muscles (1996)
Predicting rating 5.0 for movie Star Kid (1997)
Predicting rating 5.0 for movie Prefontaine (1997)
Predicting rating 5.0 for movie Someone Else's America (1995)
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996)
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996)
Predicting rating 5.0 for movie Saint of Fort Washington, The (1993)
Predicting rating 5.0 for movie Aiqing wansui (1994)
Predicting rating 5.0 for movie They Made Me a Criminal (1939)
Predicting rating 5.0 for movie Great Day in Harlem, A (1994)
```