# Introduction to Recommender Systems

<p align="center">
    <img width="721" alt="cover-image" src="https://user-images.githubusercontent.com/49638680/204351915-373011d3-75ac-4e21-a6df-99cd1c552f2c.png">
</p>

---

# Collaborative Filtering

Until now, we have used item features to predict the rating $y_{ij}$ that a certain user $i$ might give to an item $j$.
This content-based approach is useful, even though we explored only a subset of its possibilities.

Before going further in this very promising approach, let's consider an alternative way to build recommendations.

Here, we do not exploit the "knowledge" we have about users and items; we rely only on their ratings.

The idea is to learn about users and items by looking at the "scores" users assign to the same items.
On one hand, we will learn about similar users since they will rate the same items similarly; on the other hand, similar items will be close in scores for the same user.
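
To make this intuition concrete, here is a minimal sketch, assuming a tiny made-up ratings array with users in rows and items in columns. The helper `cosine_on_common` and all the values are hypothetical and are not part of the notebook's utilities. It is a neighbourhood-style illustration of "similar users rate the same items similarly", not the learning algorithm implemented later in this notebook.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items; 0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # user A
    [4.0, 5.0, 1.0, 0.0],   # user B
    [1.0, 0.0, 5.0, 4.0],   # user C
])


def cosine_on_common(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    both = (u > 0) & (v > 0)
    if not both.any():
        return 0.0
    a, b = u[both], v[both]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


sim_ab = cosine_on_common(ratings[0], ratings[1])
sim_ac = cosine_on_common(ratings[0], ratings[2])
print(f"A vs B: {sim_ab:.3f}, A vs C: {sim_ac:.3f}")
```

Users A and B agree on the items they share, so their similarity is higher than that of A and C, who disagree.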

## Advantages and disadvantages of collaborative filtering

### Advantages

* No domain knowledge necessary.

* The model can help users discover new interests. In isolation, the ML system may not know the user is interested in a given item, but the model might still recommend it because similar users are interested in that item.

* The system needs only the rating matrix to train a collaborative filtering model. In particular, the system does not need contextual features.

### Disadvantages

* Cold start. The system cannot handle new items or new users, because neither has any ratings to learn from.

* Including extra features can be messy and toilsome.

## <img align="left" src="../images/movie_camera.png"     style=" width:40px;  " > Collaborative Filtering Recommender Systems

In this notebook, we will implement collaborative filtering to build a recommender system for movies.

# <img align="left" src="../images/film_reel.png"     style=" width:40px;  " > Outline
- [ 1 - Notation](#1)
- [ 2 - Recommender Systems](#2)
- [ 3 - Movie ratings dataset](#3)
- [ 4 - Collaborative filtering learning algorithm](#4)
  - [ 4.1 Collaborative filtering cost function](#4.1)
    - [ Exercise 1](#ex01)
- [ 5 - Learning movie recommendations](#5)
- [ 6 - Recommendations](#6)

##  Packages <img align="left" src="../images/film_strip_vertical.png"     style=" width:40px;   " >
We will use the now familiar NumPy and TensorFlow packages.

In [1]:
import numpy as np
import tensorflow as tf

from utils.collab_filter import (
    load_precalc_params_small,
    load_ratings_small,
    load_Movie_List_pd,
    normalizeRatings,
)

# tests utilities
from tests.test_coll_filter import test_cofi_cost_func

<a name="1"></a>
## 1 - Notation


<p align="center">
<table>
  <thead>
    <tr>
      <th>General <br />  Notation</th>
      <th>Description</th>
      <th>Python (if any)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>$r(i,j)$
</td>
      <td>scalar; $= 1$ if user $j$ rated movie $i$, $= 0$ otherwise</td>
      <td></td>
    </tr>
    <tr>
      <td>
        $y(i,j)$
      </td>
      <td>scalar; = rating given by user $j$ on movie $i$ (defined only if $r(i,j) = 1$)</td>
      <td></td>
    </tr>
    <tr>
      <td>$w(j)$
</td>
      <td>vector; parameters for user $j$</td>
      <td></td>
    </tr>
    <tr>
      <td>$b(j)$
</td>
      <td>scalar; parameter for user $j$</td>
      <td></td>
    </tr>
    <tr>
      <td>
        $x(i)$
      </td>
      <td>vector; feature ratings for movie $i$</td>
      <td></td>
    </tr>
    <tr>
      <td>$n_u$
</td>
      <td>number of users</td>
      <td>num_users</td>
    </tr>
    <tr>
      <td>$n_m$
</td>
      <td>number of movies</td>
      <td>num_movies</td>
    </tr>
    <tr>
      <td>$n$
</td>
      <td>number of features</td>
      <td>num_features</td>
    </tr>
    <tr>
      <td>
        $X$
      </td>
      <td>
        matrix of vectors $x(i)$
      </td>
      <td>
        `X`
      </td>
    </tr>
    <tr>
      <td>
        $W$
      </td>
      <td>
        matrix of vectors $w(j)$
      </td>
      <td>
        `W`
      </td>
    </tr>
    <tr>
      <td>
        $b$
      </td>
      <td>
        vector of bias parameters $b(j)$
      </td>
      <td>
        `b`
      </td>
    </tr>
    <tr>
      <td>
        $R$
      </td>
      <td>
        matrix of elements $r(i,j)$
      </td>
      <td>
        `R`
      </td>
    </tr>
  </tbody>
</table>
</p>

<a name="2"></a>
## 2 - Recommender Systems <img align="left" src="../images/film_rating.png" style=" width:40px;  " >
In this notebook, we will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.
The goal of a collaborative filtering recommender system is to generate two vectors: 
* For each user, a _parameter vector_ that embodies the movie tastes of that user.
* For each movie, a _feature vector_ of the same size which embodies some description of the movie. 

The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.

The diagram below details how these vectors are learned.

<figure>
   <img src="../images/ColabFilterLearn.PNG"  style="width:740px;height:250px;" >
</figure>

Existing ratings are provided in matrix form as shown. $Y$ contains the ratings, from 0.5 to 5 inclusive in steps of 0.5, with 0 where a movie has not been rated. $R$ has a 1 wherever a movie has been rated. Movies are in rows and users in columns. Each user has a parameter vector $w^{user}$ and a bias. Each movie has a feature vector $x^{movie}$. These vectors are learned simultaneously, using the existing user/movie ratings as training data. One training example is shown above: $\mathbf{w}^{(1)} \cdot \mathbf{x}^{(1)} + b^{(1)} = 4$. Note that the feature vector $x^{movie}$ must satisfy all the users, while the user vector $w^{user}$ must satisfy all the movies. This is the source of the name of this approach: all the users collaborate to generate the rating set.

<figure>
   <img src="../images/ColabFilterUse.PNG"  style="width:640px;height:250px;" >
</figure>

Once the feature vectors and parameters are learned, they can be used to predict how a user might rate an unrated movie. This is shown in the diagram above. The equation is an example of predicting a rating for user one on movie zero.
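
As a small illustration of this prediction step, the sketch below computes one rating estimate from a user vector, a bias, and a movie vector. The numbers are invented for the example (they are not learned values from this lab), and $n = 3$ features is an assumption made only to keep it short.

```python
import numpy as np

# Hypothetical learned values for one user and one movie (n = 3 features).
w_user = np.array([0.5, -0.2, 1.0])   # parameter vector w^(j)
b_user = 0.3                          # bias b^(j)
x_movie = np.array([2.0, 1.0, 3.0])   # feature vector x^(i)

# Predicted rating: dot product of the two vectors plus the user's bias.
predicted_rating = np.dot(w_user, x_movie) + b_user
print(f"Predicted rating: {predicted_rating:.2f}")
```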


In this exercise, you will implement the function `cofi_cost_func`, which computes the collaborative filtering
objective function. After implementing the objective function, you will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. The first step is to detail the data set and data structures that will be used in the lab.

<a name="3"></a>
## 3 - Movie ratings dataset <img align="left" src="../images/film_rating.png"     style=" width:40px;  " >
The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset.   
[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]

The original dataset has roughly 9000 movies rated by roughly 600 users. The dataset has been reduced in size to focus on movies released since 2000. It consists of ratings on a scale of 0.5 to 5 in increments of 0.5. The reduced dataset has $n_u = 443$ users and $n_m = 4778$ movies.

Below, you will load the movie dataset into the variables $Y$ and $R$.

The matrix $Y$ (an $n_m \times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is a binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise.
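
The relationship between $Y$ and $R$ can be sketched on a toy example. The array below is made up for illustration; it assumes, as in this dataset, that real ratings start at 0.5, so a 0 can only mean "not rated".

```python
import numpy as np

# Hypothetical 3-movie x 2-user ratings matrix; 0 marks "not rated".
Y_toy = np.array([
    [5.0, 0.0],
    [0.0, 3.5],
    [4.0, 2.0],
])

# Indicator matrix: 1 where a rating exists, 0 otherwise.
R_toy = (Y_toy != 0).astype(int)

print(R_toy)
print("ratings per movie:", R_toy.sum(axis=1))
print("ratings per user: ", R_toy.sum(axis=0))
```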

Throughout this part of the exercise, you will also be working with the
matrices, $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{b}$: 

$$\mathbf{X} = 
\begin{bmatrix}
--- (\mathbf{x}^{(0)})^T --- \\
--- (\mathbf{x}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{x}^{(n_m-1)})^T --- \\
\end{bmatrix} , \quad
\mathbf{W} = 
\begin{bmatrix}
--- (\mathbf{w}^{(0)})^T --- \\
--- (\mathbf{w}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{w}^{(n_u-1)})^T --- \\
\end{bmatrix},\quad
\mathbf{ b} = 
\begin{bmatrix}
 b^{(0)}  \\
 b^{(1)} \\
\vdots \\
b^{(n_u-1)} \\
\end{bmatrix}\quad
$$ 

The $i$-th row of $\mathbf{X}$ corresponds to the
feature vector $x^{(i)}$ for the $i$-th movie, and the $j$-th row of
$\mathbf{W}$ corresponds to one parameter vector $\mathbf{w}^{(j)}$, for the
$j$-th user. Both $x^{(i)}$ and $\mathbf{w}^{(j)}$ are $n$-dimensional
vectors. For the purposes of this exercise, you will use $n=10$, and
therefore, $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ have 10 elements.
Correspondingly, $\mathbf{X}$ is a
$n_m \times 10$ matrix and $\mathbf{W}$ is a $n_u \times 10$ matrix.
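
The shapes described above can be checked with a short sketch. The sizes here are toy values chosen for the example, and the arrays are random rather than the lab's pre-computed parameters; the point is only how the shapes combine and how the row vector `b` broadcasts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 6, 4, 10             # toy sizes; the lab uses 4778, 443 and 10

X_toy = rng.normal(size=(n_m, n))  # one row per movie
W_toy = rng.normal(size=(n_u, n))  # one row per user
b_toy = rng.normal(size=(1, n_u))  # one bias per user, stored as a row

# (n_m, n) @ (n, n_u) -> (n_m, n_u); b broadcasts across the movie rows.
predictions = X_toy @ W_toy.T + b_toy
print(predictions.shape)

# Entry (i, j) matches the per-pair formula w^(j) . x^(i) + b^(j).
i, j = 2, 1
assert np.isclose(predictions[i, j], W_toy[j] @ X_toy[i] + b_toy[0, j])
```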

We will start by loading the movie ratings dataset to understand the structure of the data.
We will load $Y$ and $R$ with the movie dataset.  
We'll also load $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$ with pre-computed values. These values will be learned later in the lab, but we'll use pre-computed values to develop the cost model.

In [7]:
%%bash
megadl 'https://mega.nz/file/lRxkiQJL#cpzfEDq055a3U8ICbTZHpCQx8biRRYaocugkfTtTMRQ'
if [ ! -d "../data/" ]; then mkdir -p ../data/; fi
unzip -o movielens_small.zip -d ../data/



Archive:  movielens_small.zip
  inflating: ../data/small_movielens/small_movie_list.csv  
  inflating: ../data/small_movielens/__MACOSX/._small_movie_list.csv  
  inflating: ../data/small_movielens/small_movies_b.csv  
  inflating: ../data/small_movielens/__MACOSX/._small_movies_b.csv  
  inflating: ../data/small_movielens/small_movies_R.csv  
  inflating: ../data/small_movielens/__MACOSX/._small_movies_R.csv  
  inflating: ../data/small_movielens/small_movies_W.csv  
  inflating: ../data/small_movielens/__MACOSX/._small_movies_W.csv  
  inflating: ../data/small_movielens/small_movies_X.csv  
  inflating: ../data/small_movielens/__MACOSX/._small_movies_X.csv  
  inflating: ../data/small_movielens/small_movies_Y.csv  
  inflating: ../data/small_movielens/__MACOSX/._small_movies_Y.csv  


In [2]:
# Load data
X, W, b, num_movies, num_features, num_users = load_precalc_params_small()
Y, R = load_ratings_small()

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies", num_movies)
print("num_users", num_users)

Y (4778, 443) R (4778, 443)
X (4778, 10)
W (443, 10)
b (1, 443)
num_features 10
num_movies 4778
num_users 443


In [3]:
#  From the matrix, we can compute statistics like average rating.
tsmean = np.mean(Y[0, R[0, :].astype(bool)])
print(f"Average rating for movie 1 : {tsmean:0.3f} / 5")

Average rating for movie 1 : 3.400 / 5


<a name="4"></a>
## 4 - Collaborative filtering learning algorithm <img align="left" src="../images/film_filter.png"     style=" width:40px;  " >

Now, you will begin implementing the collaborative filtering learning
algorithm. You will start by implementing the objective function. 

The collaborative filtering algorithm in the setting of movie
recommendations considers a set of $n$-dimensional vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ and $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$, together with scalar biases $b^{(0)},...,b^{(n_u-1)}$, where the
model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$. Given a dataset that consists of
a set of ratings produced by some users on some movies, you wish to
learn the parameters $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},
\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (that is, minimize
the squared error).

You will complete the code in `cofi_cost_func` to compute the cost
function for collaborative filtering.


<a name="4.1"></a>
### 4.1 Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\underbrace{
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\text{regularization}
$$
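
As a hand-checkable illustration of the two equivalent forms of the summation, the sketch below evaluates both on a toy problem with two movies, two users, and a single feature. All values and the helper `squared_error` are invented for this example and are not part of the lab's data or utilities.

```python
import numpy as np

X_toy = np.array([[1.0], [2.0]])             # 2 movies, n = 1 feature
W_toy = np.array([[1.0], [0.5]])             # 2 users
b_toy = np.array([[0.0, 1.0]])               # one bias per user
Y_toy = np.array([[2.0, 0.0], [3.0, 1.0]])   # ratings, movies in rows
R_toy = np.array([[1, 0], [1, 1]])           # 1 where a rating exists
lambda_toy = 2.0


def squared_error(i, j):
    """Squared error of the prediction for movie i and user j."""
    pred = W_toy[j] @ X_toy[i] + b_toy[0, j]
    return (pred - Y_toy[i, j]) ** 2


# Form 1: sum only over the (i, j) pairs with r(i, j) = 1.
rated_pairs = [(i, j) for i in range(2) for j in range(2) if R_toy[i, j] == 1]
fit_pairs = 0.5 * sum(squared_error(i, j) for i, j in rated_pairs)

# Form 2: sum over every pair, multiplying each term by r(i, j).
fit_masked = 0.5 * sum(
    R_toy[i, j] * squared_error(i, j) for j in range(2) for i in range(2)
)

regularization = (lambda_toy / 2) * (np.sum(W_toy**2) + np.sum(X_toy**2))
print(fit_pairs, fit_masked, regularization)
```

Each of the three rated pairs has an error of magnitude 1, so both forms give $\frac{1}{2} \cdot 3 = 1.5$, and the regularization term is $1 \cdot (1 + 0.25 + 1 + 4) = 6.25$.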

You should now write `cofi_cost_func` (collaborative filtering cost function) to return this cost.

<a name="ex01"></a>
### Exercise 1

**For loop Implementation:**   
Start by implementing the cost function using for loops.
Consider developing the cost function in two steps. First, develop the cost function without regularization. A test case that does not include regularization is provided below to test your implementation. Once that is working, add regularization and run the tests that include regularization.  Note that you should be accumulating the cost for user $j$ and movie $i$ only if $R(i,j) = 1$.

In [16]:
def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering

    Parameters
    ----------
    X: np.ndarray (num_movies, num_features)
      matrix of item features
    W: np.ndarray (num_users, num_features)
      matrix of user parameters
    b: np.ndarray (1, num_users)
      vector of user parameters
    Y: np.ndarray (num_movies, num_users)
      matrix of user ratings of movies
    R: np.ndarray (num_movies, num_users)
      matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
    lambda_: float
      regularization parameter

    Returns
    -------
    float
        The value of the cost given the parameters.
    """
    nm, nu = Y.shape
    J = 0
    ### START CODE HERE ###

    ### END CODE HERE ###

    return J

<details>
  <summary><font size="3" color="darkgreen"><b>Click for hints</b></font></summary>
    You can structure the code in two for loops similar to the summation in (1).   
    Implement the code without regularization first.   
    Note that some of the elements in (1) are vectors. Use np.dot(). You can also use np.square().
    Pay close attention to which elements are indexed by i and which are indexed by j. Don't forget to divide by two.
    
```python     
    ### START CODE HERE ###  
    for j in range(nu):
        
        
        for i in range(nm):
            
            
    ### END CODE HERE ### 
```    
<details>
    <summary><font size="2" color="darkblue"><b> Click for more hints</b></font></summary>
        
    Here are some more details. The code below pulls out each element from the matrix before using it. 
    One could also reference the matrix directly.  
    This code does not contain regularization.
    
```python 
    nm,nu = Y.shape
    J = 0
    ### START CODE HERE ###  
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        for i in range(nm):
            x = 
            y = 
            r =
            J += 
    J = J/2
    ### END CODE HERE ### 

```
    
<details>
    <summary><font size="2" color="darkblue"><b>Last Resort (full non-regularized implementation)</b></font></summary>
    
```python 
    nm,nu = Y.shape
    J = 0
    ### START CODE HERE ###  
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        for i in range(nm):
            x = X[i,:]
            y = Y[i,j]
            r = R[i,j]
            J += np.square(r * (np.dot(w,x) + b_j - y ) )
    J = J/2
    ### END CODE HERE ### 
```
    
<details>
    <summary><font size="2" color="darkblue"><b>regularization</b></font></summary>
     Regularization just squares each element of the W array and X array and then sums all the squared elements.
     You can utilize np.square() and np.sum().

<details>
    <summary><font size="2" color="darkblue"><b>regularization details</b></font></summary>
    
```python 
    J += (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
```
    
</details>
</details>
</details>
</details>
</details>

    


In [17]:
# Reduce the data set size so that this runs faster
num_users_r = 4
num_movies_r = 5
num_features_r = 3

X_r = X[:num_movies_r, :num_features_r]
W_r = W[:num_users_r, :num_features_r]
b_r = b[0, :num_users_r].reshape(1, -1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")

Cost: 13.67


**Expected Output (lambda = 0)**:  
$13.67$.

In [18]:
# Evaluate cost function with regularization
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost (with regularization): 28.09


**Expected Output**:

28.09

In [19]:
# test function
test_cofi_cost_func(cofi_cost_func)

All tests passed!


##### Vectorised Implementation

It is important to create a vectorized implementation to compute $J$, since it will later be called many times during optimisation. The linear algebra utilised is not the focus of this series, so the implementation is provided. If you are an expert in linear algebra, feel free to create your version without referencing the code below. 

Run the code below and verify that it produces the same results as the non-vectorised version.

In [20]:
def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorised for speed. Uses tensorflow operations to be compatible with custom training loop.

    Parameters
    ----------
    X: np.ndarray (num_movies,num_features)
      matrix of item features
    W: np.ndarray (num_users,num_features)
      matrix of user parameters
    b: np.ndarray (1, num_users)
      vector of user parameters
    Y: np.ndarray (num_movies,num_users)
      matrix of user ratings of movies
    R: np.ndarray (num_movies,num_users)
      matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
    lambda_: float
      regularization parameter

    Returns
    -------
    float
        The value of the cost given the parameters.
    """
    # Enter the vectorised implementation of the cost function here
    return J

<details>
  <summary><font size="3" color="darkgreen"><b>Click for the solution</b></font></summary>

```python
    
    def cofi_cost_func_v(X, W, b, Y, R, lambda_):
      """
      Returns the cost for collaborative filtering
      Vectorised for speed. Uses tensorflow operations to be compatible with custom training loop.

      Parameters
      ----------
      X: np.ndarray (num_movies,num_features)
        matrix of item features
      W: np.ndarray (num_users,num_features)
        matrix of user parameters
      b: np.ndarray (1, num_users)
        vector of user parameters
      Y: np.ndarray (num_movies,num_users)
        matrix of user ratings of movies
      R: np.ndarray (num_movies,num_users)
        matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_: float
        regularization parameter

      Returns
      -------
      float
          The value of the cost given the parameters.
      """
      j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
      J = 0.5 * tf.reduce_sum(j**2) + (lambda_ / 2) * (
          tf.reduce_sum(X**2) + tf.reduce_sum(W**2)
      )
      return J
```
</details>

    


In [21]:
# Evaluate cost function
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0)
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost: 13.67
Cost (with regularization): 28.09




**Expected Output**:  
Cost: 13.67  
Cost (with regularization): 28.09
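
The same equivalence check can be run on random data. The sketch below is a NumPy-only variant written for illustration: `cost_loop` and `cost_vectorised` are hypothetical names, not the notebook's `cofi_cost_func` and `cofi_cost_func_v`, and the random ratings are synthetic.

```python
import numpy as np


def cost_loop(X, W, b, Y, R, lambda_):
    """Reference implementation with explicit loops over users and movies."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            if R[i, j]:
                J += (np.dot(W[j], X[i]) + b[0, j] - Y[i, j]) ** 2
    J /= 2
    J += (lambda_ / 2) * (np.sum(W**2) + np.sum(X**2))
    return J


def cost_vectorised(X, W, b, Y, R, lambda_):
    """Same cost with matrix operations; R zeroes out unrated entries."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err**2) + (lambda_ / 2) * (np.sum(X**2) + np.sum(W**2))


rng = np.random.default_rng(42)
nm, nu, n = 30, 20, 5
X_t = rng.normal(size=(nm, n))
W_t = rng.normal(size=(nu, n))
b_t = rng.normal(size=(1, nu))
R_t = (rng.random((nm, nu)) < 0.3).astype(int)
Y_t = R_t * rng.integers(1, 11, size=(nm, nu)) / 2   # 0.5 .. 5.0 where rated

loop_value = cost_loop(X_t, W_t, b_t, Y_t, R_t, 1.5)
vec_value = cost_vectorised(X_t, W_t, b_t, Y_t, R_t, 1.5)
print(np.isclose(loop_value, vec_value))
```

Agreement on random inputs does not prove correctness, but a mismatch is a quick signal that an index or a mask is wrong in one of the two versions.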

<a name="5"></a>
## 5 - Learning movie recommendations <img align="left" src="../images/film_man_action.png" style=" width:40px;  " >
------------------------------

After you have finished implementing the collaborative filtering cost
function, you can start training your algorithm to make
movie recommendations for yourself. 

In the cell below, you can enter your own movie choices. The algorithm will then make recommendations for you! We have filled out some values according to our preferences, but after you have things working with our choices, you should change this to match your tastes.
A list of all movies in the dataset is in the file [movie list](data/small_movie_list.csv).

In [22]:
movieList, movieList_df = load_Movie_List_pd()

my_ratings = np.zeros(num_movies)  #  Initialise my ratings

# Check the file small_movie_list.csv for id of each movie in our dataset
# For example, Toy Story 3 (2010) has ID 2700, so to rate it "5", you can set
my_ratings[2700] = 5

# Or suppose you did not enjoy Persuasion (2007), you can set
my_ratings[2609] = 2

# We have selected a few movies we liked / did not like and the ratings we
# gave are as follows:
my_ratings[929] = 5  # Lord of the Rings: The Return of the King, The
my_ratings[246] = 5  # Shrek (2001)
my_ratings[2716] = 3  # Inception
my_ratings[1150] = 5  # Incredibles, The (2004)
my_ratings[382] = 2  # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366] = (
    5  # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
)
my_ratings[622] = 5  # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988] = 3  # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1  # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1  # Nothing to Declare (Rien à déclarer)
my_ratings[793] = 5  # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print("\nNew user ratings:\n")
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')


New user ratings:

Rated 5.0 for  Shrek (2001)
Rated 5.0 for  Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 2.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Harry Potter and the Chamber of Secrets (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for  Eternal Sunshine of the Spotless Mind (2004)
Rated 5.0 for  Incredibles, The (2004)
Rated 2.0 for  Persuasion (2007)
Rated 5.0 for  Toy Story 3 (2010)
Rated 3.0 for  Inception (2010)
Rated 1.0 for  Louis Theroux: Law & Disorder (2008)
Rated 1.0 for  Nothing to Declare (Rien à déclarer) (2010)


Now, let's add these ratings to $Y$ and $R$ and normalize the ratings.

In [23]:
# Reload ratings
Y, R = load_ratings_small()

# Add new user ratings to Y
Y = np.c_[my_ratings, Y]

# Add new user indicator matrix to R
R = np.c_[(my_ratings != 0).astype(int), R]

# Normalise the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)
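
The source of `normalizeRatings` is not shown in this notebook. The sketch below is an assumption about what such a helper typically does (per-movie mean normalization over rated entries only), under the hypothetical name `normalize_ratings_sketch`; the actual utility may differ in its details.

```python
import numpy as np


def normalize_ratings_sketch(Y, R):
    """Subtract each movie's mean rating, using rated entries only."""
    counts = R.sum(axis=1, keepdims=True)
    # Guard against movies with no ratings (their mean is left at 0).
    Ymean = (Y * R).sum(axis=1, keepdims=True) / np.maximum(counts, 1)
    Ynorm = (Y - Ymean) * R          # unrated entries stay at 0
    return Ynorm, Ymean


Y_toy = np.array([[5.0, 3.0, 0.0],
                  [0.0, 2.0, 4.0]])
R_toy = (Y_toy != 0).astype(int)

Ynorm_toy, Ymean_toy = normalize_ratings_sketch(Y_toy, R_toy)
print(Ymean_toy.ravel())
print(Ynorm_toy)
```

With this convention `Ymean` has one row per movie, so adding it back to a matrix of predictions broadcasts across the user columns.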

In [24]:
my_ratings.shape

(4778,)

Let's prepare to train the model. Initialize the parameters and select the Adam optimizer.

In [25]:
#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234)  # for consistent results
W = tf.Variable(tf.random.normal((num_users, num_features), dtype=tf.float64), name="W")
X = tf.Variable(
    tf.random.normal((num_movies, num_features), dtype=tf.float64), name="X"
)
b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name="b")

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

Let's now train the collaborative filtering model. This will learn the parameters $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$. 

The operations involved in learning $w$, $b$, and $x$ simultaneously do not fall into the typical 'layers' offered in the TensorFlow neural network package. Consequently, the flow used in Course 2 (Model, Compile(), Fit(), Predict()) is not directly applicable. Instead, we can use a custom training loop.

Recall from earlier labs the steps of gradient descent.
- repeat until convergence:
    - compute forward pass
    - compute the derivatives of the loss relative to parameters
    - update the parameters using the learning rate and the computed derivatives 
    
TensorFlow can calculate the derivatives for you. This is shown below. Within the `tf.GradientTape()` section, operations on TensorFlow variables are tracked. When `tape.gradient()` is later called, it returns the gradient of the loss relative to the tracked variables. The gradients can then be applied to the parameters using an optimizer. 
This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks. Further information can be found by investigating "custom training loops" within the framework of interest.
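
To show what the three steps amount to without automatic differentiation, here is a minimal NumPy sketch. It swaps in plain gradient descent with hand-derived gradients in place of `tf.GradientTape` and the Adam optimizer used in this lab. The function name `cost_and_grads`, the toy sizes, and the learning rate are assumptions made for the example.

```python
import numpy as np


def cost_and_grads(X, W, b, Y, R, lambda_):
    """Cost from equation (1) and its hand-derived gradients."""
    err = (X @ W.T + b - Y) * R                  # forward pass, rated entries only
    cost = 0.5 * np.sum(err**2) + (lambda_ / 2) * (np.sum(X**2) + np.sum(W**2))
    grad_X = err @ W + lambda_ * X               # dJ/dX, shape (n_m, n)
    grad_W = err.T @ X + lambda_ * W             # dJ/dW, shape (n_u, n)
    grad_b = err.sum(axis=0, keepdims=True)      # dJ/db, shape (1, n_u)
    return cost, grad_X, grad_W, grad_b


rng = np.random.default_rng(1234)
n_m, n_u, n = 8, 6, 3
R_toy = (rng.random((n_m, n_u)) < 0.6).astype(int)
Y_toy = R_toy * rng.integers(1, 11, size=(n_m, n_u)) / 2

X_toy = rng.normal(size=(n_m, n))
W_toy = rng.normal(size=(n_u, n))
b_toy = rng.normal(size=(1, n_u))

learning_rate, lambda_toy = 0.005, 1.0
costs = []
for step in range(300):
    # Forward pass and derivatives of the loss relative to the parameters.
    cost, grad_X, grad_W, grad_b = cost_and_grads(
        X_toy, W_toy, b_toy, Y_toy, R_toy, lambda_toy
    )
    costs.append(cost)
    # Update the parameters using the learning rate and the derivatives.
    X_toy = X_toy - learning_rate * grad_X
    W_toy = W_toy - learning_rate * grad_W
    b_toy = b_toy - learning_rate * grad_b

print(f"cost at start: {costs[0]:.2f}, after 300 steps: {costs[-1]:.2f}")
```

Deriving `grad_X`, `grad_W`, and `grad_b` by hand is manageable for this cost function, but it is exactly the bookkeeping that a gradient tape removes.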
    


In [None]:
iterations = 200
lambda_ = 1
for iteration in range(iterations):
    # Use TensorFlow's GradientTape
    # to record the operations used to compute the cost
    with tf.GradientTape() as tape:
        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the loss with respect to the trainable variables
    grads = tape.gradient(cost_value, [X, W, b])

    # Run one optimizer step (Adam here) by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, [X, W, b]))

    # Log periodically.
    if iteration % 20 == 0:
        print(f"Training loss at iteration {iteration}: {cost_value:0.1f}")

<a name="6"></a>
## 6 - Recommendations
Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, you compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication.

In [14]:
# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# restore the mean
pm = p + Ymean

my_predictions = pm[:, 0]

# sort predictions
ix = tf.argsort(my_predictions, direction="DESCENDING")

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f"Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}")

print("\n\nOriginal vs Predicted ratings:\n")
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(
            f"Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}"
        )

Predicting rating 4.49 for movie My Sassy Girl (Yeopgijeogin geunyeo) (2001)
Predicting rating 4.48 for movie Martin Lawrence Live: Runteldat (2002)
Predicting rating 4.48 for movie Memento (2000)
Predicting rating 4.47 for movie Delirium (2014)
Predicting rating 4.47 for movie Laggies (2014)
Predicting rating 4.47 for movie One I Love, The (2014)
Predicting rating 4.46 for movie Particle Fever (2013)
Predicting rating 4.45 for movie Eichmann (2007)
Predicting rating 4.45 for movie Battle Royale 2: Requiem (Batoru rowaiaru II: Chinkonka) (2003)
Predicting rating 4.45 for movie Into the Abyss (2011)


Original vs Predicted ratings:

Original 5.0, Predicted 4.90 for Shrek (2001)
Original 5.0, Predicted 4.84 for Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Original 2.0, Predicted 2.13 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Original 5.0, Predicted 4.88 for Harry Potter and the Chamber of Secrets (2002)
Original 5.0, Predic

In practice, additional information can be used to improve the recommendations. Above, the predicted ratings for the first few hundred movies lie in a small range. We can refine the list by taking the 300 top-predicted movies, keeping only those with more than 20 ratings, and sorting them by mean rating. This section uses a [Pandas](https://pandas.pydata.org/) data frame, which has many handy sorting features.

In [15]:
# Boolean mask of movies with more than 20 ratings
# (named to avoid shadowing the built-in `filter`)
enough_ratings = movieList_df["number of ratings"] > 20
movieList_df["pred"] = my_predictions
movieList_df = movieList_df.reindex(
    columns=["pred", "mean rating", "number of ratings", "title"]
)
movieList_df.loc[ix[:300]].loc[enough_ratings].sort_values("mean rating", ascending=False)

| | pred | mean rating | number of ratings | title |
|---|---|---|---|---|
| 1743 | 4.030961 | 4.252336 | 107 | Departed, The (2006) |
| 2112 | 3.985281 | 4.238255 | 149 | Dark Knight, The (2008) |
| 211 | 4.477798 | 4.122642 | 159 | Memento (2000) |
| 929 | 4.887054 | 4.118919 | 185 | Lord of the Rings: The Return of the King, The... |
| 2700 | 4.796531 | 4.109091 | 55 | Toy Story 3 (2010) |
| 653 | 4.357304 | 4.021277 | 188 | Lord of the Rings: The Two Towers, The (2002) |
| 1122 | 4.004471 | 4.006494 | 77 | Shaun of the Dead (2004) |
| 1841 | 3.980649 | 4.0 | 61 | Hot Fuzz (2007) |
| 3083 | 4.084643 | 3.993421 | 76 | Dark Knight Rises, The (2012) |
| 2804 | 4.434171 | 3.989362 | 47 | Harry Potter and the Deathly Hallows: Part 1 (... |


## Exercises

### Explore Different Regularisation Parameters

**Objective**: Understand and mitigate the effects of overfitting and underfitting in the collaborative filtering model by tuning the regularisation parameter.

**Tasks**:
* Experiment with at least three different values of the regularisation parameter (`lambda_`).
* Analyse the impact of these values on the model's performance, particularly looking at the trade-off between bias and variance.
* Identify the regularisation value that provides the best balance between avoiding overfitting and underfitting, and justify your choice.

### Cold Start Problem

**Objective**: Develop a strategy to handle the cold start problem for new users or movies with sparse or no ratings.

**Tasks**:
* Propose a theoretical strategy or algorithm to provide recommendations for new users or movies.
* Implement your proposed solution in Python as an extension to the existing collaborative filtering model.
* Test your solution with hypothetical new users or movies data and discuss its effectiveness and limitations.

### Hybrid Recommender System

**Objective**: Create a hybrid model that integrates collaborative filtering with content-based recommendations to improve overall recommendation quality.

**Tasks**:
* Design a simple content-based filtering approach using movie genres or tags.
* Integrate this content-based filter with the collaborative filtering model developed in the notebook.
* Evaluate the hybrid model's performance by comparing it to the standalone collaborative filtering and content-based models. Discuss any improvements or changes in the recommendations provided by the hybrid model.