 # Outline

 - [1-Packages](#1)
 - [2-Load Dataset](#2)
 - [&nbsp;&nbsp; 2.1-Function to load dataset](#2.1)
 - [&nbsp;&nbsp; 2.2 Load & View Dataset size](#2.2)
 - [3-Cost Function Implementation](#3)
 - [&nbsp;&nbsp;3.1-Cost function using for loop](#3.1)
 - [&nbsp;&nbsp;3.2-Cost Function using vectorization with numpy implementation](#3.2)
 - [&nbsp;&nbsp;3.3-Cost Function using vectorization with Tensorflow implementation-1](#3.3)
 - [&nbsp;&nbsp;3.4-Cost Function using vectorization with Tensorflow implementation-2](#3.4)

<a name="1"></a>
##  1-Packages <img align="left" src="./images/python-logo.png"     style=" width:40px;   " > 

In [1]:
import warnings

# Ignore all future warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:
import numpy as np
from numpy import loadtxt
import pandas as pd
import tensorflow as tf




<a name="2"></a>
# 2- Load dataset <img align="left" src="./images/dataset-logo.png" style="width:30px; ">

<a name='2.1'></a>

### 2.1 Function to load dataset

In [3]:
def load_dataset():
    # loadtxt accepts a path directly, so no file handles are left open
    X = loadtxt('./data/processed/small_movie_X.csv', delimiter=",")
    num_movies, num_features = X.shape

    R = loadtxt('./data/processed/small_movie_R.csv', delimiter=",")
    _, num_users = R.shape

    Y = loadtxt('./data/processed/small_movie_Y.csv', delimiter=",")
    return (X, Y, R, num_movies, num_features, num_users)

def load_movie_list():
    df = pd.read_csv('./data/processed/movie_list_df.csv')
    movie_list = df["title"].to_list()
    return(df, movie_list)
    

<a name="2.2"></a>

### 2.2 Load & View Dataset size

In [4]:
X, Y, R, num_movies, num_features, num_users = load_dataset()

df, movie_list = load_movie_list()

# Initialize parameters
W = np.random.rand(num_users, num_features)
b = np.random.rand(1, num_users)

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (9724, 610) R (9724, 610)
X (9724, 34)
W (610, 34)
b (1, 610)
num_features 34
num_movies 9724
num_users 610


Average rating for movie 1 : 3.921 / 5


<a name="3"></a>

# 3- Cost Function Implementation

The collaborative filtering algorithm in the setting of movie
recommendations considers a set of $n$-dimensional parameter vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ and $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$, together with scalar biases $b^{(0)},...,b^{(n_u-1)}$, where the
model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$. Given a dataset that consists of
a set of ratings produced by some users on some movies, you wish to
learn the parameters $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$,
$\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimize
the squared error).
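
As a minimal sketch of the prediction rule above, the snippet below computes $\mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$ for one movie/user pair and then for every pair at once. The toy sizes and random values are assumptions made for illustration only; they are not the dataset loaded earlier.

```python
import numpy as np

# Toy sizes chosen for illustration only (not the real dataset).
rng = np.random.default_rng(0)
num_movies, num_users, num_features = 4, 3, 2

X = rng.random((num_movies, num_features))   # one feature row per movie
W = rng.random((num_users, num_features))    # one parameter row per user
b = rng.random((1, num_users))               # one bias per user

# Prediction for a single (movie i, user j) pair: w^(j) . x^(i) + b^(j)
i, j = 2, 1
single = np.dot(W[j], X[i]) + b[0, j]

# All predictions at once: b (1, num_users) broadcasts across the movie rows.
predictions = X @ W.T + b                    # shape (num_movies, num_users)

print(predictions.shape)
print(np.isclose(single, predictions[i, j]))
```

The matrix form relies on `b` having shape `(1, num_users)`, which matches how `b` is initialized in section 2.2.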

#### Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \left[ \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+ \underbrace{\left[
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
\right]}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \left[ \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 \right]
+\text{regularization}
$$
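
The equivalence of the two summation forms can be checked numerically. The sketch below uses small random arrays (an assumption for illustration, not the real data) and leaves out the regularization term, since it is identical in both forms.

```python
import numpy as np

rng = np.random.default_rng(1)
nm, nu, n = 5, 4, 3                      # toy sizes: movies, users, features

X = rng.random((nm, n))
W = rng.random((nu, n))
b = rng.random((1, nu))
Y = rng.integers(1, 6, size=(nm, nu)).astype(float)   # toy ratings 1..5
R = rng.integers(0, 2, size=(nm, nu))                 # 1 = rated, 0 = not rated

# Form 1: sum only over the (i, j) pairs where r(i, j) = 1
err_rated = 0.0
for i, j in zip(*np.nonzero(R)):
    err_rated += (W[j] @ X[i] + b[0, j] - Y[i, j]) ** 2
err_rated *= 0.5

# Form 2: sum over all pairs, with each term multiplied by r(i, j)
err_masked = 0.5 * np.sum(R * (X @ W.T + b - Y) ** 2)

print(np.isclose(err_rated, err_masked))
```

Because $r(i,j)$ is either 0 or 1, multiplying by it simply drops the unrated pairs, which is what makes the masked form convenient to vectorize.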

<a name="3.1"></a>

## 3.1- Cost function using for loop

In [None]:
def cofi_cost_func_for_loop(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        
        for i in range(nm):
            x = X[i,:]
            y = Y[i,j]
            r = R[i,j]
            J += np.square(r * (np.dot(w,x) + b_j -y))
    J = J/2
    J += (lambda_ / 2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    return J
    
    

<a name="3.2"></a>

## 3.2- Cost Function using vectorization with numpy implementation <img align="left" src="./images/numpy.png" style="width:25px; ">

In [None]:
def cofi_cost_func_numpy(X, W, b, Y, R, lambda_ ):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Vectorized computation of cost
    J = (1/2) * np.sum(R * np.square(np.dot(X, W.T) + b - Y))
    
    # Regularization term
    reg_term = (lambda_/2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    
    # Compute cost with regularization
    J += reg_term
    
    return J
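
A small hand-checkable case can help confirm the vectorized expression. The sketch below restates the same formula in a condensed helper, `cost_sketch` (a hypothetical name used only here), and applies it to made-up 2x2 data with one feature, where the result can be worked out on paper.

```python
import numpy as np

def cost_sketch(X, W, b, Y, R, lambda_):
    # Condensed restatement of the vectorized cost, for illustration only
    squared_error = 0.5 * np.sum(R * np.square(X @ W.T + b - Y))
    regularization = (lambda_ / 2) * (np.sum(np.square(W)) + np.sum(np.square(X)))
    return squared_error + regularization

# Made-up toy data: 2 movies, 2 users, 1 feature
X = np.array([[1.0], [2.0]])
W = np.array([[1.0], [0.0]])
b = np.array([[0.0, 1.0]])
Y = np.array([[2.0, 0.0], [1.0, 3.0]])
R = np.array([[1, 0], [1, 1]])

# Predictions X @ W.T + b = [[1, 1], [2, 1]]
# Masked squared errors   = [[1, 0], [1, 4]]  -> 0.5 * 6 = 3.0
# Regularization          = 0.5 * 1 * (1 + 5) = 3.0
J_toy = cost_sketch(X, W, b, Y, R, lambda_=1.0)
print(J_toy)  # 3.0 + 3.0 = 6.0
```

The unrated entry (movie 0, user 1) contributes nothing to the error term even though its prediction differs from the stored value in `Y`.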
    

<a name="3.3"></a>

## 3.3- Cost Function using vectorization with Tensorflow implementation-1 <img align="left" src="./images/tf.png" style="width:25px; ">


This implementation requires more memory, so Tensorflow implementation-2 below is recommended for computing the collaborative filtering cost. Both implementations compute the same cost.

In [None]:
def cofi_cost_func_tf1(X, W, b, Y, R, lambda_):
    # Tensorflow Variable
    X_tf = tf.Variable(X, dtype=tf.float32)
    W_tf = tf.Variable(W, dtype=tf.float32)
    b_tf = tf.Variable(b, dtype=tf.float32)
    Y_tf = tf.Variable(Y, dtype=tf.float32)
    R_tf = tf.Variable(R, dtype=tf.float32)
    # Cost computation
    J = 0.5 * tf.reduce_sum(R_tf * tf.square(tf.matmul(X_tf, tf.transpose(W_tf))+ b_tf -Y_tf))
    # Regularization term
    # Use the float32 tensors here so the dtype matches J
    reg_term = 0.5 * lambda_ * (tf.reduce_sum(tf.square(W_tf)) + tf.reduce_sum(tf.square(X_tf)))
    # Compute cost with regularization
    J += reg_term
    return J

<a name="3.4"></a>
 
## 3.4- Cost Function using vectorization with Tensorflow implementation-2 <img align="left" src="./images/tf.png" style="width:25px; ">

Recommended Cost Function Implementation if using Tensorflow

In [None]:
def cofi_cost_func_tf2(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # Compute cost
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y) * R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J
    

<a name="4"></a>

# 4- Learning Recommendation <img align="left" src="./images/film_rating.png" style="width:40px">

<a name="4.1"> </a>

## 4.1- Initialize my/user ratings for at least 10 movies


You can choose your own ratings here

In [24]:
# ---> This part is optional and for experimental purposes only <---

# Set seed to get consistent result for creating movie index
np.random.seed(42)

# Note: valid movie indices run from 0 to num_movies - 1, while randint(1, 9725)
# draws from 1 to 9724, so index 0 is never drawn and 9724 would be out of range.
my_ratings_list_index = np.random.randint(1,9725,size=10).tolist()
my_ratings_list_index

[1290, 7294, 1345, 7292, 9373, 4830, 1521, 9225, 9290, 6401]

In [5]:
my_ratings_list_index = [3569,6726,7571,7750,7752,7784,1182,9173,9288,9417]

In [6]:
# Check the movie names
for i in range(len(my_ratings_list_index)):
    print(f"Index:{my_ratings_list_index[i]} is movie :{df.loc[my_ratings_list_index[i], 'title']}")

Index:3569 is movie :Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Index:6726 is movie :Iron Man (2008)
Index:7571 is movie :Thor (2011)
Index:7750 is movie :Dark Knight Rises, The (2012)
Index:7752 is movie :Sherlock Holmes: A Game of Shadows (2011)
Index:7784 is movie :Intouchables (2011)
Index:1182 is movie :Men in Black (a.k.a. MIB) (1997)
Index:9173 is movie :The Devil's Candy (2015)
Index:9288 is movie :Now You See Me 2 (2016)
Index:9417 is movie :Underworld: Blood Wars (2016)


<a name="4.1.1"> </a>


### 4.1.1- Take movie ratings input

In [20]:
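my_ratings = np.zeros(num_movies)

# Take inputs
for i in range(len(my_ratings_list_index)):
    while True:
        try:
            user_rating = int(input(f"Enter your rating between 1 & 5 for the movie:{df.loc[my_ratings_list_index[i], 'title']} "))
            if 1 <= user_rating <= 5:
                # Store the rating at this movie's index
                my_ratings[my_ratings_list_index[i]] = user_rating
                break
            else:
                print("Please enter a rating between 1 and 5.")
        except ValueError:
            # int() failed, so the input was not a whole number
            print("Please enter a whole number between 1 and 5.")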

Index:1763 is movie :Glen or Glenda (1953)
Index:3513 is movie :Silkwood (1983)
Index:5709 is movie :In My Father's Den (2004)
Index:5025 is movie :Gunga Din (1939)
Index:7800 is movie :For a Good Time, Call... (2012)
Index:7438 is movie :Megamind (2010)
Index:5844 is movie :National Lampoon's Lady Killers (National Lampoon's Gold Diggers) (2003)
Index:1646 is movie :Knock Off (1998)
Index:6149 is movie :Youth of the Beast (Yaju no seishun) (1963)
Index:1179 is movie :Contempt (Mépris, Le) (1963)
