<img src="http://www.ubu.es/sites/default/files/portal_page/images/logo_color_2l_dcha.jpg" height="150" width="150" align="right"/>

## Collaborative Filtering (2)
[Nacho Santos](www.nacho.santos.name)

## Import python packages

In [1]:
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

In [2]:
y = np.load('matrizy.npy')
r = np.load('matrizr.npy')

In [3]:
# Other functions necessary for this assignment
# The python file "recommender_system.py" must be in the same folder as this notebook, otherwise,
# you have to add the path to the file
from recommender_system import *

In [4]:
# This line is necessary to show matplotlib plots inside the jupyter notebook
%matplotlib inline

## 2 The cost funcion J and the gradient of J
The objective of this point is to build a function to compute the cost J and the corresponding gradient of J. In particular, you are going to implement a function called **cofiCostFunc()** with the arguments(inputs) and outputs detailed below (the code of the function is partially predefined in a cell right after).

**Arguments** (in this order)
* *parameters* (the paramaters of the cost function, i.e. X and $\theta$
* *Y* (matrix of ratings)
* *R* (matrix of watched movies)
* *n_users* (number of users)
* *n_movies* (number of movies)
* *n_characteristics* (number of the filter characteristics)
* *landa* (regularization parameter)

**Outputs** (in this order)
* *cost* (value of the cost function J)
* *gradient* (gradient of the cost function J)

In [5]:
# Cost and gradient function
n_features = 100
x_0 = np.random.random_sample((y.shape[0],n_features))
thetha_0 = np.random.random_sample((y.shape[1],n_features))
vecFlat_0 = np.append(x_0.flatten(),thetha_0.flatten())
n_users = y.shape[1]
n_movies = y.shape[0]

def cofiCostFunc (parameters, Y, R, n_users, n_movies, n_features, lamb):
# parameters: vector with the matrices X and Theta foldes
# Y: matrix of ratings
# R: matrix of watched movies
# n_users: number of users (number of columns of the matrix Y)
# n_movies: number of movies (number of rows of the matrix Y)
# n_features: number of movies' features (a parameter of the CF algorithm)
# lamb: regularization term
#
# cofiCostFunc rerturns the cost J and the gradient of J

    # You need to return the following values correctly
    cost=0
    gradient=np.zeros_like(parameters)
    
    # Remember:
    #  (1) To unfold X and Theta from parameters before computing J and the gradients
    #  (2) To fold the gradient of J with respect to X and to Theta into a flattened vector gradient 
    #      that is returned by the function
    
    
    # YOUR CODE ..................................
    
    x = parameters[0:(n_movies*n_features)].reshape((n_movies,n_features))
    thetha = parameters[(n_movies*n_features):].reshape((n_users,n_features))
    thethaT = thetha.transpose()  
    mul = np.dot(x,thethaT)   
    ycapuchino = np.multiply(mul,R)
    
    primera_parte_coste = (1/2)*np.sum((ycapuchino-Y)**2)
    segunda_parte_coste = (lamb/2)*np.sum((x)**2)
    tercera_parte_coste = (lamb/2)*np.sum((thetha)**2)
    
    cost = primera_parte_coste+segunda_parte_coste+tercera_parte_coste
    
    a = np.multiply(np.dot(x,thetha.T),R) - Y
    grad_thetha = np.dot(a,thetha) + lamb*x

    a = np.multiply(np.dot(x,thetha.T),R) - Y
    grad_x = np.dot(a.T,x) + lamb*thetha

    gradient = np.append(grad_thetha.flatten(),grad_x.flatten())
    
    
    
    

    # YOUR CODE (end) ..................................
    
    return (cost, gradient)

### 2.1 Input *parameters* of the cofiCostFunc

**Features X and preferences Theta**

The Collaborative Filtering (CF) algorithm is based on two sets of lineas regressions, the first one corresponds to the movies' features X, and the second one corresponds to the users' preferences Theta. Assuming n features, the matrix X will be:

$$X=\begin{bmatrix}x^{(1)}_{1} & ...& x^{(1)}_{n} \\. & ...& .\\x^{(m)}_{1} & ...& x^{(m)}_{n} \end{bmatrix}$$

where the i-th row of X corresponds to the feature vector $x^{(i)}$ for the i-th movie.

And the matrix Theta will be:

$$Theta=\Theta=\begin{bmatrix}\theta^{(1)}_{1} & ...& \theta^{(1)}_{n} \\. & ...& .\\\theta^{(u)}_{1} & ...& \theta^{(u)}_{n} \end{bmatrix}$$

where the j-th row of Theta corresponds to the preference vector $\theta^{(j)}$ for the j-th user. 

**Passing X and Theta to cofiCostFunc**

We are going to use a optimize package scipy.optimize that requires using a **flattened vector** of parameters. However, in our problem tha parameters to be optimized are represented by two matrices, i.e. X and Theta. So, X and Theta must be passed to the cofiCostFunc as a **(mxn)+(u+n) vector**, called **parameters**:

$${ \left[ \begin{matrix} { x }^{ (1) }, & ... & { x }^{ (m) }, \end{matrix}\begin{matrix} \theta ^{ (1) }, & ... & \theta ^{ (u) } \end{matrix} \right]  }_{ (m\cdot n)+(u\cdot n) }$$ 

However, inside the function, you can unfold the vector **parameters** and build the matrices X and Theta to compute J and the gradients according to the equations explained in class.

### 2.2 Computing the cost J
Suppose that the vector of features $x^{(i)}$ of the film i and the vector of preferences $\theta^{(j)}$ of the user j are known, then the **estimate of the rating** of the user j for the movie i will be:

$$\widehat{y}^{(i,j)}=x^{(i)}(\theta^{(j)})^{T}$$

The error of the estimate will be the difference between the estimate of rating $\widehat{y}^{(i,j)}$ and the real ratings $y^{(i,j)}$

The **cost J** is defined as the the average of the squares of the errors plus two regularization terms:

$$J=\frac { 1 }{ 2 } \sum _{ (i,j):r(i,j)=1 }^{  }{ \left( { x }^{ (i) }({ \theta  }^{ (j) })^{ T }-{ y }^{ (i,j) } \right) ^{ 2 } } +\quad \frac { \lambda  }{ 2 } \sum _{ i=1 }^{ m }{ \sum _{ k=1 }^{ n }{ ({ x }_{ k }^{ (i) })^{ 2 } }  } +\frac { \lambda  }{ 2 } \sum _{ j=1 }^{ u }{ \sum _{ k=1 }^{ n }{ ({ \theta  }_{ k }^{ (j) })^{ 2 } }  } $$


### Task 7
***
Implement the cost J as a vectorized expression (recommended). For example, the estimate of ratings can be expressed as:

$$\widehat{Y}=X\Theta^{T}$$

Now, go back and **complete the cofiCostFunc code to compute the cost J**. Remeber that J is scalar value.

### 2.3 Checking the cost J
Now, you will import a data set and check the cofiCostFunc.

In [6]:
# load dataset for checking
Y=np.load('YmatrixTest.npy')
R=np.load('RmatrixTest.npy')
X=np.load('XmatrixTest.npy')
Theta=np.load('ThetamatrixTest.npy')

# dimension
n_users = Y.shape[1]
n_movies = Y.shape[0]
n_features = X.shape[1]

### Task 8
***
Call cofiCostFunc with lamb=0 (without regularization term) and check the result

In [7]:
# Evaluate Cost J (without regularization term)
J=0
parameters=np.append(X.flatten(),Theta.flatten())

# YOUR CODE ..................................
# call cofiCostFunc with lamb=0 (without regularization term)


J = cofiCostFunc(parameters, Y, R, n_users, n_movies, n_features, 0)


# YOUR CODE (end)..................................

print('The value of J (without regularization term) is %0.2f (it should be 22.22)' % J[0] )

The value of J (without regularization term) is 22.22 (it should be 22.22)


### Task 9
***
Call cofiCostFunc with lamb=1.5 (with regularization term) and check the result

In [8]:
# Evaluate Cost J (with regularization term)
J=0
parameters=np.append(X.flatten(),Theta.flatten())

# YOUR CODE ..................................
# call cofiCostFunc with lamb=1.5 (without regularization term)

J = cofiCostFunc(parameters, Y, R, n_users, n_movies, n_features, 1.5)



# YOUR CODE (end)..................................

print('The value of J (with regularization term equal to 1.5) is %0.2f (it should be 31.34)' % J[0] )

The value of J (with regularization term equal to 1.5) is 31.34 (it should be 31.34)


### 2.4 Computing the gradient of J
The **gradient of J** depends on the two types of parameters, i.e. X and Theta. The corresponding equations are:

$$\frac { \partial J }{ \partial { \theta  }_{ k }^{ (j) } } =\sum _{ i:r(i,j)=1 }^{  }{ \left( { x }^{ (i) }({ \theta  }^{ (j) })^{ T }-{ y }^{ (i,j) } \right) { x }_{ k }^{ (i) } } +\lambda { \theta  }_{ k }^{ (j) }$$

$$\frac { \partial J }{ \partial { x }_{ k }^{ (i) } } =\sum _{ j:r(i,j)=1 }^{  }{ \left( { x }^{ (i) }({ \theta  }^{ (j) })^{ T }-{ y }^{ (i,j) } \right) \theta _{ k }^{ (j) } } +\lambda { x }_{ k }^{ (i) }$$

### Task 10
***
Now, go back and **complete the cofiCostFunc code to compute the gradient of J**. Remember to use vectorized operations instead of using for loops.

Note that the outputs of cofiCostFunc are the cost J (scalar value) and the gradient, again a **flattened vector of the corresponding gradients of X and Theta**:

$${ \left[ \begin{matrix} \frac { \partial J }{ \partial { x }^{ (1) } } , & ... & \frac { \partial J }{ \partial { x }^{ (m) } } , & \frac { \partial J }{ \partial \theta ^{ (1) } } , & ... & \frac { \partial J }{ \partial \theta ^{ (u) } }  \end{matrix} \right]  }_{ (m\cdot n)+(u\cdot n) }$$

After computing both gradients, you should reshape them into a flattened vector called **gradient** that will be returned by the cofiCostFunc.

### 2.5 Checking the gradient of J
For the same dataset of the last poit, you will check the gradient of J computed by your cofiCostFunc

In [9]:
# Check gradients (without regularization term) by running the next function
checkCostFunction(cofiCostFunc,0)

The above two columns you get should be very similar.
(Left - Your Numerical Gradient, Right - Analytical Gradient)

(0.0, 0.0)
(0.0, 0.0)
(0.0, 0.0)
(0.13914663143738126, 0.13914663143761774)
(0.29425974343050276, 0.294259743430629)
(-0.0018939304988196959, -0.0018939304986384112)
(0.13741826980107064, 0.13741826980111788)
(-0.1227344320250956, -0.12273443202536469)
(-0.15667466363916693, -0.15667466363923907)
(0.08165012110483705, 0.0816501211048018)
(0.03325332634362965, 0.03325332634326789)
(-0.04390643372903513, -0.04390643372902411)
(0.302528269811031, 0.3025282698108845)
(0.10717178863167698, 0.10717178863165408)
(0.1914689673099268, 0.1914689673097734)
(0.14915860018144267, 0.14915860018170904)
(0.09970986004942395, 0.09970986004956667)
(0.1931390510570563, 0.1931390510572159)
(-0.4042698612843898, -0.4042698612844372)
(-0.14321413384227322, -0.1432141338421293)
(-0.2558608255123773, -0.25586082551222)
(-0.09432547203452879, -0.09432547203451784)
(-0.03224766665227419, -0.03224

In [10]:
# Check gradients (with regularization term) by running the next function
checkCostFunction(cofiCostFunc,1.5)

The above two columns you get should be very similar.
(Left - Your Numerical Gradient, Right - Analytical Gradient)

(0.483518665537197, 0.4835186655359085)
(1.611961700236364, 1.6119617002407032)
(0.5068703660926488, 0.5068703660877449)
(0.846115356294419, 0.8461153562895725)
(-0.20355843182606748, -0.20355843182665895)
(0.05847829966487694, 0.05847829966740137)
(0.31934752900752983, 0.3193475290118088)
(0.7329575198511407, 0.7329575198517473)
(-0.33461800763134875, -0.33461800763578575)
(1.3462341342718176, 1.3462341342749935)
(0.1262787004385757, 0.12627870043892853)
(-0.1811259135564569, -0.18112591355619434)
(0.28030342934304286, 0.28030342934181185)
(-0.4900461302126402, -0.49004613021164867)
(0.9287725298889882, 0.9287725298923415)
(0.06119377570001916, 0.06119377570069112)
(-0.12353679874177459, -0.12353679874011753)
(0.4859171143589691, 0.48591711435583285)
(0.6890297719364114, 0.6890297719384975)
(0.16271619480701105, 0.16271619480412047)
(0.22435456617397875, 0.2243545661748

## 3 Learning and recommendation
Finally, you will use your cofiCostFun to make predictions using the initial Movielens dataset. Part of the python code you need is already written in the next cells. You only have to complete those lines that are explicitly required.

### Task 11
***
Load the matrix Y and R computed in the first notebook.

In [11]:
# Task: load matrix Y and R
# YOUR CODE ..................................


Y = np.load('matrizy.npy')
R = np.load('matrizr.npy')


### Task 12
***
* Get the number of users and movies and assign the corresponding variables n_users, n_movies.
* Set the initial parameters (Theta, X) with random values.
* Fold X and Theta into the variable initial_parameters.

In [12]:
# Set the number of features
n_features = 100
n_users = Y.shape[1]
n_movies = Y.shape[0]
print(n_users)
print(n_movies)

X = np.random.random_sample((Y.shape[0],n_features))
Theta = np.random.random_sample((Y.shape[1],n_features))

initial_parameters = np.append(X.flatten(),Theta.flatten())

943
1682


Now, we set the rest of the parameters and minimize the function

In [13]:
# Set the regularization parameter
lamb = 10

# Define a function to be minimized
def cofiCostFunc_minimize(parameters):
    return cofiCostFunc(parameters,Y, R, n_users, n_movies, n_features,lamb)

# Set the number of iteations
max_iter=200

In [14]:
# Minimize the function using minimize from the package scipy.optimize and get the optimized parameters
parameters = (minimize(cofiCostFunc_minimize,initial_parameters,method="CG",jac=True,
                   options={'maxiter':max_iter, "disp":True})).x

         Current function value: 66587.663568
         Iterations: 200
         Function evaluations: 299
         Gradient evaluations: 299


### Task 13
***
Get the matrix of predictions P


In [15]:
# YOUR CODE ..................................

mul = np.dot(X,Theta.T)
P= np.multiply(mul,R)

### Task 14
***
Show the titles of the top-5 predictions for the first user u=0, for those films user u did not watch: r(i,u)=0 (they will be the top-5 recommendations)

#### Tips
* You can import movies' titles using Pandas (see the first notebook)


In [34]:
# YOUR CODE ..................................

from pandas import read_table
items = read_table('u.item',header=None,sep='|',encoding='ISO-8859-1')
items.drop(range(2,24),axis=1, inplace=True)
items.columns = ['itemid','title']

recomendacion_user0 = P[:,0]
no_vistas = np.where(R[0]==0)[0]
prediccion = recomendacion_user0[no_vistas]
top_5 = np.argsort(prediccion)[0:5]

for i in top_5:
    print(items.iloc[i]["title"])





Chasing Amy (1997)
How to Be a Player (1997)
U Turn (1997)
Game, The (1997)
Kiss the Girls (1997)
