
### Colab Activity 19.3: Implementing Funk SVD


**Expected Time = 60 minutes**


This activity uses gradient descent to produce recommendations with collaborative filtering. The goal is a high-level introduction to implementing Funk SVD. You will use the earlier ratings, together with a given user matrix and item matrix, to update the user factors. In the next activity, you will implement the algorithms using `Surprise`.

### Index


- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [3]:
import pandas as pd
import numpy as np

#### The Data

Below, the user review data is loaded along with the matrices $Q$ (item factors) and $P$ (user factors), whose randomly built values come from a process similar to the one in the last activity.

In [4]:
reviews = pd.read_csv('data/user_rated.csv', index_col=0).iloc[:, :-2]
Q = pd.read_csv('data/Q.csv', index_col=0)
P = pd.read_csv('data/P.csv', index_col=0)
Q = Q[['F1', 'F2']]
P = P[['F1', 'F2']]

In [5]:
reviews.head()

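| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| Alfred | 3.0 | 4.0 | NaN | 4.0 | 4.0 |
| Mandy | NaN | 9.0 | NaN | 3.0 | 8.0 |
| Lenny | 2.0 | 5.0 | 8.0 | 9.0 | NaN |
| Joan | 3.0 | NaN | 9.0 | 4.0 | 9.0 |
| Tino | 1.0 | 1.0 | NaN | 9.0 | 5.0 |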


In [6]:
Q.T.head() #item factors

| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| F1 | 0.16406 | -0.207666 | 0.882355 | 1.569998 | 0.570041 |
| F2 | 0.248037 | 1.421081 | 0.436072 | -0.429204 | 0.670451 |


In [7]:
P.head() #user factors

| | F1 | F2 |
|---|---|---|
| Alfred | 3.820956 | 3.395762 |
| Mandy | 3.710347 | 7.006197 |
| Lenny | 7.113263 | 3.952502 |
| Joan | 5.240167 | 10.035759 |
| Tino | 5.86328 | 2.197482 |


[Back to top](#-Index)

### Problem 1


#### Making Predictions

To predict a rating, take the dot product of a user's row of $P$ with an item's row of $Q$ (equivalently, a column of $Q^T$). Perform this operation for all users and items and assign a DataFrame of predicted values to `pred_df` below.

HINT: For this step, use matrix multiplication rather than a nested loop. Matrix multiplication can be achieved using the `@` operator.
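To see why the hint holds, here is a minimal sketch, assuming a two-user, two-item slice of the factor values printed above. The names `P_vals`, `Q_vals`, `loop_pred`, and `matmul_pred` are illustrative and not part of the activity. It compares a nested loop of dot products with a single matrix multiplication.

```python
import numpy as np

# Factor values copied from the P.head() and Q.T.head() output above
# (rows of P_vals: users; rows of Q_vals: items; columns: F1, F2).
P_vals = np.array([
    [3.820956, 3.395762],   # Alfred
    [3.710347, 7.006197],   # Mandy
])
Q_vals = np.array([
    [0.164060, 0.248037],   # Michael Jackson
    [-0.207666, 1.421081],  # Clint Black
])

# Nested-loop version: one dot product per (user, item) pair.
loop_pred = np.zeros((P_vals.shape[0], Q_vals.shape[0]))
for u in range(P_vals.shape[0]):
    for i in range(Q_vals.shape[0]):
        loop_pred[u, i] = np.dot(P_vals[u], Q_vals[i])

# Matrix-multiplication version: the same numbers from one expression.
matmul_pred = P_vals @ Q_vals.T

print(np.allclose(loop_pred, matmul_pred))
print(matmul_pred[1, 1])  # Mandy's predicted rating for Clint Black
```

Entry `[1, 1]` should agree, up to rounding of the displayed factors, with the Mandy / Clint Black cell of `pred_df`.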

In [8]:

pred_df = P@Q.T


### ANSWER CHECK
pred_df

| | Michael Jackson | Clint Black | Dropdead | Anti-Cimex | Cardi B |
|---|---|---|---|---|---|
| Alfred | 1.46914 | 4.032171 | 4.852237 | 4.541419 | 4.454797 |
| Mandy | 2.346515 | 9.185859 | 6.329049 | 2.81815 | 6.812366 |
| Lenny | 2.147366 | 4.139643 | 8.0 | 9.47138 | 6.704814 |
| Joan | 3.348941 | 13.173421 | 9.0 | 3.919665 | 9.715601 |
| Tino | 1.506984 | 1.905196 | 6.131757 | 8.262171 | 4.815617 |


### Problem 2


#### Measuring Error

Use your prediction of `Mandy`'s rating for `Clint Black` to compute the squared error. Assign this value to `ans2` below.

In [9]:

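# Squared difference between Mandy's predicted and actual rating for Clint Black
ans2 = (pred_df.loc['Mandy', 'Clint Black'] - reviews.loc['Mandy', 'Clint Black'])**2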


### ANSWER CHECK
print(ans2)

0.034543629342140704


### Problem 3


#### Error for all Mandy Predictions

Now, compute the squared error for each item that `Mandy` actually rated: `Clint Black`, `Anti-Cimex`, and `Cardi B`. Assign these as a NumPy array to `ans3`.

In [10]:

rated = reviews.loc['Mandy'].notnull()  # True for the items Mandy rated
ans3 = ((reviews.loc['Mandy', rated] - pred_df.loc['Mandy', rated])**2).values


### ANSWER CHECK
print(ans3)

[0.03454363 0.03306925 1.41047524]
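The masking step can also be sketched in plain NumPy. This is an illustrative version, assuming Mandy's row of ratings and predictions is copied by hand from the tables above. The names `ratings`, `preds`, `rated`, `errors`, and `squared_errors` are hypothetical. It keeps the signed errors (prediction minus rating) as well as their squares, since the sign is what a gradient step needs.

```python
import numpy as np

# Mandy's ratings and predictions, copied from the tables above
# (order: Michael Jackson, Clint Black, Dropdead, Anti-Cimex, Cardi B).
ratings = np.array([np.nan, 9.0, np.nan, 3.0, 8.0])
preds = np.array([2.346515, 9.185859, 6.329049, 2.818150, 6.812366])

rated = ~np.isnan(ratings)               # True where Mandy gave a rating
errors = preds[rated] - ratings[rated]   # signed error: prediction - rating
squared_errors = errors ** 2             # sign is discarded here

print(errors)
print(squared_errors)
```

Note that the first signed error is positive (the prediction overshoots) while the other two are negative, a distinction the squared values no longer carry.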


### Problem 4


#### Updating the Values

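Now, perform the update for matrix $P$ based on the rule:

$$P_{a,b} := P_{a,b} - \alpha \sum_{j \in R_a} e_{a,j}Q_{j,b}$$

Here $R_a$ is the set of items that user $a$ rated, $e_{a,j}$ is the error of the prediction for user $a$ on item $j$, and $Q_{j,b}$ is factor $b$ of item $j$.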

You will do this for the first factor of Mandy.  This means:

$$P_{1, 0} = -9.019710 - \alpha(e_{1, 1}Q_{1, 0} + e_{1, 3}Q_{3, 0} + e_{1, 4}Q_{4, 0})$$

Use $\alpha = 0.1$, and assign this new value as a float to `P_new`.
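As a brief aside on where this rule comes from, here is a sketch under stated assumptions: the error is taken as prediction minus rating, $e_{a,j} = \sum_b P_{a,b}Q_{j,b} - r_{a,j}$, and the loss for user $a$ is half the sum of squared errors over the items that user rated. The symbols $r_{a,j}$ (observed rating) and $L_a$ (per-user loss) are notation introduced here, not taken from the activity, and $Q_{j,b}$ denotes factor $b$ of item $j$, as in the expanded expression for Mandy.

$$L_a = \frac{1}{2}\sum_{j \in R_a} e_{a,j}^2, \qquad \frac{\partial L_a}{\partial P_{a,b}} = \sum_{j \in R_a} e_{a,j}\,\frac{\partial e_{a,j}}{\partial P_{a,b}} = \sum_{j \in R_a} e_{a,j}\,Q_{j,b}$$

Stepping against this gradient with step size $\alpha$ gives the update rule. Under this convention the sign of each $e_{a,j}$ matters: an overshooting prediction and an undershooting one push $P_{a,b}$ in opposite directions.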

In [11]:

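# ans3**0.5 recovers only the magnitude of each error (the sign is lost); -9.019710 is the starting value given above
P_new = -9.019710 - (0.1 * ans3**0.5) @ Q.loc[reviews.loc['Mandy'].notnull(), 'F1']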


### ANSWER CHECK
print(P_new)

-9.112100758216553


As an extra exercise, consider how to modularize this update so that it applies to every value of $P$. The update for $Q$ follows the same pattern as the one for $P$; consider working through the full update process for both matrices and modularizing it.
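One possible way to approach the modularization is sketched below. It is not the activity's reference solution. It assumes signed errors (prediction minus rating), so the minus sign in the update rule moves each factor downhill on the squared error, and it starts from the factor values displayed earlier for Alfred and Mandy. The function `update_user_factors` and the names `R_demo`, `P_demo`, `Q_demo`, and `P_next` are hypothetical.

```python
import numpy as np

def update_user_factors(R, P, Q, alpha=0.1):
    """Return a new array: P after one gradient step on every user factor.

    R: (users x items) ratings, with np.nan where no rating exists.
    P: (users x factors) user factors.
    Q: (items x factors) item factors.
    """
    rated = ~np.isnan(R)
    # Signed error (prediction - rating); forced to 0 where no rating
    # exists so that unrated items contribute nothing to the sum.
    errors = np.where(rated, P @ Q.T - R, 0.0)
    # Entry (a, b) of errors @ Q equals sum_j errors[a, j] * Q[j, b].
    return P - alpha * (errors @ Q)

# Values copied from the reviews, P, and Q.T tables above (Alfred and Mandy).
R_demo = np.array([
    [3.0, 4.0, np.nan, 4.0, 4.0],     # Alfred
    [np.nan, 9.0, np.nan, 3.0, 8.0],  # Mandy
])
P_demo = np.array([
    [3.820956, 3.395762],
    [3.710347, 7.006197],
])
Q_demo = np.array([
    [0.164060, 0.248037],    # Michael Jackson
    [-0.207666, 1.421081],   # Clint Black
    [0.882355, 0.436072],    # Dropdead
    [1.569998, -0.429204],   # Anti-Cimex
    [0.570041, 0.670451],    # Cardi B
])

P_next = update_user_factors(R_demo, P_demo, Q_demo, alpha=0.1)
print(P_next)
```

Because this sketch uses signed errors and the displayed starting values, its result for Mandy's first factor will not match the answer check above. Under the same assumptions, the step for $Q$ swaps the roles of the two matrices: `Q - alpha * (errors.T @ P)`.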