### SVD Funk

This activity focuses on using gradient descent to provide recommendations with collaborative filtering.  The purpose here is to give a high-level introduction to the implementation of SVD Funk.  You will use the earlier ratings, together with a given user factor matrix and item factor matrix, to update the user factors.  In the next activity, you will implement the algorithm using `Surprise`.

### Index


- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [1]:
import pandas as pd
import numpy as np

#### The Data

Below, we load the user reviews along with the matrices $Q$ (item factors) and $P$ (user factors), whose values were built by a random process similar to the one in the last activity.

In [37]:
reviews = pd.read_csv('data/user_rated.csv', index_col=0).iloc[:, :-2]
Q = pd.read_csv('data/Q.csv', index_col=0)
P = pd.read_csv('data/P.csv', index_col=0)
Q = Q[['F1', 'F2']]
P = P[['F1', 'F2']]

In [38]:
reviews.head()

Unnamed: 0,Michael Jackson,Clint Black,Dropdead,Anti-Cimex,Cardi B
Alfred,3.0,4.0,,4.0,4.0
Mandy,,9.0,,3.0,8.0
Lenny,2.0,5.0,8.0,9.0,
Joan,3.0,,9.0,4.0,9.0
Tino,1.0,1.0,,9.0,5.0


In [39]:
Q.T.head() #item factors

Unnamed: 0,Michael Jackson,Clint Black,Dropdead,Anti-Cimex,Cardi B
F1,-0.510093,0.181804,-7.554766,-0.520113,-0.458392
F2,-0.480414,-3.22799,-0.348831,-0.533289,-1.413967


In [40]:
P.head() #user factors

Unnamed: 0,F1,F2
Alfred,-4.427436,-1.58782
Mandy,-9.01971,-3.437908
Lenny,-1.015713,-0.936057
Joan,-0.932923,-5.595791
Tino,-2.538133,-0.043783


[Back to top](#-Index)

### Problem 1

#### Making Predictions

To make a prediction, take the dot product of a user's row of $P$ with an item's row of $Q$ (equivalently, a column of $Q^T$).  Perform this operation for all users and items and assign a DataFrame of predicted values to `pred_df` below.  Try to do this using matrix multiplication rather than a nested loop.
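
As a small illustration of why one matrix product covers every user-item pair, the sketch below uses tiny made-up factor matrices (`P_toy`, `Q_toy`, and their values are hypothetical, not the activity's data) and checks that the product agrees with an explicit nested loop of dot products.

```python
import numpy as np
import pandas as pd

# Hypothetical toy factors: 2 users x 2 factors, 3 items x 2 factors.
P_toy = pd.DataFrame([[1.0, 2.0], [0.5, -1.0]],
                     index=['u1', 'u2'], columns=['F1', 'F2'])
Q_toy = pd.DataFrame([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
                     index=['i1', 'i2', 'i3'], columns=['F1', 'F2'])

# One matrix product gives every user-item prediction at once.
pred_toy = P_toy @ Q_toy.T

# The same numbers via an explicit nested loop of dot products.
loop_toy = pd.DataFrame(index=P_toy.index, columns=Q_toy.index, dtype=float)
for user in P_toy.index:
    for item in Q_toy.index:
        loop_toy.loc[user, item] = np.dot(P_toy.loc[user], Q_toy.loc[item])

print(np.allclose(pred_toy, loop_toy))
```

The transpose is needed because both toy matrices store factors as columns, which mirrors how `P` and `Q` are laid out in this activity.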

In [44]:
### GRADED
pred_df = ''

    
### BEGIN SOLUTION
pred_df = P@Q.T
### END SOLUTION

### ANSWER CHECK
pred_df

Unnamed: 0,Michael Jackson,Clint Black,Dropdead,Anti-Cimex,Cardi B
Alfred,3.021214,4.320545,34.002121,3.149535,4.274625
Mandy,6.252507,9.457719,69.341043,6.524669,8.995648
Lenny,0.967803,2.836922,8.0,1.027474,1.789148
Joan,3.164175,17.89355,9.0,3.469398,8.339908
Tino,1.315717,-0.32011,19.19027,1.343466,1.225366


In [45]:
### BEGIN HIDDEN TESTS
pred_df_ = P@Q.T
#
#
#
pd.testing.assert_frame_equal(pred_df, pred_df_)
### END HIDDEN TESTS

### Problem 2

#### Measuring Error

Use your prediction of Mandy's rating for Clint Black to compute the squared error against her actual rating.  Assign this value to `ans2` below.

In [50]:
### GRADED
ans2 = ''

    
### BEGIN SOLUTION
ans2 = (pred_df.iloc[1, 1] - reviews.iloc[1, 1])**2
### END SOLUTION

### ANSWER CHECK

print(ans2)

0.20950654368339033


In [52]:
### BEGIN HIDDEN TESTS
ans2_ = (pred_df_.iloc[1, 1] - reviews.iloc[1, 1])**2
#
#
#
assert ans2 == ans2_
### END HIDDEN TESTS

### Problem 3

#### Error for all Mandy Predictions

Now, compute the squared error for each artist Mandy actually rated: Clint Black, Anti-Cimex, and Cardi B.  Assign these as a numpy array to `ans3`.
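
The masking step can be sketched on a toy example. The Series below are hypothetical values for a single user, with `NaN` marking an unrated item; only the rated items contribute to the result.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings and predictions for one user; NaN marks an unrated item.
ratings_toy = pd.Series([np.nan, 4.0, 2.0], index=['i1', 'i2', 'i3'])
preds_toy = pd.Series([3.5, 4.5, 1.0], index=['i1', 'i2', 'i3'])

# Keep only the items the user actually rated.
rated = ratings_toy.notnull()

# Squared error on the rated items only, returned as a numpy array.
sq_err_toy = ((preds_toy[rated] - ratings_toy[rated]) ** 2).to_numpy()
print(sq_err_toy)
```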

In [60]:
### GRADED
ans3 = ''

    
### BEGIN SOLUTION
ans3 = ((reviews.iloc[1].dropna() - pred_df.iloc[1].loc[reviews.iloc[1].notnull()])**2).values
### END SOLUTION

### ANSWER CHECK
print(ans3)

[ 0.20950654 12.42328982  0.99131421]


In [61]:
### BEGIN HIDDEN TESTS
ans3_ = ((reviews.iloc[1].dropna() - pred_df.iloc[1].loc[reviews.iloc[1].notnull()])**2).values
#
#
#
np.testing.assert_array_equal(ans3, ans3_)
### END HIDDEN TESTS

### Problem 4

#### Updating the Values

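Now, perform the update for matrix $P$ based on the rule:

$$P_{a,b} := P_{a,b} - \alpha \sum_{j \in R_a} e_{a,j}Q_{j,b}$$

where $R_a$ is the set of items rated by user $a$ and $e_{a,j}$ is the error of the prediction for user $a$ on item $j$ (prediction minus rating).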

You will do this for the first factor of Mandy.  This means:

$$P_{1, 0} = -9.019710 - \alpha(e_{1, 1}Q_{1, 0} + e_{1, 3}Q_{3, 0} + e_{1, 4}Q_{4, 0})$$

Use $\alpha = 0.1$, and assign this new value as a float to `P_new`.
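
The rule can be sketched on toy numbers before applying it to Mandy. All names and values below are hypothetical, and the error is taken as prediction minus rating, which is an assumption about the sign convention chosen to match the minus sign in the update.

```python
import numpy as np

# Hypothetical values for one user: 2 factors, 3 items, one unrated (NaN).
p_user = np.array([1.0, 2.0])                 # the user's row of P
Q_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [2.0, 2.0]])                # items x factors
ratings = np.array([np.nan, 4.0, 5.0])
alpha = 0.1

rated = ~np.isnan(ratings)                    # R_a: items this user rated
preds = Q_toy @ p_user                        # predicted ratings for all items
errors = preds[rated] - ratings[rated]        # signed error (assumed: prediction - rating)

# Update the first factor (b = 0) using only the rated items.
p_new_0 = p_user[0] - alpha * np.sum(errors * Q_toy[rated, 0])
print(p_new_0)
```

The error keeps its sign here. Taking the square root of a squared error would recover only its magnitude, which gives the same update only when every prediction overshoots its rating.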

In [69]:
### GRADED
P_new = ''

    
### BEGIN SOLUTION
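# ans3**0.5 recovers only the magnitude of each error; it matches the signed
# error (prediction - rating) here because all three of Mandy's predictions
# exceed her ratings.
P_new = -9.019710 - 0.1*ans3**0.5@Q.loc[reviews.iloc[1].notnull()]['F1']
### END SOLUTION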
### ANSWER CHECK
print(P_new)

-8.331926013496945


In [70]:
### BEGIN HIDDEN TESTS
P_new_ = -9.019710 - 0.1*ans3_**0.5@Q.loc[reviews.iloc[1].notnull()]['F1']
#
#
#
assert P_new == P_new_
### END HIDDEN TESTS

As an extra exercise, consider how to modularize this update so that it applies to every value of $P$.  The update for $Q$ follows the same pattern as the one for $P$, so also consider working through the full update process and wrapping it in reusable functions.
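
One possible way to approach the exercise is sketched below. The helper names `funk_step` and `sse`, the toy ratings, and the hyperparameters are all hypothetical. This is a plain batch variant with no regularization or bias terms, assumed here for brevity, and it takes the error as prediction minus rating.

```python
import numpy as np

def funk_step(R, P, Q, alpha=0.01):
    """One batch gradient step on P and Q (hypothetical helper).

    R: users x items ratings, NaN where missing.
    P: users x factors, Q: items x factors.
    """
    mask = ~np.isnan(R)
    # Signed errors (prediction - rating), zero where no rating exists.
    E = np.where(mask, P @ Q.T - np.nan_to_num(R), 0.0)
    # Both updates use the factors from before the step.
    P_next = P - alpha * E @ Q
    Q_next = Q - alpha * E.T @ P
    return P_next, Q_next

def sse(R, P, Q):
    """Sum of squared errors over the observed ratings only."""
    mask = ~np.isnan(R)
    return float(np.sum((P @ Q.T - R)[mask] ** 2))

# Toy data: 3 users x 3 items with one missing rating.
R_toy = np.array([[5.0, 3.0, np.nan],
                  [4.0, 2.0, 1.0],
                  [1.0, 5.0, 4.0]])
rng = np.random.default_rng(0)
P0 = rng.normal(scale=0.1, size=(3, 2))
Q0 = rng.normal(scale=0.1, size=(3, 2))

P_fit, Q_fit = P0, Q0
for _ in range(200):
    P_fit, Q_fit = funk_step(R_toy, P_fit, Q_fit, alpha=0.01)

print(sse(R_toy, P0, Q0), sse(R_toy, P_fit, Q_fit))
```

Each row of `E @ Q` is the summed term from the rule for $P$, and `E.T @ P` is the analogous term for $Q$. A stochastic version would instead loop over individual observed ratings and update one user row and one item row at a time.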