### Implementing FunkSVD with vectorization.

In this notebook we will take a look at writing our own (vectorized) function that performs FunkSVD.

To test our algorithm, we will run it on the subset of the data we worked with earlier.  Run the cell below to get started.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
import svd_tests as t
%matplotlib inline

# Read in the datasets
movies = pd.read_csv('../Content based/movies_clean.csv')
reviews = pd.read_csv('../Content based/reviews_clean.csv')

del movies['Unnamed: 0']
del reviews['Unnamed: 0']

# Create user-by-item matrix
user_items = reviews[['user_id', 'movie_id', 'rating', 'timestamp']]
user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()

# Create data subset
user_movie_subset = user_by_movie[[73486, 75314,  68646, 99685]].dropna(axis=0)
ratings_mat = np.matrix(user_movie_subset)
print(ratings_mat)

[[10. 10. 10. 10.]
 [10.  4.  9. 10.]
 [ 8.  9. 10.  5.]
 [ 9.  8. 10. 10.]
 [10.  5.  9.  9.]
 [ 6.  4. 10.  6.]
 [ 9.  8. 10.  9.]
 [10.  5.  9.  8.]
 [ 7.  8. 10.  8.]
 [ 9.  5.  9.  7.]
 [ 9.  8. 10.  8.]
 [ 9. 10. 10.  9.]
 [10.  9. 10.  8.]
 [ 5.  8.  5.  8.]
 [10.  8. 10. 10.]
 [ 9.  9. 10. 10.]
 [ 9.  8.  8.  8.]
 [10.  8.  1. 10.]
 [ 5.  6. 10. 10.]
 [ 8.  7. 10.  7.]]


`1.` We will use the **user_movie_subset** matrix to show that our FunkSVD algorithm will converge.  



In [8]:
def FunkSVD(ratings_mat, latent_features=4, learning_rate=0.0001, iters=100):
    '''
    This function performs matrix factorization using a basic form of FunkSVD with no regularization
    
    INPUT:
    ratings_mat - (numpy array) a matrix with users as rows, movies as columns, and ratings as values
    latent_features - (int) the number of latent features used
    learning_rate - (float) the learning rate 
    iters - (int) the number of iterations
    
    OUTPUT:
    user_mat - (numpy array) a user by latent feature matrix
    movie_mat - (numpy array) a latent feature by movie matrix
    '''
    
    # Set up useful values to be used through the rest of the function
    n_users = ratings_mat.shape[0]
    n_movies = ratings_mat.shape[1]
    num_ratings = np.sum(~np.isnan(ratings_mat)) # total number of ratings in the matrix
    
    # initialize the user and movie matrices with random values
    
    user_mat = np.random.rand(n_users, latent_features) 
    movie_mat = np.random.rand(latent_features, n_movies)
    
    # initialize sse at 0 for first iteration
    sse_accum = 0
    
    # header for running results (note: the value reported each iteration is the
    # sum of squared errors, not the mean)
    print("Optimization Statistics")
    print("Iterations | Mean Squared Error ")
    
    for i in range(iters):

        # update our sse
        old_sse = sse_accum
        sse_accum = 0 
                    
        # compute the error as the actual minus the dot product of the user and movie latent features
        error = np.nan_to_num(ratings_mat-user_mat@movie_mat)
        # Keep track of the total sum of squared errors for the matrix
        sse_accum = np.sum(np.multiply(error,error))            
        # gradient-descent step: move each matrix against the gradient of the squared error
        # (note that the movie update below uses the already-updated user_mat)
        user_mat += 2*learning_rate*error@(movie_mat.T)
        movie_mat += 2*learning_rate*(user_mat.T)@error
        # print results for iteration
        print(f'Iteration {i}, Sum of Square Errors is {sse_accum}.')
        
    return user_mat, movie_mat 
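
The two update lines in the function are gradient-descent steps on the sum of squared errors. As a sanity check, the sketch below compares the analytic gradients implied by those updates with a finite-difference estimate. The toy matrix, the random seed, and the helper `numeric_grad` are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(1, 11, size=(5, 3)).astype(float)  # toy ratings in 1..10
user_mat = rng.random((5, 2))
movie_mat = rng.random((2, 3))

def sse(u, m):
    """Sum of squared errors between the ratings and the product u @ m."""
    return np.sum((ratings - u @ m) ** 2)

def numeric_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for every entry of x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        up, down = x.copy(), x.copy()
        up[idx] += eps
        down[idx] -= eps
        grad[idx] = (f(up) - f(down)) / (2 * eps)
    return grad

# Analytic gradients of the SSE with respect to each factor matrix
error = ratings - user_mat @ movie_mat
grad_user = -2 * error @ movie_mat.T
grad_movie = -2 * user_mat.T @ error

# Numerical estimates of the same gradients
num_user = numeric_grad(lambda u: sse(u, movie_mat), user_mat)
num_movie = numeric_grad(lambda m: sse(user_mat, m), movie_mat)

print(np.allclose(grad_user, num_user, atol=1e-4))
print(np.allclose(grad_movie, num_movie, atol=1e-4))
```

Since `user_mat += 2*learning_rate*error@(movie_mat.T)` is the same as `user_mat -= learning_rate * grad_user`, each iteration moves downhill on the squared error.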

`2.` Try out the function on the **user_movie_subset** dataset.  First try 4 latent features, a learning rate of 0.005, and 10 iterations.  

In [9]:
# 4 latent features, lr of 0.005 and 10 iterations
user_mat, movie_mat = FunkSVD(ratings_mat, latent_features=4, learning_rate=0.005, iters=10)

Optimization Statistics
Iterations | Mean Squared Error 
Iteration 0, Sum of Square Errors is 4695.610448140116.
Iteration 1, Sum of Square Errors is 2121.256087383552.
Iteration 2, Sum of Square Errors is 243.37812974910824.
Iteration 3, Sum of Square Errors is 173.93187463035878.
Iteration 4, Sum of Square Errors is 169.96950895409245.
Iteration 5, Sum of Square Errors is 167.82388765616182.
Iteration 6, Sum of Square Errors is 165.56095342778667.
Iteration 7, Sum of Square Errors is 163.06199171656374.
Iteration 8, Sum of Square Errors is 160.2774547976628.
Iteration 9, Sum of Square Errors is 157.16801660432347.


In [12]:
#Compare the predicted and actual results
print(np.dot(user_mat, movie_mat).round())
print(ratings_mat)

[[10.  9. 11. 10.]
 [ 9.  7.  9.  8.]
 [ 8.  7.  9.  8.]
 [10.  8. 10.  9.]
 [ 9.  7.  9.  8.]
 [ 7.  6.  7.  7.]
 [ 9.  8. 10.  9.]
 [ 9.  7.  8.  8.]
 [ 8.  7.  9.  8.]
 [ 8.  6.  8.  8.]
 [ 9.  8. 10.  9.]
 [ 9.  9. 10.  9.]
 [10.  8. 10.  9.]
 [ 7.  5.  7.  7.]
 [10.  8. 10. 10.]
 [ 9.  8. 11.  9.]
 [ 8.  7.  9.  8.]
 [ 8.  6.  7.  7.]
 [ 8.  7.  9.  8.]
 [ 8.  7.  9.  8.]]
[[10. 10. 10. 10.]
 [10.  4.  9. 10.]
 [ 8.  9. 10.  5.]
 [ 9.  8. 10. 10.]
 [10.  5.  9.  9.]
 [ 6.  4. 10.  6.]
 [ 9.  8. 10.  9.]
 [10.  5.  9.  8.]
 [ 7.  8. 10.  8.]
 [ 9.  5.  9.  7.]
 [ 9.  8. 10.  8.]
 [ 9. 10. 10.  9.]
 [10.  9. 10.  8.]
 [ 5.  8.  5.  8.]
 [10.  8. 10. 10.]
 [ 9.  9. 10. 10.]
 [ 9.  8.  8.  8.]
 [10.  8.  1. 10.]
 [ 5.  6. 10. 10.]
 [ 8.  7. 10.  7.]]


**After 10 iterations, FunkSVD is converging: the predictions are getting close to the input data, but they do not match it yet.**
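
To put a number on "close", one option is the root mean squared error over the ratings that actually exist. The helper below is a minimal sketch; the name `rmse_observed` and the small example matrices are assumptions for illustration.

```python
import numpy as np

def rmse_observed(ratings, preds):
    """Root mean squared error over the entries of `ratings` that are not NaN."""
    ratings = np.asarray(ratings, dtype=float)
    preds = np.asarray(preds, dtype=float)
    observed = ~np.isnan(ratings)  # ignore user-movie pairs without a rating
    return np.sqrt(np.mean((ratings[observed] - preds[observed]) ** 2))

# Toy example: one missing rating, errors of 1, -2 and 0 on the other three
actual = np.array([[10.0, 4.0], [np.nan, 8.0]])
predicted = np.array([[9.0, 6.0], [7.0, 8.0]])
rmse = rmse_observed(actual, predicted)
print(rmse)
```

In the notebook this could be called as `rmse_observed(ratings_mat, np.dot(user_mat, movie_mat))`. For reference, a sum of squared errors of about 157 spread over 80 ratings corresponds to an RMSE of roughly 1.4 rating points.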

`3.` Let's try out the function again on the **user_movie_subset** dataset.  This time we will again use 4 latent features and a learning rate of 0.005.  However, let's bump up the number of iterations to 250.

In [13]:
#4 latent features, lr of 0.005 and 250 iterations
user_mat, movie_mat = FunkSVD(ratings_mat, latent_features=4, learning_rate=0.005, iters=250)

Optimization Statistics
Iterations | Mean Squared Error 
Iteration 0, Sum of Square Errors is 4826.222302182867.
Iteration 1, Sum of Square Errors is 2388.406437798486.
Iteration 2, Sum of Square Errors is 298.2232240809206.
Iteration 3, Sum of Square Errors is 184.21639178115743.
Iteration 4, Sum of Square Errors is 175.24141290394027.
Iteration 5, Sum of Square Errors is 172.99219125690047.
Iteration 6, Sum of Square Errors is 171.50509490766095.
Iteration 7, Sum of Square Errors is 170.11729557964387.
Iteration 8, Sum of Square Errors is 168.67304195874826.
Iteration 9, Sum of Square Errors is 167.11532461816952.
Iteration 10, Sum of Square Errors is 165.4132178115434.
Iteration 11, Sum of Square Errors is 163.5422056173107.
Iteration 12, Sum of Square Errors is 161.47920420928526.
Iteration 13, Sum of Square Errors is 159.20120120775312.
Iteration 14, Sum of Square Errors is 156.684977799582.
Iteration 15, Sum of Square Errors is 153.90721119872077.
Iteration 16, Sum of Square Erro

In [16]:
#Compare the predicted and actual results
print(np.dot(user_mat, movie_mat).round())
print(ratings_mat)

np.dot(user_mat, movie_mat).round()==ratings_mat

[[10. 10. 10. 10.]
 [10.  4.  9. 10.]
 [ 8.  9. 10.  5.]
 [ 9.  8. 10. 10.]
 [10.  5.  9.  9.]
 [ 6.  4. 10.  6.]
 [ 9.  8. 10.  9.]
 [10.  5.  9.  8.]
 [ 7.  8. 10.  8.]
 [ 9.  5.  9.  7.]
 [ 9.  8. 10.  8.]
 [ 9. 10. 10.  9.]
 [10.  9. 10.  8.]
 [ 5.  8.  5.  8.]
 [10.  8. 10. 10.]
 [ 9.  9. 10. 10.]
 [ 9.  8.  8.  8.]
 [10.  8.  1. 10.]
 [ 5.  6. 10. 10.]
 [ 8.  7. 10.  7.]]
[[10. 10. 10. 10.]
 [10.  4.  9. 10.]
 [ 8.  9. 10.  5.]
 [ 9.  8. 10. 10.]
 [10.  5.  9.  9.]
 [ 6.  4. 10.  6.]
 [ 9.  8. 10.  9.]
 [10.  5.  9.  8.]
 [ 7.  8. 10.  8.]
 [ 9.  5.  9.  7.]
 [ 9.  8. 10.  8.]
 [ 9. 10. 10.  9.]
 [10.  9. 10.  8.]
 [ 5.  8.  5.  8.]
 [10.  8. 10. 10.]
 [ 9.  9. 10. 10.]
 [ 9.  8.  8.  8.]
 [10.  8.  1. 10.]
 [ 5.  6. 10. 10.]
 [ 8.  7. 10.  7.]]


matrix([[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]])

**With 250 iterations, FunkSVD converges to the input data: every rounded prediction matches the actual rating.**
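
A perfect fit is possible here because the number of latent features equals the number of movies. In that case an exact factorization always exists, as the sketch below shows with a trivial one (the ratings themselves times an identity matrix). The three-row example matrix is an assumption taken from the first rows of the subset; gradient descent will generally find a different factorization with the same product.

```python
import numpy as np

ratings = np.array([[10.0, 10.0, 10.0, 10.0],
                    [10.0, 4.0, 9.0, 10.0],
                    [8.0, 9.0, 10.0, 5.0]])
n_movies = ratings.shape[1]

# With latent_features == n_movies, one exact factorization is trivial:
user_mat_exact = ratings.copy()    # users x latent features
movie_mat_exact = np.eye(n_movies)  # latent features x movies

exact = np.array_equal(user_mat_exact @ movie_mat_exact, ratings)
rank = np.linalg.matrix_rank(ratings)
print(exact, rank <= n_movies)
```

So reaching zero error on this subset shows that the optimizer works. It does not, by itself, show that predictions for unseen ratings will be good.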

The last time we placed an **np.nan** value into this matrix, the SVD routine in Python failed outright.  Let's see whether that is still the case with your FunkSVD function.  In the cell below, I have placed a nan into the first cell of your ratings matrix.

`4.` Use 4 latent features, a learning rate of 0.005, and 160 iterations. (The run shown below uses a learning rate of 0.003 and 170 iterations instead.)

In [17]:
# Here we are placing a nan into our original subset matrix
ratings_mat[0, 0] = np.nan
ratings_mat

matrix([[nan, 10., 10., 10.],
        [10.,  4.,  9., 10.],
        [ 8.,  9., 10.,  5.],
        [ 9.,  8., 10., 10.],
        [10.,  5.,  9.,  9.],
        [ 6.,  4., 10.,  6.],
        [ 9.,  8., 10.,  9.],
        [10.,  5.,  9.,  8.],
        [ 7.,  8., 10.,  8.],
        [ 9.,  5.,  9.,  7.],
        [ 9.,  8., 10.,  8.],
        [ 9., 10., 10.,  9.],
        [10.,  9., 10.,  8.],
        [ 5.,  8.,  5.,  8.],
        [10.,  8., 10., 10.],
        [ 9.,  9., 10., 10.],
        [ 9.,  8.,  8.,  8.],
        [10.,  8.,  1., 10.],
        [ 5.,  6., 10., 10.],
        [ 8.,  7., 10.,  7.]])

In [32]:
# run SVD on the matrix with the missing value
# we pick iters=170 (with learning_rate=0.003) based on the gradient descent error reports
user_mat, movie_mat = FunkSVD(ratings_mat, latent_features=4, learning_rate=0.003, iters=170)

Optimization Statistics
Iterations | Mean Squared Error 
Iteration 0, Sum of Square Errors is 4384.8268010163.
Iteration 1, Sum of Square Errors is 2875.370382671522.
Iteration 2, Sum of Square Errors is 1269.4701105474269.
Iteration 3, Sum of Square Errors is 388.0331647812819.
Iteration 4, Sum of Square Errors is 208.7159629007571.
Iteration 5, Sum of Square Errors is 183.22470588059068.
Iteration 6, Sum of Square Errors is 174.86690329985635.
Iteration 7, Sum of Square Errors is 170.90015940908893.
Iteration 8, Sum of Square Errors is 168.49540522090945.
Iteration 9, Sum of Square Errors is 166.6563533095037.
Iteration 10, Sum of Square Errors is 165.00430889335362.
Iteration 11, Sum of Square Errors is 163.38509929016976.
Iteration 12, Sum of Square Errors is 161.7301788541514.
Iteration 13, Sum of Square Errors is 160.005307570013.
Iteration 14, Sum of Square Errors is 158.19068162570426.
Iteration 15, Sum of Square Errors is 156.27290126300804.
Iteration 16, Sum of Square Errors 

In [33]:
# this cell shows that we are able to predict for the missing value
preds = np.dot(user_mat, movie_mat)
print("The predicted value for the missing rating is {}:".format(preds[0,0]))
print()
print("The actual value for the missing rating is {}:".format(ratings_mat[0,0]))

print("We just predicted a rating for a user-movie pair that was never rated!")
print("If we look in the original matrix, this was actually a value of 10. Not bad!")

The predicted value for the missing rating is 10.339980448147628:

The actual value for the missing rating is nan:
We just predicted a rating for a user-movie pair that was never rated!
If we look in the original matrix, this was actually a value of 10. Not bad!
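
The missing value does not break FunkSVD because `np.nan_to_num` turns the NaN error into 0, so that cell contributes nothing to either update. The sketch below demonstrates this on a 2x2 example; the matrices and variable names are assumptions for illustration.

```python
import numpy as np

ratings = np.array([[np.nan, 10.0],
                    [8.0, 9.0]])
user_mat = np.full((2, 2), 0.5)
movie_mat = np.full((2, 2), 0.5)

raw_error = ratings - user_mat @ movie_mat  # NaN where the rating is missing
error = np.nan_to_num(raw_error)            # NaN replaced by 0.0

# Update direction for user 0, as computed inside the vectorized step
step_user0 = error[0] @ movie_mat.T
# The same direction computed from the single movie that user 0 did rate
only_rated = error[0, 1] * movie_mat[:, 1]

print(np.isnan(raw_error[0, 0]), error[0, 0])
print(np.allclose(step_user0, only_rated))
```

The prediction for the missing cell still comes out of the product `user_mat @ movie_mat`, because the factors for that user and that movie are learned from their other ratings.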


Now let's extend this to a more realistic example. Unfortunately, running this function on a full-size user-movie matrix is still not something we are likely to be able to do on a local machine.  However, we can see how well this example extends to 1000 users.  

`5.` Given the size of this matrix, this will take quite a bit of time.  We start with the following hyperparameters: 4 latent features, 0.0001 learning rate, and 20 iterations. 

In [22]:
# Setting up a matrix of the first 1000 users with movie ratings
first_1000_users = np.matrix(user_by_movie.head(1000))

# perform FunkSVD on the matrix of the first 1000 users
# since the input data is much larger, we choose a smaller learning rate of 0.0001
user_mat, movie_mat = FunkSVD(first_1000_users, latent_features=4, learning_rate=0.0001, iters=20)

Optimization Statistics
Iterations | Mean Squared Error 
Iteration 0, Sum of Square Errors is 491846.0608011803.
Iteration 1, Sum of Square Errors is 475409.5593923182.
Iteration 2, Sum of Square Errors is 459325.3300650695.
Iteration 3, Sum of Square Errors is 443581.3046262823.
Iteration 4, Sum of Square Errors is 428180.6129619386.
Iteration 5, Sum of Square Errors is 413138.1177543705.
Iteration 6, Sum of Square Errors is 398477.13146615133.
Iteration 7, Sum of Square Errors is 384226.3036802334.
Iteration 8, Sum of Square Errors is 370416.7168794455.
Iteration 9, Sum of Square Errors is 357079.2592528531.
Iteration 10, Sum of Square Errors is 344242.35451722774.
Iteration 11, Sum of Square Errors is 331930.1224526031.
Iteration 12, Sum of Square Errors is 320161.0229222469.
Iteration 13, Sum of Square Errors is 308947.0052450753.
Iteration 14, Sum of Square Errors is 298293.14961641736.
Iteration 15, Sum of Square Errors is 288197.75364764425.
Iteration 16, Sum of Square Errors is

In [23]:
# 20 iterations ran in reasonable time, and the error decreased steadily but slowly.
# try a larger learning rate of 0.0005 and 100 iterations
user_mat, movie_mat = FunkSVD(first_1000_users, latent_features=4, learning_rate=0.0005, iters=100)

Optimization Statistics
Iterations | Mean Squared Error 
Iteration 0, Sum of Square Errors is 489343.7798939081.
Iteration 1, Sum of Square Errors is 404325.03625786636.
Iteration 2, Sum of Square Errors is 331101.6110006428.
Iteration 3, Sum of Square Errors is 274233.1102668536.
Iteration 4, Sum of Square Errors is 233370.6804447274.
Iteration 5, Sum of Square Errors is 204214.7163234572.
Iteration 6, Sum of Square Errors is 182639.06216156867.
Iteration 7, Sum of Square Errors is 165997.85547260824.
Iteration 8, Sum of Square Errors is 152714.66070486946.
Iteration 9, Sum of Square Errors is 141814.620873554.
Iteration 10, Sum of Square Errors is 132666.5929917988.
Iteration 11, Sum of Square Errors is 124846.71733890435.
Iteration 12, Sum of Square Errors is 118060.94905786942.
Iteration 13, Sum of Square Errors is 112098.96390800794.
Iteration 14, Sum of Square Errors is 106806.0374337026.
Iteration 15, Sum of Square Errors is 102065.48298146496.
Iteration 16, Sum of Square Errors
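
Rather than choosing the number of iterations by reading the error reports, the loop can stop on its own once the error stops improving. The sketch below is a variant of the update loop with such a rule. The function name `funk_svd_early_stop` and the parameters `max_iters`, `tol`, and `seed` are hypothetical and not part of the notebook's `FunkSVD`.

```python
import numpy as np

def funk_svd_early_stop(ratings, latent_features=2, learning_rate=0.005,
                        max_iters=1000, tol=1e-4, seed=0):
    """Batch gradient descent on the SSE that stops once the relative
    improvement between iterations drops below `tol` (or the SSE increases)."""
    ratings = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    user_mat = rng.random((ratings.shape[0], latent_features))
    movie_mat = rng.random((latent_features, ratings.shape[1]))

    old_sse = np.inf
    n_iters = 0
    for _ in range(max_iters):
        error = np.nan_to_num(ratings - user_mat @ movie_mat)
        sse = np.sum(error ** 2)
        n_iters += 1
        # stop when the SSE improved by less than tol * old_sse
        if old_sse - sse < tol * old_sse:
            break
        old_sse = sse
        user_mat += 2 * learning_rate * error @ movie_mat.T
        movie_mat += 2 * learning_rate * user_mat.T @ error
    return user_mat, movie_mat, n_iters, sse

toy = np.array([[10.0, 4.0, 9.0],
                [8.0, 9.0, np.nan],
                [9.0, 8.0, 10.0],
                [6.0, 4.0, 10.0]])
u, m, n_iters, final_sse = funk_svd_early_stop(toy)
print(n_iters, final_sse)
```

The same check also stops the loop if the error starts to grow, which is a common sign that the learning rate is too large.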

`6.` Now that we have a set of predictions for each user-movie pair, let's look at a few statistics about our results. 

In [24]:
# How many actual ratings exist in first_1000_users
num_ratings = np.sum(~np.isnan(first_1000_users))
print(f"The number of actual ratings in the first_1000_users is {num_ratings}.")


# How many predictions did we make for user-movie pairs that didn't actually have ratings
ratings_for_missing = np.sum(np.isnan(first_1000_users))
print(f"The number of ratings made for user-movie pairs that didn't have ratings is {ratings_for_missing}")

The number of actual ratings in the first_1000_users is 10852.
The number of ratings made for user-movie pairs that didn't have ratings is 31234148
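
These two counts also show how sparse the matrix is. The short calculation below derives the number of movies, the share of rated cells, and the approximate size of the dense matrix. It assumes the matrix has exactly 1000 rows and stores 8-byte floats, which the notebook does not state explicitly.

```python
num_ratings = 10852             # from the output above
ratings_for_missing = 31234148  # from the output above
n_users = 1000                  # assumption: first_1000_users has 1000 rows

total_cells = num_ratings + ratings_for_missing
n_movies = total_cells // n_users
density = num_ratings / total_cells
dense_megabytes = total_cells * 8 / 1e6  # assumption: float64 entries

print(f"{n_movies} movies, {density:.4%} of cells rated, "
      f"about {dense_megabytes:.0f} MB as a dense matrix")
```

Well under one percent of the cells hold a rating, so nearly every entry of `np.dot(user_mat, movie_mat)` is a prediction with no actual rating to compare against.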


In [25]:
# Test your results against the solution
assert num_ratings == 10852, "Oops!  The number of actual ratings doesn't quite look right."
assert ratings_for_missing == 31234148, "Oops!  The number of movie-user pairs that you made ratings for that didn't actually have ratings doesn't look right."

# Make sure you made predictions on all the missing user-movie pairs
preds = np.dot(user_mat, movie_mat)
assert np.isnan(preds).sum() == 0
print("Nice job!  Looks like you have predictions made for all the missing user-movie pairs! But I still have one question... How good are they?")

Nice job!  Looks like you have predictions made for all the missing user-movie pairs! But I still have one question... How good are they?
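
One common way to answer that question is to hide some known ratings before training and then compare the predictions against them. The sketch below shows only the splitting and scoring steps. The helper `holdout_split`, its parameters, and the column-mean stand-in predictions are assumptions for illustration; in the notebook, the stand-in would be replaced by `np.dot(user_mat, movie_mat)` from a FunkSVD run on `train`.

```python
import numpy as np

def holdout_split(ratings, test_fraction=0.2, seed=0):
    """Hide a random fraction of the known ratings.

    Returns a copy of `ratings` with the held-out cells set to NaN, and a
    boolean mask that marks those held-out cells.
    """
    ratings = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(~np.isnan(ratings))  # positions of known ratings
    n_test = max(1, int(test_fraction * rows.size))
    chosen = rng.choice(rows.size, size=n_test, replace=False)

    train = ratings.copy()
    train[rows[chosen], cols[chosen]] = np.nan
    test_mask = np.zeros(ratings.shape, dtype=bool)
    test_mask[rows[chosen], cols[chosen]] = True
    return train, test_mask

full = np.array([[10.0, 4.0, 9.0],
                 [8.0, 9.0, 10.0],
                 [9.0, 8.0, 10.0],
                 [6.0, 4.0, 10.0]])
train, test_mask = holdout_split(full)

# Stand-in predictions: each movie's mean rating in the training data
preds = np.tile(np.nanmean(train, axis=0), (full.shape[0], 1))

# Score only the held-out ratings, which the "model" never saw
test_rmse = np.sqrt(np.mean((full[test_mask] - preds[test_mask]) ** 2))
print(test_mask.sum(), test_rmse)
```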
