# Classic Closed-Form Singular Value Decomposition

This theory notebook demonstrates classic SVD. It will also demonstrate that _this technique is not suitable for data sets with missing values._ 

It is based on pre-cleaned MovieTweetings data. Data source: [MovieTweetings Data](https://github.com/sidooms/MovieTweetings/tree/master/recsyschallenge2014)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
# Read in the datasets

movies = pd.read_csv('data/movies_clean.csv')
reviews = pd.read_csv('data/reviews_clean.csv')

del movies['Unnamed: 0']
del reviews['Unnamed: 0']

In [4]:
# create data subset for SVD demonstration

# create user-by-item matrix
user_items = reviews[['user_id', 'movie_id', 'rating']]
user_by_movie = user_items.groupby(['user_id', 'movie_id'])['rating'].max().unstack()

# create subset for four movies
user_movie_subset = user_by_movie[[73486, 75314,  68646, 99685]].dropna(axis=0)
user_movie_subset

| user_id / movie_id | 73486 | 75314 | 68646 | 99685 |
|---|---|---|---|---|
| 265 | 10.0 | 10.0 | 10.0 | 10.0 |
| 1023 | 10.0 | 4.0 | 9.0 | 10.0 |
| 1683 | 8.0 | 9.0 | 10.0 | 5.0 |
| 6571 | 9.0 | 8.0 | 10.0 | 10.0 |
| 11639 | 10.0 | 5.0 | 9.0 | 9.0 |
| 13006 | 6.0 | 4.0 | 10.0 | 6.0 |
| 14076 | 9.0 | 8.0 | 10.0 | 9.0 |
| 14725 | 10.0 | 5.0 | 9.0 | 8.0 |
| 23548 | 7.0 | 8.0 | 10.0 | 8.0 |
| 24760 | 9.0 | 5.0 | 9.0 | 7.0 |


## SVD Intro

Now we will be performing Singular Value Decomposition.  To get started, let's remind ourselves about the dimensions of each of the matrices we are going to get back. Essentially, we are going to split the **user_movie_subset** matrix into three matrices:

$$ U \Sigma V^T $$

- U: How users are related to potential latent factors
- Σ (sigma): Weights for the latent factors on the diagonal (used to determine how many latent factors we keep)
- V-Transpose: How items are related to potential latent factors

For the dot product to be defined, the number of rows in each matrix must equal the number of columns in the preceding matrix. The matrices should have the following dimensions:

$$ U_{n \times k} $$

$$ \Sigma_{k \times k} $$

$$ V^T_{k \times m} $$

where:

1. n is the number of users
2. k is the number of latent features to keep
3. m is the number of movies
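
As a minimal sketch of these dimensions, the cell below decomposes a small random matrix. The sizes (6 users, 4 movies), the seed, and the variable names are assumptions made only for illustration and are not part of the MovieTweetings data. It uses `full_matrices=False`, which asks NumPy for the reduced form in which k = min(n, m).

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_users, n_movies = 6, 4
rng = np.random.default_rng(0)
ratings = rng.integers(1, 11, size=(n_users, n_movies)).astype(float)

# full_matrices=False returns the reduced SVD, where k = min(n, m)
u_thin, s_thin, vt_thin = np.linalg.svd(ratings, full_matrices=False)
sigma_thin = np.diag(s_thin)  # singular values placed on a k x k diagonal

print(u_thin.shape)      # n x k
print(sigma_thin.shape)  # k x k
print(vt_thin.shape)     # k x m
```

Each inner dimension matches its neighbour, so `u_thin @ sigma_thin @ vt_thin` is a valid product with shape n x m.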

In [7]:
# perform svd here on user_movie_subset

u, s, vt = np.linalg.svd(user_movie_subset) 
u.shape, s.shape, vt.shape

((20, 20), (4,), (4, 4))

(More about this functionality in the [documentation here](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.svd.html).)  

Note: The shapes of the returned objects do not yet allow us to take the dot product of the three.

Looking at the dimensions of the three returned objects, we can see the following:

 1. The u matrix is a square matrix with the number of rows and columns equaling the number of users. 
 2. The v transpose matrix is also a square matrix with the number of rows and columns equaling the number of items.
 3. The sigma matrix is returned as a one-dimensional array with 4 values, but it should be a diagonal matrix. NumPy's `np.diag` function helps with this.

In order to set up the matrices in a way that they can be multiplied together, we have a few steps to perform: 

 1. Turn sigma into a square matrix with the number of latent features we would like to keep. 
 2. Change the columns of u and the rows of v transpose to match this number of dimensions. 

 If we would like to exactly re-create the user-movie matrix, we could choose to keep all of the latent features.

In [8]:
# Change the dimensions of u, s, and vt as necessary to use four latent features

# update the shape of u and store in u_new
u_new = u[:, :len(s)]

# update the shape of s and store in s_new
s_new = np.zeros((len(s), len(s)))
s_new[:len(s), :len(s)] = np.diag(s) 

# With 4 latent features and 4 movies, vt already has the right shape, so vt_new equals vt
vt_new = vt

In [9]:
# check results

s_new.shape

(4, 4)
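
With the shapes aligned, the three matrices can be multiplied back together. The sketch below checks, on a random stand-in matrix (an assumption, since the cleaned review files are not loaded here), that keeping every latent feature recovers the original matrix up to floating-point error. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for user_movie_subset: 20 users by 4 movies, ratings from 1 to 10
toy = rng.integers(1, 11, size=(20, 4)).astype(float)

u_full, s_vals, vt_full = np.linalg.svd(toy)  # u_full is 20 x 20, vt_full is 4 x 4
k = len(s_vals)

u_k = u_full[:, :k]    # keep the first k columns of u
s_k = np.diag(s_vals)  # k x k diagonal matrix
vt_k = vt_full         # already k x 4

rebuilt = u_k @ s_k @ vt_k
print(np.allclose(rebuilt, toy))
```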

### Calculate variability captured through latent features

The sigma matrix tells us how much of the original variability in the user-movie matrix is captured by each latent feature. _The total amount of variability to be explained is the sum of the squared diagonal elements._ Because the ratings are not mean-centered here, this total equals the sum of the squared entries of the matrix (its squared Frobenius norm) rather than a statistical variance. The amount of variability explained by the first component is the square of the first value in the diagonal, the amount explained by the second component is the square of the second value, and so on.

In [16]:
# calculate total variability in the matrix and the share explained by the first two latent features
total_var = np.sum(s**2)
var_exp_comp1_and_comp2 = s[0]**2 + s[1]**2
perc_exp = (var_exp_comp1_and_comp2 / total_var)

# print the results

print("The total variance in the original matrix is {}.".format(total_var))
print("The percentage of variability captured by the first two components is {}%.".format(round(perc_exp * 100, 2)))

The total variance in the original matrix is 5877.0.
The percentage of variability captured by the first two components is 98.55%.
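
The same calculation extends to every possible number of components. The sketch below, run on a random stand-in matrix (an assumption), confirms that the squared singular values sum to the sum of squared matrix entries, then computes the cumulative share captured by the first 1, 2, 3, and 4 components. The 95% cut-off used to pick k is an arbitrary illustrative threshold, not a recommendation from this notebook.

```python
import numpy as np

rng = np.random.default_rng(2)
toy = rng.integers(1, 11, size=(20, 4)).astype(float)  # stand-in ratings matrix

# compute_uv=False returns only the singular values
s_vals = np.linalg.svd(toy, compute_uv=False)

total = np.sum(s_vals ** 2)
frob_sq = np.sum(toy ** 2)  # sum of squared entries of the matrix
print(np.isclose(total, frob_sq))

# Cumulative share of variability captured by the first 1, 2, ... components
cum_share = np.cumsum(s_vals ** 2) / total
print(np.round(cum_share, 4))

# Smallest k whose cumulative share reaches the (arbitrary) 95% threshold
k_95 = int(np.searchsorted(cum_share, 0.95)) + 1
print(k_95)
```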


### Reduce number of latent features

As in the previous step, change the shapes of the u, sigma, and v transpose matrices. This time, however, use only the first 2 components to reproduce the user-movie matrix instead of all 4.

In [17]:
# update the shape of u 
u_2 = u[:, :2]

# update the shape of s 
s_2 = np.zeros((2, 2))
s_2[:2, :2] = np.diag(s[:2]) 

# update the shape of vt 
vt_2 = vt[:2]

In [23]:
# Check that your matrices are the correct shapes
assert u_2.shape == (20, 2), "Oops!  The shape of the u matrix doesn't look right. It should be 20 by 2."
assert s_2.shape == (2, 2), "Oops!  The shape of the sigma matrix doesn't look right.  It should be 2 x 2."
assert vt_2.shape == (2, 4), "Oops! The shape of the v transpose matrix doesn't look right.  It should be 2 x 4."

### Reproduce original matrix with only 2 latent features

When using all 4 latent features, the product of the three matrices reproduces the user-movie matrix exactly (up to floating-point error). Now that we only have 2 latent features, we can measure how well we reproduce the original matrix with the sum of squared errors between each rating produced by the dot product and the actual rating. The following cells compute that sum of squared errors for the two-feature reconstruction.

In [19]:
# Compute the dot product
pred_ratings = u_2.dot(s_2).dot(vt_2)

# Compute the squared error for each predicted vs. actual rating
sum_square_errs = np.sum(np.sum((user_movie_subset - pred_ratings) ** 2))

In [20]:
sum_square_errs

85.33796548142435
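
This error ties back to the singular values: for a rank-k truncation, the sum of squared errors equals the sum of the squared singular values that were discarded. In this notebook's numbers, 85.34 / 5877 is about 1.45%, which is the share left over after the first two components capture 98.55%. The sketch below checks the identity on a random stand-in matrix (an assumption), with illustrative variable names.

```python
import numpy as np

rng = np.random.default_rng(3)
toy = rng.integers(1, 11, size=(20, 4)).astype(float)  # stand-in ratings matrix

u_full, s_vals, vt_full = np.linalg.svd(toy, full_matrices=False)

k = 2  # number of latent features to keep
approx = u_full[:, :k] @ np.diag(s_vals[:k]) @ vt_full[:k, :]

sse = np.sum((toy - approx) ** 2)    # reconstruction error
discarded = np.sum(s_vals[k:] ** 2)  # squared singular values left out
print(np.isclose(sse, discarded))
```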

**Usefulness of reduced latent features:** Why would we want to choose a k that doesn't just give us back the full user-movie matrix with all the original ratings?

One reason is computational: you may want to reduce the dimensionality of the data you are keeping. But this isn't the main reason to reduce k below the minimum of the number of movies and the number of users.

Let's take a step back for a second.  In this example we just went through, your matrix was very clean.  That is, for every user-movie combination, we had a rating.  **There were no missing values.** But what we know from the previous lesson is that the user-movie matrix is full of missing values.  

Therefore, if we keep all k latent features, the latent features with smaller values in the sigma matrix will likely explain variability that is due to noise rather than signal. Furthermore, if we use these "noisy" latent features to help reconstruct the original user-movie matrix, the predicted ratings will likely be worse than if we keep only the latent features associated with signal.

## Add a missing value

This demonstrates that closed-form SVD fails when the matrix contains NaN values: NumPy raises an error instead of returning a decomposition.

In [22]:
# Replace the very first entry of the matrix with NaN
user_movie_subset.iloc[0, 0] = np.nan

# Try SVD on the matrix that now has a missing value
u, s, vt = np.linalg.svd(user_movie_subset)

LinAlgError: SVD did not converge
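
Because the solver's error message does not mention missing data, it can help to check for NaN values before attempting the decomposition. The helper below, `svd_or_raise`, is a hypothetical function written for this illustration (it is not part of NumPy or of this notebook), and the small matrices are made-up examples.

```python
import numpy as np

def svd_or_raise(matrix):
    """Hypothetical helper: refuse to decompose a matrix that contains NaN."""
    values = np.asarray(matrix, dtype=float)
    n_missing = int(np.isnan(values).sum())
    if n_missing > 0:
        raise ValueError(
            f"matrix has {n_missing} missing value(s); "
            "closed-form SVD needs a complete matrix"
        )
    return np.linalg.svd(values, full_matrices=False)

# Made-up example data: one complete matrix and one copy with a gap
complete = np.array([[10.0, 4.0], [8.0, 9.0], [6.0, 4.0]])
with_gap = complete.copy()
with_gap[0, 0] = np.nan

u_ok, s_ok, vt_ok = svd_or_raise(complete)  # decomposes normally

try:
    svd_or_raise(with_gap)
except ValueError as err:
    print(err)  # reports the missing value instead of a convergence failure
```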

---