# Recommended Joke Notebook

Today, we'll use some joke rating data to explore recommender systems, in particular collaborative filtering.

Here's a link to the place where I got the data:
http://goldberg.berkeley.edu/jester-data/

Check out some jokes, and get recommended jokes based on your ratings:
http://eigentaste.berkeley.edu/

In [None]:
%matplotlib inline

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt

### Load the Data

- Each row is a user
- The first column is the total number of jokes rated by that user.
- The remaining columns are the ratings of the individual jokes, from -10 to 10
- An entry of 99 means no rating.

In [None]:
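data = pd.read_csv('jester_data.csv', header=None).values
number_rated = data[:, 0]
ratings = data[:, 1:]  # a view into data, so the edit below also shows up in data[:, 1:]
# 99 means "no rating"; replace it in the rating columns only, because the
# count in column 0 could legitimately be 99
ratings[ratings == 99] = 0
ratings[:5, :5]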

In [None]:
ratings.shape
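
Once 99 is replaced by 0, a missing rating looks the same as a rating of exactly 0, so it can help to record which entries were rated before overwriting them. Here is a minimal sketch on a made-up 3-by-4 array; the names `toy`, `toy_ratings` and `rated` are hypothetical and not part of the notebook:

```python
import numpy as np

# Toy stand-in for the Jester layout: column 0 is the count of jokes rated,
# the remaining columns are ratings, and 99 marks a missing rating.
toy = np.array([
    [2.0,  3.5, 99.0, -1.25],
    [3.0, -7.0,  0.5,  9.0],
    [1.0, 99.0, 99.0,  4.0],
])

toy_ratings = toy[:, 1:].copy()
rated = toy_ratings != 99          # True where the user actually rated the joke
toy_ratings[~rated] = 0            # same convention as the notebook: missing -> 0

print(rated.sum(axis=1))           # ratings per user; compare with column 0
print(toy_ratings)
```

The row sums of `rated` give a quick sanity check against the counts in the first column.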

# Do some EDA on the data

How many jokes does an average person rate? What is the largest number of jokes rated by a single user? What is the fewest?

How many ratings on average does each joke have?

Which joke has the most ratings? Which has the fewest?

What is the average rating for all jokes? (Be sure to exclude the 0 values!)
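
One possible way to approach these questions is with a boolean mask of the rated entries. The sketch below uses a small made-up matrix (not the Jester data) and assumes, as in the loading cell, that 0 means "not rated"; the names `toy_ratings`, `per_user` and `per_joke` are hypothetical:

```python
import numpy as np

# Hypothetical toy ratings: 4 users x 3 jokes, 0 means "not rated".
toy_ratings = np.array([
    [2.0,  0.0, -4.0],
    [0.0,  0.0,  6.0],
    [8.0, -2.0,  1.0],
    [0.0,  5.0,  0.0],
])
rated = toy_ratings != 0

per_user = rated.sum(axis=1)   # number of jokes each user rated
per_joke = rated.sum(axis=0)   # number of ratings each joke received

print("mean / max / min jokes rated per user:",
      per_user.mean(), per_user.max(), per_user.min())
print("mean ratings per joke:", per_joke.mean())
print("most-rated joke index:", per_joke.argmax(),
      "least-rated joke index:", per_joke.argmin())
print("mean rating over rated entries:", toy_ratings[rated].mean())
```

Note that `argmax` and `argmin` return only the first index when there is a tie, so on the real data it is worth checking for ties separately.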

# Unregularized SVD Based Collaborative Filtering

## Step 1: Full SVD

We'll start with a simple, though not very effective, approach to collaborative filtering: a plain SVD.

SVD (singular value decomposition) is a way of decomposing a matrix into things like eigenvectors and eigenvalues. It looks like this for a matrix M:

M = U * S * V

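Here, if we suppose M is an n by p matrix with n at least p:

- U is an n by p matrix
- S is a p by p diagonal matrix
- V is a p by p matrix (many textbooks write this factor as V transposed; it is what np.linalg.svd returns)

In our problem, we have n users rating p jokes.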

In recommender problems, the factors can be interpreted as follows:

- U is a representation of the users, as p features
- V is a representation of the jokes, as p features (different from those in U)
- S holds the singular values, which give the joint importance of both feature sets (np.linalg.svd returns them as a vector rather than as a diagonal matrix)

In [None]:
# full SVD; ratings already excludes the first column (the number of jokes rated), which we don't care about here
u, s, v = np.linalg.svd(ratings, full_matrices=False)

print(u.shape)
print(s.shape)
print(v.shape)

# these are (nearly) the same, that's the decomposition!
print(ratings)
print(np.dot(np.dot(u, np.diag(s)), v))  # the full reconstruction
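
Comparing two printed matrices by eye only goes so far, so a numeric check may be more convincing. This is a small sketch on a random stand-in matrix `M` (an assumption, since the real ratings file is not loaded here), written for Python 3:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.uniform(-10, 10, size=(6, 4))      # stand-in for the ratings matrix

u, s, v = np.linalg.svd(M, full_matrices=False)
reconstruction = u @ np.diag(s) @ v        # NumPy returns the last factor ready to multiply

print(u.shape, s.shape, v.shape)
print(np.allclose(M, reconstruction))      # True if equal up to floating-point error
print(np.abs(M - reconstruction).max())    # largest absolute error
```

The same `np.allclose` call should work on the real `ratings` matrix and its reconstruction.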

## Step 2: Filling in values by truncating the SVD

The SVD above just recovers the original matrix. That is not very interesting; we want a new estimate for the missing values. To get one, we need to drop dimensions.

Inspect S; you should see that it is sorted in decreasing order. Write code that keeps only the first few dimensions in each of the projections. This simply means replacing the matrix S (you will have to construct it using np.diag) with a new matrix in which the diagonal values after the first few are zeroed out.
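
The truncation described above could be sketched as follows. The helper name `truncated_reconstruction`, the choice `k=2`, and the random stand-in matrix are all assumptions made for illustration; this is one possible shape for the code rather than the intended solution:

```python
import numpy as np

def truncated_reconstruction(u, s, v, k):
    """Rebuild the matrix using only the k largest singular values."""
    s_trunc = s.copy()
    s_trunc[k:] = 0                      # zero out everything after the first k
    return u @ np.diag(s_trunc) @ v

rng = np.random.default_rng(1)
M = rng.uniform(-10, 10, size=(8, 5))    # stand-in for the ratings matrix
u, s, v = np.linalg.svd(M, full_matrices=False)

approx = truncated_reconstruction(u, s, v, k=2)
print(np.all(np.diff(s) <= 0))           # singular values are sorted, largest first
print(np.linalg.matrix_rank(approx))     # rank of the approximation
```

Keeping all singular values (here `k=5`) gives back the original matrix, which is a handy check that the helper is wired up correctly.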

You should now have an approximation to the original matrix. Compare the filled-in values (entries that were missing in the original) with the values that were already present; do this for a few users and jokes. You might find it useful to write a function to inspect a row or column. Consider doing things like:

0. Count how many values are filled in, how many were already present for that row / column
1. Compare the range of the filled in values with those not filled in within the row or column
2. Compare the mean of the filled in values with those not filled in within the row or column
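
The comparisons listed above could be gathered into one helper that works on a single row or column. The function name `summarize_filled_in` and the toy arrays are hypothetical, and 0 is assumed to mean "not rated" as in the loading cell:

```python
import numpy as np

def summarize_filled_in(original, approx):
    """Compare rated entries with entries the approximation filled in.

    `original` and `approx` are 1-D arrays for one user (row) or one joke
    (column); 0 in `original` is treated as "not rated".
    """
    missing = original == 0
    summary = {"n_filled": int(missing.sum()),
               "n_present": int((~missing).sum())}
    if missing.any():
        filled = approx[missing]
        summary["filled_range"] = (filled.min(), filled.max())
        summary["filled_mean"] = filled.mean()
    if (~missing).any():
        present = original[~missing]
        summary["present_range"] = (present.min(), present.max())
        summary["present_mean"] = present.mean()
    return summary

original_row = np.array([0.0, 4.0, -2.0, 0.0, 6.0])
approx_row = np.array([1.5, 3.0, -1.0, -0.5, 5.0])
print(summarize_filled_in(original_row, approx_row))
```

For a user you would pass `ratings[ii]` and the matching row of the approximation; for a joke, the matching columns.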

In [None]:
def inspect_filled_in_joke(ii):
    pass
    
def inspect_filled_in_user(ii):
    pass

inspect_filled_in_joke(14)
print("\n-----\n")
inspect_filled_in_user(2)

## Step 3: Let's actually use this thing. 

Write a function that, for a user with at least one unrated joke, recommends a joke they have not yet rated. Use it for a few people and print the results.
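
One way such a function might look is sketched below: it picks the unrated joke with the highest predicted rating. The name `recommend_joke`, its arguments, and the toy matrices are assumptions made for illustration, and "highest predicted rating" is just one reasonable recommendation rule:

```python
import numpy as np

def recommend_joke(user, ratings, approx):
    """Return the index of the unrated joke with the highest predicted rating.

    Returns None if the user has rated every joke. 0 in `ratings` is treated
    as "not rated", following the notebook's convention.
    """
    unrated = np.flatnonzero(ratings[user] == 0)
    if unrated.size == 0:
        return None
    return int(unrated[np.argmax(approx[user, unrated])])

toy_ratings = np.array([[0.0, 4.0, 0.0],
                        [3.0, -1.0, 2.0]])
toy_approx = np.array([[1.0, 3.5, 2.5],
                       [2.8, -0.5, 2.1]])
print(recommend_joke(0, toy_ratings, toy_approx))   # user 0: jokes 0 and 2 are unrated
print(recommend_joke(1, toy_ratings, toy_approx))   # user 1 has rated everything
```

In the notebook, `approx` would be the truncated reconstruction from Step 2.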

Use it on all users and make a histogram of the recommendations. Compare this to a histogram of missing values (the number of times each joke was not rated). Does this system recommend some jokes much more often than others? Does that make sense?
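
The counts behind both histograms can be computed without a Python loop. The sketch below uses toy matrices and hypothetical names (`times_recommended`, `times_missing`), and assumes the recommendation rule is "unrated joke with the highest predicted rating":

```python
import numpy as np

toy_ratings = np.array([[0.0, 4.0, 0.0],
                        [3.0, 0.0, 2.0],
                        [0.0, 0.0, 5.0],
                        [1.0, 2.0, 3.0]])
toy_approx = np.array([[1.0, 3.5, 2.5],
                       [2.8, 4.0, 2.1],
                       [0.5, 0.9, 4.0],
                       [1.1, 2.2, 2.9]])

unrated = toy_ratings == 0
# Mask out rated jokes with -inf so argmax only considers unrated ones.
masked = np.where(unrated, toy_approx, -np.inf)
has_unrated = unrated.any(axis=1)        # skip users who rated everything
recommended = masked[has_unrated].argmax(axis=1)

n_jokes = toy_ratings.shape[1]
times_recommended = np.bincount(recommended, minlength=n_jokes)
times_missing = unrated.sum(axis=0)
print(times_recommended)
print(times_missing)
```

Both arrays have one entry per joke, so they can be passed to `plt.bar` side by side for the comparison.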

## Step 4: Examining the joke and user spaces

The SVD gave us projections of both the users and the jokes. Let's see if those projections give us any insight.

Now, plot the first two rows (the projection dimensions appear on the rows here) of the joke projection matrix. Are there different types of jokes?

Plot the first two columns (the projection dimensions appear on the columns here) of the user projection matrix. Are there different types of users?
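
Pulling out the coordinates for the two plots might look like this. The random matrix is a stand-in for the ratings, and `joke_xy` and `user_xy` are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.uniform(-10, 10, size=(7, 4))    # stand-in: 7 users, 4 jokes
u, s, v = np.linalg.svd(M, full_matrices=False)

joke_xy = v[:2, :].T     # first two rows of v: one (x, y) point per joke
user_xy = u[:, :2]       # first two columns of u: one (x, y) point per user
print(joke_xy.shape, user_xy.shape)
```

From there, `plt.scatter(joke_xy[:, 0], joke_xy[:, 1])` and the same call on `user_xy` would draw the two scatter plots.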