[Assignment Lecture](https://youtu.be/g18M1B_PFJw)

[View in Colaboratory](https://colab.research.google.com/github/schwaaweb/aimlds1_08-UnsupervisedLearning/blob/master/Th08_A--DJ--Foundations_of_Recommendation_Systems.ipynb)

**Objectives:**

1) Acquire an understanding of the different flavors/techniques for making recommendations 

2) Utilize Matrix Factorization to estimate/predict missing user ratings
 
3) Measure Recommendation Accuracy

### Objective 1: Acquire an understanding of the different flavors/techniques for making recommendations

**1)** **Content-based Recommendations** i.e. recommend products/items based on the attributes of the product/item

Content-based recommendation engines utilize their knowledge of the attributes of each product/item to recommend new products/items.

Let's say that you tell a colleague at work that you just watched the movie **Rock** starring *Nicolas Cage* and that you really liked it. Based on this information, your colleague might recommend that you watch the movie **Con Air** next. Both movies are action-packed thrillers and both feature the same movie star. It could be a good recommendation because the movies have a few attributes in common. This is the fundamental idea behind content-based recommendation engines: they recommend products that have similar attributes to a product that the user already liked.

**Point to Note**: Content-based recommendations can be effective only if you have *descriptive data* available for each product/item that you want to recommend. However, creating detailed information for each product/item in your inventory is not only a time-consuming process but it also introduces a fair degree of subjectivity that can throw off your recommendation results.
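To make the idea concrete, here is a minimal sketch of a content-based lookup. The attribute vectors below are invented for illustration (they are not part of the lecture data), and cosine similarity is just one reasonable way to compare them.

```python
import numpy as np

# Hypothetical attribute vectors (made up for this sketch):
# [action, thriller, animation, stars_nicolas_cage]
movies = {
    "Rock": np.array([1.0, 1.0, 0.0, 1.0]),
    "Con Air": np.array([1.0, 1.0, 0.0, 1.0]),
    "Finding Nemo": np.array([0.0, 0.0, 1.0, 0.0]),
    "Kubo and the 2 Strings": np.array([1.0, 0.0, 1.0, 0.0]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(liked_title, catalog):
    """Rank every other title by attribute similarity to the liked title."""
    liked = catalog[liked_title]
    scores = {
        title: cosine_similarity(liked, vec)
        for title, vec in catalog.items()
        if title != liked_title
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = most_similar("Rock", movies)
print(ranked[0][0])  # the title whose attributes are closest to "Rock"
```

The quality of the result depends entirely on how the attribute vectors were filled in, which is where the subjectivity mentioned above enters.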

**2)** **Collaborative Filtering** i.e. recommend products solely based on user ratings

Collaborative filtering recommendation engines generate recommendations based solely on how users rated products in the past. The engine knows nothing about the products themselves; it only knows how users rated them, and it uses those past ratings to make new recommendations.

Let's say that **Thomson** has rated the following movies:

Rock: 5-Star

Con Air: 5-Star

Finding Nemo: 2-Star

Kubo and the 2 Strings: 1-Star


**Caleb** has given the following ratings to the movies:

Rock: 5-Star

Con Air: (No Rating)

Finding Nemo: 2-Star

Kubo and the 2 Strings: (No Rating)


**Aaron** has given the following ratings to the movies:

Rock: 2-Star

Con Air: 1-Star

Finding Nemo: 5-Star

Kubo and the 2 Strings: 4-Star

**Susanna** has given the following ratings to the movies:

Rock: 2-Star

Con Air: 1-Star

Finding Nemo: 5-Star

Kubo and the 2 Strings: 4-Star


Based on these user ratings, a collaborative filtering recommendation engine would recommend the movie "Con Air" to Caleb, since Thomson and Caleb seem to have similar tastes and Thomson rated "Con Air" highly.
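The reasoning in this example can be sketched in a few lines of code. This is only an illustration under simple assumptions: similarity is measured as Euclidean distance over the movies both users rated, and only the single nearest user is consulted. Real engines typically use more robust measures.

```python
import numpy as np

titles = ["Rock", "Con Air", "Finding Nemo", "Kubo and the 2 Strings"]

# Ratings from the example above; NaN marks "No Rating"
ratings = {
    "Thomson": np.array([5, 5, 2, 1], dtype=float),
    "Caleb": np.array([5, np.nan, 2, np.nan]),
    "Aaron": np.array([2, 1, 5, 4], dtype=float),
    "Susanna": np.array([2, 1, 5, 4], dtype=float),
}

def distance(a, b):
    """Euclidean distance over the movies that both users have rated."""
    both = ~np.isnan(a) & ~np.isnan(b)
    return float(np.sqrt(np.sum((a[both] - b[both]) ** 2)))

target = "Caleb"
others = [user for user in ratings if user != target]

# The user whose past ratings are closest to the target's
nearest = min(others, key=lambda user: distance(ratings[target], ratings[user]))

# Among the movies the target has not rated, pick the neighbor's favorite
unseen = np.isnan(ratings[target])
candidates = [(titles[i], ratings[nearest][i]) for i in range(len(titles)) if unseen[i]]
best_title, best_rating = max(candidates, key=lambda c: c[1])

print(nearest, best_title, best_rating)
```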

**Collaborative filtering** has an advantage over content-based recommendations: once you have user review data, you can build a recommendation system without any knowledge of the attributes of the products/items being recommended. However, if there is no information on user reviews/ratings, recommendations cannot be made.

**Some limitations of Collaborative Filtering:**

Collaborative Filtering only works when you already have user reviews to work from. If you do not have any reviews, you cannot make recommendations. Also, collaborative filtering tends to favor items with lots of reviews over items with few reviews. This can make it difficult for users to uncover new releases since they are not likely to get recommended as often.




### Objective 2: Utilize Matrix Factorization to estimate/predict missing user ratings

To develop a framework for making recommendations, we will walk through an example:

**Goal**: Predict missing user ratings so that recommendations about similar products/items (i.e. movies for the purposes of this example) can be made

**Available Data**: **a)** User ratings for movies, **b)** List of movie titles and the respective genres

**Process**: In order to compute missing user ratings, we need 2 key pieces of information:

1) the user preferences across a range of attributes (the attributes could encompass Action, Comedy, Romance, Horror, etc.) 

2) the ratings for *each* movie across the same set of attributes

We will factor the "User ratings for movies" data set into two matrices: a "User Ratings" matrix and a "Movie Ratings" matrix, both expressed across the same set of attributes. Multiplying these two matrices (ordinary **Matrix Multiplication** from Linear Algebra) yields a "User Movie Rating" value for every user and every movie, including the movies a user has not rated. Those values serve as estimates/predictions for the missing user ratings, so that appropriate recommendations can be surfaced to each user.

**User Movie Ratings** = User Ratings (across attributes) **X** Movie Ratings (across the same set of attributes) 

*Point to note*: When computing the two matrices, User Ratings (across attributes) and Movie Ratings (across attributes), we have no idea what each attribute/feature is. All we know is that each attribute/feature represents some characteristic that has attracted users to certain products/items (in this case movies). Since we cannot describe those characteristics in words, they are called **latent** features. The word latent just means hidden.
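As a small illustration of the multiplication step, the sketch below uses made-up numbers for 2 users, 3 movies, and 2 latent features. The variable names and values are hypothetical; only the matrix shapes mirror the setup described above.

```python
import numpy as np

# Hypothetical user preferences: 2 users x 2 latent features
user_features = np.array([[1.0, 0.5],
                          [0.2, 1.5]])

# Hypothetical movie attributes: 3 movies x 2 latent features
movie_features = np.array([[4.0, 1.0],
                           [1.0, 3.0],
                           [2.0, 2.0]])

# User Movie Ratings = User Ratings x (Movie Ratings transposed)
# Result has one row per user and one column per movie.
predicted = user_features @ movie_features.T

print(predicted.shape)
print(predicted)
```

Each entry is the dot product of one user's row with one movie's row, so a user scores highly on movies whose latent features line up with that user's preferences.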



In [2]:
# Import the requisite packages
import numpy as np
import pandas as pd
from pandas import Series, DataFrame


#df = pd.read_csv('https://www.dropbox.com/s/4ec9l887mth6rep/movie_ratings.csv?raw=1')
#print(df['userid'])
#df.head()

In [5]:
#!wget -c https://www.dropbox.com/s/4ec9l887mth6rep/movie_ratings.csv?raw=1
#!mv movie_ratings.csv?raw=1 movie_ratings.csv
#!ls -lh movie_ratings.csv ml-latest-small/*
df = pd.read_csv('movie_ratings.csv')
df.head()


   userid  movieid  rating
0       1       28       4
1       1       26       4
2       1        9       4
3       1        1       4
4       1       14       4

In [6]:
# Pivot the dataset so that each row is a user and each column is a movie (unrated entries are filled with 0)
df_pivot = pd.pivot_table(df, index='userid', columns='movieid', aggfunc=np.max).fillna(0)
print(df_pivot)

        rating                                              ...             \
movieid     1    2    3    4    5    6    7    8    9    10 ...    25   26   
userid                                                      ...              
1          4.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  4.0  0.0 ...   0.0  4.0   
2          5.0  5.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0   
3          4.0  4.0  5.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0   
4          5.0  5.0  0.0  5.0  5.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0   
5          5.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  5.0  0.0 ...   0.0  0.0   
6          5.0  5.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0   
7          5.0  0.0  0.0  2.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0   
8          4.0  0.0  5.0  0.0  0.0  0.0  0.0  0.0  0.0  5.0 ...   0.0  0.0   
9          5.0  0.0  5.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0   
10         4.0  0.0  4.0  0.0  0.0  0.0  0.0  4.0  0.0  0.0 ... 

We will utilize **Matrix Factorization** to arrive at the 2 matrices, i.e. **a)** User Ratings (across attributes) and **b)** Movie Ratings (across attributes). Once we have the 2 matrices, we will compute the "dot" product of the 2 matrices to come up with an estimate/prediction for the missing user ratings.

There are a few different variations of the "Matrix Factorization" implementation. We will use the code put together by Albert Au Yeung: http://www.quuxlabs.com/wp-content/uploads/2010/09/mf.py_.txt

The mathematics behind "Matrix Factorization" can be reviewed here: http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/#source-code

Another resource on the concepts of "Matrix Factorization": https://www.youtube.com/watch?v=o8PiWO8C3zs
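As a reading aid, the update rules applied by the `matrix_factorization` function in the next cell can be written out as follows. This is transcribed from the code itself, with $r_{ij}$ denoting an entry of `R`, $p_{ik}$ an entry of `P`, $q_{kj}$ an entry of the transposed `Q`, $\alpha$ the learning rate `alpha`, and $\beta$ the regularization parameter `beta`; see the tutorial linked above for the derivation.

For every observed rating ($r_{ij} > 0$), the prediction error is

$$
e_{ij} = r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj}
$$

and each latent feature $k$ is nudged in the direction that reduces it:

$$
p_{ik} \leftarrow p_{ik} + \alpha \left( 2\, e_{ij}\, q_{kj} - \beta\, p_{ik} \right),
\qquad
q_{kj} \leftarrow q_{kj} + \alpha \left( 2\, e_{ij}\, p_{ik} - \beta\, q_{kj} \right)
$$

After each pass, the code accumulates the regularized error over the observed ratings and stops early if it falls below 0.001:

$$
E = \sum_{(i,j)\,:\, r_{ij} > 0} \left[ \left( r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj} \right)^{2} + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{ik}^{2} + q_{kj}^{2} \right) \right]
$$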




In [8]:
#
# Created by Albert Au Yeung (2010)
#
# An implementation of matrix factorization
#


def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T
    for step in range(steps):
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - np.dot(P[i,:],Q[:,j])
                    for k in range(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        eR = np.dot(P,Q)
        e = 0
        for i in range(len(R)):
            for j in range(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - np.dot(P[i,:],Q[:,j]), 2)
                    for k in range(K):
                        e = e + (beta/2) * ( pow(P[i][k],2) + pow(Q[k][j],2) )
        if e < 0.001:
            break
    return P, Q.T

In [12]:
%%time
   # R     : a matrix to be factorized, dimension N x M
   # P     : an initial matrix of dimension N x K
   # Q     : an initial matrix of dimension M x K
   # K     : the number of latent features
   # steps : the maximum number of steps to perform the optimization
   # alpha : the learning rate
   # beta  : the regularization parameter  
   R = df_pivot.values  # DataFrame.as_matrix() was removed in pandas 1.0; .values works across versions
   N = len(R)
   M = len(R[0])
   K = 2

   P = np.random.rand(N,K) # Matrix of user attributes
   Q = np.random.rand(M,K) # Matrix of movie attributes

   nP, nQ = matrix_factorization(R, P, Q, K)

CPU times: user 2min 25s, sys: 462 ms, total: 2min 26s
Wall time: 2min 26s


In [13]:
# User Movie Ratings = User Ratings (across attributes) x Movie Ratings (across the same set of attributes)

nR = np.dot(nP, nQ.T)

print(nR) # Estimate/Prediction for the user movie ratings
print(nR.shape)
print(nR[0,:]) # Ratings from first user
#print(nP)

#print(nQ)

[[4.23005046 4.12387145 4.40580732 ... 3.81956507 3.54905316 3.82854076]
 [4.65133688 4.52913345 4.84795448 ... 4.26417755 3.87550731 4.28575577]
 [4.33281832 4.26529507 4.48744814 ... 3.42652526 3.83964444 3.34712464]
 ...
 [5.61403284 5.47140451 5.84834668 ... 5.08939029 4.70175372 5.10497616]
 [4.40819766 4.33906574 4.5657849  ... 3.49125041 3.90429315 3.41140078]
 [3.74908253 3.58653006 3.9470117  ... 4.19170476 2.80628727 4.34671711]]
(100, 34)
[4.23005046 4.12387145 4.40580732 3.52214983 4.2507842  4.14965001
 3.89504467 4.15875503 4.15002301 4.33940683 3.95003292 4.13607658
 4.23989422 4.01826964 3.76730842 4.08296213 2.67825045 3.00565499
 3.52235554 3.78140249 4.11902559 4.03674408 3.39647884 4.29631405
 3.96961579 4.28912501 3.85984153 4.01363584 3.25424957 3.29716875
 3.07097633 3.81956507 3.54905316 3.82854076]


### Objective 3: Measure Recommendation Accuracy

To measure the accuracy of the estimated "User Movie Ratings", we will use a statistical metric called root-mean-square error (RMSE). RMSE is a measure of the difference between the user's *actual* movie ratings and the ratings that were predicted for the same movies.

The lower the RMSE, the more accurate the model; an RMSE of zero means the model reproduces the user ratings perfectly. If the RMSE on the test set is much higher than on the training set, the model has likely overfitted the training data. In this example scenario, the testing RMSE is about 1.3, so we are off by roughly one rating star when predicting ratings the model has not seen.
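A tiny worked example may help show how the metric behaves when some ratings are missing. The arrays below are made up; the `rmse` helper is a sketch that ignores missing (NaN) entries by using `np.nanmean`.

```python
import numpy as np

def rmse(real, predicted):
    """Root-mean-square error over the entries where a real rating exists."""
    return np.sqrt(np.nanmean(np.square(real - predicted)))

# Toy data: NaN marks a movie the user did not rate
real = np.array([[5.0, np.nan],
                 [np.nan, 3.0]])
predicted = np.array([[4.0, 2.0],
                      [1.0, 5.0]])

# Only two entries are compared: (5 vs 4) and (3 vs 5).
# Squared errors are 1 and 4, their mean is 2.5, and the RMSE is sqrt(2.5).
score = rmse(real, predicted)
print(score)
```

Because larger errors are squared before averaging, a single prediction that is far off raises the RMSE more than several small misses do.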

We could adjust the **regularization** parameter (`beta` in `matrix_factorization`) to improve the accuracy of the recommendations. Regularization limits the amount of weight we place on a single attribute when finding user/item features with matrix factorization. The higher we set the regularization amount, the less weight we put on any single attribute. When you are building a recommendation engine, you will want to experiment with different regularization values to see how they affect the quality of your recommendations.
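One way to run such an experiment is sketched below. Everything in it is an assumption made for illustration: `factorize` is a compact stochastic-gradient variant written for this sketch (not the lecture's function), the rating matrix is a small toy example, and the two held-out cells were picked by hand.

```python
import numpy as np

def factorize(R, K=2, steps=2000, alpha=0.002, beta=0.02, seed=0):
    """Simplified SGD matrix factorization; zeros in R are treated as missing."""
    rng = np.random.default_rng(seed)
    P = rng.random((R.shape[0], K))  # user latent features
    Q = rng.random((R.shape[1], K))  # movie latent features
    rows, cols = np.nonzero(R)
    for _ in range(steps):
        for i, j in zip(rows, cols):
            err = R[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()
            P[i] += alpha * (2 * err * Q[j] - beta * P[i])
            Q[j] += alpha * (2 * err * p_old - beta * Q[j])
    return P, Q

def rmse_on(mask, R, R_hat):
    """RMSE restricted to the cells selected by a boolean mask."""
    return float(np.sqrt(np.mean((R[mask] - R_hat[mask]) ** 2)))

# Toy ratings (0 = not rated)
R_full = np.array([[5, 3, 0, 1],
                   [4, 0, 0, 1],
                   [1, 1, 0, 5],
                   [1, 0, 0, 4],
                   [0, 1, 5, 4]], dtype=float)

# Hide two known ratings from training and use them as a test set
test_mask = np.zeros_like(R_full, dtype=bool)
test_mask[0, 1] = True
test_mask[3, 3] = True
R_train = np.where(test_mask, 0.0, R_full)
train_mask = R_train > 0

results = {}
for beta in [0.0, 0.02, 0.5]:
    P, Q = factorize(R_train, beta=beta)
    R_hat = P @ Q.T
    results[beta] = (rmse_on(train_mask, R_full, R_hat),
                     rmse_on(test_mask, R_full, R_hat))
    print("beta={}: train RMSE={:.3f}, test RMSE={:.3f}".format(beta, *results[beta]))
```

With only a handful of ratings the numbers are noisy, so treat the comparison as a template for running the same sweep on the real training and testing sets rather than as evidence for any particular `beta`.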

**Note**: One limitation that we faced in this example scenario is that we only had a few hundred movie reviews to work with; the best thing that could be done to improve accuracy in this case is to get more user reviews. More movie reviews will give our engine more information to work with so it can do a better job of making recommendations.





In [15]:
# Import the requisite packages
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# Load user ratings
df_training = pd.read_csv('https://www.dropbox.com/s/ej76ujnyxmxn2mi/movie_ratings_training.csv?raw=1')
df_testing = pd.read_csv('https://www.dropbox.com/s/h5jwlmnaxu5xdvq/movie_ratings_testing.csv?raw=1')

# Convert the running list of user ratings into a matrix
df_training = pd.pivot_table(df_training, index='userid', columns='movieid', aggfunc=np.max)
df_testing = pd.pivot_table(df_testing, index='userid', columns='movieid', aggfunc=np.max)

R = df_training.values  # DataFrame.as_matrix() was removed in pandas 1.0; .values works across versions
N = len(R)
M = len(R[0])
K = 2

P = np.random.rand(N,K) # Matrix of user attributes
Q = np.random.rand(M,K) # Matrix of movie attributes

nP, nQ = matrix_factorization(R, P, Q, K)

# Find all predicted ratings by multiplying nP and nQ 
   
nR = np.dot(nP, nQ.T)

# print(nR) # Estimate/Prediction for the user movie ratings

def RMSE(real, predicted):

  return np.sqrt(np.nanmean(np.square(real - predicted)))

# Measure RMSE
training_rmse = RMSE(df_training.values, nR)
testing_rmse = RMSE(df_testing.values, nR)

print("Training RMSE is: {}".format(training_rmse))
print("Testing RMSE is: {}".format(testing_rmse))

Training RMSE is: 0.5892891585261517
Testing RMSE is: 1.3177666668894445


Training RMSE is: 0.6045913541547454
Testing RMSE is: 1.2873565533092435


**Additional Resources on Collaborative Filtering for your reference:** https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0

# Assignment

Write a function `recommend_for_user(user)` that takes a user and returns the top 5 recommended movies for them to watch that they have not already rated.

You should use the code from the lecture and coding challenge, and experiment with different approaches - you can use the predicted ratings we generated above, and you can experiment with different distance measurements or hybrid approaches (considering item similarity). You can also try an approach where e.g. you find the `n` users most similar to the given user, and average their preferences to generate new ratings and recommendations.

The end result should be a function that is reusable and suitable for an application and that satisfies this spec:

```
def recommend_for_user(user, n=5):
  """
  Generate movie recommendations for a user.
  Input: userId (from MovieLens data)
  Output: list of tuples (movieId, predictedRating) of top n movies recommended for user (not previously rated by them)
  """
  pass
```
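As one possible starting point (not a reference solution), the sketch below ranks a user's unrated movies by predicted rating. The helper name `recommend_from_predictions` and its extra arguments are hypothetical; it assumes a ratings table with plain `movieid` column labels and NaN for unrated movies, plus a matrix of predicted ratings with the same shape. The toy data is invented for the demonstration.

```python
import numpy as np
import pandas as pd

def recommend_from_predictions(user, ratings, predicted, n=5):
    """
    Hypothetical helper for this sketch.
    ratings:   DataFrame indexed by userid, one column per movieid, NaN = not rated
    predicted: NumPy array of predicted ratings with the same shape as ratings
    Returns a list of (movieId, predictedRating) for the top n unrated movies.
    """
    row = ratings.index.get_loc(user)                  # position of the user's row
    scores = pd.Series(predicted[row], index=ratings.columns)
    unrated = ratings.loc[user].isna()                 # True where the user has no rating
    top = scores[unrated].sort_values(ascending=False).head(n)
    return [(movie, float(score)) for movie, score in top.items()]

# Toy data (made up): 2 users, 3 movies
ratings = pd.DataFrame([[5, np.nan, np.nan],
                        [np.nan, 4, 2]],
                       index=[1, 2], columns=[10, 20, 30])
predicted = np.array([[4.8, 3.1, 4.2],
                      [2.0, 3.9, 2.5]])

recs = recommend_from_predictions(1, ratings, predicted, n=2)
print(recs)
```

A wrapper matching the `recommend_for_user(user, n=5)` spec could bind `ratings` and `predicted` once (for example through a closure or a small class), which keeps the data loading separate from the ranking logic.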

As a stretch goal, prepare the data such that you generate test data - get a subset of users and drop some of their top ratings (favorite movies), then generate recommendations for them and see if they get recommended those movies and at what rating.

Part of the goal here is to think about making your code general and reusable, so it could plug into an actual live recommendation system. As a super-stretch goal, you can explore popular Python web application frameworks - these are the tools used to build the back-end that exposes an API so data can be displayed to the user.

- http://flask.pocoo.org/ - popular minimal framework
- https://www.djangoproject.com/ - "industry-grade" framework (more complicated, includes more features)

In a real-world situation, your role in a team could be ensuring that the API endpoint for generating recommendations does what it should (so everyone else can build on it). In web application terminology, this means that some route (e.g. `/recommendations/<user>`) can accept requests and return JSON data in response, essentially wrapping the function you built for this assignment.