## (30%) Finding similarity
Distance-based metrics are commonly used to measure the similarity between two items. This similarity will be applied in content-based recommendation. Such metrics have also long been used in Information Retrieval.
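Here is a minimal sketch of three common measures. It assumes items are already represented as numeric feature vectors; the example uses two hypothetical term-count vectors over a shared vocabulary. Converting Euclidean distance to a similarity with 1 / (1 + d) is only one of several conventions and is used here for illustration. The function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 = same direction)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def euclidean_similarity(a, b):
    """Map the Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d)."""
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def jaccard_similarity(a, b):
    """|A & B| / |A | B| over the sets of nonzero feature indices."""
    sa, sb = set(np.flatnonzero(a)), set(np.flatnonzero(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Two hypothetical items described by term counts over the same vocabulary
x = np.array([1.0, 2.0, 0.0, 1.0])
y = np.array([2.0, 4.0, 0.0, 0.0])

print(cosine_similarity(x, y))
print(euclidean_similarity(x, y))
print(jaccard_similarity(x, y))
```

Cosine similarity ignores vector length, which suits text of varying length. Euclidean distance is sensitive to magnitude. Jaccard only looks at which features are present.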

For a better understanding of these metrics, see the article [Distance Metrics for Fun and Profit](http://www.benfrederickson.com/distance-metrics/) by Ben Frederickson.

Based on the provided dataset (stories.csv), can you generate a similarity matrix between every pair of stories? Cell (i, j) of this matrix stores the similarity value between story i and story j. You can define your own similarity measure.
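One possible approach is sketched below. It represents each story as a bag-of-words vector of its text and compares stories with cosine similarity. The column layout of stories.csv is not described here, so the sketch uses a hypothetical in-memory list of story texts. In practice you would load the relevant text column (for example with pandas) and pass it in. TF-IDF weighting or other measures would be reasonable alternatives.

```python
import numpy as np

def similarity_matrix(texts):
    """Return an (n x n) matrix whose (i, j) cell is the cosine similarity
    between the bag-of-words vectors of texts i and j."""
    tokenized = [t.lower().split() for t in texts]
    vocab = {w: k for k, w in enumerate(sorted({w for toks in tokenized for w in toks}))}
    counts = np.zeros((len(texts), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            counts[i, vocab[w]] += 1          # term frequency of word w in text i
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # avoid division by zero for empty texts
    unit = counts / norms                     # L2-normalize each row
    return unit @ unit.T                      # cosine similarity for every pair

# Hypothetical story texts standing in for the contents of stories.csv
stories = [
    "the cat sat on the mat",
    "the cat chased the dog",
    "stock markets fell sharply today",
]
S = similarity_matrix(stories)
print(np.round(S, 2))
```

Every non-empty row is L2-normalized, so the diagonal is 1 and the matrix is symmetric, as expected of a similarity matrix.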

#### Solutions

## (40%) Model - Matrix Completion by Alternating Least Squares (ALS)

In practice, customers may decline to rate some items, which leaves missing entries in the rating matrix. Study the following materials.

For a better, more visual understanding of matrix factorization techniques, read the following:
- The original paper on [Collaborative Filtering for Implicit Feedback Datasets](http://yifanhu.net/PUB/cf.pdf)
- [Finding Similar Music using Matrix Factorization](http://www.benfrederickson.com/matrix-factorization/) by Ben Frederickson
- [Intro to Implicit Matrix Factorization: Classic ALS with Sketchfab Models](http://blog.ethanrosenthal.com/2016/10/19/implicit-mf-part-1/)

We will apply the ALS method to fill in the missing values (entered as 0) of the following rating matrix. It holds ratings from six customers (rows, numbered 1 to 6) on seven items (columns, labeled A to G):

```python
import numpy as np

R = np.array([[5, 0, 5, 0, 0, 1, 2], [0, 4, 0, 0, 2, 2, 1], [1, 0, 0, 1, 0, 1, 0], [0, 0, 1, 0, 1, 1, 0], [0, 2, 1, 0, 0, 1, 0], [0, 0, 3, 0, 1, 2, 0]])
```
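Here is a minimal sketch of ALS for this matrix. It assumes the explicit-feedback formulation: zeros are treated as missing, and only the observed ratings enter the squared-error loss, with an L2 penalty of strength lam. This is not the implicit-feedback variant from the Hu et al. paper linked above, which also weights the unobserved entries. The factor shapes follow the task below (P is 6 x K, Q is K x 7). The values K=2, lam=0.1 and 50 iterations are illustrative assumptions, and the function name als is hypothetical.

```python
import numpy as np

def als(R, K=2, lam=0.1, n_iters=50, seed=0):
    """Explicit-feedback ALS: fit R ~ P @ Q on the nonzero (observed) entries only.
    Zeros in R are treated as missing, not as ratings of 0."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, K))
    Q = rng.normal(scale=0.1, size=(K, n))
    mask = R > 0
    reg = lam * np.eye(K)
    for _ in range(n_iters):
        # Fix Q and solve a small ridge regression for each user's row of P
        for u in range(m):
            Qu = Q[:, mask[u]]                       # K x (#items rated by u)
            P[u] = np.linalg.solve(Qu @ Qu.T + reg, Qu @ R[u, mask[u]])
        # Fix P and solve a small ridge regression for each item's column of Q
        for i in range(n):
            Pi = P[mask[:, i]]                       # (#users who rated i) x K
            Q[:, i] = np.linalg.solve(Pi.T @ Pi + reg, Pi.T @ R[mask[:, i], i])
    return P, Q

R = np.array([[5, 0, 5, 0, 0, 1, 2], [0, 4, 0, 0, 2, 2, 1], [1, 0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1, 1, 0], [0, 2, 1, 0, 0, 1, 0], [0, 0, 3, 0, 1, 2, 0]], dtype=float)
P, Q = als(R)
R_hat = P @ Q          # every cell, including the former zeros, now has a score
print(np.round(R_hat, 2))
```

With one factor matrix fixed, the problem for the other becomes an ordinary ridge regression with a closed-form solution. Each solve is only a K x K linear system, which keeps ALS cheap when K is small.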

**Decomposition of the rating matrix** (R ≈ P Q)  
R -> m users, n items (m x n)  
Row in P -> a user's affinity to the features (P is m x f, for f latent factors/features)  
Column in Q -> an item's relation to the features (Q is f x n, for f latent factors/features)  

- Each row of P indicates how much a user likes each of the f latent factors  
- Each column of Q indicates how strongly an item exhibits each of the f latent factors
- The dot product of a user's row in P and an item's column in Q estimates how much the user likes the item  
- The decomposition learns the latent features automatically from the ratings
- The learned features may not be intuitive, though!
- The model has hyperparameters (number of latent factors, regularization strength, learning rate)
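Written out, the bullets above correspond to the standard regularized objective below. This is a sketch in common notation, not quoted from the linked sources. Here $\mathbf{p}_u$ is row $u$ of $P$, $\mathbf{q}_i$ is column $i$ of $Q$, and $\Omega$ is the set of observed (nonzero) entries. $\lambda$ is the regularization strength, $\eta$ is the learning rate, and $K$ (the $f$ above) is the number of latent factors.

$$
\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i = \sum_{k=1}^{K} P_{uk} Q_{ki}
$$

$$
\min_{P,\,Q} \sum_{(u,i) \in \Omega} \bigl( r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i \bigr)^2 + \lambda \bigl( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \bigr)
$$

An SGD step for one observed rating is shown below, with $e_{ui} = r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i$ and constant factors folded into $\eta$ and $\lambda$:

$$
\mathbf{p}_u \leftarrow \mathbf{p}_u + \eta \bigl( e_{ui}\,\mathbf{q}_i - \lambda\,\mathbf{p}_u \bigr), \qquad
\mathbf{q}_i \leftarrow \mathbf{q}_i + \eta \bigl( e_{ui}\,\mathbf{p}_u - \lambda\,\mathbf{q}_i \bigr)
$$

ALS instead fixes $Q$ and solves for each user in closed form, $\mathbf{p}_u = \bigl( Q_{I_u} Q_{I_u}^\top + \lambda I \bigr)^{-1} Q_{I_u} \mathbf{r}_{u,I_u}$. Here $I_u$ is the set of items rated by user $u$ and $Q_{I_u}$ holds the corresponding columns of $Q$. It then does the same for each item with $P$ fixed.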

Your task is to write a function that uses an iterative method, e.g., stochastic gradient descent, to factorize
the matrix into P with dimension (6 x K) and Q with dimension (K x 7), where K is the number of latent features. 
After you obtain the two factor matrices P and Q, multiply them back together. Based on the scores in this reconstructed rating matrix (the cells that were 0 now have predicted scores), can you recommend the top three items for the second customer?
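Here is a minimal sketch of the SGD approach under illustrative assumptions: K=2 latent features, learning rate lr=0.01, regularization lam=0.02, and 2000 passes over the data. Zeros are treated as missing, so only nonzero ratings drive the updates. The sketch also assumes that "recommend" means ranking the items customer 2 has not yet rated. The helper name sgd_factorize is hypothetical.

```python
import numpy as np

def sgd_factorize(R, K=2, lr=0.01, lam=0.02, n_epochs=2000, seed=0):
    """Factorize R (m x n) into P (m x K) and Q (K x n) by SGD over the
    nonzero (observed) ratings; zeros are treated as missing."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.random((m, K))
    Q = rng.random((K, n))
    observed = np.argwhere(R > 0)            # (user, item) index pairs
    for _ in range(n_epochs):
        rng.shuffle(observed)                # visit the ratings in a new random order
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[:, i]   # prediction error for this rating
            p_u = P[u].copy()                # keep the old user vector for Q's update
            P[u] += lr * (err * Q[:, i] - lam * P[u])
            Q[:, i] += lr * (err * p_u - lam * Q[:, i])
    return P, Q

R = np.array([[5, 0, 5, 0, 0, 1, 2], [0, 4, 0, 0, 2, 2, 1], [1, 0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1, 1, 0], [0, 2, 1, 0, 0, 1, 0], [0, 0, 3, 0, 1, 2, 0]], dtype=float)
P, Q = sgd_factorize(R)
R_hat = P @ Q                                 # formerly-zero cells now hold predicted scores

items = np.array(list("ABCDEFG"))
customer = 1                                  # second customer, as a 0-based row index
unrated = np.flatnonzero(R[customer] == 0)    # candidate items the customer has not rated
top3 = unrated[np.argsort(-R_hat[customer, unrated])[:3]]
print("Predicted ratings:\n", np.round(R_hat, 2))
print("Top 3 items for customer 2:", items[top3])
```

Customer 2 (row index 1) has already rated four of the seven items. Excluding rated items therefore leaves exactly three candidates (A, C and D), and the sketch orders them by predicted score. The result depends on the random initialization and the hyperparameters, so it is worth checking the ordering with a few seeds.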


#### Solutions

## (20%) Learning to Rank

**Bayesian Personalized Ranking pairwise loss:**

Maximises the prediction difference between a positive example and a randomly chosen negative example. Useful when only positive interactions are present and optimising ROC AUC is desired.
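To make the pairwise idea concrete, here is a minimal NumPy sketch of one stochastic gradient step on the BPR objective ln σ(x̂_ui − x̂_uj) for matrix factorization. Here x̂ is the dot product of user and item factors. It illustrates the loss with assumed hyperparameters (lr, lam) and hypothetical toy factors U and V (both stored with one row per user or item). It is not LightFM's internal implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(U, V, u, i, j, lr=0.05, lam=0.01):
    """One SGD ascent step on ln sigmoid(x_ui - x_uj) minus an L2 penalty, for user u,
    positive item i and negative item j. U is (users x K), V is (items x K)."""
    pu, qi, qj = U[u].copy(), V[i].copy(), V[j].copy()
    x_uij = pu @ (qi - qj)                 # predicted score difference x_ui - x_uj
    g = sigmoid(-x_uij)                    # derivative of ln sigmoid(x) at x_uij
    U[u] += lr * (g * (qi - qj) - lam * pu)
    V[i] += lr * (g * pu - lam * qi)
    V[j] += lr * (-g * pu - lam * qj)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 4))    # hypothetical factors: 3 users, K = 4
V = rng.normal(scale=0.1, size=(5, 4))    # hypothetical factors: 5 items, K = 4
before = U[0] @ (V[1] - V[2])
for _ in range(100):
    bpr_step(U, V, u=0, i=1, j=2)          # user 0 interacted with item 1 but not item 2
after = U[0] @ (V[1] - V[2])
print(f"score difference before: {before:.4f}, after: {after:.4f}")
```

Repeating the step for this triplet pushes the score difference x̂_u1 − x̂_u2 upward, which is what maximising the prediction difference means. In a real model, the negative item j would be sampled at random for each observed positive.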

More details on BPR can be found at:

https://www.coursera.org/lecture/matrix-factorization/personalized-ranking-with-daniel-kluver-s3XJo

https://www.slideshare.net/zenogantner/bayesian-personalized-ranking-for-nonuniformly-sampled-items

**Weighted Approximate-Rank Pairwise loss**:  
Maximises the rank of positive examples by repeatedly sampling negative examples until a rank-violating one is found. Useful when only positive interactions are present and optimising the top of the recommendation list (precision@k) is desired. WARP deals with (user, positive item, negative item) triplets. 

See the following for more on WARP:

https://medium.com/@gabrieltseng/intro-to-warp-loss-automatic-differentiation-and-pytorch-b6aa5083187a


In more detail, the WARP procedure works roughly as follows:

- For a given (user, positive item) pair, sample a negative item at random from all the remaining items. Compute predictions for both items. If the negative item’s prediction exceeds the positive item’s prediction minus a margin (that is, the positive item does not outscore the negative item by at least the margin), perform a gradient update to rank the positive item higher and the negative item lower. If there is no rank violation, continue sampling negative items until a violation is found.

- If you find a violating negative example on the first try, make a large gradient update. This indicates that many negative items are ranked higher than positive items given the current state of the model, and the model must be updated by a large amount. If it took a lot of sampling to find a violating example, perform a small update, since the model is likely close to the optimum and should be updated at a low rate.
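The sampling loop and the "large vs. small update" rule can be sketched as below. It follows the original WARP formulation of Weston et al. The number of trials needed to find a violator gives a rank estimate floor(number of negatives / trials). The step is then scaled by L(rank) = 1 + 1/2 + ... + 1/rank, so an early violation yields a large weight. The margin of 1, the function name warp_sample and its trial cap are illustrative assumptions. LightFM limits sampling with its own max_sampled parameter, and its internals may differ.

```python
import numpy as np

def warp_sample(scores, pos, positives, margin=1.0, max_trials=None, rng=None):
    """Sample negatives for one (user, positive item) pair until one violates the
    margin, i.e. scores[neg] > scores[pos] - margin.

    Returns (negative item, weight) or (None, 0.0) if no violation was found.
    Fewer trials => higher estimated rank of the positive => larger weight."""
    rng = np.random.default_rng() if rng is None else rng
    negatives = np.setdiff1d(np.arange(len(scores)), list(positives))
    n_neg = len(negatives)
    max_trials = n_neg if max_trials is None else min(max_trials, n_neg)
    for trial in range(1, max_trials + 1):
        neg = rng.choice(negatives)
        if scores[neg] > scores[pos] - margin:                  # rank violation found
            est_rank = n_neg // trial                            # estimated rank of pos
            weight = np.sum(1.0 / np.arange(1, est_rank + 1))    # L(r) = 1 + 1/2 + ... + 1/r
            return int(neg), float(weight)
    return None, 0.0                                             # no violation: skip update

# Hypothetical model scores for 5 items; item 1 is the user's positive
scores = np.array([2.0, 0.5, 1.8, -1.0, 0.3])
neg, weight = warp_sample(scores, pos=1, positives={1}, rng=np.random.default_rng(0))
print(neg, weight)
```

The returned weight would multiply the gradient step for the (user, positive, negative) triplet. This is how a violation found on the first try produces a larger update than one found after many samples.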

You can install the lightfm package and use its fetch_movielens dataset loader.

```python
# Imports for loading the data and evaluating the models
import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM
from lightfm.evaluation import precision_at_k
from lightfm.evaluation import auc_score
```

**MovieLens dataset:**

![MovieLens](img/movielens.png)

GroupLens Research has collected and made available rating data sets from the MovieLens web site (http://movielens.org). The data sets were collected over various periods of time, depending on the size of the set.

http://grouplens.org/datasets/movielens/

**Fetch the MovieLens 100k dataset**

The dataset contains 100,000 interactions from about 1,000 users on about 1,700 movies, and is described exhaustively in its README: http://files.grouplens.org/datasets/movielens/ml-100k-README.txt

This data set consists of:
- 100,000 ratings (1-5) from 943 users on 1682 movies. 
- Each user has rated at least 20 movies. 
- Simple demographic info for the users (age, gender, occupation, zip)

### Question 1:

Use the lightfm package's built-in evaluation methods, e.g., precision_at_k and auc_score,
to evaluate the recommendation performance of the Bayesian Personalized Ranking (BPR) pairwise loss and of the Weighted Approximate-Rank Pairwise (WARP) loss.  
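To show what these two metrics measure, here are simplified single-user versions in NumPy. Precision@k is the fraction of the k highest-scored items that are held-out positives. AUC is the probability that a randomly chosen positive is scored above a randomly chosen negative. LightFM's precision_at_k and auc_score compute such values per user from a trained model and test interactions, optionally excluding training interactions via train_interactions. They return one value per user, which is usually averaged with .mean(). The scores and positives below are hypothetical, the tie handling is an assumption, and the functions carry a _sketch suffix so they do not shadow the lightfm imports.

```python
import numpy as np

def precision_at_k_sketch(scores, positives, k=10):
    """Fraction of the k highest-scored items that are known positives."""
    top_k = np.argsort(-scores)[:k]
    return len(set(top_k.tolist()) & set(positives)) / k

def auc_sketch(scores, positives):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    mask = np.zeros(len(scores), dtype=bool)
    mask[list(positives)] = True
    pos, neg = scores[mask], scores[~mask]
    diff = pos[:, None] - neg[None, :]        # every (positive, negative) pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])  # hypothetical predicted scores for 5 items
positives = {0, 3}                             # hypothetical held-out positives for one user
print(precision_at_k_sketch(scores, positives, k=2))
print(auc_sketch(scores, positives))
```

The two metrics can disagree. AUC rewards a correct ordering anywhere in the list, while precision@k only looks at the top k, which is the part WARP is designed to optimise.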

### Question 2:

Can you build a recommendation system based on the Weighted Approximate-Rank Pairwise loss and make recommendations for the users with ids [3, 25, 450]? Show both the users' known preferences and the prediction results (recommendations).
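Here is a minimal, library-agnostic sketch of the output this question asks for. Given one user's predicted scores over all items, it lists the items the user already liked and the top-n unseen items. With LightFM, the scores would typically come from model.predict(user_id, np.arange(n_items)) on a model trained with loss='warp'. The known positives would come from the user's row of the training interaction matrix data['train'], and the titles from data['item_labels']. The helper name recommend, the toy labels and the scores here are hypothetical.

```python
import numpy as np

def recommend(scores, known_positive_ids, item_labels, n=3):
    """Return (known preferences, top-n recommendations) for one user.
    Items the user already interacted with are excluded from the recommendations."""
    known = [item_labels[i] for i in known_positive_ids]
    ranked = np.argsort(-scores)                       # best score first
    seen = set(known_positive_ids)
    recs = [item_labels[i] for i in ranked if i not in seen][:n]
    return known, recs

# Hypothetical toy data: one user's predicted scores over 6 movies
labels = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E", "Movie F"]
scores = np.array([2.5, 0.3, 1.9, 2.8, -0.4, 1.1])
known, recs = recommend(scores, known_positive_ids=[3, 1], item_labels=labels)
print("Known positives:", known)
print("Recommended:", recs)
```

For the three requested users, the same helper would be called once per user id with that user's scores and known positives.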
        

#### Solutions

## (10%) Can you implement Discounted Cumulative Gain (DCG) and Normalized DCG (NDCG)? Then use an example ranking list to demonstrate your implementation.
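Here is a minimal sketch using the common definition DCG@k = Σ_{i=1}^{k} rel_i / log2(i + 1) with linear gains; some texts use the gain 2^rel − 1 instead. NDCG@k = DCG@k / IDCG@k, where IDCG@k is taken here as the DCG of the same relevance grades sorted in decreasing order. The ranking list of graded relevances is a hypothetical example.

```python
import numpy as np

def dcg_at_k(relevances, k=None):
    """DCG@k = sum_{i=1}^{k} rel_i / log2(i + 1), with positions i starting at 1."""
    rel = np.asarray(relevances, dtype=float)[:k]      # k=None keeps the whole list
    positions = np.arange(1, len(rel) + 1)
    return float(np.sum(rel / np.log2(positions + 1)))

def ndcg_at_k(relevances, k=None):
    """NDCG@k = DCG@k / IDCG@k, where IDCG is the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

ranking = [3, 2, 3, 0, 1, 2]   # hypothetical graded relevance of items in ranked order
print(dcg_at_k(ranking), ndcg_at_k(ranking))
print(dcg_at_k(ranking, k=3), ndcg_at_k(ranking, k=3))
```

NDCG lies in [0, 1] and equals 1 only when the list is already in ideal order. This makes scores comparable across users whose lists have different lengths or relevance grades.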

#### Solutions