<a href="https://colab.research.google.com/github/leukschrauber/LearningPortfolio/blob/main/learn_portfolio_7_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Portfolio

*by Fabian Leuk (csba6437/12215478)*

## Session 7: Collaborative Filtering

### Key Learnings


**Collaborative filtering**

*   Collaborative filtering is a technique used in recommender systems in which the past preferences of similar users inform predictions about a user's future preferences. Each user's preferences are represented as a vector, and the similarity between two users can be measured as the cosine similarity of their vectors. These similarities then serve as weights for the ratings of other users when predicting a rating for a given user.
*  Generally, collaborative filtering is a matrix completion problem.
*  User and item biases, embedding distances and principal component analysis (PCA) are ways to interpret the results of collaborative filtering.
*  Collaborative filtering models learn latent factors (embeddings) for users and items (here: movies) during training.
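
The similarity-weighted prediction described above can be sketched in a few lines of NumPy. This is only an illustration on a made-up rating matrix, not the approach used later in this notebook: the names `cosine_similarity` and `predict_rating` are hypothetical, and unrated items are simply encoded as 0, which is a simplification.

```python
import numpy as np

# Hypothetical toy rating matrix: rows are users, columns are items, 0 = not rated.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # target user (item 2 is unrated)
    [5.0, 5.0, 4.0, 1.0],   # user with similar taste
    [1.0, 2.0, 1.0, 5.0],   # user with different taste
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    weights, values = [], []
    for other in range(ratings.shape[0]):
        # Skip the target user and users who have not rated the item.
        if other == user or ratings[other, item] == 0:
            continue
        weights.append(cosine_similarity(ratings[user], ratings[other]))
        values.append(ratings[other, item])
    return float(np.dot(weights, values) / np.sum(weights))

predicted = predict_rating(ratings, user=0, item=2)
print(round(predicted, 2))
```

Because the target user is more similar to the second user than to the third, the prediction lands closer to the second user's rating of 4 than to the third user's rating of 1.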

**Limitations of collaborative filtering**

* Grey sheep problem: a user shares similarities with several different types of users and cannot be clearly matched to one of the groups. Predicting ratings from user metadata can help in these cases.
* Black sheep problem: a user has no similarities with any other users. Predicting ratings from user metadata can help in these cases as well.
* Matrix sparsity: very few users actually rate products, so most entries of the rating matrix are missing. In such cases, user actions on those items (views, etc.) can be used to predict ratings.
* Embedding matrices can become very large in real-life scenarios, so a lot of computing power may be needed. Common methods such as batching and gradient accumulation can mitigate this.
* Subgroups that are overrepresented in the user base can bias the ratings. This bias in turn attracts more users from the same group, which strengthens the bias further. Human monitoring of the system is required to counter this.
* Bootstrapping problem: new items and new users have few or no ratings yet. One solution is to use item or user metadata to predict initial ratings and to replace those predictions with actual ratings over time.

**Machine Learning General Concepts**

* Overfitting can be mitigated with L2 regularization (weight decay), which adds a penalty proportional to the sum of the squared parameter values to the loss function.
* Gradient accumulation makes it possible to use smaller batches while still training as if larger batches had been processed, because gradients are summed over several batches before the weights are updated. This is relevant for decreasing GPU memory usage.
* Rule of thumb: when the batch size is halved, the learning rate should be halved as well.
* Softmax is the exponentiated prediction for one class divided by the sum of the exponentiated predictions over all classes. It is best suited to models that should predict exactly one class as the output.
* Cross-entropy loss is the negative log of the softmax output for the true class.
* Multi-target models compute a separate loss for the outputs belonging to each target and add these losses together. Training then adjusts the weights as usual to reduce the combined loss.
* A dot product is the sum of the element-wise products of two vectors.
* An embedding look-up can be expressed as the multiplication of a one-hot encoded vector with the embedding matrix.
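
Several of the definitions above (softmax, cross-entropy, dot product, and look-up as a one-hot multiplication) can be checked numerically. The following is a minimal NumPy sketch with made-up numbers; the function names and the example values are assumptions for illustration only.

```python
import numpy as np

def softmax(logits):
    """Exponentiate each score and divide by the sum of all exponentiated scores."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow; the result is unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the true class."""
    return float(-np.log(softmax(logits)[target]))

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw model outputs for three classes
probs = softmax(logits)
loss = cross_entropy(logits, target=0)

# Embedding look-up as a matrix product: a one-hot vector selects one row.
embedding = np.arange(12, dtype=float).reshape(4, 3)  # 4 entities, 3 latent factors
one_hot = np.zeros(4)
one_hot[2] = 1.0
looked_up = one_hot @ embedding  # identical to embedding[2]

# Dot product: sum of the element-wise products of two vectors.
dot = float((embedding[0] * embedding[1]).sum())
```

In practice an embedding layer indexes the row directly instead of performing the multiplication, which gives the same result without building the one-hot vector.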

**Python Libraries**

* How to merge two dataframes
* How to use `collab_learner` and `CollabDataLoaders`
* How to include L2 regularization (weight decay) when training with `collab_learner`
* How to define Modules
* How to use Sigmoid to squash values in custom ranges
* How to cross-tabulate a pandas dataframe
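
As a small reminder of the two pandas operations in the list above, here is a sketch on made-up data. The frames and their values are hypothetical; only `DataFrame.merge` and `pd.crosstab` are the actual pandas calls.

```python
import pandas as pd

# Hypothetical toy data in the same shape as the ratings and movies tables.
ratings = pd.DataFrame({
    "user":   [1, 1, 2, 3],
    "movie":  [10, 20, 10, 20],
    "rating": [5, 3, 4, 2],
})
movies = pd.DataFrame({
    "movie": [10, 20],
    "title": ["Toy Story (1995)", "GoldenEye (1995)"],
})

# Merge on the shared key column so every rating carries its movie title.
merged = ratings.merge(movies, on="movie")

# Cross-tabulate: one row per user, one column per title, cells hold the rating.
table = pd.crosstab(merged["user"], merged["title"],
                    values=merged["rating"], aggfunc="mean")
print(table)
```

Cells for user and title combinations without a rating come out as `NaN`, which makes the sparsity of the rating matrix visible.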


### Collaborative filtering Code

In [1]:
from fastai.collab import *
from fastai.tabular.all import *
set_seed(42)

In [2]:
path = untar_data(URLs.ML_100k)

ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=['user','movie','rating','timestamp'])

movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)

ratings = ratings.merge(movies)

dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

#### From Scratch

In [3]:
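class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        # One latent-factor vector and one scalar bias per user and per movie
        self.user_factors = Embedding(n_users, n_factors)
        self.user_bias = Embedding(n_users, 1)
        self.movie_factors = Embedding(n_movies, n_factors)
        self.movie_bias = Embedding(n_movies, 1)
        self.y_range = y_range
        
    def forward(self, x):
        # x[:,0] holds the user indices, x[:,1] the movie indices
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        # Dot product of user and movie factors, one value per row
        res = (users * movies).sum(dim=1, keepdim=True)
        # Add the learned user and movie biases
        res += self.user_bias(x[:,0]) + self.movie_bias(x[:,1])
        # Squash the prediction into the rating range
        return sigmoid_range(res, *self.y_range)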
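
The last line of `forward` squashes the raw prediction into `y_range`. As far as I understand it, this is a sigmoid rescaled from (0, 1) to (low, high). The following NumPy sketch shows that idea; `sigmoid_range_sketch` is a hypothetical stand-in written for illustration, not the fastai function itself.

```python
import numpy as np

def sigmoid_range_sketch(x, low, high):
    """Squash x into the open interval (low, high) with a rescaled sigmoid."""
    return 1.0 / (1.0 + np.exp(-x)) * (high - low) + low

raw = np.array([-10.0, 0.0, 10.0])        # hypothetical unbounded model outputs
squashed = sigmoid_range_sketch(raw, 0.0, 5.5)
print(squashed)
```

Because a sigmoid never quite reaches its upper bound, setting the upper end of the range to 5.5 rather than 5 lets the model actually predict a rating of 5.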

In [4]:
model = DotProductBias(len(dls.classes['user']), len(dls.classes['title']), 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.1)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.938159 | 0.958896 | 00:08 |
| 1 | 0.866531 | 0.877066 | 00:07 |
| 2 | 0.747698 | 0.831983 | 00:08 |
| 3 | 0.593823 | 0.820023 | 00:07 |
| 4 | 0.493328 | 0.820173 | 00:09 |
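
The `wd=0.1` argument in the training call above sets the weight decay. The L2 idea behind it can be sketched as follows: adding `wd * sum(params**2)` to the loss is the same as adding `2 * wd * params` to the gradient. This is a minimal NumPy sketch with a made-up loss function; `base_loss` is a hypothetical stand-in for the model's loss, and the optimizer used by fastai may apply the decay differently internally.

```python
import numpy as np

def base_loss(params):
    """Hypothetical stand-in for the model's loss: squared distance to 1."""
    return float(np.sum((params - 1.0) ** 2))

def regularized_loss(params, wd):
    """Base loss plus an L2 penalty proportional to the squared parameters."""
    return base_loss(params) + wd * float(np.sum(params ** 2))

def regularized_grad(params, wd):
    """Gradient of the regularized loss: base gradient plus 2 * wd * params."""
    return 2.0 * (params - 1.0) + 2.0 * wd * params

params = np.array([0.5, -2.0, 3.0])
wd = 0.1

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric_grad = np.array([
    (regularized_loss(params + eps * np.eye(3)[i], wd)
     - regularized_loss(params - eps * np.eye(3)[i], wd)) / (2 * eps)
    for i in range(3)
])
analytic_grad = regularized_grad(params, wd)
```

The extra gradient term grows with the size of each parameter, so large weights are pulled toward zero more strongly, which is what limits overfitting.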


#### Using Collab Learner

In [5]:
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3, wd=0.1)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.931108 | 0.947211 | 00:08 |
| 1 | 0.845715 | 0.87791 | 00:08 |
| 2 | 0.732135 | 0.835817 | 00:07 |
| 3 | 0.598224 | 0.824254 | 00:08 |
| 4 | 0.490251 | 0.824255 | 00:08 |


#### Using Neural Network

In [6]:
learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100,50])
learn.fit_one_cycle(5, 5e-3, wd=0.1)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.962253 | 0.998203 | 00:10 |
| 1 | 0.935017 | 0.915465 | 00:10 |
| 2 | 0.884734 | 0.892936 | 00:10 |
| 3 | 0.859502 | 0.872788 | 00:10 |
| 4 | 0.75897 | 0.869513 | 00:10 |
