
# Assignment

*by Fabian Leuk (csba6437/12215478)*

The following assignment consists again of a theoretical part (learning portfolio) and a practical part (assignment). The goal is to train a neural model for a recommendation system.

The plan is to discuss your learnings from the theory part in the first week, so you are relatively free in how you fill your Learning Portfolio on this topic. In the following week we will discuss your solutions to the practical part.

In preparation for the practical part, I ask you to familiarize yourself with the following video sources in the next week:

1) Please watch the following videos:

https://www.youtube.com/watch?v=Fmtorg_dmM0&ab_channel=ritvikmath (not absolutely necessary, only for the overview)

https://course.fast.ai/Lessons/lesson7.html (The second part of the presentation starting with the topic collaborative filtering is mandatory)

Note: The first part of the video mainly contains tips on using neural networks for a Kaggle competition submission. To understand it better, you would have to watch the end of the 6th video, but this is not mandatory.

2) Please download the following notebook and edit it in Google Colab. Try to answer a few of the questions asked at the end. Take notes and update your Learning Portfolio.

https://www.kaggle.com/code/jhoward/collaborative-filtering-deep-dive/notebook

## Key Learnings


**Collaborative filtering**

*   Collaborative filtering is a technique used in recommender systems in which the past preferences of similar users inform predictions of a user's future preferences. In its simplest form, each user's preferences are represented as a vector, and the similarity between users is measured as cosine similarity. The computed similarities then serve as weights on other users' ratings to predict a rating for a given user.
*  Generally, collaborative filtering is a matrix completion problem.
*  User and item biases, embedding distances and principal component analysis are ways to interpret the results of collaborative filtering.
*  Model-based collaborative filtering learns latent factors for movies and users during training.
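
The similarity-weighted prediction described above can be sketched in pure Python. This is only a minimal illustration under simplifying assumptions: the function names and the toy ratings are hypothetical, unrated items are stored as 0, and similarity is computed on all items except the one being predicted.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(target, others, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other in others:
        if other[item] == 0:  # assumption: 0 means "not rated"
            continue
        # Compare the two users on every item except the one being predicted
        sim = cosine_similarity(target[:item] + target[item + 1:],
                                other[:item] + other[item + 1:])
        weighted_sum += sim * other[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None

# Hypothetical toy data: four movies, item 2 is unrated by the target user
target_user = [5, 4, 0, 1]
other_users = [[5, 5, 4, 1],   # similar taste, rated item 2 with 4
               [1, 2, 1, 5]]   # different taste, rated item 2 with 1
prediction = predict_rating(target_user, other_users, item=2)
print(prediction)
```

The prediction lands between the two neighbours' ratings and is pulled towards the rating of the more similar user.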

**Limitations of collaborative filtering**

* One problem of collaborative filtering is the grey sheep problem, where a user shares similarities with several different types of users and cannot be clearly matched to one group. Predicting ratings from user metadata can help in these cases.
* Another is the black sheep problem, where a user has no similarities with other users. Predicting ratings from user metadata can help here as well.
* A third is matrix sparsity, where very few users actually rate products. In such cases, user actions on those items (views, etc.) can be used to predict ratings.
* Embedding matrices can become very large in real-life scenarios, so a lot of computing power may be needed. Standard methods such as batching and gradient accumulation can mitigate this.
* Subgroups that are overrepresented in the user base can bias the ratings. This bias in turn attracts more users from the same group, which strengthens the bias further. Human monitoring of the system is required to counter this.
* The bootstrapping problem arises because new items and users do not have any ratings yet. One solution is to use item or user metadata to predict initial ratings and to replace those predictions with real ratings over time.

**Machine Learning General Concepts**

* Overfitting can be mitigated with L2 regularization, which adds a penalty proportional to the sum of the squared parameters to the loss function.
* Gradient accumulation makes it possible to use smaller batches while still training as if a larger batch size had been processed. This is relevant for reducing GPU memory usage.
* Rule of thumb: when halving the batch size, also halve the learning rate.
* Softmax is the exponentiated model output for a class divided by the sum of the exponentiated outputs over all classes. It is best suited to models where exactly one class should be predicted as the output.
* Cross-entropy loss is the negative log of the softmax output for the actual (target) category.
* Multi-target models compute a separate loss for the outputs belonging to each target and add the losses together. Training then tweaks the weights as usual to reduce the combined loss.
* A dot product is the sum of the element-wise products of two vectors.
* A look-up can be expressed as the multiplication of a one-hot-encoded vector with a matrix.
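
To make the softmax and cross-entropy definitions concrete, here is a small pure-Python sketch. The function names and the example logits are made up for illustration; real training code would use the framework's built-in loss instead.

```python
import math

def softmax(logits):
    """Exponentiate each logit and normalise so the results sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log of the softmax probability assigned to the true class."""
    return -math.log(softmax(logits)[target_index])

logits = [2.0, 1.0, 0.1]          # hypothetical raw model outputs for three classes
probs = softmax(logits)
loss_if_class_0 = cross_entropy(logits, target_index=0)
loss_if_class_2 = cross_entropy(logits, target_index=2)
print(probs, loss_if_class_0, loss_if_class_2)
```

The loss is small when the model puts most of its probability on the true class and grows as that probability shrinks.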

**Python Libraries**

* How to merge two dataframes
* How to use CollabLearner and CollabDataLoaders
* How to include L2 Regularization in CollabLearner
* How to define Modules
* How to use Sigmoid to squash values in custom ranges
* How to cross-tabulate a pandas dataframe


## Collaborative filtering Code

In [1]:
from fastai.collab import *
from fastai.tabular.all import *
set_seed(42)

In [2]:
path = untar_data(URLs.ML_100k)

ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=['user','movie','rating','timestamp'])

movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)

ratings = ratings.merge(movies)

dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)

### From Scratch

In [3]:
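class DotProductBias(Module):
    """Dot-product collaborative filtering model with user and movie biases."""
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        # Latent factors and one scalar bias for every user and every movie
        self.user_factors = Embedding(n_users, n_factors)
        self.user_bias = Embedding(n_users, 1)
        self.movie_factors = Embedding(n_movies, n_factors)
        self.movie_bias = Embedding(n_movies, 1)
        self.y_range = y_range

    def forward(self, x):
        # x holds user indices in column 0 and movie indices in column 1
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        # Dot product of user and movie factors, one value per row
        res = (users * movies).sum(dim=1, keepdim=True)
        res += self.user_bias(x[:,0]) + self.movie_bias(x[:,1])
        # Squash the predictions into y_range
        return sigmoid_range(res, *self.y_range)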

In [4]:
model = DotProductBias(len(dls.classes['user']), len(dls.classes['title']), 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3, wd=0.1)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.938159 | 0.958896 | 00:08 |
| 1 | 0.866531 | 0.877066 | 00:07 |
| 2 | 0.747698 | 0.831983 | 00:08 |
| 3 | 0.593823 | 0.820023 | 00:07 |
| 4 | 0.493328 | 0.820173 | 00:09 |


### Using Collab Learner

In [5]:
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3, wd=0.1)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.931108 | 0.947211 | 00:08 |
| 1 | 0.845715 | 0.87791 | 00:08 |
| 2 | 0.732135 | 0.835817 | 00:07 |
| 3 | 0.598224 | 0.824254 | 00:08 |
| 4 | 0.490251 | 0.824255 | 00:08 |


### Using Neural Network

In [6]:
learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100,50])
learn.fit_one_cycle(5, 5e-3, wd=0.1)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 0.962253 | 0.998203 | 00:10 |
| 1 | 0.935017 | 0.915465 | 00:10 |
| 2 | 0.884734 | 0.892936 | 00:10 |
| 3 | 0.859502 | 0.872788 | 00:10 |
| 4 | 0.75897 | 0.869513 | 00:10 |


## Questions

**What problem does collaborative filtering solve?**

Collaborative filtering addresses the situation where you have a number of users and a number of products and want to recommend the products that are most likely to be useful to each user.

**How does it solve it?**

Look at what products the current user has used or liked, find other users that have used or liked similar products, and then recommend other products that those users have used or liked.

**Why might a collaborative filtering predictive model fail to be a very useful recommendation system?**

There are many reasons why such a model might turn out to be a poor recommendation system, among them overfitting, bootstrapping issues, overrepresentation of certain user groups, the black sheep and grey sheep problems, and matrix sparsity.

**What does a crosstab representation of collaborative filtering data look like?**

A crosstab representation of collaborative filtering data is a matrix with one row per user and one column per movie, where each cell holds the rating of a user U for a movie M. Cells for movies a user has not rated are empty.


**What is a latent factor? Why is it "latent"?**

A latent factor is an underlying attribute of a user or an item that is not recorded in the dataset. It is "latent" because it is never stated explicitly; the model has to learn it from the ratings.


**What does `pandas.DataFrame.merge` do?**

`pandas.DataFrame.merge` joins two dataframes on one or more common columns. It is similar to an SQL join.

**What is an embedding matrix?**

An embedding matrix holds the embeddings of users or items in matrix form: one row per user or item and one column per latent factor.

**What is the relationship between an embedding and a matrix of one-hot-encoded vectors?**

Multiplying an embedding matrix by a one-hot-encoded vector is the same as looking up the row of the matrix at the index that is set to 1.
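
The equivalence can be checked with a small pure-Python sketch. The embedding values and the helper name `onehot_times_matrix` are made up for illustration.

```python
def onehot_times_matrix(one_hot, matrix):
    """Multiply a one-hot row vector by a matrix stored as a list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [sum(one_hot[i] * matrix[i][j] for i in range(n_rows))
            for j in range(n_cols)]

# Hypothetical embedding matrix: 3 items, 2 latent factors each
embedding = [[0.1, 0.2],
             [0.3, 0.4],
             [0.5, 0.6]]

index = 2
one_hot = [1 if i == index else 0 for i in range(len(embedding))]

via_matmul = onehot_times_matrix(one_hot, embedding)  # multiplies mostly zeros
via_lookup = embedding[index]                         # plain indexing
print(via_matmul, via_lookup)
```

Both paths return the same row, but the multiplication touches every entry of the matrix while the lookup touches only one row.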

**Why do we need `Embedding` if we could use one-hot-encoded vectors for the same thing?**

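`Embedding` is more efficient in both computation and memory. Multiplying by a one-hot-encoded vector gives the same result as a lookup, but it stores and multiplies mostly zeros. An embedding layer indexes directly into the matrix while still supporting gradient computation as if the matrix multiplication had been performed.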

**What does an embedding contain before we start training (assuming we're not using a pretrained model)?**

Randomly initialized numbers.

**What does `x[:,0]` return?**

It is a slicing operation that returns the first column of a two-dimensional array or matrix x. It selects all rows (denoted by :) and the element at index 0 in each row.


**What is a good loss function to use for MovieLens? Why?**

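A good loss function for MovieLens is mean squared error (or its root, RMSE) or mean absolute error between the predicted rating and the actual rating of a user, because we are predicting a numeric rating on a scale from 1 to 5. The code above uses `MSELossFlat`.

To make better predictions, we can squash the predictions into a fixed range with a sigmoid before calculating the loss. The code above uses `y_range=(0, 5.5)`; the upper bound lies slightly above 5 because a sigmoid only approaches its limits and never reaches them.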
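
As a rough illustration of the squashing step, the following scalar sketch scales a sigmoid into a chosen range. The name `sigmoid_range_scalar` is hypothetical; it mirrors the idea behind fastai's `sigmoid_range` for a single number rather than a tensor.

```python
import math

def sigmoid_range_scalar(x, low, high):
    """Map any real number x into the open interval (low, high)."""
    return 1 / (1 + math.exp(-x)) * (high - low) + low

# The range used in the models above
low, high = 0, 5.5

mid = sigmoid_range_scalar(0, low, high)       # 2.75, the midpoint of the range
large = sigmoid_range_scalar(10, low, high)    # close to, but below, 5.5
small = sigmoid_range_scalar(-10, low, high)   # close to, but above, 0
print(mid, large, small)
```

Because the output only approaches the bounds, choosing an upper bound a little above the highest rating lets the model actually predict a 5.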


**What would happen if we used cross-entropy loss with MovieLens? How would we need to change the model?**

To use cross-entropy loss, the model would have to output five activations, one per rating category (1 to 5), instead of a single continuous score, and the ratings would be treated as class labels rather than numbers.

**What is the use of bias in a dot product model?**

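Adding a user bias and an item bias captures baseline effects that are independent of the latent factors: some users generally rate higher or lower than others, and some movies are generally rated higher or lower than others.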

**What is another name for weight decay?**

L2 regularization.

**Write the equation for weight decay (without peeking!).**



```
loss_with_wd = loss + wd * (parameters**2).sum()
```


**Write the equation for the gradient of weight decay. Why does it help reduce weights?**

```
parameters.grad += wd * 2 * parameters
```

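It reduces weights because the added gradient term is proportional to each parameter: every optimizer step moves each weight towards zero by an amount that grows with the weight's size.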
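
A small sketch can show the shrinking effect in isolation. It assumes plain SGD and a hypothetical situation in which the gradient of the data loss is zero, so that only the weight decay term acts; the function name and the numbers are made up.

```python
def sgd_step(params, grads, lr, wd):
    """One SGD step where weight decay adds 2 * wd * p to each gradient."""
    return [p - lr * (g + 2 * wd * p) for p, g in zip(params, grads)]

params = [1.0, -2.0, 0.5]
zero_grads = [0.0, 0.0, 0.0]   # assumption: the data loss is flat at this point

after_one = sgd_step(params, zero_grads, lr=0.1, wd=0.1)
# Each weight is multiplied by (1 - 2 * lr * wd) = 0.98, so it moves towards zero

after_many = params
for _ in range(100):
    after_many = sgd_step(after_many, zero_grads, lr=0.1, wd=0.1)
print(after_one, after_many)
```

Larger weights lose more in absolute terms per step, which is why the penalty mainly restrains the biggest parameters.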

**Why does reducing weights lead to better generalization?**

By including the penalty term, L2 regularization encourages the model to find a balance between fitting the training data well (low loss) and keeping the weights small (low penalty). Large weights allow the learned function to change sharply and fit noise in the training data, so pushing the model towards smaller weights yields a smoother function that overfits less.

**What does `argsort` do in PyTorch?**

It returns the indices that would sort a tensor, rather than the sorted values themselves.
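
What an argsort returns can be illustrated with a pure-Python analog: the positions of the elements in sorted order, not the sorted values. The helper below and the bias values are hypothetical; with tensors one would call `torch.argsort` or `Tensor.argsort` instead.

```python
def argsort(values, descending=False):
    """Return the indices that would sort `values`."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=descending)

# Hypothetical movie biases
biases = [0.3, -0.1, 0.8, 0.0]

lowest_first = argsort(biases)                     # [1, 3, 0, 2]
highest_first = argsort(biases, descending=True)   # [2, 0, 3, 1]

# The indices can then be used to look up the matching titles
titles = ["A", "B", "C", "D"]
print([titles[i] for i in lowest_first])           # ['B', 'D', 'A', 'C']
```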

**Does sorting the movie biases give the same result as averaging overall movie ratings by movie? Why/why not?**

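Not necessarily. The movie bias is learned jointly with the user biases and the latent factors, so it reflects how well a movie is rated after accounting for who rated it and how well it matched their tastes. A plain average ignores this: a movie rated mostly by generous users or by its target audience can have a high mean without having a high bias. The two rankings are usually correlated but not identical.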

**How do you print the names and details of the layers in a model?**

```
learn.model
```

**What is the "bootstrapping problem" in collaborative filtering?**

The bootstrapping problem occurs when a new user signs up or a new item is introduced and there are no ratings for them yet. Without ratings, no similar users or items can be found.

**How could you deal with the bootstrapping problem for new users? For new movies?**

There are several possibilities. One is to initialize the ratings so that they reflect the average taste and gradually replace them with actual ratings. Another is to ask new users questions to gather some metadata and use that metadata to predict an initial set of ratings. For new movies, item metadata can play the same role.
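
One possible way to "start from the average and gradually replace it" is a damped mean, sketched below. This is an assumption made for illustration, not something taken from the lesson; the function name, `prior_weight` and the numbers are hypothetical.

```python
def damped_mean(ratings, prior_mean, prior_weight=5):
    """Blend a prior mean with observed ratings.

    With no ratings the result equals `prior_mean`; as ratings accumulate,
    the observed average takes over.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))

global_mean = 3.5                       # hypothetical average rating over all movies

new_movie = damped_mean([], global_mean)              # 3.5, no ratings yet
few_ratings = damped_mean([5, 5], global_mean)        # still close to the prior
many_ratings = damped_mean([5] * 20, global_mean)     # dominated by real ratings
print(new_movie, few_ratings, many_ratings)
```

The `prior_weight` controls how many real ratings are needed before they outweigh the initial guess.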

**How can feedback loops impact collaborative filtering systems?**

If a small number of your users tend to set the direction of your recommendation system, then they are naturally going to end up attracting more people like them to your system. And that will, of course, amplify the original representation bias. This type of bias has a natural tendency to be amplified exponentially. 

**When using a neural network in collaborative filtering, why can we have different numbers of factors for movies and users?**

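Because the neural network concatenates the user and movie embeddings instead of taking their dot product. A dot product requires both vectors to have the same length, whereas concatenation works for any pair of sizes, so the number of factors can be chosen separately for users and movies.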
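
The difference can be seen in a short pure-Python sketch. It assumes that the network concatenates the two embeddings before the first linear layer, as the `CollabNN` model from the fast.ai lesson does; the embedding values and sizes are made up.

```python
def dot(u, m):
    """Dot product; only defined for vectors of equal length."""
    if len(u) != len(m):
        raise ValueError("dot product needs vectors of equal length")
    return sum(a * b for a, b in zip(u, m))

user_emb = [0.1, 0.2, 0.3]               # 3 user factors (hypothetical)
movie_emb = [0.5, 0.4, 0.3, 0.2, 0.1]    # 5 movie factors (hypothetical)

# Concatenation works for any pair of sizes; the first linear layer
# simply expects len(user_emb) + len(movie_emb) inputs.
nn_input = user_emb + movie_emb
print(len(nn_input))  # 8

# The dot-product model cannot combine these two vectors
try:
    dot(user_emb, movie_emb)
    dot_failed = False
except ValueError:
    dot_failed = True
print(dot_failed)  # True
```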

**Why is there an `nn.Sequential` in the `CollabNN` model?**

In the `CollabNN` model, `nn.Sequential` stacks the layers of the network so that the output of one layer serves as the input to the next. This lets the whole stack be called as a single module and allows the model to learn hierarchical representations of the input data.


**What kind of model should we use if we want to add metadata about users and items, or information such as date and time, to a collaborative filtering model?**

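A tabular model is a natural fit: the user and item IDs are embedded as before, and the metadata and date or time columns are fed in as additional categorical or continuous inputs to the same neural network. Another option is to build an ensemble in which the recommender system produces a rating and a second neural network incorporates the metadata.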


***Write the code to create a crosstab representation of the MovieLens data (you might need to do some web searching!).***


In [7]:
pd.pivot_table(ratings, values='rating', index='user', columns='movie', aggfunc='mean')

movie,1,2,3,4,5,6,7,8,9,10,...,1673,1674,1675,1676,1677,1678,1679,1680,1681,1682
user,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,5.0,3.0,4.0,3.0,3.0,5.0,4.0,1.0,5.0,3.0,...,,,,,,,,,,
2,4.0,,,,,,,,,2.0,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,4.0,3.0,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
939,,,,,,,,,5.0,,...,,,,,,,,,,
940,,,,2.0,,,4.0,5.0,3.0,,...,,,,,,,,,,
941,5.0,,,,,,4.0,,,,...,,,,,,,,,,
942,,,,,,,,,,,...,,,,,,,,,,


***What is a dot product? Calculate a dot product manually using pure Python with lists.***

It is an operation that takes two vectors of equal length and returns a scalar. It is defined as the sum of the products of the corresponding components of the vectors.

In [8]:
def dot_product(vector1, vector2):
    if len(vector1) != len(vector2):
        raise ValueError("Vectors must have the same length.")
    
    result = 0
    for i in range(len(vector1)):
        result += vector1[i] * vector2[i]
    
    return result

vector1 = [2, 3, 4]
vector2 = [5, 6, 7]
dot_product(vector1, vector2)

56

***Create a class (without peeking, if possible!) and use it.***

In [9]:
class Example():
  def __init__(self):
    pass

  def print_yo(self):
    print("yo")

abc = Example()
abc.print_yo()

yo
