# Collaborative Filtering Models on MovieLens Dataset

## Process flow

![process_flow](https://github.com/RecoHut-Stanzas/S632684/raw/main/images/process_flow.svg)

## Tutorials

### Data Acquisition and Exploration

[direct link to notebook →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-01.ipynb)

We download the ml-latest-small dataset from the GroupLens website and load it into pandas DataFrames. We then explore the ratings and movies tables and visualize how users rate movies. For instance, here is the rating matrix of 100 users over 100 movies:

![ratings_heatmap](https://github.com/RecoHut-Stanzas/S632684/raw/main/images/img1.png)

Brighter colors indicate higher ratings. Black indicates 0, i.e., the user has not rated that movie.
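A matrix like the one plotted above can be built with a pivot. This is a minimal sketch, assuming the ratings table has the `userId`, `movieId`, and `rating` columns of `ratings.csv` in ml-latest-small. The helper name `ratings_matrix` is hypothetical, and taking the first 100 user and movie IDs is an assumption about how the plotted subset was chosen:

```python
import pandas as pd


def ratings_matrix(ratings: pd.DataFrame, n_users: int = 100, n_movies: int = 100) -> pd.DataFrame:
    """Pivot (userId, movieId, rating) rows into a dense user x movie matrix.

    Missing ratings are filled with 0, matching the "black = not rated" convention.
    """
    # Hypothetical subset choice: the first n_users user IDs and n_movies movie IDs.
    users = sorted(ratings["userId"].unique())[:n_users]
    movies = sorted(ratings["movieId"].unique())[:n_movies]
    subset = ratings[ratings["userId"].isin(users) & ratings["movieId"].isin(movies)]
    matrix = subset.pivot_table(index="userId", columns="movieId", values="rating", fill_value=0)
    # Reindex so users/movies without any rating in the subset still appear as all-zero rows/columns.
    return matrix.reindex(index=users, columns=movies, fill_value=0)
```

The resulting DataFrame can be passed to matplotlib's `plt.imshow` to draw a heatmap like the one above.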

### Finding recommendations using memory-based methods

[direct link to notebook (user-CF) →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-02.ipynb)

[direct link to notebook (item-CF) →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-03.ipynb)

Here, we implemented user-based CF (user-CF) and item-based CF (item-CF) models on the ML-100k dataset and evaluated them with the mean absolute error (MAE) metric.
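The core of both memory-based methods can be sketched as a k-nearest-neighbour weighted average. This is a minimal illustration, not the notebooks' code: it assumes a dense NumPy rating matrix with 0 for missing ratings, cosine similarity computed on the zero-filled rows, and hypothetical helper names (`cosine_similarity`, `predict`, `mae`). Item-CF reuses the same function on the transposed matrix:

```python
import numpy as np


def cosine_similarity(R: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of R (0 = missing rating)."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors instead of dividing by 0
    unit = R / norms
    return unit @ unit.T


def predict(R: np.ndarray, sim: np.ndarray, u: int, i: int, k: int = 2) -> float:
    """Similarity-weighted average of column i over the k rows most similar to row u."""
    raters = np.flatnonzero(R[:, i])  # rows with an observed rating in column i
    raters = raters[raters != u]      # never use the target row itself
    if raters.size == 0:
        return 0.0                    # no neighbour information available
    top = raters[np.argsort(sim[u, raters])[::-1][:k]]  # k most similar raters
    weights = sim[u, top]
    if weights.sum() == 0:
        return float(R[top, i].mean())
    return float(weights @ R[top, i] / weights.sum())


def mae(y_true, y_pred) -> float:
    """Mean absolute error between true and predicted ratings."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


# Toy usage: 3 users x 3 movies, predict user 0's rating of movie 2 both ways.
R = np.array([[5, 3, 0], [4, 3, 4], [1, 1, 5]], dtype=float)
user_cf = predict(R, cosine_similarity(R), u=0, i=2, k=1)      # rows are users
item_cf = predict(R.T, cosine_similarity(R.T), u=2, i=0, k=1)  # rows are movies
```

Recomputing the full similarity matrix once and passing it to `predict` keeps each prediction cheap, which matters when scoring every held-out rating for MAE.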

### Finding recommendations using model-based methods

[direct link to notebook (SVD) →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-04.ipynb)

[direct link to notebook (MF) →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-05.ipynb)

[direct link to notebook (NMF) →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-06.ipynb)

[direct link to notebook (EMF) →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-07.ipynb)

Here, we implemented SVD, matrix factorization (MF), non-negative matrix factorization (NMF), and explainable matrix factorization (EMF) models.
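The basic MF idea can be sketched as learning user factors `P` and item factors `Q` so that `P @ Q.T` reproduces the observed ratings. This minimal sketch assumes plain stochastic gradient descent with L2 regularization over the observed (non-zero) entries only; the function `train_mf` and its default hyperparameters are hypothetical, and it is neither the SVD, NMF, nor EMF variant from the notebooks:

```python
import numpy as np


def train_mf(R: np.ndarray, k: int = 2, lr: float = 0.02, reg: float = 0.02,
             epochs: int = 1000, seed: int = 0):
    """Factorize R ~ P @ Q.T with SGD over the observed (non-zero) entries of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
    users, items = np.nonzero(R)                  # indices of observed ratings
    for _ in range(epochs):
        for idx in rng.permutation(len(users)):   # visit observed ratings in random order
            u, i = users[idx], items[idx]
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()                     # use the pre-update user vector for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q


# Toy usage: 0 marks a missing rating that the model fills in.
R = np.array([[5, 3, 0], [4, 0, 1], [1, 1, 5]], dtype=float)
P, Q = train_mf(R)
R_hat = P @ Q.T  # predictions for every cell, including the missing ones
```

Training only on observed entries is the key design choice: treating the zeros as real ratings would push predictions for unrated movies toward 0.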

### Evaluation of various methods

[direct link to notebook →](https://nbviewer.org/github/RecoHut-Stanzas/S632684/blob/main/nbs/2021-06-23-collaborative-filtering-movielens-latest-small-08.ipynb)

Here, we compared the performance of memory-based and model-based CF methods on the ML-100k and ML-1M datasets across several metrics. The tables below summarize MAE (lower is better).

**MAE comparison between User-based and Item-based CF**

| Similarity | Dataset | User-based | Item-based |
| --- | --- | --- | --- |
| Euclidean | ML-100k | 0.81 | 0.83 |
| Euclidean | ML-1M | 0.81 | 0.82 |
| Cosine | ML-100k | 0.75 | 0.51 |
| Cosine | ML-1M | 0.73 | 0.42 |
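The two similarity measures compared in this table can be sketched over the movies both users have rated. Mapping Euclidean distance to a similarity as `1 / (1 + d)` is one common convention and an assumption here, as is restricting both measures to co-rated movies; the notebooks may use different formulas. Ratings are represented as hypothetical `{movieId: rating}` dictionaries:

```python
import math


def _co_rated(a: dict, b: dict):
    """Ratings of the movies rated by both users, aligned by movie ID."""
    common = sorted(set(a) & set(b))
    return [a[m] for m in common], [b[m] for m in common]


def euclidean_similarity(a: dict, b: dict) -> float:
    x, y = _co_rated(a, b)
    if not x:
        return 0.0  # no co-rated movies: treat the users as unrelated
    # Distance 0 maps to similarity 1; large distances approach 0.
    return 1.0 / (1.0 + math.dist(x, y))


def cosine_similarity(a: dict, b: dict) -> float:
    x, y = _co_rated(a, b)
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return num / den if den else 0.0


alice = {1: 5, 2: 3, 3: 4}
bob = {1: 5, 2: 3, 4: 1}
carol = {1: 1, 2: 5}
```

Note that cosine similarity only measures the angle between rating vectors, so two users who rate the same movies 2 and 4 versus 4 and 8 would look identical, whereas Euclidean similarity penalizes the difference in magnitude.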

**MAE comparison between MF, NMF and EMF**

| Preprocessing | Dataset | MF | NMF | EMF |
| --- | --- | --- | --- | --- |
| Raw data | ML-100k | 1.497 | 0.951 | 0.797 |
| Raw data | ML-1M | 1.482 | 0.9567 | 0.76 |
| Normalized data | ML-100k | 0.828 | n/a | 0.783 |
| Normalized data | ML-1M | 0.825 | n/a | 0.758 |
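"Normalized data" most likely refers to mean-centering each user's observed ratings before factorization and adding the mean back at prediction time; this is an assumption, sketched below with hypothetical helper names (`normalize_ratings`, `denormalize`). Centering produces negative values, which would be consistent with NMF, which requires non-negative input, having no result on normalized data:

```python
import numpy as np


def normalize_ratings(R: np.ndarray):
    """Subtract each user's mean observed rating; missing entries (0) stay 0."""
    mask = R > 0
    counts = mask.sum(axis=1)
    # Per-user mean over observed ratings only; users with no ratings get mean 0.
    means = np.divide(R.sum(axis=1), counts, out=np.zeros(R.shape[0]), where=counts > 0)
    R_norm = np.where(mask, R - means[:, None], 0.0)
    return R_norm, means


def denormalize(pred_norm: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Add each user's mean back to predictions made in the centered space."""
    return pred_norm + means[:, None]


R = np.array([[4, 0, 2], [0, 0, 0], [5, 3, 0]], dtype=float)
R_norm, means = normalize_ratings(R)
```

A factorization model would then be trained on `R_norm`, and its predictions passed through `denormalize` before computing MAE against the original ratings.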

## References

1. [https://github.com/RecoHut-Stanzas/S632684](https://github.com/RecoHut-Stanzas/S632684)
2. [https://github.com/nzhinusoftcm/review-on-collaborative-filtering](https://github.com/nzhinusoftcm/review-on-collaborative-filtering)