<a href="https://colab.research.google.com/github/suryagokul/Data-Science-Portfolio/blob/master/model_based_collaborative_filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#  Model-based collaborative filtering


Model-based collaborative filtering methods first fit a model to the observed user-item ratings, and then use that model to make predictions.

### Types of models
- Probabilistic
- Classification
- Regression
- Clustering
- Rule-based


The winning approaches in the Netflix Prize were model-based collaborative filtering methods.

Top 2 algorithms in the Netflix Prize, and their blend:
- SVD (matrix factorization) -- RMSE 0.8914
- RBM (Restricted Boltzmann Machines - neural network) -- RMSE 0.8990
- Ensemble of the two -- RMSE 0.88
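
To make the RMSE figures above concrete, here is a minimal sketch of how RMSE is computed and why blending two models can beat each one alone. The ratings and predictions are invented toy numbers, and the 50/50 weighting is an assumption for illustration, not the blend that was actually used in the Netflix Prize.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data (invented): true ratings and the outputs of two hypothetical models.
actual = [4.0, 3.0, 5.0, 2.0]
pred_a = [4.6, 3.4, 4.2, 2.5]
pred_b = [3.6, 3.2, 5.4, 1.4]

# Simple 50/50 blend of the two models (the weight is an assumption).
blend = [0.5 * a + 0.5 * b for a, b in zip(pred_a, pred_b)]

print("model A RMSE:", round(rmse(actual, pred_a), 4))
print("model B RMSE:", round(rmse(actual, pred_b), 4))
print("blend RMSE:  ", round(rmse(actual, blend), 4))
```

The blend wins here because the two toy models tend to err in opposite directions, so their errors partly cancel. Real ensembles usually learn the blending weights on held-out data.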

# Using SVD (Singular Value Decomposition)


### Intuition behind SVD:

The basic idea is that the matrix I start off with, the $X$ matrix, holds the user-item ratings and is very sparse. I want to collapse it into something that has fewer dimensions and is much less sparse.

We're going to do that by decomposing the original matrix $X$ into three matrices:
* $U$ == left singular matrix, representing the relationship between users and latent factors
* $S$ == diagonal matrix describing the strength of each latent factor
* $V$ == right singular matrix, representing the relationship between items and latent factors






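Here $r$ is the number of latent factors kept in the decomposition, so that $X \approx U S V^T$, where $U$ is (users × $r$), $S$ is ($r$ × $r$) and $V^T$ is ($r$ × items).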
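
The decomposition described above can be sketched with NumPy. This is only an illustration under some assumptions: the 4 x 5 rating matrix is invented, missing ratings are filled with 0 (a simplification that a real recommender would avoid), and `r = 2` is an arbitrary choice.

```python
import numpy as np

# Toy 4-user x 5-item rating matrix (values invented; 0 stands in for "not rated").
X = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 5.0, 4.0, 5.0],
    [1.0, 0.0, 4.0, 5.0, 4.0],
])

# Thin SVD: X = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

r = 2  # number of latent factors to keep (arbitrary for this sketch)
U_r = U[:, :r]        # users x r
S_r = np.diag(s[:r])  # r x r
Vt_r = Vt[:r, :]      # r x items

# Rank-r approximation of the original matrix.
X_hat = U_r @ S_r @ Vt_r

print("U_r shape: ", U_r.shape)
print("S_r shape: ", S_r.shape)
print("Vt_r shape:", Vt_r.shape)
print("reconstruction error:", np.linalg.norm(X - X_hat))
```

Keeping only the largest `r` singular values gives the best rank-`r` approximation of `X` in the least-squares sense, which is why the small factor matrices can stand in for the large sparse one.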



### **Latent factors?**
A latent factor describes a property or concept that a user or an item has.
For instance, for music, a latent factor can correspond to the genre that a track belongs to. SVD reduces the dimension of the utility matrix by extracting its latent factors. Essentially, we map each user and each item into a latent space of dimension $r$. Because users and items then live in the same space, they become directly comparable, which helps us better understand the relationship between them.
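
As a rough sketch of what "directly comparable" means: once users and items are vectors in the same latent space, a dot product gives an affinity score. The names, the two factors (think "rock" and "jazz") and all numbers below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical latent vectors with r = 2 factors: [rock-ness, jazz-ness].
user_factors = {
    "alice": [0.9, 0.1],
    "bob": [0.2, 0.8],
}
item_factors = {
    "rock_album": [1.0, 0.0],
    "jazz_album": [0.1, 0.9],
}


def affinity(user_vec, item_vec):
    """Dot product of a user vector and an item vector in latent space."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def best_item(user):
    """Return the item whose latent vector best matches the user's."""
    return max(
        item_factors,
        key=lambda item: affinity(user_factors[user], item_factors[item]),
    )


for user in user_factors:
    print(user, "->", best_item(user))
```

In a trained model these vectors are learned from the ratings rather than written by hand, and the factors rarely map to a concept as clean as a genre.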

**Video references on how SVD works**

1. [Fantastic Explanation of Netflix Prize Using SVD.](https://www.youtube.com/watch?v=sooj-_bXWgk&list=PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv&index=9)

2. [Explanation on Matrix Factorization used by SVD.](https://www.youtube.com/watch?v=ZspR5PZemcs)


## Hands-on

### 1. Install the package

In [1]:
!pip install surprise

Collecting surprise
  Downloading https://files.pythonhosted.org/packages/61/de/e5cba8682201fcf9c3719a6fdda95693468ed061945493dea2dd37c5618b/surprise-0.1-py2.py3-none-any.whl
Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1670913 sha256=c2d82bbaef477784fc2f5802c030ff361a940885d42854dacc0f33a5ed98d51a
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.1 surprise-0.1


**Note:** After installing, please restart the runtime (Runtime --> Restart runtime)


### 2. Import packages 

In [2]:
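# Dataset loads rating data; SVD is Surprise's matrix-factorization algorithm.
from surprise import Dataset, SVD
# cross_validate runs k-fold evaluation and reports accuracy metrics.
from surprise.model_selection import cross_validate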


### 3. Getting data


The `surprise` package can download the MovieLens dataset for us, so we're going to use that.

In [3]:
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

Dataset ml-100k could not be found. Do you want to download it? [Y/n] 
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k


### 4. Plug-in SVD

In [4]:
# Use Surprise's SVD, the matrix-factorization algorithm popularized by Simon Funk during the Netflix Prize.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9291  0.9439  0.9372  0.9327  0.9363  0.9358  0.0050  
MAE (testset)     0.7333  0.7435  0.7380  0.7362  0.7397  0.7381  0.0034  
Fit time          4.74    4.76    4.73    4.75    4.74    4.74    0.01    
Test time         0.20    0.13    0.20    0.13    0.20    0.17    0.03    


{'fit_time': (4.738926649093628,
  4.763303518295288,
  4.72601056098938,
  4.749563455581665,
  4.743380784988403),
 'test_mae': array([0.73329184, 0.74350207, 0.73796941, 0.73618822, 0.73971734]),
 'test_rmse': array([0.92905726, 0.94389489, 0.9371952 , 0.93271503, 0.93630608]),
 'test_time': (0.20218849182128906,
  0.13275957107543945,
  0.1959681510925293,
  0.1342453956604004,
  0.19782733917236328)}
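
The Mean and Std columns in the table above are summaries of the per-fold scores in the returned dictionary. Here is a minimal sketch with the per-fold values copied from the output above. Treating Std as the population standard deviation is an assumption; it reproduces the printed numbers.

```python
from statistics import mean, pstdev

# Per-fold scores copied from the cross_validate output above.
test_rmse = [0.92905726, 0.94389489, 0.9371952, 0.93271503, 0.93630608]
test_mae = [0.73329184, 0.74350207, 0.73796941, 0.73618822, 0.73971734]

for name, scores in [("RMSE", test_rmse), ("MAE", test_mae)]:
    # pstdev divides by n (population standard deviation), not n - 1.
    print(f"{name}: mean={mean(scores):.4f} std={pstdev(scores):.4f}")
```

The small standard deviation relative to the mean suggests that the five folds give consistent estimates of the model's error.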

## Surprise library documentation

Surprise already implements multiple models. They are listed here: http://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html

A benchmark of these models can be found at: http://surpriselib.com
