# First steps
In this notebook, we reproduce the real-data experiments from the original paper. We also run some additional experiments to see how the results change as we vary the parameters of the algorithm.

To reuse the code from the original repository, we will clone it into the current directory and then import the necessary classes and functions from it. We also need the data used in the experiments. The data is available [here](https://webscope.sandbox.yahoo.com/catalog.php?datatype=r); look for the dataset "R3 - Yahoo! Music ratings for User Selected and Randomly Selected songs". I have added the relevant files to the `data` directory in this repository.

Run the next cell to clone the original repository. The cell also renames the cloned folder from `unbiased-implicit-rec-real` to `base_code` to avoid confusion, so once it has run you will find the original code in the `base_code` directory.

In [19]:
! git clone https://github.com/usaito/unbiased-implicit-rec-real.git
! mv unbiased-implicit-rec-real base_code

Cloning into 'unbiased-implicit-rec-real'...
remote: Enumerating objects: 92, done.
remote: Counting objects: 100% (92/92), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 92 (delta 31), reused 68 (delta 17), pack-reused 0
Receiving objects: 100% (92/92), 479.73 KiB | 3.45 MiB/s, done.
Resolving deltas: 100% (31/31), done.
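
Re-running the clone cell after `base_code` already exists would clone the repository again and move it inside the existing folder. The following is a minimal sketch of one way to make the step safe to re-run, assuming you prefer Python over shell commands. The helper name `clone_once` is hypothetical and not part of the original repository; it relies only on the standard `git clone <url> <directory>` form, which clones straight into the named directory.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/usaito/unbiased-implicit-rec-real.git"


def clone_once(url, target, run=subprocess.run):
    """Clone `url` into `target` unless `target` already exists.

    Returns True if a clone was started and False if it was skipped.
    `run` is injectable so the function can be tried without network access.
    """
    target = Path(target)
    if target.exists():
        return False
    # `git clone <url> <dir>` clones directly into <dir>, so no `mv` is needed.
    run(["git", "clone", url, str(target)], check=True)
    return True


# In the notebook one would call:
# clone_once(REPO_URL, "base_code")
```

Cloning directly into `base_code` replaces the separate rename step, and the existence check turns a repeated run into a no-op.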


Next, make sure the necessary packages are installed. Run the next cell to install them from the repository's `requirements.txt`.

In [9]:
! pip install -r base_code/requirements.txt

Collecting numpy==1.16.2
  Downloading numpy-1.16.2.zip (5.1 MB)
  Preparing metadata (setup.py) ... done
Collecting pandas==0.24.2
  Downloading pandas-0.24.2.tar.gz (11.8 MB)
  Preparing metadata (setup.py) ... done
Collecting scikit-learn==0.20.3
  Downloading scikit-learn-0.20.3.tar.gz (11.8 MB)
  Preparing metadata (setup.py) ... done
ERROR: Could not find a version that satisfies the requirement tensorflow==1.15.0 (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2

# Imports

In [28]:
from base_code.src.trainer import Trainer

ModuleNotFoundError: No module named 'evaluate'
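
The `ModuleNotFoundError` suggests that `trainer.py` imports `evaluate` as a top-level module, which only resolves when the repository's `src` folder is itself on the import path. Assuming that is the cause and that `evaluate` lives in `base_code/src`, a minimal sketch of a workaround is to add that folder to `sys.path` before importing. The helper `add_to_sys_path` is hypothetical, not part of the original code.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Prepend `directory` to sys.path once and return its resolved path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumption: the modules that trainer.py imports sit directly in base_code/src.
src_dir = add_to_sys_path("base_code/src")

# After this, the import from the cell above can be retried:
# from base_code.src.trainer import Trainer
```

Prepending keeps the repository's modules ahead of any installed package with the same name, which is a trade-off to keep in mind if an unrelated `evaluate` package is installed.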

# Train 3 models
Next, we will train three models on the data so that we can compare them.

The models are:
* Weighted Matrix Factorization (WMF)
* Exposure Matrix Factorization (ExpoMF)
* Relevance Matrix Factorization (Rel-MF) (the model proposed in the paper)

To train the models, we will use the `Trainer` class from the original repository. The `Trainer` takes the following parameters:
* `batch_size`: the size of the batches used to train the model
* `max_iters`: the maximum number of iterations to train the model
* `lam`: the weight of the regularization term
* `eta`: the learning rate
* `model_name`: the name of the model to train (one of `wmf`, `expomf`, `crmf`)

## Define parameters
We define the parameters for the models here, using the same values as the original researchers.


In [10]:
# Trainer parameters
batch_size = 15
max_iters = 301
lam = 0.0001
eta = 0.005
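
Since all three models share these values, one option is to collect them once and derive a set of keyword arguments per model. The sketch below only builds plain dictionaries; the names `trainer_params` and `trainer_kwargs` are illustrative, and it assumes the `Trainer` constructor accepts the parameter names listed earlier as keywords.

```python
# Shared hyperparameters, copied from the cell above.
trainer_params = {
    "batch_size": 15,
    "max_iters": 301,
    "lam": 0.0001,
    "eta": 0.005,
}

# Model identifiers as listed in the Trainer description.
model_names = ["wmf", "expomf", "crmf"]

# One keyword-argument dictionary per model; the shared dict is left unchanged.
trainer_kwargs = {
    name: dict(trainer_params, model_name=name) for name in model_names
}

for name, kwargs in trainer_kwargs.items():
    print(name, kwargs)
```

Each entry could then be passed on as `Trainer(**trainer_kwargs["wmf"])`, which keeps the three runs guaranteed to use identical settings.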


## Train WMF model
This is one of the baselines used in the paper. We will train it on the data and see how it performs.

In [None]:
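model_name = 'wmf'
# Pass arguments by keyword so none can be bound to the wrong parameter.
wmf_trainer = Trainer(model_name=model_name, batch_size=batch_size,
                      max_iters=max_iters, lam=lam, eta=eta)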