<div class="alert alert-block alert-info">
<b>Deadline:</b> March 20, 2024 (Wednesday) 23:00
</div>

# Exercise 2. Recommender system

In this exercise, your task is to design a recommender system.

## Learning goals:
* Practise tuning a neural network model by using different regularization methods.

In [2]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import tools
import data

In [3]:
skip_training = False  # Set this flag to True before validation and submission

In [4]:
# During evaluation, this cell sets skip_training to True
# skip_training = True

import tools, warnings
warnings.showwarning = tools.customwarn

In [5]:
# When running on your own computer, you can specify the data directory by:
# data_dir = tools.select_data_dir('/your/local/data/directory')
data_dir = tools.select_data_dir("data")

The data directory is data


In [6]:
# Select the device for training (use GPU if you have one)
#device = torch.device('cuda:0')
device = torch.device('cpu')

In [7]:
if skip_training:
    # The models are always evaluated on CPU
    device = torch.device("cpu")

## Ratings dataset

We will train the recommender system on a dataset in which each element consists of three values:
* `user_id` - id of the user (the smallest user id is 1)
* `item_id` - id of the item (the smallest item id is 1)
* `rating` - rating given by the user to the item (ratings are integer numbers between 1 and 5).

The recommender system needs to predict the rating for any given pair of `user_id` and `item_id`.

We measure the quality of the predicted ratings using the mean-squared error (MSE) loss:
$$
  \frac{1}{N}\sum_{i=1}^N (r_i - \hat{r}_i)^2
$$
where $r_i$ is a real rating and $\hat{r}_i$ is a predicted one.

Note: The predicted rating $\hat{r}_i$ does not have to be an integer number.
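
As a small illustration of the formula above, the following sketch computes the MSE by hand and with `torch.nn.functional.mse_loss`. The rating values are made up for the example and do not come from the dataset.

```python
import torch
import torch.nn.functional as F

# Hypothetical ratings, for illustration only
ratings = torch.tensor([5.0, 3.0, 4.0, 1.0])      # real ratings r_i
predictions = torch.tensor([4.5, 3.0, 3.0, 2.0])  # predicted ratings, real-valued

# Direct implementation of the formula: mean of squared differences
mse_manual = ((ratings - predictions) ** 2).mean()

# The same quantity computed by PyTorch
mse_torch = F.mse_loss(predictions, ratings)

print(mse_manual.item(), mse_torch.item())
```

Both values agree, which is why `nn.MSELoss` can be used directly as the training loss.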

In [8]:
trainset = data.RatingsData(root=data_dir, train=True)
testset = data.RatingsData(root=data_dir, train=False)

In [9]:
# Print one sample from the dataset
x = trainset[0]
print(f'user_id={x[0]}, item_id={x[1]}, rating={x[2]}')

user_id=1, item_id=1, rating=5


## Model

You need to design a recommender system model with the API described in the cell below.

Hints on the model architecture:
* Use a [torch.nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html?highlight=embedding#torch.nn.Embedding) layer to convert the inputs `user_ids` and `item_ids` into useful representations. The idea of the embedding layer is to represent similar users (or items) with vectors that are close to each other. The original integer ids are not suitable for that, because two ids being numerically close says nothing about the similarity of the users. With an embedding layer, such representations are learned automatically during training.
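
A minimal sketch of how an embedding table handles ids that start from 1. The sizes below are hypothetical. Adding one extra row is only one possible convention; subtracting 1 from the ids before the lookup would work as well.

```python
import torch
import torch.nn as nn

n_users, dim = 5, 3  # hypothetical sizes, for illustration only

# ids run from 1 to n_users, so the table gets n_users + 1 rows; row 0 is never used
user_embedding = nn.Embedding(n_users + 1, dim)

user_ids = torch.tensor([1, 3, 5])
vectors = user_embedding(user_ids)
print(vectors.shape)  # torch.Size([3, 3])

# A table with only n_users rows accepts indices 0..n_users-1,
# so looking up id n_users fails (on CPU with an IndexError)
too_small = nn.Embedding(n_users, dim)
try:
    too_small(torch.tensor([n_users]))
    raised = False
except IndexError:
    raised = True
print(raised)
```

The same reasoning applies to the item embedding.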

### Model tuning

In this exercise, you need to tune the architecture of your model to achieve the best performance on the provided test set. You will notice that overfitting is a severe problem for this data: The model can easily overfit the training set producing poor accuracy on the out-of-training (test) data.

You need to find an optimal combination of the hyperparameters, with some hyperparameters corresponding to the regularization techniques that we studied in the lecture.

The hyperparameters that you are advised to consider:
* Learning rate value and learning rate schedule (decreasing the learning rate often has a positive effect on the model performance)
* Number of training epochs
* Network size
* Weight decay
* Early stopping
* Dropout
* Increasing the amount of data:
  * Data augmentation
  * Injecting noise

You can tune the hyperparameters by, for example, grid search, random search or manual tuning. If you tune them automatically, you can use the `architecture` argument to specify the hyperparameters that define the architecture of your network. After you have tuned the hyperparameters, set the default value of this argument to the optimal set of hyperparameters so that the best architecture is used in the accuracy tests.
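
A random search can be sketched in a few lines. Everything here is an assumption made for illustration: the search space, the key names, and `dummy_evaluate`, which stands in for "train a model with this configuration and return its validation MSE".

```python
import random

# Hypothetical search space; names and ranges are illustrative only
search_space = {
    "embedding_dim": [2, 4, 8, 16],
    "hidden_dim": [8, 16, 32],
    "dropout": [0.0, 0.2, 0.5],
    "weight_decay": [0.0, 1e-5, 1e-4, 1e-3],
    "lr": [1e-3, 3e-3, 1e-2],
}

def sample_config(space, rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(evaluate, space, n_trials, seed=0):
    """Return the sampled configuration with the lowest score from `evaluate`."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Stand-in for training a model and measuring its validation error
def dummy_evaluate(config):
    return abs(config["dropout"] - 0.2) + abs(config["embedding_dim"] - 8) / 8

best_config, best_score = random_search(dummy_evaluate, search_space, n_trials=20)
print(best_config, best_score)
```

In the exercise, a dictionary like `best_config` could be passed as the `architecture` argument, with the architecture-related keys read inside `__init__`.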

Note:
* The number of points that you will get from this exercise depends on the MSE loss on the test set:
  * below 1.00: 1 point
  * below 0.95: 2 points
  * below 0.92: 3 points
  * below 0.90: 4 points
  * below 0.89: 5 points
  * below 0.88: 6 points 

In [61]:
class RecommenderSystem(nn.Module):
    def __init__(self, n_users, n_items,
                 architecture=None  # If you want to tune the hyperparameters automatically (e.g. using random
                                    # search), use this argument to specify the hyperparameters that define the
                                    # architecture of your network. After you have tuned the hyperparameters,
                                    # set the default value of this argument to the optimal set of hyperparameters
                                    # so that the best architecture is used in the accuracy tests.
                ):
        """
        Args:
          n_users: Number of users.
          n_items: Number of items.
        """
        super(RecommenderSystem, self).__init__()
        self.dim1 = 2
        self.dim2 = 10
        self.dim3 = 20
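        # User and item ids start from 1. Assuming the largest id equals n_users (n_items),
        # each table needs one extra row (row 0 is unused); otherwise the largest id is out of range.
        self.embedded1  = nn.Embedding(n_users + 1, self.dim1)
        self.embedded2  = nn.Embedding(n_items + 1, self.dim1)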
        self.block      = nn.Sequential(
            #nn.Linear(2 * self.dim1, self.dim2),
            #nn.ReLU(),
            #nn.Linear(self.dim2, self.dim3),
            #nn.ReLU(),
            #nn.Linear(self.dim3,1)
            nn.Linear(2 * self.dim1, self.dim2),
            nn.ReLU(),
            nn.Linear(self.dim2,1)
        )
        
        
    def forward(self, user_ids, item_ids):
        """
        Args:
          user_ids of shape (batch_size): User ids (starting from 1).
          item_ids of shape (batch_size): Item ids (starting from 1).
        
        Returns:
          outputs of shape (batch_size): Predictions of ratings.
        """
        #print('User IDs shape:', user_ids.shape)
        #print('Item IDs shape:', item_ids.shape)
        #print('These should be the batchsize 5')
        user_ids = self.embedded1(user_ids)
        item_ids = self.embedded2(item_ids)
        concat = torch.cat([user_ids, item_ids], dim=1)
        
        # Squeeze only the last dimension so that a batch of size 1 keeps the shape (1,)
        return self.block(concat).squeeze(1)
        

You can test the shapes of the model outputs using the function below.

In [62]:
def test_RecommenderSystem_shapes():
    n_users, n_items = 100, 1000
    model = RecommenderSystem(n_users, n_items)
    batch_size = 10
    user_ids = torch.arange(1, batch_size+1)
    item_ids = torch.arange(1, batch_size+1)
    output = model(user_ids, item_ids)
    print(output.shape)
    assert output.shape == torch.Size([batch_size]), "Wrong output shape."
    print('Success')

test_RecommenderSystem_shapes()

torch.Size([10])
Success


In [63]:
# This cell is reserved for testing

In [64]:
class EarlyStopping:
    def __init__(self, tolerance, patience):
        """
        Args:
          patience (int):    Maximum number of epochs with unsuccessful updates.
          tolerance (float): We assume that the update is unsuccessful if the validation error is larger
                              than the best validation error so far plus this tolerance.
        """
        self.tolerance = tolerance
        self.patience = patience
    
    def stop_criterion(self, val_errors):
        """
        Args:
          val_errors (iterable): Validation errors after every update during training.
        
        Returns: True if training should be stopped, that is, when each of the last `patience` validation
                  errors is larger than the best validation error obtained so far plus the tolerance.
                 
                 Otherwise, False.
        """
        if len(val_errors) <= self.patience:
            return False

        min_val_error = min(val_errors)
        val_errors = np.array(val_errors[-self.patience:])
        return all(val_errors > min_val_error + self.tolerance)
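
To see how this criterion behaves, here is a minimal sketch that restates the same rule as a plain function (`should_stop` is a hypothetical helper, not part of the exercise) and applies it to a made-up sequence of validation errors.

```python
def should_stop(val_errors, tolerance, patience):
    """Same rule as EarlyStopping.stop_criterion, written as a plain function."""
    if len(val_errors) <= patience:
        return False
    best = min(val_errors)
    return all(e > best + tolerance for e in val_errors[-patience:])

# Hypothetical validation errors: the improvement stops after the third epoch
history = [1.10, 0.95, 0.90, 0.93, 0.94, 0.96]

stop_epoch = None
for epoch in range(1, len(history) + 1):
    # Only the errors seen up to the current epoch are available
    if should_stop(history[:epoch], tolerance=0.01, patience=3):
        stop_epoch = epoch
        break

print(stop_epoch)  # 6
```

The best error (0.90) is reached at epoch 3, and training stops at epoch 6, once three consecutive errors all exceed 0.90 + 0.01.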

## Train the model

You need to train a recommender system using **only the training data.** Please use the test set to select the best model: the model that generalizes best to out-of-training data.

**IMPORTANT**:
* During testing, the predictions are produced by `predictions = model(user_ids, item_ids)` with the `user_ids` and `item_ids` loaded from `RatingsData`.
* There is a size limit of 30Mb for saved models.
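
When the data is served in mini-batches, the error over the whole set should be accumulated per sample rather than averaged per batch, because the last batch may be smaller. A minimal sketch follows; `evaluate_mse`, `ConstantModel` and the toy data are hypothetical stand-ins for the real model and `RatingsData`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_mse(model, loader, device=torch.device("cpu")):
    """Mean squared error over every sample served by `loader`."""
    model.eval()
    squared_error, n_samples = 0.0, 0
    with torch.no_grad():
        for user_ids, item_ids, ratings in loader:
            user_ids, item_ids = user_ids.to(device), item_ids.to(device)
            ratings = ratings.float().to(device)
            predictions = model(user_ids, item_ids)
            squared_error += ((predictions - ratings) ** 2).sum().item()
            n_samples += ratings.numel()
    return squared_error / n_samples

# Toy model that always predicts a rating of 3 (illustration only)
class ConstantModel(nn.Module):
    def forward(self, user_ids, item_ids):
        return torch.full((user_ids.shape[0],), 3.0)

toy_data = TensorDataset(
    torch.tensor([1, 2, 3, 4, 5]),  # user ids
    torch.tensor([1, 1, 2, 2, 3]),  # item ids
    torch.tensor([5, 3, 4, 1, 2]),  # ratings
)
loader = DataLoader(toy_data, batch_size=2, shuffle=False)

toy_mse = evaluate_mse(ConstantModel(), loader)
print(toy_mse)  # 2.0
```

Calling such a function once per epoch on the training and test loaders gives one value per epoch, which is the granularity that an early stopping criterion based on epochs expects.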

In [65]:
# Create the model
# IMPORTANT: the default value of the architecture argument should define your best model.
model = RecommenderSystem(trainset.n_users, trainset.n_items)

In [66]:
# Implement the training loop in this cell
def to_device(data, device):
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device)#, non_blocking=True)

if torch.cuda.is_available():
    # Storing ID of current CUDA device
    cuda_id = torch.cuda.current_device()
    print(f"ID of current CUDA device: {torch.cuda.current_device()}")
        
    print(f"Name of current CUDA device: {torch.cuda.get_device_name(cuda_id)}")

    print(device.type)

ID of current CUDA device: 0
Name of current CUDA device: NVIDIA GeForce RTX 2080
cpu


In [67]:
def compute_accuracy(net, ids, items, labels):
    net.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        outputs = net.forward(ids, items)
        predicted = torch.round(outputs).long()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    return correct / total

In [68]:
def compute_mse_loss(net, x, y, z):
    net.eval()
    mse_loss = nn.MSELoss()
    total_loss = 0
    total_samples = 0
    
    with torch.no_grad():
        #ids = to_device(x, device)
        #items = to_device(y, device)
        #labels = to_device(z, device)
        ids = x
        items = y
        labels = z
        
        outputs = net.forward(ids, items)
        loss = mse_loss(outputs, labels)
        total_loss += loss.item() * labels.size(0)
        total_samples += labels.size(0)
        
    return total_loss / total_samples

In [69]:
# This is the function to print the progress during training
def print_progress(epoch, train_error, val_error):
    print('Epoch {}: Train error: {:.4f}, Test error: {:.4f}'.format(
        epoch, train_error, val_error))

In [70]:
def add_noise(x, noise_std):
    """Add Gaussian noise to a PyTorch tensor.
    
    Args:
      x (tensor): PyTorch tensor of inputs.
      noise_std (float): Standard deviation of the Gaussian noise.
      
    Returns:
      x: Tensor with Gaussian noise added.
    """
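    # Create the noise on the same device as x so that the function also works for GPU tensors
    return x + noise_std * torch.randn(x.shape, device=x.device)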

In [71]:
# Implement the training loop in this cell
#model = RecommenderSystem(trainset.n_users, trainset.n_items)

#print('Number of trainable parameters: ', sum(p.numel() for p in model.parameters() if p.requires_grad))

# TODO: Increase the amount of data
# Either by inserting noise or data augmentation


# Learning rate value and learning rate schedule 
# (decreasing the learning rate often has positive 
# effect on the model performance)
# Number of training epochs
# Network size
# Weight decay
# Early stopping
# Dropout
# Increase amount of data:
  # Data augmentation
  # Injecting noise
"""
if not skip_training:
    #torch.cuda.empty_cache()
    #memory_allocated = torch.cuda.memory_allocated()
    #print("Memory allocated on GPU:", memory_allocated * 1e-9, "gigabytes")

    #memory_reserved = torch.cuda.memory_reserved()
    #print("Memory reserved on GPU:", memory_reserved * 1e-9, "gigabytes")

    loss_fn = nn.MSELoss()
    #optimizer = optim.Adam(model.parameters(), lr=0.0001)
    optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9, weight_decay=0.0001)
    epochs = 2000
    model.to(device)

    train_errors = []  # Keep track of the training error
    val_errors = []  # Keep track of the validation error
    early_stop = EarlyStopping(tolerance=0.0001, patience=20)

    #ids         = to_device(torch.LongTensor(trainset[0]), device)
    #items       = to_device(torch.LongTensor(trainset[1]), device)
    #labels      = to_device(torch.FloatTensor(trainset[2]), device)

    

    ids = []
    items = []
    labels = []

    for sample in trainset:
        ids.append(sample[0])
        items.append(sample[1])
        labels.append(sample[2])

    ids_test = []
    items_test = []
    labels_test = []

    for sample in testset:
        ids_test.append(sample[0])
        items_test.append(sample[1])
        labels_test.append(sample[2])
    


    #ids         = to_device(torch.LongTensor(ids), device)
    #items       = to_device(torch.LongTensor(items), device)
    #labels      = to_device(torch.FloatTensor(labels), device)

    #ids = torch.LongTensor(ids)
    #items = torch.LongTensor(items)
    #labels = torch.FloatTensor(labels)

    #ids = to_device(ids, device)
    #items = to_device(items, device)
    #labels = to_device(labels, device)

    #print('Ids device: ', ids.device)
    #print('Items device: ', items.device)
    #print('Labels device: ', labels.device)


    #ids_test    = to_device(torch.LongTensor(ids_test), device)
    #items_test  = to_device(torch.LongTensor(ids_test), device)
    #labels_test = to_device(torch.FloatTensor(ids_test), device)

    #ids_test = torch.LongTensor(ids_test)
    #items_test = torch.LongTensor(items_test)
    #labels_test = torch.FloatTensor(labels_test)

    #ids_test = to_device(ids_test, device)
    #items_test = to_device(items_test, device)
    #labels_test = to_device(labels_test, device)


    #print('Ids_test device: ', ids_test.device)
    #print('Items_test device: ', items_test.device)
    #print('Labels_test device: ', labels_test.device)

    std1 = 0.5
    std2 = 0.3

    #ids_augmented   = torch.cuda.LongTensor((ids + to_device(torch.rand(ids.shape) * std1, device)).long())
    #items_augmented = torch.cuda.LongTensor((items + to_device(torch.rand(items.shape) * std2, device)).long())
    #labels = torch.cat([labels, labels], dim=0)

    #ids = to_device(torch.cat([ids, ids_augmented]), device)
    #items = to_device(torch.cat([items, items_augmented]), device)

    for epoch in range(epochs):
        #torch.cuda.empty_cache()
        model.train()

        memory_allocated = torch.cuda.memory_allocated()
        print("Memory allocated on GPU on Epoch ", epoch, ":", memory_allocated * 1e-9, "gigabytes")

        for batch in train_loader:
            optimizer.zero_grad()
            ids, items, labels = batch

            #ids = to_device(torch.LongTensor(ids), device)
            #items = to_device(torch.LongTensor(items), device)
            #labels = to_device(torch.FloatTensor(labels.float()), device)
            
            ids = torch.LongTensor(ids)
            items = torch.LongTensor(items)
            labels = torch.FloatTensor(labels.float())

            ids = ids.to(device)
            items = items.to(device)
            labels = labels.to(device)

            outputs = model.forward(ids, items)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()

            #train_errors.append(compute_mse_loss(model, ids, items, outputs))
            train_errors.append(F.mse_loss(outputs, labels))

        model.eval()
        for batch in test_loader:
            ids, items, labels = batch
            ids = to_device(torch.LongTensor(ids), device)
            items = to_device(torch.LongTensor(items), device)
            labels = to_device(torch.FloatTensor(labels.float()), device)

            outputs = model.forward(ids, items)
            #val_errors.append(compute_mse_loss(model, ids, items, outputs))
            val_errors.append(F.mse_loss(outputs, labels))

        if early_stop.stop_criterion(val_errors):
            print('Stop after %d epochs' % epoch)
            break

        if (epoch+1) % 100 == 0:
            print_progress(epoch, train_errors[epoch], val_errors[epoch])
    print('Finished Training')
    #print('Test accuracy: ', compute_accuracy(model, testset))
    #print('MSELoss for trainset: ', compute_mse_loss(model, ids, items, outputs))
    #print('MSELoss for testset: ', compute_mse_loss(model, ids_test, items_test, labels_test))
"""

"""
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            
            ids, items, labels = batch
            ids = to_device(torch.LongTensor(ids), device)
            items = to_device(torch.LongTensor(items), device)
            labels = to_device(torch.FloatTensor(labels.float()), device)

            print('Datatype of ids: ', ids.dtype)
            print('Datatype of items: ', items.dtype)
            print('Datatype of labels: ', labels.dtype)

            outputs = model.forward(ids, items)
            print('Outputs shape: ', outputs.dtype)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
            print('Epoch %d, loss = %.4f' % (epoch, loss.item()))

            train_errors.append(compute_mse_loss(model, ids, items, outputs))
            val_errors.append(compute_mse_loss(model, ids_test, items_test, labels_test))
        if early_stop.stop_criterion(val_errors):
            print(val_errors[epoch])
            print('Stop after %d epochs' % epoch)
            break

        if (epoch+1) % 100 == 0:
            print_progress(epoch, train_errors[epoch], val_errors[epoch])
"""


In [76]:
# Implement the training loop in this cell
model = RecommenderSystem(trainset.n_users, trainset.n_items)

#print('Number of trainable parameters: ', sum(p.numel() for p in model.parameters() if p.requires_grad))

# TODO: Increase the amount of data
# Either by inserting noise or data augmentation

# Learning rate value and learning rate schedule 
# (decreasing the learning rate often has positive 
# effect on the model performance)
# Number of training epochs
# Network size
# Weight decay
# Early stopping
# Dropout
# Increase amount of data:
  # Data augmentation
  # Injecting noise

batch_size      = 10
train_loader    = torch.utils.data.DataLoader(trainset  , batch_size=batch_size, shuffle=True)
test_loader     = torch.utils.data.DataLoader(testset   , batch_size=batch_size, shuffle=False)

if not skip_training:
    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9, weight_decay=0.0001)
    epochs = 2000
    #model.to(device)

    #train_errors = []  # Keep track of the training error
    #val_errors = []  # Keep track of the validation error
    #early_stop = EarlyStopping(tolerance=0.0001, patience=20)
    
    #ids = []
    #items = []
    #labels = []

    #for sample in trainset:
        #ids.append(sample[0])
        #items.append(sample[1])
        #labels.append(sample[2])
    
    #ids_test = []
    #items_test = []
    #labels_test = []

    #for sample in testset:
        #ids_test.append(sample[0])
        #items_test.append(sample[1])
        #labels_test.append(sample[2])
    
    #ids         = to_device(torch.LongTensor(ids), device)
    #items       = to_device(torch.LongTensor(items), device)
    #labels      = to_device(torch.FloatTensor(labels), device)
    
    #ids = torch.LongTensor(ids)
    #items = torch.LongTensor(items)
    #labels = torch.FloatTensor(labels)

    #ids = to_device(ids, device)
    #items = to_device(items, device)
    #labels = to_device(labels, device)
    
    #unique_ids      = torch.unique(ids)
    #unique_items    = torch.unique(items)

    #print('Trainset n_users: ', trainset.n_users)
    #print('Trainset n_items: ', trainset.n_items)

    #print('Ids shape: ', ids.shape)
    #print('Items shape: ', items.shape)
    #print('Labels shape: ', labels.shape)

    #print('Ids device: ', ids.device)
    #print('Items device: ', items.device)
    #print('Labels device: ', labels.device)

    #ids_test    = to_device(torch.LongTensor(ids_test), device)
    #items_test  = to_device(torch.LongTensor(ids_test), device)
    #labels_test = to_device(torch.FloatTensor(ids_test), device)

    #ids_test = torch.LongTensor(ids_test)
    #items_test = torch.LongTensor(items_test)
    #labels_test = torch.FloatTensor(labels_test)

    #ids_test = to_device(ids_test, device)
    #items_test = to_device(items_test, device)
    #labels_test = to_device(labels_test, device)

    #print('Ids_test device: ', ids_test.device)
    #print('Items_test device: ', items_test.device)
    #print('Labels_test device: ', labels_test.device)

    std1 = 0.5
    std2 = 0.3

    #ids     = torch.arange(1, trainset.n_users+1)
    #items   = torch.arange(1, trainset.n_items+1)

    #ids_augmented   = torch.cuda.LongTensor((ids + to_device(torch.rand(ids.shape) * std1, device)).long())
    #items_augmented = torch.cuda.LongTensor((items + to_device(torch.rand(items.shape) * std2, device)).long())
    #labels = torch.cat([labels, labels], dim=0)

    #ids = to_device(torch.cat([ids, ids_augmented]), device)
    #items = to_device(torch.cat([items, items_augmented]), device)

    for epoch in range(3):
        #model.train()
        for batch_idx, (ids, items, labels) in enumerate(train_loader):
            #print(ids, "\n")
            #print(items, "\n")

            optimizer.zero_grad()
            outputs = model.forward(ids, items)
            #print('Outputs shape: ', outputs.shape)
            #print('Labels shape: ', labels.shape)
            loss = loss_fn(outputs, labels.float().squeeze())
            #print(loss)
            loss.backward()
            optimizer.step()
        #train_errors.append(F.mse_loss(outputs, labels))
        
        # Validation
        #model.eval()
        #outputs = model.forward(torch.unique(ids_test), torch.unique(items_test))
        #val_errors.append(F.mse_loss(outputs, labels_test))

        #if early_stop.stop_criterion(val_errors):
            #print('Stop after %d epochs' % epoch)
            #break

        #if (epoch+1) % 100 == 0:
            #print_progress(epoch, train_errors[epoch], val_errors[epoch])
    print('Finished Training')
    #print('Test accuracy: ', compute_accuracy(model, testset))
    #print('MSELoss for trainset: ', compute_mse_loss(model, ids, items, outputs))
    #print('MSELoss for testset: ', compute_mse_loss(model, ids_test, items_test, labels_test))

IndexError: index out of range in self

In [None]:
# Save the model to disk (the pth-files will be submitted automatically together with your notebook)
# Set confirm=False if you do not want to be asked for confirmation before saving.
if not skip_training:
    tools.save_model(model, 'recsys.pth', confirm=True)

In [None]:
# This cell loads your best model
if skip_training:
    model = RecommenderSystem(trainset.n_users, trainset.n_items)
    tools.load_model(model, 'recsys.pth', device)

The next cell tests the accuracy of your best model. It is enough to submit .pth files.

**IMPORTANT**:
* During testing, the predictions are produced by `predictions = model(user_ids, item_ids)` with the `user_ids` and `item_ids` loaded from `RatingsData`.
* There is a size limit of 30Mb for saved models. Please make sure that your model loads in the cell above.

In [None]:
# This cell tests the accuracy of your best model.

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading

In [None]:
# This cell is reserved for grading