# Recommender System using a Stacked Autoencoder

In this notebook we build a movie recommender system using a stacked autoencoder.
A stacked autoencoder is composed of more than one encoding stage followed by a decoding stage.
The structure of the network is shown in this picture:

![title](https://cdn-images-1.medium.com/max/1600/1*GD1a7PRdUngdUxGQqjMVWQ.png)

This notebook uses two MovieLens datasets: the 1M files for the movie, user, and rating tables, and the 100k files for the training and test sets:

https://grouplens.org/datasets/movielens/1m/

https://grouplens.org/datasets/movielens/100k/

MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota.

The 100k data set consists of:
 * 100,000 ratings (1-5) from 943 users on 1682 movies.
 * Each user has rated at least 20 movies.
 * Simple demographic info for the users (age, gender, occupation, zip)


In [2]:
# Libraries
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

## 1. Data

In [3]:
# Movies Data
movies = pd.read_csv('ml-1m/movies.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1', names = ['MovieID', 'Title', 'Genre' ])
movies.head()

| | MovieID | Title | Genre |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [4]:
# Users Data
users = pd.read_csv('ml-1m/users.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1', names = ['UserID', 'Gender', 'Age', 'Job', 'ZipCode' ])
users.head()

| | UserID | Gender | Age | Job | ZipCode |
|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| 2 | 3 | M | 25 | 15 | 55117 |
| 3 | 4 | M | 45 | 7 | 2460 |
| 4 | 5 | M | 25 | 20 | 55455 |


In [5]:
# Ratings Data
ratings = pd.read_csv('ml-1m/ratings.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1', names = ['UserID', 'MovieID', 'Rating', 'Timestamp' ])
ratings.head()

| | UserID | MovieID | Rating | Timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


### 1.1 Prepare Training Set and Test set

We will use a holdout train/test split: 80% of the data is used as the training set and the remaining 20% as the test set.
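The MovieLens 100k archive already ships a predefined 80/20 split (`u1.base` and `u1.test`), which is what this notebook loads. As an illustration only, a comparable holdout split could be produced from a single ratings array roughly as follows. The function name `holdout_split`, its parameters, and the toy data are assumptions made for this sketch, not part of the dataset or the notebook.

```python
import numpy as np

def holdout_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle the rating rows and split them into train and test arrays."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(ratings))
    n_test = int(round(len(ratings) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ratings[train_idx], ratings[test_idx]

# Toy data (hypothetical): 10 rows of (user_id, movie_id, rating)
toy_ratings = np.array([[u, m, (u + m) % 5 + 1]
                        for u in range(1, 3) for m in range(1, 6)])
train_rows, test_rows = holdout_split(toy_ratings)
print(train_rows.shape, test_rows.shape)
```

Fixing the seed keeps the split reproducible, so the same rows land in the test set on every run.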

In [6]:
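# Train set
# Note: without header=None, pandas uses the first row of the file as column names,
# so that rating is not part of the data (see the column names and row count in the output below)
training_set = pd.read_csv('ml-100k/u1.base', sep = '\t')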
training_set.head()

| | 1 | 1.1 | 5 | 874965758 |
|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 876893171 |
| 1 | 1 | 3 | 4 | 878542960 |
| 2 | 1 | 4 | 3 | 876893119 |
| 3 | 1 | 5 | 3 | 889751712 |
| 4 | 1 | 7 | 4 | 875071561 |


In [7]:
training_set.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 79999 entries, 0 to 79998
Data columns (total 4 columns):
1            79999 non-null int64
1.1          79999 non-null int64
5            79999 non-null int64
874965758    79999 non-null int64
dtypes: int64(4)
memory usage: 2.4 MB
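The `info()` output above reports 79999 rows and column names such as `1` and `1.1`, which indicates that `read_csv` treated the first rating as a header row. A minimal sketch of the difference, using a three-line in-memory sample instead of the real file (the sample text and the column names passed to `names` are assumptions for illustration):

```python
import io
import pandas as pd

# Three tab-separated rows standing in for the start of 'ml-100k/u1.base'
sample = "1\t1\t5\t874965758\n1\t2\t3\t876893171\n1\t3\t4\t878542960\n"

# Default behaviour: the first row is consumed as the header
default_read = pd.read_csv(io.StringIO(sample), sep='\t')

# header=None keeps every row as data; names supplies readable column labels
explicit_read = pd.read_csv(io.StringIO(sample), sep='\t', header=None,
                            names=['UserID', 'MovieID', 'Rating', 'Timestamp'])

print(len(default_read), list(default_read.columns))
print(len(explicit_read), list(explicit_read.columns))
```

Passing `header=None` to the real `read_csv` calls would keep the first rating of each file, at the cost of changing the row counts shown in this notebook.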


In [8]:
# Convert the df into an array
training_set = np.array(training_set, dtype = 'int')
training_set

array([[        1,         2,         3, 876893171],
       [        1,         3,         4, 878542960],
       [        1,         4,         3, 876893119],
       ...,
       [      943,      1188,         3, 888640250],
       [      943,      1228,         3, 888640275],
       [      943,      1330,         3, 888692465]])

In [9]:
# Test set
# Note: as with the training set, the first row of the file is read as the header here
test_set = pd.read_csv('ml-100k/u1.test', sep = '\t')
test_set.head()

| | 1 | 6 | 5 | 887431973 |
|---|---|---|---|---|
| 0 | 1 | 10 | 3 | 875693118 |
| 1 | 1 | 12 | 5 | 878542960 |
| 2 | 1 | 14 | 5 | 874965706 |
| 3 | 1 | 17 | 3 | 875073198 |
| 4 | 1 | 20 | 4 | 887431883 |


In [10]:
test_set.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19999 entries, 0 to 19998
Data columns (total 4 columns):
1            19999 non-null int64
6            19999 non-null int64
5            19999 non-null int64
887431973    19999 non-null int64
dtypes: int64(4)
memory usage: 625.0 KB


In [11]:
test_set = np.array(test_set, dtype = 'int')
test_set

array([[        1,        10,         3, 875693118],
       [        1,        12,         5, 878542960],
       [        1,        14,         5, 874965706],
       ...,
       [      459,       934,         3, 879563639],
       [      460,        10,         3, 882912371],
       [      462,       682,         5, 886365231]])

In [12]:
# Getting the number of users and movies
n_users = int(max(max(training_set[:,0]), max(test_set[:,0])))
n_movies = int(max(max(training_set[:,1]), max(test_set[:,1])))

Now we convert the data into a matrix with users as rows and movies as columns, which is why we extracted the number of users and movies. The values are the ratings, with 0 marking a movie the user has not rated. Concretely, we build a list of lists: one list of movie ratings per user.

In [13]:
def convert(data):
    new_data = []
    for id_users in range(1, n_users + 1):
        id_movies = data[:, 1][data[:,0] == id_users] # Select all the movie IDs rated by this user
        id_ratings = data[:, 2][data[:,0] == id_users] # Select the corresponding ratings
        ratings = np.zeros(n_movies) # 0 means the user did not rate the movie
        ratings[id_movies - 1] = id_ratings # Movie IDs start at 1, array indices start at 0
        new_data.append(list(ratings))
    return new_data
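To make the effect of `convert` concrete, here is a small worked example on toy data. It is a vectorized sketch rather than the notebook's function: the name `to_user_item_matrix` and the toy array are assumptions, and it relies on each (user, movie) pair appearing at most once.

```python
import numpy as np

def to_user_item_matrix(data, n_users, n_movies):
    """Build a dense (n_users, n_movies) rating matrix; 0 means 'not rated'."""
    matrix = np.zeros((n_users, n_movies))
    # IDs in the file start at 1, array indices start at 0
    matrix[data[:, 0] - 1, data[:, 1] - 1] = data[:, 2]
    return matrix

# Toy rows in the same layout as the files: user, movie, rating, timestamp
toy = np.array([[1, 2, 5, 0],
                [1, 3, 4, 0],
                [2, 1, 3, 0]])
matrix = to_user_item_matrix(toy, n_users=2, n_movies=3)
print(matrix)
```

User 1 rated movies 2 and 3, so the first row holds those ratings in columns 1 and 2; every other cell stays at 0.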

In [14]:
training_set = convert(training_set)
test_set = convert(test_set)

Now we convert the data into Torch tensors.

In [15]:
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

## 2. Stacked Autoencoder

In [20]:
# Stacked Autoencoder Class
class SAE(nn.Module): # Inherit from nn.Module
    def __init__(self):
        super(SAE, self).__init__() # Initialize the parent class so its attributes and methods are available
        self.fc1 = nn.Linear(n_movies, 20) # First fully connected layer with 20 neurons, used as encoder
        self.fc2 = nn.Linear(20, 10) # Second fully connected layer with 10 neurons, used as encoder
        self.fc3 = nn.Linear(10, 20) # Third fully connected layer, used as decoder
        self.fc4 = nn.Linear(20, n_movies) # Output layer, one predicted rating per movie
        self.activation = nn.Sigmoid()
    
    def forward(self, x):
        x = self.activation(self.fc1(x)) # First encoded vector
        x = self.activation(self.fc2(x)) # Second encoded vector
        x = self.activation(self.fc3(x)) # First decoded vector
        x = self.fc4(x) # Output vector
        return x
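The shapes flowing through the network can be traced with a plain NumPy sketch of the same forward pass. The weights here are random and stored as `(n_in, n_out)` matrices for readability, which differs from how `nn.Linear` stores them, and `n_movies_toy` is a hypothetical stand-in for `n_movies`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_movies_toy = 8  # hypothetical; the notebook uses n_movies
layer_sizes = [n_movies_toy, 20, 10, 20, n_movies_toy]

# One (weights, bias) pair per fully connected layer
params = [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.integers(0, 6, size=(1, n_movies_toy)).astype(float)  # one user's ratings
shapes = [x.shape]
for i, (w, b) in enumerate(params):
    x = x @ w + b
    if i < len(params) - 1:  # no activation on the output layer
        x = sigmoid(x)
    shapes.append(x.shape)
print(shapes)
```

The vector shrinks to 10 values at the bottleneck and is expanded back to one value per movie, with no sigmoid on the last layer so that the outputs are not confined to the range 0 to 1.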

In [23]:
# Instantiate the autoencoder
sae = SAE()
metric = nn.MSELoss()
# weight_decay adds an L2 penalty on the weights (regularization); it does not reduce the learning rate over epochs
optimizer = optim.RMSprop(sae.parameters(), lr=0.01, weight_decay=0.5)
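In PyTorch optimizers, `weight_decay` acts as an L2 penalty: a term proportional to the weight itself is added to the gradient before the update. The sketch below shows that effect with a plain gradient step instead of RMSprop, to keep the arithmetic visible; `sgd_step` is a hypothetical helper, not a library function.

```python
def sgd_step(weight, grad, lr, weight_decay):
    """One plain gradient step; weight decay adds weight_decay * weight to the gradient."""
    return weight - lr * (grad + weight_decay * weight)

w = 2.0
# With a zero loss gradient, only the decay term can move the weight
no_decay = sgd_step(w, grad=0.0, lr=0.01, weight_decay=0.0)
with_decay = sgd_step(w, grad=0.0, lr=0.01, weight_decay=0.5)
print(no_decay, with_decay)
```

Even when the loss gradient is zero, the decayed weight is pulled toward zero, which is what discourages large weights during training.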

### Training 

In [28]:
epochs = 200
for epoch in range(1, epochs + 1):
    train_loss = 0
    s = 0. # Count the number of users who rated at least one movie
    for id_user in range(n_users): # For each user take the movies rated
        input_vector = training_set[id_user].unsqueeze(0) # Batch of a single input vector (the Variable wrapper is no longer needed)
        target = input_vector.clone() # The target vector is a copy of the input
        if torch.sum(target > 0) > 0: # Consider only users who rated at least one movie
            output = sae(input_vector) # Forward the input through the network
            target.requires_grad = False # Don't compute the gradient with respect to the target
            output[target == 0] = 0 # Unrated movies must not contribute to the loss, so their predictions are set to 0
            loss = metric(output, target) # Compute the loss between the prediction (output) and the target
            mean_corrector = n_movies / float(torch.sum(target > 0) + 10e-10) # Rescale to an average over rated movies only; the small constant keeps the denominator away from 0
            loss.backward() # Backpropagation: compute the gradient of the loss with respect to the weights
            train_loss += np.sqrt(loss.item() * mean_corrector) # Accumulate the per-user RMSE
            s += 1. # Increment the users count
            optimizer.step() # Update the weights (note: gradients are never reset with optimizer.zero_grad(), so they accumulate across users)
    print('epoch:' + str(epoch) +  ' loss: ' + str(train_loss / s))

epoch:1 loss: 1.0966228981741677
epoch:2 loss: 1.0533843541728227
epoch:3 loss: 1.0383570260862365
epoch:4 loss: 1.0306853443218258
epoch:5 loss: 1.026731920675691
epoch:6 loss: 1.023879606240372
epoch:7 loss: 1.0218176013189593
epoch:8 loss: 1.0207395770701637
epoch:9 loss: 1.0197838926595066
epoch:10 loss: 1.018825840361084
epoch:11 loss: 1.0185552086597922
epoch:12 loss: 1.0179807298566554
epoch:13 loss: 1.0176416490458675
epoch:14 loss: 1.0173294490045
epoch:15 loss: 1.017008616385241
epoch:16 loss: 1.0166951622796632
epoch:17 loss: 1.0165761555500215
epoch:18 loss: 1.0164913810986287
epoch:19 loss: 1.0164307757854514
epoch:20 loss: 1.01610899543316
epoch:21 loss: 1.0159623596928442
epoch:22 loss: 1.0158065286184674
epoch:23 loss: 1.0159547969235099
epoch:24 loss: 1.015719993614856
epoch:25 loss: 1.0156087229344302
epoch:26 loss: 1.0152754981520042
epoch:27 loss: 1.0150360320201024
epoch:28 loss: 1.0127338757254107
epoch:29 loss: 1.011153452175444
epoch:30 loss: 1.0104551557340806
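The training loop calls `loss.backward()` and `optimizer.step()` without ever calling `optimizer.zero_grad()`. PyTorch adds each new gradient to the existing `.grad` buffer by default, so each update uses the sum of all gradients computed so far. A minimal sketch of that behaviour on a single toy parameter (the tensor `w` and the toy loss are assumptions for illustration):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# Two backward passes without resetting: the gradients add up
(3 * w).sum().backward()
(3 * w).sum().backward()
accumulated = w.grad.item()

# Resetting between passes keeps only the latest gradient
w.grad.zero_()
(3 * w).sum().backward()
fresh = w.grad.item()
print(accumulated, fresh)
```

Adding `optimizer.zero_grad()` before `loss.backward()` in the loop is the usual remedy. The loss values logged above came from the loop without it, so a corrected run would likely produce different numbers.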


### Results

In [52]:
# Measure the test loss
test_loss = 0
s = 0.
for id_user in range(n_users): # For each user take the movies rated
    # The input is still the training set: from the ratings the user already gave, we predict the movies not yet rated
    input_vector = training_set[id_user].unsqueeze(0)
    target = test_set[id_user].unsqueeze(0) # The target is the user's test ratings, with the same shape as the output
    if torch.sum(target > 0) > 0: # Consider only users with at least one rating in the test set
        output = sae(input_vector) # Forward the input through the network
        target.requires_grad = False # Don't compute the gradient with respect to the target
        output[target == 0] = 0 # Movies without a test rating must not contribute to the loss
        loss = metric(output, target) # Compute the loss between the prediction (output) and the target
        mean_corrector = n_movies / float(torch.sum(target > 0) + 10e-10) # Rescale to an average over rated movies only; the small constant keeps the denominator away from 0
        test_loss += np.sqrt(loss.item() * mean_corrector) # Accumulate the per-user test RMSE
        s += 1. # Increment the users count
print('Test loss: ' + str(test_loss / s))

Test loss: 0.9196047966139437
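The role of `mean_corrector` can be checked with a small worked example. `nn.MSELoss` averages over all movies, including the unrated ones that were zeroed out, so multiplying by `n_movies / n_rated` turns it into an average over rated movies only. The vectors below are made-up numbers for illustration.

```python
import numpy as np

target = np.array([5.0, 0.0, 3.0, 0.0, 0.0, 4.0])  # 0 means 'not rated'
output = np.array([4.0, 2.5, 3.5, 1.0, 4.2, 4.0])  # hypothetical predictions
output[target == 0] = 0                             # ignore unrated movies

n_movies_toy = len(target)
n_rated = int(np.sum(target > 0))

mse_all = np.mean((output - target) ** 2)           # averaged over all movies
mean_corrector = n_movies_toy / n_rated
rmse_corrected = np.sqrt(mse_all * mean_corrector)

# Reference value: RMSE computed directly on the rated movies
rated = target > 0
rmse_rated_only = np.sqrt(np.mean((output[rated] - target[rated]) ** 2))
print(rmse_corrected, rmse_rated_only)
```

Both values agree, so the reported loss can be read as a root mean squared error in rating stars over the movies a user actually rated.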


## 3. Discussion

The test loss (about 0.92) is close to the last training loss shown above (about 1.01) and is not higher than it. This suggests that the model generalizes to unseen ratings and is not overfitting.
In general the goal was to obtain a test loss of less than one, and with this autoencoder we 