## AUTOENCODERS
Real-world data such as images and documents is very high-dimensional. Autoencoders, a class of unsupervised neural networks, can learn a simpler, lower-dimensional representation of it. The architecture consists of an encoder, a bottleneck and a decoder. The output is a reconstruction of the input, so the input and output have the same dimensions. The objective is to minimise the reconstruction loss by learning the weights through techniques such as backpropagation.

Conceptually, an autoencoder is similar to PCA, but its nonlinear activation functions let it learn nonlinear representations.
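The encoder, bottleneck and decoder described above can be illustrated with a minimal sketch. The sizes (`n_features`, `n_bottleneck`) and the random input are made up for illustration and are not the values used later in this project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights and data

n_features = 8    # hypothetical input dimension
n_bottleneck = 3  # hypothetical size of the compressed representation

# Encoder compresses the input; decoder maps the code back to the input size
encoder = nn.Sequential(nn.Linear(n_features, n_bottleneck), nn.Sigmoid())
decoder = nn.Linear(n_bottleneck, n_features)

x = torch.rand(4, n_features)    # a batch of 4 made-up samples
code = encoder(x)                # bottleneck representation
reconstruction = decoder(code)   # same shape as the input

# Reconstruction loss that training would try to minimise
loss = nn.MSELoss()(reconstruction, x)
print(code.shape, reconstruction.shape, loss.item())
```

The bottleneck is narrower than the input, which is what forces the network to learn a compact representation rather than copy its input.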

## PROJECT
In this project, a movie recommendation model is built using a Stacked Autoencoder, a type of autoencoder that has more than one hidden layer.

In [1]:
#importing the libraries
import numpy as np
import pandas as pd
import torch
import torch.nn as nn #for neural networks
import torch.nn.parallel  #for parallel computation
import torch.optim as optim #for optimizers
import torch.utils.data #for tools
from torch.autograd import Variable #tensor wrapper for autograd (a no-op in recent PyTorch versions)

## Importing the Datasets

In [2]:
movies = pd.read_csv(r'C:\Users\pnish\OneDrive\Documents\Projects\Recommendation System - Autoencoder\AutoEncoders\ml-1m\ml-1m\movies.dat', sep='::',header=None,engine='python',encoding='latin-1')
users = pd.read_csv(r'C:\Users\pnish\OneDrive\Documents\Projects\Recommendation System - Autoencoder\AutoEncoders\ml-1m\ml-1m\users.dat', sep='::',header=None,engine='python',encoding='latin-1')
ratings = pd.read_csv(r'C:\Users\pnish\OneDrive\Documents\Projects\Recommendation System - Autoencoder\AutoEncoders\ml-1m\ml-1m\ratings.dat', sep='::',header=None,engine='python',encoding='latin-1')

### Movies data
The movies data consists of three columns: Movie ID, Movie name and the genre.

In [3]:
print(movies.shape)
movies.head()

(3883, 3)


Unnamed: 0,0,1,2
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy
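Because the file is read with `header=None`, the columns are labelled 0, 1 and 2. A possible way to give them readable names is sketched below on two records copied from the output above; the column names `movie_id`, `title` and `genres` are assumptions chosen for illustration, not names defined by the dataset files.

```python
import io
import pandas as pd

# Two records from the output above, in the '::'-separated layout of movies.dat
sample = (
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)

movies_named = pd.read_csv(
    io.StringIO(sample),
    sep="::",
    header=None,
    engine="python",
    names=["movie_id", "title", "genres"],  # hypothetical column names
)

# Split the pipe-separated genre string into a list per movie
movies_named["genres"] = movies_named["genres"].str.split("|")
print(movies_named)
```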


### Users data
The users data consists of five columns: User ID, Gender, Age, Occupation code and Zip code.

In [4]:
print(users.shape)
users.head()

(6040, 5)


Unnamed: 0,0,1,2,3,4
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


### Ratings data
The ratings data consists of four columns: User ID, Movie ID, Ratings and Time stamps.

In [5]:
print(ratings.shape)
ratings.head()

(1000209, 4)


Unnamed: 0,0,1,2,3
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


## Preparing training and test sets
The data used for building the model has the same layout as the ratings dataset but comes from the smaller ml-100k collection. It consists of 100,000 records, split randomly into a training set and a test set: 80% of the data is used to train the model and the remaining 20% is used for validation.

In [6]:
#NOTE: header=None should be passed to read_csv here. Without it, pandas uses the first record of each file as column names, so that rating is dropped (the output below reflects this).
training_set = pd.read_csv(r'C:\Users\pnish\OneDrive\Documents\Projects\Recommendation System - Autoencoder\AutoEncoders\ml-100k\ml-100k\u1.base',delimiter='\t')
training_set = np.array(training_set,dtype='int') #Converting the DataFrame into a NumPy array so it can be indexed and later turned into a tensor
test_set = pd.read_csv(r'C:\Users\pnish\OneDrive\Documents\Projects\Recommendation System - Autoencoder\AutoEncoders\ml-100k\ml-100k\u1.test',delimiter='\t')
test_set = np.array(test_set,dtype='int') #Converting the DataFrame into a NumPy array so it can be indexed and later turned into a tensor

In [7]:
training_set

array([[        1,         2,         3, 876893171],
       [        1,         3,         4, 878542960],
       [        1,         4,         3, 876893119],
       ...,
       [      943,      1188,         3, 888640250],
       [      943,      1228,         3, 888640275],
       [      943,      1330,         3, 888692465]])
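The effect of omitting `header=None` can be checked on a small in-memory sample. This is only a sketch: the three tab-separated records below follow the layout of `u1.base`, but their values should be treated as illustrative.

```python
import io
import pandas as pd

# Three tab-separated records: user id, movie id, rating, timestamp (illustrative)
sample = "1\t1\t5\t874965758\n1\t2\t3\t876893171\n1\t3\t4\t878542960\n"

# Default behaviour: the first record is consumed as the header row
without_header = pd.read_csv(io.StringIO(sample), delimiter="\t")

# With header=None every record is kept as data
with_header_none = pd.read_csv(io.StringIO(sample), delimiter="\t", header=None)

print(len(without_header), len(with_header_none))
```

With the default settings one record is lost per file, which is why the array above starts at movie 2 for user 1.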

## Getting the number of users and movies
The total numbers of users and movies are needed because the training and test data must be converted into a matrix with users as rows, movies as columns and the corresponding rating in each cell.

In [8]:
nb_users = int(max(max(training_set[:,0]),max(test_set[:,0])))
nb_movies = int(max(max(training_set[:,1]),max(test_set[:,1])))

In [9]:
print(nb_users,nb_movies)

943 1682


## Converting the data into list of lists

In [10]:
def convert(data):
    new_data = []
    for id_users in range(1,nb_users+1):
        id_movies = data[:,1][data[:,0]==id_users]
        id_ratings = data[:,2][data[:,0]==id_users]
        ratings= np.zeros(nb_movies)
        ratings[id_movies-1]=id_ratings
        new_data.append(list(ratings))
    return new_data

In [11]:
training_set=convert(training_set)
testing_set = convert(test_set)
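To see what the conversion produces, here is a small sketch on made-up data. `convert_small` is a hypothetical variant of `convert` that takes the number of users and movies as arguments so that the example is self-contained; the toy ratings are not from the dataset.

```python
import numpy as np

def convert_small(data, nb_users, nb_movies):
    """Same logic as convert(), with the sizes passed in explicitly."""
    new_data = []
    for id_user in range(1, nb_users + 1):
        mask = data[:, 0] == id_user               # rows belonging to this user
        ratings = np.zeros(nb_movies)              # 0 means "not rated"
        ratings[data[mask, 1] - 1] = data[mask, 2] # movie ids start at 1
        new_data.append(list(ratings))
    return new_data

# Columns: user id, movie id, rating, timestamp (timestamp is ignored)
toy = np.array([[1, 1, 5, 0],
                [1, 3, 2, 0],
                [2, 2, 4, 0]])

matrix = convert_small(toy, nb_users=2, nb_movies=3)
print(np.array(matrix))
```

Each inner list is one user's rating vector, with zeros for the movies that user has not rated.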

## Converting the data into Torch Tensors

In [12]:
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(testing_set)

## Creating the architecture of the Neural Network
This model has three hidden layers between the input and output layers, with 20, 10 and 20 nodes respectively. The input layer has one node per column in the data (i.e., one per movie), and the output layer has the same size because the autoencoder tries to recreate its input as output.

In [13]:
class SAE(nn.Module):
    def __init__(self,):
        super(SAE,self).__init__()
        self.fc1 = nn.Linear(nb_movies,20)
        self.fc2 = nn.Linear(20,10)
        self.fc3 = nn.Linear(10,20)
        self.fc4 = nn.Linear(20,nb_movies)
        self.activation = nn.Sigmoid()
    def forward(self,x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        x = self.fc4(x)
        return x

In [14]:
sae = SAE()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(sae.parameters(), lr = 0.01, weight_decay =0.5)

## Training the SAE

In [15]:
nb_epoch = 200
for epoch in range(1, nb_epoch + 1):
    train_loss = 0 # running sum of per-user RMSE for this epoch
    s = 0. # number of users who rated at least one movie
    for id_user in range(nb_users):
        input = Variable(training_set[id_user]).unsqueeze(0) # unsqueeze(0) adds a batch dimension (shape [1682] becomes [1, 1682]) so the network receives a batch of one vector
        target = input.clone()
        if torch.sum(target.data>0)>0: # skip users with no ratings
            output = sae(input)
            target.requires_grad = False # no gradient is computed with respect to the target
            output[target==0] = 0 # unrated movies must not contribute to the loss
            loss = criterion(output, target)
            mean_corrector = nb_movies/float(torch.sum(target.data > 0) + 1e-10) # rescales the mean over all movies to a mean over rated movies only
            loss.backward() # computes the gradients, i.e. the direction of the weight update
            # NOTE: optimizer.zero_grad() is never called, so gradients accumulate across users
            train_loss += np.sqrt(loss.data*mean_corrector)
            s += 1.
            optimizer.step() # applies the weight update
    print('Epoch: ',epoch,' Loss: ',train_loss/s)



Epoch:  1  Loss:  tensor(1.7717)
Epoch:  2  Loss:  tensor(1.0968)
Epoch:  3  Loss:  tensor(1.0534)
Epoch:  4  Loss:  tensor(1.0386)
Epoch:  5  Loss:  tensor(1.0308)
Epoch:  6  Loss:  tensor(1.0269)
Epoch:  7  Loss:  tensor(1.0239)
Epoch:  8  Loss:  tensor(1.0219)
Epoch:  9  Loss:  tensor(1.0207)
Epoch:  10  Loss:  tensor(1.0198)
Epoch:  11  Loss:  tensor(1.0189)
Epoch:  12  Loss:  tensor(1.0185)
Epoch:  13  Loss:  tensor(1.0179)
Epoch:  14  Loss:  tensor(1.0177)
Epoch:  15  Loss:  tensor(1.0173)
Epoch:  16  Loss:  tensor(1.0169)
Epoch:  17  Loss:  tensor(1.0170)
Epoch:  18  Loss:  tensor(1.0166)
Epoch:  19  Loss:  tensor(1.0165)
Epoch:  20  Loss:  tensor(1.0163)
Epoch:  21  Loss:  tensor(1.0161)
Epoch:  22  Loss:  tensor(1.0160)
Epoch:  23  Loss:  tensor(1.0158)
Epoch:  24  Loss:  tensor(1.0156)
Epoch:  25  Loss:  tensor(1.0158)
Epoch:  26  Loss:  tensor(1.0156)
Epoch:  27  Loss:  tensor(1.0156)
Epoch:  28  Loss:  tensor(1.0151)
Epoch:  29  Loss:  tensor(1.0128)
Epoch:  30  Loss:  tens
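The role of `mean_corrector` in the loop above can be checked on a toy example. The sketch below runs a single training step on one made-up user and compares the corrected loss with an RMSE computed directly over the rated movies. The toy model, its sizes and the ratings are assumptions for illustration; unlike the loop above, this sketch calls `optimizer.zero_grad()` before the backward pass and masks the output by multiplication instead of in-place assignment.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
nb_movies = 6  # toy size, not the 1682 movies of the dataset

# Toy stand-in for the stacked autoencoder
model = nn.Sequential(nn.Linear(nb_movies, 3), nn.Sigmoid(), nn.Linear(3, nb_movies))
criterion = nn.MSELoss()
optimizer = optim.RMSprop(model.parameters(), lr=0.01, weight_decay=0.5)

ratings = torch.tensor([[5., 0., 3., 0., 0., 1.]])  # one made-up user, 0 = unrated
rated = ratings > 0

optimizer.zero_grad()               # clear gradients left over from a previous step
output = model(ratings) * rated     # zero out predictions for unrated movies
loss = criterion(output, ratings)   # mean over all nb_movies entries
loss.backward()
optimizer.step()

# Rescale the mean over all movies to a mean over rated movies only
mean_corrector = nb_movies / float(rated.sum())
corrected_rmse = torch.sqrt(loss.detach() * mean_corrector)

# RMSE computed directly over the rated entries, for comparison
direct_rmse = torch.sqrt(((output.detach() - ratings)[rated] ** 2).mean())
print(corrected_rmse.item(), direct_rmse.item())
```

The two values agree because `nn.MSELoss` divides by the total number of movies, while only the rated ones contribute a non-zero error.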

After training, the `sae` object is our trained model, and we can use it to make predictions on the test data. The parameters of the model can be accessed with the `state_dict()` method (e.g. `sae.state_dict()`).

In [16]:
print("These are parameters of our trained model: ")
sae.state_dict()

These are parameters of our trained model: 


OrderedDict([('fc1.weight',
              tensor([[-2.8530e-02, -3.8972e-03,  7.4315e-02,  ..., -9.4598e-04,
                       -7.2531e-02, -1.0948e-02],
                      [-8.3199e-02,  2.9801e-01, -3.1988e-02,  ..., -3.6588e-04,
                       -3.9030e-03, -1.4574e-02],
                      [ 1.3208e-01, -4.5119e-02,  8.4846e-02,  ..., -5.0668e-03,
                       -2.2829e-02, -3.2755e-02],
                      ...,
                      [ 2.4811e-01,  1.7087e-02,  2.3530e-01,  ..., -2.4886e-02,
                       -4.6826e-02, -3.0817e-02],
                      [-4.8230e-02,  3.6995e-01, -1.8778e-01,  ..., -5.8407e-03,
                       -2.6334e-02, -8.8783e-03],
                      [-3.2568e-01, -1.7400e-02,  6.9348e-02,  ..., -7.0913e-03,
                       -3.2973e-02,  1.7039e-03]])),
             ('fc1.bias',
              tensor([-0.0997, -0.5760, -0.0306, -0.7905, -0.7104, -0.4718, -0.1169, -0.2361,
                      -0.4689, -1.00
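The dictionary returned by `state_dict()` maps parameter names to tensors, so it can be inspected by name or copied into another model with the same architecture. The sketch below uses a small toy network as a stand-in for `sae`; the layer sizes are made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model():
    # Toy stand-in for the stacked autoencoder
    return nn.Sequential(nn.Linear(6, 3), nn.Sigmoid(), nn.Linear(3, 6))

model = make_model()

# List each parameter tensor by name and shape
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))

# Copy the parameters into a fresh model with the same architecture
clone = make_model()
clone.load_state_dict(model.state_dict())

x = torch.rand(1, 6)
same = torch.equal(model(x), clone(x))
print(same)
```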

## Testing our SAE

In [17]:
test_loss = 0 # running sum of per-user RMSE on the test set
s = 0. # number of users who rated at least one movie in the test set
for id_user in range(nb_users):
    input = Variable(training_set[id_user]).unsqueeze(0) # the user's training ratings are the input; unsqueeze(0) adds a batch dimension (shape [1, 1682])
    target = Variable(test_set[id_user]).unsqueeze(0) # the user's held-out test ratings are the target
    if torch.sum(target.data > 0) > 0:
        output = sae(input)
        target.requires_grad = False # no gradient is computed with respect to the target
        output[target==0] = 0 # only movies rated in the test set contribute to the loss
        loss = criterion(output, target)
        mean_corrector = nb_movies/float(torch.sum(target.data > 0) + 1e-10) # rescales the mean over all movies to a mean over rated movies only
        test_loss += np.sqrt(loss.data*mean_corrector)
        s += 1.
print("Test loss: ",float(test_loss/s))

Test loss:  0.9550700187683105


The test loss is about 0.96, i.e. just under 1. This indicates that, on average, the model predicts a rating that differs from the true rating by less than 1 star. For example, if a user watches a new movie and rates it 4, the model would typically have predicted a rating between 3 and 5. This is an average, so individual predictions can still be off by more than 1 star.