## AutoEncoders

Autoencoders are used to transform your data from one form (less efficient) to a better representation (more efficient) This practise is useful for pre-training models. Precisely, when you train your NN, you initialise your weights randomly according to gaussian distribution. But in the end, they're random weights you initialise with are often inefficient and tend to find local minima. Instead of feeding the network with input, the input is used to train the autoencoder which is better representation of the same data, which is then used to feed the model. 

<img src="https://probablydance.files.wordpress.com/2016/04/3_middle_layer.png"/>
Source: [Probably Dance](https://probablydance.com/2016/04/30/neural-networks-are-impressively-good-at-compression/)

In [1]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

ImportError: No module named torch

In [26]:
movies = pd.read_csv('ml-1m/movies.dat', sep='::', 
                      header=None, engine='python',
                      encoding='latin-1')
print movies.head()

users = pd.read_csv('ml-1m/users.dat', sep='::', 
                      header=None, engine='python',
                      encoding='latin-1')
print users.head()
ratings = pd.read_csv('ml-1m/ratings.dat', sep='::', 
                      header=None, engine='python',
                      encoding='latin-1')
ratings.head()

   0                                   1                             2
0  1                    Toy Story (1995)   Animation|Children's|Comedy
1  2                      Jumanji (1995)  Adventure|Children's|Fantasy
2  3             Grumpier Old Men (1995)                Comedy|Romance
3  4            Waiting to Exhale (1995)                  Comedy|Drama
4  5  Father of the Bride Part II (1995)                        Comedy
   0  1   2   3      4
0  1  F   1  10  48067
1  2  M  56  16  70072
2  3  M  25  15  55117
3  4  M  45   7  02460
4  5  M  25  20  55455


Unnamed: 0,0,1,2,3
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [27]:
training_set = pd.read_csv('ml-100k/u1.base', 
                           delimiter='\t').values 
test_set = pd.read_csv('ml-100k/u1.test', 
                       delimiter='\t').values

In [28]:
# Getting the number of users and movies 
nb_users = int(max(max(training_set[:, 0]),
                   max(test_set[:, 0])))
nb_movies = int(max(max(training_set[:, 1]),
                   max(test_set[:, 1])))

In [29]:
# convert the data into an an array with users 
# in lines and movies in columns.
# This is because torch expects the data to be
# in this fashion
def convert(data):
    """Create a list of lists. One list for each user. 
    The rating are from 1 to 5. User that didn't see 
    the movie will have a rating of 0"""
    new_data = []
    for id_users in range(1, nb_users + 1):
        # take the column with movie ids for this
        # specific user
        id_movies = data[:, 1][data[:, 0] == id_users] 
        # same for the ratings
        id_ratings = data[:, 2][data[:, 0] == id_users]
        # we also have to take care of the case where
        # this user didn't watch a specific movie
        # So, create a list of 1682 (total movies) elements
        # initialized with 0 and set the rating where this
        # person has watched the movies. 
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings
        new_data.append(list(ratings))
    return new_data
training_set = convert(training_set)
test_set = convert(test_set)

In [30]:
# Converting the data into torch tensors
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

In [31]:
# For autoencoders we are predicting the ratings also
# so no need to change the input to binary


In [32]:
# Creating the architecture of Auto Encoder
# inherit from nn.Module to inhert functions
class StackedAutoEncoder(nn.Module):
    """Creates a Stacked Auto Encoder
    Configuration:
    Input -> 20 -> 10 -> 20 -> Input
    """
    def __init__(self):
        super(StackedAutoEncoder, self).__init__()
        # first layer that will take input nb_movies
        # the number of nodes in hidden layer will be 20
        self.first_full_connection = nn.Linear(nb_movies, 20)
        # make second full connection
        self.second_full_connection = nn.Linear(20, 10)
        # make third full connection
        self.third_full_connection = nn.Linear(10, 20)
        # make fourth full connection
        self.fourth_ful_connection = nn.Linear(20, nb_movies)
        # use sigmoid activation function
        self.activation = nn.Sigmoid()
        
    def forward(self, input_vector):
        """
        Encode and perform forward pass
        """
        input_vector = self.activation(
                        self.first_full_connection(input_vector))
        input_vector = self.activation(
                        self.second_full_connection(input_vector))
        input_vector = self.activation(
                        self.third_full_connection(input_vector))
        # no activation required for the last layer
        input_vector = self.fourth_ful_connection(input_vector)
        return input_vector

In [33]:
stacked_auto_encoder = StackedAutoEncoder()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(stacked_auto_encoder.parameters(), lr=0.01,
                         weight_decay=0.5)

In [34]:
# Training the Stacked Auto Encoder
number_of_epochs = 200
for epoch in range(number_of_epochs):
    training_loss = float(0)
    counter = float(0)

    for id_user in range(nb_users):
        # Variable is used to create the same effect
        # as np.reshape(-1, 1)
        input_vector = Variable(training_set[id_user]).unsqueeze(0)
        # creates copy
        target_variable = input_vector.clone()
        # exclude users who didn't rate any movies
        if torch.sum(target_variable.data > 0) > 0:
            # just pass it to the constructor
            output = stacked_auto_encoder(input_vector)
            # don't compute gradients of the target
            # this saves computational cost
            target_variable.require_grad = False
            # reset the output of the variables
            # where the user hasn't rated any rating
            output[target_variable==0] = 0
            loss = criterion(output, target_variable)
            # This is needed because we want to calculate
            # the error on only those movies who have 
            # ratings
            mean_corrector = nb_movies / \
                                float(torch.sum(target_variable > 0)
                                      + 1e-10)
            loss.backward()
            # take squared loss
            training_loss +=  np.sqrt(loss.data[0] * mean_corrector)
            counter += 1
            # update the weights
            optimizer.step()
    print 'Epoch: {0}\tTraining Loss: {1}'.format(epoch + 1, 
                                                  training_loss/counter)

Epoch: 1	Training Loss: 1.91791504736
Epoch: 2	Training Loss: 1.2089137057
Epoch: 3	Training Loss: 1.16117812748
Epoch: 4	Training Loss: 1.1449053598


KeyboardInterrupt: 

In [36]:
# Testing the Stacked Auto Encoder
test_loss = 0
counter = 0

for id_user in range(nb_users):
    # Variable is used to create the same effect
    # as np.reshape(-1, 1)
    input_vector = Variable(training_set[id_user]).unsqueeze(0)
    target_variable = Variable(test_set[id_user]).unsqueeze(0)
    # exclude users who didn't rate any movies
    if torch.sum(target_variable.data > 0) > 0:
        # just pass it to the constructor
        output = stacked_auto_encoder(input_vector)
        # don't compute gradients of the target
        # this saves computational cost
        target_variable.require_grad = False
        output[target_variable == 0] = 0
        loss = criterion(output, target_variable)
        # This is needed because we want to calculate
        # the error on only those movies who have 
        # ratings
        mean_corrector = nb_movies / float(torch.sum(
                                                target_variable > 0)
                                           + 1e-10)
        # take mean squared loss
        training_loss +=  np.sqrt(loss.data[0] * mean_corrector)
        counter += 1
print 'Test Loss: {1}'.format(epoch + 1, test_loss/counter)

Epoch: 1	Trest Loss: 0.0
Epoch: 2	Trest Loss: 0.0
Epoch: 3	Trest Loss: 0.0


KeyboardInterrupt: 