# Auto encoder for MNIST data
An autoencoder is a neural network that learns to copy its input to its output. It has an internal (hidden) layer that describes a code used to represent the input, and it is constituted by two main parts: an encoder that maps the input into the code, and a decoder that maps the code to a reconstruction of the input.

In this homework, we are going to explore some insight about the auto encoder such as how the latent space in the middle of the network looks like, how the image interpolates between two number labels, etc. Moreover, we will try to improve our existing auto encoder by defining it with the convolutional neural network (CNN), trying to add some noise and trying to add regularization directly to the loss function.

### Main tasks for the homework

## Part 1

1. First, please go over the linear autoenccoder code on ICON auto_encoder_FCN.ipynb to see how the implemented linear auto encoder works. Linear auto encoder has limited representation power, especially when two latent variables are used. This results in blurred reconstructions and inability to fully capture some digits. 

2. Use more latent variables, layers, and features/layer in the linear auto-encoder to improve the representation power. You can use the MSE of the recovered images (compared to the originals in the MNIST test data) as a measure of representation power. The goal is to come up with minimal testing. Please keep track of the training and testing error for each setting. You can report these for each setting you have tried in a table within the comments.  Demonstrate that as you increase the representation power (e.g increase in latent variables) or features/layer, the MSE of the autoencoder on training data will decrease, while that on the test data may saturate or go up. 

3. Try the following regularization strategies to overcome the above problem (a) Explicit l2 regularization (b) Explicit l1 regularization (c) noise augmentation of inputs (d) early stopping. 

Here are some useful links for reference:

https://debuggercafe.com/adding-noise-to-image-data-for-deep-learning-data-augmentation/

https://scikit-image.org/docs/dev/api/skimage.util.html#skimage.util.random_noise

https://discuss.pytorch.org/t/how-to-add-noise-to-mnist-dataset-when-using-pytorch/59745 (optional)

We then try to add regularization directly to the loss function.

Here are some useful links for reference:

https://debuggercafe.com/sparse-autoencoders-using-l1-regularization-with-pytorch/

https://stackoverflow.com/questions/42704283/adding-l1-l2-regularization-in-pytorch

https://stackoverflow.com/questions/44641976/in-pytorch-how-to-add-l1-regularizer-to-activations



## Load the data and setup the GPU

In [10]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import numpy as np
from torch.autograd import Variable

from torchvision import datasets
import torchvision.transforms as transforms

from skimage.util import random_noise 

# Download the MNIST datasets from Prof. Jacob's google drive

In [11]:
# Run the followin only once to download the files. Note where the data is copied to.
!gdown --id '1fAW-pvhBWiXxE-H-WEOurS1Ta4V5mYe0'
!gdown --id '1zSVOn9lJa4eF-jubtZwdtO-t0UScnJk0'

Downloading...
From: https://drive.google.com/uc?id=1fAW-pvhBWiXxE-H-WEOurS1Ta4V5mYe0
To: /content/mnist_train.pickle
47.5MB [00:00, 180MB/s]
Downloading...
From: https://drive.google.com/uc?id=1zSVOn9lJa4eF-jubtZwdtO-t0UScnJk0
To: /content/mnist_test.pickle
7.92MB [00:00, 125MB/s]


# Once copied, load them to python


In [12]:
import pickle
with open('/content/mnist_train.pickle', 'rb') as data:
    mnist_train = pickle.load(data)
with open('/content/mnist_train.pickle', 'rb') as data:
    mnist_test = pickle.load(data)

In [13]:
if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")

plt.rcParams['figure.figsize'] = [12, 6]


## Setting up the fully connected autoencoder class.

You may try with different number of features in the bottle neck layer. More parameters can improve the representation, but may result in poor generalization. The visualization is only posssible when the number of features in the bottleneck layer is 2.

The encoder uses the following structure

 <font color="red">784 $\rightarrow$ 128$\rightarrow$ 32 $\rightarrow$ 12 $\rightarrow$ 2</font>

The decoder uses the following structure

 <font color="red">2 $\rightarrow$ 12$\rightarrow$ 32 $\rightarrow$ 128 $\rightarrow$ 784</font>

In [14]:
class LinearAutoEncoder(nn.Module):
    def __init__(self):
        super(LinearAutoEncoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.Tanh(),
            nn.Linear(128, 32),
            nn.ReLU(True), 
            nn.Linear(32, 12), 
            nn.ReLU(True), 
            nn.Linear(12, 2),
            nn.Tanh()
            )
        self.decoder = nn.Sequential(
            nn.Linear(2, 12),
            nn.ReLU(True),
            nn.Linear(12, 32),
            nn.ReLU(True),
            nn.Linear(32, 128),
            nn.ReLU(True), 
            nn.Linear(128, 28 * 28), 
            )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

### Helper functions

Define the sparse loss function as an example. You may also adapt this function for your needs.

In [15]:
# define the l1/l2 loss on the weights. normtype = 2 for l2 regularization, =1 for l1 regularization

def weightloss(model,normtype):
    loss = 0
    for W in model.parameters():
      l2_reg = W.norm(normtype)
    return loss
    

## TODO: modify the following for Part 1.

See list of todos above.

In [19]:
def train(model, num_epochs=5, batch_size=2048, learning_rate=1e-3):
    torch.manual_seed(42)
    criterion = nn.MSELoss() # mean square error loss
    optimizer = torch.optim.Adam(
        model.parameters(), lr=learning_rate, weight_decay=1e-4)    # Here the weight_decay parameter is adding L2 regularization

    train_loader = torch.utils.data.DataLoader(mnist_training, 
                                               batch_size=batch_size, 
                                               shuffle=True)
    
    models = []
    for epoch in range(num_epochs):
        for data in train_loader:
            img, _ = data

            # (TODO) Add some noise augmentation (This is just one way to add the noise, feel free to come up with other ways)
            img = torch.tensor(random_noise(img,mode='gaussian',mean=0,var=0.05,clip=True))
            img=torch.tensor(random_noise(img,mode='s&p',salt_vs_pepper=0.5,clip=True))
            img = torch.tensor(random_noise(img,mode='speckle',mean=0,var=0.05,clip=True))

            img = Variable(img).cuda(device).type(torch.cuda.FloatTensor)
            img = img.view(img.size(0), -1)
            recon = model(img)
            loss = criterion(recon, img)

            # (TODO) Add L1/L2 regularization HERE
            m_loss = criterion(recon,img)
            loss = m_loss + 0.01 * weightloss(model,1)


            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
     
        # Saving the models at each epoch for visualization purposes
        training_loss.append(loss)
        fname = 'dict'+str(epoch)
        torch.save(model.state_dict(), fname)
        print('Epoch:{}, Loss:{:.4f}'.format(epoch+1, float(loss)))

        # (TODO) Add early stopping HERE
        if (epoch > 2) and (validation_loss[epoch]-validation_loss[epoch-1] < 0.01):
          print('STOP')
          break

    return model


## Training block

In [20]:
model = LinearAutoEncoder().cuda(device) 

max_epochs = 40
model = train(model, num_epochs=max_epochs,batch_size=100,learning_rate=1e-2)

NameError: ignored

## Evaluate loss on test data



In [18]:
criterion = nn.MSELoss() 
batch_size = 1000
train_loader = torch.utils.data.DataLoader(mnist_training, 
                                               batch_size=batch_size, 
                                               shuffle=False)

test_loader = torch.utils.data.DataLoader(mnist_testing, 
                                               batch_size=batch_size, 
                                               shuffle=False)

criterion = nn.MSELoss(reduction='mean') # mean square error loss

trainingloss = 0
for data in train_loader:
    img, _ = data
    img = Variable(img).cuda(device).type(torch.cuda.FloatTensor)
    img = img.view(img.size(0), -1)
    recon = model(img)
    trainingloss += criterion(recon, img)
trainingloss = trainingloss.detach().cpu().numpy()/train_loader.batch_size

testloss = 0
for data in test_loader:
    img, _ = data
    img = Variable(img).cuda(device).type(torch.cuda.FloatTensor)
    img = img.view(img.size(0), -1)
    recon = model(img)
    testloss += criterion(recon, img)

testloss = testloss.detach().cpu().numpy()/test_loader.batch_size
print("Training loss=",trainingloss,"Testing loss=",testloss)



NameError: ignored