# Autoencoder for MNIST data
An autoencoder is a neural network that learns to copy its input to its output. Its internal (hidden) layer holds a code that represents the input, and the network consists of two main parts: an encoder that maps the input to the code, and a decoder that maps the code to a reconstruction of the input.

In this homework, we explore some properties of the autoencoder, such as what the latent space in the middle of the network looks like and how images interpolate between two digit labels. We also try to improve the existing autoencoder by building it from convolutional neural network (CNN) layers, adding noise to the input, and adding regularization directly to the loss function.

### Main tasks for the homework

## Part 2

4. Modify the code to build a convolutional autoencoder. The encoder should use two convolutional layers followed by two fully connected layers, with the following structure:

  28x28x1  →  28x28x16 →  14x14x4  →  7x7x4  →  198  →  h. Here, h is the number of hidden variables. 

  The decoder uses the following structure: (For the decoder, you might use "nn.ConvTranspose2d" to expand the dimension.) 

  h  →  198 →  7x7x4  →  14x14x4  →  28x28x16  →  28x28x1

5. Set h=2, plot the embedding of the digits produced by the encoder, and compare it with the embedding you observed for the fully connected network. Also feed the decoder latent points on a regular grid and observe how it transforms them.

6. Increase the number of latent variables and apply the best regularization strategies that you learned from Part 1.


See https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html and https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html
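
One possible layout for the structure in task 4 is sketched below. It is not the reference solution: the class name `SketchConvAutoencoder`, the kernel sizes, strides, the `nn.MaxPool2d` step and the `ReLU` activations are all assumptions chosen so that the tensor shapes match the listed sizes. The flattened 7x7x4 feature map has 196 entries, so the 198-unit layer is treated here as the first of the two fully connected layers.

```python
import torch
import torch.nn as nn


class SketchConvAutoencoder(nn.Module):
    """Hypothetical layout; every layer hyperparameter is an assumption."""

    def __init__(self, h=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),            # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.Conv2d(16, 4, kernel_size=3, stride=2, padding=1),  # -> 4x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # -> 4x7x7
            nn.Flatten(),                                          # -> 196
            nn.Linear(4 * 7 * 7, 198),                             # first fully connected layer
            nn.ReLU(),
            nn.Linear(198, h),                                     # second fully connected layer
        )
        self.decoder = nn.Sequential(
            nn.Linear(h, 198),
            nn.ReLU(),
            nn.Linear(198, 4 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (4, 7, 7)),                            # -> 4x7x7
            nn.ConvTranspose2d(4, 4, kernel_size=2, stride=2),     # -> 4x14x14
            nn.ReLU(),
            nn.ConvTranspose2d(4, 16, kernel_size=2, stride=2),    # -> 16x28x28
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),            # -> 1x28x28
            nn.Sigmoid(),                                          # pixel values in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


# Shape check on a random batch standing in for MNIST images
sketch = SketchConvAutoencoder(h=2)
batch = torch.rand(8, 1, 28, 28)
code = sketch.encoder(batch)
recon = sketch(batch)
print(code.shape, recon.shape)
```

Keeping the encoder and decoder as separate `nn.Sequential` modules makes it easy to call `sketch.encoder` alone for the embedding plot and `sketch.decoder` alone for the latent grid.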



## Load the data and set up the GPU

In [25]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import numpy as np
from torch.autograd import Variable

from skimage.util import random_noise   # This library can be used for adding noise and can be installed by "conda install scikit-image"

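# Install a browser-like User-agent header for the MNIST download
# (a common workaround when the download server answers with HTTP 403)
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)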

if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")

plt.rcParams['figure.figsize'] = [12, 6]

mnist_training = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
mnist_testing = datasets.MNIST('data', train=False, download=True, transform=transforms.ToTensor())



## Setting up the convolutional autoencoder class.

The class below is a fully convolutional starting point. It does not yet follow the structure requested in task 4 (two convolutional layers followed by two fully connected layers), so modify it accordingly.

The encoder below uses the following structure

 <font color="red">28x28x1 $\rightarrow$ 14x14x16 $\rightarrow$ 7x7x32 $\rightarrow$ 1x1x64</font>

The decoder below uses the following structure

 <font color="red">1x1x64 $\rightarrow$ 7x7x32 $\rightarrow$ 14x14x16 $\rightarrow$ 28x28x1</font>

In [35]:
class ConvolutionalAutoencoder(nn.Module):
    def __init__(self):
        super(ConvolutionalAutoencoder, self).__init__()
        self.encoder = nn.Sequential( # like the Composition layer you built
            nn.Conv2d(1, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 7)
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 7),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
        )
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
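
The spatial sizes produced by the layers above can be checked by hand with the output-size formulas given in the PyTorch documentation for `nn.Conv2d` and `nn.ConvTranspose2d`. The helper names `conv2d_out` and `conv_transpose2d_out` below are made up for this sketch; only the formulas come from the documentation.

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of nn.Conv2d."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0,
                         output_padding=0, dilation=1):
    """Output length along one spatial axis of nn.ConvTranspose2d."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# (kernel, stride, padding) of the three encoder convolutions in the class above
encoder_sizes = [28]
for kernel, stride, padding in [(3, 2, 1), (3, 2, 1), (7, 1, 0)]:
    encoder_sizes.append(conv2d_out(encoder_sizes[-1], kernel, stride, padding))

# (kernel, stride, padding, output_padding) of the three decoder layers
decoder_sizes = [encoder_sizes[-1]]
for kernel, stride, padding, output_padding in [(7, 1, 0, 0), (3, 2, 1, 1), (3, 2, 1, 1)]:
    decoder_sizes.append(
        conv_transpose2d_out(decoder_sizes[-1], kernel, stride, padding, output_padding))

print(encoder_sizes)  # [28, 14, 7, 1]
print(decoder_sizes)  # [1, 7, 14, 28]
```

The `output_padding=1` arguments are what make the stride-2 transposed convolutions land exactly on 14 and 28 rather than 13 and 27.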

### Helper functions

Define a weight-norm penalty (used to encourage sparse or small weights) as an example. You may adapt this function to your needs.

In [32]:
# Define the L1/L2 penalty on the weights: normtype=2 for L2 regularization, normtype=1 for L1 regularization

def weightloss(model, normtype):
    loss = 0
    for W in model.parameters():
        # Accumulate the norm of every parameter tensor
        loss = loss + W.norm(normtype)
    return loss
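
A quick sanity check for a penalty of this kind is to evaluate it on a layer whose weights are known. The sketch below uses a hypothetical stand-alone helper, `weight_norm_penalty`, so that it runs on its own; it sums the norms of all parameter tensors, which is the behaviour the comment above describes.

```python
import torch
import torch.nn as nn


def weight_norm_penalty(model, normtype):
    """Sum of the p-norms of all parameter tensors (hypothetical helper)."""
    return sum(W.norm(normtype) for W in model.parameters())


# A tiny layer with hand-set parameters so the norms can be computed by hand
layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[3.0, -4.0]]))
    layer.bias.fill_(0.0)

l1 = weight_norm_penalty(layer, 1)  # |3| + |-4| + |0| = 7
l2 = weight_norm_penalty(layer, 2)  # sqrt(3^2 + 4^2) + 0 = 5
print(float(l1), float(l2))
```

Note that this penalty covers biases as well as weights, because `model.parameters()` yields both.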
    

In [40]:
def train(model, num_epochs=5, batch_size=2048, learning_rate=1e-3):
    torch.manual_seed(42)
    criterion = nn.MSELoss() # mean square error loss
    optimizer = torch.optim.Adam(
        model.parameters(), lr=learning_rate, weight_decay=1e-4)    # Here the weight_decay parameter is adding L2 regularization

    train_loader = torch.utils.data.DataLoader(mnist_training, 
                                               batch_size=batch_size, 
                                               shuffle=True)
    # Held-out loader for early stopping. The test split is used here only because
    # no separate validation split is defined in this notebook.
    val_loader = torch.utils.data.DataLoader(mnist_testing,
                                             batch_size=1000,
                                             shuffle=False)

    training_loss = []
    validation_loss = []
    for epoch in range(num_epochs):
        model.train()
        for data in train_loader:
            img, _ = data

            # (TODO) Add some noise augmentation (This is just one way to add the noise, feel free to come up with other ways)
            # random_noise operates on NumPy arrays and returns a NumPy array
            img = random_noise(img.numpy(), mode='gaussian', mean=0, var=0.05, clip=True)
            img = random_noise(img, mode='s&p', salt_vs_pepper=0.5, clip=True)
            img = random_noise(img, mode='speckle', mean=0, var=0.05, clip=True)

            # Keep the (N, 1, 28, 28) shape: nn.Conv2d expects image tensors, not flattened vectors
            img = torch.as_tensor(img, dtype=torch.float32).to(device)
            recon = model(img)

            # (TODO) Add L1/L2 regularization HERE
            m_loss = criterion(recon, img)
            loss = m_loss + 0.01 * weightloss(model, 1)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Saving the models at each epoch for visualization purposes
        training_loss.append(loss.item())
        fname = 'dict'+str(epoch)
        torch.save(model.state_dict(), fname)
        print('Epoch:{}, Loss:{:.4f}'.format(epoch+1, float(loss)))

        # Mean reconstruction error on the held-out data
        model.eval()
        with torch.no_grad():
            batch_losses = [criterion(model(x.to(device)), x.to(device)).item()
                            for x, _ in val_loader]
        validation_loss.append(sum(batch_losses) / len(batch_losses))

        # (TODO) Add early stopping HERE
        # Stop once the validation loss improves by less than the threshold (0.01 here; tune as needed)
        if (epoch > 2) and (validation_loss[epoch-1] - validation_loss[epoch] < 0.01):
          print('STOP')
          break

    return model
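
Comparing only two consecutive epochs makes early stopping sensitive to noise in the loss. A common alternative is a patience rule, sketched below in plain Python. The function name `should_stop` and the defaults `min_delta=1e-4` and `patience=3` are illustrative assumptions, not values prescribed by this homework.

```python
def should_stop(validation_loss, min_delta=1e-4, patience=3):
    """Return True when the last `patience` epochs did not improve on the
    best earlier validation loss by at least `min_delta`."""
    if len(validation_loss) <= patience:
        return False  # not enough history yet
    best_before = min(validation_loss[:-patience])
    best_recent = min(validation_loss[-patience:])
    return best_before - best_recent < min_delta


# Made-up loss history: fast improvement, then a plateau
history = [0.080, 0.050, 0.040, 0.03999, 0.0400, 0.0401]
stop_epochs = [epoch for epoch in range(len(history))
               if should_stop(history[:epoch + 1])]
print(stop_epochs)  # [5]
```

Inside a training loop, the call `should_stop(validation_loss)` would replace the two-epoch comparison, followed by `break` when it returns `True`.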


## Training block

In [41]:
model = ConvolutionalAutoencoder().to(device)   # works on both GPU and CPU

max_epochs = 40
model = train(model, num_epochs=max_epochs, batch_size=100, learning_rate=1e-2)


## Evaluate loss on test data



In [None]:
batch_size = 1000
train_loader = torch.utils.data.DataLoader(mnist_training, 
                                               batch_size=batch_size, 
                                               shuffle=False)

test_loader = torch.utils.data.DataLoader(mnist_testing, 
                                               batch_size=batch_size, 
                                               shuffle=False)

criterion = nn.MSELoss(reduction='mean') # mean square error loss

model.eval()
trainingloss = 0
with torch.no_grad():   # no gradients are needed for evaluation
    for data in train_loader:
        img, _ = data
        # Keep the (N, 1, 28, 28) image shape expected by the convolutional layers
        img = img.to(device)
        recon = model(img)
        trainingloss += criterion(recon, img).item()
# Average the per-batch mean losses over the number of batches
trainingloss = trainingloss/len(train_loader)

testloss = 0
with torch.no_grad():
    for data in test_loader:
        img, _ = data
        img = img.to(device)
        recon = model(img)
        testloss += criterion(recon, img).item()

testloss = testloss/len(test_loader)
print("Training loss=",trainingloss,"Testing loss=",testloss)



## TODO: Model exploration

Set h=2, plot the embedding of the digits produced by the encoder, and compare it with the embedding you observed for the fully connected network. Also feed the decoder latent points on a regular grid and observe how it transforms them.

In [None]:
from idx_tools import Idx
import matplotlib.pyplot as plt

# Read the data
mnist_data = Idx.load_idx('./mnist/train-images.idx3-ubyte')

# Plot a random image
plt.imshow(mnist_data[2034], cmap='gray')
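
For the regular-grid part of the exploration, one way to organize the work is to build the grid of 2-D latent points, decode each point, and tile the decoded images into one canvas. The sketch below uses NumPy only. The names `latent_grid`, `tile_images` and `fake_decode` are hypothetical, the range [-3, 3] is an assumption that should be replaced by the range seen in the embedding plot, and `fake_decode` is a stand-in for a trained decoder.

```python
import numpy as np


def latent_grid(low=-3.0, high=3.0, steps=10):
    """Return a (steps*steps, 2) array of points on a regular 2-D grid."""
    axis = np.linspace(low, high, steps)
    xx, yy = np.meshgrid(axis, axis)
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


def tile_images(images, steps):
    """Arrange (steps*steps, H, W) images into a (steps*H, steps*W) canvas,
    filling the canvas row by row."""
    n, height, width = images.shape
    assert n == steps * steps
    return (images.reshape(steps, steps, height, width)
                  .transpose(0, 2, 1, 3)
                  .reshape(steps * height, steps * width))


def fake_decode(z):
    """Stand-in for a decoder: a constant 28x28 image with value z_x + z_y."""
    return np.ones((len(z), 28, 28)) * z.sum(axis=1)[:, None, None]


z = latent_grid(steps=10)
canvas = tile_images(fake_decode(z), steps=10)
print(z.shape, canvas.shape)
```

With a trained model, `fake_decode` would be replaced by a function that converts `z` to a float tensor with the shape the decoder expects, runs the decoder under `torch.no_grad()`, and returns the result as a `(N, 28, 28)` NumPy array. The canvas can then be shown with `plt.imshow(canvas, cmap='gray')`.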