# Introduction

We have seen how a CNN takes an input image and, through a series of layers, transforms it into an output that is much smaller in the x and y dimensions but much greater in depth. Along the way, the CNN discards spatial information from the input image and isolates high-level information about its content. This process can also be thought of as a kind of data compression: the image is compressed into something like a feature vector, which is a feature map, produced after the input has passed through a series of layers, flattened into a vector shape.

This is part of what makes up something called an autoencoder, which is what we will learn about in this lesson. An autoencoder has two main components: an encoder that compresses some input data, and a decoder that reconstructs data from the compressed representation. Why is this kind of structure even useful?

It ends up being useful in a number of cases. Autoencoders are used in a traditional data compression sense, in that they can learn to reduce the dimensionality of any input. Anyone can then share or view the compressed representation faster than they could the original input. We might think of file formats like JPEG or MP3, which apply explicit rules for compressing images and audio. The difference is that an autoencoder learns efficient compression and decompression functions from data instead of having them designed and encoded by a human.

Autoencoders have shown the most promise in image denoising techniques and in filling in missing data. This structure will also come up again as we learn about generative models that can take in an image and transform it into a related space, such as from grayscale to color or from low to high resolution.

The encoder and decoder are both built with neural networks. Generally, the whole network is trained by minimizing the difference between the input and the output. In that way, the middle layer becomes a compressed representation of the input data from which we can reconstruct the original data.

<img src="assets/Autoencoders.png">

The key aspect of an autoencoder is its ability to compress an image such that its content is still maintained. Later, we may be able to use this compressed representation to generate something else. We will show how to build autoencoders in PyTorch. We will start with a simple example where we compress images. Then, since we are working with image data, we will improve it by using convolutional layers.

So, let's get started by defining and training an autoencoder!

# A Simple Autoencoder

We'll start off by building a simple autoencoder to compress the MNIST dataset, whose images are 28x28x1 arrays with a total of 784 pixels each. With autoencoders, we pass input data through an encoder that makes a compressed representation of the input. Then, this representation is passed through a decoder to reconstruct the input data. Generally the encoder and decoder will be built with neural networks, then trained on example data.

<img src='assets/autoencoder_1.png' />

### Compressed Representation

A compressed representation can be great for saving and sharing any kind of data in a way that is more efficient than storing raw data. Compressed data is often cheaper to store in data centers and faster to share across Wi-Fi connected devices. In practice, the compressed representation often holds key information about an input image, and we can use it for denoising images or other kinds of reconstruction and transformation!

<img src='assets/denoising.png' width=60%/>


The idea is that we define our encoder and decoder as neural networks. Then, we train this complete autoencoder by passing in an original image and getting back as output a reconstructed image. We can then compare the original with the reconstruction. We want the original and reconstructed image to be as close as possible. So, our loss will actually be comparing these pixel values and measuring the difference between the original and reconstructed images. Once this whole network is trained, we should have a working encoder and decoder portion of a network, and we will be able to use either part to either compress or decompress a certain image. 
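
To make the comparison between original and reconstruction concrete, here is a minimal sketch of a pixel-wise loss. It assumes mean squared error (`nn.MSELoss`) as the measure of difference, which is one common choice and not necessarily the one used later in this lesson. The tensors `original` and `reconstruction` are random stand-ins, not real MNIST data or real model outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# stand-in batch of 20 flattened "images" with pixel values in [0, 1]
original = torch.rand(20, 784)

# hypothetical imperfect reconstruction: the original plus a little noise,
# clamped back into the valid pixel range
reconstruction = (original + 0.05 * torch.randn(20, 784)).clamp(0, 1)

# mean squared error averages the squared pixel-wise differences
criterion = nn.MSELoss()
loss = criterion(reconstruction, original)

# a perfect reconstruction gives a loss of exactly zero
perfect_loss = criterion(original, original)

print(loss.item(), perfect_loss.item())
```

The closer the reconstruction is to the original, the closer this value gets to zero, which is what training tries to achieve.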

In [3]:
import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# load the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)


We define only training and test loaders here, with no validation set, because in this case we really just want to get our training loss as low as possible. This is not a typical classification task, and validation sets are most useful when we are trying to predict a quantity like a class.

In [4]:
# Create training and test dataloaders

# number of subprocesses to use for data loading
num_workers = 0
# how many samples per batch to load
batch_size = 20

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)


### Visualize the Data

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
    
# obtain one batch of training images
dataiter = iter(train_loader)
# use the built-in next(); DataLoader iterators have no .next() method in recent PyTorch
images, labels = next(dataiter)
images = images.numpy()

# get one image from the batch
img = np.squeeze(images[0])

fig = plt.figure(figsize = (5,5)) 
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')

---
## Linear Autoencoder

We'll train an autoencoder with these images by flattening them into 784-length vectors. The images from this dataset are already normalized such that the values are between 0 and 1. Let's start by building a simple autoencoder. The encoder and decoder should each be made of **one linear layer**. The units that connect the encoder and decoder will be the _compressed representation_. The decoder's linear layer will upsample, or increase the dimension of, the compressed representation. We want this layer to output a vector of length 784. Later, we will reshape this output vector into a 28x28 reconstructed image, and then we will be able to compare it with the original image.
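
The flattening and reshaping steps can be sketched as follows. This is a minimal illustration that assumes a batch shaped like the ones the loaders above produce (20 images of 1x28x28); the `images` tensor here is random stand-in data, and the names `flat` and `restored` are ours.

```python
import torch

# stand-in for one batch from the loader: 20 grayscale images of 28x28
images = torch.rand(20, 1, 28, 28)

# flatten each image into a 784-length vector, keeping the batch dimension
flat = images.view(images.size(0), -1)
print(flat.shape)  # torch.Size([20, 784])

# reshape the vectors back into image form for viewing or comparison
restored = flat.view(-1, 1, 28, 28)
print(torch.equal(images, restored))  # True
```

Flattening and reshaping only change how the same 784 values are laid out, so no pixel information is lost in either direction.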

Since the images are normalized between 0 and 1, we need to use a **sigmoid activation on the output layer** to get values that match this input value range.
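
As a quick check of why a sigmoid fits here, the sketch below applies `torch.sigmoid` to a few arbitrary values, standing in for the raw outputs of a final linear layer, and confirms that the results fall strictly between 0 and 1. The input values are made up for illustration.

```python
import torch

# arbitrary raw values, standing in for the outputs of a final linear layer
raw = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0])

# sigmoid squashes any real number into the open interval (0, 1)
scaled = torch.sigmoid(raw)
print(scaled)

# zero maps to the midpoint of the range
print(scaled[2].item())  # 0.5
```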

<img src='assets/simple_autoencoder.png' width=50% />


#### TODO: Build the graph for the autoencoder in the cell below. 
> The input images will be flattened into 784 length vectors. The targets are the same as the inputs. 
> The encoder and decoder will each be made of one linear layer.
> The depth dimensions should change as follows: 784 inputs > **encoding_dim** > 784 outputs.
> All layers will have ReLU activations applied except for the final output layer, which has a sigmoid activation.

**The compressed representation should be a vector with dimension `encoding_dim=32`.**

In [None]:
import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class Autoencoder(nn.Module):
    def __init__(self, encoding_dim):
        super(Autoencoder, self).__init__()
        ## encoder ##
        
        ## decoder ##
        

    def forward(self, x):
        # define feedforward behavior 
        # and scale the *output* layer with a sigmoid activation function
        
        return x

# initialize the NN
encoding_dim = 32
model = Autoencoder(encoding_dim)
print(model)
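
One possible way to complete the exercise is sketched below. It is not the only valid solution: it assumes one linear layer for the encoder and one for the decoder, as described above. The class name `LinearAutoencoder`, the attribute names `encoder` and `decoder`, and the variables `sketch_model` and `dummy_batch` are our own, chosen so that they do not clash with the exercise cell.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAutoencoder(nn.Module):
    def __init__(self, encoding_dim):
        super().__init__()
        # encoder: 784 input pixels -> encoding_dim units
        self.encoder = nn.Linear(28 * 28, encoding_dim)
        # decoder: encoding_dim units -> 784 output pixels
        self.decoder = nn.Linear(encoding_dim, 28 * 28)

    def forward(self, x):
        # compressed representation with a ReLU activation
        x = F.relu(self.encoder(x))
        # sigmoid on the output layer keeps values in the (0, 1) pixel range
        x = torch.sigmoid(self.decoder(x))
        return x

# run a random stand-in batch of flattened images through the untrained model
sketch_model = LinearAutoencoder(encoding_dim=32)
dummy_batch = torch.rand(20, 784)
output = sketch_model(dummy_batch)
print(output.shape)  # torch.Size([20, 784])
```

The output has the same shape as the input, which is what lets us compare the two directly when computing the loss.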