**AUTOENCODERS** (work in progress)

*Patrick Donnelly*

In this tutorial, we'll take a look at **autoencoder** networks. Autoencoders can seem kinda weird if you haven't seen them before: they take an input and learn a function that reproduces that same input. This function is a neural network that passes the input through a "bottleneck" before "reconstructing" the input in the output layer (see https://www.jeremyjordan.me/autoencoders/). Why on earth would we want to do this? Well, the bottleneck has fewer nodes than the input, so the network can only reconstruct the input well if the bottleneck holds a compressed representation of it. We train the network to minimize the loss associated with the reconstruction of the input, and the compressed representation comes along for free.

As with our other tutorials, we'll use simple data in order to focus on the operations associated with constructing different autoencoder architectures. We'll experiment with some "vanilla" autoencoders and then move on to regularized, variational, and other autoencoders. As always, we will implement these networks in PyTorch.

Let's begin by defining a simple autoencoder architecture. We'll use the `vae` example from the PyTorch `examples` directory as a reference. First we'll need to import some packages:

In [1]:
import torch
import torch.nn as nn

Now let's define our `VAE` class and constructor. The class inherits from `nn.Module`, the base class for all PyTorch networks:

In [2]:
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()

Before we define our network architecture, let's build some toy data. We're going to start with a familiar example: a 5x5 "zero digit" of one-bit pixels:

In [3]:
a_zero = [[0,1,1,1,0],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[0,1,1,1,0]]
a_zero

[[0, 1, 1, 1, 0],
 [1, 0, 0, 0, 1],
 [1, 0, 0, 0, 1],
 [1, 0, 0, 0, 1],
 [0, 1, 1, 1, 0]]

See the zero? Cool. Now we want to define that as a vector:

In [4]:
a_zero_vector = [0,1,1,1,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,0,1,1,1,0]
a_zero_vector

[0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
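Typing the vector out by hand works for one toy digit, but the same flattening can be done programmatically. Here is a minimal sketch that walks the 5x5 grid row by row; the name `a_zero_flat` is ours, introduced only for illustration:

```python
# The 5x5 "zero digit" from above.
a_zero = [[0, 1, 1, 1, 0],
          [1, 0, 0, 0, 1],
          [1, 0, 0, 0, 1],
          [1, 0, 0, 0, 1],
          [0, 1, 1, 1, 0]]

# Flatten row by row (row-major order) into a single 25-element list.
a_zero_flat = [pixel for row in a_zero for pixel in row]
print(a_zero_flat)
print(len(a_zero_flat))
```

The result should match `a_zero_vector` element for element.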

If we were doing image classification, we'd now want to define a label (e.g. a one-hot encoding vector) as our output. But since we're constructing an autoencoder, we actually already have our output. It's our input! Now we know the number of input and output nodes - it's just the length of `a_zero_vector`:

In [5]:
len(a_zero_vector)

25

Now we need our bottleneck. Let's start by adding a set of hidden nodes. We want our bottleneck to contain fewer nodes than our input and output layers. How about 10 nodes?

In [6]:
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(25, 10)
        self.fc2 = nn.Linear(10, 25)

`nn.Linear` defines a linear transformation (with bias) mapping input nodes to output nodes. The number of input nodes and the number of output nodes are passed as the first two arguments. In this case, we have 25 input nodes and 10 output nodes for our first layer, and 10 input nodes and 25 output nodes for our second layer. By convention, we'll call our **fully-connected** layers `fc1` and `fc2`.
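To make the "linear transformation (with bias)" concrete, here's a small sketch that inspects a standalone `nn.Linear(25, 10)` layer and checks its output against a manual matrix multiplication. The random input and the tolerance are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(25, 10)

# PyTorch stores the weight as (out_features, in_features).
print(fc1.weight.shape)  # torch.Size([10, 25])
print(fc1.bias.shape)    # torch.Size([10])

# The layer computes x @ W^T + b; compare with a manual computation.
x_random = torch.rand(25)
manual = x_random @ fc1.weight.t() + fc1.bias
matches = torch.allclose(fc1(x_random), manual, atol=1e-6)
print(matches)
```

Note that the weight matrix is stored "backwards" relative to the argument order, which is why the manual version needs the transpose.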

To begin experimenting with PyTorch, we'll need to convert our example "image" to a `tensor`: 

In [7]:
x = torch.tensor(a_zero_vector, requires_grad=False).float()
print(x.shape)
x

torch.Size([25])


tensor([0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 0.,
        0., 1., 0., 1., 1., 1., 0.])
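As a side note, the same tensor can be built by passing `dtype` directly instead of calling `.float()` afterwards (`requires_grad` already defaults to `False`). This sketch also reshapes the vector back to 5x5 as a sanity check; the name `x_alt` is ours:

```python
import torch

a_zero_vector = [0,1,1,1,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,0,1,1,1,0]

# Request 32-bit floats up front rather than converting afterwards.
x_alt = torch.tensor(a_zero_vector, dtype=torch.float32)
print(x_alt.dtype, x_alt.shape)

# View the flat vector as a 5x5 grid to confirm the "zero" is intact.
print(x_alt.view(5, 5))
```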

Let's now define an `encode` method. This will pass the output of our first linear transformation through a `ReLU` activation function. We could define the `ReLU` as a layer using `nn.ReLU`, but let's follow our PyTorch examples code and use the `functional` API. First we import the `nn.functional` module (by convention) as `F`:

In [8]:
from torch.nn import functional as F

Now we can build a simple `encode` method. This will take our "image" `x` as input, pass it through the fully-connected layer `fc1` and apply a `relu` activation:

In [9]:
def encode(self, x):
    x = F.relu(self.fc1(x))
    return x
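If you haven't met `relu` before, it simply clamps negative values to zero and leaves everything else alone. A quick sketch on some made-up values:

```python
import torch
from torch.nn import functional as F

# Arbitrary example values, not taken from our network.
z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
activated = F.relu(z)
print(activated.tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
```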

We'll also build a `decode` method. This will apply a `sigmoid` activation to the output of the `fc2` operation:

In [10]:
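def decode(self, x):
    # torch.sigmoid is equivalent to F.sigmoid, which emits a
    # deprecation warning in some PyTorch releases
    x = torch.sigmoid(self.fc2(x))
    return x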
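Why a sigmoid on the output? Our pixels are all 0 or 1, and the sigmoid squashes any real number into the open interval (0, 1), so every output node lands in the same range as its target. A small sketch on made-up values:

```python
import torch

# Arbitrary pre-activation values, not taken from our network.
logits = torch.tensor([-4.0, 0.0, 4.0])
probs = torch.sigmoid(logits)
print(probs)

# sigmoid(0) is exactly 0.5, and sigmoid(-a) + sigmoid(a) = 1.
print(probs[1].item())
```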

This looks kinda verbose for a simple vanilla autoencoder, but it'll make sense as we expand the network. We're sticking to the template in the PyTorch `examples` directory.

Now we just need a `forward` method that defines how the data propagates through our encoder and decoder:

In [11]:
def forward(self, x):
    x = self.encode(x)
    return self.decode(x)

Let's put it all together!

In [12]:
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(25, 10)
        self.fc2 = nn.Linear(10, 25)
        
    def encode(self, x):
        x = F.relu(self.fc1(x))
        return x
        
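    def decode(self, x):
        # torch.sigmoid is equivalent to F.sigmoid, which emits a
        # deprecation warning in some PyTorch releases
        x = torch.sigmoid(self.fc2(x))
        return x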
        
    def forward(self, x):
        x = self.encode(x)
        return self.decode(x)

Now we can create an instance of our autoencoder:

In [13]:
vae = VAE()
print(vae)

VAE(
  (fc1): Linear(in_features=25, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=25, bias=True)
)
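The printout tells us both layers carry a bias, so we can work out how many trainable parameters this little network has: (25 x 10 + 10) + (10 x 25 + 25) = 535. Here's a sketch that counts them, using two standalone layers with the same sizes as `fc1` and `fc2` rather than the `VAE` class itself:

```python
import torch.nn as nn

# Stand-ins with the same shapes as fc1 and fc2 in the class above.
fc1 = nn.Linear(25, 10)
fc2 = nn.Linear(10, 25)

# Each layer contributes a weight matrix plus a bias vector.
n_params = sum(p.numel() for layer in (fc1, fc2) for p in layer.parameters())
print(n_params)  # 535
```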


Let's now feed our input through the instance of our autoencoder:

In [14]:
vae(x)



tensor([0.4703, 0.4871, 0.4439, 0.4296, 0.4083, 0.4344, 0.4225, 0.5771, 0.5311,
        0.4970, 0.5918, 0.4605, 0.5516, 0.4299, 0.5474, 0.4411, 0.3807, 0.5126,
        0.5438, 0.5133, 0.5032, 0.5225, 0.4660, 0.5765, 0.5075],
       grad_fn=<SigmoidBackward>)

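Excellent! The network runs, but we can see that our autoencoder didn't do a great job of reconstructing our input: every output hovers around 0.5 instead of sitting near 0 or 1. After all, it hasn't actually learned anything yet. The weights are still at their random initial values. We just passed the data through the network. We haven't computed a loss, and we haven't backpropagated.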
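One rough way to see how far off the untrained network is: threshold each output at 0.5 and count the pixels that disagree with the input. This is only an eyeball check, not a loss function. The sketch below assumes an `nn.Sequential` stand-in with the same layers and activations as our `VAE` class, and the seed is arbitrary, so the exact count will vary with initialization:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

x = torch.tensor([0,1,1,1,0,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,0,1,1,1,0],
                 dtype=torch.float32)

# Stand-in for the VAE class: Linear -> ReLU -> Linear -> Sigmoid.
model = nn.Sequential(nn.Linear(25, 10), nn.ReLU(),
                      nn.Linear(10, 25), nn.Sigmoid())

# No gradients needed for a quick inspection.
with torch.no_grad():
    recon = model(x)

# Round each output to 0 or 1 and compare against the input pixels.
binary = (recon > 0.5).float()
n_wrong = int((binary != x).sum().item())
print(f"{n_wrong} of {x.numel()} pixels differ")
print(recon.min().item(), recon.max().item())
```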

How do we measure our **reconstruction loss**?