https://www.youtube.com/watch?v=zp8clK9yCro

The idea behind autoencoders is simple. We have an input image, we encode it into a low-dimensional embedding, and then we decode that embedding to reconstruct the original image as well as possible.

<center><img src='./images/autoencoder.PNG' width=550px></center> 

An example of this is video compression, where we want to send images over the network from one end to the other. Instead of sending the full image, we send only the encoded data. On the other side, a decoder reconstructs the image. This reduces the amount of data transmitted and can save cost.
<center><img src='./images/autoencoder_1.PNG' width=550px></center> 

### How do we encode and decode an image?
For both operations we can simply use a feed-forward neural network, or, when we deal with images, we often use a CNN.

- Such a model is called a generative model, because instead of performing a classification at the end, we want to generate images from the encoding.
- To train the model we need a loss function to optimize. We want the reconstructed image to be as close as possible to the original image, so the pixel values should be almost the same. The loss function is therefore simply the mean squared error (MSE), which averages the squared error over all pixels.
- There is a trick that helps to understand the whole process better. Instead of thinking of all transformations as operations in one direction, we can think of them as a back-and-forth operation: we encode in one direction and then go in the other direction to decode.
- So for each transformation we apply in the encoder, we want to apply the inverse of that operation in the decoder. For example, if we apply a linear layer that reduces the size, the reverse operation is also a linear layer that increases the size again.
- For CNNs this is a little trickier. Here we apply convolutional layers, and the operation that reverses the change in shape is called a __transposed convolution__.
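To make the loss from the list above concrete, here is a minimal pure-Python sketch of a per-pixel MSE. The helper name `mse` and the tiny four-pixel "images" are made up for illustration; in the notebook itself the loss is computed with `nn.MSELoss`.

```python
def mse(original, reconstructed):
    """Mean squared error between two flat lists of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical 4-pixel images with values in [0, 1]
original = [0.0, 0.5, 1.0, 0.25]
perfect = [0.0, 0.5, 1.0, 0.25]   # identical reconstruction
blurry = [0.5, 0.5, 0.5, 0.5]     # everything collapsed to mid-gray

print(mse(original, perfect))  # 0.0
print(mse(original, blurry))   # 0.140625
```

A perfect reconstruction gives a loss of zero, and the loss grows as pixel values drift away from the original.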

### `nn.ConvTranspose2d`
- PyTorch already includes this layer, so we can simply use `nn.ConvTranspose2d`. The only tricky part is determining the correct input and output shapes.

<center><img src='./images/autoencoder_2.PNG' width=550px></center> 
<center><img src='./images/autoencoder_3.PNG' width=550px></center> 
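Because the shapes are the tricky part, it can help to compute them by hand before building the model. The sketch below implements the output-size formulas given in the PyTorch documentation for `nn.Conv2d` and `nn.ConvTranspose2d` (per spatial dimension). The function names `conv2d_out` and `conv_transpose2d_out` are hypothetical helpers, not part of PyTorch, and the layer settings are example values for a 28x28 MNIST image.

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output size of a Conv2d layer along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0,
                         output_padding=0, dilation=1):
    """Output size of a ConvTranspose2d layer along one spatial dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


# Encoder direction: 28 -> 14 -> 7 -> 1
s = conv2d_out(28, kernel=3, stride=2, padding=1)
print(s)  # 14
s = conv2d_out(s, kernel=3, stride=2, padding=1)
print(s)  # 7
s = conv2d_out(s, kernel=7)
print(s)  # 1

# Decoder direction: 1 -> 7 -> 14 -> 28
s = conv_transpose2d_out(s, kernel=7)
print(s)  # 7
s = conv_transpose2d_out(s, kernel=3, stride=2, padding=1, output_padding=1)
print(s)  # 14
s = conv_transpose2d_out(s, kernel=3, stride=2, padding=1, output_padding=1)
print(s)  # 28

# Without output_padding the stride-2 layers come out one pixel short
print(conv_transpose2d_out(7, kernel=3, stride=2, padding=1))   # 13
print(conv_transpose2d_out(14, kernel=3, stride=2, padding=1))  # 27
```

The last two lines show why `output_padding=1` is needed here: a stride-2 convolution maps both 13 and 14 to 7, so the transposed layer needs the extra hint to pick 14.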

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

In [None]:
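# Load the MNIST data

# Optional: normalize the pixel values from [0, 1] to [-1, 1].
# Note: this transform is overridden by the assignment below, so it is not applied here.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))
    ])

# Transform actually used: ToTensor alone scales the pixel values to [0, 1].
transform = transforms.ToTensor()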

# Download data
mnist_data = datasets.MNIST(root='./data', 
                            train=True, 
                            download=True, 
                            transform=transform)

# load data
data_loader = torch.utils.data.DataLoader(dataset=mnist_data,
                                          batch_size=64,
                                          shuffle=True)

In [None]:
# Let's check the first batch
dataiter = iter(data_loader)
images, labels = next(dataiter) # fetch one batch; dataiter.next() no longer works in recent PyTorch versions
print(torch.min(images), torch.max(images))
# >>> tensor(0.) tensor(1.) # the pixel values are between 0 and 1

# Autoencoder with a simple feed-forward network

In [None]:
# Simple feed-forward network with linear layers that repeatedly reduce the size
class Autoencoder_Linear(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), # (N, 784) -> (N, 128), N = batch size. A flattened image has 28*28 = 784 values; this layer reduces it to 128.
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 12),
            nn.ReLU(),
            nn.Linear(12, 3) # -> (N, 3). The image is reduced from 784 values to 3. No activation function is needed after the last encoder layer.
        )

        # In the decoder we go in the opposite direction
        self.decoder = nn.Sequential(
            nn.Linear(3, 12),
            nn.ReLU(),
            nn.Linear(12, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 28 * 28), # (N, 128) -> (N, 784); the decoder as a whole maps (N, 3) -> (N, 784)
            nn.Sigmoid() # Final activation. We checked earlier that the image values lie between 0 and 1,
            # so we need an activation function that maps its output to that range. Sigmoid does this.
        )

    # Apply the encoder and then the decoder in the forward pass.
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

NOTE: 
# If the input images are in the range [-1, +1], use `nn.Tanh` instead of `nn.Sigmoid` as the final activation. To get that range, use the normalizing transform defined above:
# transform = transforms.Compose(
#     [transforms.ToTensor(),
#      transforms.Normalize((0.5,), (0.5,))
#     ])
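The link between the normalization and the choice of final activation can be checked with a little arithmetic. The sketch below mirrors what `transforms.Normalize` with mean 0.5 and standard deviation 0.5 does to a single pixel value, `(x - mean) / std`; the helper names `normalize` and `denormalize` are hypothetical and only serve this illustration.

```python
import math


def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel from [0, 1] to [-1, 1], as Normalize(0.5, 0.5) would."""
    return (pixel - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Map a value from [-1, 1] back to [0, 1], e.g. before plotting."""
    return value * std + mean


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# The darkest and brightest pixels land on the ends of [-1, 1]
print(normalize(0.0), normalize(1.0))  # -1.0 1.0

# Sigmoid outputs stay in (0, 1), so it cannot reach negative targets;
# tanh outputs stay in (-1, 1), which matches the normalized range.
for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):+.4f}")

# Round trip back to the original range
print(denormalize(normalize(0.25)))  # 0.25
```

If the model is trained on normalized inputs, reconstructions would also need `denormalize` applied before being displayed in the [0, 1] range.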

In [None]:
model = Autoencoder_Linear()

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), # optimize the parameters of our model.
                             lr=1e-3, 
                             weight_decay=1e-5)

In [None]:
# Training
# Point to training loop video
num_epochs = 10
outputs = [] # list that stores the last batch and its reconstruction for each epoch
for epoch in range(num_epochs): # iterate over the epochs
    for (img, _) in data_loader: # iterate over the batches of the data loader
        img = img.reshape(-1, 28*28) # flatten each image to 784 values -> needed for Autoencoder_Linear
        recon = model(img)
        loss = criterion(recon, img)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'Epoch:{epoch+1}, Loss:{loss.item():.4f}')
    outputs.append((epoch, img, recon))


In [None]:
# Plot the images to check how well they are reconstructed
for k in range(0, num_epochs, 4):
    plt.figure(figsize=(9, 2))
    plt.gray()
    imgs = outputs[k][1].detach().numpy() # the output is a torch tensor: detach it from the graph, then convert it to numpy
    recon = outputs[k][2].detach().numpy()
    for i, item in enumerate(imgs):
        if i >= 9: break
        plt.subplot(2, 9, i+1)
        item = item.reshape(-1, 28,28) # -> use for Autoencoder_Linear
        # item: 1, 28, 28
        plt.imshow(item[0])
            
    for i, item in enumerate(recon):
        if i >= 9: break
        plt.subplot(2, 9, 9+i+1) # row_length + i + 1
        item = item.reshape(-1, 28,28) # -> use for Autoencoder_Linear
        # item: 1, 28, 28
        plt.imshow(item[0])

# Autoencoder using Conv2d
Let's see if we can improve the performance by using a convolutional neural network.

In [None]:
class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()        
        # N, 1, 28, 28
        self.encoder = nn.Sequential(
            # Instead of linear layers we use Conv2d layers
            nn.Conv2d(1, 16, 3, stride=2, padding=1), # (in_channels, out_channels, kernel_size, stride, padding) -> N, 16, 14, 14. The image size is halved from 28 to 14.
            nn.ReLU(),
            # Be careful to get the correct shapes and sizes.
            nn.Conv2d(16, 32, 3, stride=2, padding=1), # -> N, 32, 7, 7
            nn.ReLU(),
            nn.Conv2d(32, 64, 7) # -> N, 64, 1, 1: 64 channels and the output image is 1x1
            
        )

        # Go in the backward direction using ConvTranspose2d layers.
        # N, 64, 1, 1: this is the size we get from the encoder.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 7), # -> N, 32, 7, 7
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), # N, 16, 14, 14 (N,16,13,13 without output_padding)
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), # N, 1, 28, 28  (N,1,27,27 without output_padding)
            nn.Sigmoid()
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
    

# Note: If you want to use pooling in the CNN encoder (nn.MaxPool2d), use nn.MaxUnpool2d in the decoder, or use a different kernel size, stride, etc. to compensate.
# If the input is in the range [-1, +1], use `nn.Tanh` as the final activation.

In [None]:
model = Autoencoder()

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3, 
                             weight_decay=1e-5)

In [None]:
# training
# Point to training loop video
num_epochs = 10
outputs = [] # list that stores the last batch and its reconstruction for each epoch
for epoch in range(num_epochs): # iterate over the epochs
    for (img, _) in data_loader: # iterate over the batches of the data loader
        # img = img.reshape(-1, 28*28) # -> only needed for Autoencoder_Linear; the convolutional model takes the 2D image directly
        recon = model(img)
        loss = criterion(recon, img)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'Epoch:{epoch+1}, Loss:{loss.item():.4f}')
    outputs.append((epoch, img, recon))

In [None]:
for k in range(0, num_epochs, 4):
    plt.figure(figsize=(9, 2))
    plt.gray()
    imgs = outputs[k][1].detach().numpy()
    recon = outputs[k][2].detach().numpy()
    for i, item in enumerate(imgs):
        if i >= 9: break
        plt.subplot(2, 9, i+1)
        # item = item.reshape(-1, 28,28) # -> use for Autoencoder_Linear
        # item: 1, 28, 28
        plt.imshow(item[0])
            
    for i, item in enumerate(recon):
        if i >= 9: break
        plt.subplot(2, 9, 9+i+1) # row_length + i + 1
        # item = item.reshape(-1, 28,28) # -> use for Autoencoder_Linear
        # item: 1, 28, 28
        plt.imshow(item[0])

#### Homework: Use MaxPool2d and inspect the encoded data (can it be plotted as an image?)