# AutoEncoders

💡 Basic Purpose: A model that attempts to describe very large data with as few features as possible.

AutoEncoders have an **Encoder** and **Decoder**.

1. The **Encoder** takes the input data and reduces it to a smaller, compressed representation in the **latent space**.

2. The **latent space** is a low-dimensional space that captures the essential features of the input data.

3. The **Decoder** then takes this compressed representation and tries to reconstruct the original data.


<p align="center">
    <img src="../showcase_images/autoEncoders.png" alt="auto encoder network" width="300">
</p>

- The **goal** is to make the reconstructed output as close to the original input as possible, in other words, to minimize the reconstruction error. Autoencoders are commonly used for tasks like dimensionality reduction and data denoising.
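
The encode, compress, decode cycle and its reconstruction error can be sketched in a few lines. This is only a minimal illustration, not the convolutional model from the Torch implementation section: the name `TinyAutoEncoder`, the layer sizes (784 -> 3 -> 784), and the random input batch are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyAutoEncoder(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_dim=784, latent_dim=3):
        super().__init__()
        # Encoder: squeeze the input down to latent_dim values
        self.encoder = nn.Linear(input_dim, latent_dim)
        # Decoder: expand the latent values back to the input size
        self.decoder = nn.Sequential(nn.Linear(latent_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        latent = self.encoder(x)
        reconstructed = self.decoder(latent)
        return latent, reconstructed

x = torch.rand(8, 784)  # random stand-in for 8 flattened 28x28 images
tiny = TinyAutoEncoder()
latent, reconstructed = tiny(x)

# Reconstruction error: mean squared difference between output and input
reconstruction_error = nn.functional.mse_loss(reconstructed, x)
print(latent.shape, reconstructed.shape, reconstruction_error.item())
```

Training would adjust the encoder and decoder weights to push `reconstruction_error` down.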

**Latent Space**

The number of **neurons** in the latent space *layer* determines the dimensionality of the encoded data.
-  In the above example we have 3 neurons, so the **encoded data** is represented in 3D space; with 5 neurons it would be represented in 5D space.
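
As a quick check of that relationship, the sketch below pushes the same batch through two bottleneck layers of different widths. The input size of 784 and the random batch are assumptions made for illustration.

```python
import torch
import torch.nn as nn

batch = torch.rand(8, 784)  # 8 flattened 28x28 images (random stand-ins)

encoded_shapes = {}
for latent_neurons in (3, 5):
    bottleneck = nn.Linear(784, latent_neurons)  # latent layer with N neurons
    encoded = bottleneck(batch)
    encoded_shapes[latent_neurons] = tuple(encoded.shape)

# Each sample becomes one point with `latent_neurons` coordinates
print(encoded_shapes)
```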

*Latent Space Example Plot:*

<p align="center">
    <img src="../showcase_images/LatentSpaceRepresentation.png" alt="latent space representation plot" width="500">
</p>

- Note the single image of a 7 that sits near the cluster of 0s. Because that point lies in a region the decoder has learned to associate with the features of a "0", its reconstruction will likely be poor: the resulting image might look like a blurry "0", or an ambiguous shape that mixes a "7" and a "0".
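
One rough way to reason about such a stray point is to compare its distance to each class's cluster centre in latent space. The sketch below uses made-up 2-D latent codes, not real MNIST encodings; the cluster positions and the stray point's coordinates are assumptions.

```python
import torch

torch.manual_seed(0)

# Made-up 2-D latent codes: one cluster per digit class (not real MNIST encodings)
zeros_cluster = torch.randn(100, 2) * 0.3 + torch.tensor([-2.0, 0.0])
sevens_cluster = torch.randn(100, 2) * 0.3 + torch.tensor([2.0, 0.0])

centroids = {
    "0": zeros_cluster.mean(dim=0),
    "7": sevens_cluster.mean(dim=0),
}

# A hypothetical "7" whose encoding landed next to the 0 cluster
stray_seven = torch.tensor([-1.8, 0.2])

# Euclidean distance from the stray point to each class centroid
distances = {label: torch.dist(stray_seven, c).item() for label, c in centroids.items()}
nearest = min(distances, key=distances.get)
print(distances, nearest)
```

The stray point sits closest to the "0" centroid, which is the region the decoder associates with 0-like features.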

**AutoEncoder Use-Cases**
- Dimensionality Reduction
- Image Compression
- Image denoising
- Anomaly detection
- Feature extraction
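
For the denoising use-case, a common setup is to corrupt the input and ask the model to reconstruct the clean original. The sketch below only builds such a training pair; the helper name `add_gaussian_noise` and the `noise_factor` value are assumptions, not part of this notebook.

```python
import torch

torch.manual_seed(0)

def add_gaussian_noise(images, noise_factor=0.3):
    """Return a noisy copy of `images`, kept inside the valid [0, 1] pixel range."""
    noisy = images + noise_factor * torch.randn_like(images)
    return noisy.clamp(0.0, 1.0)

clean = torch.rand(4, 1, 28, 28)   # random stand-ins for MNIST images
noisy = add_gaussian_noise(clean)

# A denoising autoencoder would be fed `noisy` and scored against `clean`,
# i.e. loss = criterion(model(noisy), clean)
print(noisy.shape, noisy.min().item(), noisy.max().item())
```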

**Limitations**
- Overfitting: The model might simply copy the input to the output without learning a meaningful compressed representation.
- Lack of Generative capability (VAE models are used for this purpose)
- Computational cost


---

# Torch Implementation With MNIST Dataset

⭐️ The MNIST dataset was used in -> [../basic_NN_multi-class-classification.ipynb](../basic_NN_multi-class-classification.ipynb); review it to understand the dataset. For this project, however, we download MNIST through `torchvision.datasets`.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np

In [2]:
# Set the device for training: CUDA GPU if available, then MPS (Apple silicon), otherwise CPU.
DEVICE = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu")

### Get The Dataset:

In [3]:
# --- Data Loading and Preprocessing ---
# Define a transform to convert images to tensors
transform = transforms.ToTensor()

# Download the MNIST training and test datasets if they are not already present, then load them.
train_dataset = datasets.MNIST(root='../datasets', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='../datasets', train=False, download=True, transform=transform)


### AutoEncoder Model

In [4]:
# --- Hyperparameters ---
LEARNING_RATE = 1e-3
BATCH_SIZE = 64
EPOCHS = 10
LATENT_DIM = 16 # The size of our compressed latent space representation

In [5]:
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
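
The sketch below shows what a `DataLoader` yields per iteration, using a small synthetic dataset in place of MNIST (the 200 random images and labels are assumptions). Each batch is an `(images, labels)` pair; an autoencoder only needs the images, since the input is also the target.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 200 random 1x28x28 images with random labels
fake_images = torch.rand(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64, shuffle=True)

# One batch: images are [batch_size, channels, height, width]
images, labels = next(iter(loader))
print(images.shape, labels.shape)

# The last batch is smaller when the dataset size is not a multiple of batch_size
batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(batch_sizes)
```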

In [12]:
class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()

        # Encoder: Converts input data into a compressed latent representation
        self.encoder = nn.Sequential(
            # Input: MNIST image batches have shape [batch_size, 1, 28, 28]
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # -> [batch_size, 16, 14, 14]
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> [batch_size, 32, 7, 7]
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # -> [batch_size, 64, 4, 4]
            nn.ReLU(),
        )

        # Latent space: flatten the encoder output and project it down to LATENT_DIM values
        self.flatten = nn.Flatten()
        self.fc_encode = nn.Linear(64 * 4 * 4, LATENT_DIM)

        # Decoder: Reconstructs the image from the latent representation
        self.fc_decode = nn.Linear(LATENT_DIM, 64 * 4 * 4)
        self.unflatten = nn.Unflatten(1, (64, 4, 4))
        self.decoder = nn.Sequential(
            # Input: the unflattened latent projection, shape [batch_size, 64, 4, 4]
            # output_padding=0 on the first layer so 4x4 upsamples to 7x7 (not 8x8), mirroring the encoder
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=0),  # -> 7x7
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),  # -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),   # -> 28x28
            nn.Sigmoid()  # Sigmoid to ensure output is between 0 and 1
        )

    def forward(self, x):
        encoded = self.encoder(x)
        flattened = self.flatten(encoded)
        latent = self.fc_encode(flattened)
        
        decoded_flattened = self.fc_decode(latent)
        decoded = self.unflatten(decoded_flattened)
        reconstructed = self.decoder(decoded)
        return reconstructed
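
The `64 * 4 * 4` used by the linear layers comes from how each stride-2 convolution shrinks the image. The sketch below traces the spatial sizes with the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` (dilation of 1 assumed); the helper names `conv_out` and `conv_transpose_out` are hypothetical. It also shows that the `output_padding` of the first transposed convolution decides whether 4x4 grows to 7x7 or 8x8, which in turn decides whether the decoder lands back on 28x28.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of nn.Conv2d (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=0):
    """Spatial output size of nn.ConvTranspose2d (dilation=1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Encoder: three stride-2 convolutions starting from a 28x28 image
encoder_sizes = [28]
for _ in range(3):
    encoder_sizes.append(conv_out(encoder_sizes[-1]))
print("encoder:", encoder_sizes)

# First transposed convolution, starting from 4x4
from_4_without_padding = conv_transpose_out(4, output_padding=0)
from_4_with_padding = conv_transpose_out(4, output_padding=1)
print("4x4 ->", from_4_without_padding, "or", from_4_with_padding)

# Decoder path that mirrors the encoder: output_padding 0, then 1, then 1
decoder_sizes = [4]
for output_padding in (0, 1, 1):
    decoder_sizes.append(conv_transpose_out(decoder_sizes[-1], output_padding=output_padding))
print("decoder:", decoder_sizes)
```

With `output_padding=1` on all three layers the path would instead be 4 -> 8 -> 16 -> 32, which no longer matches the 28x28 input that the loss compares against.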


`Initialize`

In [13]:
model = AutoEncoder().to(DEVICE)
criterion = nn.MSELoss() # Mean Squared Error is a common loss function for autoencoders
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
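
To show how the model, criterion, and optimizer interact, the sketch below runs a few optimization steps on a single batch. It is not this notebook's training loop: the small fully connected `stand_in_model` and the random batch are assumptions so the example runs without the dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in model (hypothetical): any nn.Module whose output matches the input shape works here
stand_in_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 16),
    nn.ReLU(),
    nn.Linear(16, 28 * 28),
    nn.Sigmoid(),
    nn.Unflatten(1, (1, 28, 28)),
)
criterion = nn.MSELoss()
optimizer = optim.Adam(stand_in_model.parameters(), lr=1e-3)

images = torch.rand(64, 1, 28, 28)  # random batch standing in for MNIST images

losses = []
for _ in range(20):
    optimizer.zero_grad()                     # clear gradients from the previous step
    reconstructed = stand_in_model(images)    # forward pass
    loss = criterion(reconstructed, images)   # the input is its own target
    loss.backward()                           # backpropagate the reconstruction error
    optimizer.step()                          # update the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```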