# Anomaly Detection using Autoencoders and GANs

## Introduction

Anomaly detection is a critical aspect of data analysis, aiming to identify unusual patterns or outliers in datasets that do not conform to expected behavior. These anomalies can indicate significant events, such as fraud, system failures, or cyber-attacks. Leveraging deep learning techniques like Autoencoders and Generative Adversarial Networks (GANs), we can build robust models for detecting anomalies by learning data representations and identifying deviations.

In this tutorial, we'll explore how to detect anomalies using Autoencoders and GANs. We'll cover the underlying mathematics, walk through example code in PyTorch, and explain each step. We'll also reference key papers and discuss related developments in the field. Plots of reconstructed and generated images illustrate the results.

## Table of Contents

1. [Understanding Anomaly Detection](#1)
   - [What is Anomaly Detection?](#1.1)
   - [Applications](#1.2)
2. [Autoencoders for Anomaly Detection](#2)
   - [Underlying Mathematics](#2.1)
   - [Implementation](#2.2)
3. [GANs for Anomaly Detection](#3)
   - [Underlying Mathematics](#3.1)
   - [Implementation](#3.2)
4. [Latest Developments](#4)
   - [Variational Autoencoders (VAEs)](#4.1)
   - [Adversarial Autoencoders](#4.2)
5. [Conclusion](#5)
6. [References](#6)

<a id="1"></a>
# 1. Understanding Anomaly Detection

<a id="1.1"></a>
## 1.1 What is Anomaly Detection?

Anomaly detection refers to the identification of items, events, or observations that do not conform to an expected pattern or other items in a dataset. Anomalies are also known as outliers, novelties, noise, deviations, or exceptions.

### Types of Anomalies

- **Point Anomalies**: Individual data instances that are anomalous with respect to the rest of the data.
- **Contextual Anomalies**: Data instances that are anomalous in a specific context (e.g., time series data).
- **Collective Anomalies**: A collection of related data instances that are anomalous together.
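
To make the difference between point and contextual anomalies concrete, here is a minimal sketch on made-up temperature readings. The data, the `zscore_outliers` helper, and the 2.5 threshold are all assumptions chosen for illustration. A reading of 60 stands out against the whole dataset (point anomaly), while a reading of 24 only stands out when compared with other winter readings (contextual anomaly).

```python
import numpy as np

# Hypothetical temperature readings (degrees Celsius), invented for this example
winter = np.array([1.0, 2.0, 0.5, 1.5, 2.5, 1.0, 0.0, 1.5, 2.0, 1.0, 24.0, 1.5])
summer = np.array([24.0, 25.0, 23.5, 26.0, 24.5, 25.5, 24.0, 23.0, 25.0, 60.0, 24.5, 25.0])

def zscore_outliers(values, threshold=2.5):
    """Return indices whose absolute z-score exceeds the threshold."""
    z = np.abs(values - values.mean()) / values.std()
    return np.flatnonzero(z > threshold)

# Point anomaly: judged against all readings at once
all_readings = np.concatenate([winter, summer])
point_idx = zscore_outliers(all_readings)

# Contextual anomaly: judged only against readings from the same season
winter_idx = zscore_outliers(winter)
summer_idx = zscore_outliers(summer)

print("Global outliers:", point_idx)
print("Winter outliers:", winter_idx)
print("Summer outliers:", summer_idx)
```

The winter reading of 24 is missed by the global check because it is a typical summer value; it only becomes visible once the context (season) is taken into account.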

<a id="1.2"></a>
## 1.2 Applications

- **Fraud Detection**: Identifying fraudulent transactions in finance.
- **Cybersecurity**: Detecting intrusions and malicious activities in networks.
- **Healthcare**: Identifying abnormal patterns in medical data.
- **Manufacturing**: Detecting defects or faults in production processes.
- **Environmental Monitoring**: Identifying unusual patterns in climate data.

<a id="2"></a>
# 2. Autoencoders for Anomaly Detection

Autoencoders are unsupervised neural networks that learn to reconstruct their input: an encoder compresses the data into a lower-dimensional representation, and a decoder reconstructs the input from that representation.

<a id="2.1"></a>
## 2.1 Underlying Mathematics

An autoencoder consists of two main components:

- **Encoder**: Maps the input data $\mathbf{x}$ to a latent representation $\mathbf{z}$.
- **Decoder**: Reconstructs the input data from the latent representation.

### Encoder Function

$$
\mathbf{z} = f(\mathbf{x}) = \sigma(W_e \mathbf{x} + \mathbf{b}_e)
$$

### Decoder Function

$$
\hat{\mathbf{x}} = g(\mathbf{z}) = \sigma(W_d \mathbf{z} + \mathbf{b}_d)
$$

- $\sigma$: Activation function (e.g., ReLU, sigmoid).
- $W_e, W_d$: Weight matrices for encoder and decoder.
- $\mathbf{b}_e, \mathbf{b}_d$: Bias vectors.
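
The encoder and decoder equations can be traced numerically with a single-layer sketch in NumPy. The dimensions (784 inputs, 32 latent units), the random weights, and the choice of sigmoid for $\sigma$ are assumptions made for illustration; nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

input_dim, latent_dim = 784, 32  # assumed sizes for illustration

# Randomly initialized (untrained) parameters
W_e = rng.normal(scale=0.01, size=(latent_dim, input_dim))
b_e = np.zeros(latent_dim)
W_d = rng.normal(scale=0.01, size=(input_dim, latent_dim))
b_d = np.zeros(input_dim)

def encode(x):
    # z = sigma(W_e x + b_e)
    return sigmoid(W_e @ x + b_e)

def decode(z):
    # x_hat = sigma(W_d z + b_d)
    return sigmoid(W_d @ z + b_d)

x = rng.random(input_dim)   # stand-in for a flattened 28x28 image in [0, 1]
z = encode(x)
x_hat = decode(z)

print(z.shape, x_hat.shape)
```

The latent vector has far fewer entries than the input, which is what forces the network to learn a compressed representation.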

### Loss Function

The autoencoder is trained to minimize the reconstruction error between the input $\mathbf{x}$ and the reconstruction $\hat{\mathbf{x}}$:

$$
\mathcal{L}(\mathbf{x}, \hat{\mathbf{x}}) = \| \mathbf{x} - \hat{\mathbf{x}} \|^2
$$

### Anomaly Detection Principle

- **Normal Data**: The autoencoder learns to reconstruct normal data with low error.
- **Anomalous Data**: Reconstruction error is higher for anomalies, as they differ from the patterns learned during training.

By setting a threshold on the reconstruction error, we can classify data points as normal or anomalous.
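
As a small worked example of this principle, the sketch below computes the squared reconstruction error per sample and applies a threshold. The arrays and the threshold of 0.5 are made-up values, and `reconstruction_errors` and `flag_anomalies` are hypothetical helper names.

```python
import numpy as np

def reconstruction_errors(x, x_hat):
    """Squared L2 error per sample; inputs have shape (n_samples, n_features)."""
    return np.sum((x - x_hat) ** 2, axis=1)

def flag_anomalies(errors, threshold):
    """Return 1 for samples whose error exceeds the threshold, else 0."""
    return (errors > threshold).astype(int)

# Two samples reconstructed well, one reconstructed badly (toy values)
x = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
x_hat = np.array([[0.1, 0.9], [0.9, 0.1], [0.0, 0.0]])

errors = reconstruction_errors(x, x_hat)
flags = flag_anomalies(errors, threshold=0.5)

print(errors)
print(flags)
```

Only the third sample, whose reconstruction is far from the input, exceeds the threshold.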

<a id="2.2"></a>
## 2.2 Implementation

We'll implement an autoencoder for anomaly detection using the MNIST dataset. We'll treat the digit '0' as normal data and all other digits as anomalies.

In [None]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

In [None]:
# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])

train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform)

# We will use digit '0' as normal data and others as anomalies
class MNISTAnomalyDataset(Dataset):
    def __init__(self, dataset, normal_digit=0):
        self.normal_data = []
        self.anomalous_data = []
        for img, label in dataset:
            if label == normal_digit:
                self.normal_data.append((img, 0))  # Label 0 for normal
            else:
                self.anomalous_data.append((img, 1))  # Label 1 for anomaly
    def __len__(self):
        return len(self.normal_data)
    def __getitem__(self, idx):
        return self.normal_data[idx]

# Create datasets
normal_train_dataset = MNISTAnomalyDataset(train_dataset)
normal_test_dataset = MNISTAnomalyDataset(test_dataset)

# Create data loaders
batch_size = 128
train_loader = DataLoader(dataset=normal_train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=normal_test_dataset, batch_size=batch_size, shuffle=False)

In [None]:
# Define the Autoencoder model
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(True),
            nn.Linear(128, 64),
            nn.ReLU(True),
            nn.Linear(64, 12),
            nn.ReLU(True),
            nn.Linear(12, 3)
        )
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(3, 12),
            nn.ReLU(True),
            nn.Linear(12, 64),
            nn.ReLU(True),
            nn.Linear(64, 128),
            nn.ReLU(True),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid()
        )
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

model = Autoencoder().to(device)
print(model)

In [None]:
# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

In [None]:
# Training the Autoencoder
num_epochs = 20
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for data, _ in train_loader:
        img = data.view(data.size(0), -1).to(device)
        
        # Forward pass
        output = model(img)
        loss = criterion(output, img)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item() * img.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}')

### Visualizing Reconstruction

Let's visualize some reconstructed images to see how well the autoencoder performs.

In [None]:
# Display original and reconstructed images
model.eval()
dataiter = iter(test_loader)
images, _ = next(dataiter)  # Use the built-in next(); recent PyTorch iterators have no .next() method
images = images.view(images.size(0), -1).to(device)

with torch.no_grad():
    outputs = model(images)

images = images.view(-1, 1, 28, 28).cpu().numpy()
outputs = outputs.view(-1, 1, 28, 28).cpu().numpy()

# Plot
fig, axes = plt.subplots(nrows=2, ncols=8, sharex=True, sharey=True, figsize=(12,4))
for images_row, row in zip([images, outputs], axes):
    for img, ax in zip(images_row, row):
        ax.imshow(img.squeeze(), cmap='gray')
        ax.axis('off')
plt.show()

### Anomaly Detection on Test Data

We will use the reconstruction error to detect anomalies. Since the autoencoder was trained only on normal data (digit '0'), it should have higher reconstruction error on anomalous data (other digits).

In [None]:
# Prepare test data with anomalies
test_data = []
test_labels = []
for img, label in test_dataset:
    test_data.append(img)
    test_labels.append(0 if label == 0 else 1)  # Label 0: normal, 1: anomaly

test_data = torch.stack(test_data)
test_labels = torch.tensor(test_labels)

# Compute reconstruction errors
model.eval()
with torch.no_grad():
    test_data_flat = test_data.view(test_data.size(0), -1).to(device)
    reconstructions = model(test_data_flat)
    mse = torch.mean((test_data_flat - reconstructions) ** 2, dim=1).cpu().numpy()

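# Set threshold from the reconstruction errors of normal training data,
# so it does not depend on the share of anomalies in the test set
with torch.no_grad():
    train_flat = torch.stack([img for img, _ in normal_train_dataset.normal_data])
    train_flat = train_flat.view(train_flat.size(0), -1).to(device)
    train_mse = torch.mean((train_flat - model(train_flat)) ** 2, dim=1).cpu().numpy()
threshold = np.percentile(train_mse, 95)  # Adjust percentile as needed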

# Predict anomalies
predictions = (mse > threshold).astype(int)

# Calculate metrics
from sklearn.metrics import classification_report, confusion_matrix
print('Classification Report:')
print(classification_report(test_labels, predictions, target_names=['Normal', 'Anomaly']))

print('Confusion Matrix:')
print(confusion_matrix(test_labels, predictions))

**Explanation:**

- Prepare the test data, which contains both normal and anomalous samples.
- Compute the reconstruction error (mean squared error) for each sample.
- Set a threshold based on the reconstruction error distribution.
- Predict anomalies by comparing reconstruction errors to the threshold.
- Evaluate the model using classification metrics.
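
The choice of data used to set the threshold matters when anomalies dominate the test set, as they do here (nine of the ten digit classes count as anomalous). The sketch below uses synthetic error values, not MNIST results, to compare a threshold taken from held-out normal errors with one taken from the mixed test errors. The distributions and the `recall` helper are assumptions for illustration.

```python
import numpy as np

def recall(labels, predictions):
    """Fraction of true anomalies (label 1) that were flagged."""
    true_pos = np.sum((labels == 1) & (predictions == 1))
    return true_pos / np.sum(labels == 1)

rng = np.random.default_rng(0)

# Synthetic reconstruction errors, for illustration only (not MNIST results)
normal_val_errors = rng.normal(0.02, 0.005, size=1000)
test_errors = np.concatenate([rng.normal(0.02, 0.005, size=100),   # normal
                              rng.normal(0.06, 0.010, size=900)])  # anomalous
test_labels = np.concatenate([np.zeros(100, dtype=int), np.ones(900, dtype=int)])

# Threshold from normal data only vs. from the mixed test set
thr_normal = np.percentile(normal_val_errors, 95)
thr_mixed = np.percentile(test_errors, 95)

pred_normal = (test_errors > thr_normal).astype(int)
pred_mixed = (test_errors > thr_mixed).astype(int)

recall_normal = recall(test_labels, pred_normal)
recall_mixed = recall(test_labels, pred_mixed)

print(f"Recall with normal-data threshold: {recall_normal:.3f}")
print(f"Recall with mixed-data threshold:  {recall_mixed:.3f}")
```

A 95th-percentile threshold on the mixed set can flag at most 5% of the samples, so most anomalies are missed whenever they make up more than 5% of the data.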

<a id="3"></a>
# 3. GANs for Anomaly Detection

Generative Adversarial Networks (GANs) are composed of two neural networks, a generator and a discriminator, competing in a zero-sum game. GANs can be used for anomaly detection by learning the data distribution and identifying samples that do not conform to it.

<a id="3.1"></a>
## 3.1 Underlying Mathematics

### GAN Architecture

- **Generator (G)**: Attempts to produce data that is indistinguishable from real data.
- **Discriminator (D)**: Attempts to distinguish between real data and data produced by the generator.

### Loss Functions

The GAN is trained using the following minimax game:

$$
\min_G \max_D \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}(\mathbf{x})} [\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}(\mathbf{z})} [\log (1 - D(G(\mathbf{z})))]
$$

- $p_{\text{data}}(\mathbf{x})$: Real data distribution.
- $p_{\mathbf{z}}(\mathbf{z})$: Prior noise distribution (e.g., Gaussian).
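
To get a feel for the objective, the following sketch estimates its value from a handful of discriminator outputs. The output values are made up, and `gan_value` is a hypothetical helper. When the discriminator cannot tell real from generated samples and outputs 0.5 everywhere, the value equals $-\log 4$; a confident, correct discriminator pushes it toward 0.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs on real samples, values in (0, 1)
    d_fake: discriminator outputs on generated samples, values in (0, 1)
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Discriminator that cannot distinguish real from generated samples
confused = gan_value(np.full(4, 0.5), np.full(4, 0.5))

# Discriminator that is confident and correct (made-up outputs)
confident = gan_value(np.array([0.9, 0.95, 0.99, 0.9]),
                      np.array([0.1, 0.05, 0.01, 0.1]))

print(f"Confused discriminator:  {confused:.4f}")
print(f"Confident discriminator: {confident:.4f}")
```

The discriminator tries to raise this value, while the generator tries to lower it by making its samples harder to distinguish from real data.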

### Anomaly Detection Principle

- **Train GAN on Normal Data**: The generator learns to produce data similar to normal data.
- **Anomaly Score**: Measure how well a sample fits into the learned data distribution.

Common methods for anomaly scoring with GANs:

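- **Discriminator Score**: Use the discriminator's output as an anomaly score; a low probability of being real suggests an anomaly.
- **Reconstruction Error**: Search for the latent vector whose generated sample best matches the input and use the remaining reconstruction error, optionally combined with a discriminator-based loss, as the anomaly score.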

### AnoGAN [[1]](#ref1)

- **Idea**: For a given test sample, find the closest point in the generator's latent space and measure the reconstruction error.
- **Anomaly Score**:

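  $$
  A(\mathbf{x}) = (1 - \lambda) \, \| \mathbf{x} - G(\mathbf{z}^*) \| + \lambda \, L_D(\mathbf{x})
  $$

  - $\mathbf{z}^*$: Latent vector found by gradient descent so that $G(\mathbf{z}^*)$ best matches $\mathbf{x}$.
  - $L_D(\mathbf{x})$: Discrimination loss. In AnoGAN it compares intermediate discriminator features of $\mathbf{x}$ and $G(\mathbf{z}^*)$, so it is not the raw discriminator output $D(\mathbf{x})$ from the minimax objective.
  - $\lambda$: Weighting parameter.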
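
The latent search behind the residual term can be sketched with a toy linear generator in NumPy. This is an illustration under strong assumptions: the generator is $G(\mathbf{z}) = W\mathbf{z}$ with orthonormal columns rather than a trained network, the discrimination term is left out, and `search_latent` and `residual_score` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
data_dim, latent_dim = 20, 3

# Toy linear "generator" with orthonormal columns (an assumption that keeps
# plain gradient descent stable); a trained GAN generator is nonlinear
W, _ = np.linalg.qr(rng.normal(size=(data_dim, latent_dim)))

def generate(z):
    return W @ z

def search_latent(x, steps=200, lr=0.1):
    """Gradient descent on ||x - G(z)||^2 with respect to z."""
    z = np.zeros(latent_dim)
    for _ in range(steps):
        residual = x - generate(z)
        # Gradient of the squared residual w.r.t. z is -2 W^T (x - W z)
        z += 2.0 * lr * (W.T @ residual)
    return z

def residual_score(x):
    z_star = search_latent(x)
    return np.linalg.norm(x - generate(z_star))

# A sample the generator can produce, and a perturbed copy it cannot
x_normal = generate(rng.normal(size=latent_dim))
x_anomalous = x_normal + 2.0 * rng.normal(size=data_dim)

print(f"Residual (normal):    {residual_score(x_normal):.6f}")
print(f"Residual (anomalous): {residual_score(x_anomalous):.6f}")
```

A sample that lies in the generator's range can be matched almost exactly, whereas the part of an anomalous sample outside that range remains as residual.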

<a id="3.2"></a>
## 3.2 Implementation

We'll implement a simple GAN and use it for anomaly detection on the MNIST dataset.

In [None]:
# Define the Generator and Discriminator
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(True),
            nn.Linear(128, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 28 * 28),
            nn.Sigmoid()  # Outputs in [0, 1], matching the ToTensor() range of the real images
        )
    def forward(self, z):
        img = self.model(z)
        img = img.view(img.size(0), 1, 28, 28)
        return img

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )
    def forward(self, img):
        img_flat = img.view(img.size(0), -1)
        validity = self.model(img_flat)
        return validity

latent_dim = 100
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)

In [None]:
# Loss function and optimizers
adversarial_loss = nn.BCELoss()

optimizer_G = torch.optim.Adam(generator.parameters(), lr=1e-3)
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

In [None]:
# Training the GAN
num_epochs = 50
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(train_loader):
        real_imgs = imgs.to(device)
        
        # Adversarial ground truths
        valid = torch.ones(imgs.size(0), 1, device=device)
        fake = torch.zeros(imgs.size(0), 1, device=device)
        
        # Train Generator
        optimizer_G.zero_grad()
        
        # Sample noise as generator input
        z = torch.randn(imgs.size(0), latent_dim, device=device)
        
        # Generate images
        gen_imgs = generator(z)
        
        # Loss measures generator's ability to fool the discriminator
        g_loss = adversarial_loss(discriminator(gen_imgs), valid)
        
        g_loss.backward()
        optimizer_G.step()
        
        # Train Discriminator
        optimizer_D.zero_grad()
        
        # Measure discriminator's ability to classify real from generated samples
        real_loss = adversarial_loss(discriminator(real_imgs), valid)
        fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2
        
        d_loss.backward()
        optimizer_D.step()
    
    print(f'Epoch [{epoch+1}/{num_epochs}] | D Loss: {d_loss.item():.4f} | G Loss: {g_loss.item():.4f}')

### Visualizing Generated Images

In [None]:
# Generate and display images
generator.eval()
with torch.no_grad():
    z = torch.randn(16, latent_dim, device=device)
    gen_imgs = generator(z).cpu()

# Plot
fig, axes = plt.subplots(nrows=2, ncols=8, sharex=True, sharey=True, figsize=(12,4))
for img, ax in zip(gen_imgs, axes.flatten()):
    ax.imshow(img.squeeze(), cmap='gray')
    ax.axis('off')
plt.show()

### Anomaly Detection

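We'll implement a simplified version of the AnoGAN method: the discrimination term uses the discriminator's binary cross-entropy loss on the generated image rather than AnoGAN's feature-matching loss.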

In [None]:
# Anomaly detection function
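def anomaly_score(x, generator, discriminator, lambda_=0.1, iterations=500):
    # Start from a random latent vector; only z is optimized, not the networks
    z = torch.randn(1, latent_dim, requires_grad=True, device=device)
    optimizer = torch.optim.Adam([z], lr=1e-2)
    x = x.to(device)
    target = torch.ones(1, 1, device=device)  # "real" label for the discrimination loss

    for i in range(iterations):
        optimizer.zero_grad()
        gen_x = generator(z)
        # Residual loss: how far the generated image is from the input
        residual_loss = torch.mean((gen_x - x) ** 2)
        # Discrimination loss: how "real" the generated image looks to the discriminator
        discrimination_loss = adversarial_loss(discriminator(gen_x), target)
        loss = (1 - lambda_) * residual_loss + lambda_ * discrimination_loss
        loss.backward()
        optimizer.step()

    # Use the last computed loss as the anomaly score
    score = loss.item()
    return score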


In [None]:
# Compute anomaly scores
anomaly_scores = []
labels = []
generator.eval()
discriminator.eval()
for img, label in test_dataset:
    score = anomaly_score(img.unsqueeze(0), generator, discriminator)
    anomaly_scores.append(score)
    labels.append(0 if label == 0 else 1)

anomaly_scores = np.array(anomaly_scores)
labels = np.array(labels)

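# Set threshold. Note: a percentile of the mixed test scores flags a fixed 5% of samples,
# regardless of how many are truly anomalous; scores from held-out normal data are a safer basis.
threshold = np.percentile(anomaly_scores, 95)  # Adjust percentile as needed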

# Predict anomalies
predictions = (anomaly_scores > threshold).astype(int)

# Calculate metrics
print('Classification Report:')
print(classification_report(labels, predictions, target_names=['Normal', 'Anomaly']))

print('Confusion Matrix:')
print(confusion_matrix(labels, predictions))

**Explanation:**

- For each test image, optimize a latent vector $\mathbf{z}$ to minimize the combined residual and discrimination losses.
- Compute the anomaly score from the final loss.
- Set a threshold to classify samples as normal or anomalous.
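
Optimizing one latent vector per image in a Python loop is slow for thousands of test images. One possible speed-up, sketched below, is to optimize a whole batch of latent vectors at once. This is an assumption-laden variant rather than the AnoGAN procedure itself: it uses only the residual term, `batched_residual_scores` is a hypothetical helper, and the tiny untrained generator exists only to make the example self-contained.

```python
import torch
import torch.nn as nn

def batched_residual_scores(x, generator, latent_dim, iterations=100, lr=1e-2):
    """Return one residual score per image, optimizing all latent vectors at once."""
    # Freeze the generator so gradients are computed only for z
    # (call this after GAN training is finished)
    for p in generator.parameters():
        p.requires_grad_(False)

    x = x.view(x.size(0), -1)
    z = torch.randn(x.size(0), latent_dim, device=x.device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)

    for _ in range(iterations):
        optimizer.zero_grad()
        gen = generator(z).view(x.size(0), -1)
        per_image = ((gen - x) ** 2).mean(dim=1)
        # Summing keeps the images independent: row i of z only affects term i
        per_image.sum().backward()
        optimizer.step()

    with torch.no_grad():
        gen = generator(z).view(x.size(0), -1)
        return ((gen - x) ** 2).mean(dim=1)

# Toy stand-in for a trained generator (hypothetical sizes, untrained weights)
torch.manual_seed(0)
toy_generator = nn.Sequential(nn.Linear(4, 16), nn.Sigmoid())
batch = torch.rand(8, 1, 4, 4)  # eight made-up 4x4 "images" in [0, 1]
scores = batched_residual_scores(batch, toy_generator, latent_dim=4)
print(scores)
```

Because Adam updates each element of `z` separately, each image's latent vector follows the same trajectory it would follow if it were optimized alone from the same starting point.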

<a id="4"></a>
# 4. Latest Developments

Anomaly detection using deep learning continues to evolve, with new architectures and methods being proposed.

<a id="4.1"></a>
## 4.1 Variational Autoencoders (VAEs)

VAEs [[2]](#ref2) introduce a probabilistic approach to autoencoders by learning a latent space that follows a predefined distribution (e.g., Gaussian). VAEs can be used for anomaly detection by comparing the likelihood of data points under the learned distribution.

### Key Concepts

- **Encoder Outputs**: Mean $\mu$ and standard deviation $\sigma$ of the latent variables.
- **Reparameterization Trick**: Allows backpropagation through stochastic nodes by expressing latent variables as $\mathbf{z} = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
- **Loss Function**: Combines a reconstruction loss with a Kullback-Leibler (KL) divergence term (the negative evidence lower bound):

  $$
  \mathcal{L} = -\mathbb{E}_{q(\mathbf{z}|\mathbf{x})} [\log p(\mathbf{x}|\mathbf{z})] + \text{KL}(q(\mathbf{z}|\mathbf{x}) \| p(\mathbf{z}))
  $$
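
The reparameterization trick and the KL term can be checked numerically. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior, for which the KL divergence has a closed form. The example values of `mu` and `sigma` and both helper names are made up for illustration.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma))

rng = np.random.default_rng(0)

# Made-up encoder outputs for a 2-dimensional latent space
mu = np.array([0.5, -1.0])
sigma = np.array([1.0, 0.5])

z = reparameterize(mu, sigma, rng)
kl = kl_to_standard_normal(mu, sigma)

# The KL term vanishes when the posterior equals the prior
kl_zero = kl_to_standard_normal(np.zeros(2), np.ones(2))

print("Sampled z:", z)
print(f"KL term: {kl:.4f}, KL at the prior: {kl_zero:.4f}")
```

Since the randomness lives entirely in `eps`, the sample is a deterministic function of `mu` and `sigma`, which is what lets gradients flow back to the encoder.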

<a id="4.2"></a>
## 4.2 Adversarial Autoencoders

Adversarial Autoencoders [[3]](#ref3) combine autoencoders with adversarial training to match the aggregated posterior of the latent representation to a target distribution.

### Key Concepts

- **Encoder and Decoder**: Similar to traditional autoencoders.
- **Discriminator**: Trained to distinguish between encoded latent vectors and samples from the target distribution.
- **Adversarial Loss**: Encourages the encoder to produce latent representations that match the target distribution.

### Advantages for Anomaly Detection

- By enforcing a specific distribution on the latent space, anomalies can be detected as samples that do not conform to this distribution.
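
One hypothetical way to act on this idea, assuming the target distribution is a standard Gaussian, is to score each encoded sample by its negative log-density under that prior. The latent codes below are made-up values rather than encoder outputs, and `latent_nll` is an illustrative helper; in practice such a score is often combined with the reconstruction error.

```python
import numpy as np

def latent_nll(z):
    """Negative log-density of each row of z under N(0, I)."""
    d = z.shape[1]
    return 0.5 * np.sum(z ** 2, axis=1) + 0.5 * d * np.log(2.0 * np.pi)

# Made-up latent codes: one near the prior's mode, one far from it
z = np.array([[0.0, 0.0],
              [3.0, 3.0]])

scores = latent_nll(z)
print(scores)
```

Codes far from the origin receive a higher score, which marks them as less typical under the enforced prior.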

<a id="5"></a>
# 5. Conclusion

Anomaly detection is a vital task in various domains, and deep learning techniques like Autoencoders and GANs provide powerful tools for identifying anomalies in complex datasets. By leveraging reconstruction errors and adversarial training, we can build models that learn normal data patterns and detect deviations from them. Understanding the underlying mathematics and implementation details helps practitioners develop robust anomaly detection systems. The field continues to advance, with architectures such as VAEs and Adversarial Autoencoders adding probabilistic and adversarial structure to the latent space.

<a id="6"></a>
# 6. References

1. <a id="ref1"></a>Schlegl, T., Seeböck, P., Waldstein, S. M., Schmidt-Erfurth, U., & Langs, G. (2017). *Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery*. In Proceedings of the International Conference on Information Processing in Medical Imaging (IPMI).
2. <a id="ref2"></a>Kingma, D. P., & Welling, M. (2014). *Auto-Encoding Variational Bayes*. [arXiv:1312.6114](https://arxiv.org/abs/1312.6114)
3. <a id="ref3"></a>Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., & Frey, B. (2016). *Adversarial Autoencoders*. [arXiv:1511.05644](https://arxiv.org/abs/1511.05644)

---

This notebook provides an in-depth exploration of anomaly detection using Autoencoders and GANs. You can run the code cells to see how these models are implemented and experiment with different datasets and parameters.