# **Project: Anomaly Detection for AITEX Dataset**
#### Track: VAE
## `Notebook 6`: Building the Variational Autoencoder (VAE)
**Author**: Oliver Grau 

**Date**: 27.03.2025  
**Version**: 1.0

## 📚 Table of Contents

- [1. Introduction](#1-introduction)
- [2. Overview: Why a VAE?](#2-overview-why-a-vae)
- [3. Model Evolution & Design Choices](#3-model-evolution--design-choices)
- [4. Custom Loss Functions for Training](#4-custom-loss-functions-for-training)
- [5. Instantiating the Model](#5-instantiating-the-model)
- [6. Conclusion & Outlook](#6-conclusion--outlook)

---

## 1. Introduction
In this notebook, we detail the evolution and architecture of our Variational Autoencoder (VAE) tailored specifically for anomaly detection on the AITEX Fabric Defect Dataset.

---

## 2. Overview: Why a VAE?

A **Variational Autoencoder (VAE)** is a reasonable first candidate for anomaly detection because it learns a probabilistic representation of normal data. Inputs that deviate from this learned representation tend to be reconstructed poorly, and that deviation signals an anomaly.
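
To make the idea concrete, here is a minimal sketch of how a reconstruction-based anomaly score could be computed. It assumes the score is the per-image mean squared error between input and reconstruction; the function name `reconstruction_anomaly_score` is hypothetical and not part of this project's codebase.

```python
import torch


def reconstruction_anomaly_score(x: torch.Tensor, x_recon: torch.Tensor) -> torch.Tensor:
    """Per-image mean squared reconstruction error; higher means more anomalous."""
    # Average the squared error over every non-batch dimension (C, H, W).
    return ((x - x_recon) ** 2).flatten(start_dim=1).mean(dim=1)


# Toy check with stand-in tensors (no trained model involved).
torch.manual_seed(0)
x = torch.rand(2, 1, 256, 256)
x_recon = x.clone()
# Simulate a region of the second image that the model failed to reproduce.
x_recon[1, :, 100:140, 100:140] = 0.0
scores = reconstruction_anomaly_score(x, x_recon)
```

The first image is reproduced exactly and scores zero, while the second image scores higher because of the corrupted region.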

---

## 3. Model Evolution & Design Choices

The AitexVAE model evolved through several iterations, each aimed at reconstructing fabric images more clearly and robustly. While developing this notebook series, I experimented extensively with the architecture. Through this experimentation, the original model (`AitexVAE`) evolved and is now at version 8.

Here’s an overview of the models from `AitexVAE` to `AitexVAEv8`:


### 🌱 `AitexVAE` – **Baseline Architecture**

#### ✅ Architecture
- Standard 4-layer CNN encoder (downsampling 256×256 to 16×16).
- Fully connected `mu` and `logvar` heads → latent vector.
- Decoder:
  - Linear → [64, 16, 16] → two transposed convs: 16×16 → 64×64 → 256×256.
  - Uses `Sigmoid()` at the end.

#### 🎯 Purpose
- Establishes a minimal working baseline.
- Strong downsampling but **shallow decoder** → limited reconstruction power.
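
The baseline described above could look roughly like the following sketch. The layer layout (four stride-2 convolutions, `mu`/`logvar` heads, a `[64, 16, 16]` decoder start, two 4× transposed convolutions, final `Sigmoid()`) follows the bullet points; the class name `BaselineVAESketch`, the channel widths, and the kernel sizes are assumptions, not the actual `AitexVAE` code.

```python
import torch
from torch import nn


class BaselineVAESketch(nn.Module):
    """Hypothetical baseline VAE for 256x256 single-channel patches."""

    def __init__(self, in_channels: int = 1, latent_dim: int = 128):
        super().__init__()
        # Four stride-2 convolutions: 256 -> 128 -> 64 -> 32 -> 16.
        # Channel widths are assumed for illustration.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        feat_dim = 64 * 16 * 16
        self.fc_mu = nn.Linear(feat_dim, latent_dim)
        self.fc_logvar = nn.Linear(feat_dim, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, feat_dim)
        # Two transposed convolutions, each upsampling by 4: 16 -> 64 -> 256.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(32, in_channels, kernel_size=4, stride=4),
            nn.Sigmoid(),
        )

    def reparameterize(self, mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
        # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and logvar.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x: torch.Tensor):
        h = self.encoder(x).flatten(start_dim=1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        h = self.fc_dec(z).view(-1, 64, 16, 16)
        return self.decoder(h), mu, logvar


model = BaselineVAESketch(latent_dim=16)
recon, mu, logvar = model(torch.rand(2, 1, 256, 256))
```

With only two large upsampling steps, the decoder has few layers in which to rebuild fine texture, which is the limitation noted above.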

---

### 🧰 `AitexVAEv2` – **Configurable Channels**

#### ✅ What's New
- You can define:
  - `encoder_channels` (e.g., [32, 64, 128, 256])
  - `decoder_channels` (e.g., [64, 32, 1])
- Adds flexibility for experimentation.
- Transposed convolutions still used in decoder.

#### 🎯 Purpose
- Tune network capacity and symmetry.
- Still no bottleneck/BatchNorm or attention.
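
As an illustration of the configurable-channel idea, the sketch below builds an encoder from a list such as `encoder_channels`. The helper name `build_encoder`, the 3×3 kernels, and the stride-2 downsampling per stage are assumptions; the real `AitexVAEv2` constructor may differ.

```python
import torch
from torch import nn


def build_encoder(in_channels: int, encoder_channels: list) -> nn.Sequential:
    """Stack one stride-2 conv + ReLU per entry in encoder_channels."""
    layers = []
    prev = in_channels
    for ch in encoder_channels:
        # Each stage halves the spatial resolution (assumed design).
        layers += [
            nn.Conv2d(prev, ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        ]
        prev = ch
    return nn.Sequential(*layers)


# Channel list taken from the example above.
encoder = build_encoder(1, [32, 64, 128, 256])
features = encoder(torch.rand(2, 1, 256, 256))
```

A decoder could be assembled the same way from `decoder_channels`, using `nn.ConvTranspose2d` stages instead of `nn.Conv2d`.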

---

### 🧱 `AitexVAEv3` – **Fixed but Stronger Decoder**

#### ✅ What's New
- Encoder and decoder are hardcoded like v1, but with:
  - More consistent depth: encoder ends with 256-channels
  - Decoder starts from `[64, 16, 16]` (like v1), upsampled in 2 big steps

#### 🎯 Purpose
- Simple fixed-structure VAE.
- Control experiment for future improvements.

---

### 🔄 `AitexVAEv4` – **Improved Decoder and Bottleneck**

#### ✅ What's New
- Introduces:
  - **BatchNorm1d** in bottleneck.
  - A **fully connected intermediate layer** before `mu`/`logvar`.
  - Optional **SEBlock** (attention) before decoding.
- Decoder starts at `[256, 16, 16]` and upsamples in 4 steps.

#### 🎯 Purpose
- Normalize latent distributions.
- Strengthen decoder.
- Add optional attention to guide decoding.
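
The bottleneck and attention pieces could be sketched as follows. The layer names `fc_intermediate`, `bn_intermediate`, `fc_mu`, and `fc_logvar` match the printed model structure in Section 5; the class names `BottleneckSketch` and `SEBlockSketch`, the ReLU after the BatchNorm, and the `reduction` factor are assumptions. The SE block follows the standard squeeze-and-excitation pattern and may differ from the project's own `SEBlock`.

```python
import torch
from torch import nn


class SEBlockSketch(nn.Module):
    """Squeeze-and-excitation: rescale each channel by a learned weight in (0, 1)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squeeze: global average pool to (B, C); excite: per-channel weights.
        weights = self.fc(x.mean(dim=(2, 3)))
        return x * weights[:, :, None, None]


class BottleneckSketch(nn.Module):
    """Intermediate FC layer + BatchNorm1d in front of the mu/logvar heads."""

    def __init__(self, feat_dim: int = 256 * 16 * 16,
                 hidden_dim: int = 1024, latent_dim: int = 128):
        super().__init__()
        self.fc_intermediate = nn.Linear(feat_dim, hidden_dim)
        self.bn_intermediate = nn.BatchNorm1d(hidden_dim)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, feat: torch.Tensor):
        h = self.fc_intermediate(feat.flatten(start_dim=1))
        h = torch.relu(self.bn_intermediate(h))  # activation choice is assumed
        return self.fc_mu(h), self.fc_logvar(h)


# Small dimensions keep the demo light; the defaults mirror the printed model.
feat = torch.rand(4, 256, 4, 4)
bottleneck = BottleneckSketch(feat_dim=256 * 4 * 4, hidden_dim=64, latent_dim=8)
mu, logvar = bottleneck(feat)
attended = SEBlockSketch(256)(feat)
```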

---

### 🔗 `AitexVAEv5` – **Skip Connections (Additive)**

#### ✅ What's New
- First model to use **additive skip connections** from encoder to decoder.
- Encoder stores feature maps from each conv layer.
- Decoder adds skip features after each upsampling stage.

#### 🎯 Purpose
- Improve reconstructions by **reusing spatial context**.
- Add spatial detail from encoder to decoder.
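
A minimal sketch of additive skip connections, under stated assumptions: the encoder keeps the feature map from each conv layer, and a decoder stage upsamples and then adds the encoder feature of matching shape. The class names `TinySkipEncoder` and `AdditiveSkipStage`, and the kernel sizes, are hypothetical. For brevity the deepest encoder feature stands in for the decoded latent vector.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TinySkipEncoder(nn.Module):
    """Encoder that returns the feature map of every conv layer."""

    def __init__(self, in_channels: int = 1, channels=(32, 64, 128, 256)):
        super().__init__()
        chans = (in_channels, *channels)
        self.convs = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
            for c_in, c_out in zip(chans[:-1], chans[1:])
        ])

    def forward(self, x: torch.Tensor) -> list:
        feats = []
        for conv in self.convs:
            x = F.relu(conv(x))
            feats.append(x)  # stored for reuse in the decoder
        return feats


class AdditiveSkipStage(nn.Module):
    """Upsample by 2, then add the encoder feature map of the same shape."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.up(x))
        return x + skip  # shapes must match exactly for addition


feats = TinySkipEncoder()(torch.rand(2, 1, 64, 64))
# feats[3] is (2, 256, 4, 4); feats[2] is (2, 128, 8, 8).
out = AdditiveSkipStage(256, 128)(feats[3], feats[2])
```

Addition requires the decoder and encoder feature maps to have identical channel counts, which constrains how the two halves of the network can be sized.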

---

### 🔗➕ `AitexVAEv6` – **Skip Connections (Concat)**

#### ✅ What's New
- Instead of adding, uses **concatenation** for skip connections.
- Each concat is followed by a **1×1 conv** to reduce channels.
- More flexible than additive: lets the model learn fusion.

#### 🎯 Purpose
- Empower the decoder to **learn how much to use from encoder**.
- Improve gradient flow and spatial recovery.
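
The concat-then-reduce step can be sketched as a small fusion module. The name `ConcatSkipFusion` and the choice to reduce back to the decoder's channel count are assumptions for illustration.

```python
import torch
from torch import nn


class ConcatSkipFusion(nn.Module):
    """Concatenate decoder and encoder features, then fuse with a 1x1 conv."""

    def __init__(self, dec_channels: int, enc_channels: int):
        super().__init__()
        # The 1x1 conv learns how to mix both sources and restores the
        # decoder's channel count.
        self.fuse = nn.Conv2d(dec_channels + enc_channels, dec_channels, kernel_size=1)

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([dec_feat, enc_feat], dim=1))


dec_feat = torch.rand(2, 128, 32, 32)
enc_feat = torch.rand(2, 64, 32, 32)
fused = ConcatSkipFusion(128, 64)(dec_feat, enc_feat)
```

Unlike addition, concatenation only needs matching spatial sizes, so encoder and decoder channel counts can differ.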

---

### 🚪🔒 `AitexVAEv8` – **Gated Skip Connections (ChannelGate)**

#### ✅ What's New
- Replaces concat with **learnable gates per channel** (ChannelGate).
- Gate = sigmoid(MLP(average pooled enc_feat)) → weight per channel.
- Each decoder feature map gets **modulated encoder features**, added in.

#### 🎯 Purpose
- Let the model **learn how much encoder info to pass at each stage**.
- Gives finer control over skip information and can suppress noisy encoder features.
- Cleaner gradients and better reconstruction control.
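
The gate formula above, sigmoid(MLP(average pooled enc_feat)), can be sketched as follows. The class name `ChannelGateSketch`, the two-layer MLP, and the `reduction` factor are assumptions; the project's actual `ChannelGate` may be structured differently.

```python
import torch
from torch import nn


class ChannelGateSketch(nn.Module):
    """Per-channel gate in (0, 1) computed from the encoder feature map itself."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, enc_feat: torch.Tensor) -> torch.Tensor:
        pooled = enc_feat.mean(dim=(2, 3))       # global average pool -> (B, C)
        gate = torch.sigmoid(self.mlp(pooled))   # one weight per channel
        return enc_feat * gate[:, :, None, None]  # modulated encoder features


enc_feat = torch.rand(2, 128, 32, 32)
dec_feat = torch.rand(2, 128, 32, 32)
gated = ChannelGateSketch(128)(enc_feat)
out = dec_feat + gated  # gated skip is added to the decoder feature map
```

Because each gate value lies strictly between 0 and 1, the network can attenuate a skip channel almost completely or pass it nearly unchanged.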

---

### 🎓 Overall Evolution Summary

| Version    | Key Idea                        | Strengths                                 |
|------------|----------------------------------|--------------------------------------------|
| `AitexVAE` | Basic baseline                  | Easy to train, limited decoder             |
| `v2`       | Configurable channel structure  | Good for hyperparameter tuning             |
| `v3`       | Stronger decoder                | More stable reconstruction                 |
| `v4`       | Bottleneck + Attention          | Normalize latent + SE refinement           |
| `v5`       | Additive skip connections       | Preserves spatial features                 |
| `v6`       | Concat skip connections         | Learnable fusion, flexible                 |
| `v8`       | **Gated** skip connections      | Best control over encoder info reuse       |

You've just explored a carefully crafted **evolutionary series of VAE architectures**, developed through an iterative process of **training → evaluation → architectural refinement → retraining**. Each version introduces new ideas and structural changes aimed at better addressing the unique challenges posed by the AITEX fabric dataset.

This progression ultimately led us to version **`AitexVAEv8`**, which incorporates gated skip connections for more controlled feature reuse. This is our most advanced and expressive VAE so far.

As a learner, you're free to:
- ✅ Try out any of the intermediate versions to better understand how each architectural change affects performance, or
- 🚀 Jump straight to the conclusions in **`08_Why the VAE Struggles with AITEX Anomaly Detection.ipynb`**, where we summarize key insights from this VAE branch and explain why this approach, despite its strengths, faces fundamental limitations on the AITEX dataset.

The choice is yours: dive deep into each version, or continue forward in your anomaly detection journey. If you want to follow the full journey, start with the notebook **`07_Training the model.ipynb`**.


---

## 4. Custom Loss Functions for Training

Our VAEs were trained using custom loss functions, blending spatial and frequency-domain reconstruction metrics with KL divergence to ensure high-fidelity reconstructions and meaningful latent representations.

### Key Loss Functions:
- **`vae_loss`**: Classic VAE loss (MSE + KL divergence).
- **`frequency_vae_loss`**: Loss computed on FFT magnitudes of images (captures textural details).
- **`log_scaled_frequency_vae_loss`**: Uses log-scaled FFT magnitudes to better handle varying signal strengths.
- **`hybrid_vae_loss`**: Combines spatial MSE and frequency-domain losses.
- **`hybrid_spatial_vae_loss`**: Advanced hybrid loss integrating FFT magnitude, MSE, and Structural Similarity Index (SSIM).

This hybrid approach significantly improved anomaly detection sensitivity and reconstruction quality.
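
To illustrate how such losses can be composed, here is a hedged sketch of a classic VAE loss and a spatial-plus-frequency hybrid. The `_sketch` function names, the `alpha`/`beta` weights, the sum reduction, and the use of `log1p` for log scaling are assumptions; the project's own loss functions may weight and normalize terms differently, and the SSIM term is omitted here.

```python
import torch
import torch.nn.functional as F


def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, summed over the batch."""
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())


def vae_loss_sketch(recon, x, mu, logvar, beta: float = 1.0) -> torch.Tensor:
    """Classic VAE loss: spatial MSE plus (optionally weighted) KL divergence."""
    return F.mse_loss(recon, x, reduction="sum") + beta * kl_divergence(mu, logvar)


def fft_magnitude(x: torch.Tensor, log_scale: bool = False) -> torch.Tensor:
    """Magnitude of the 2D FFT over the last two dims, optionally log-scaled."""
    mag = torch.fft.fft2(x).abs()
    return torch.log1p(mag) if log_scale else mag


def hybrid_vae_loss_sketch(recon, x, mu, logvar,
                           alpha: float = 0.5, beta: float = 1.0,
                           log_scale: bool = True) -> torch.Tensor:
    """Weighted mix of spatial MSE and frequency-domain MSE, plus KL."""
    spatial = F.mse_loss(recon, x, reduction="sum")
    freq = F.mse_loss(fft_magnitude(recon, log_scale),
                      fft_magnitude(x, log_scale), reduction="sum")
    return alpha * spatial + (1 - alpha) * freq + beta * kl_divergence(mu, logvar)


# Sanity check with stand-in tensors: a perfect reconstruction with a
# standard-normal posterior (mu = 0, logvar = 0) gives zero loss.
x = torch.rand(2, 1, 64, 64)
mu, logvar = torch.zeros(2, 8), torch.zeros(2, 8)
perfect = hybrid_vae_loss_sketch(x, x, mu, logvar)
noisy = hybrid_vae_loss_sketch(x + 0.1, x, mu, logvar)
```

Log scaling compresses the dynamic range of the spectrum, so the very large low-frequency magnitudes do not drown out the finer textural frequencies.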

---

## 5. Instantiating the Model

Let's set up the model and print out the structure:

```python
from codebase.models.vae.aitex_vae import AitexVAEv2, AitexVAE, AitexVAEv3, AitexVAEv8
from torch import optim

# Instantiate the model
model = AitexVAEv8(
    in_channels=1, latent_dim=128, dropout_p=0.1, use_attention=True)

optimizer = optim.Adam(model.parameters(), lr=1e-3) # , weight_decay=1e-6)

print(model)
```

```text
AitexVAEv8(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (relu1): ReLU(inplace=True)
  (drop1): Dropout2d(p=0.1, inplace=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (relu2): ReLU(inplace=True)
  (drop2): Dropout2d(p=0.1, inplace=False)
  (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (relu3): ReLU(inplace=True)
  (drop3): Dropout2d(p=0.1, inplace=False)
  (conv4): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (relu4): ReLU(inplace=True)
  (drop4): Dropout2d(p=0.1, inplace=False)
  (fc_intermediate): Linear(in_features=65536, out_features=1024, bias=True)
  (bn_intermediate): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc_mu): Linear(in_features=1024, out_features=128, bias=True)
  (fc_logvar): Linear(in_features=1024, out_features=128, bias=True)
  (fc_dec): Linear(in_features=128, out_features=65536, bias=True)
  (se
```

---

## 6. Conclusion & Outlook

With the robust VAE architecture finalized and powerful loss functions defined, our next steps include:
- **Training and tuning** the AitexVAE model on fabric patches
- **Evaluating reconstruction quality** and anomaly detection performance
- **Optimizing** hyperparameters like latent dimensions, KL divergence weighting, and dropout rates for maximum anomaly detection accuracy

<p style="font-size: 0.8em; text-align: center;">© 2025 Oliver Grau. Educational content for personal use only. See LICENSE.txt for full terms and conditions.</p>