# **Project: Anomaly Detection for AITEX Dataset**
#### Track: DRÆM
## `Notebook 1`: Transitioning to DRÆM for Visual Anomaly Detection
**Author**: Oliver Grau 

**Date**: 27.03.2025  
**Version**: 1.0

## 📚 Table of Contents

- [1. Summary of Earlier Approaches](#1-summary-of-earlier-approaches)
- [2. Observations on the AITEX Dataset](#2-observations-on-the-aitex-dataset)
- [3. Enter DRÆM: A Different Paradigm](#3-enter-draem--a-different-paradigm)
- [4. What You Should Already Know](#4-what-you-should-already-know)
- [5. Roadmap of Upcoming Notebooks](#5-roadmap-of-upcoming-notebooks)

---

## 1. Summary of Earlier Approaches

### 1. **Variational Autoencoder (VAE)**
- **Goal**: Learn to reconstruct normal patches and detect anomalies via high reconstruction error.
- **Key properties**:
  - Encoder compresses input into latent space.
  - Decoder reconstructs image from latent.
  - Anomalies are detected from the pixel-wise reconstruction error (L1 or MSE).
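
The scoring step in the last bullet can be sketched as follows. This is a minimal NumPy illustration, not the code from the VAE notebooks: `reconstruction_error_map` and `patch_score` are hypothetical names, and the "reconstruction" is faked by hand rather than produced by a trained decoder.

```python
import numpy as np

def reconstruction_error_map(original, reconstruction, kind="l1"):
    """Per-pixel reconstruction error for images of shape (H, W)."""
    diff = original.astype(np.float64) - reconstruction.astype(np.float64)
    return np.abs(diff) if kind == "l1" else diff ** 2

def patch_score(error_map):
    """Collapse an error map to one anomaly score (here: the mean error)."""
    return float(error_map.mean())

# Toy example: a flat gray patch with a small bright defect.
original = np.full((8, 8), 0.5)
original[2:4, 2:4] = 1.0
# Pretend the model only reproduces the normal background.
reconstruction = np.full((8, 8), 0.5)

err = reconstruction_error_map(original, reconstruction)
score = patch_score(err)
```

Because the defect covers only 4 of 64 pixels, the mean error stays small. This is one way to see why small defects are easy to miss with an averaged reconstruction error.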

**💡 Challenges with VAE**:
- Reconstructions tend to be **blurry**.
- **Small/texture anomalies** are often **missed**.
- Struggles with **high-frequency detail**, which is common in textiles.

---

### 2. **PatchCore (with ResNet, ConvNeXt, DINOv2, DenseNet, custom FFT, and ShallowCNN feature extractors)**
- **Goal**: Compare patch-level features from a test image to a memory bank of normal patches.
- **Pipeline**:
  - Extract intermediate backbone features (e.g., from a ResNet or a ViT).
  - Flatten spatial features into patch embeddings.
  - Store them in a memory bank (index).
  - Use **FAISS** to perform **nearest-neighbor search**.
  - Use the L2 distance to the closest normal patch as the anomaly score.
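
The last two pipeline steps can be illustrated with a brute-force search. This sketch deliberately replaces FAISS with plain NumPy broadcasting so that the arithmetic is visible; `nearest_neighbor_scores` is a hypothetical helper, and the 2-D embeddings are toy values, not real backbone features.

```python
import numpy as np

def nearest_neighbor_scores(test_patches, memory_bank):
    """L2 distance from each test embedding to its closest memory-bank entry.

    test_patches: (N, D) array, memory_bank: (M, D) array. Returns shape (N,).
    """
    # Pairwise differences via broadcasting: (N, 1, D) - (1, M, D) -> (N, M, D).
    diffs = test_patches[:, None, :] - memory_bank[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.sqrt(sq_dists.min(axis=1))

memory_bank = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
test_patches = np.array([[0.0, 0.0],    # identical to a stored normal patch
                         [3.0, 4.0]])   # far from every stored patch

patch_scores = nearest_neighbor_scores(test_patches, memory_bank)
# One common choice (an assumption here): the worst patch decides the image score.
image_score = float(patch_scores.max())
```

The intermediate `(N, M, D)` array grows with the product of test patches and memory-bank size, which hints at why memory use and search cost become a concern at scale.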

**✅ Pros**:
- Does not require training.
- Works well on texture-rich industrial datasets.

**⚠ Limitations Observed**:
- **FAISS is computationally heavy**, even on a GPU.
- Difficult to scale.
- Requires **a lot of memory**.
- **Recall maxed out around 37%**, so anomalies were often missed.
- Results vary heavily depending on the layer and backbone.
- Patch-based decisions can miss **global coherence**.

---

## 2. Observations on the AITEX Dataset

- High-resolution textile images (4096×256), patched to 256×256.
- **Defects are small** and can be subtle (e.g., loose threads, weave faults).
- **Noisy background** from machinery or the conveyor adds complexity.
- **Histogram equalization and switching backbones** (ResNet50, ConvNeXt, DINOv2) had **limited effect**.
- Precision was sometimes good (e.g., 80%), but **recall remained low (~35%)**, meaning most defects were missed.
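
The patching in the first bullet could look like the sketch below. It assumes 4096 is the image width and 256 its height, and that patches do not overlap; neither detail is stated above, and `split_into_patches` is a hypothetical helper rather than the project's code.

```python
import numpy as np

def split_into_patches(image, patch_size=256):
    """Split an (H, W) image into non-overlapping square patches along its width."""
    height, width = image.shape
    if height != patch_size or width % patch_size != 0:
        raise ValueError("expected height == patch_size and width divisible by it")
    n = width // patch_size
    # (H, W) -> (H, n, P) -> (n, H, P): one patch per index along axis 0.
    return image.reshape(height, n, patch_size).transpose(1, 0, 2)

# Synthetic stand-in for one AITEX image (rows = height, columns = width).
image = np.arange(256 * 4096, dtype=np.float32).reshape(256, 4096)
patches = split_into_patches(image)
```

Under these assumptions, each image yields 16 patches of 256×256 pixels.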

---

## 3. Enter DRÆM: A Different Paradigm

> **DRÆM: Discriminatively trained Reconstruction Anomaly Embedding Model**

Instead of:
- memorizing what normal looks like (VAE),
- or matching features to a memory bank (PatchCore),

👉 **DRÆM learns to *reconstruct clean images from synthetically corrupted ones***.

### 🔧 Basic Concepts

#### 1. **Synthetic Anomalies**
- During training, DRÆM adds **fake anomalies** (noise blobs, cut-paste artifacts) to normal images.
- It learns to **reconstruct clean versions** and localize the anomalous regions.
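
To make the idea concrete, here is a minimal sketch of how a synthetic anomaly might be blended into a normal patch. The blocky random mask is only a crude stand-in for the Perlin-noise masks planned for Notebook 03, and `make_blob_mask`, `corrupt`, and the blending factor `beta` are illustrative assumptions, not the project's actual functions or settings.

```python
import numpy as np

def make_blob_mask(shape, rng, threshold=0.6, coarse=8):
    """Binary mask from upsampled coarse noise (a crude stand-in for Perlin noise)."""
    h, w = shape
    coarse_noise = rng.random((h // coarse, w // coarse))
    # Blocky upsampling: repeat each coarse value over a coarse x coarse block.
    noise = np.kron(coarse_noise, np.ones((coarse, coarse)))
    return (noise > threshold).astype(np.float64)

def corrupt(image, texture, mask, beta=0.5):
    """Blend `texture` into `image` inside `mask`; leave all other pixels untouched."""
    blended = (1.0 - beta) * image + beta * texture
    return (1.0 - mask) * image + mask * blended

rng = np.random.default_rng(0)
image = np.full((64, 64), 0.5)     # stand-in for a normal patch
texture = rng.random((64, 64))     # stand-in for an anomaly source texture
mask = make_blob_mask(image.shape, rng)
corrupted = corrupt(image, texture, mask)
```

One training sample then consists of three aligned arrays: `corrupted` as the network input, `image` as the reconstruction target, and `mask` as the segmentation target.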

#### 2. **Dual Architecture**
- **Reconstruction Network**: A U-Net learns to clean the corrupted input.
- **Discriminative (Segmentation) Network**: Learns to predict **pixel-wise anomaly masks**.

#### 3. **Loss Functions**

The model is trained with two separate loss branches:

- **Reconstruction loss**: encourages accurate reconstruction of the original (unmodified) input image.
- **Segmentation loss**: trains the model to segment anomalies using synthetic ground-truth masks (i.e., the masks that mark the corrupted region during training).

---

### **Reconstruction Losses**

These are applied to the **autoencoder part** of the model, which learns to reconstruct the uncorrupted input from its corrupted version.

- `L1_loss`:  
  Measures the absolute difference between the reconstructed image and the original input. Promotes pixel-level similarity but ignores perceptual differences.

- `SSIM_loss`:  
  Based on the Structural Similarity Index (SSIM), typically minimized as `1 - SSIM`. Encourages perceptual similarity by comparing local luminance, contrast, and structure. Helps produce sharper, more visually aligned reconstructions.

- `FFT_loss`:  
  Compares the log-magnitude of the FFT spectra of the original and reconstructed images. Useful for detecting brightness shifts and texture inconsistencies that may be invisible in pixel space.

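✅ **Combined Reconstruction Loss** (example):  
```python
# Pixel/structure term (named MSE_SSIM_Loss here, although it is built from L1_loss)
MSE_SSIM_Loss = 0.8 * L1_loss + 0.2 * SSIM_loss
recon_loss = 0.7 * MSE_SSIM_Loss + 0.3 * FFT_Loss
```
Expanded, the effective weights are 0.56 for `L1_loss`, 0.14 for `SSIM_loss`, and 0.30 for `FFT_Loss`.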

This combined loss ensures that both fine details and global brightness/texture patterns are preserved in the reconstruction.
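
The three reconstruction terms and their weighting can be sketched in NumPy as shown below. This is an illustration under stated assumptions, not the training code: a real implementation would use differentiable PyTorch operations, images are assumed to lie in [0, 1], and `ssim_loss` uses whole-image statistics instead of the local windows that a full SSIM implementation averages over. All function names are hypothetical.

```python
import numpy as np

def l1_loss(x, y):
    """Mean absolute pixel difference."""
    return float(np.mean(np.abs(x - y)))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM from global statistics (simplified: no sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(1.0 - ssim)

def fft_loss(x, y, eps=1e-8):
    """Mean absolute difference between log-magnitude 2D FFT spectra."""
    log_mag_x = np.log(np.abs(np.fft.fft2(x)) + eps)
    log_mag_y = np.log(np.abs(np.fft.fft2(y)) + eps)
    return float(np.mean(np.abs(log_mag_x - log_mag_y)))

def reconstruction_loss(original, recon):
    """Weighted sum using the example weights from the text."""
    pixel_structure = 0.8 * l1_loss(original, recon) + 0.2 * ssim_loss(original, recon)
    return 0.7 * pixel_structure + 0.3 * fft_loss(original, recon)

rng = np.random.default_rng(0)
original = rng.random((32, 32))
perfect = original.copy()
blurry = np.full_like(original, original.mean())  # keeps brightness, loses all texture

loss_perfect = reconstruction_loss(original, perfect)
loss_blurry = reconstruction_loss(original, blurry)
```

The `blurry` case matches the original's mean brightness exactly, yet it is penalized by all three terms. The FFT term in particular reacts to the missing high-frequency content, which is the property that matters for textile textures.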

---

### **Segmentation Losses**

These are used to train the **anomaly segmentation head**, which learns to predict the anomaly mask based on residuals and deep features.

- `BCE_loss`:  
  Binary Cross-Entropy loss between the predicted anomaly map and the synthetic anomaly mask. Encourages per-pixel classification accuracy.

- `Focal_loss`:  
  Down-weights easy, well-classified pixels so that learning focuses on hard ones (e.g., near the edges of an anomaly). This reduces the influence of background pixels, which dominate due to class imbalance.

✅ **Combined Segmentation Loss**:  
```python
segmentation_loss = 0.5 * BCE_loss + 0.5 * Focal_loss
```

This hybrid loss improves the model's ability to localize anomalies precisely while handling the severe imbalance between background and anomaly pixels.
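
A small NumPy sketch of the two segmentation terms follows. It is meant only to show the arithmetic: the function names are hypothetical, predictions are assumed to be probabilities (after a sigmoid), and `gamma=2.0` and `alpha=0.25` are commonly used focal-loss defaults assumed here, not values taken from this project.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy; `pred` holds probabilities in (0, 1)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Cross-entropy scaled by (1 - p_t)^gamma, which down-weights easy pixels."""
    p = np.clip(pred, eps, 1.0 - eps)
    p_t = np.where(target == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

def segmentation_loss(pred, target):
    """Equal-weight combination, as in the formula above."""
    return 0.5 * bce_loss(pred, target) + 0.5 * focal_loss(pred, target)

target = np.zeros((16, 16))
target[4:6, 4:6] = 1.0                        # 4 anomalous pixels out of 256

confident = np.where(target == 1, 0.9, 0.1)   # mostly right everywhere
unsure = np.full_like(target, 0.5)            # carries no information

loss_confident = segmentation_loss(confident, target)
loss_unsure = segmentation_loss(unsure, target)
```

For the `confident` prediction, the focal term is much smaller than the BCE term because nearly all pixels are easy. That shrinkage of easy-pixel contributions is the mechanism that keeps the abundant background from dominating the gradient.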

---

### 🔗 Summary Table

| Loss         | Used For        | Purpose                                       |
|--------------|-----------------|-----------------------------------------------|
| `L1_loss`    | Reconstruction  | Pixel-wise similarity                         |
| `SSIM_loss`  | Reconstruction  | Perceptual similarity (structure, contrast)   |
| `FFT_loss`   | Reconstruction  | Texture + brightness shift detection (global) |
| `BCE_loss`   | Segmentation    | Per-pixel anomaly classification              |
| `Focal_loss` | Segmentation    | Emphasizes hard-to-classify anomaly regions   |

---


### Intuition

By training on **synthetic corruptions**, DRÆM learns **where** things are wrong in a robust way, **without ever seeing real defects during training**.

---

## 4. What You Should Already Know

If you followed the VAE and PatchCore notebooks, you're already familiar with:

- PyTorch training pipelines
- Concepts like:
  - Patch embeddings
  - L2 distance as anomaly measure
  - Encoder/Decoder models
  - Evaluation: ROC AUC, precision, recall, F1
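
As a quick refresher on the last item, precision, recall, and F1 follow directly from the confusion counts. The counts below are hypothetical, chosen only to mirror the high-precision, low-recall pattern described in Section 2; they are not measured results.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical: 100 defective patches, 35 of them flagged, plus 9 false alarms.
precision, recall, f1 = precision_recall_f1(tp=35, fp=9, fn=65)
```

With these counts, precision is close to 80% while recall is 35%, so F1 stays below 0.5: most flagged patches are real defects, but most defects are never flagged.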

With DRÆM, we'll **build on that knowledge** and transition into **pixel-wise segmentation of anomalies**.

---

## 5. Roadmap of Upcoming Notebooks

| Notebook | Title | Description |
|----------|-------|-------------|
| **02_Data_Preparation** | Patch & Prepare AITEX | Create normal-only training set and test set with masks |
| **03_Synthetic_Anomaly_Generation** | Simulate Anomalies | Functions for Perlin noise + cut-paste augmentation |
| **04_DRAEM_Model** | Build the DRÆM Model | Define U-Net and mask prediction network |
| **05_Train_DRAEM** | Train on Normal Patches | Train the model with synthetic masks |
| **06_Evaluate_DRAEM** | Run on Test Set | Get anomaly masks, scores, and metrics (ROC AUC, mass-center-distance, hit / miss rate) |

---

Let's move from handcrafted descriptors and brute-force similarity to a **learned notion of abnormality**.

➡️ **On to Notebook 02!**

<p style="font-size: 0.8em; text-align: center;">© 2025 Oliver Grau. Educational content for personal use only. See LICENSE.txt for full terms and conditions.</p>