# **Project: Anomaly Detection for AITEX Dataset**
#### Track: VAE
## `Notebook 1`: Anomaly Detection with Autoencoders: Introduction & Motivation
**Author**: Oliver Grau 

**Date**: 27.03.2025  
**Version**: 1.0

## Why Anomaly Detection Matters Across Enterprises

In today’s data-driven organizations, detecting anomalies is a critical part of ensuring **operational resilience, security, and quality control**. Whether the task is predictive maintenance, fraud prevention, network security, or process monitoring, enterprises rely on the ability to identify **subtle deviations from normal behavior** that may indicate underlying issues or risks.

While rule-based systems and classical thresholding often struggle with complex, high-dimensional data, machine learning methods, particularly **reconstruction-based models like Autoencoders**, provide a scalable, data-adaptive alternative.

This notebook series focuses on how Autoencoders can be used for **unsupervised anomaly detection**, making them well-suited for settings where labeled anomalies are rare, expensive, or unavailable.

---

## Target Audience

This notebook is designed for **data scientists, ML engineers, and AI practitioners** who already understand the fundamentals of neural networks and machine learning. The goal is to provide a **practical, implementation-focused guide** for applying Autoencoders to anomaly detection scenarios in real-world enterprise environments.

Rather than repeating introductory theory, we focus on design decisions, best practices, and common pitfalls, all supported by real data and clear visualizations. Some mathematical background is still helpful, so feel free to consult `09_Bonus - Concepts and Math.ipynb` for the mathematical foundations of autoencoders.

---

## Common Enterprise Use Cases for Anomaly Detection

- **Predictive Maintenance in Industrial Systems**: Detect mechanical faults before failure by analyzing multivariate sensor data.
- **Fraud & Abuse Detection**: Identify outlier patterns in financial transactions, platform usage, or authentication logs.
- **IT Infrastructure & Security**: Detect unusual system behavior, network intrusions, or performance bottlenecks.
- **Quality Assurance & Process Monitoring**: Spot visual or sensor anomalies in production pipelines, especially in manufacturing and healthcare.

---

## Why Autoencoders?

Autoencoders are a natural fit for anomaly detection because they are trained to **reconstruct normal data**. When faced with anomalous input, they typically fail to reconstruct it well, resulting in a **higher reconstruction error**, which we can use as an anomaly signal.

They offer several advantages:

- They work well in **unsupervised settings** (no need for labeled anomalies)
- They are adaptable to **tabular, image, time-series, or mixed data**
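- They are **interpretable and modular**: the reconstruction error can be inspected per feature or per pixel, and encoder and decoder are separate components, making them easy to integrate and extend in production environments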
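
To make the reconstruction-error idea concrete, here is a minimal sketch. It is not the PyTorch model built later in this series: as an assumption for illustration, a linear projection (PCA computed with NumPy) stands in for a trained autoencoder, and the data is synthetic. All names (`reconstruction_error`, `normal_data`, `anomaly`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic "normal" data: 3-D points that lie close to a 1-D line.
direction = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
t = rng.normal(size=(500, 1))
normal_data = t * direction + 0.05 * rng.normal(size=(500, 3))

# "Train" a linear autoencoder on normal data only:
# center the data and keep the first principal component as the bottleneck.
mean = normal_data.mean(axis=0)
_, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
components = vt[:1]  # shape (1, 3)


def reconstruction_error(x):
    """Mean squared error between x and its encode/decode round trip."""
    centered = x - mean
    code = centered @ components.T     # encode: 3-D -> 1-D bottleneck
    reconstructed = code @ components  # decode: 1-D -> 3-D
    return np.mean((centered - reconstructed) ** 2, axis=1)


normal_scores = reconstruction_error(normal_data)

# A point far away from the line the model has learned to reconstruct.
anomaly = np.array([[2.0, -1.0, 0.0]])
anomaly_score = reconstruction_error(anomaly)[0]

print(f"largest error on normal data: {normal_scores.max():.4f}")
print(f"error on the anomaly:         {anomaly_score:.4f}")
```

The anomalous point does not fit the structure learned from normal data, so its error ends up far above every error seen on the normal samples. A nonlinear autoencoder follows the same pattern, only with a learned encoder and decoder.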

---

## What You’ll Build in the First Part of This Notebook Series

By following along, you will:

- Implement a PyTorch-based Autoencoder tailored for anomaly detection
- Train the model on a real or representative dataset
- Evaluate anomalies using reconstruction error
- Explore thresholds, visualizations, and result interpretation

This forms the baseline for more advanced extensions, including cloud deployment, real-time inference, or integration into enterprise systems.

---

# 📚 Table of Contents

## 1. Introduction & Motivation
- Why anomaly detection matters for enterprises
- Real-world use cases (predictive maintenance, fraud, IT, QA)
- From examples to abstraction: what is an anomaly?

## 2. Understanding Autoencoders
- What is an Autoencoder? (Intuitive explanation)
- Key components: Encoder, Decoder, Bottleneck
- Why Autoencoders are useful for anomaly detection
- Visual example with toy data and code

## 3. Input Features: What Kind of Data Works?
- Tabular vs. time-series vs. image data
- How to structure and normalize input features
- Why reconstruction-based models need stable normal data
- Examples from different domains (sensor logs, metrics, images)

## 4. Dataset: AITEX Overview
- Background on the dataset (real-world relevance)
- Structure and features of the data
- Why this dataset is suitable for our first model
- Optional: link to enterprise-relevant analogies (e.g., sensor monitoring, transactions)

## 5. Data Preparation & Exploration
- Loading and inspecting the dataset
- Exploratory data analysis (EDA)
  - Missing values
  - Time-series structure
  - Visualization of normal vs. anomalous samples
- Preprocessing steps (e.g., normalization, train/test split)

## 6. Building the Autoencoder
- PyTorch model definition (step-by-step)
- Choosing architecture & activation functions
- Loss function (MSE) and optimizer setup
- Explanation of design decisions in simple language

## 7. Training the Model
- Training loop with progress visualization
- Evaluation of reconstruction loss on training data
- Optional: early stopping or validation logic

## 8. Why the VAE Struggles with AITEX Anomaly Detection
- How to interpret reconstruction error
- Threshold selection (manual vs. statistical)
- Visualizing anomalies vs. normal data
- Precision/Recall (light introduction, intuitive)

---

<div style="border-left: 4px solid #007acc; padding: 0.8em; background-color: #f0f8ff; margin-bottom: 1em;">
  <strong>💡 Reader Question:</strong> <br><br>
  We have <b>labels</b> (binary masks) in our AITEX dataset, so couldn't we use <b>supervised</b> classification + segmentation methods instead of <b>unsupervised learning</b>?
</div>

# When to Use Supervised Classification + Segmentation

This approach works *really well* when:
- You **have labeled data**: ground truth labels for each patch ("defect" or "no defect"), and ideally, segmentation masks.
- Your **defect types are known** and represented in the training set.
- You're okay with training a **specific model per dataset** or defect type.

That’s why the classification + segmentation pipeline worked so well. The AITEX dataset has clearly labeled defective and non-defective patches, and labeled segmentation masks.

But...


## ❓**What If Labels Are Scarce or Unavailable?**

That’s where unsupervised anomaly detection methods shine:

### 1. **Autoencoders (VAEs, CNN-AEs)**
- Train **only on normal images** (i.e., no defects).
- Learn to reconstruct these normal inputs.
- At test time, any **high reconstruction error** → likely anomaly.
- Good for **unexpected defects** or when **no defect masks are available**.
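
How high is "high"? One common option, sketched below under stated assumptions, is to derive the threshold from reconstruction errors on held-out normal samples, for example their 99th percentile. The error values here are made up for illustration, and `fit_threshold` and `flag_anomalies` are hypothetical helper names, not part of any library.

```python
import numpy as np


def fit_threshold(normal_errors, quantile=0.99):
    """Return the error value below which `quantile` of held-out normal samples fall."""
    return float(np.quantile(normal_errors, quantile))


def flag_anomalies(test_errors, threshold):
    """Return a boolean array that is True where the error exceeds the threshold."""
    return np.asarray(test_errors) > threshold


# Hypothetical reconstruction errors; in practice they come from the trained model.
rng = np.random.default_rng(seed=1)
val_normal_errors = rng.gamma(shape=2.0, scale=0.01, size=1000)
test_errors = np.array([0.015, 0.02, 0.35, 0.01, 0.5])

threshold = fit_threshold(val_normal_errors)
flags = flag_anomalies(test_errors, threshold)

print(f"threshold: {threshold:.4f}")
print("flagged as anomaly:", flags.tolist())
```

The quantile is a design choice: a higher value produces fewer false alarms on normal fabric but may miss subtle defects, so it is worth tuning against whatever labeled examples are available.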

### 2. **PatchCore**
- Needs no training on defects: builds a **memory bank of normal patch features** extracted with a pretrained CNN backbone.
- At test time, compares test patches to the memory bank (using k-NN in feature space).
- Doesn’t need defect annotations.
- Excels at **catching subtle deviations** from normal texture, especially in regular patterns (like fabric, PCB, etc.).
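
The memory-bank comparison can be sketched in a few lines. This is a simplified illustration, not the PatchCore implementation: random vectors stand in for CNN patch features, the memory bank is not subsampled, and the function name `knn_anomaly_scores` is hypothetical.

```python
import numpy as np


def knn_anomaly_scores(memory_bank, query_features, k=1):
    """Mean Euclidean distance from each query vector to its k nearest memory-bank vectors."""
    # Pairwise distances with shape (num_queries, num_memory_entries).
    diffs = query_features[:, None, :] - memory_bank[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


rng = np.random.default_rng(seed=2)

# Hypothetical patch features of normal images (200 patches, 8 dimensions each).
memory_bank = rng.normal(loc=0.0, scale=1.0, size=(200, 8))

normal_patch = memory_bank[:1] + 0.01  # almost identical to a stored normal patch
odd_patch = np.full((1, 8), 6.0)       # far away from everything in the bank

patch_scores = knn_anomaly_scores(memory_bank, np.vstack([normal_patch, odd_patch]))

# Simplified image-level score: the most anomalous patch decides.
image_score = patch_scores.max()
print("patch scores:", np.round(patch_scores, 3))
```

Because every patch gets its own score, arranging the scores in the patch grid yields a coarse anomaly map for free.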

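### 3. **DRÆM (Discriminatively Trained Reconstruction Anomaly Embedding Model)**
- Trains with **synthetic anomalies** (noise-generated masks blended with foreign textures) pasted onto normal images.
- A reconstructive sub-network learns to restore the original image, and a discriminative sub-network learns to segment the "tampered" regions.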
- Outputs **pixel-level anomaly maps**, again using **only normal data**.
- Very strong for **pixel-wise localization** without labeled defects.
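
The synthetic-anomaly idea can be illustrated with a small sketch. It is deliberately simplified and should not be read as DRÆM's actual augmentation: a square patch replaces the noise-shaped mask, a random array stands in for a real texture image, and `make_synthetic_anomaly` is a hypothetical helper name.

```python
import numpy as np


def make_synthetic_anomaly(image, texture, rng, blend=0.5, patch_size=8):
    """Blend a texture patch into `image`; return (tampered image, binary mask)."""
    h, w = image.shape
    top = rng.integers(0, h - patch_size + 1)
    left = rng.integers(0, w - patch_size + 1)

    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:top + patch_size, left:left + patch_size] = 1.0

    # Inside the mask, mix the original pixels with the foreign texture;
    # outside the mask, keep the original pixels unchanged.
    mixed = (1 - blend) * image + blend * texture
    tampered = (1 - mask) * image + mask * mixed
    return tampered.astype(np.float32), mask


rng = np.random.default_rng(seed=3)
normal_image = np.full((32, 32), 0.5, dtype=np.float32)  # stand-in for a defect-free patch
texture = rng.random((32, 32)).astype(np.float32)        # stand-in for an unrelated texture

tampered_image, target_mask = make_synthetic_anomaly(normal_image, texture, rng)
print("tampered pixels:", int(target_mask.sum()))
```

During training, the tampered image is the model input and the mask is the segmentation target, so the pixel-level supervision comes from the augmentation itself rather than from human labels.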

---

## 🎯 Summary Table

| Approach                         | Needs Labels? | Detect Unknown Defects? | Pixel-wise Output? | Data Efficiency | When to Use |
|----------------------------------|---------------|--------------------------|---------------------|------------------|--------------|
| **Classification + Segmentation** | ✅ Yes         | ❌ No                   | ✅ Yes              | ❌ Needs many labeled images | You have good, labeled datasets (e.g., AITEX) |
| **Autoencoder (VAE)**             | ❌ No          | ✅ Yes                  | ⚠️ Optional (error map) | ✅ Very data-efficient       | Defects are unknown, labels are scarce |
| **PatchCore**                     | ❌ No          | ✅ Yes                  | ✅ Yes              | ✅ Very data-efficient       | High regularity, unknown anomalies |
| **DRÆM**                          | ❌ No          | ✅ Yes                  | ✅ Yes              | ✅ Very data-efficient       | You want localization, but no labels |

---

## So, Why Use (V)AE / PatchCore / DRÆM?

Because in **real industrial scenarios**, the ideal labeled dataset is **rare**:
- No one has time to label every single defect pixel.
- You often care about **novel/unseen defect types**.
- You want a model that can say: _“This doesn’t look like normal fabric at all”_, even if it has never seen that type of defect.

**Unsupervised = robustness + label freedom + generalization.**

Want a metaphor?  

**Supervised classification + segmentation** is like a doctor trained to recognize known diseases from textbook images.  
**Autoencoders, PatchCore, and DRÆM** are like a doctor trained to understand "what healthy looks like" and spot **anything** that's off.

<p style="font-size: 0.8em; text-align: center;">© 2025 Oliver Grau. Educational content for personal use only. See LICENSE.txt for full terms and conditions.</p>