# **Project: Anomaly Detection for AITEX Dataset**
#### Track: Introduction
## `Notebook 1`: Starting the Journey: Image-Based Anomaly Detection
**Author**: Oliver Grau 

**Date**: 27.03.2025  
**Version**: 1.0

# 📚 Table of Contents

- [Starting the Journey: Image-Based Anomaly Detection](#starting-the-journey-image-based-anomaly-detection)
  - [Introduction to the Project](#introduction-to-the-project)
    - [Why AITEX?](#why-aitex)
    - [📂 Dataset: AITEX Fabric Dataset (AFID)](#-dataset-aitex-fabric-dataset-afid)
      - [Download the Dataset](#-download-the-dataset)
      - [Dataset Preparation](#-dataset-preparation)
  - [📖 Structure of the Project](#-structure-of-the-project)
- [🎓 Prerequisites](#-prerequisites)

- [🚀 Getting Started: Setting up Your Development Environment](#-getting-started-setting-up-your-development-environment)
  - [Step 1: Extract the Project Files](#-step-1-extract-the-project-files)
  - [Step 2: Recommended Extensions for Visual Studio Code](#-step-2-recommended-extensions-for-visual-studio-code)
  - [Step 3: Python Environment Setup (with GPU Support)](#-step-3-python-environment-setup-with-gpu-support)
  - [Project Directory Structure](#-project-directory-structure)

- [How to Work with These Notebooks](#-how-to-work-with-these-notebooks)
  - [📓 Opening the First Notebook](#-opening-the-first-notebook)
  - [✅ Final Checks](#-final-checks)

Welcome to **"Image-Based Anomaly Detection – A Notebook Premium Series"**, an extensive and interactive learning journey designed to equip you with advanced skills and practical insights into cutting-edge anomaly detection methods.

## Introduction to the Project

This notebook series focuses on tackling the challenging yet fascinating task of **anomaly detection** using the real-world **AITEX Fabric Defect Detection Dataset**. Anomaly detection is crucial across industries, from quality assurance in manufacturing to safety monitoring and medical imaging.

### Why AITEX?

Instead of using simplified benchmark datasets, we chose **AITEX** for several important reasons:

- It contains **tiny, localized anomalies**, such as faint line disruptions or small weaving inconsistencies, which pose a real challenge for modern models.
- The dataset is made up of **highly regular, repeating textures** typical of fabric surfaces, which makes false positives more likely if the model doesn't truly understand the texture.
- AITEX reflects a **real-world industrial use case**, demanding both precision and reliability.

By working with AITEX, you're not only learning techniques but also preparing for scenarios that mirror actual applications in manufacturing and quality control.

### 📂 Dataset: AITEX Fabric Dataset (AFID)

This notebook series is designed to work with the **AITEX Fabric Dataset (AFID)**, which contains high-resolution fabric images with and without anomalies. These images are ideal for training and evaluating anomaly detection models.

#### 🔗 Download the Dataset

The AITEX dataset is **not included** in this package due to potential licensing restrictions.  
To use it, please download the dataset manually from the official AITEX project page:

👉 [https://www.aitex.es/afid/](https://www.aitex.es/afid/)

> ⚠️ **Important Licensing Note:**  
> According to the information available on [Kaggle](https://www.kaggle.com/datasets/veeranjaniraju/fabric-anomaly-detection), the AITEX dataset may be subject to a **non-commercial use license**. The official AITEX website does **not explicitly specify** the terms of use.  
>  
> Therefore, this notebook series uses the dataset **strictly for educational and research purposes**.  
> **Please consult the AITEX source directly** to confirm that your intended use complies with any licensing restrictions before applying this material in commercial settings.

---

#### 🧰 Dataset Preparation

Once you have downloaded the dataset, follow the instructions in the `notebooks/00_preparation/Dataset Setup.ipynb` notebook to:

- Extract the files to the `data/` folder
- Organize the subfolders as expected (e.g., `Defect`, `NoDefect`, `Mask`)
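
Once the files are in place, a quick check can confirm that the layout looks right. The following is a minimal sketch, not part of the setup notebook: it assumes the dataset sits directly under `data/` and uses the example folder names from the list above. The function name `summarize_dataset` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the example above; adjust them if the
# Dataset Setup notebook prescribes a different layout.
EXPECTED_SUBFOLDERS = ("Defect", "NoDefect", "Mask")


def summarize_dataset(data_root, subfolders=EXPECTED_SUBFOLDERS):
    """Return {subfolder: file_count}, with None for a missing subfolder."""
    root = Path(data_root)
    summary = {}
    for name in subfolders:
        folder = root / name
        if folder.is_dir():
            # Count files at any depth below the subfolder.
            summary[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    for name, count in summarize_dataset("data").items():
        status = "missing" if count is None else f"{count} files"
        print(f"{name:10s} {status}")
```

A missing subfolder or a count of zero usually means the archive was extracted to a different location than expected.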

## 📖 Structure of the Project

Here's a quick overview of what to expect from each branch:

- **Branch 01: VAE**  
  You'll explore how Variational Autoencoders learn representations of normal images, enabling them to highlight anomalies through reconstruction errors.

- **Branch 02: PatchCore**  
  Discover an efficient, memory-based anomaly detection method leveraging pretrained CNN feature embeddings and fast nearest-neighbor search.

- **Branch 03: DRÆM**  
  Deep-dive into a modern method built specifically for visual anomaly detection: it trains a reconstruction network together with a discriminative segmentation network on synthetically generated anomalies.

- **Branch 04: Operationalization**  
  Learn how to deploy and integrate your trained anomaly detection model into a practical, real-world scenario.

Each branch is structured to help you understand both the strengths and limitations of these methods, providing clarity about the optimal scenarios for their application.
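
To make the two scoring ideas behind Branch 01 and Branch 02 more concrete, here is a toy sketch. It is not the code from `codebase/`: it uses plain Python lists in place of image tensors and CNN embeddings, and the function names are hypothetical. It only illustrates the principle that anomalies show up as large reconstruction errors or as large distances to known-normal features.

```python
import math


def reconstruction_error_map(image, reconstruction):
    """Per-pixel squared error between an image and its reconstruction.

    Both arguments are 2D lists of floats with the same shape. High values
    mark pixels the model failed to reproduce, i.e. candidate anomalies.
    """
    return [
        [(x - y) ** 2 for x, y in zip(img_row, rec_row)]
        for img_row, rec_row in zip(image, reconstruction)
    ]


def nearest_neighbor_score(feature, memory_bank):
    """Distance from one feature vector to its closest stored neighbor.

    memory_bank is a list of feature vectors collected from defect-free
    samples; a large distance suggests the feature is unusual.
    """
    return min(math.dist(feature, stored) for stored in memory_bank)


if __name__ == "__main__":
    image = [[0.2, 0.2], [0.2, 0.9]]           # bottom-right pixel deviates
    reconstruction = [[0.2, 0.2], [0.2, 0.2]]  # only the normal texture is reproduced
    error_map = reconstruction_error_map(image, reconstruction)
    print("largest pixel error:", max(max(row) for row in error_map))

    memory_bank = [[0.0, 0.0], [1.0, 0.0]]     # toy 'normal' features
    print("distance to normal:", nearest_neighbor_score([0.0, 3.0], memory_bank))
```

In both cases the final decision comes from comparing the score against a threshold, and choosing that threshold is part of what the individual branches cover.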

--- 

# 🎓 Prerequisites

Before diving into this series, ensure you are comfortable with:

- Basic Python programming
- Understanding of neural network fundamentals
- Familiarity with PyTorch
- CNN architectures and deep learning concepts

Having these prerequisites will significantly enhance your learning experience.

---

# 🚀 Getting Started: Setting up Your Development Environment

Before diving into the content, let’s set up your local development environment so you can use every part of the series, including model training, evaluation, and deployment.

This project is designed to work on **Linux-based systems**, including **WSL2 (Windows Subsystem for Linux)** with **Ubuntu**. While any Python-capable IDE can be used, I recommend **Visual Studio Code (VS Code)** for a good developer experience.

---

## 📦 Step 1: Extract the Project Files

After purchasing this series, you received a `.zip` file containing the complete project. To get started:

```bash
# Example: extract to your WSL home directory or any other Linux-compatible environment
cd ~
unzip /mnt/c/Users/<YourUsername>/Downloads/AnomalyDetectionSeries.zip -d anomaly-detection
cd anomaly-detection
```

Replace `/mnt/c/...` with the correct path to where your ZIP file is stored on your system.

---

## 🧠 Step 2: Recommended Extensions for Visual Studio Code

Make sure the following **VS Code extensions** are installed:

| Extension Name                 | Description                         |
|--------------------------------|-------------------------------------|
| **Python**                     | Core Python language support        |
| **Pylance**                    | Fast IntelliSense and type checking |
| **Jupyter**                    | Run and debug Jupyter Notebooks     |
| **Jupyter Keymap**             | Keyboard shortcuts for notebook use |
| **Jupyter Notebook Renderers** | Improved notebook visuals           |
| **Python Debugger**            | Step-through debugging for scripts  |

To install, open the Extensions panel in VS Code (`Ctrl+Shift+X`), search for the names above, and install them.

---

## 🧪 Step 3: Python Environment Setup (with GPU Support)

For model training and experimentation, a Python environment with **PyTorch + CUDA** support is required.

You can use **Conda** or your preferred environment manager. Here's an example using `conda`:

```bash
conda create -n anomaly-detection python=3.10
conda activate anomaly-detection

# Install PyTorch with CUDA (replace 'cu118' with your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install additional dependencies
pip install -r requirements.txt
```

💡 **Tip:** A `requirements.txt` is included with all dependencies used across the notebook series.
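
After installation, it is worth confirming that PyTorch can actually see your GPU. The snippet below is a minimal sketch using standard PyTorch calls (`torch.cuda.is_available()`, `torch.version.cuda`, `torch.cuda.get_device_name()`). The helper name `describe_torch_environment` is hypothetical and not part of the project code.

```python
import importlib.util


def describe_torch_environment():
    """Collect basic facts about the local PyTorch installation."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch  # imported lazily so the check also runs without PyTorch

    info["torch_version"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` although a GPU is present, the installed wheel most likely does not match your CUDA driver. In that case, reinstall with the matching index URL.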

---

## 📁 Project Directory Structure

Here's an overview of the top-level project folders:

| Folder         | Purpose |
|----------------|---------|
| `.github/`     | GitHub Actions for CI/CD, model testing and linting |
| `codebase/`    | Scripts for model research and development |
| `artifacts/`   | Serialized intermediate results (e.g., prepared image data, logs) |
| `data/`        | Datasets (raw, processed, or cached versions) |
| `bonus/`       | Bonus notebooks with more in-depth knowledge |
| `inference/`   | Scripts for loading models and running inference (the inference pipeline) |
| `notebooks/`   | All Jupyter notebooks, including this series |
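
To verify that extraction preserved this layout, a small check like the following can help. It is a sketch under the assumption that it is run from the project root; the helper name `missing_folders` is hypothetical, and the folder list simply mirrors the table above.

```python
from pathlib import Path

# Top-level folders listed in the table above.
EXPECTED_FOLDERS = (
    ".github", "codebase", "artifacts", "data",
    "bonus", "inference", "notebooks",
)


def missing_folders(project_root, expected=EXPECTED_FOLDERS):
    """Return the expected folder names that are absent under project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_folders(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected top-level folders are present.")
```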


---


# 🧭 How to Work with These Notebooks

This notebook series is designed as a **learning journey**, not a quick tutorial or shortcut. You're not expected to grasp everything on the first read, and you shouldn't try to. **Learning machine learning is an iterative process**, and building real understanding takes time, experimentation, and curiosity.

Here are a few thoughts to guide you:

- **Take your time**: This is not meant to be completed in a few days. It's completely fine if it takes weeks. In fact, that's expected.
- **Read actively**: Think critically about the content and the code. Don’t just run the cells. Try to understand what’s happening.
- **Modify and experiment**: Play with parameters, test your assumptions, and see what breaks. Learning happens in the doing.
- **Pause and clarify**: If something isn’t clear, look it up before moving on. You're not supposed to know everything yet.
- **Revisit things**: It’s normal for concepts to click only after a second or third encounter.

> ⚠️ One important note: the notebooks focus on using and analyzing machine learning workflows, **not on documenting the development of every line of code.**

To keep the notebooks focused and readable:
- The code used for **model development** can be found in the `codebase/` folder  
- The code for the **inference pipeline** is located in `inference/codebase/`  
- The notebooks do **not state in every cell where the code comes from**; please use your IDE’s tools (e.g., symbol navigation, file search) to explore and understand the underlying implementation.

**Understanding the code is critical.** It’s not enough to run it. Your goal is to learn how and why it works. This notebook series is here to support that process, but ultimately, the insights come from your own curiosity and persistence.

For me as the author, creating this series was a **major learning journey**. I wrote, tested, failed, rewrote, and reflected, and I hope the results help you build something solid for yourself, too.

<div style="border-left: 4px solid #28a745; padding: 0.8em; background-color: #eafaf1; margin-bottom: 1em;">
  <strong>💡 Hint:</strong> <br><br>
  More than once I reached a point where I was ready to give up, convinced I was doing something fundamentally wrong. But patience and perseverance led to success in the end. Don't give up!
</div>

---

## 📓 Opening the First Notebook

Once everything is set up, open the following notebook in VS Code:

`notebooks/01_VAE/01 Introduction and Motivation.ipynb`

This is your entry point into the anomaly detection series. It walks you through the key concepts, data, and modeling pipeline step by step.

---

## ✅ Final Checks

Before continuing:

- [ ] VS Code configured with the listed extensions
- [ ] Conda (or another) environment with PyTorch and CUDA set up
- [ ] Folder structure is intact after extraction
- [ ] Able to open and run the `notebooks/01_VAE/01 Introduction and Motivation.ipynb` notebook

You're ready to go! 🚀

<p style="font-size: 0.8em; text-align: center;">© 2025 Oliver Grau. Educational content for personal use only. See LICENSE.txt for full terms and conditions.</p>