# **Project: Anomaly Detection for AITEX Dataset**
#### Track: PatchCore
## `Notebook 2`: Understanding PatchCore
**Author**: Oliver Grau 

**Date**: 27.03.2025  
**Version**: 1.0

## 📚 Table of Contents

- [1. Motivation: A New Perspective on Anomaly Detection](#1-motivation-a-new-perspective-on-anomaly-detection)
- [2. PatchCore vs. Autoencoder-based Detection](#2-patchcore-vs-autoencoder-based-detection)
- [3. The PatchCore Pipeline](#3-the-patchcore-pipeline)
- [4. Mathematical Formulation](#4-mathematical-formulation)
- [5. Key Concepts in Detail](#5-key-concepts-in-detail)
- [6. Strengths and Limitations](#6-strengths-and-limitations)

---

## 1. Motivation: A New Perspective on Anomaly Detection

Reconstruction-based anomaly detection models, such as autoencoders or VAEs, learn a compressed representation of the input and flag anomalies via reconstruction error. However, they often miss subtle or small anomalies, and they may generalize too well, reconstructing defective regions almost as faithfully as normal ones so that the error signal vanishes. PatchCore sidesteps this by abandoning reconstruction entirely.

Instead of reconstructing input images, PatchCore detects anomalies by comparing localized image features with known normal features stored in a memory bank. The core idea is simple: "if something looks unlike anything we've seen during training, it's likely anomalous."

---

## 2. PatchCore vs. Autoencoder-based Detection

| 🔄 Aspect                      | 🤖 Autoencoder/VAE               | 🧠 PatchCore                              |
|-------------------------------|----------------------------------|-------------------------------------------|
| Learning type                 | Unsupervised, generative         | Unsupervised, memory-based (nearest neighbor) |
| Training phase                | Required (end-to-end)            | None (feature extractor is frozen)        |
| Representation                | Latent vector (z)                | Patch embeddings from pretrained CNN     |
| Anomaly signal                | High reconstruction error        | Large distance to normal patch features   |
| Assumption                    | Anomalies can't be reconstructed | Anomalies are different in feature space  |

PatchCore tends to be more robust because it relies on pretrained CNNs (e.g., ResNet) trained on large-scale datasets such as ImageNet. These networks have learned rich, general-purpose features that capture shape, texture, and structure without being **tuned** to the target anomaly dataset.

---

## 3. The PatchCore Pipeline

1. **Feature Extraction**:
   - Use a pretrained CNN to extract intermediate feature maps from normal training images.

2. **Unfolding**:
   - Break the feature maps into patch vectors by flattening the spatial dimensions.

3. **CoreSet Sampling (optional)**:
   - Select a representative subset of patch features to build a memory bank.

4. **Inference**:
   - For each test image, extract patch features and compute their distances to the closest features in the memory bank.

5. **Scoring**:
   - The anomaly score for a patch is the distance to its nearest neighbor in the memory bank.
   - Image-level anomaly scores are computed by aggregating (e.g., max or mean) patch-level scores.

---

## 4. Mathematical Formulation

Let $ x \in \mathbb{R}^{H \times W} $ be an input image, and $ f(x) \in \mathbb{R}^{C \times H' \times W'} $ be the feature map extracted from a pretrained CNN.

### Unfold to Patches:

We reshape the feature map to obtain patch vectors:
$$
P = \text{Unfold}(f(x)) \in \mathbb{R}^{N \times C}, \quad \text{where } N = H' \cdot W'
$$
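The unfolding step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single feature map of shape `(C, H', W')`; the function name `unfold_feature_map` and the random array standing in for real CNN features are hypothetical and not part of the notebook's code.

```python
import numpy as np

def unfold_feature_map(fmap: np.ndarray) -> np.ndarray:
    """Turn a (C, H', W') feature map into an (N, C) patch matrix, N = H' * W'."""
    c, h, w = fmap.shape
    # Move channels last, then flatten the spatial grid row by row,
    # so that row i * W' + j holds the feature vector at position (i, j).
    return fmap.transpose(1, 2, 0).reshape(h * w, c)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(192, 32, 32)).astype(np.float32)  # stand-in for f(x)
patches = unfold_feature_map(fmap)
print(patches.shape)  # (1024, 192)
```

The row-major ordering matters later: it is what allows patch-level scores to be folded back into an $H' \times W'$ grid.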

### Memory Bank:

Constructed from the patch matrices $ P_1, P_2, \dots, P_M $ extracted from $M$ training images (normal samples only):
$$
\mathcal{M} = \bigcup_{i=1}^M P_i
$$
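In practice the union is usually realized as a simple concatenation of the per-image patch matrices. The sketch below assumes each $P_i$ is an `(N, C)` NumPy array; the helper name `build_memory_bank` and the random stand-in data are illustrative only.

```python
import numpy as np

def build_memory_bank(patch_sets):
    """Stack per-image patch matrices (each N x C) into one (M * N, C) array."""
    return np.concatenate(patch_sets, axis=0)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for P_1..P_M: 5 normal images, 1024 patches, 192 channels.
patch_sets = [rng.normal(size=(1024, 192)).astype(np.float32) for _ in range(5)]
memory_bank = build_memory_bank(patch_sets)
print(memory_bank.shape)  # (5120, 192)
```

Note that concatenation keeps duplicate vectors, whereas a strict set union would not; for nearest-neighbor distances this makes no difference.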

### Anomaly Score:

For each patch vector $ p \in P $, compute its nearest-neighbor distance:
$$
d(p) = \min_{m \in \mathcal{M}} \| p - m \|_2
$$

The image-level score can be:
$$
s(x) = \max_{p \in P} d(p) \quad \text{or} \quad s(x) = \frac{1}{N} \sum_{p \in P} d(p)
$$
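The two scoring formulas can be checked on a tiny example. This is a brute-force sketch for small arrays only, assuming `patches` and `memory_bank` are 2-D NumPy arrays with one feature vector per row; the function names are hypothetical.

```python
import numpy as np

def patch_scores(patches, memory_bank):
    """d(p) for every row p: Euclidean distance to its nearest memory vector."""
    # Pairwise differences via broadcasting: (N, 1, C) - (1, K, C) -> (N, K, C).
    diffs = patches[:, None, :] - memory_bank[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return dists.min(axis=1)

def image_score(d, mode="max"):
    """Aggregate patch-level distances into one image-level score."""
    return float(d.max()) if mode == "max" else float(d.mean())

memory_bank = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
patches = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
d = patch_scores(patches, memory_bank)
print(d)                       # distances 0, 1 and sqrt(18)
print(image_score(d, "max"))   # driven by the single most unusual patch
print(image_score(d, "mean"))  # averaged over all patches
```

The max variant reacts to a single outlying patch, which suits small defects; the mean variant dilutes such a patch across all $N$ positions.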

---

## 5. Key Concepts in Detail

### Memory Bank
A memory bank $ \mathcal{M} $ stores the patch-level feature vectors extracted from normal training images. It serves as the reference distribution of normality.

### Feature Unfolding
Instead of summarizing an entire image into one embedding, we preserve spatial structure by extracting patch-wise embeddings from intermediate CNN layers.

### Distance Metric
PatchCore uses the Euclidean distance $ \| p - m \|_2 $ to measure similarity. Other distance metrics are possible, and approximate nearest neighbor (ANN) search can replace exact search to scale to larger memory banks.
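Before reaching for an ANN library, exact search can already be made cheaper with the identity $\| p - m \|_2^2 = \| p \|_2^2 - 2\, p^\top m + \| m \|_2^2$, which turns the distance computation into a matrix product. The following is a minimal sketch of that idea in NumPy; the function name and the `chunk_size` parameter are assumptions for illustration, and the result is exact rather than approximate.

```python
import numpy as np

def nearest_neighbor_distances(patches, memory_bank, chunk_size=256):
    """Exact nearest-neighbor Euclidean distances, computed chunk by chunk."""
    bank_sq = (memory_bank ** 2).sum(axis=1)          # ||m||^2, shape (K,)
    out = np.empty(len(patches), dtype=np.float64)
    for start in range(0, len(patches), chunk_size):
        chunk = patches[start:start + chunk_size]
        chunk_sq = (chunk ** 2).sum(axis=1)[:, None]  # ||p||^2, shape (n, 1)
        sq = chunk_sq - 2.0 * chunk @ memory_bank.T + bank_sq[None, :]
        # Rounding can push tiny squared distances slightly below zero.
        out[start:start + chunk_size] = np.sqrt(np.maximum(sq.min(axis=1), 0.0))
    return out

rng = np.random.default_rng(0)
memory_bank = rng.normal(size=(500, 16))
patches = rng.normal(size=(300, 16))
d = nearest_neighbor_distances(patches, memory_bank, chunk_size=128)
print(d.shape)  # (300,)
```

Processing the query patches in chunks bounds the size of the intermediate distance matrix at `chunk_size × K` values.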

---

## 6. Strengths and Limitations

### ✅ Strengths:
- No training required (only feature extraction)
- Strong generalization to unseen anomalies
- Local, patch-wise scoring enables fine-grained anomaly localization
- Works well even with few training images

### ⚠️ Limitations:
- The memory bank can grow large without core-set sampling
- No end-to-end learning or fine-tuning of the feature extractor
- Patches are scored independently, which may lead to false positives in noisy textures

PatchCore trades off learnable complexity for robust generalization. For many industrial tasks, this is exactly the right balance.
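To make the memory-bank limitation concrete, here is a back-of-the-envelope size estimate. All numbers are hypothetical (1,000 normal images, a 32×32 feature map, 192 channels, float32 storage, and a core-set that keeps 10% of the vectors); they are not measurements from the AITEX experiments.

```python
def memory_bank_bytes(num_images, patches_per_image, channels, bytes_per_value=4):
    """Size in bytes of a dense memory bank (float32 by default)."""
    return num_images * patches_per_image * channels * bytes_per_value

# Hypothetical setup: 1,000 images, 32 x 32 patches each, 192 channels.
full = memory_bank_bytes(1000, 32 * 32, 192)
reduced = full // 10   # assumed core-set ratio of 10%

print(full / 1e6)      # 786.432 (MB)
print(reduced / 1e6)   # 78.6432 (MB)
```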


---

<div style="border-left: 4px solid #007acc; padding: 0.8em; background-color: #f0f8ff; margin-bottom: 1em;">
  <strong>💡 Reader Question:</strong> <br><br>If you are unsure what "patch" means in <b>PatchCore</b> and how it differs from the patches we created from the AITEX dataset, the following sections clarify both terms: the AITEX patch and the <b>PatchCore</b> patch.
</div>

---

# What Does “Patch” Mean in PatchCore?

The name **PatchCore** stems from the core concept of operating on **small spatial regions (patches)** extracted from full-sized feature maps.

## In PatchCore:

- A “patch” refers to a **small feature vector** extracted from a **feature map** produced by a pretrained backbone (e.g., ResNet34 or DenseNet).
- These feature maps come from an input image of shape `[B, C, H, W]` and typically have **downsampled spatial resolution** (e.g., 32×32).
- PatchCore extracts all **spatial positions** from this feature map:
  
  **Example:**
  ```
  Feature map shape: [B, C, H=32, W=32] → 1024 patches per image
  Each patch: vector of C dimensions (e.g., 192 or 448)
  ```

- These extracted patches become the **units of memory** in a FAISS index (explained in later notebooks). During inference, new test patches are compared against the stored normal patches, and anomalies are detected based on distance.

> So in PatchCore, a *patch* is not a cut-out piece of the image, but a **positionally extracted feature embedding** from a CNN.
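Because every patch corresponds to one spatial position, per-patch distances can be folded back into a grid and enlarged to image size for localization. The sketch below assumes row-major patch ordering, a 32×32 feature grid, and a 256×256 input whose sides are integer multiples of the grid size. It uses plain nearest-neighbor upsampling for simplicity, which is an assumption of this example; implementations often use smoother interpolation instead. The function name `scores_to_map` is hypothetical.

```python
import numpy as np

def scores_to_map(scores, grid_hw, image_hw):
    """Reshape N patch scores to (H', W') and upsample to the image size."""
    gh, gw = grid_hw
    ih, iw = image_hw
    grid = np.asarray(scores).reshape(gh, gw)
    # Nearest-neighbor upsampling: repeat each cell to cover its image region.
    return np.repeat(np.repeat(grid, ih // gh, axis=0), iw // gw, axis=1)

scores = np.zeros(32 * 32)
scores[5 * 32 + 7] = 1.0   # pretend the patch at row 5, column 7 is anomalous
amap = scores_to_map(scores, (32, 32), (256, 256))
print(amap.shape)                          # (256, 256)
print(np.argwhere(amap > 0).min(axis=0))   # [40 56]
```

With a downsampling factor of 8, the single anomalous patch lights up an 8×8 pixel block starting at pixel row 40, column 56.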

---

# How Does This Differ from Your AITEX Patches?

In your AITEX data preparation:

- You pre-split each **4096×256** fabric strip into **explicit, fixed-size image patches**, typically **256×256 pixels**, using a sliding window or tiling.
- These are true **image-level crops**, saved to disk or loaded into memory.
- They are then individually passed through a model or fed into PatchCore’s feature extractor.
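For comparison, the image-level tiling described above might look like the following sketch. It assumes each strip is loaded as a NumPy array of shape `(256, 4096)` (height × width); the function name `tile_strip` and the `stride` parameter are illustrative and not taken from the data preparation notebook.

```python
import numpy as np

def tile_strip(strip, tile=256, stride=256):
    """Cut a (H, W) strip into tile x tile crops along its width."""
    h, w = strip.shape
    return [strip[:tile, x:x + tile] for x in range(0, w - tile + 1, stride)]

strip = np.zeros((256, 4096), dtype=np.uint8)   # stand-in for one fabric image
crops = tile_strip(strip)
print(len(crops), crops[0].shape)               # 16 (256, 256)

overlapping = tile_strip(strip, stride=128)     # sliding window with 50% overlap
print(len(overlapping))                         # 31
```

Each of these crops is an actual image that is later passed through the feature extractor, where it in turn yields many PatchCore patches.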

## ⚠️ Important Distinction:

| Aspect                   | AITEX Patches                         | PatchCore Patches                     |
|--------------------------|---------------------------------------|----------------------------------------|
| Type                     | Image crops (e.g., 256×256)           | Feature vectors from CNN outputs       |
| Purpose                  | Preprocessing granularity             | Anomaly detection granularity          |
| Memory Bank Content      | N/A (raw image crops, no memory bank) | CNN patch embeddings                   |
| Unit of Similarity Check | Whole patch vs. anomaly-free sample  | Patch-level vector vs. memory vectors  |

---

# 🤔 Why Is It Called “PatchCore”?

The name comes from:
- **Patch-level representation**: operating on small parts of an image (via CNN feature maps)
- **Core-set sampling**: a selection algorithm used to reduce the size of the memory bank while retaining coverage

So PatchCore = Patch-wise Comparison + Core-set Sampling
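The core-set idea can be illustrated with a greedy k-center (farthest-point) selection: start from one vector and repeatedly add the vector that is farthest from everything selected so far. The following is a simplified sketch of that general technique, not the exact procedure of any particular PatchCore implementation; the function name, the random starting point, and the toy data are assumptions.

```python
import numpy as np

def greedy_coreset(features, k, seed=0):
    """Greedy k-center selection: repeatedly add the point farthest from the subset."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(features)))
    selected = [first]
    # Distance of every point to its nearest selected point so far.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        new_dist = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return np.array(selected)

rng = np.random.default_rng(0)
# Toy data: three tight, well-separated clusters of 50 points each.
features = np.concatenate(
    [rng.normal(loc=c, scale=0.1, size=(50, 2)) for c in (0.0, 10.0, 20.0)]
)
idx = greedy_coreset(features, k=3)
print(sorted(int(i) // 50 for i in idx))  # [0, 1, 2]
```

With only three selected vectors, the subset already contains one representative per cluster, which is the "retaining coverage" property that makes the reduced memory bank usable.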

<p style="font-size: 0.8em; text-align: center;">© 2025 Oliver Grau. Educational content for personal use only. See LICENSE.txt for full terms and conditions.</p>