<center><img src="https://raw.githubusercontent.com/openvinotoolkit/anomalib/main/docs/source/images/logos/anomalib-wide-blue.png" alt="Anomalib logo" class="center"></center>

<center>💙 A library for benchmarking, developing and deploying deep learning anomaly detection algorithms</center>

______________________________________________________________________

> NOTE:
> This notebook was originally created by @innat on [Kaggle](https://www.kaggle.com/code/ipythonx/mvtec-ad-anomaly-detection-with-anomalib-library/notebook).

[Anomalib](https://github.com/openvinotoolkit/anomalib) is a deep learning library that aims to collect state-of-the-art anomaly detection algorithms for benchmarking on both public and private datasets. It provides several ready-to-use implementations of anomaly detection algorithms described in the recent literature, as well as a set of tools that facilitate the development and implementation of custom models. The library has a strong focus on image-based anomaly detection, where the goal of the algorithm is to identify anomalous images, or anomalous pixel regions within images, in a dataset.

The library supports [`MVTec AD`](https://www.mvtec.com/company/research/datasets/mvtec-ad) (CC BY-NC-SA 4.0) and [`BeanTech`](https://paperswithcode.com/dataset/btad) (CC-BY-SA) for **benchmarking** and `folder` for custom dataset **training/inference**. In this notebook, we use `anomalib` to train a model (PatchCore, as set in the Configuration section) on a custom `folder` dataset downloaded from a Picsellia experiment, and then evaluate the model's performance. The sections in this notebook explore the steps in `tools/train.py` in more detail. A comparable training run can be launched from the CLI with, for example, `python tools/train.py --model padim`.

## Installing Anomalib

Anomalib can be installed in two ways: (i) via PyPI, or (ii) from source. The PyPI route is shown below:

### I. Install via PyPI

In [None]:
!pip install --upgrade pip

In [None]:
# Option - I: Install anomalib from PyPI.
%pip install anomalib

### Install Picsellia package

In [45]:
!pip install -r requirements/base.txt
!pip install -r requirements/openvino.txt
!pip install picsellia



Now let's verify the working directory, so that the datasets and configs can be accessed when the notebook is run on different platforms, such as a local machine or Google Colab.

In [40]:
from __future__ import annotations

import os
from pathlib import Path
from typing import Any

# from git.repo import Repo

current_directory = Path.cwd()
if current_directory.name == "000_getting_started":
    # On the assumption that, the notebook is located in
    #   ~/anomalib/notebooks/000_getting_started/
    root_directory = current_directory.parent.parent
elif current_directory.name == "anomalib":
    # This means that the notebook is run from the main anomalib directory.
    root_directory = current_directory
else:
    # Otherwise, assume the notebook is already run from the project root.
    # (Alternatively, clone the anomalib repo into `current_directory` with GitPython.)
    root_directory = current_directory

os.chdir(root_directory)
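
The check above depends on the name of the current folder. As a rough alternative, the sketch below walks up from a starting directory until it finds a marker file or folder. The helper name `find_project_root` and the default marker names are assumptions made for illustration; they are not part of anomalib.

```python
from pathlib import Path
from typing import Sequence


def find_project_root(start: Path, markers: Sequence[str] = ("requirements", "pyproject.toml")) -> Path:
    """Return the closest ancestor of `start` (or `start` itself) that contains a marker.

    `markers` lists file or directory names assumed to live at the project root.
    Falls back to `start` when no ancestor contains any of them.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    return start
```

With this helper, `os.chdir(find_project_root(Path.cwd()))` would behave the same whether the notebook is started from the repository root or from a nested notebook folder, provided one of the markers exists at the root.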

## Imports

In [None]:
import yaml
import numpy as np
from PIL import Image
from pytorch_lightning import Trainer
from torchvision.transforms import ToPILImage

from anomalib.config import get_configurable_parameters
from anomalib.data import get_datamodule
from anomalib.models import get_model
from anomalib.pre_processing.transforms import Denormalize
from anomalib.utils.callbacks import LoadModelCallback, get_callbacks
from picsellia import Client
from picsellia.types.enums import LogType
from picsellia.exceptions import ResourceNotFoundError

### Connect to Picsellia and get experiment


In [None]:
api_token = ""
client = Client(api_token=api_token, organization_name="")
experiment_id = ""
experiment = client.get_experiment_by_id(experiment_id)
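
Pasting an API token directly into a notebook makes it easy to leak when the notebook is shared. One possible approach, sketched below, is to read it from an environment variable. The helper `read_secret` and the variable name `PICSELLIA_TOKEN` are assumptions of this sketch; the Picsellia client is not known to read that variable on its own.

```python
import os


def read_secret(variable_name: str) -> str:
    """Return the value of an environment variable, failing loudly if it is unset or empty."""
    value = os.environ.get(variable_name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {variable_name} environment variable before running this cell.")
    return value


# Hypothetical usage; the variable name is an assumption of this sketch:
# api_token = read_secret("PICSELLIA_TOKEN")
```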

## Configuration


In [4]:
MODEL = "patchcore"

### Get the dataset from Picsellia

In [None]:
for dataset_type in ['good', 'abnormal', 'mask']:
    assets = experiment.get_dataset(dataset_type)
    assets.download(os.path.join(root_directory, experiment.png_dir, dataset_type))

### Write to Config File

In [6]:
try:
    config = experiment.get_artifact('config')
    config.download(experiment.config_dir)
    config_fullpath = os.path.join(experiment.config_dir, config.filename)
except ResourceNotFoundError:
    config_fullpath = os.path.join(root_directory, "src", "anomalib", "models", "patchcore", "config.yaml")
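# Use a context manager so the file handle is closed after reading.
with open(config_fullpath, 'r') as config_file:
    config_data = yaml.safe_load(config_file)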

In [7]:
# get experiment's parameters
try:
    parameters = experiment.get_log(name='parameters').data
except ResourceNotFoundError:
    parameters = {}
batch_size = parameters.get("batch_size", 4)
max_epochs = parameters.get("max_epochs", 5)
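
If the logged parameters were entered by hand, their values may arrive as strings rather than integers. The sketch below shows one way to read a parameter with a default and validate it. The helper name `get_int_parameter` is hypothetical, and whether Picsellia returns strings here is an assumption.

```python
from typing import Any, Mapping


def get_int_parameter(parameters: Mapping[str, Any], name: str, default: int) -> int:
    """Return `parameters[name]` as a positive int, or `default` when the key is missing."""
    raw_value = parameters.get(name, default)
    value = int(raw_value)  # Accepts ints and integer strings such as "8".
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value
```

For example, `get_int_parameter(parameters, "batch_size", 4)` could replace the plain `parameters.get("batch_size", 4)` call.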

In [8]:
dataset_name = experiment.get_dataset(name='good').name
config_data['dataset']['name'] = dataset_name
config_data['dataset']['format'] = 'folder'
config_data['dataset']['root'] = os.path.join(root_directory, experiment.png_dir)
config_data['dataset']['path'] = os.path.join(root_directory, experiment.png_dir)
config_data['dataset']['normal_dir'] = 'good'
config_data['dataset']['normal_test_dir'] = None

config_data['dataset']['abnormal_dir'] = 'abnormal'
config_data['dataset']['mask_dir'] = 'mask'
config_data['dataset']['task'] = 'segmentation'
config_data['dataset']['train_batch_size'] = batch_size
config_data['dataset']['extensions'] = None
config_data['trainer']['max_epochs'] = max_epochs

config_data['project']['path'] = os.path.join(root_directory, experiment.results_dir)

config_data['model']['name'] = MODEL
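
The assignments above overwrite individual keys of a nested dictionary one at a time. The same idea can be sketched as a single recursive merge, which keeps all overrides in one place. The helper `apply_overrides` and the example values are illustrative assumptions, not an anomalib API.

```python
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `overrides` into `config` in place and return `config`."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            # Both sides are dictionaries: merge them instead of replacing.
            apply_overrides(config[key], value)
        else:
            config[key] = value
    return config


# Illustrative values only.
example_config = {"dataset": {"format": "mvtec", "task": "classification"}, "trainer": {"max_epochs": 1}}
apply_overrides(
    example_config,
    {"dataset": {"format": "folder", "normal_dir": "good"}, "trainer": {"max_epochs": 5}},
)
```

Keys that are not mentioned in the overrides, such as `task` in the example, are left untouched.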


In [9]:
with open(config_fullpath, 'w') as cfg:
    yaml.dump(config_data, cfg, default_flow_style=False)

In [None]:
# pass the config file to model, callbacks and datamodule
config = get_configurable_parameters(config_path=config_fullpath)

In [36]:
datamodule = get_datamodule(config)
datamodule.prepare_data()  # Prepares the data if needed (a `folder` dataset is expected to be on disk already)
datamodule.setup()  # Create train/val/test/prediction sets.

# Fetch the first batch from the validation dataloader.
data = next(iter(datamodule.val_dataloader()))

Let's check the shapes of the input images and masks.

In [None]:
print(data["image"].shape, data["mask"].shape)

We can now visualize a sample from the validation set next to its mask.

In [None]:
def show_image_and_mask(sample: dict[str, Any], index: int) -> Image.Image:
    """Return the denormalized image and its mask side by side as one PIL image."""
    img = ToPILImage()(Denormalize()(sample["image"][index].clone()))
    msk = ToPILImage()(sample["mask"][index]).convert("RGB")

    return Image.fromarray(np.hstack((np.array(img), np.array(msk))))


# Visualize an image with a mask
show_image_and_mask(data, index=0)

## Prepare Model and Callbacks

The config file is now updated as needed, so we can start training. We use `datamodule`, `model` and `callbacks` to train the model. Callbacks are self-contained objects that hold non-essential logic, so we can inject as many of them as we need, such as ModelLoading, Timer, Metrics, Normalization and Visualization.
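
To make the callback idea concrete, here is a toy sketch in plain Python. It is not PyTorch Lightning code: `MiniCallback`, `HistoryCallback` and `MiniTrainer` are made-up names that only mimic how a trainer calls hooks on whatever callbacks it was given.

```python
from typing import List


class MiniCallback:
    """Base class with a no-op hook, mirroring the idea of a training callback."""

    def on_epoch_end(self, epoch: int, metrics: dict) -> None:
        pass


class HistoryCallback(MiniCallback):
    """Record the metrics seen at the end of every epoch."""

    def __init__(self) -> None:
        self.history: List[dict] = []

    def on_epoch_end(self, epoch: int, metrics: dict) -> None:
        self.history.append({"epoch": epoch, **metrics})


class MiniTrainer:
    """Toy training loop that only knows how to call its callbacks."""

    def __init__(self, callbacks: List[MiniCallback]) -> None:
        self.callbacks = callbacks

    def fit(self, max_epochs: int) -> None:
        for epoch in range(max_epochs):
            metrics = {"loss": 1.0 / (epoch + 1)}  # Placeholder for a real training step.
            for callback in self.callbacks:
                callback.on_epoch_end(epoch, metrics)


recorder = HistoryCallback()
MiniTrainer(callbacks=[recorder]).fit(max_epochs=3)
```

The training loop never needs to know what a callback does, which is why logging, timing or visualization can be added or removed without touching the model code.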


In [None]:
# Set the export-mode to OpenVINO to create the OpenVINO IR model.
config.optimization.export_mode = "openvino"
# config.optimization.export_mode = "onnx"

try:
    checkpoint_file = experiment.get_artifact('checkpoints')
    loaded_checkpoint_path = os.path.join(root_directory, experiment.checkpoint_dir)
    checkpoint_file.download(loaded_checkpoint_path)
except ResourceNotFoundError:
    # No checkpoint is stored for this experiment, so no previous weights are loaded.
    loaded_checkpoint_path = None

model = get_model(config)
callbacks = get_callbacks(config)

In [47]:
from pytorch_lightning import Callback


class SaveTrainingMetrics(Callback):
    """Log the image- and pixel-level metrics to Picsellia after every training epoch."""

    METRIC_NAMES = ('pixel_F1Score', 'pixel_AUROC', 'image_F1Score', 'image_AUROC')

    def on_train_epoch_end(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> None:
        for metric_name in self.METRIC_NAMES:
            # Skip metrics that have not been computed yet to avoid a KeyError.
            if metric_name in trainer.callback_metrics:
                value = float(trainer.callback_metrics[metric_name])
                experiment.log(name=f'training_{metric_name}', type=LogType.LINE, data=value)

In [48]:
callbacks.append(SaveTrainingMetrics())

## Initialize Training

In [49]:
trainer = Trainer(**config.trainer, callbacks=callbacks)

In [50]:
if loaded_checkpoint_path:
    load_model_callback = LoadModelCallback(weights_path=os.path.join(loaded_checkpoint_path, checkpoint_file.filename))
    trainer.callbacks.insert(0, load_model_callback)

In [None]:
trainer.fit(model=model, datamodule=datamodule)

## Validation

In [None]:
# load best model from checkpoint before evaluating
load_model_callback = LoadModelCallback(weights_path=trainer.checkpoint_callback.best_model_path)
# load_model_callback = LoadModelCallback(weights_path=os.path.join(loaded_checkpoint_path, checkpoint_file.filename))
trainer.callbacks.insert(0, load_model_callback)
test_results = trainer.test(model=model, datamodule=datamodule)

### Log test results to Picsellia

In [None]:
experiment.log(name="test_results", type=LogType.TABLE, data=test_results[0])

In [None]:
# log example predictions
test_image_path = os.path.join(root_directory,experiment.results_dir,MODEL,dataset_name,'run/images/abnormal')
for ele in os.listdir(test_image_path)[:3]:
    experiment.log(name='abnormal prediction_'+ele, type=LogType.IMAGE, data=os.path.join(test_image_path,ele))
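
`os.listdir` returns entries in arbitrary order and may include non-image files, so the three logged examples can differ between runs. A small sketch of a more predictable selection follows. The helper `first_images` and the list of suffixes are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Suffixes treated as images in this sketch (an assumption, adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def first_images(directory: Path, limit: int = 3) -> List[Path]:
    """Return up to `limit` image files from `directory`, sorted by name."""
    images = sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    return images[:limit]
```

The loop above could then iterate over `first_images(Path(test_image_path))` and pass `str(path)` to `experiment.log`.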

### Log best model's checkpoints to Picsellia

In [None]:
experiment.store('checkpoints', trainer.checkpoint_callback.best_model_path)

### Log OpenVINO and ONNX models

In [None]:
model_root = os.path.join(root_directory, experiment.results_dir, MODEL, dataset_name,'run/weights/openvino/')
experiment.store('openvino_metadata', os.path.join(model_root,'metadata.json'))
experiment.store('openvino_bin', os.path.join(model_root,'model.bin'))
experiment.store('model_onnx', os.path.join(model_root,'model.onnx'))
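
Before uploading, it can help to confirm that the export step really produced every expected file, so that a missing file is reported up front instead of failing halfway through the uploads. The sketch below is one way to do that. The helper `missing_files` is hypothetical, and the file names in the usage comment are assumptions about the export layout.

```python
from pathlib import Path
from typing import Dict, List


def missing_files(directory: Path, expected: Dict[str, str]) -> List[str]:
    """Return the artifact names whose file is absent from `directory`."""
    return [name for name, filename in expected.items() if not (directory / filename).is_file()]


# Hypothetical usage; the file names are assumptions about the export output:
# expected = {"openvino_metadata": "metadata.json", "openvino_bin": "model.bin", "model_onnx": "model.onnx"}
# absent = missing_files(Path(model_root), expected)
# if absent:
#     raise FileNotFoundError(f"Export is incomplete, missing: {absent}")
```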