# Short walkthrough on Benchmarking in Anomalib

In [None]:
import pandas as pd

## Install Anomalib

In [None]:
!git clone https://github.com/openvinotoolkit/anomalib.git

In [None]:
%cd anomalib

In [None]:
%pip install .

In [None]:
! pip install -r requirements/openvino.txt

> Note: Restart the runtime if prompted, by clicking the button at the end of the install logs.

## Download and setup dataset

In [None]:
!wget https://openvinotoolkit.github.io/anomalib/_downloads/3f2af1d7748194b18c2177a34c03a2c4/hazelnut_toy.zip

In [None]:
%cd /content/anomalib/

In [None]:
!mkdir datasets && unzip hazelnut_toy.zip -d datasets/ > /dev/null

## Create configuration file for training using Folder Dataset

The following configuration file is based on the one at `anomalib/models/padim/config.yaml`, which uses the MVTec dataset for training. Since we are working with a custom dataset, we will use the `Folder` dataset format. In this format, the images are divided among folders such as _good_ and _colour_. Optionally, the dataset can also contain a _mask_ folder, as shown below.

```bash
hazelnut_toy
├── colour
│  ├── 00.jpg
│  ├── 01.jpg
│  ...
├── good
│  ├── 00.jpg
│  ├── 01.jpg
└── mask
   ├── 00.jpg
   ├── 01.jpg
   ...
```

Each of these folders contains images belonging to its respective category. Since we are using the `hazelnut_toy` dataset, we need to change a few lines in the PaDiM configuration as shown below.

```yaml
dataset:
  name: <name-of-the-dataset>
  format: folder
  path: <path/to/folder/dataset>
  normal_dir: normal # name of the folder containing normal images.
  abnormal_dir: abnormal # name of the folder containing abnormal images.
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation # classification or segmentation
  mask: <path/to/mask/annotations> # optional
  extensions: null
  split_ratio: 0.2  # ratio of the normal images that will be used to create a test split
```
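
Before filling in these fields, it can help to confirm that the folders they point to exist and actually contain images. The following is a minimal sketch using only the standard library; the helper names `count_images` and `summarize_folder_dataset` and the list of image extensions are assumptions made for illustration and are not part of Anomalib.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(folder):
    """Return the number of image files directly inside `folder` (0 if it is missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS)


def summarize_folder_dataset(root, normal_dir, abnormal_dir, mask_dir=None):
    """Count images per folder so that empty or misspelled directories stand out."""
    root = Path(root)
    summary = {
        normal_dir: count_images(root / normal_dir),
        abnormal_dir: count_images(root / abnormal_dir),
    }
    if mask_dir is not None:
        summary["mask"] = count_images(mask_dir)
    return summary


# Paths taken from this notebook's setup.
root = "/content/anomalib/datasets/hazelnut_toy"
print(summarize_folder_dataset(root, "good", "colour", mask_dir=f"{root}/mask/colour"))
```

A count of zero for any entry usually means the path or folder name in the configuration does not match what was extracted.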

The complete configuration is in the code block below.

In [1]:
folder_padim = """
dataset:
  name: hazelnut
  format: folder
  path: /content/anomalib/datasets/hazelnut_toy
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: colour # name of the folder containing abnormal images.
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation # classification or segmentation
  mask: /content/anomalib/datasets/hazelnut_toy/mask/colour # optional
  extensions: null
  split_ratio: 0.2  # ratio of the normal images that will be used to create a test split
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 3
    pixel_default: 3
    adaptive: true

project:
  seed: 42
  path: ./results

logging:
  log_images_to: ["local"] # options: [wandb, tensorboard, local].
  logger: [] # options: [tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  openvino:
    apply: false

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.
"""
with open("config.yaml", "w", encoding="utf8") as f:
    f.write(folder_padim)

## Train the model to see if it is working

In [None]:
! python ./tools/train.py --config config.yaml

## Create Benchmarking config

Benchmarking runs are configured using a YAML file that contains five sections. The first, `seed`, is used for reproducibility across benchmarking runs. One distinguishing feature of Anomalib is that it supports deployment to edge devices using [OpenVINO](https://docs.openvino.ai/latest/index.html). This enables optimized performance and faster inference on the majority of Intel devices. The benchmarking script can also compute OpenVINO inference throughput; to enable this, set `compute_openvino` to `true`.

> Note: Not all models in Anomalib support OpenVINO export.

The `hardware` section of the config file lists the hardware devices on which to compute the benchmarking results. If the host system has multiple GPUs, the benchmarking computation is distributed across them to speed up the collection of results. By default, the results are gathered in a `csv` file, but with the `writer` flag you can also save them to the `tensorboard` and `wandb` loggers. The final section is `grid_search`. It has two parameters, _dataset_ and *model_name*. The _dataset_ field sets the values over which the grid search is performed, while *model_name* lists the models for which the benchmark is computed.

In this notebook we are working with a toy dataset, so we also need to tell the benchmarking script to use that dataset instead of the default `MVTec` defined in each model's config file. We can either update each config file or pass a single-value list for fields such as _format_ and _path_, as shown below.

For more information about benchmarking, you can look at the [Anomalib Documentation](https://openvinotoolkit.github.io/anomalib/guides/benchmarking.html).

In [None]:
# While every attribute in dataset and model can be used to perform grid search,
# in this example the lists with only single values are used for patching the
# original model config
benchmarking_params = """seed: 42
compute_openvino: true
hardware:
  - gpu
writer: []
grid_search:
  dataset:
    name: [hazelnut]
    format: [folder]
    path: [/content/anomalib/datasets/hazelnut_toy]
    normal_dir: [good]
    abnormal_dir: [colour]
    normal_test_dir: [null]
    task: [segmentation]
    mask: [/content/anomalib/datasets/hazelnut_toy/mask/colour]
    extensions: [null]
    split_ratio: [0.2]
    image_size: [256, 128]
    num_workers: [4]
  model_name:
    - padim
    - patchcore
"""
with open("benchmark_config.yaml", "w", encoding="utf8") as f:
    f.write(benchmarking_params)
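
To get a feel for how many runs the grid above produces, the sketch below expands a trimmed copy of it with `itertools.product`. It assumes that the benchmarking script combines the listed values as a Cartesian product; the `expand_grid` helper and the dotted key names are illustrative and not taken from Anomalib.

```python
from itertools import product

# Same values as in the `grid_search` section above, trimmed to a few keys.
grid = {
    "dataset.image_size": [256, 128],
    "dataset.num_workers": [4],
    "model_name": ["padim", "patchcore"],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values (Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


runs = expand_grid(grid)
for run in runs:
    print(run)
print(len(runs), "runs")
```

Single-value lists do not multiply the number of runs, so under this assumption the grid yields 2 × 1 × 2 = 4 combinations: each model at each image size.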

In [None]:
!python ./tools/benchmarking/benchmark.py --config benchmark_config.yaml

In [None]:
df = pd.read_csv("runs/padim_gpu.csv")
df.head()
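
Since the benchmark covers more than one model, it may be convenient to load every result file at once. The sketch below assumes that each run writes its own CSV into the `runs` directory alongside `padim_gpu.csv`; the `load_benchmark_results` helper and the `source_file` column are names introduced here for illustration.

```python
from pathlib import Path

import pandas as pd


def load_benchmark_results(results_dir="runs"):
    """Concatenate every CSV in `results_dir`, tagging rows with their source file."""
    frames = []
    for csv_path in sorted(Path(results_dir).glob("*.csv")):
        frame = pd.read_csv(csv_path)
        frame["source_file"] = csv_path.stem  # e.g. "padim_gpu"
        frames.append(frame)
    if not frames:
        # No result files found; return an empty frame instead of failing.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


results = load_benchmark_results("runs")
print(results.head())
```

The `source_file` column keeps track of which file each row came from, so rows from different models can be told apart after concatenation.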