# Use `Folder` for Custom Datasets

## Installing Anomalib

The easiest way to install anomalib is to use pip. You can install it from the command line using the following command:


In [None]:
%pip install anomalib

## Setting up the Dataset Directory

This cell defines the path to the dataset root directory so that the cells below can access the dataset.


In [2]:
from pathlib import Path

# NOTE: Provide the path to the dataset root directory.
#   This tutorial assumes the toy dataset has already been
#   downloaded and extracted to this directory.
dataset_root = Path.cwd().parent.parent.parent / "datasets" / "hazelnut_toy"
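
Before going further, it can help to confirm that the dataset root has the layout the rest of the tutorial relies on. The following is a minimal sketch, not part of anomalib: `find_missing_dirs` is a hypothetical helper, and the subfolder names (`good`, `crack`, `mask/crack`) are taken from the cells later in this notebook.

```python
from pathlib import Path


def find_missing_dirs(root: Path, expected: tuple[str, ...]) -> list[str]:
    """Return the entries of `expected` that are not directories under `root`."""
    return [name for name in expected if not (root / name).is_dir()]


# Subfolder names assumed from the cells later in this tutorial.
EXPECTED_DIRS = ("good", "crack", "mask/crack")

# Same location as `dataset_root` in the cell above.
toy_root = Path.cwd().parent.parent.parent / "datasets" / "hazelnut_toy"
missing = find_missing_dirs(toy_root, EXPECTED_DIRS)
if missing:
    print(f"Missing under {toy_root}: {missing}")
else:
    print("Dataset layout looks complete.")
```

An empty result means all three folders were found; anything else points at a wrong `dataset_root` or an incomplete extraction.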

## Use Folder Dataset (for Custom Datasets) via API

Here we show how to use custom datasets to train anomalib models. A custom dataset in this context can be one of the following types:

- A dataset with good and bad images.
- A dataset with good and bad images as well as mask ground-truths for pixel-wise evaluation.
- A dataset with good and bad images that is already split into training and testing sets.

To experiment with this setting, we provide a toy dataset that can be downloaded from the following [link](https://github.com/openvinotoolkit/anomalib/blob/main/docs/source/data/hazelnut_toy.zip). For the rest of the tutorial, we assume that the dataset is downloaded and extracted to `../datasets`, located in the `anomalib` directory.
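
If the archive has been downloaded manually, it could be unpacked with the standard library. This is a sketch under assumptions: `extract_archive` is a hypothetical helper, and `hazelnut_toy.zip` in the current directory is a placeholder for wherever the file was actually saved.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path: Path, target_dir: Path) -> list[str]:
    """Extract `archive_path` into `target_dir` and return the archive's member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Hypothetical download location; adjust to where the zip file was saved.
archive_path = Path("hazelnut_toy.zip")
if archive_path.is_file():
    members = extract_archive(archive_path, Path("..") / "datasets")
    print(f"Extracted {len(members)} entries")
```

The guard on `is_file()` keeps the cell harmless when the archive is not present.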


In [3]:
# flake8: noqa
import numpy as np
from PIL import Image
from torchvision.transforms.v2.functional import to_pil_image

from anomalib.data import Folder, FolderDataset

### DataModule

Similar to how we created the datamodules for existing benchmarking datasets in the previous tutorials, we can also create an Anomalib datamodule for our custom hazelnut dataset.

In addition to the root folder of the dataset, we now also specify which folder contains the normal images, which folder contains the anomalous images, and which folder contains the ground truth masks for the anomalous images.


In [None]:
folder_datamodule = Folder(
    name="hazelnut_toy",
    root=dataset_root,
    normal_dir="good",
    abnormal_dir="crack",
    mask_dir=dataset_root / "mask" / "crack",
)
folder_datamodule.setup()

In [None]:
# Train images
data = next(iter(folder_datamodule.train_data))
print(data.image.shape)

In [None]:
# Test images
data = next(iter(folder_datamodule.test_data))
print(data.image.shape, data.gt_mask.shape)

As can be seen above, creating the datamodule is straightforward, and it can be used directly for training, testing, and inference. We can also visualize samples from it.


In [None]:
img = to_pil_image(data.image.clone())
msk = to_pil_image(data.gt_mask.int() * 255).convert("RGB")

Image.fromarray(np.hstack((np.array(img), np.array(msk))))

The `Folder` datamodule offers much more flexibility to cater to different needs. Please refer to the documentation for more details.


### Torch Dataset

As in earlier examples, we can also create a standalone PyTorch dataset instance.


In [None]:
FolderDataset??

Now let's create the dataset. We'll start with the training subset.

In [None]:
folder_dataset_train = FolderDataset(
    name="hazelnut_toy",
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="train",
)
print(len(folder_dataset_train))
sample = folder_dataset_train[0]
print(sample.image.shape, sample.image_path, sample.gt_label)

As can be seen above, when we choose the `train` split, the dataset contains 34 samples. These are the normal images that have been assigned to the training set. Each has a ground truth label of `False`, indicating that the image does not contain an anomaly.

Now let's have a look at the test set:



In [None]:
# Folder Classification Test Set
folder_dataset_test = FolderDataset(
    name="hazelnut_toy",
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="test",
)
print(len(folder_dataset_test))
sample = folder_dataset_test[0]
print(sample.image.shape, sample.image_path, sample.gt_label)
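
The relationship between the two splits can be pictured with a small stand-alone sketch: normal images are divided between train and test, while anomalous images only ever appear in the test set. This is a simplified illustration, not anomalib's actual splitting logic; `split_samples`, the 0.2 ratio, the seed, and the file names are all placeholders chosen for the example.

```python
import random


def split_samples(normal, abnormal, normal_test_ratio=0.2, seed=0):
    """Return (train, test) lists of (path, label) pairs.

    A label of False marks a normal image, True an anomalous one.
    """
    rng = random.Random(seed)
    shuffled = list(normal)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * normal_test_ratio)
    test_normal, train_normal = shuffled[:n_test], shuffled[n_test:]
    # The training set holds normal images only.
    train = [(path, False) for path in train_normal]
    # The test set holds the held-out normal images plus every anomalous image.
    test = [(path, False) for path in test_normal] + [(path, True) for path in abnormal]
    return train, test


# Hypothetical file names, for illustration only.
normal = [f"good/{i:02d}.jpg" for i in range(10)]
abnormal = [f"crack/{i:02d}.jpg" for i in range(3)]
train, test = split_samples(normal, abnormal)
print(len(train), len(test))  # 8 5
```

With ten normal and three anomalous files, the sketch keeps eight normal images for training and places the remaining two, together with all three anomalous images, in the test set.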

#### Segmentation Task

It is also possible to configure the Folder dataset for the segmentation task, where the dataset object returns both the image and its ground-truth mask. To achieve this, we need to pass a folder of ground truth masks to the dataset. The mask folder should contain a ground truth pixel mask for every anomalous image in the dataset.
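
The requirement that every anomalous image has a mask can be checked up front. The sketch below is not part of anomalib: `images_without_masks` is a hypothetical helper, and it assumes that a mask shares its file stem with the image it belongs to and that only the listed suffixes count as images.

```python
from pathlib import Path

# Assumption: files with these suffixes are treated as images or masks.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def images_without_masks(abnormal_dir: Path, mask_dir: Path) -> list[str]:
    """Return the names of anomalous images that have no mask with the same file stem."""
    mask_stems = {
        path.stem for path in mask_dir.iterdir() if path.suffix.lower() in IMAGE_SUFFIXES
    }
    return sorted(
        path.name
        for path in abnormal_dir.iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES and path.stem not in mask_stems
    )


# Example call with the folders used in this tutorial:
# images_without_masks(dataset_root / "crack", dataset_root / "mask" / "crack")
```

An empty list means every anomalous image found a matching mask under these assumptions.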


In [None]:
# Folder Segmentation Train Set
folder_dataset_segmentation_train = FolderDataset(
    name="hazelnut_toy",
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="train",
    mask_dir=dataset_root / "mask" / "crack",
)
print(len(folder_dataset_segmentation_train))
sample = folder_dataset_segmentation_train[0]
print(sample.image.shape, sample.gt_mask.shape, sample.image_path, sample.gt_label)

In [None]:
# Folder Segmentation Test Set
folder_dataset_segmentation_test = FolderDataset(
    name="hazelnut_toy",
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="test",
    mask_dir=dataset_root / "mask" / "crack",
)
print(len(folder_dataset_segmentation_test))
sample = folder_dataset_segmentation_test[0]
print(sample.image.shape, sample.gt_mask.shape, sample.image_path, sample.gt_label)

Let's visualize the image and the mask...


In [None]:
# Use the sample drawn from the segmentation test set above.
img = to_pil_image(sample.image.clone())
msk = to_pil_image(sample.gt_mask.int() * 255).convert("RGB")

Image.fromarray(np.hstack((np.array(img), np.array(msk))))