# Use `Folder` for Custom Datasets

## Installing Anomalib

The easiest way to install anomalib is to use pip. You can install it from the command line using the following command:


In [None]:
%pip install anomalib

## Setting up the Dataset Directory

This cell defines the dataset root directory so that the following cells can find the dataset.


In [None]:
from pathlib import Path

# NOTE: Provide the path to the dataset root directory.
#   If the dataset is not downloaded, it will be downloaded
#   to this directory.
dataset_root = Path.cwd().parent / "datasets" / "hazelnut_toy"

## Use Folder Dataset (for Custom Datasets) via API

Here we show how to use custom datasets to train anomalib models. A custom dataset in this setting can be one of the following types:

- A dataset with good and bad images.
- A dataset with good and bad images as well as mask ground-truths for pixel-wise evaluation.
- A dataset with good and bad images that is already split into training and testing sets.

To experiment with this setting, we provide a toy dataset that can be downloaded from the following [link](https://github.com/openvinotoolkit/anomalib/blob/main/docs/source/data/hazelnut_toy.zip). For the rest of the tutorial, we assume that the dataset is downloaded and extracted to `../datasets`, located in the `anomalib` directory.
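Before continuing, it can help to confirm that the extracted dataset has the sub-directories the later cells refer to (`good`, `crack`, and `mask/crack`). The following is a minimal sketch that uses only the standard library; `missing_subdirs` and `EXPECTED_SUBDIRS` are hypothetical helper names introduced here for illustration and are not part of anomalib.

```python
from __future__ import annotations

from pathlib import Path

# Sub-directories used later in this tutorial, relative to the dataset root.
# (Assumption: the extracted toy dataset follows this layout.)
EXPECTED_SUBDIRS = ("good", "crack", "mask/crack")


def missing_subdirs(root: Path, expected: tuple[str, ...] = EXPECTED_SUBDIRS) -> list[str]:
    """Return the expected sub-directories that do not exist under ``root``."""
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_subdirs(dataset_root)` after the dataset is extracted should return an empty list; any names it returns point to folders that still need to be put in place.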


In [None]:
# pylint: disable=wrong-import-position, wrong-import-order
# flake8: noqa
import numpy as np
from PIL import Image
from torchvision.transforms import ToPILImage

from anomalib.data import TaskType
from anomalib.data.folder import Folder, FolderDataset
from anomalib.data.utils import InputNormalizationMethod, get_transforms

### DataModule

Similar to how we created the datamodules for existing benchmarking datasets in the previous tutorials, we can also create an Anomalib datamodule for our custom hazelnut dataset.

In addition to the root folder of the dataset, we now also specify which folder contains the normal images, which folder contains the anomalous images, and which folder contains the ground truth masks for the anomalous images.


In [None]:
folder_datamodule = Folder(
    root=dataset_root,
    normal_dir="good",
    abnormal_dir="crack",
    task=TaskType.SEGMENTATION,
    mask_dir=dataset_root / "mask" / "crack",
    image_size=256,
    normalization=InputNormalizationMethod.NONE,  # don't apply normalization, as we want to visualize the images
)
folder_datamodule.setup()

In [None]:
# Train images
i, data = next(enumerate(folder_datamodule.train_dataloader()))
print(data.keys(), data["image"].shape)

In [None]:
# Test images
i, data = next(enumerate(folder_datamodule.test_dataloader()))
print(data.keys(), data["image"].shape, data["mask"].shape)

As can be seen above, creating the dataloaders is pretty straightforward, and they can be used directly for training, testing, and inference. We can also visualize samples from the dataloaders.


In [None]:
img = ToPILImage()(data["image"][0].clone())
msk = ToPILImage()(data["mask"][0]).convert("RGB")

Image.fromarray(np.hstack((np.array(img), np.array(msk))))

The `Folder` datamodule offers much more flexibility to cater to all sorts of needs. Please refer to the documentation for more details.


### Torch Dataset

As in earlier examples, we can also create a standalone PyTorch dataset instance.


In [None]:
FolderDataset??

To create a `FolderDataset`, we first need to create the albumentations object that applies transforms to the input image.


In [None]:
get_transforms??

In [None]:
image_size = (256, 256)
# Reuse the size defined above instead of repeating the literal value.
transform = get_transforms(image_size=image_size, normalization=InputNormalizationMethod.NONE)

#### Classification Task


In [None]:
folder_dataset_classification_train = FolderDataset(
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="train",
    transform=transform,
    task=TaskType.CLASSIFICATION,
)
folder_dataset_classification_train.setup()
folder_dataset_classification_train.samples.head()

Let's look at the first sample in the dataset.


In [None]:
data = folder_dataset_classification_train[0]
print(data.keys(), data["image"].shape)

As can be seen above, when we choose the `classification` task and the `train` split, the dataset only returns `image`. This is mainly because training only requires normal images and no labels. Now let's try the `test` split for the `classification` task.


In [None]:
# Folder Classification Test Set
folder_dataset_classification_test = FolderDataset(
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="test",
    transform=transform,
    task=TaskType.CLASSIFICATION,
)
folder_dataset_classification_test.setup()
folder_dataset_classification_test.samples.head()

In [None]:
data = folder_dataset_classification_test[0]
print(data.keys(), data["image"].shape, data["image_path"], data["label"])
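
The train and test splits above differ in what they contain: training uses only normal images, while the test split carries labels. The sketch below illustrates one way such a split could be built from two lists of file paths. It is not anomalib's implementation; the function name, the `normal_test_ratio` parameter, and the choice to hold out a fraction of normal images for testing are assumptions made for illustration.

```python
from __future__ import annotations

import random


def split_for_anomaly_detection(
    normal: list[str],
    abnormal: list[str],
    normal_test_ratio: float = 0.2,  # hypothetical parameter, not an anomalib argument
    seed: int = 0,
) -> dict[str, list[tuple[str, int]]]:
    """Split image paths into train/test lists of (path, label) pairs.

    Label 0 marks a normal image and label 1 an abnormal one.
    """
    # Shuffle a sorted copy so the result is reproducible for a given seed.
    shuffled = sorted(normal)
    random.Random(seed).shuffle(shuffled)

    # Hold out a fraction of the normal images for testing.
    n_test = round(len(shuffled) * normal_test_ratio)
    test_normal, train_normal = shuffled[:n_test], shuffled[n_test:]

    return {
        # Training only ever sees normal images.
        "train": [(path, 0) for path in train_normal],
        # Testing mixes held-out normal images with all abnormal images.
        "test": [(path, 0) for path in test_normal] + [(path, 1) for path in sorted(abnormal)],
    }
```

With ten normal and two abnormal paths and the default ratio, this sketch puts eight normal images in `train` and two normal plus two abnormal images in `test`.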

#### Segmentation Task

It is also possible to configure the Folder dataset for the segmentation task, where the dataset object returns image and ground-truth mask.
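For pixel-wise evaluation, every anomalous image needs a corresponding ground-truth mask. As a rough sanity check before building the dataset, the sketch below pairs images and masks by file stem and reports images without a mask. Matching by stem is an assumption made here for illustration; it is not confirmed to be the rule anomalib applies, and `pair_images_with_masks` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def pair_images_with_masks(image_dir: Path, mask_dir: Path) -> tuple[dict[Path, Path], list[Path]]:
    """Match each image to a mask file with the same stem.

    Returns the matched (image -> mask) pairs and the images without a mask.
    """
    # Index the masks by stem so that e.g. "01.jpg" can match "01.png".
    masks_by_stem = {p.stem: p for p in sorted(mask_dir.iterdir()) if p.is_file()}

    pairs: dict[Path, Path] = {}
    unmatched: list[Path] = []
    for image in sorted(p for p in image_dir.iterdir() if p.is_file()):
        mask = masks_by_stem.get(image.stem)
        if mask is None:
            unmatched.append(image)
        else:
            pairs[image] = mask
    return pairs, unmatched
```

For the layout used in this tutorial, the call would be `pair_images_with_masks(dataset_root / "crack", dataset_root / "mask" / "crack")`; a non-empty second return value lists the images that have no mask under this naming assumption.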


In [None]:
# Folder Segmentation Train Set
folder_dataset_segmentation_train = FolderDataset(
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="train",
    transform=transform,
    mask_dir=dataset_root / "mask" / "crack",
    task=TaskType.SEGMENTATION,
)
folder_dataset_segmentation_train.setup()  # like the datamodule, the dataset needs to be set up before use
folder_dataset_segmentation_train.samples.head()

In [None]:
# Folder Segmentation Test Set
folder_dataset_segmentation_test = FolderDataset(
    normal_dir=dataset_root / "good",
    abnormal_dir=dataset_root / "crack",
    split="test",
    transform=transform,
    mask_dir=dataset_root / "mask" / "crack",
    task=TaskType.SEGMENTATION,
)
folder_dataset_segmentation_test.setup()  # like the datamodule, the dataset needs to be set up before use
folder_dataset_segmentation_test.samples.head(10)

In [None]:
data = folder_dataset_segmentation_test[3]
print(data.keys(), data["image"].shape, data["mask"].shape)

Let's visualize the image and the mask...


In [None]:
img = ToPILImage()(data["image"].clone())
msk = ToPILImage()(data["mask"]).convert("RGB")

Image.fromarray(np.hstack((np.array(img), np.array(msk))))