## Use Folder Dataset (for Custom Datasets) via API

Here we show how to use custom datasets to train anomalib models. A custom dataset in this context can be one of the following types:

- A dataset with good and bad images.
- A dataset with good and bad images as well as mask ground-truths for pixel-wise evaluation.
- A dataset with good and bad images that is already split into training and testing sets.

To experiment with this setting, we provide a toy dataset that can be downloaded from the following [link](https://github.com/openvinotoolkit/anomalib/blob/main/docs/source/data/hazelnut_toy.zip). For the rest of the tutorial, we assume that the dataset is downloaded and extracted to `../../datasets`, located in the `anomalib` directory.
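
Before building any dataset object, it can help to confirm that the extracted folders are where the tutorial expects them. The snippet below is a minimal sketch that uses only the standard library. The helper `summarize_folder_dataset` and the set of image suffixes are hypothetical and not part of anomalib; the sub-folder names (`good`, `crack`, `mask/crack`) are the ones used in the rest of this tutorial.

```python
from pathlib import Path

# Assumption: these suffixes are treated as images for this check only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def summarize_folder_dataset(root, subdirs=("good", "crack", "mask/crack")):
    """Count image files per sub-folder; a missing folder maps to None."""
    root = Path(root)
    summary = {}
    for sub in subdirs:
        folder = root / sub
        if not folder.is_dir():
            summary[sub] = None
            continue
        summary[sub] = sum(
            1
            for path in folder.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return summary


# Hypothetical usage with the path assumed throughout this tutorial.
print(summarize_folder_dataset("../../datasets/hazelnut_toy"))
```

A `None` entry means the folder was not found, which usually points to a wrong extraction path rather than a problem with the dataset itself.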

In [None]:
import numpy as np
from PIL import Image
from torchvision.transforms import ToPILImage

from anomalib.data.folder import Folder, FolderDataset
from anomalib.data.utils import InputNormalizationMethod, get_transforms

### Torch Dataset

In [None]:
FolderDataset??

To create a `FolderDataset`, we first need to create the albumentations object that applies transforms to the input image.

In [None]:
get_transforms??

In [None]:
# Resize every image to 256x256 and skip input normalization.
image_size = (256, 256)
transform = get_transforms(image_size=image_size, normalization=InputNormalizationMethod.NONE)

#### Classification Task

In [None]:
folder_dataset_classification_train = FolderDataset(
    normal_dir="../../datasets/hazelnut_toy/good",
    abnormal_dir="../../datasets/hazelnut_toy/crack",
    split="train",
    transform=transform,
    task="classification",
)
folder_dataset_classification_train.setup()
folder_dataset_classification_train.samples.head()

Let's look at the first sample in the dataset.

In [None]:
data = folder_dataset_classification_train[0]
data.keys(), data["image"].shape

As can be seen above, when we choose the `classification` task and the `train` split, the dataset only returns `image`. This is mainly because training only requires normal images and no labels. Now let's try the `test` split for the `classification` task.

In [None]:
# Folder Classification Test Set
folder_dataset_classification_test = FolderDataset(
    normal_dir="../../datasets/hazelnut_toy/good",
    abnormal_dir="../../datasets/hazelnut_toy/crack",
    split="test",
    transform=transform,
    task="classification",
)
folder_dataset_classification_test.setup()
folder_dataset_classification_test.samples.head()

In [None]:
data = folder_dataset_classification_test[0]
data.keys(), data["image"].shape, data["image_path"], data["label"]
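
The difference between the two splits above can be illustrated with a small, library-free sketch: normal images are divided between train and test, while abnormal images appear only in the test split, where labels are needed for evaluation. This is not anomalib's implementation. The function `split_samples`, the `test_ratio` parameter, and the label convention (0 = normal, 1 = abnormal) are assumptions made for illustration.

```python
import random


def split_samples(normal_paths, abnormal_paths, test_ratio=0.2, seed=0):
    """Return (train, test) lists of (path, label) pairs.

    Assumed label convention: 0 = normal, 1 = abnormal.
    """
    rng = random.Random(seed)
    shuffled = list(normal_paths)
    rng.shuffle(shuffled)

    # Hold out a fraction of the normal images for testing.
    n_test = int(len(shuffled) * test_ratio)
    test_normal, train_normal = shuffled[:n_test], shuffled[n_test:]

    # Training uses normal images only; abnormal images go to the test split.
    train = [(path, 0) for path in train_normal]
    test = [(path, 0) for path in test_normal] + [(path, 1) for path in abnormal_paths]
    return train, test


# Hypothetical file names, used only to exercise the function.
normal = [f"good/{i:02d}.jpg" for i in range(10)]
abnormal = [f"crack/{i:02d}.jpg" for i in range(3)]
train, test = split_samples(normal, abnormal)
print(len(train), len(test))
```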

#### Segmentation Task

It is also possible to configure the Folder dataset for the segmentation task, where the dataset object returns both the image and the ground-truth mask.

In [None]:
# Folder Segmentation Train Set
folder_dataset_segmentation_train = FolderDataset(
    normal_dir="../../datasets/hazelnut_toy/good",
    abnormal_dir="../../datasets/hazelnut_toy/crack",
    split="train",
    transform=transform,
    mask_dir="../../datasets/hazelnut_toy/mask/crack",
    task="segmentation",
)
folder_dataset_segmentation_train.setup()
folder_dataset_segmentation_train.samples.head()

In [None]:
# Folder Segmentation Test Set
folder_dataset_segmentation_test = FolderDataset(
    normal_dir="../../datasets/hazelnut_toy/good",
    abnormal_dir="../../datasets/hazelnut_toy/crack",
    split="test",
    transform=transform,
    mask_dir="../../datasets/hazelnut_toy/mask/crack",
    task="segmentation",
)
folder_dataset_segmentation_test.setup()
folder_dataset_segmentation_test.samples.head(10)

In [None]:
data = folder_dataset_segmentation_test[3]
data.keys(), data["image"].shape, data["mask"].shape

Let's visualize the image and the mask...

In [None]:
img = ToPILImage()(data["image"].clone())
msk = ToPILImage()(data["mask"]).convert("RGB")

Image.fromarray(np.hstack((np.array(img), np.array(msk))))
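
The `.convert("RGB")` call in the cell above matters because `np.hstack` needs both arrays to have the same number of dimensions and the same number of channels. The sketch below reproduces the idea with synthetic NumPy arrays instead of real samples. It assumes a three-channel image and a single-channel mask; the helper `side_by_side` is hypothetical.

```python
import numpy as np


def side_by_side(image, mask):
    """Stack an (H, W, 3) image and an (H, W) mask horizontally."""
    # Repeat the single mask channel three times so the shapes are compatible.
    mask_rgb = np.repeat(mask[:, :, None], 3, axis=2)
    return np.hstack((image, mask_rgb))


# Synthetic stand-ins for a sample: a black image and a square "defect" mask.
image = np.zeros((256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[100:150, 100:150] = 255

panel = side_by_side(image, mask)
print(panel.shape)  # (256, 512, 3)
```

Passing the two-dimensional mask to `np.hstack` directly raises a `ValueError`, because the image array is three-dimensional.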

### DataModule

So far, we have shown the Torch Dataset implementation of the Folder dataset. This is quite useful for getting a single sample. However, when we train models end-to-end, we need much more than this, such as downloading the dataset and creating train/val/test/inference dataloaders. To handle all of this, we have the PyTorch Lightning DataModule implementation, which is shown below.

In [None]:
folder_datamodule = Folder(
    root="../../datasets/hazelnut_toy/",
    normal_dir="good",
    abnormal_dir="crack",
    task="segmentation",
    mask_dir="../../datasets/hazelnut_toy/mask/crack",
    image_size=256,
    normalization=InputNormalizationMethod.NONE,
)
folder_datamodule.setup()

In [None]:
# Train images
i, data = next(enumerate(folder_datamodule.train_dataloader()))
data.keys(), data["image"].shape

In [None]:
# Test images
i, data = next(enumerate(folder_datamodule.test_dataloader()))
data.keys(), data["image"].shape, data["mask"].shape

As can be seen above, creating the dataloaders is pretty straightforward, and they can be used directly for training/testing/inference. We can visualize samples from the dataloaders as well.

In [None]:
img = ToPILImage()(data["image"][0].clone())
msk = ToPILImage()(data["mask"][0]).convert("RGB")

Image.fromarray(np.hstack((np.array(img), np.array(msk))))

The `Folder` data module offers much more flexibility to cater to all sorts of needs. Please refer to the documentation for more details.