# `Dataset` and `DataLoader` from `torch.utils.data`
On PyTorch's website, there are tutorials on how to build your own custom dataset, e.g. [here](http://localhost:8888/tree/git-repos/phunc20/ML-frameworks/pytorch/tutorials/Dataset_DataLoader).

However, a few extra remarks do no harm. The example given in the link above is as follows.

```python
import os
import pandas as pd
from torch.utils.data import Dataset  # base class that CustomImageDataset subclasses
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
```

What I want to draw attention to is that
> this is only **_an example_**: we can follow its skeleton, but we are not obliged to organize our data on disk with exactly the same directory structure.

For instance, we do not always need a CSV file as in the code above.
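As a minimal sketch of that point, here is a map-style dataset whose samples live in plain Python lists, with no CSV file and no image folder involved. The class name `InMemoryDataset` and the toy tensors are made up for illustration; the only requirement used here is that the class implements `__len__` and `__getitem__`.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class InMemoryDataset(Dataset):
    """Hypothetical toy dataset: samples and labels are kept in lists."""

    def __init__(self, samples, labels):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = samples
        self.labels = labels

    def __len__(self):
        # Number of (sample, label) pairs available.
        return len(self.samples)

    def __getitem__(self, idx):
        # Return one (sample, label) pair, just like the CSV-based example.
        return self.samples[idx], self.labels[idx]


# Five fake "images" of shape (channels, height, width) = (3, 4, 4).
samples = [torch.full((3, 4, 4), float(i)) for i in range(5)]
labels = [0, 1, 0, 1, 0]

toy_dataset = InMemoryDataset(samples, labels)
loader = DataLoader(toy_dataset, batch_size=2, shuffle=False)

# The DataLoader stacks individual samples into batches.
first_images, first_labels = next(iter(loader))
print(first_images.shape, first_labels)
```

Where the samples come from (a CSV file, folder names, a database) is a detail hidden inside `__init__` and `__getitem__`; the `DataLoader` only relies on those two methods.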

To illustrate this, we will build a simple dataset of our own and learn the nitty-gritty along the way. To that end, I have hand-made a dataset in the folder `./pets/`, whose structure is as follows:
```
$ tree pets/
pets/
└── train
    ├── bong
    │   ├── IMG-5547.jpg
    │   ├── IMG-5548.jpg
    │   └── IMG-5549.jpg
    ├── meochi
    │   ├── IMG-5533.jpg
    │   ├── IMG-5534.jpg
    │   └── IMG-5551.jpg
    └── meoem
        ├── IMG-5523.jpg
        ├── IMG-5541.jpg
        └── IMG-5542.jpg

4 directories, 9 files
```
These are photos of my pets, taken with a smartphone camera. I have deliberately resized or rotated (or both) these images in order to

- make them more realistic
- anticipate some difficulties (as we will see shortly)

**Rmk.** Of course, a real-life image dataset would not contain so few images. I just hope that this toy example will help others and myself understand/review PyTorch's `Dataset` and `DataLoader`.

## Let's Get Started
I forgot to mention the purpose of our dataset: it is meant for training a model to distinguish between our pets, named `bong`, `meochi` and `meoem`. We won't train any model in this notebook, though; here we are only interested in how to set up a dataset for training in PyTorch.
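Since each pet has its own folder, the folder names can play the role that the CSV file played in the tutorial example. The sketch below shows one possible way to collect `(image path, label)` pairs from such a layout using only the standard library. The helper name `list_samples`, the alphabetical ordering of the classes and the `*.jpg` pattern are assumptions made for illustration, not something prescribed by PyTorch.

```python
from pathlib import Path


def list_samples(root):
    """Hypothetical helper: map a class-per-folder layout to (path, label) pairs."""
    root = Path(root)
    # One class per sub-folder; sort so that the label indices are reproducible.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        # Assumption: all images use the lowercase .jpg extension, as in pets/.
        for img_path in sorted((root / name).glob("*.jpg")):
            samples.append((img_path, class_to_idx[name]))
    return samples, class_to_idx


# Usage with the layout shown above (skipped if the folder is not present).
if Path("pets/train").is_dir():
    samples, class_to_idx = list_samples("pets/train")
    print(class_to_idx)
    print(len(samples), "images found")
```

A `Dataset` built on top of this would store `samples` in `__init__`, return `len(samples)` from `__len__`, and load the image at `samples[idx]` inside `__getitem__`.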