### PyTorch Dataset and Dataloader Demo

We illustrate how to build a custom dataset and dataloader for object detection. 

We will use our collected and labeled images for object detection. Over 1,000 `640x480` RGB images were collected using an off-the-shelf USB camera (A4TECH PK-635G).

The images were labeled using [VIA](https://www.robots.ox.ac.uk/~vgg/software/via/) and the image filenames and labels are stored in a CSV file. 

Before continuing, please download the dataset from [here](https://bit.ly/adl2-ssd) and extract it in the same directory as this notebook. The directory structure should look like this:

```
--> datasets --> python --> config.py
                        --> dataloader_demo.ipynb
                        --> drinks/
                        --> label_utils.py
                        --> sample_labels.png
                        ...
```
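Before going further, it can help to confirm that the dataset was extracted where the notebook expects it. The sketch below checks for the `drinks/` folder and the two CSV splits referenced in the configuration used later in this demo. The helper name `missing_paths` is hypothetical and not part of the original code.

```python
from pathlib import Path


def missing_paths(root, relative_paths):
    """Return the entries of relative_paths that do not exist under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).exists()]


# Paths taken from the dataset configuration used in this demo
required = ["drinks", "drinks/labels_train.csv", "drinks/labels_test.csv"]
missing = missing_paths(".", required)
if missing:
    print("Missing:", missing)
else:
    print("Dataset found.")
```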

**Note**: Before running this demo, please make sure that you have a `wandb.ai` account. See our discussion on [`wandb.ai`](https://github.com/roatienza/Deep-Learning-Experiments/blob/master/versions/2022/tools/python/wandb_demo.ipynb).

### Sample image annotation

A sample image annotation is shown below. There are only 3 categories: `Water`, `Soda`, and `Juice`. By default, the background is the first category (index `0`). The bounding boxes and class names are shown in the image. Each bounding box is defined by 4 numbers, `xmin`, `xmax`, `ymin`, and `ymax`, in pixel coordinates. Together they define 2 opposite corners of the box: (`xmin`, `ymin`) and (`xmax`, `ymax`).
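To make the box format concrete, here is a minimal sketch that parses one label row (in the `frame,xmin,xmax,ymin,ymax,class_id` CSV format this dataset uses) and recovers the two corners plus the box width and height. It assumes the usual image convention where `y` grows downward, so (`xmin`, `ymin`) is the top-left corner. `parse_box` is a hypothetical helper, not part of `label_utils`.

```python
def parse_box(row):
    """Parse 'frame,xmin,xmax,ymin,ymax,class_id' into corners and size.

    Assumes pixel coordinates with y growing downward, so (xmin, ymin)
    is the top-left corner and (xmax, ymax) the bottom-right corner.
    """
    frame, xmin, xmax, ymin, ymax, class_id = row.strip().split(",")
    xmin, xmax, ymin, ymax = int(xmin), int(xmax), int(ymin), int(ymax)
    return {
        "frame": frame,
        "top_left": (xmin, ymin),
        "bottom_right": (xmax, ymax),
        "width": xmax - xmin,
        "height": ymax - ymin,
        "class_id": int(class_id),
    }


box = parse_box("0001000.jpg,310,445,104,443,1")
print(box["top_left"], box["bottom_right"], box["width"], box["height"])
# (310, 104) (445, 443) 135 339
```

Note that the column order is `xmin,xmax,ymin,ymax`, not the `xmin,ymin,xmax,ymax` order used by some other detection formats, so it is worth double-checking whenever boxes are passed to another library.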

<img src="sample_labels.png" width="640" height="480">

**Import** the required modules.

[`label_utils`](label_utils.py) is a helper module for loading the CSV file and converting a label to class name. Basically, `0` is `background`, `1` is `Water`, `2` is `Soda` and `3` is `Juice`. It also contains helper functions to build the label dictionary from the CSV file.
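The index-to-name mapping described above can be pictured as a simple list lookup. The sketch below is a hypothetical stand-in for the helpers in `label_utils.py` (the function names `index2class_sketch` and `class2index_sketch` are illustrative only); the real module may be implemented differently.

```python
# Hypothetical stand-in for the label_utils index <-> class-name helpers.
# Index 0 is reserved for the background.
CLASS_NAMES = ["background", "Water", "Soda", "Juice"]


def index2class_sketch(index):
    """Map a class index (0-3) to its name; int() also accepts float labels."""
    return CLASS_NAMES[int(index)]


def class2index_sketch(name):
    """Map a class name back to its index."""
    return CLASS_NAMES.index(name)


print(index2class_sketch(2))        # Soda
print(class2index_sketch("Juice"))  # 3
```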

```python
import torch
import numpy as np
import wandb
import label_utils
from torch.utils.data import DataLoader
from torchvision import transforms
from PIL import Image
```

**Log in to and initialize** `wandb`. You will need your `wandb` API key to run this demo.

We will use the following dataset and dataloader configuration.


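```python
wandb.login()
config = {
    "num_workers": 4,
    "pin_memory": True,
    "batch_size": 32,
    "dataset": "drinks",
    "train_split": "drinks/labels_train.csv",
    "test_split": "drinks/labels_test.csv",
}
# "upeee" is the entity this demo was originally run under; replace it
# with your own username or team, or omit `entity` to use your default
run = wandb.init(project="dataloader-project", entity="upeee", config=config)
```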

### Dataset and Dataloader for Custom Object Detection

The dataset CSV file lists the image filenames and their labels in the following format. Each row holds one bounding box, so an image with several objects appears on several rows.

```
frame,xmin,xmax,ymin,ymax,class_id
0001000.jpg,310,445,104,443,1
0000999.jpg,194,354,96,478,1
0000998.jpg,105,383,134,244,1
0000997.jpg,157,493,89,194,1
0000996.jpg,51,435,207,347,1
...
```

Each label holds the pixel coordinates of one object's bounding box followed by its class ID.

We will build a dictionary that maps each image path (`path_to_image`) to its labels. Each label is a row of the form `xmin,xmax,ymin,ymax,class_id`. Since an image can contain multiple objects, it can have multiple labels, so the labels of an image are stored as a 2D NumPy array with one row per object.
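A minimal sketch of this grouping step is shown below, using only the standard `csv` module and NumPy. The function `build_label_dictionary_sketch` is hypothetical: it assumes images live under `drinks/` and stores labels as `float32` arrays, while the real `label_utils.build_label_dictionary` may differ in both respects. The second box for `0000999.jpg` in the sample input is made up so that one image has two labels.

```python
import csv
import io
import os
from collections import defaultdict

import numpy as np


def build_label_dictionary_sketch(csv_file, data_path="drinks"):
    """Map each image path to an (N, 5) array of [xmin, xmax, ymin, ymax, class_id].

    Hypothetical re-implementation for illustration only.
    """
    rows = defaultdict(list)
    classes = set()
    reader = csv.reader(csv_file)
    next(reader)  # skip the header line
    for frame, *label in reader:
        label = [float(v) for v in label]
        # group every box of the same image under one key
        rows[os.path.join(data_path, frame)].append(label)
        classes.add(int(label[-1]))
    dictionary = {key: np.array(boxes, dtype=np.float32)
                  for key, boxes in rows.items()}
    return dictionary, sorted(classes)


# Sample input; the second 0000999.jpg row is a made-up extra box
sample = io.StringIO(
    "frame,xmin,xmax,ymin,ymax,class_id\n"
    "0000999.jpg,194,354,96,478,1\n"
    "0000999.jpg,10,60,20,90,3\n"
    "0000998.jpg,105,383,134,244,1\n"
)
labels, classes = build_label_dictionary_sketch(sample)
print(labels[os.path.join("drinks", "0000999.jpg")].shape)  # (2, 5)
print(classes)  # [1, 3]
```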

The `ImageDataset` class is a custom dataset class that loads the images and labels using the dictionary. It subclasses the abstract class `torch.utils.data.Dataset` and implements the `__len__()` and `__getitem__()` methods. This is known as a **map-style** dataset. A dataset can also be **iterable-style**, subclassing `torch.utils.data.IterableDataset` and implementing the `__iter__()` method instead.
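The difference between the two styles can be seen in a toy example. The two classes below (`SquaresMapStyle` and `SquaresIterable`, both hypothetical) produce the same samples: the map-style one supports random access by index, while the iterable-style one can only be consumed in order.

```python
import torch
from torch.utils.data import DataLoader, Dataset, IterableDataset


class SquaresMapStyle(Dataset):
    """Map-style: random access through __len__ and __getitem__."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(idx * idx)


class SquaresIterable(IterableDataset):
    """Iterable-style: samples are produced sequentially by __iter__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield torch.tensor(i * i)


map_ds = SquaresMapStyle(5)
print(map_ds[3])  # tensor(9)
print([b.tolist() for b in DataLoader(SquaresIterable(5), batch_size=2)])
# [[0, 1], [4, 9], [16]]
```

A map-style dataset lets `DataLoader` shuffle by sampling indices, which is why it suits our image dictionary. An iterable-style dataset cannot be shuffled by the loader, and with `num_workers > 0` each worker gets its own copy, so it must shard its stream (for example via `torch.utils.data.get_worker_info()`) to avoid duplicated samples.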

Our train and test dataloaders take their settings (`batch_size`, `num_workers`, and `pin_memory`) from the configuration logged to `wandb`.

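We also create a custom `collate_fn` function to handle the labels per image. The default collate function cannot stack label arrays with different numbers of rows, so `collate_fn` pads the labels of every image in a mini-batch with zero rows up to the largest number of objects in that mini-batch. Since class `0` is the background, padded rows can be recognized and skipped later.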
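The padding idea can be checked in isolation with NumPy alone. The sketch below uses a hypothetical `pad_boxes` helper that preallocates a zero array instead of concatenating per image; the result has the same layout as the padding in `collate_fn`. The `[10, 60, 20, 90, 3]` box is made up for illustration.

```python
import numpy as np


def pad_boxes(box_list):
    """Zero-pad a list of (N_i, 5) label arrays into one (B, max N_i, 5) array.

    Padded rows have class_id 0 (background), so they can be filtered later.
    """
    maxlen = max(len(b) for b in box_list)
    ncols = box_list[0].shape[-1]
    padded = np.zeros((len(box_list), maxlen, ncols), dtype=box_list[0].dtype)
    for i, b in enumerate(box_list):
        padded[i, :len(b)] = b
    return padded


batch = [
    np.array([[310, 445, 104, 443, 1]], dtype=np.float32),
    # second image: one real box and one made-up box
    np.array([[194, 354, 96, 478, 1], [10, 60, 20, 90, 3]], dtype=np.float32),
]
padded = pad_boxes(batch)
print(padded.shape)  # (2, 2, 5)

# boolean mask of real (non-padded) boxes
valid = padded[..., -1] != 0
print(valid)
```

PyTorch also offers `torch.nn.utils.rnn.pad_sequence`, which zero-pads a list of tensors along their first dimension and could serve the same purpose inside a `collate_fn`.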

```python
test_dict, test_classes = label_utils.build_label_dictionary(
    config['test_split'])
train_dict, train_classes = label_utils.build_label_dictionary(
    config['train_split'])


class ImageDataset(torch.utils.data.Dataset):
    def __init__(self, dictionary, transform=None):
        self.dictionary = dictionary
        self.transform = transform
        # cache the image filenames once instead of rebuilding the list
        # on every __getitem__ call
        self.keys = list(dictionary.keys())

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # retrieve the image filename
        key = self.keys[idx]
        # retrieve all bounding boxes of this image
        boxes = self.dictionary[key]
        # open the file as a 3-channel RGB PIL image
        img = Image.open(key).convert("RGB")
        # apply the necessary transforms
        # transforms like crop, resize, normalize, etc
        if self.transform:
            img = self.transform(img)

        # return one image and all of its labels
        return img, boxes


train_split = ImageDataset(train_dict, transforms.ToTensor())
test_split = ImageDataset(test_dict, transforms.ToTensor())

# This is approx 95/5 split
print("Train split len:", len(train_split))
print("Test split len:", len(test_split))

# We do not have a validation split

def collate_fn(batch):
    # largest number of objects in any image of this mini-batch
    maxlen = max(len(x[1]) for x in batch)
    images = []
    boxes = []
    for img, box in batch:
        images.append(img)
        # pad with zero rows (class_id 0 = background) if less than maxlen;
        # keep the original dtype so all label tensors in the batch match
        if len(box) < maxlen:
            box = np.concatenate(
                (box, np.zeros((maxlen - len(box), box.shape[-1]),
                               dtype=box.dtype)),
                axis=0)

        boxes.append(torch.from_numpy(box))

    return torch.stack(images, 0), torch.stack(boxes, 0)


train_loader = DataLoader(train_split,
                          batch_size=config['batch_size'],
                          shuffle=True,
                          num_workers=config['num_workers'],
                          pin_memory=config['pin_memory'],
                          collate_fn=collate_fn)

test_loader = DataLoader(test_split,
                         batch_size=config['batch_size'],
                         shuffle=False,
                         num_workers=config['num_workers'],
                         pin_memory=config['pin_memory'],
                         collate_fn=collate_fn)
```

```
Train split len: 996
Test split len: 51
```


### Visualizing sample data from train split

We visualize sample images from the train split by creating a `wandb` table with a single `Image` column that shows each image together with its objects' bounding boxes. The annotations of each image are stored in a list of dictionaries, one per bounding box, with `position`, `class_id`, `domain` and `box_caption` as keys. Padded rows (class `0`) are skipped. Please check the [`wandb` media documentation](https://docs.wandb.ai/guides/track/log/media) for more details.
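The conversion from padded label rows to the list of box dictionaries does not depend on `wandb` itself, so it can be sketched and tested on plain Python lists. `to_wandb_box_data` is a hypothetical helper, and the `names` dictionary below is a stand-in for `label_utils.index2class`. The same loop also works on a row-wise iterated torch tensor, since `int()` and `float()` accept 0-d tensors.

```python
def to_wandb_box_data(box_rows, index2class):
    """Convert [xmin, xmax, ymin, ymax, class_id] rows into box dicts.

    Rows whose class_id is 0 (background, i.e. padding) are skipped.
    """
    box_data = []
    for xmin, xmax, ymin, ymax, class_id in box_rows:
        class_id = int(class_id)
        if class_id == 0:
            continue
        box_data.append({
            "position": {"minX": float(xmin), "maxX": float(xmax),
                         "minY": float(ymin), "maxY": float(ymax)},
            "domain": "pixel",
            "class_id": class_id,
            "box_caption": index2class(class_id),
        })
    return box_data


# stand-in for label_utils.index2class
names = {1: "Water", 2: "Soda", 3: "Juice"}
rows = [[310, 445, 104, 443, 1], [0, 0, 0, 0, 0]]  # second row is padding
print(to_wandb_box_data(rows, names.get))
```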

```python
# sample one mini-batch
images, boxes = next(iter(train_loader))
# map of label to class name
class_labels = {i: label_utils.index2class(i) for i in train_classes}

run.display(height=1000)
table = wandb.Table(columns=['Image'])

# we use wandb to visualize the objects and bounding boxes
for image, box in zip(images, boxes):
    box_data = []
    for i in range(box.shape[0]):
        # skip padded rows (class_id 0 is the background)
        if box[i, -1] == 0:
            continue
        dict_item = {}
        dict_item["position"] = {
            "minX": box[i, 0].item(),
            "maxX": box[i, 1].item(),
            "minY": box[i, 2].item(),
            "maxY": box[i, 3].item(),
        }
        dict_item["domain"] = "pixel"
        dict_item["class_id"] = int(box[i, 4].item())
        dict_item["box_caption"] = label_utils.index2class(
            dict_item["class_id"])
        box_data.append(dict_item)

    img = wandb.Image(image, boxes={
        "ground_truth": {
            "box_data": box_data,
            "class_labels": class_labels
        }
    })
    table.add_data(img)

wandb.log({"train_loader": table})
wandb.finish()
```





