# TABA 2025 Jan Hands-On AI

Instructor: Zhun-Gee Ong (Dept. of Data & Knowledge Service Engineering, DKU)

## Outline of the day
1. Object Detection with HuggingFace and DETR.
2. Object Segmentation with SAM.
3. PyTorch Tutorial: Custom Dataset

# **DETR for Object Detection**

Paper: https://arxiv.org/abs/2005.12872

DETR (DEtection TRansformers) is a deep learning model for object detection and segmentation that uses a Transformer-based architecture. Unlike traditional convolutional object detectors, DETR leverages the attention mechanism from Transformers to directly predict object positions and classes without relying on anchor boxes or complex post-processing like non-maximum suppression (NMS).

Key Features:

- Transformer-based: It combines a CNN backbone (e.g., ResNet) for feature extraction with a Transformer that captures global context in the image through self-attention.
- Set-based Prediction: DETR treats object detection as a direct set prediction problem, using bipartite matching to pair predictions with ground truth objects, simplifying the training process.
- End-to-end: Its design allows for end-to-end training and inference, making it simpler and more flexible than many traditional methods, though it may require more computational resources for training.
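
To make the set-based prediction idea concrete, here is a toy sketch of the bipartite matching step. It brute-forces the cheapest one-to-one assignment between predictions and ground-truth objects. The cost values are made up for illustration, and the helper name `match_predictions` is hypothetical; DETR itself uses the Hungarian algorithm with a cost that combines class and box terms.

```python
from itertools import permutations


def match_predictions(cost):
    """Return (pairs, total) for the cheapest one-to-one assignment.

    cost[i][j] is the cost of assigning prediction i to ground-truth object j.
    Assumes there are at least as many predictions as ground-truth objects.
    """
    num_preds, num_gt = len(cost), len(cost[0])
    best_pairs, best_total = None, float("inf")
    # Try every way of picking one distinct prediction per ground-truth object.
    for pred_ids in permutations(range(num_preds), num_gt):
        total = sum(cost[p][g] for g, p in enumerate(pred_ids))
        if total < best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return best_pairs, best_total


# Toy cost matrix: 3 predictions (rows) vs. 2 ground-truth objects (columns).
toy_cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.6],
]
pairs, total = match_predictions(toy_cost)
print(pairs, round(total, 2))
```

In this toy case prediction 2 is left unmatched, which in DETR corresponds to a query that is trained towards the "no object" class.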

![DETR](./diagrams/detr.jpeg)

Downsides:

 - High Computational Cost: DETR requires significant computational resources, especially during training, due to its use of Transformers, which rely heavily on self-attention mechanisms. This can make it challenging to train without access to powerful hardware.

 - Slow Convergence: Compared to traditional convolutional object detectors like Faster R-CNN or YOLO, DETR has a slower training process. It often needs more training epochs to achieve competitive accuracy.

 - Difficulty with Small Objects: DETR can struggle with detecting small objects in complex scenes. Its attention mechanism, while powerful for capturing global context, might overlook finer details that are essential for detecting smaller objects.

![DETR Results](./diagrams/detr_compare.jpeg)

## DETR with HuggingFace

Before we start, make sure the packages used later are installed. Here we install the **transformers** library from HuggingFace. Run the command below in your conda environment:

```
pip install transformers
```

In [1]:
pip install transformers


In [2]:
import torch
import random
import cv2

from PIL import Image
import requests

import numpy as np
import matplotlib.pyplot as plt

from transformers import DetrImageProcessor, DetrForObjectDetection, DetrConfig


Awesome! Now we are going to use a sample image from the COCO validation set as the input to the model.

In [3]:
# PRACTICE => get input image 

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
img = Image.open(requests.get(url, stream=True).raw)
plt.imshow(img)  # imshow draws the image; plt.show() does not take an image argument
plt.show()

In [5]:
# PRACTICE => create model
backbone = "facebook/detr-resnet-101"
img_processor = DetrImageProcessor.from_pretrained(
    backbone
)
detr = DetrForObjectDetection.from_pretrained(
    backbone
)
detr = detr.to("cpu")

In [None]:
# PRACTICE => process input image
input_img = img_processor(images=img, return_tensors="pt").to("cpu")

### What is *pixel_mask*?


The pixel_mask is a binary tensor of the same spatial dimensions as the preprocessed image (after resizing). It indicates which pixels in the image are valid (i.e., belong to the actual image) and which are padding pixels.

- Value of 1: a valid pixel (part of the original image).
- Value of 0: a padded pixel (added so that all images in a batch share the same size).
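
The padding logic can be sketched with plain NumPy. This is only an illustration of the idea, assuming that padding is added at the bottom and right of each image; the helper name `pad_batch` is hypothetical and is not part of the HuggingFace API.

```python
import numpy as np


def pad_batch(images):
    """Pad (C, H, W) arrays to a common size; return (pixel_values, pixel_mask)."""
    max_h = max(im.shape[1] for im in images)
    max_w = max(im.shape[2] for im in images)
    channels = images[0].shape[0]
    pixel_values = np.zeros((len(images), channels, max_h, max_w), dtype=np.float32)
    pixel_mask = np.zeros((len(images), max_h, max_w), dtype=np.int64)
    for i, im in enumerate(images):
        _, h, w = im.shape
        pixel_values[i, :, :h, :w] = im  # image sits in the top-left corner
        pixel_mask[i, :h, :w] = 1        # 1 = real pixel, 0 = padding
    return pixel_values, pixel_mask


small = np.ones((3, 2, 3), dtype=np.float32)
large = np.ones((3, 4, 4), dtype=np.float32)
values, mask = pad_batch([small, large])
print(values.shape, mask.shape)
print(mask[0])
```

With a single image in the batch, as in this notebook, no padding is needed and the mask is all ones.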

In [None]:
input_img["pixel_values"].shape

In [None]:
# visualise pixel_values
pv = input_img["pixel_values"].squeeze().permute(1, 2, 0).cpu().numpy()

print(pv.shape)

plt.imshow(pv)

In [8]:
outputs = detr(input_img["pixel_values"])

In [None]:
outputs

source: https://github.com/huggingface/transformers/blob/main/src/transformers/models/detr/modeling_detr.py#L122

Components of `outputs`:

- loss
- loss_dict
- logits: torch.FloatTensor = None
- pred_boxes: torch.FloatTensor = None
- auxiliary_outputs
- last_hidden_state
- decoder_hidden_states
- decoder_attentions
- cross_attentions
- encoder_last_hidden_state
- encoder_hidden_states
- encoder_attentions

1. logits
    - A tensor containing the classification scores for each object query.
    - Each row represents a specific object query, and each column corresponds to a class (plus one column for the "no object" class).
    - Shape:
        
        (batch size, num queries, num classes+1)

        - num queries: number of object queries (default is 100)
        - num classes: number of object categories (91 for COCO dataset)
2. pred_boxes
    - A tensor containing the predicted bounding boxes for each object query.
    - represented in a normalized format:

        [x_center, y_center, w, h], where all values are in the range [0, 1].
    - Shape:
        (batch size, num queries, 4)
        - "4" represents the bounding box format: len(bbox)=4
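
As a small illustration of how one row of `logits` is read, the sketch below applies a softmax, ignores the last ("no object") column, and keeps queries whose best real-class score exceeds a threshold. The logit values and the 3-class setup are made up for the example, and the helper names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def best_class(logits_row):
    """Return (label, score) over the real classes, ignoring the last "no object" column."""
    real = softmax(logits_row)[:-1]
    label = max(range(len(real)), key=real.__getitem__)
    return label, real[label]


# Two toy queries with 3 real classes + 1 "no object" column.
queries = [
    [0.1, 4.0, 0.3, 0.2],  # confident about class 1
    [0.1, 0.2, 0.3, 5.0],  # mostly "no object"
]
threshold = 0.9
kept = []
for query_id, row in enumerate(queries):
    label, score = best_class(row)
    if score > threshold:
        kept.append((query_id, label, score))
print(kept)
```

Only the first query survives the threshold; the second one puts almost all of its probability on the "no object" column.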


In [None]:
print("Shape of the logits:", outputs.logits.shape)
print("Shape of the pred_boxes:", outputs.pred_boxes.shape)

## post_process_object_detection

- A method of the *DetrImageProcessor* class.
- Parameters:
    - outputs: Raw outputs of the DETR model.
    - target_sizes: (height, width) of each image. Predictions will not be resized if unset.
    - threshold: Score threshold to keep object detection predictions.
- Returns:
    - A list of dictionaries.
    - Each dict: scores, labels, bounding boxes.
    - BBox format: (top_left_x, top_left_y, bottom_right_x, bottom_right_y).
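
The box conversion performed during post-processing can be sketched in a few lines. This is an illustrative re-implementation under the formats described above, not the library code; the function name `to_pixel_corners` is hypothetical. Note that the target size is given as (height, width) while box coordinates are x-first.

```python
def to_pixel_corners(box, image_height, image_width):
    """Convert a normalized (x_center, y_center, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    x1 = (cx - w / 2) * image_width
    y1 = (cy - h / 2) * image_height
    x2 = (cx + w / 2) * image_width
    y2 = (cy + h / 2) * image_height
    return x1, y1, x2, y2


# A box centred in a 640x480 (width x height) image.
corners = to_pixel_corners((0.5, 0.5, 0.25, 0.5), image_height=480, image_width=640)
print(corners)
```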

In [None]:
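# post-processing of outputs
# PIL's img.size is (width, height); target_sizes expects (height, width)
target_sizes = torch.tensor([img.size[::-1]])
results = img_processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.9
)[0]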



### Draw bounding boxes on the image.

There are two approaches: one uses the PIL library, the other uses OpenCV (cv2).


In [None]:
# Draw bounding boxes with PIL.

from PIL import ImageDraw, ImageFont
from random import randrange

imgc = img.copy()

draw = ImageDraw.Draw(imgc)
w = 2 #width of outline
font = ImageFont.truetype('FreeMono.ttf', 30)

color = [0, 0, 0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    pred_obj = detr.config.id2label[label.item()]

    tl = [int(i) for i in box[0:2]] #top_left
    br = [int(i) for i in box[2:]] #bottom_right

    x1y1x2y2 = (tl[0], tl[1], br[0], br[1])

    rc = randrange(len(color))
    color[rc] = randrange(255)

    # draw bounding box
    draw.rectangle(x1y1x2y2, outline=tuple(color), width=w)

    # put text
    draw.text(tl, str(pred_obj), font=font, fill=tuple(color))

    print(
        f"Detected {pred_obj} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )

imgc

In [None]:
# Draw bounding boxes with CV2.

import cv2
import matplotlib.pyplot as plt

imgc = np.array(img.copy())
# imgc = cv2.cvtColor(np.array(img.copy()), cv2.COLOR_BGR2RGB)

thickness=2
font=cv2.FONT_HERSHEY_SIMPLEX

color = [0, 0, 0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    pred_obj = detr.config.id2label[label.item()]

    # OpenCV expects points as tuples of ints
    top_left = tuple(int(i) for i in box[0:2])
    bottom_right = tuple(int(i) for i in box[2:])

    rc = random.randrange(len(color))
    color[rc] = 255

    imgc = cv2.rectangle(imgc, top_left, bottom_right, color, thickness)
    imgc = cv2.putText(imgc, str(pred_obj), top_left, font, 1, color, 2)

    print(
        f"Detected {pred_obj} with confidence "
        f"{round(score.item(), 3)} at location {box}"
    )

plt.imshow(imgc)

## Visualizing Attention of the Last Decoder Layer of DETR

This corresponds to visualizing, for each detected object, which part of the image the model was looking at to predict this specific bounding box and class.
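
The core of the visualization is a reshape: each query's cross-attention weights form a flat vector of length height*width over the feature map, which is turned back into a 2D map. Here is a minimal NumPy sketch with random stand-in weights; the sizes and the query id are arbitrary choices for illustration.

```python
import numpy as np

num_heads, num_queries, h, w = 8, 100, 3, 4  # toy sizes; h*w is the flattened feature map
rng = np.random.default_rng(0)

# Fake cross-attention weights with shape (batch, heads, queries, h*w).
attn = rng.random((1, num_heads, num_queries, h * w))
attn /= attn.sum(axis=-1, keepdims=True)  # each query's weights sum to 1

mean_attn = attn.mean(axis=1)  # average over heads -> (batch, queries, h*w)
query_id = 7                   # hypothetical query to inspect
attn_map = mean_attn[0, query_id].reshape(h, w)

# Location on the feature map that this query attends to most.
peak_row, peak_col = np.unravel_index(attn_map.argmax(), attn_map.shape)
print(attn_map.shape, (int(peak_row), int(peak_col)))
```

The cells below do the same with the real weights, taking `h` and `w` from the backbone's feature map.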

In [25]:
# keep only predictions of queries with 0.9+ confidence (excluding no-object class)
probas = outputs.logits.softmax(-1)[0, :, :-1].cpu()
keep = probas.max(-1).values > 0.9

bboxes_scaled = results['boxes'].cpu()

In [26]:
# use lists to store the outputs via up-values
conv_features = []

hooks = [
    detr.model.backbone.conv_encoder.register_forward_hook(
        lambda self, input, output: conv_features.append(output)
    ),
]

# propagate through the model
outputs = detr(**input_img, output_attentions=True)

for hook in hooks:
    hook.remove()

# don't need the list anymore
conv_features = conv_features[0]
# get cross-attention weights of last decoder layer - which is of shape (batch_size, num_heads, num_queries, width*height)
dec_attn_weights = outputs.cross_attentions[-1].cpu()
# average them over the 8 heads and detach from graph
dec_attn_weights = torch.mean(dec_attn_weights, dim=1).detach()

In [None]:
# get the feature map shape
h, w = conv_features[-1][0].shape[-2:]

# colors for visualization
COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],
          [0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]

fig, axs = plt.subplots(ncols=len(bboxes_scaled), nrows=2, figsize=(22, 7))
colors = COLORS * 100
for idx, ax_i, box in zip(keep.nonzero(), axs.T, bboxes_scaled):
    xmin, ymin, xmax, ymax = box.cpu().detach().numpy()
    ax = ax_i[0]
    ax.imshow(dec_attn_weights[0, idx].view(h, w))
    ax.axis('off')
    ax.set_title(f'query id: {idx.item()}')
    ax = ax_i[1]
    ax.imshow(img)
    ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                               fill=False, color='blue', linewidth=3))
    ax.axis('off')
    ax.set_title(detr.config.id2label[probas[idx].argmax().item()])
fig.tight_layout()

# **Segment Anything Model (SAM)**

[Link to paper](https://arxiv.org/abs/2304.02643)

[SAM Playground](https://segment-anything.com/)

[Code reference](https://github.com/facebookresearch/segment-anything/blob/main/notebooks/automatic_mask_generator_example.ipynb)

Segment Anything Model, a.k.a. SAM, is a highly versatile and general-purpose image segmentation model developed by Meta AI. It is designed to "segment anything" in images with minimal user input. The model can identify objects, parts, or regions in an image, enabling a wide range of applications.

Key Features:

- **Generalization**:
    - SAM is trained on a massive dataset of over 1 billion masks.
    - It can segment objects in unseen images and even new domains with minimal fine-tuning.
- **Interactive Segmentation**:
    - Users can guide the model by providing points, bounding boxes, or a mask from a previous prediction to indicate the regions of interest.
- **Multi-modal Input**:
    - The paper explores free-form text prompts as a proof of concept, but the publicly released model and code used in this tutorial accept only points, boxes, and masks.
- **Multiple Outputs**:
    - SAM generates multiple segmentation masks for ambiguous scenarios, offering flexibility to choose the best result.
- **Fast and Efficient**:
    - The heavy image encoder runs once per image; after that, the prompt encoder and mask decoder are lightweight, so masks for new prompts can be produced interactively.

![Segment Anything Model (SAM)](./diagrams/sam_arch.png)

SAM relies on a ViT (Vision Transformer) architecture, which excels in understanding visual data. The model integrates:
    
- Encoder: Extracts rich features from the image.
- Prompt Encoder: Processes user inputs (e.g., points, boxes).
- Mask Decoder: Generates segmentation masks based on the prompts.

## SAM from Meta

First, install the necessary libraries:

```
pip install torch torchvision
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install opencv-python matplotlib
```

In [None]:
# install SAM library from Meta
! pip install git+https://github.com/facebookresearch/segment-anything.git

# Download model checkpoint from the official Github
! wget -P ./sam_weight https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth 
! wget -P ./sam_weight https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
! wget -P ./sam_weight https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

# Download example images (optional)
!wget -P images https://raw.githubusercontent.com/facebookresearch/segment-anything/main/notebooks/images/truck.jpg
!wget -P images https://raw.githubusercontent.com/facebookresearch/segment-anything/main/notebooks/images/groceries.jpg

In [1]:
import torch
import os
import cv2

import numpy as np
import matplotlib.pyplot as plt

from segment_anything import sam_model_registry, SamPredictor

DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [2]:
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30/255, 144/255, 255/255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)
    
def show_points(coords, labels, ax, marker_size=375):
    pos_points = coords[labels==1]
    neg_points = coords[labels==0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)   
    
def show_box(box, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green', facecolor=(0,0,0,0), lw=2)) 

In [None]:
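# read image (OpenCV loads BGR; convert to RGB for matplotlib and SAM)
image = cv2.imread("images/truck.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)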

## Selecting Objects with SAM

In [41]:
'''
SAM model can be loaded with 3 different encoders: ViT-B, ViT-L, and ViT-H.
'''
model_type = "vit_b"  # Options: vit_b, vit_l, vit_h

checkpoints_root = "./sam_weight"
sam_vit_b = os.path.join(checkpoints_root, "sam_vit_b_01ec64.pth")
sam_vit_l = os.path.join(checkpoints_root, "sam_vit_l_0b3195.pth")
sam_vit_h = os.path.join(checkpoints_root, "sam_vit_h_4b8939.pth") # default

if model_type == "vit_b":
    sam_checkpoint = os.path.join(checkpoints_root, "sam_vit_b_01ec64.pth")
elif model_type == "vit_l":
    sam_checkpoint = os.path.join(checkpoints_root, "sam_vit_l_0b3195.pth")
else:
    print("Using vit_h")
    model_type = "vit_h"
    sam_checkpoint = os.path.join(checkpoints_root, "sam_vit_h_4b8939.pth")

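# PRACTICE Load the model
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=DEVICE)
predictor = SamPredictor(sam)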


What does our SAM model look like?

In [42]:
# sam

Process the image to produce an image embedding by calling `SamPredictor.set_image`. SamPredictor remembers this embedding and will use it for subsequent mask prediction.

In [43]:
# set image of predictor
predictor.set_image(image)

To select the truck, choose a point on it. Points are input to the model in (x,y) format and come with labels 1 (foreground point) or 0 (background point). Multiple points can be input; here we use only one. The chosen point will be shown as a star on the image.

In [None]:
input_point = np.array([[500, 375]])
input_label = np.array([1])

plt.figure(figsize=(10,10))
plt.imshow(image)
show_points(input_point, input_label, plt.gca())
plt.axis('on')
plt.show()

Predict with ``SamPredictor.predict``. The model returns masks, quality predictions for those masks, and low resolution mask logits that can be passed to the next iteration of prediction.

In [45]:
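# PRACTICE make prediction with predictor
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)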

- With ``multimask_output=True`` (the default setting), SAM outputs 3 masks, where scores gives the model's own estimation of the quality of these masks.
- This setting is intended for ambiguous input prompts, and helps the model disambiguate different objects consistent with the prompt.
- When False, it will return a single mask. 
- For ambiguous prompts such as a single point, it is recommended to use multimask_output=True even if only a single mask is desired; the best single mask can be chosen by picking the one with the highest score returned in scores. This will often result in a better mask.
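
Picking the best of the candidate masks is a one-line `argmax` over the scores. The sketch below uses synthetic stand-ins shaped like the predictor outputs (three masks of size H x W and three scores); the `_demo` names and values are made up so that the snippet runs without the model.

```python
import numpy as np

# Stand-ins shaped like the predictor outputs: 3 candidate masks and their scores.
masks_demo = np.zeros((3, 4, 6), dtype=bool)
masks_demo[0, :2, :2] = True   # small region
masks_demo[1, :3, :4] = True   # medium region
masks_demo[2, :, :] = True     # whole image
scores_demo = np.array([0.71, 0.96, 0.88])

best_idx = int(np.argmax(scores_demo))  # index of the highest-scoring mask
best_mask = masks_demo[best_idx]
print(best_idx, best_mask.shape, int(best_mask.sum()))
```

The same index is used further down to select the low-resolution logits that are fed back into the predictor as `mask_input`.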

In [None]:
masks.shape  # (number_of_masks) x H x W

In [None]:
for i, (mask, score) in enumerate(zip(masks, scores)):
    plt.figure(figsize=(10,10))
    plt.imshow(image)
    show_mask(mask, plt.gca())
    show_points(input_point, input_label, plt.gca())
    plt.title(f"Mask {i+1}, Score: {score:.3f}", fontsize=18)
    plt.axis('off')
    plt.show()

## Specifying a specific object with additional points

- The single input point is ambiguous, and the model has returned multiple objects consistent with it.
- To obtain a single object, multiple points can be provided. If available, a mask from a previous iteration can also be supplied to the model to aid in prediction.
- When specifying a single object with multiple prompts, a single mask can be requested by setting ``multimask_output=False``.

In [48]:
def vis_by_points(img, masks, points, labels):
    plt.figure(figsize=(10,10))
    plt.imshow(img)
    show_mask(masks, plt.gca())
    show_points(points, labels, plt.gca())
    plt.axis('off')
    plt.show()

# wrap the steps into one helper call (relies on the global `predictor` and `logits`)
def predict_masks(points, labels, scores, multi_mask=True):
    mask_input = logits[np.argmax(scores), :, :]  # Choose the model's best mask
    
    masks, _, _ = predictor.predict(
        point_coords=points,
        point_labels=labels,
        mask_input=mask_input[None, :, :],
        multimask_output=multi_mask,
    )

    return masks

In [49]:
input_point = np.array([[500, 375], [1125, 625]])
input_label = np.array([1, 1])

mask_input = logits[np.argmax(scores), :, :]  # Choose the model's best mask

In [50]:
masks, _, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    mask_input=mask_input[None, :, :],
    multimask_output=False,
)

In [51]:
masks = predict_masks(input_point, input_label, scores, multi_mask=False)

In [None]:
masks.shape

In [None]:
vis_by_points(image, masks, input_point, input_label)

To exclude the car and specify just the window, a background point (with label 0, here shown in red) can be supplied.

In [None]:
input_label = np.array([1, 0])

masks = predict_masks(input_point, input_label, scores, multi_mask=False)

vis_by_points(image, masks, input_point, input_label)

## Specifying a specific object with a box

The model can also take a box as input, provided in xyxy format.

In [None]:
input_box = np.array([425, 600, 700, 875])
# right rear wheel: [425, 600, 700, 875]

masks, _, _ = predictor.predict(
    point_coords=None,
    point_labels=None,
    box=input_box[None, :],
    multimask_output=False,
)

plt.figure(figsize=(10, 10))
plt.imshow(image)
show_mask(masks[0], plt.gca())
show_box(input_box, plt.gca())
plt.axis('off')
plt.show()

## Combining points and boxes

Points and boxes may be combined by passing both types of prompts to the predictor. Here this is used to select just the truck's tire, instead of the entire wheel.

In [None]:
input_box = np.array([425, 600, 700, 875])
input_point = np.array([[575, 750]])
input_label = np.array([0])

masks, _, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False,
)

plt.figure(figsize=(10, 10))
plt.imshow(image)
show_mask(masks[0], plt.gca())
show_box(input_box, plt.gca())
show_points(input_point, input_label, plt.gca())
plt.axis('off')
plt.show()

# PyTorch Tutorial: Custom Dataset

1. Custom Dataset
2. Data Loader

## Custom Dataset

When dealing with deep learning projects, data usually doesn't come in a perfect format. You may have images stored in a specific directory structure, or you might have data stored in CSV files.

In these cases, writing a custom dataset class helps load and process data in a structured way that PyTorch can work with.

A custom dataset in PyTorch can be created by subclassing the **Dataset** class from <mark>**torch.utils.data**</mark>.

import torch
from torch.utils.data import Dataset


class CustomDataset(Dataset):
    def __init__(self, data_folder, label_folder, transform=None):
        """
        Args:
            data_folder (string/path): Path of the folder where the data is stored.
            label_folder (string/path): Path of the folder where the annotation file is stored.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        super().__init__()
        self.data_folder = data_folder
        self.label_folder = label_folder
        self.transform = transform

        self.db = self._load_annotation()

    def _load_annotation(self):
        """
        Define here how the annotation file is read and processed.
        Should return a list of (sample, target) pairs.
        """
        # Placeholder: an empty list keeps __len__ working until real loading is implemented.
        processed_annotation = []

        return processed_annotation
    
    def __len__(self):
        # Return the total number of samples
        return len(self.db)
    
    def __getitem__(self, idx):
        sample, target = self.db[idx]

        if self.transform:
            sample = self.transform(sample)

        return sample, target

Instantiate our **CustomDataset**:

In [None]:
from torchvision import transforms
from torch.utils.data import DataLoader

tfms = transforms.Compose([
    transforms.ToTensor(),
])

my_dataset = CustomDataset(data_folder="./", label_folder="./", transform=tfms)

# dataloader = DataLoader(
#     dataset=my_dataset,
#     batch_size=64,
#     shuffle=True,       # normally True for training set, False for test and validation set
#     sampler=None,       # index of sample that will be used, normally assigned when doing k-fold validation
#     num_workers=2,      # number of subprocesses to use for data loading
#     pin_memory=True     #set to True when using GPU training
# )
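
To see what `batch_size` and `shuffle` do, here is a pure-Python stand-in that needs no PyTorch. It only mimics the index shuffling and batching; the real `DataLoader` additionally collates samples into tensors and can load data in worker processes. `SquaresDataset` and `iterate_batches` are hypothetical names used for illustration.

```python
import random


class SquaresDataset:
    """Toy map-style dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx, idx * idx


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples, mimicking the batch_size and shuffle options."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


batches = list(iterate_batches(SquaresDataset(10), batch_size=4))
print([len(b) for b in batches])

shuffled = list(iterate_batches(SquaresDataset(10), batch_size=4, shuffle=True))
print(shuffled[0])
```

The last batch is smaller because 10 is not a multiple of 4. Shuffling changes the order of the samples but not which samples are visited.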

Let's assume we have a cats vs dogs dataset with the following directory tree:
```
${root_of_the_dataset}
 |--anno
 |  |--train_anno.json
 |  `--val_anno.json
 `--images
    |--train
    |  |-- class_0
    |  |   |-- 01.jpg
    |  |   |-- 02.jpg
    |  |   |-- 03.jpg
    |  |   |-- ...
    |  |-- class_1
    |  |   |-- 01.jpg
    |  |   |-- 02.jpg
    |  |   |-- 03.jpg
    |  |   |-- ...
    `--validation
       |-- class_0
       |   |-- 01.jpg
       |   |-- 02.jpg
       |   |-- 03.jpg
       |   |-- ...
       |-- class_1
       |   |-- 01.jpg
       |   |-- 02.jpg
       |   |-- 03.jpg
       |   |-- ...
```
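
One possible way to index a tree like this is to walk the class folders and collect (image path, label) pairs. The sketch below derives labels from the folder names and ignores the JSON annotation files, since their format is not specified here. The helper name `index_image_folder` is hypothetical, and a small fake tree is created so that the snippet runs without the real dataset.

```python
import os
import tempfile


def index_image_folder(split_dir):
    """Return a sorted list of (image_path, label) pairs for one split."""
    samples = []
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        label = int(class_name.split("_")[-1])  # "class_1" -> 1
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(".jpg"):
                samples.append((os.path.join(class_dir, fname), label))
    return samples


# Build a tiny fake tree with empty files that mirrors the layout above.
with tempfile.TemporaryDirectory() as root:
    for cls in ("class_0", "class_1"):
        os.makedirs(os.path.join(root, "images", "train", cls))
        for name in ("01.jpg", "02.jpg"):
            open(os.path.join(root, "images", "train", cls, name), "wb").close()
    samples = index_image_folder(os.path.join(root, "images", "train"))
    labels = [label for _, label in samples]
    print(len(samples), labels)
```

A list like `samples` can serve as the `self.db` of a dataset class, with the image being read from disk inside `__getitem__`.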

(dataset can be downloaded from: )

We can create the custom dataset class for this cats vs dogs dataset:

In [None]:
# PRACTICE

import os
import cv2
import torch
import numpy as np
import torch.nn as nn

from torch.utils.data import Dataset
from tqdm import tqdm

class CatDog(Dataset):
    def __init__(self, root, one_hot_enc=False, transform=None):
        super().__init__()

        # HAPPY CODING

    def __len__(self):

        # HAPPY CODING

        return 
    
    def _load_anno(self):

        # HAPPY CODING

        return 
    
    def __getitem__(self, idx):
        
        # HAPPY CODING

        return

In [None]:
import torchvision.transforms as transforms

# PRACTICE

tfms = transforms.Compose([
        # HAPPY CODING

])



In [None]:
ONE_HOT_ENC = True

# HAPPY CODING

# train_ds = CatDog()


# val_ds = CatDog()

## Data loader

In [None]:
BATCH_SIZE = 16
SHUFFLE = True
NUM_WORKERS = 2
PIN_MEMORY = True

# HAPPY CODING

train_loader = DataLoader(
    # HAPPY CODING
)

val_loader = DataLoader(
    # HAPPY CODING

)

for img, target in train_loader:
    print("Input image shape: {}".format(img.shape))
    print("Target shape: {}".format(target.shape))