<p>
  <b>AI Lab: Deep Learning for Computer Vision</b><br>
  <b><a href="https://www.wqu.edu/">WorldQuant University</a></b>
</p>

<div class="alert alert-success" role="alert">
  <p>
    <center><b>Usage Guidelines</b></center>
  </p>
  <p>
    This file is licensed under <a href="https://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International</a>.
  </p>
  <p>
    You <b>can</b>:
    <ul>
      <li><span style="color: green">✓</span> Download this file</li>
      <li><span style="color: green">✓</span> Post this file in public repositories</li>
    </ul>
    You <b>must always</b>:
    <ul>
      <li><span style="color: green">✓</span> Give credit to <a href="https://www.wqu.edu/">WorldQuant University</a> for the creation of this file</li>
      <li><span style="color: green">✓</span> Provide a <a href="https://creativecommons.org/licenses/by-nc-nd/4.0/">link to the license</a></li>
    </ul>
    You <b>cannot</b>:
    <ul>
      <li><span style="color: red">✗</span> Create derivatives or adaptations of this file</li>
      <li><span style="color: red">✗</span> Use this file for commercial purposes</li>
    </ul>
  </p>
  <p>
    Failure to follow these guidelines is a violation of your terms of service and could lead to your expulsion from WorldQuant University and the revocation of your certificate.
  </p>
</div>

### Getting Started

Let's import the packages we'll need in this notebook.  Most are familiar.  We will need version 2 of the `torchvision.transforms` module here.  Its API differs slightly from the version 1 API we've used previously, but the two are quite similar.

In [None]:
import pathlib
import sys

import matplotlib.pyplot as plt
import torch
import torchinfo
import torchvision
import ultralytics
from PIL import Image
from torchvision.transforms import v2
from ultralytics import YOLO

In case we want to reproduce this notebook in the future, we'll record the version information. 

In [None]:
print("Platform:", sys.platform)
print("Python version:", sys.version)
print("---")
print("matplotlib version:", plt.matplotlib.__version__)
print("PIL version:", Image.__version__)
print("torch version:", torch.__version__)
print("torchvision version:", torchvision.__version__)
print("ultralytics version:", ultralytics.__version__)

These are the classes of the Dhaka AI data set we've seen before.

In [None]:
CLASS_DICT = dict(
    enumerate(
        [
            "ambulance",
            "army vehicle",
            "auto rickshaw",
            "bicycle",
            "bus",
            "car",
            "garbagevan",
            "human hauler",
            "minibus",
            "minivan",
            "motorbike",
            "pickup",
            "policecar",
            "rickshaw",
            "scooter",
            "suv",
            "taxi",
            "three wheelers (CNG)",
            "truck",
            "van",
            "wheelbarrow",
        ]
    )
)

print("CLASS_DICT type:", type(CLASS_DICT))
CLASS_DICT

### Data Augmentation in Our YOLO Model

In the previous notebook, we passed our training images to the YOLO model and let it do its thing.  The obvious assumption to make is that these images would be used as is, but it turns out not to be so.  To demonstrate what was happening, we'll load the model back up and poke around inside of it a bit.

Let's start by finding a saved version of the model.  This cell should show all of the training runs that have been completed.

In [None]:
runs_dir = pathlib.Path("runs", "detect")
list(runs_dir.glob("train*"))

<div class="alert alert-info" role="alert">
If you don't see anything listed here, go back and run the previous notebook all the way through!
</div>

**Task 3.5.1:** Choose a training run, and check that there are model weights saved in the `weights/best.pt` file for that run.

In [None]:
run_dir = ...
weights_file = ...

print("Weights file exists?", weights_file.exists())

**Task 3.5.2:** Load the model from the weights file.

In [None]:
model = ...

torchinfo.summary(model)

We need to get the model set up to load the data.  The easiest way to do that is to train it for an epoch.

<div class="alert alert-info" role="alert">
When you call <code>.train()</code> on a YOLO model, it sets up a data loader, if it doesn't already exist.  Unfortunately, there's no easy way to trigger that set-up step without doing an epoch of training. 😔
</div>

**Task 3.5.3:** Run one epoch of training.

In [None]:
result = model.train(
    data=model.overrides["data"],
    epochs=...,
    batch=8,
    workers=1,
)

The model should now have a `.trainer` attribute, which has a `.train_loader` attribute.  This will be a `DataLoader` that loads the training data.

**Task 3.5.4:** Save this data loader to variable `loader`.

In [None]:
loader = ...

print(type(loader))

Data loaders are _iterables_.  That is, you can put them in a `for` loop to load data one batch at a time.  We just want to read one batch from it, though.

**Task 3.5.5:** Load one batch from `loader` into the variable `batch`.  You can do this by constructing a `for` loop over `loader` and putting a `break` statement inside the loop, so that it only runs once.

In [None]:
...

print(type(batch))

<div class="alert alert-info" role="alert">
A more advanced way to accomplish this same thing is: <code>batch = next(iter(loader))</code>
</div>
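
To see why the two approaches give the same result, here is a minimal sketch that uses a plain Python list as a stand-in for the data loader.  The name `fake_loader` and its contents are made up for illustration; a real YOLO loader yields dictionaries of tensors instead.

```python
# Stand-in for a DataLoader: any iterable of batches behaves the same way.
fake_loader = [{"img": "batch 0"}, {"img": "batch 1"}, {"img": "batch 2"}]

# Option 1: loop over the iterable and stop after the first pass.
for first_from_loop in fake_loader:
    break

# Option 2: build an iterator and ask it for a single item.
first_from_next = next(iter(fake_loader))

print(first_from_loop)
print(first_from_loop == first_from_next)
```

With a real loader that shuffles its data, running either version again may return a different batch.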

We get back a dictionary. (What a surprise!)  Let's explore what's in this structure.

**Task 3.5.6:** Print out the keys in `batch`.

In [None]:
print(...)

**Task 3.5.7:** Print out the shape of the `img` value.

In [None]:
print(...)

The dimension of 3 represents the color channels.  The two dimensions of 640 are the height and width.  So what does the dimension of 8 represent?

You can get a clue by reviewing the call to `model.train`.  We set a batch size of 8.  This tensor thus represents eight training images.

**Task 3.5.8:** Print out the shape of the `bboxes` value.

In [None]:
print(...)


That seems like a lot of bounding boxes for one image, so these must be the boxes for all of the images in the batch.

<div class="alert alert-info" role="alert">
The exact number of bounding boxes will depend on the random batch that got delivered to you.  If you re-run the cell that creates <code>batch</code>, you'll find that you get another number here.
</div>

The image index in the batch that the box corresponds to is given in the `batch_idx` value.
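
As a rough illustration of how an index column like this ties boxes back to images, the sketch below uses plain Python lists with made-up values.  `boxes_for_image` is a hypothetical helper, not part of Ultralytics; with tensors, the same selection is usually written as a boolean mask such as `bboxes[batch_idx == index]`.

```python
# Hypothetical values: five boxes spread across a batch of three images.
batch_idx = [0, 0, 1, 2, 2]
bboxes = [
    [0.50, 0.50, 0.20, 0.10],
    [0.25, 0.75, 0.10, 0.30],
    [0.60, 0.40, 0.30, 0.30],
    [0.10, 0.10, 0.05, 0.05],
    [0.90, 0.80, 0.10, 0.20],
]


def boxes_for_image(index):
    """Return the boxes whose batch index matches the requested image."""
    return [box for i, box in zip(batch_idx, bboxes) if i == index]


print(boxes_for_image(0))  # the two boxes that belong to image 0
print(len(boxes_for_image(2)))  # how many boxes belong to image 2
```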

In [None]:
print(batch["batch_idx"])

Thus, we can select the bounding boxes for a particular image in a batch by finding the rows that correspond to a particular batch index value.  This is implemented for us in the following function, which will plot the bounding boxes on top of the image.

In [None]:
def plot_with_bboxes(img, bboxes, cls, batch_idx=None, index=0, **kw):
    """Plot the bounding boxes on an image.

    Input:  img     The image, either as a 3-D tensor (one image) or a
                    4-D tensor (a stack of images).  In the latter case,
                    the index argument specifies which image to display.
            bboxes  The bounding boxes, as a N x 4 tensor, in normalized
                    XYWH format.
            cls     The class indices associated with the bounding boxes
                    as a N x 1 tensor.
            batch_idx   The index of each bounding box within the stack of
                        images.  Ignored if img is 3-D.
            index   The index of the image in the stack to be displayed.
                    Ignored if img is 3-D.
            **kw    All other keyword arguments are accepted and ignored.
                    This allows you to use dictionary unpacking with the
                    values produced by a YOLO DataLoader.
    """
    if img.ndim == 3:
        image = img[None, :]
        index = 0
        batch_idx = torch.zeros((len(cls),))
    elif img.ndim == 4:
        # Get around Black / Flake8 disagreement
        indp1 = index + 1
        image = img[index:indp1, :]

    inds = batch_idx == index
    res = ultralytics.utils.plotting.plot_images(
        images=image,
        batch_idx=batch_idx[inds] - index,
        cls=cls[inds].flatten(),
        bboxes=bboxes[inds],
        names=CLASS_DICT,
        threaded=False,
        save=False,
    )

    return Image.fromarray(res)

**Task 3.5.9:** Plot the image and bounding boxes for index 0 of this batch.

In [None]:
plot_with_bboxes(...)

That's ... weird looking.  It's not what our original images look like, is it?

<div class="alert alert-info" role="alert">
If it's not weird looking, try looking at another index.  Eventually you'll find something weird looking!
</div>

The file names from the batch are stored in the `im_file` key.  We can use that to look up the original image associated with this index and see what it looks like.

**Task 3.5.10:** Display the original image file for this index.

In [None]:
Image.open(...)

Comparing the two, we can see that the original image was distorted and combined with other images before being loaded into the YOLO model.  The YOLO model applies a number of augmentation steps by default.  (You can take a look at [all of the augmentation settings](https://docs.ultralytics.com/modes/train/#augmentation-settings-and-hyperparameters) in YOLO.)  This increases the diversity of training images, which should help the model generally.
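
As a sketch of how these settings can be adjusted, augmentation options are passed as keyword arguments to `model.train`.  The argument names below (`mosaic`, `fliplr`, `degrees`, `hsv_v`) are taken from the Ultralytics training settings as we understand them, and the values are illustrative only.  Check the linked documentation for the current names and defaults before relying on them.

```python
# Illustrative augmentation overrides (assumed values, not tuned settings).
augmentation_overrides = {
    "mosaic": 0.0,  # probability of combining four images into one; 0 turns it off
    "fliplr": 0.5,  # probability of a left-right flip
    "degrees": 10.0,  # maximum random rotation, in degrees
    "hsv_v": 0.4,  # maximum fractional change in brightness (value)
}

# With a loaded model, these would be passed along with the usual arguments:
# model.train(data=model.overrides["data"], **augmentation_overrides)
for name, value in augmentation_overrides.items():
    print(f"{name} = {value}")
```

Turning `mosaic` off in this way would be one way to check whether the stitched-together look of the batch image comes from that setting.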

### Data Augmentation with Torchvision

If you're training a YOLO model, it's generally best to use the built-in augmentation setting.  But in other cases, you may need to implement an augmentation system yourself.  Torchvision makes this easy by providing a number of augmentation transforms in its transforms version 2 (v2) module.

To demonstrate this, we'll load a sample image.  The code below will get the file paths for `01.jpg` and its associated label file.  (It's written so that it works whether the image ended up in the training or validation split.)

In [None]:
yolo_base = pathlib.Path("data_yolo")
sample_fn = next((yolo_base / "images").glob("*/01.jpg"))
sample_labels = next((yolo_base / "labels").glob("*/01.txt"))

print(sample_fn)
print(sample_labels)

**Task 3.5.11:** Load the image with PIL.

In [None]:
sample_image = ...

sample_image

**Task 3.5.12:** Convert the image to a tensor.  In the transforms version 2 module, this can be done with the confusingly-named `ToImage` transform.

In [None]:
sample_torch = ...

print(sample_torch.shape)

The bounding boxes are stored in the label file.  Let's take a look at the first five lines to remember what it looks like.

In [None]:
!head -n5 $sample_labels

Each line represents a bounding box.  The first element is the class index.  This is followed by the _x_ and _y_ coordinates of the box center, the width, and the height.
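
To make the layout concrete, here is a minimal sketch that pulls apart a single label line.  The line itself is made up for illustration and is not taken from the data set.

```python
# Hypothetical label line in the layout described above:
# class index, center x, center y, width, height.
line = "4 0.512 0.634 0.120 0.250"

# Splitting on whitespace gives five strings.
fields = line.split()

# The first field is an integer class index; the rest are fractions of the image size.
class_index = int(fields[0])
x_center, y_center, width, height = (float(value) for value in fields[1:])

print(fields)
print(class_index, x_center, y_center, width, height)
```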

**Task 3.5.13:** Load the bounding box data into a variable named `label_data`.  It should be a list of the bounding boxes.  Each bounding box will itself be a list of five strings in the same order they are in the file.  Don't worry about converting the strings to numbers yet.

In [None]:
# Load the data into `label_data`

label_data[:5]

**Task 3.5.14:** Create a tensor containing the class indices.  For compatibility with our plotting function it should be a $N\times 1$ tensor.

In [None]:
classes = ...

print("Tensor shape:", classes.shape)
print("First 5 elements:\n", classes[:5])

**Task 3.5.15:** Load the bounding box coordinates into a $N\times 4$ tensor.

In [None]:
bboxes = ...

print("Tensor shape:", bboxes.shape)
print("First 5 elements:\n", bboxes[:5])

All of these coordinates are normalized by the width or height, as appropriate.  This won't work with transformations like rotation, which need the same units used on each axis.
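
To make the units issue concrete, consider a box that is square when measured in pixels.  The image size and box dimensions below are made up for illustration.

```python
# Hypothetical image size and a box that is 100 x 100 pixels.
image_width, image_height = 1280, 720
box_width_px, box_height_px = 100, 100

# Normalizing divides x-like values by the image width and y-like values by the height.
box_width_norm = box_width_px / image_width
box_height_norm = box_height_px / image_height
print(box_width_norm, box_height_norm)

# Multiplying by the same factors recovers the pixel values.
print(box_width_norm * image_width, box_height_norm * image_height)
```

The normalized width and height differ even though the box is square, because each axis is measured against a different length.  A rotation mixes the two axes, so it only makes sense once both are in the same units.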

**Task 3.5.16:** Convert the bounding box coordinates to pixel units.

In [None]:
sample_width, sample_height = sample_image.size

scale_factor = ...

bboxes_pixels = bboxes * scale_factor

print("Tensor shape:", bboxes_pixels.shape)
print("First 5 elements:\n", bboxes_pixels[:5])

In order for the transformations to know how to transform the bounding boxes, they need to know that the coordinates represent the centers and dimensions of the boxes.  This is done by creating a special `BoundingBoxes` tensor.  This type has a `format` attribute.  By setting this to `"CXCYWH"`, we're telling it that the columns represent the Center *X* coordinate, the Center *Y* coordinate, the Width, and the Height.  This tensor also is given the size of the image, so it doesn't need to look that up for transformations.
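
The format matters because the same four numbers mean different things under different conventions.  As a sketch, the hypothetical helper below converts one box from center and size (CXCYWH) to corner coordinates (what Torchvision calls `"XYXY"`).  The box values are made up, and the function only illustrates the arithmetic; it is not Torchvision's own conversion routine.

```python
def cxcywh_to_xyxy(cx, cy, w, h):
    """Convert center/size coordinates to (x_min, y_min, x_max, y_max)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A box centered at (640, 180) that is 128 pixels wide and 144 pixels tall.
print(cxcywh_to_xyxy(640.0, 180.0, 128.0, 144.0))
```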

In [None]:
bboxes_tv = torchvision.tv_tensors.BoundingBoxes(
    bboxes_pixels,
    format="CXCYWH",
    # Yes, that's right.  Despite using width x height everywhere
    # else, here we have to specify the image size as height x
    # width.
    canvas_size=(sample_height, sample_width),
)

print("Tensor type:", type(bboxes_tv))
print("First 5 elements:\n", bboxes_tv[:5])

Let's double check that we did all of those conversions correctly.  Do the bounding boxes line up with the correct objects?

In [None]:
plot_with_bboxes(sample_torch, bboxes_tv, classes)

If everything looks good, we'll introduce some transformations.  The first one will be a horizontal flip.  Many everyday objects have bilateral symmetry (or nearly so), so a flipped image will still have the same object classes in it.  This makes a horizontal flip a good data augmentation transformation.

(In contrast, up/down symmetry is less common.  A vertical flip is generally not as useful, unless you need to recognize upside-down objects.)

The transforms version 2 module has a `RandomHorizontalFlip` transformation.  This takes the probability of a flip as an argument.

**Task 3.5.17:** Use the `RandomHorizontalFlip` transformation to flip the sample image.  Set `p=1` to ensure that the flip happens.

In [None]:
flipped = ...

plot_with_bboxes(flipped, bboxes_tv, classes)

The image has flipped, but the bounding boxes are still in their original locations.  Note that the bus is now in the bottom left of the image.  Its bounding box is still at the bottom right, and it now contains some asphalt, planters and trees.  If we fed this into a model, it would make the model worse, by confusing it as to what a bus looks like.

So, we need to transform the bounding box coordinates consistently with the image transformation.  The Torchvision version 2 transformations can take multiple arguments.  They perform the same transformation on all of the arguments, returning a transformed version of each.  They also understand how to correctly transform `BoundingBoxes` tensors, according to their format.
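
For a horizontal flip, the arithmetic behind the box update is simple enough to sketch by hand.  `hflip_cxcywh` below is a hypothetical helper for a box in pixel units and CXCYWH format.  It illustrates the idea rather than reproducing Torchvision's implementation, and the box and image width are made-up values.

```python
def hflip_cxcywh(box, image_width):
    """Mirror a (cx, cy, w, h) box left-to-right within an image of the given width."""
    cx, cy, w, h = box
    # Only the center x coordinate moves; the size and center y are unchanged.
    return (image_width - cx, cy, w, h)


# A box near the right edge of a 1280-pixel-wide image...
box = (1000.0, 600.0, 200.0, 100.0)
flipped_box = hflip_cxcywh(box, 1280)
print(flipped_box)  # ...ends up near the left edge.

# Flipping twice returns the original box.
print(hflip_cxcywh(flipped_box, 1280) == box)
```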

**Task 3.5.18:** Use `RandomHorizontalFlip` to flip both the sample image and its bounding boxes.  Check that they line up correctly now.

In [None]:
flipped, flipped_bboxes = ...

plot_with_bboxes(flipped, flipped_bboxes, classes)

**Task 3.5.19:** Apply the `RandomRotation` transformation.  This takes an argument of the maximum number of degrees to rotate the image.  Set it to 90.

In [None]:
rotated, rotated_bboxes = ...

plot_with_bboxes(rotated, rotated_bboxes, classes)

Multiple augmentation techniques can be chained together to produce even more diversity in the training images.  Within Torchvision, this can be accomplished with the `Compose` transform.
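
The idea behind chaining can be sketched in plain Python.  The `compose` helper below is a hypothetical stand-in, not Torchvision's own class, and the two steps operate on a short list of pixel values rather than an image.

```python
def compose(steps):
    """Return a function that applies each step in order."""

    def run(value):
        for step in steps:
            value = step(value)
        return value

    return run


def flip(row):
    """Reverse a row of pixel values (a one-dimensional 'horizontal flip')."""
    return row[::-1]


def brighten(row):
    """Add a fixed offset to every pixel, capped at 255."""
    return [min(pixel + 50, 255) for pixel in row]


pipeline = compose([flip, brighten])
print(pipeline([0, 100, 230]))  # [255, 150, 50]
```

Each step receives the output of the one before it, so the order of the list is the order in which the steps run.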

**Task 3.5.20:** Create an augmentation pipeline that combines the `RandomHorizontalFlip` with the `RandomRotation`.  This time, set the probability of the flip to 50%.

In [None]:
transforms = v2.Compose(
    [
        ...
    ]
)

transformed, transformed_bboxes = ...

plot_with_bboxes(transformed, transformed_bboxes, classes)

There are a large number of transformations that can be used for data augmentation.  Scroll through [the documentation](https://pytorch.org/vision/stable/transforms.html#v2-api-reference-recommended) to get a view of the range of possibilities.

In addition to the transforms we've already used, note:
- [`RandomResizedCrop`](https://pytorch.org/vision/stable/generated/torchvision.transforms.v2.RandomResizedCrop.html#torchvision.transforms.v2.RandomResizedCrop) will randomly crop the image down, and then it resizes the output to a set dimension.
- [`ColorJitter`](https://pytorch.org/vision/stable/generated/torchvision.transforms.v2.ColorJitter.html#torchvision.transforms.v2.ColorJitter) can randomly adjust the brightness, contrast, saturation, and hue of the image, within specified ranges.

**Task 3.5.21:** Create an augmentation pipeline that applies several of these transformations.  Choose reasonable values for the parameters.  Check that the bounding boxes are transformed correctly through this.

<div class="alert alert-info" role="alert">
There's no right answer.  A good choice of augmentations depends heavily on the problem you're trying to model.
</div>

In [None]:
transforms = ...

transformed, transformed_bboxes = ...
plot_with_bboxes(transformed, transformed_bboxes, classes)

You can run the transformation several times to see the different types of images that result.  This greater diversity of training images will help models learn to generalize instead of memorizing during training.

---
This file &#169; 2024 by [WorldQuant University](https://www.wqu.edu/) is licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/).