# YOLO V1 implementation
In this notebook, YOLOv1 is implemented based on the original [paper](https://arxiv.org/pdf/1506.02640v5.pdf).

TODO: 
* [ ] Rewrite the plot function so that it shows the class name and the predicted probability for each bounding box
* [ ] Get more metrics from the training function (e.g. training and validation losses)
* [ ] Write a function that plots the training and validation loss as well as the training and validation accuracy
* [ ] Use the model on the videos that I have from the towing tank to see how well the algorithm performs
* [ ] Use images/videos with darker light conditions to train and test the model.

# How Yolo works
YOLO is an object detection algorithm that uses features learned by a CNN to detect objects. When performing object detection, we want to correctly identify and localise the objects in a given image. Most classic object detection approaches use a sliding window, where a classifier is run at evenly spaced locations over the entire image; Deformable Parts Models (DPM) work this way. R-CNN instead uses region proposal methods to generate candidate bounding boxes in the image and then runs a classifier on the proposed boxes. These approaches, DPM in particular, are slow and poorly suited to real-time use; the improved versions of R-CNN gain some speed by strategically selecting interesting regions and running the classifier only on them.

YOLO, on the other hand, is based on the idea of splitting the image into a grid. For example, a given image can be split into a 3 by 3 grid (**_SxS = 3x3_**), which gives us 9 cells. As the image below shows, each of the 9 cells predicts 2 bounding boxes (**_B_**), from which the final predicted bounding box for the object in the image is obtained.

![image](../notes/images/image.png) 
    
Figure 1

Generally, the YOLO algorithm has the following steps:

1. Divide the image into cells with an **_SxS_** grid
2. Each cell predicts **_B_** bounding boxes (_a cell is responsible for detecting an object if the midpoint of the object's bounding box falls inside the cell_)
3. Return bounding boxes above a given confidence threshold. _Boxes whose confidence is below the threshold are rejected, and among overlapping boxes for the same object only the one with the highest confidence (e.g. 0.90) is shown_.

**Note:** In practice we would like to use larger values of $S$ and $B$, such as $S = 19$ and $B = 5$, to identify more objects. Each cell outputs a prediction with a corresponding bounding box for a given image.

The image below shows the result of the YOLO algorithm, which returns the bounding boxes for the detected objects. The model needs to be trained sufficiently to perform well, because detection accuracy generally improves over the training epochs. A bounding box can also span more than one cell without any issue; the detection is assigned to the cell that contains the midpoint of the bounding box.
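The midpoint rule can be sketched in a few lines. The helper name `responsible_cell` and the assumption that midpoints are normalised to the range 0 to 1 relative to the whole image are illustrative choices, not something taken from the paper's code.

```python
def responsible_cell(x_mid, y_mid, S=3):
    """Return (row, col) of the grid cell that contains a box midpoint.

    Assumes x_mid and y_mid are normalised to [0, 1] relative to the image.
    """
    col = int(S * x_mid)
    row = int(S * y_mid)
    # Guard against a midpoint lying exactly on the right/bottom image edge.
    return min(row, S - 1), min(col, S - 1)


# A box centred at (0.5, 0.2) in a 3x3 grid belongs to the top-middle cell.
print(responsible_cell(0.5, 0.2))  # (0, 1)
```

However large the box is, only this one cell is asked to predict it.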

![image](../notes/images/image2.png)

Figure 2

The YOLO object detection algorithm is a faster architecture because it uses a single Convolutional Neural Network (CNN) pass to process the whole image. This contrasts with approaches such as DPM and R-CNN, where the algorithm needs to scan each image step by step, or region by region, to find the regions of interest that contain the objects. R-CNN, for example, needs to classify around 2000 regions per image, which makes it very time consuming and not ideal for real-time applications.

The figure below shows how the YOLO model creates an $S \times S$ grid on the input image, then produces multiple bounding boxes and a class probability map for the grid cells, and finally gives the predictions for the objects in the image.

![image](../notes/images/yolo_paper.png "YOLO model image processing")

Figure 3

## How are the bounding boxes encoded in YOLO?
One of the most important aspects of this algorithm is the way it builds and specifies the bounding boxes; the other is the loss function. The algorithm uses five components to predict an output:

1. The centre of a bounding box $(b_x, b_y)$ relative to the bounds of the grid cell
2. The width $(b_w)$
3. The height $(b_h)$. In the original paper the width and the height are expressed relative to the entire image.
4. The class of the object $(c)$
5. The prediction confidence $(p_c)$, which is the probability that an object exists within the bounding box.

Ideally, we want exactly one bounding box for each object in the given image. This is ensured by assigning each object to a single cell, the one that contains the object's midpoint, and making that cell responsible for outputting the object.

So, instead of the corner coordinates $[x_1, y_1, x_2, y_2]$, each bounding box in the YOLO algorithm is described by $[x, y, w, h]$:

* $x$ and $y$ are the coordinates of the object midpoint within the cell, so they always lie between $0$ and $1$
* $w$ and $h$ are the width and the height of the object relative to the cell (in this notebook), so $w$ can be _greater_ than 1 if the object is wider than the cell, and $h$ can be _greater_ than 1 if the object is taller than the cell
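The encoding above can be sketched as a small function. It mirrors the arithmetic used later in `LabDataset.__getitem__`; the function name `encode_box` and the example values are illustrative assumptions.

```python
def encode_box(x, y, w, h, S=7):
    """Convert an image-relative box (values in [0, 1)) to cell-relative form.

    Returns (row, col, x_cell, y_cell, w_cell, h_cell).
    """
    row, col = int(S * y), int(S * x)
    x_cell, y_cell = S * x - col, S * y - row  # offset inside the cell, in [0, 1)
    w_cell, h_cell = S * w, S * h              # size in units of one cell
    return row, col, x_cell, y_cell, w_cell, h_cell


row, col, x_cell, y_cell, w_cell, h_cell = encode_box(0.5, 0.5, 0.4, 0.1, S=7)
print(row, col)      # 3 3
print(w_cell > 1.0)  # True: the box is wider than a single cell
```

A box that covers 40% of the image width spans almost three cells of a 7x7 grid, which is why $w$ and $h$ are not bounded by 1 in this encoding.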

The labels will look like the following:

$label_{cell} = [c_1, c_2, ..., c_5, p_c, x, y, w,h]$

where:

* $c_1$ to $c_5$ will be the dataset classes
* $p_c$ probability that there is an object (1 or 0)
* $x, y, w,h$ are the coordinates of the bounding boxes


Predictions will look very similar, but each cell will output two bounding boxes, which can specialise in different box shapes (e.g. wide vs tall).

$pred_{cell} = [c_1, c_2, ..., c_5, p_{c_1}, x_1, y_1, w_1, h_1, p_{c_2}, x_2, y_2, w_2, h_2]$

**Note:** A cell can only detect one object, which is one of YOLO's limitations (a finer grid allows more detections, as mentioned above).

This is for every cell and the **target** shape for one image will be $(S, S, 10)$

where:

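* $S * S$ is the grid size
* $5$ is for the class predictions, $1$ is for the probability score, and $4$ is for the bounding box

The **predictions** shape will be $(S, S, 15)$, where there is an additional probability score and four extra bounding box values for the second box. Note that the dataset class later in this notebook allocates the target with the same 15 entries per cell and leaves the last five at zero, so the target and prediction tensors have matching shapes.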
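As a rough illustration of the per-cell layout, the sketch below builds one label vector in plain Python. The function name `make_cell_label` is hypothetical; padding the vector to `C + 5 * B` entries follows what `LabDataset` does with its `label_matrix`.

```python
def make_cell_label(class_index, box, C=5, B=2):
    """Build [c_1..c_C, p_c, x, y, w, h] padded with zeros to C + 5 * B entries."""
    vec = [0.0] * (C + 5 * B)
    vec[class_index] = 1.0        # one-hot class label
    vec[C] = 1.0                  # p_c: an object is present in this cell
    vec[C + 1:C + 5] = list(box)  # x, y, w, h (cell-relative)
    return vec


label = make_cell_label(2, (0.5, 0.5, 2.8, 0.7))
print(len(label))  # 15
print(label[:10])  # the 10 meaningful target entries
```

The last five entries stay at zero in the target; only the prediction fills them with the second box.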

## The model architecture
![image](../notes/images/model.png)

The original YOLO model consists of 24 convolutional layers followed by 2 fully connected layers.
The model accepts 448x448 images. The first layer has a 7x7 kernel with 64 output filters and a stride of 2 (**a padding of 3 is also needed to match the dimensions**), followed by a 2x2 max-pool layer with a stride of 2. Similarly, the rest of the model consists of convolutional and max-pool layers, except for the last two layers, which are fully connected. The first fully connected layer takes the flattened convolutional output and maps it to a 4096-dimensional feature vector; the second maps that to an output that is reshaped to 7 x 7 x 30, where $S = 7$ is the final grid size (a $7$ x $7$ grid) and 30 is the length of the per-cell output vector (in my case this will be 15).

To help with building the architecture, it is useful to define the architecture configuration up front:

```python
architecture_config = [
    # Tuple: (kernel_size, num_filters, stride, padding)
    (7, 64, 2, 3), 
    "M",    # M stands for the MaxPooling layer, with kernel 2x2 and stride 2x2
    (3, 192, 1, 1),
    "M",
    (1, 128, 1, 0),
    (3, 256, 1, 1), 
    (1, 256, 1, 0), 
    (3, 512, 1, 1), 
    "M",
    # List of tuples: (kernel_size, num_filters, stride, padding), num_of_repeats
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0), 
    (3, 1024, 1, 1),
    "M", 
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2], 
    (3, 1024, 1, 1),
    (3, 1024, 2, 1),
    (3, 1024, 1, 1), 
    (3, 1024, 1, 1),
]
```
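One way to sanity-check this configuration is to trace the spatial size through it without building the model. The sketch below uses the standard convolution output-size formula, `floor((size + 2 * padding - kernel) / stride) + 1`; the helper names `conv_out` and `trace` are illustrative, and the configuration is repeated so the block runs on its own.

```python
architecture_config = [
    (7, 64, 2, 3), "M",
    (3, 192, 1, 1), "M",
    (1, 128, 1, 0), (3, 256, 1, 1), (1, 256, 1, 0), (3, 512, 1, 1), "M",
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0), (3, 1024, 1, 1), "M",
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2],
    (3, 1024, 1, 1), (3, 1024, 2, 1), (3, 1024, 1, 1), (3, 1024, 1, 1),
]


def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(config, size=448):
    """Return (final spatial size, final channels, number of conv layers)."""
    channels, n_conv = 3, 0
    for item in config:
        if isinstance(item, tuple):
            k, c, s, p = item
            size, channels, n_conv = conv_out(size, k, s, p), c, n_conv + 1
        elif isinstance(item, str):    # "M": 2x2 max pool with stride 2
            size //= 2
        elif isinstance(item, list):   # [conv, conv, repeats]
            *convs, repeats = item
            for _ in range(repeats):
                for k, c, s, p in convs:
                    size, channels = conv_out(size, k, s, p), c
                    n_conv += 1
    return size, channels, n_conv


print(trace(architecture_config))  # (7, 1024, 24)
```

The trace ends at a 7x7 map with 1024 channels after 24 convolutional layers, which matches the grid size $S = 7$ and the layer count quoted from the paper.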

## The Loss Function

The YOLO loss function is the second most important aspect of the algorithm. The basic idea is that every term is a sum-squared error. The first term is the loss for the box midpoint coordinates: the squared difference between the target $x$ and the predicted $\hat{x}$ (and likewise for $y$). $\mathbb{1}_{ij}^{obj}$ is an indicator function that is non-zero only when there is an object in the cell. In summary:

* $\mathbb{1}_{i}^{obj}$ is 1 when there is an object in cell $i$, otherwise it is 0.
* $\mathbb{1}_{ij}^{obj}$ is 1 when the $j^{th}$ bounding box predictor in cell $i$ is responsible for the prediction, otherwise it is 0.
* $\mathbb{1}_{ij}^{noobj}$ follows the same concept as the previous one, except that it is 1 when there is no object and 0 when there is an object.

To decide which predicted bounding box is responsible for an object, we look at the cell and check which of its predicted boxes has the highest Intersection over Union (IoU) with the target bounding box. The box with the highest IoU becomes the responsible one and is the one sent to the loss function.
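A minimal pure-Python sketch of this selection step is given here. The notebook itself imports `intersection_over_union` from `utils`; the functions `iou_midpoint` and `responsible_box` are stand-ins written for illustration and are not that implementation.

```python
def iou_midpoint(box_a, box_b):
    """IoU of two boxes given as (x_mid, y_mid, width, height)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_box(pred_boxes, target_box):
    """Index of the predicted box with the highest IoU against the target."""
    ious = [iou_midpoint(p, target_box) for p in pred_boxes]
    return max(range(len(ious)), key=ious.__getitem__)


target = (0.5, 0.5, 1.0, 1.0)
preds = [(0.5, 0.5, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
print(responsible_box(preds, target))  # 1
```

In `YoloLoss` the same decision is made for all cells at once with `torch.max` over the two IoU tensors.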

\begin{align}
&\lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{obj}[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 ] \\&+ \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{obj}[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2 +(\sqrt{h_i}-\sqrt{\hat{h}_i})^2 ]\\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{obj}(C_i - \hat{C}_i)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{noobj}(C_i - \hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj}\sum_{c \in classes}(p_i(c) - \hat{p}_i(c))^2 \\
\end{align}
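The square roots on $w$ and $h$ in the second term make the same absolute size error count more for a small box than for a large one. The short example below illustrates this with made-up box widths; the function name `size_error` is hypothetical.

```python
import math


def size_error(true_size, pred_size, use_sqrt=True):
    """Squared error on a box dimension, optionally on square roots as in the loss."""
    if use_sqrt:
        return (math.sqrt(true_size) - math.sqrt(pred_size)) ** 2
    return (true_size - pred_size) ** 2


# The same absolute error of 0.05 on a small box and on a large box.
small = size_error(0.10, 0.15)
large = size_error(0.90, 0.95)
print(small > large)  # True: the small box is penalised more
```

Without the square root both cases would contribute practically the same error, even though a 0.05 deviation matters far more for the small box.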




# Algorithm Implementation


In [8]:
# imports
import os
import logging
import time  # needed for the timestamped training log file name
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torch.optim as optim
import matplotlib.pyplot as plt

from PIL import Image
from tqdm import tqdm
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

In [9]:
# Get the correct path for utils.py script
if os.getcwd() == '/media/ioannis/DATA/Documents/Machine_learning/Project/src/yolo_v1':
    print(f"The working directory is: {os.getcwd()}")
else:    
    os.chdir("../src/yolo_v1/")
    print(f"Change to yolo dir: {os.getcwd()}")

Change to yolo dir: /media/ioannis/DATA/Documents/Machine_learning/Project/src/yolo_v1


In [10]:
from utils import (
    intersection_over_union,
    non_max_suppression,
    mean_average_precision,
    cellboxes_to_boxes,
    get_bboxes,
    plot_image,
    save_checkpoint,
    load_checkpoint
)

### YOLO model architecture

#### Architecture configuration based on YOLO paper

In [11]:
architecture_config = [
    (7, 64, 2, 3), 
    "M",    # M stands for the MaxPooling layer, with kernel 2x2 and stride 2x2
    (3, 192, 1, 1),
    "M",
    (1, 128, 1, 0),
    (3, 256, 1, 1), 
    (1, 256, 1, 0), 
    (3, 512, 1, 1), 
    "M", 
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0), 
    (3, 1024, 1, 1),
    "M", 
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2], 
    (3, 1024, 1, 1),
    (3, 1024, 2, 1),
    (3, 1024, 1, 1), 
    (3, 1024, 1, 1),
]

#### The YOLO Architecture
The CNNBlock class is used as a building block for the various convolutional layers in the YoloV1 class, which is the main model.

In [12]:
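class CNNBlock(nn.Module):
    """
    Building block for the conv layers of the YoloV1 model:
    Conv2d (no bias) -> BatchNorm2d -> LeakyReLU(0.1).
    The same pattern is repeated many times, so it is wrapped in one module.

    Args:
        in_channels (int): number of input channels
        out_channels (int): number of output channels (filters)
        **kwargs: forwarded to nn.Conv2d (kernel_size, stride, padding)
    """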
    def __init__(self, in_channels, out_channels, **kwargs):
        super(CNNBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.leakyrelu = nn.LeakyReLU(0.1)

    def forward(self, x):
        x = self.leakyrelu(self.batchnorm(self.conv(x)))
        return x
    
class YoloV1(nn.Module):
    def __init__(self, in_channels=3, **kwargs):
        super(YoloV1, self).__init__()
        self.architecture = architecture_config
        self.in_channels = in_channels
        self.darknet = self._create_conv_layers(self.architecture)
        self.fcs = self._create_fcs(**kwargs)

    def forward(self, x):
        x = self.darknet(x)
        return self.fcs(torch.flatten(x, start_dim=1))

    def _create_conv_layers(self, architecture):
        layers = []
        in_channels = self.in_channels

        for x in architecture:
            if type(x) == tuple:
                layers += [
                    CNNBlock(
                        in_channels, x[1], kernel_size=x[0], stride=x[2], padding=x[3],
                    )
                ]
                in_channels = x[1]

            elif type(x) == str:
                layers += [nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))]

            elif type(x) == list:
                conv1 = x[0]
                conv2 = x[1]
                num_repeats = x[2]

                for _ in range(num_repeats):
                    layers += [
                        CNNBlock(
                            in_channels,
                            conv1[1],
                            kernel_size=conv1[0],
                            stride=conv1[2],
                            padding=conv1[3],
                        )
                    ]
                    layers += [
                        CNNBlock(
                            conv1[1],
                            conv2[1],
                            kernel_size=conv2[0],
                            stride=conv2[2],
                            padding=conv2[3],
                        )
                    ]
                    in_channels = conv2[1]

        return nn.Sequential(*layers)

    def _create_fcs(self, split_size, num_boxes, num_classes):
        S, B, C = split_size, num_boxes, num_classes

        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(1024 * S * S, 496),  # the paper uses 4096 here; 496 gives a smaller layer
            nn.Dropout(0.0),
            nn.LeakyReLU(0.1),
            nn.Linear(496, S * S * (C + B * 5)),  # reshaped later to (S, S, C + B * 5); 15 for C=5, B=2
        )

In [13]:
def test(S=7, B=2, C=5):
    """
    A function to test YoloV1 model
    """
    model = YoloV1(split_size=S, num_boxes=B, num_classes=C)
    x = torch.randn((2, 3, 448, 448))
    print(model(x).shape)


test()

torch.Size([2, 735])


In [39]:
# YOLO V1 model summary

from torchsummary import summary
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
S=7
B=2 
C=5
model = YoloV1(split_size=S, num_boxes=B, num_classes=C).to(DEVICE)
summary(model, (3, 448, 448))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 224, 224]           9,408
       BatchNorm2d-2         [-1, 64, 224, 224]             128
         LeakyReLU-3         [-1, 64, 224, 224]               0
          CNNBlock-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 192, 112, 112]         110,592
       BatchNorm2d-7        [-1, 192, 112, 112]             384
         LeakyReLU-8        [-1, 192, 112, 112]               0
          CNNBlock-9        [-1, 192, 112, 112]               0
        MaxPool2d-10          [-1, 192, 56, 56]               0
           Conv2d-11          [-1, 128, 56, 56]          24,576
      BatchNorm2d-12          [-1, 128, 56, 56]             256
        LeakyReLU-13          [-1, 128, 56, 56]               0
         CNNBlock-14          [-1, 128,

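So, if we manually calculate the size of the output vector for one image we get:

\begin{align}
S * S * (B * 5 + C) 
= 7 * 7 * (2 * 5 + 5) = 49 * 15 = 735
\end{align}

**Note:** 15 is the length of the per-cell prediction vector (5 class scores plus 2 boxes with 5 values each); it is not related to the number of channels in the image.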
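To see how the flat 735-element output maps back onto the grid, the sketch below decodes a flat index into `(row, col, channel)`. It assumes the row-major layout that `predictions.reshape(-1, S, S, C + B * 5)` in `YoloLoss` relies on; the function name `unflatten_index` is illustrative.

```python
def unflatten_index(flat_index, S=7, D=15):
    """Map an index into the flat S * S * D output to (row, col, channel)."""
    cell, channel = divmod(flat_index, D)  # which cell, and which entry in it
    row, col = divmod(cell, S)             # which grid row and column
    return row, col, channel


print(7 * 7 * 15)            # 735
print(unflatten_index(0))    # (0, 0, 0)
print(unflatten_index(734))  # (6, 6, 14)
```

Each consecutive run of 15 values in the flat output therefore belongs to one grid cell, with cells ordered row by row.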

### Code implementation of Yolo loss

In [14]:
class YoloLoss(nn.Module):
    def __init__(self, S=7, B=2, C=5):
        super(YoloLoss, self).__init__()
        self.mse = nn.MSELoss(reduction="sum")
        self.S = S
        self.B = B
        self.C = C
        self.lambda_noobj = 0.5
        self.lambda_coord = 5

    def forward(self, predictions, target):
        predictions = predictions.reshape(
            -1,  self.S, self.S, self.C + self.B * 5
            )
        iou_b1 = intersection_over_union(predictions[..., 6:10], target[..., 6:10])
        iou_b2 = intersection_over_union(predictions[..., 11:15], target[..., 6:10])
        ious = torch.cat([iou_b1.unsqueeze(0), iou_b2.unsqueeze(0)], dim=0)

        iou_maxes, best_box = torch.max(ious, dim=0)
        exists_box = target[..., 5].unsqueeze(3)  # Iobj_i identity_of_object_i

        # For Box Coordinates
        box_predictions = exists_box * (
            (
                best_box * predictions[..., 11:15]
                + (1 - best_box) * predictions[..., 6:10]
             )
        )
        box_targets = exists_box * target[..., 6:10]

        box_predictions[..., 2:4] = torch.sign(box_predictions[..., 2:4]) * torch.sqrt(
            torch.abs(box_predictions[..., 2:4] + 1e-6))

        box_targets[..., 2:4] = torch.sqrt(box_targets[..., 2:4])

        box_loss = self.mse(
            torch.flatten(box_predictions, end_dim=-2),
            torch.flatten(box_targets, end_dim=-2),
        )

        # For Object Loss
        pred_box = (
            best_box *
            predictions[..., 10:11] + (1 - best_box) *
            predictions[..., 5:6]
        )

        object_loss = self.mse(
            torch.flatten(exists_box * pred_box),
            torch.flatten(exists_box * target[..., 5:6])
        )

        # For no Object Loss
        # (N, S, S, 1) -> (N, S*S)
        no_object_loss = self.mse(
            torch.flatten((1 - exists_box) * predictions[..., 5:6], start_dim=1),
            torch.flatten((1 - exists_box) * target[..., 5:6], start_dim=1)
        )

        no_object_loss += self.mse(
            torch.flatten((1 - exists_box) * predictions[..., 10:11], start_dim=1),
            torch.flatten((1 - exists_box) * target[..., 5:6], start_dim=1)
        )

        # For Class Loss
        # (N, S, S, 5) -> (N*S*S, 5)
        class_loss = self.mse(
            torch.flatten(exists_box * predictions[..., :5], end_dim=-2,),
            torch.flatten(exists_box * target[..., :5], end_dim=-2,),
        )

        loss = (
            self.lambda_coord * box_loss    # First two rows of loss in paper
            + object_loss
            + self.lambda_noobj * no_object_loss
            + class_loss
        )

        return loss

### Dataset Class for custom dataset from Labs

In [15]:
class LabDataset(torch.utils.data.Dataset):
    def __init__(
            self, csv_file, img_dir, label_dir, S=7, B=2, C=5, transform=None
    ):
        self.annotations = pd.read_csv(csv_file)
        self.img_dir = img_dir
        self.label_dir = label_dir
        self.transform = transform
        self.S = S
        self.B = B
        self.C = C

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        label_path = os.path.join(self.label_dir,
                                  self.annotations.iloc[index, 1])
        boxes = []
        # read the labels in the yolo annotations and append to boxes list
        with open(label_path) as f:
            for label in f.readlines():
                # print(label)
                class_label, x, y, width, height = label.split()

                boxes.append([int(class_label), float(x), float(y), float(width), float(height)])
        # print(len(boxes))
        # read the images of the dataset
        img_path = os.path.join(self.img_dir,
                                self.annotations.iloc[index, 0])
        image = Image.open(img_path)
        boxes = torch.tensor(boxes)

        if self.transform:
            image, boxes = self.transform(image, boxes)

        label_matrix = torch.zeros((self.S, self.S, self.C + 5 * self.B))

        for box in boxes:
            class_label, x, y, width, height = box.tolist()
            class_label = int(class_label)
            i, j = int(self.S * y), int(self.S * x)
            x_cell, y_cell = self.S * x - j, self.S * y - i

            width_cell, height_cell = (
                width * self.S,
                height * self.S,
            )

            if label_matrix[i, j, 5] == 0:
                label_matrix[i, j, 5] = 1

                box_coordinates = torch.tensor(
                    [x_cell, y_cell, width_cell, height_cell]
                )

                label_matrix[i, j, 6:10] = box_coordinates

                # Set one hot encoding for class_labels
                label_matrix[i, j, class_label] = 1

        return image, label_matrix

In [16]:
# use the logging module to create a training log file
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
file_Handler = logging.FileHandler(f'{time.strftime("%Y_%m_%d")}_training.log', mode="w")
logger.addHandler(file_Handler)
logger.info("******" * 20)

# to get the same dataset loading each time 
seed = 123
torch.manual_seed(seed)

# Setup the Hyperparameters

LEARNING_RATE = 2E-5
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE = 4
WEIGHT_DECAY = 0
EPOCHS = 100
NUM_WORKERS = 4
PIN_MEMORY = True
LOAD_MODEL = False
LOAD_MODEL_FILE = "model_training/overfit.pth.tar"
IMG_DIR = "dataset/images"
LABEL_DIR = "dataset/labels"


class Compose(object):
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img, bboxes):
        for t in self.transforms:
            img, bboxes = t(img), bboxes
        
        return img, bboxes


transform = Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor()
    ])


def train_fn(train_loader, model, optimizer, loss_fn):
    loop = tqdm(train_loader, leave=True)
    mean_loss = []

    for batch_idx, (x, y) in enumerate(loop):
        x, y = x.to(DEVICE), y.to(DEVICE)
        out = model(x)
        loss = loss_fn(out, y)
        mean_loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Update the progress bar
        loop.set_postfix(loss=loss.item())
    
    print(f"Mean loss was {sum(mean_loss) / len(mean_loss)}")


def main():
    model = YoloV1(split_size=7, num_boxes=2, num_classes=5).to(DEVICE)
    optimizer = optim.Adam(
        model.parameters(), lr=LEARNING_RATE,
        weight_decay=WEIGHT_DECAY
    )
    loss_fn = YoloLoss()

    if LOAD_MODEL:
        load_checkpoint(torch.load(LOAD_MODEL_FILE), model, optimizer)  

    train_dataset = LabDataset(
        "dataset/train.csv",
        transform=transform,
        img_dir=IMG_DIR,
        label_dir=LABEL_DIR,
    )

    test_dataset = LabDataset(
        "dataset/test.csv",
        transform=transform,
        img_dir=IMG_DIR,
        label_dir=LABEL_DIR,
    )

    train_loader = DataLoader(
        dataset=train_dataset,
        batch_size=BATCH_SIZE,
        num_workers=NUM_WORKERS,
        pin_memory=PIN_MEMORY,
        shuffle=True,
        drop_last=True
    )

    test_loader = DataLoader(
        dataset=test_dataset,
        batch_size=BATCH_SIZE,
        num_workers=NUM_WORKERS,
        pin_memory=PIN_MEMORY,
        shuffle=True,
        drop_last=True
    )

    for epoch in range(EPOCHS):
        # for x, y in train_loader:
        #     x = x.to(DEVICE)
        #     for idx in range(8):
        #         bboxes = cellboxes_to_boxes(model(x))
        #         bboxes = non_max_suppression(bboxes[idx], iou_threshold=0.5, threshold=0.4, box_format="midpoint")
        #         plot_image(x[idx].permute(1,2,0).to("cpu"), bboxes)
        # import sys
        # sys.exit()

        pred_boxes, target_boxes = get_bboxes(
            train_loader, model, iou_threshold=0.5, threshold=0.4
        )
        mean_average_prec = mean_average_precision(
            pred_boxes, target_boxes, iou_threshold=0.5, box_format="midpoint"
        )

        print(f"Train mAP: {mean_average_prec}")

        # if mean_average_prec > 0.9:
        #     checkpoint = {
        #         "state_dict": model.state_dict(),
        #         "optimizer": optimizer.state_dict(),
        #     }
        #     save_checkpoint(checkpoint, filename=LOAD_MODEL_FILE)
        #     import time
        #     time.sleep(10)

        train_fn(train_loader, model, optimizer, loss_fn)

    for x, y in train_loader:
        x = x.to(DEVICE)
        # train_len = open("dataset/train.csv", "r").readlines()
        for idx in range(4):
            bboxes = cellboxes_to_boxes(model(x))
            bboxes = non_max_suppression(bboxes[idx], iou_threshold=0.5, threshold=0.4, box_format="midpoint")
            plot_image(x[idx].permute(1, 2, 0).to("cpu"), bboxes)

In [10]:
if __name__ == "__main__":
    main()

### Results


#### Training mean Average Precision

```bash
torch.Size([2, 735])
Train mAP: 0.0
100%|██████████| 52/52 [00:06<00:00,  7.53it/s, loss=230]
Mean loss was 277.01525966937726
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 1.5975972473825095e-06
100%|██████████| 52/52 [00:06<00:00,  7.73it/s, loss=125]
Mean loss was 150.3601137307974
Train mAP: 0.00036260823253542185
100%|██████████| 52/52 [00:06<00:00,  7.59it/s, loss=131]
Mean loss was 111.0760328586285
Train mAP: 0.0010389338713139296
100%|██████████| 52/52 [00:06<00:00,  7.56it/s, loss=67.1]
Mean loss was 93.82369613647461
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.0017443771939724684
100%|██████████| 52/52 [00:07<00:00,  7.35it/s, loss=101]
Mean loss was 85.7502593260545
Train mAP: 0.0019065936794504523
100%|██████████| 52/52 [00:06<00:00,  7.59it/s, loss=73.2]
Mean loss was 79.8317461013794
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.0021070390939712524
100%|██████████| 52/52 [00:06<00:00,  7.66it/s, loss=42.9]
Mean loss was 74.16719363285945
Train mAP: 0.008167828433215618
100%|██████████| 52/52 [00:06<00:00,  7.66it/s, loss=71]
Mean loss was 68.89766018207257
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.01164398156106472
100%|██████████| 52/52 [00:06<00:00,  7.49it/s, loss=59.6]
Mean loss was 65.96115589141846
Train mAP: 0.022016068920493126
100%|██████████| 52/52 [00:06<00:00,  7.59it/s, loss=60.7]
Mean loss was 63.24265480041504
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.026762772351503372
100%|██████████| 52/52 [00:06<00:00,  7.63it/s, loss=49.2]
Mean loss was 59.42209089719332
Train mAP: 0.03686966747045517
100%|██████████| 52/52 [00:06<00:00,  7.46it/s, loss=70.4]
Mean loss was 57.88046492063082
Train mAP: 0.04176099970936775
100%|██████████| 52/52 [00:06<00:00,  7.50it/s, loss=69.4]
Mean loss was 53.74426863743709
Train mAP: 0.06215560436248779
100%|██████████| 52/52 [00:06<00:00,  7.48it/s, loss=47.8]
Mean loss was 53.232022505540115
Train mAP: 0.04161568731069565
100%|██████████| 52/52 [00:06<00:00,  7.63it/s, loss=66.5]
Mean loss was 51.478044509887695
Train mAP: 0.05706929415464401
100%|██████████| 52/52 [00:06<00:00,  7.64it/s, loss=71.5]
Mean loss was 49.621550193199745
Train mAP: 0.09943243861198425
100%|██████████| 52/52 [00:07<00:00,  6.60it/s, loss=39.7]
Mean loss was 46.71747504747831
Train mAP: 0.09798908233642578
100%|██████████| 52/52 [00:06<00:00,  7.59it/s, loss=37.2]
Mean loss was 44.66095953721266
Train mAP: 0.09688045084476471
100%|██████████| 52/52 [00:07<00:00,  7.17it/s, loss=39.6]
Mean loss was 46.553726746485786
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.10760778188705444
100%|██████████| 52/52 [00:06<00:00,  7.56it/s, loss=48.2]
Mean loss was 45.06020501943735
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.13196484744548798
100%|██████████| 52/52 [00:07<00:00,  7.27it/s, loss=29.7]
Mean loss was 42.605860196627106
Train mAP: 0.12919731438159943
100%|██████████| 52/52 [00:07<00:00,  7.09it/s, loss=41]
Mean loss was 43.64498409858117
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.12473483383655548
100%|██████████| 52/52 [00:07<00:00,  7.26it/s, loss=37.6]
Mean loss was 42.67212941096379
Train mAP: 0.15012900531291962
100%|██████████| 52/52 [00:07<00:00,  7.35it/s, loss=40.9]
Mean loss was 41.00279015761156
Train mAP: 0.18024876713752747
100%|██████████| 52/52 [00:07<00:00,  7.01it/s, loss=36.3]
Mean loss was 41.815959233504074
Train mAP: 0.19966743886470795
100%|██████████| 52/52 [00:07<00:00,  7.22it/s, loss=33.7]
Mean loss was 41.42074548281156
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.14009423553943634
100%|██████████| 52/52 [00:07<00:00,  7.42it/s, loss=38.7]
Mean loss was 40.43391271737906
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.20363807678222656
100%|██████████| 52/52 [00:07<00:00,  7.23it/s, loss=25.4]
Mean loss was 36.65179105905386
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.23210477828979492
100%|██████████| 52/52 [00:07<00:00,  7.42it/s, loss=35.6]
Mean loss was 36.59433174133301
Train mAP: 0.20036259293556213
100%|██████████| 52/52 [00:07<00:00,  7.26it/s, loss=45.4]
Mean loss was 37.1617332972013
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.24710893630981445
100%|██████████| 52/52 [00:06<00:00,  7.44it/s, loss=59.9]
Mean loss was 36.56079688439002
Train mAP: 0.252134770154953
100%|██████████| 52/52 [00:06<00:00,  7.52it/s, loss=23.9]
Mean loss was 34.84693479537964
Train mAP: 0.3164646029472351
100%|██████████| 52/52 [00:07<00:00,  7.13it/s, loss=27.4]
Mean loss was 32.70728085591243
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.27682849764823914
100%|██████████| 52/52 [00:07<00:00,  6.77it/s, loss=22.3]
Mean loss was 33.125311888181244
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.31891489028930664
100%|██████████| 52/52 [00:06<00:00,  7.52it/s, loss=37.4]
Mean loss was 31.009997001061073
Train mAP: 0.2684682011604309
100%|██████████| 52/52 [00:07<00:00,  7.38it/s, loss=26.1]
Mean loss was 29.65498792208158
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.31201815605163574
100%|██████████| 52/52 [00:06<00:00,  7.47it/s, loss=26.2]
Mean loss was 29.540525986598087
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.2407427281141281
100%|██████████| 52/52 [00:06<00:00,  7.48it/s, loss=35.6]
Mean loss was 33.021992500011734
Train mAP: 0.3038192093372345
100%|██████████| 52/52 [00:07<00:00,  6.93it/s, loss=25.7]
Mean loss was 31.74266558427077
Train mAP: 0.3210943639278412
100%|██████████| 52/52 [00:06<00:00,  7.58it/s, loss=26.2]
Mean loss was 29.95376906028161
Train mAP: 0.35638627409935
100%|██████████| 52/52 [00:07<00:00,  7.20it/s, loss=34.3]
Mean loss was 27.2260375389686
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.33736562728881836
100%|██████████| 52/52 [00:07<00:00,  7.14it/s, loss=17.3]
Mean loss was 28.47225623864394
Train mAP: 0.3314797282218933
100%|██████████| 52/52 [00:07<00:00,  7.34it/s, loss=21.2]
Mean loss was 27.52292165389428
Train mAP: 0.38283103704452515
100%|██████████| 52/52 [00:06<00:00,  7.60it/s, loss=29.2]
Mean loss was 26.119591272794285
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.41640034317970276
100%|██████████| 52/52 [00:07<00:00,  6.87it/s, loss=33.2]
Mean loss was 29.60625433921814
Train mAP: 0.25621533393859863
100%|██████████| 52/52 [00:06<00:00,  7.52it/s, loss=30.1]
Mean loss was 32.28092266963078
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.3336578905582428
100%|██████████| 52/52 [00:07<00:00,  7.30it/s, loss=18.4]
Mean loss was 26.53524373127864
Train mAP: 0.3782862722873688
100%|██████████| 52/52 [00:06<00:00,  7.60it/s, loss=20.1]
Mean loss was 24.900437575120193
Train mAP: 0.38781529664993286
100%|██████████| 52/52 [00:07<00:00,  7.16it/s, loss=30.8]
Mean loss was 23.54118515894963
Train mAP: 0.39370396733283997
100%|██████████| 52/52 [00:07<00:00,  6.68it/s, loss=23.4]
Mean loss was 22.866925588020912
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.41854867339134216
100%|██████████| 52/52 [00:07<00:00,  7.42it/s, loss=14.1]
Mean loss was 23.564331861642692
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.38859686255455017
100%|██████████| 52/52 [00:07<00:00,  7.30it/s, loss=21.6]
Mean loss was 23.251168507796066
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.4097711145877838
100%|██████████| 52/52 [00:07<00:00,  7.24it/s, loss=25.2]
Mean loss was 22.88079210428091
Train mAP: 0.48224368691444397
100%|██████████| 52/52 [00:06<00:00,  7.68it/s, loss=30.7]
Mean loss was 30.80921220779419
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.32274675369262695
100%|██████████| 52/52 [00:07<00:00,  7.25it/s, loss=24.8]
Mean loss was 28.22428868367122
Train mAP: 0.3558574318885803
100%|██████████| 52/52 [00:07<00:00,  7.35it/s, loss=21.9]
Mean loss was 26.428694394918587
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.3741680681705475
100%|██████████| 52/52 [00:06<00:00,  7.47it/s, loss=25.3]
Mean loss was 26.91001536295964
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.41832470893859863
100%|██████████| 52/52 [00:06<00:00,  7.58it/s, loss=16.7]
Mean loss was 23.46789888235239
Train mAP: 0.4454154372215271
100%|██████████| 52/52 [00:07<00:00,  7.31it/s, loss=21.8]
Mean loss was 19.80866953042837
Train mAP: 0.5059086084365845
100%|██████████| 52/52 [00:07<00:00,  7.14it/s, loss=24]
Mean loss was 17.913883154208843
Train mAP: 0.5745978355407715
100%|██████████| 52/52 [00:07<00:00,  7.35it/s, loss=13.1]
Mean loss was 17.709768258608303
Train mAP: 0.5389236211776733
100%|██████████| 52/52 [00:07<00:00,  6.70it/s, loss=17.3]
Mean loss was 18.873461264830368
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.4622669219970703
100%|██████████| 52/52 [00:07<00:00,  7.14it/s, loss=22.2]
Mean loss was 17.93864067701193
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.4740144610404968
100%|██████████| 52/52 [00:07<00:00,  6.87it/s, loss=15.5]
Mean loss was 17.754800374691303
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.47399455308914185
100%|██████████| 52/52 [00:07<00:00,  6.78it/s, loss=17.3]
Mean loss was 18.328510302763718
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.5266884565353394
100%|██████████| 52/52 [00:07<00:00,  7.18it/s, loss=16.2]
Mean loss was 19.317802851016705
Train mAP: 0.5007422566413879
100%|██████████| 52/52 [00:07<00:00,  7.07it/s, loss=17.1]
Mean loss was 18.273959544988777
Train mAP: 0.47520798444747925
100%|██████████| 52/52 [00:07<00:00,  7.07it/s, loss=13.7]
Mean loss was 18.052292035176205
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.5374414920806885
100%|██████████| 52/52 [00:07<00:00,  7.09it/s, loss=11.7]
Mean loss was 18.70733952522278
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.3883427679538727
100%|██████████| 52/52 [00:07<00:00,  7.14it/s, loss=14.2]
Mean loss was 18.25417786378127
Train mAP: 0.5039719939231873
100%|██████████| 52/52 [00:07<00:00,  7.23it/s, loss=10.5]
Mean loss was 15.796511576725887
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.5351071357727051
100%|██████████| 52/52 [00:07<00:00,  7.17it/s, loss=14.7]
Mean loss was 16.732610097298256
Train mAP: 0.45163068175315857
100%|██████████| 52/52 [00:07<00:00,  7.28it/s, loss=10.7]
Mean loss was 19.227208871107834
Train mAP: 0.45331859588623047
100%|██████████| 52/52 [00:07<00:00,  7.28it/s, loss=11.6]
Mean loss was 17.978595880361702
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.5259262919425964
100%|██████████| 52/52 [00:07<00:00,  7.01it/s, loss=26.4]
Mean loss was 20.94123187431922
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.4709042012691498
100%|██████████| 52/52 [00:07<00:00,  6.90it/s, loss=14.5]
Mean loss was 30.559716609808113
Train mAP: 0.43308621644973755
100%|██████████| 52/52 [00:07<00:00,  7.35it/s, loss=13.3]
Mean loss was 26.60452835376446
Train mAP: 0.41244402527809143
100%|██████████| 52/52 [00:07<00:00,  7.01it/s, loss=19]
Mean loss was 21.87913021674523
Train mAP: 0.4528992772102356
100%|██████████| 52/52 [00:07<00:00,  6.90it/s, loss=9.83]
Mean loss was 18.8553817822383
Train mAP: 0.481926828622818
100%|██████████| 52/52 [00:07<00:00,  6.57it/s, loss=26.3]
Mean loss was 16.028261413941017
Train mAP: 0.5580487847328186
100%|██████████| 52/52 [00:07<00:00,  6.71it/s, loss=12.4]
Mean loss was 15.23406903560345
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.5300213694572449
100%|██████████| 52/52 [00:07<00:00,  7.27it/s, loss=10.8]
Mean loss was 13.7139373926016
Train mAP: 0.5464785099029541
100%|██████████| 52/52 [00:07<00:00,  6.68it/s, loss=11.6]
Mean loss was 14.1117734175462
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.579340934753418
100%|██████████| 52/52 [00:07<00:00,  7.08it/s, loss=12.6]
Mean loss was 15.100121534787691
Train mAP: 0.5814196467399597
100%|██████████| 52/52 [00:07<00:00,  7.15it/s, loss=11.3]
Mean loss was 16.984538490955646
Train mAP: 0.5180677771568298
100%|██████████| 52/52 [00:07<00:00,  6.94it/s, loss=10]
Mean loss was 16.059333617870625
Train mAP: 0.5292803049087524
100%|██████████| 52/52 [00:07<00:00,  7.15it/s, loss=18.4]
Mean loss was 13.16787363932683
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.586297869682312
100%|██████████| 52/52 [00:07<00:00,  7.27it/s, loss=8.94]
Mean loss was 12.104050205304073
Train mAP: 0.5984959602355957
100%|██████████| 52/52 [00:07<00:00,  6.96it/s, loss=8.55]
Mean loss was 11.733054436170137
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.5529853701591492
100%|██████████| 52/52 [00:07<00:00,  7.02it/s, loss=9.72]
Mean loss was 11.902309912901659
Train mAP: 0.5623416304588318
100%|██████████| 52/52 [00:07<00:00,  6.95it/s, loss=6.27]
Mean loss was 12.17488286128411
Train mAP: 0.5263509154319763
100%|██████████| 52/52 [00:07<00:00,  7.16it/s, loss=20.2]
Mean loss was 11.95502089537107
Train mAP: 0.5501671433448792
100%|██████████| 52/52 [00:07<00:00,  7.16it/s, loss=8.3]
Mean loss was 11.710243546045744
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.583227813243866
100%|██████████| 52/52 [00:07<00:00,  7.02it/s, loss=26.3]
Mean loss was 12.27673394863422
Train mAP: 0.5810926556587219
100%|██████████| 52/52 [00:07<00:00,  6.90it/s, loss=13.2]
Mean loss was 11.398397807891552
Train mAP: 0.5858425498008728
100%|██████████| 52/52 [00:07<00:00,  7.23it/s, loss=11]
Mean loss was 9.952778064287626
Train mAP: 0.6241357922554016
100%|██████████| 52/52 [00:07<00:00,  7.12it/s, loss=15.2]
Mean loss was 11.054220538872938
Train mAP: 0.5691715478897095
100%|██████████| 52/52 [00:07<00:00,  7.10it/s, loss=12.4]
Mean loss was 14.413500584088839
Train mAP: 0.5697323083877563
100%|██████████| 52/52 [00:07<00:00,  6.99it/s, loss=9.85]
Mean loss was 17.656728909565853
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.4619218707084656
100%|██████████| 52/52 [00:07<00:00,  7.17it/s, loss=19.4]
Mean loss was 16.712135039843044
```
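The log above already contains the per-epoch metrics that the TODO list asks for, so they can be recovered from the printed text without re-running training. The following is a minimal sketch, assuming the log keeps the exact `Train mAP: ...` and `Mean loss was ...` line formats shown above; `parse_training_log` is a hypothetical helper, not part of the notebook.

```python
import re

# Patterns for the two metric lines printed by the training loop above.
MAP_RE = re.compile(r"^Train mAP:\s*([0-9.eE+-]+)$")
LOSS_RE = re.compile(r"^Mean loss was\s*([0-9.eE+-]+)$")


def parse_training_log(log_text):
    """Return (train_maps, mean_losses) as lists of floats, in log order."""
    train_maps, mean_losses = [], []
    for line in log_text.splitlines():
        line = line.strip()
        match = MAP_RE.match(line)
        if match:
            train_maps.append(float(match.group(1)))
            continue
        match = LOSS_RE.match(line)
        if match:
            mean_losses.append(float(match.group(1)))
        # Progress-bar lines match neither pattern and are skipped.
    return train_maps, mean_losses


# The last two epochs, copied from the log above.
sample_log = """\
Train mAP: 0.5697323083877563
100%|██████████| 52/52 [00:07<00:00,  6.99it/s, loss=9.85]
Mean loss was 17.656728909565853
  0%|          | 0/52 [00:00<?, ?it/s]
Train mAP: 0.4619218707084656
100%|██████████| 52/52 [00:07<00:00,  7.17it/s, loss=19.4]
Mean loss was 16.712135039843044
"""

train_maps, mean_losses = parse_training_log(sample_log)
print(train_maps)
print(mean_losses)
```

The two returned lists can then be passed to any plotting routine to draw the training curves.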

#### Object bounding boxes

![](../notes/images/Figure_1.png)
![](../notes/images/Figure_2.png)
![](../notes/images/Figure_3.png)
![](../notes/images/Figure_4.png)
![](../notes/images/Figure_5.png)
![](../notes/images/Figure_6.png)
![](../notes/images/Figure_7.png)
![](../notes/images/Figure_8.png)
![](../notes/images/Figure_9.png)

In [20]:
# Only make_dot is used below
from torchviz import make_dot

In [54]:
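# Build the model with a 7x7 grid, 2 boxes per cell and 5 classes
model = YoloV1(split_size=7, num_boxes=2, num_classes=5)

# Dummy batch of two 448x448 RGB images; requires_grad so the input shows up in the graph
x = torch.randn((2, 3, 448, 448)).requires_grad_(True)
y = model(x)
# Build the autograd graph of the forward pass, labelling the parameters and the input by name
vis_graph = make_dot(y, params=dict(list(model.named_parameters()) + [('x', x)]))   #.render("attached", format="png")

# Render the graph to a file and open it in the default viewer
vis_graph.view()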

'Digraph.gv.pdf'
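
Before reading the rendered graph, it helps to know how many values the network should output. In the original paper the predictions are encoded as an S x S x (B * 5 + C) tensor. The sketch below computes that size for the arguments used above; `yolo_v1_output_size` is a hypothetical helper, and it is an assumption that this notebook's `YoloV1` follows the paper's layout.

```python
def yolo_v1_output_size(split_size, num_boxes, num_classes):
    """Number of values predicted per image: S * S * (C + B * 5)."""
    # Each box contributes 5 values: confidence, x, y, w, h.
    values_per_cell = num_classes + num_boxes * 5
    return split_size * split_size * values_per_cell


expected = yolo_v1_output_size(split_size=7, num_boxes=2, num_classes=5)
print(expected)  # 7 * 7 * (5 + 2 * 5) = 735
```

If the assumption holds, `y` from the cell above should contain 735 values for each of the 2 images in the dummy batch.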

In [52]:
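from torch.utils.tensorboard import SummaryWriter

# Log the model graph to TensorBoard, traced with a dummy input batch
writer = SummaryWriter('runs/test')
writer.add_graph(model, torch.zeros([2, 3, 448, 448]))
writer.close()  # flush the event file so that TensorBoard can read it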

In [53]:
%load_ext tensorboard
%tensorboard --logdir runs

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6007 (pid 381560), started 0:02:20 ago. (Use '!kill 381560' to kill it.)

# Use the Albumentations library for image augmentation

In [57]:
import albumentations as A
import cv2

image_path = "dataset/images/images_0.jpg"
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

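# Read an image with OpenCV (BGR order) and convert it to the RGB colorspace
image = cv2.imread(image_path)
if image is None:  # cv2.imread returns None instead of raising on an unreadable path
    raise FileNotFoundError(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)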

# Augment an image
transformed = transform(image=image)
transformed_image = transformed["image"]
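
The pipeline above transforms only the image. For object detection, the bounding boxes have to follow every geometric transform as well. Albumentations can do this when `A.Compose` is given `bbox_params=A.BboxParams(format="yolo", label_fields=[...])`. The sketch below does not use the library; it only illustrates by hand what a horizontal flip does to YOLO-format labels. `hflip_yolo_boxes` is a hypothetical helper, and the `[class, x_center, y_center, width, height]` layout with coordinates normalised to [0, 1] is an assumption about this dataset's label files.

```python
def hflip_yolo_boxes(boxes):
    """Mirror YOLO-format boxes [class, x_center, y_center, w, h] left to right.

    All coordinates are normalised to [0, 1], so only x_center changes.
    """
    return [[cls, 1.0 - x, y, w, h] for cls, x, y, w, h in boxes]


boxes = [[0, 0.25, 0.5, 0.2, 0.4]]
flipped = hflip_yolo_boxes(boxes)
print(flipped)  # [[0, 0.75, 0.5, 0.2, 0.4]]
```

Flipping twice returns the original boxes, which is a cheap sanity check for any hand-written box transform.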

In [61]:
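# plot_image also requires a `boxes` argument. The augmented crop has no annotations,
# so an empty list is passed (assuming plot_image accepts an empty list of boxes).
plot_image(transformed_image, [])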