$$
\newcommand{\mat}[1]{\boldsymbol {#1}}
\newcommand{\mattr}[1]{\boldsymbol {#1}^\top}
\newcommand{\matinv}[1]{\boldsymbol {#1}^{-1}}
\newcommand{\vec}[1]{\boldsymbol {#1}}
\newcommand{\vectr}[1]{\boldsymbol {#1}^\top}
\newcommand{\rvar}[1]{\mathrm {#1}}
\newcommand{\rvec}[1]{\boldsymbol{\mathrm{#1}}}
\newcommand{\diag}{\mathop{\mathrm {diag}}}
\newcommand{\set}[1]{\mathbb {#1}}
\newcommand{\cset}[1]{\mathcal{#1}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\bb}[1]{\boldsymbol{#1}}
\newcommand{\E}[2][]{\mathbb{E}_{#1}\left[#2\right]}
\newcommand{\ip}[3]{\left<#1,#2\right>_{#3}}
\newcommand{\given}[]{\,\middle\vert\,}
\newcommand{\DKL}[2]{\cset{D}_{\text{KL}}\left(#1\,\Vert\, #2\right)}
\newcommand{\grad}[]{\nabla}
$$

# Part 1: Mini-Project
<a id=part3></a>

# Table of Contents

#### Creating the Dataset
###### Downloading the data
###### Data Directory Preprocessing
###### Datasets creation with the Roboflow API

#### Creating the Model
###### YOLOv8
###### Model Architecture
###### Loss function
###### Optimization

#### Evaluating the Model
###### Predicting on the test set
###### COCO evaluation

#### Results and Discussion
###### Training and Validation results
###### YOLO test results
###### COCO evaluation results

## Creating the Datasets

### Downloading the data
As instructed for the project, we used the stable version of the TACO dataset, which consists of 1500 images and 4787 annotations.
The images were downloaded from the https://github.com/pedropro/TACO repository.
The annotations were downloaded from the https://github.com/wimlds-trojmiasto/detect-waste/tree/main/annotations repository.
We used the annotations_train.json and annotations_test.json files, which include the 7 detect-waste categories for the object detection task:
- bio
- glass
- metals_and_plastic
- non_recyclable
- other
- paper
- unknown

### Data Directory Preprocessing
We used the Roboflow API to create our datasets. To make the data usable with Roboflow, we implemented a Python script that edits the data directory in two steps (the script is run in the next cell).
1. Flattening the directory: The original directory had 15 sub-directories (batches 1-15), each containing ~100 images named with a number in [1, ~100]. We flattened the sub-directories into a single directory and renamed the images, updating the corresponding image names in the annotation files.
2. Splitting the directory into two: Test_dir and Train_dir, based on the partition of the images in the annotation files.
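
The two steps above can be sketched as follows. This is a minimal illustration, not the project's `preprocess_imgs.py`: the function name `flatten_batches`, the `batch_N/<name>` layout of the `file_name` entries, and the `<batch>_<name>` renaming scheme are all assumptions.

```python
import json
import shutil
from pathlib import Path


def flatten_batches(src_root: Path, dst_dir: Path, annotation_file: Path) -> dict:
    """Copy the images listed in a COCO-style annotation file into one flat directory.

    Assumes "file_name" entries look like "batch_1/1.jpg" (hypothetical layout).
    Returns a mapping from the old relative path to the new flat file name.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    coco = json.loads(annotation_file.read_text())
    renamed = {}
    for image in coco["images"]:
        old_rel = Path(image["file_name"])
        # Prefix the batch name so "1.jpg" from different batches cannot collide.
        new_name = f"{old_rel.parent.name}_{old_rel.name}"
        shutil.copy2(src_root / old_rel, dst_dir / new_name)
        image["file_name"] = new_name
        renamed[old_rel.as_posix()] = new_name
    # Store the updated annotations next to the flattened images.
    (dst_dir / annotation_file.name).write_text(json.dumps(coco))
    return renamed
```

Calling the function once with `annotations_train.json` and a `Train_dir` target, and once with `annotations_test.json` and a `Test_dir` target, performs the flattening and the split together, because each annotation file only lists the images of its own partition.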

### Datasets creation with the Roboflow API
Once the data directory was prepared, we uploaded it to Roboflow and created two separate datasets.
1. A training set with 1182 images (~79% of the data). We split this dataset into two subsets: a train subset (1062 images, 90%) and a validation subset (118 images, 10%).
2. A test set with 317 images (~21% of the data).

With the Roboflow API we processed the images with the following tools:
1. Resized all training images to 640x640.
2. Applied auto-orientation to correct mismatches between annotations and images.
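
Resizing an image also changes the coordinates of its bounding boxes. Roboflow handles this internally; the sketch below only illustrates the arithmetic for a COCO-style `[x, y, width, height]` box, assuming a plain stretch to the target size (the resize mode actually used is not stated here, and `resize_bbox` is a hypothetical helper).

```python
def resize_bbox(bbox, orig_size, new_size=(640, 640)):
    """Scale a COCO [x, y, width, height] box for a stretch resize.

    orig_size and new_size are (width, height) tuples in pixels.
    """
    x, y, w, h = bbox
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    sx, sy = new_w / orig_w, new_h / orig_h  # horizontal and vertical scale factors
    return [x * sx, y * sy, w * sx, h * sy]


# A box in a 1280x1280 image shrinks by a factor of two in both directions.
print(resize_bbox([100, 50, 200, 100], orig_size=(1280, 1280)))
```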

We then created the dataset, which can be downloaded using the code in the following cell.

In [1]:
# from roboflow import Roboflow
# from ultralytics import YOLO
# import project.model_training as mt

# preprocess data directory to fit roboflow
# the script is commented as the data location in the server may vary. 
# [it is said to be in `datasets/TACO-master` but we could not find it there]  

# %run project/preprocess_imgs.py

# upload the processed data to roboflow as explained above

# download the datasets
# train_set, test_set = mt.load_datasets()
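
`mt.load_datasets()` is a project-specific helper whose body is not shown here. As a rough sketch of how a dataset version can be fetched with the `roboflow` package, one might write something like the following; the workspace, project and version values are placeholders to be supplied by the caller, and reading the key from a `ROBOFLOW_API_KEY` environment variable is an assumption of this example.

```python
import os


def download_dataset(workspace: str, project: str, version: int, export_format: str = "yolov8"):
    """Download one Roboflow dataset version and return the dataset object."""
    api_key = os.environ.get("ROBOFLOW_API_KEY")
    if not api_key:
        raise RuntimeError("Set the ROBOFLOW_API_KEY environment variable first.")
    # Imported lazily so the function can be defined without the package installed.
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    return rf.workspace(workspace).project(project).version(version).download(export_format)
```

The returned object exposes a `location` attribute pointing at the downloaded directory, which contains the `data.yaml` file that YOLOv8 training expects.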

## Creating the Model
### YOLOv8
We chose to approach the task by custom-training the YOLOv8 model, which is regarded as one of the leading models for image classification, detection and segmentation. To achieve the best results, we used the largest and most accurate version of the model (YOLOv8x). We trained our model for 100 epochs, taking into consideration 3 main factors:
1. Maximizing the validation results during the training process.
2. Avoiding overfitting to the training set.
3. The least important factor, but still relevant: a cost-benefit analysis of the time and resources consumed by training (i.e., with more time and resources, it is possible to run more training sessions, perform cross-validation, etc., to reach better results).
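
For readers without access to our `model_training` module, a training run of this kind can be sketched with the `ultralytics` package roughly as follows. This is not the body of `mt.set_model`; the helper names are hypothetical, and only the 100 epochs and the 640-pixel image size are taken from the text.

```python
def training_args(data_yaml: str, epochs: int = 100, imgsz: int = 640) -> dict:
    """Collect the keyword arguments passed to YOLO.train()."""
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz}


def train_yolov8x(data_yaml: str):
    """Fine-tune the pretrained YOLOv8x checkpoint on a custom dataset."""
    # Imported lazily so the sketch can be read and loaded without the package.
    from ultralytics import YOLO

    model = YOLO("yolov8x.pt")  # largest pretrained detection checkpoint
    results = model.train(**training_args(data_yaml))
    return model, results
```

During training, Ultralytics evaluates on the validation split after each epoch and keeps the best-scoring weights as `best.pt`, which is the file loaded in a later cell.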

In [2]:
# Initializing and training the model.
# This code block is commented out so that the notebook won't perform training.
# To re-create our training process, uncomment and run the next line

# model, train_res = mt.set_model(train_set, 'yolov8x')

In [3]:
# To avoid training, load our trained model:
# model = YOLO("runs/detect/train27/weights/best.pt")

### Model Architecture:
YOLOv8 consists of two main components: a backbone and a head. The backbone is a series of convolutional layers and C2f layers (in the Ultralytics code, C2f is a faster implementation of the CSP bottleneck with two convolutions). The backbone extracts features, which are then passed to the head to produce the detections; during training, the head's outputs are scored by the model's loss function. A diagram of the model by [RangeKing](https://github.com/RangeKing) can be seen here.

<div>
<img src="imgs/yolov8_architecture_diagram.jpeg" width="1000"/>
</div>

The diagram also breaks each block down into its sublayers.

The architecture uses bottleneck blocks and a pyramidal (multi-scale) structure. One pyramidal component is the spatial pyramid pooling layer (SPP/SPPF).
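
To give some intuition for the difference between SPP and SPPF: SPP applies max-pooling with kernel sizes 5, 9 and 13 in parallel, while SPPF chains three 5-wide poolings and reuses the intermediate results. The toy 1-D sketch below (pure Python, with a made-up signal) checks that both give the same pooled values; the real layers operate on 2-D feature maps and concatenate the results.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pooling with 'same' padding (windows are clipped at the edges)."""
    half = kernel // 2
    n = len(values)
    return [max(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # arbitrary example values

once = max_pool_1d(signal, 5)
twice = max_pool_1d(once, 5)     # same receptive field as a single 9-wide pool
thrice = max_pool_1d(twice, 5)   # same receptive field as a single 13-wide pool

assert twice == max_pool_1d(signal, 9)
assert thrice == max_pool_1d(signal, 13)
```

Chaining small pools is cheaper than running large ones in parallel, which is why the sequential form is the "fast" variant.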

Some changes in this version of YOLO include:

- Not using anchor boxes for detection, which increased speed.
- A new backbone consisting of a new convolutional building block and new C2f layers, which have additional residual connections.
- New loss functions.
    
The full model can be seen here on the [YOLOv8 repo](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/v8/yolov8.yaml)

#### Loss function:

The model uses a loss function that combines several elements to measure the total loss.

- The first part is a Bbox loss, the component of the total loss that evaluates the bounding boxes generated by the model. It returns two separate loss values:

1. IoU loss: an intersection-over-union loss, calculated using an external bbox_iou method (called with CIoU=True in the code below).

2. DFL loss: a distribution focal loss, as proposed in this [paper](https://ieeexplore.ieee.org/document/9792391). In short, this loss also measures the quality of the box locations, but does so using distribution-based methods.
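
As a small worked example of the quantity behind the IoU loss, the sketch below computes the plain IoU of two boxes in `(x1, y1, x2, y2)` format. It is a simplification: the model's `bbox_iou` is called with `CIoU=True`, which adds penalty terms on top of the plain ratio, and `box_iou` is a hypothetical helper written for this illustration.

```python
def box_iou(box_a, box_b):
    """Plain intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when the boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
loss = 1.0 - iou  # the per-box quantity that the IoU loss weights and sums
```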

Below is the code of the Bbox loss.


In [4]:
# class BboxLoss(nn.Module):

#     def __init__(self, reg_max, use_dfl=False):
#         super().__init__()
#         self.reg_max = reg_max
#         self.use_dfl = use_dfl

#     def forward(self, pred_dist, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask):
#         # IoU loss
#         weight = torch.masked_select(target_scores.sum(-1), fg_mask).unsqueeze(-1)
#         iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
#         loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum

#         # DFL loss
#         if self.use_dfl:
#             target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
#             loss_dfl = self._df_loss(pred_dist[fg_mask].view(-1, self.reg_max + 1), target_ltrb[fg_mask]) * weight
#             loss_dfl = loss_dfl.sum() / target_scores_sum
#         else:
#             loss_dfl = torch.tensor(0.0).to(pred_dist.device)

#         return loss_iou, loss_dfl

#     @staticmethod
#     def _df_loss(pred_dist, target):
#         # Return sum of left and right DFL losses
#         # Distribution Focal Loss (DFL) proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
#         tl = target.long()  # target left
#         tr = tl + 1  # target right
#         wl = tr - target  # weight left
#         wr = 1 - wl  # weight right
#         return (F.cross_entropy(pred_dist, tl.view(-1), reduction='none').view(tl.shape) * wl +
#                 F.cross_entropy(pred_dist, tr.view(-1), reduction='none').view(tl.shape) * wr).mean(-1, keepdim=True)
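
To make `_df_loss` easier to follow, here is a pure-Python trace of the same computation for a single box side. The function name and the example logits are made up for illustration; the logic mirrors the cell above: the continuous target distance is split between its two neighbouring integer bins, and the two cross-entropy terms are weighted by how close the target is to each bin.

```python
import math


def dfl_single(logits, target):
    """Distribution focal loss for one box side.

    logits: raw scores over the integer bins 0..reg_max
    target: continuous distance, assumed to satisfy 0 <= target < reg_max
    """
    left = int(target)          # nearest bin at or below the target
    right = left + 1            # nearest bin above the target
    w_left = right - target     # closer to the left bin -> larger left weight
    w_right = 1.0 - w_left
    log_norm = math.log(sum(math.exp(z) for z in logits))
    ce_left = log_norm - logits[left]    # cross-entropy against the left bin
    ce_right = log_norm - logits[right]  # cross-entropy against the right bin
    return ce_left * w_left + ce_right * w_right


uniform = dfl_single([0.0, 0.0, 0.0, 0.0], target=1.3)  # no preference for any bin
peaked = dfl_single([0.0, 5.0, 4.0, 0.0], target=1.3)   # mass on bins 1 and 2
```

A prediction that concentrates probability on the two bins around the target (`peaked`) gets a lower loss than an uninformative one (`uniform`).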

- The second part is a Varifocal loss, which contributes the classification component of the total loss. It is defined in this [paper](https://arxiv.org/pdf/2008.13367.pdf) as:  

<div>
<img src="imgs/VFL3.png" width="500"/>
</div>
<div>
<img src="imgs/VFL2.png" width="500"/>
</div>

It is a variant of binary cross-entropy and is explained in detail in the paper. In general, focal losses help with classification when the classes are imbalanced: when some examples are easily classified and others are more difficult, the loss focuses more on the challenging examples.

We can see that the code of the loss function also uses an existing binary cross-entropy method, binary_cross_entropy_with_logits, which according to its documentation combines a sigmoid layer with binary cross-entropy.



In [5]:
# class VarifocalLoss(nn.Module):
#     # Varifocal loss by Zhang et al. https://arxiv.org/abs/2008.13367
#     def __init__(self):
#         super().__init__()

#     def forward(self, pred_score, gt_score, label, alpha=0.75, gamma=2.0):
#         weight = alpha * pred_score.sigmoid().pow(gamma) * (1 - label) + gt_score * label
#         with torch.cuda.amp.autocast(enabled=False):
#             loss = (F.binary_cross_entropy_with_logits(pred_score.float(), gt_score.float(), reduction='none') *
#                     weight).sum()
#         return loss
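
The per-element computation in the cell above can be traced by hand with a few lines of pure Python. The helper names below are hypothetical; the weighting formula and the defaults `alpha=0.75`, `gamma=2.0` are copied from `VarifocalLoss.forward`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logit, target):
    """Binary cross-entropy applied to sigmoid(logit), for a soft target in [0, 1]."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def varifocal_term(logit, gt_score, label, alpha=0.75, gamma=2.0):
    """One element of the varifocal loss sum."""
    # Negatives (label=0) are down-weighted by alpha * p**gamma;
    # positives (label=1) are weighted by their target score.
    weight = alpha * sigmoid(logit) ** gamma * (1 - label) + gt_score * label
    return bce_with_logits(logit, gt_score) * weight


easy_negative = varifocal_term(logit=-4.0, gt_score=0.0, label=0)  # confidently rejected
hard_negative = varifocal_term(logit=2.0, gt_score=0.0, label=0)   # wrongly confident
```

The easy negative contributes far less than the hard one, which is the "focus on challenging examples" behaviour described above.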

#### Optimization: 

The YOLOv8 model uses Adam as its default optimizer.
Adam is an extension of stochastic gradient descent with momentum that uses only first-order gradients and
is based on adaptive estimates of lower-order moments. Empirical results show that Adam performs well in comparison to other
optimization algorithms and is well suited for large datasets.
The default hyperparameters in the model are: learning rate=0.001, momentum=0.9, decay=1e-5.

We chose this optimizer because Adam is a state-of-the-art optimization algorithm.
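
For reference, a single Adam update can be written out in a few lines. This is a scalar sketch of the standard algorithm, not the optimizer code used in training: the learning rate 0.001 and the first-moment coefficient 0.9 follow the values quoted above, while `beta2=0.999` and `eps=1e-8` are the usual Adam defaults and are assumptions here; weight decay is left out.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimise f(theta) = theta**2 for a few steps, starting from theta = 1.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Because the step is normalised by the square root of the second-moment estimate, the first update has a magnitude of about `lr` regardless of how large the gradient is.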

## Evaluating the Model
### Predicting on the test set
After training the model, we used the YOLO.val() method to predict on our test set.
The method performs object detection on our unseen test set images.
To deepen our model evaluation, we performed another analysis using the COCO evaluation tools.
For this evaluation, we compared the ground-truth annotation file with the detected annotations for the test dataset.
The detected-annotation file is created by the val() method and processed by our next script to fit the COCOeval comparison.
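
A COCO-style bounding-box evaluation of this kind can be sketched with `pycocotools` as below. This is not the content of our `cocoEval.py`; the function names and file-path arguments are placeholders, and the box conversion helper is included only because COCO expects `[x, y, width, height]` boxes, so detections stored as corner coordinates would need converting first.

```python
def xyxy_to_coco(box):
    """Convert an (x1, y1, x2, y2) box to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def coco_bbox_eval(gt_json: str, pred_json: str):
    """Run the standard COCO bbox evaluation and return the 12 summary metrics."""
    # Imported lazily so the sketch can be loaded without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    gt = COCO(gt_json)             # ground-truth annotations
    dt = gt.loadRes(pred_json)     # detections: image_id, category_id, bbox, score
    evaluator = COCOeval(gt, dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()          # prints the AP / AR table
    return evaluator.stats
```

For the comparison to work, the `image_id` and `category_id` values in the detection file must match those in the ground-truth file.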

In [6]:
# edit the test_set to fit evaluations
# %run project/edit_test.py

In [7]:
# predict with the model on the test set for evaluation
# test_res = mt.evaluate_model(test_set, model)

In [8]:
# perform the COCO evaluation
# %run project/cocoEval.py

## Results and Discussion
### Training and Validation results
During the training process, the model validates its performance on the validation subset after each epoch.
- Reminder & clarification: Our validation subset is part of the training dataset, but the model does not train on it, so it does not affect the weights directly. However, because we use the validation results to manually tune the model's version, size and hyperparameters, the validation subset is *not* used for the test results (which we discuss later on).

In the following graph, we can see the model's performance as a function of the training epochs.
<div>   
<img src="imgs/results_train.png" width="800"/>  
</div>

* A few important observations:
1. As expected, during the training process the loss values (box, cls, dfl) decrease and the precision values increase for both the training and validation sets.
2. The graphs have not yet reached a plateau, indicating that we might still be able to improve the model's performance. After several training runs, we noticed that we could still improve the model's precision on the training set, but that this causes overfitting for some of the categories. Therefore we limited the number of epochs.

In the following figure, we can observe the distribution of the predictions, broken down by category.
<div>   
<img src="imgs/matrix_train.png" width="800"/>  
</div>

* We can see that there are two categories that are "overpredicted" (false positives, FP): background and metals_and_plastic.
We can lower the background FP value by decreasing the confidence threshold for detection. However, this lowers the overall precision as it increases the FN values, and specifically increases the metals_and_plastic FP value.

* Another observation is that the model rarely predicts the "other" category. We assume this happens because, as opposed to the rest of the categories, "other" has no distinct definition; therefore the model can't find unique patterns (weights) to detect it.

* The 'bio' column is empty as there are no bio labels in the validation set (generally, there are very few bio labels in the TACO dataset, hence we expect that the model won't detect them well, or won't detect them at all).

### YOLO test results

<div>   
<img src="imgs/matrix_test_3.png" width="800"/>  
</div>

As in the training results, we again notice the over-prediction of background and metals_and_plastic, which is not surprising given that it was a difficulty for the model during training.

There were no major changes in comparison to the training results. However, we can notice some added confusion between glass and non_recyclable, and between unknown and paper. These changes are not major, and we still observe similar results.

Taking into account the large variability of the dataset, and the fact that we trained a large model and likely overfitted the data to some extent, the results are reasonable. It is worth noting that the YOLOv8 model itself barely reaches over 50 mAP on the COCO 2017 image dataset. Considering that our dataset is substantially smaller and that we had less time and resources to train the model on it, we think the results are decent.

### COCO evaluation results

<div>   
<img src="imgs/coco_eval.png" width="600"/>  
</div>

We can see that the results are similar to the results obtained on the training set. One thing worth noting is that for the final metric, Average Recall with IoU=0.50:0.95 and large area, we reach a value of 0.13, which is a decent recall value considering the limitations of the task.
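
The notation IoU=0.50:0.95 means that the metric is averaged over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The short sketch below spells this out; `mean_over_thresholds` is a hypothetical helper that would take one recall (or precision) value per threshold.

```python
# The ten IoU thresholds behind "IoU=0.50:0.95".
iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(per_threshold_values):
    """Average one metric value per IoU threshold into a single number."""
    if len(per_threshold_values) != len(iou_thresholds):
        raise ValueError("expected one value per IoU threshold")
    return sum(per_threshold_values) / len(per_threshold_values)
```

A detection only counts as correct at a given threshold if its IoU with a ground-truth box reaches that threshold, so the averaged metric rewards tightly localised boxes and is typically lower than the same metric at IoU=0.50 alone.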