$$
\newcommand{\mat}[1]{\boldsymbol {#1}}
\newcommand{\mattr}[1]{\boldsymbol {#1}^\top}
\newcommand{\matinv}[1]{\boldsymbol {#1}^{-1}}
\newcommand{\vec}[1]{\boldsymbol {#1}}
\newcommand{\vectr}[1]{\boldsymbol {#1}^\top}
\newcommand{\rvar}[1]{\mathrm {#1}}
\newcommand{\rvec}[1]{\boldsymbol{\mathrm{#1}}}
\newcommand{\diag}{\mathop{\mathrm {diag}}}
\newcommand{\set}[1]{\mathbb {#1}}
\newcommand{\cset}[1]{\mathcal{#1}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\bb}[1]{\boldsymbol{#1}}
\newcommand{\E}[2][]{\mathbb{E}_{#1}\left[#2\right]}
\newcommand{\ip}[3]{\left<#1,#2\right>_{#3}}
\newcommand{\given}[]{\,\middle\vert\,}
\newcommand{\DKL}[2]{\cset{D}_{\text{KL}}\left(#1\,\Vert\, #2\right)}
\newcommand{\grad}[]{\nabla}
$$
# Model Evaluation

### Implementation Overview:
We approached the task by training a YOLOv8 model, which is regarded as one of the leading models for image classification, detection, and segmentation. To train and test the model on the given dataset, we used the Roboflow API to preprocess the data.

#TODO add depth.



### Code Structure:
#TODO How to run and reproduce the results

### External Code Usage:

#TODO write about roboflow usage and YOLO. Coco Eval tools?

### Analysis:


#### Architecture:
YOLOv8 consists of two main components: a backbone and a head. The backbone is a series of convolutional layers and C2f layers (CSP bottleneck blocks with two convolutions). It extracts features, which are then passed to the head for detection; during training, the head's outputs are scored by the model's loss function. A diagram of the model by [RangeKing](https://github.com/RangeKing) is shown below.

<div>
<img src="imgs/yolov8_architecture_diagram.jpeg" width="1000"/>
</div>

The diagram also shows the sublayers that make up each block.

The architecture uses bottleneck blocks and a pyramidal structure. One example of the pyramidal concept is the spatial pyramid pooling layer (SPP/SPPF).

Some changes in this version of YOLO include:

- Anchor-free detection (no anchor boxes), which increased speed.
- A new backbone consisting of a new convolutional building block and new C2f layers, which have additional residual connections.
- New loss functions.
    
The full model definition can be seen in the [YOLOv8 repo](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/v8/yolov8.yaml).

#### Loss function:

The model uses a loss function that combines several elements to measure the total loss.

- The first part is a bbox loss, which returns two separate loss values:

1. IoU loss: a standard intersection-over-union loss, computed with the external `bbox_iou` function.

2. DFL loss: a distribution focal loss, as proposed in this [paper](https://ieeexplore.ieee.org/document/9792391).
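
As a rough illustration of the IoU term, the sketch below computes plain intersection over union for two axis-aligned boxes in `(x1, y1, x2, y2)` form and turns it into a loss as `1 - IoU`. The helper names `box_iou` and `iou_loss` are made up for this example. The library's `bbox_iou` is more general: the `BboxLoss` code in this section calls it with `CIoU=True`, which adds penalty terms that are not reproduced here.

```python
def box_iou(box_a, box_b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2). Illustrative helper only."""
    # Corners of the overlapping rectangle (it may be empty)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Loss is 0 for a perfect match and 1 for no overlap."""
    return 1.0 - box_iou(pred_box, target_box)


pred = (0.0, 0.0, 2.0, 2.0)
target = (1.0, 1.0, 3.0, 3.0)
print(box_iou(pred, target))   # intersection area 1, union area 7
print(iou_loss(pred, target))
```

In the quoted `BboxLoss`, each `1 - iou` value is additionally weighted by the target score and normalized by `target_scores_sum`.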

Below is the code of the Bbox loss.


In [None]:
# class BboxLoss(nn.Module):

#     def __init__(self, reg_max, use_dfl=False):
#         super().__init__()
#         self.reg_max = reg_max
#         self.use_dfl = use_dfl

#     def forward(self, pred_dist, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask):
#         # IoU loss
#         weight = torch.masked_select(target_scores.sum(-1), fg_mask).unsqueeze(-1)
#         iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
#         loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum

#         # DFL loss
#         if self.use_dfl:
#             target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
#             loss_dfl = self._df_loss(pred_dist[fg_mask].view(-1, self.reg_max + 1), target_ltrb[fg_mask]) * weight
#             loss_dfl = loss_dfl.sum() / target_scores_sum
#         else:
#             loss_dfl = torch.tensor(0.0).to(pred_dist.device)

#         return loss_iou, loss_dfl

#     @staticmethod
#     def _df_loss(pred_dist, target):
#         # Return sum of left and right DFL losses
#         # Distribution Focal Loss (DFL) proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
#         tl = target.long()  # target left
#         tr = tl + 1  # target right
#         wl = tr - target  # weight left
#         wr = 1 - wl  # weight right
#         return (F.cross_entropy(pred_dist, tl.view(-1), reduction='none').view(tl.shape) * wl +
#                 F.cross_entropy(pred_dist, tr.view(-1), reduction='none').view(tl.shape) * wr).mean(-1, keepdim=True)
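
To make the `_df_loss` method above more concrete, here is a minimal pure-Python sketch of the same idea for a single box side. The continuous target distance falls between two integer bins, and the loss is the cross entropy against each neighbouring bin, weighted by how close the target is to it. The names `cross_entropy` and `dfl_side` are hypothetical helpers for this example; the library version operates on batched tensors and averages over the four sides.

```python
import math


def cross_entropy(logits, index):
    """Cross entropy of softmax(logits) against a single class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[index]


def dfl_side(logits, target):
    """Distribution focal loss for one side; assumes 0 <= target < len(logits) - 1."""
    tl = int(target)   # left bin (floor for non-negative targets)
    tr = tl + 1        # right bin
    wl = tr - target   # weight of the left bin
    wr = 1.0 - wl      # weight of the right bin
    return cross_entropy(logits, tl) * wl + cross_entropy(logits, tr) * wr


# A prediction peaked on bin 1 is cheap when the target is 1.0 and costly when it is 2.0.
peaked = [0.0, 10.0, 0.0, 0.0]
print(dfl_side(peaked, 1.0))
print(dfl_side(peaked, 2.0))
```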

- The second part is a Varifocal loss, defined as

<div>
<img src="imgs/VFL3.png" width="500"/>
</div>
<div>
<img src="imgs/VFL2.png" width="500"/>
</div>


in this [paper](https://arxiv.org/pdf/2008.13367.pdf).

It is a variant of binary cross entropy and is explained in detail in the paper.

The code of the loss function also calls an existing binary cross entropy function, `binary_cross_entropy_with_logits`, which according to its documentation combines a sigmoid layer with binary cross entropy.



In [2]:
# class VarifocalLoss(nn.Module):
#     # Varifocal loss by Zhang et al. https://arxiv.org/abs/2008.13367
#     def __init__(self):
#         super().__init__()

#     def forward(self, pred_score, gt_score, label, alpha=0.75, gamma=2.0):
#         weight = alpha * pred_score.sigmoid().pow(gamma) * (1 - label) + gt_score * label
#         with torch.cuda.amp.autocast(enabled=False):
#             loss = (F.binary_cross_entropy_with_logits(pred_score.float(), gt_score.float(), reduction='none') *
#                     weight).sum()
#         return loss
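
The weighting in the `VarifocalLoss.forward` code above can be traced by hand for a single prediction. The following is a sketch in plain Python, not the library implementation: `sigmoid`, `bce_with_logits`, and `varifocal_term` are names invented for this example, and `bce_with_logits` uses the usual numerically stable form of sigmoid followed by binary cross entropy.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logit, target):
    """Stable form of -(t*log(s) + (1-t)*log(1-s)) with s = sigmoid(logit)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def varifocal_term(logit, gt_score, label, alpha=0.75, gamma=2.0):
    """Per-element varifocal loss, mirroring the weight used in the quoted code."""
    # Negatives (label 0) are down-weighted by alpha * p^gamma;
    # positives (label 1) are weighted by their target score.
    weight = alpha * sigmoid(logit) ** gamma * (1 - label) + gt_score * label
    return bce_with_logits(logit, gt_score) * weight


# A confident false positive is penalised more than an unsure one.
print(varifocal_term(3.0, 0.0, 0))
print(varifocal_term(-3.0, 0.0, 0))
# A positive with a high target score (e.g. a well-localised box).
print(varifocal_term(0.0, 0.8, 1))
```

The quoted class sums these per-element terms over all predictions.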

#### Optimization: 

By default, the YOLOv8 model uses the Adam optimizer with the following default hyperparameters:

Learning rate = 0.001, momentum = 0.9, decay = 1e-5

We chose to use this optimizer because Adam is a state-of-the-art optimization algorithm and the model was designed around these hyperparameters.
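
For intuition about what these hyperparameters control, the sketch below applies a textbook Adam update to a single scalar parameter. It rests on several assumptions that are not stated in this report: the momentum value is treated as Adam's first-moment coefficient `beta1`, the decay is applied as L2-style weight decay added to the gradient, and `beta2 = 0.999` and `eps = 1e-8` are common defaults chosen for illustration. `adam_step` is a hypothetical helper, not part of the YOLOv8 code.

```python
import math


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, decay=1e-5):
    """One Adam update for a scalar parameter (illustrative only)."""
    grad = grad + decay * param          # assumed: L2-style weight decay
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad         # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad  # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])               # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Minimise f(x) = x^2 (gradient 2x) starting from x = 1.0.
x = 1.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(100):
    x = adam_step(x, 2.0 * x, state)
print(x)  # x has moved toward 0, by roughly lr per step
```

Because the update is normalized by the running gradient magnitude, each step has a size close to the learning rate regardless of how large the raw gradient is.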





#### Additional evaluation metrics?
#### Accuracy:
#### Results:
#### Conclusions:

In [None]:
from ultralytics import YOLO