$$
\newcommand{\mat}[1]{\boldsymbol {#1}}
\newcommand{\mattr}[1]{\boldsymbol {#1}^\top}
\newcommand{\matinv}[1]{\boldsymbol {#1}^{-1}}
\newcommand{\vec}[1]{\boldsymbol {#1}}
\newcommand{\vectr}[1]{\boldsymbol {#1}^\top}
\newcommand{\rvar}[1]{\mathrm {#1}}
\newcommand{\rvec}[1]{\boldsymbol{\mathrm{#1}}}
\newcommand{\diag}{\mathop{\mathrm {diag}}}
\newcommand{\set}[1]{\mathbb {#1}}
\newcommand{\cset}[1]{\mathcal{#1}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\bb}[1]{\boldsymbol{#1}}
\newcommand{\E}[2][]{\mathbb{E}_{#1}\left[#2\right]}
\newcommand{\ip}[3]{\left<#1,#2\right>_{#3}}
\newcommand{\given}[]{\,\middle\vert\,}
\newcommand{\DKL}[2]{\cset{D}_{\text{KL}}\left(#1\,\Vert\, #2\right)}
\newcommand{\grad}[]{\nabla}
$$

# Part 1: Mini-Project
<a id=part3></a>

### Guidelines

- You should implement the code which displays your results in this notebook, and add any additional code files for your implementation in the `project/` directory. You can import these files here, as we do for the homeworks.
- Running this notebook should not perform any training - load your results from some output files and display them here. The notebook must be runnable from start to end without errors.
- You must include a detailed write-up (in the notebook) of what you implemented and how. 
- Explain the structure of your code and how to run it to reproduce your results.
- Explicitly state any external code you used, including built-in pytorch models and code from the course tutorials/homework.
- Analyze your numerical results, explaining **why** you got these results (not just specifying the results).
- Where relevant, place all results in a table or display them using a graph.
- Before submitting, make sure all files which are required to run this notebook are included in the generated submission zip.
- Try to keep the submission file size under 10MB. Do not include model checkpoint files, dataset files, or any other non-essential files. Instead, include your results as images/text files/pickles/etc., and load them for display in this notebook.

## Object detection on TACO dataset

TACO is a growing image dataset of waste in the wild. It contains images of litter taken under diverse environments: woods, roads and beaches.

<center><img src="imgs/taco.png" /></center>


You can read more about the dataset here: https://github.com/pedropro/TACO

You can explore the data distribution and how to load it here: https://github.com/pedropro/TACO/blob/master/demo.ipynb


The stable version of the dataset, which contains 1500 images and 4787 annotations, is available in `datasets/TACO-master`.
You do not need to download the dataset.


### Project goals:

* You need to perform an object detection task over 7 categories of the dataset.
* The annotations for object detection can be downloaded from here: https://github.com/wimlds-trojmiasto/detect-waste/tree/main/annotations.
* The data and annotation format follows the COCO API: https://github.com/cocodataset/cocoapi (a notebook showing how to perform evaluation with it is available here: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb).
  You need to install the COCO API yourself.
* If you need a beginner's guide to object detection with the COCO API, you can read and watch this: https://www.neuralception.com/cocodatasetapi/

### What do I need to do?

* **Everything is in the game!** As long as your model does not require more than 8 GB of memory and you follow the Guidelines above.


### What does it mean?
* You can use data augmentation: either take what is implemented in the directory or use external libraries such as https://albumentations.ai/ (note that when you create your own augmentations, you need to change the annotations as well).
* You can use more data if you find it useful (for example, review https://github.com/AgaMiko/waste-datasets-review).
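
To illustrate the note above about changing the annotations together with the image, here is a minimal sketch of a horizontal flip applied to a COCO-style bounding box. The function name `hflip_coco_bbox` is hypothetical, and the sketch assumes boxes are stored as `[x_min, y_min, width, height]` in pixels, as in the COCO format.

```python
def hflip_coco_bbox(bbox, image_width):
    """Mirror a COCO-style [x_min, y_min, width, height] box horizontally."""
    x_min, y_min, width, height = bbox
    # After a horizontal flip, the old right edge becomes the new left edge.
    new_x_min = image_width - (x_min + width)
    return [new_x_min, y_min, width, height]


# A 100x50 box starting 20 px from the left edge of a 640 px wide image
# starts 520 px from the left edge after the flip.
print(hflip_coco_bbox([20, 30, 100, 50], image_width=640))  # [520, 30, 100, 50]
```

Libraries such as Albumentations can apply the same bookkeeping automatically when the boxes are passed alongside the image.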


### What model can I use?
* Whatever you want!
You can review strong models for the COCO object detection task as a reference:
SOTA: https://paperswithcode.com/sota/object-detection-on-coco
Real-Time: https://paperswithcode.com/sota/real-time-object-detection-on-coco
Or you can use older models like YOLO-V3 or Faster-RCNN.
* As long as you have a reason (complexity, speed, performance), you are golden.

### Tips for a good grade:
* Start as simple as possible. Dealing with APIs is not easy the first time, and I predict that this will be your main issue. Only once you have a running model that learns should you add learning tricks.
* Use the visualization capabilities of the notebook, as we did throughout the course: check that your input actually fits the model, that the output has the desired size, and so on.
* It is recommended to resize the images to a fixed size, as shown here: https://github.com/pedropro/TACO/blob/master/detector/inspect_data.ipynb
* Please address the architecture and your loss function(s) in this notebook. If you decide to add a loss component such as the Focal loss, try to show the results before and after using it.
* Plot your losses in this notebook. Any evaluation metric can be shown as a function of time and, where possible, analyzed per class.

Good luck!

# Deep project


## Creating the Datasets

### Downloading the data
As instructed for the project, we used the stable version of the TACO dataset, which consists of 1500 images and 4787 annotations.
The images were downloaded from the https://github.com/pedropro/TACO repository.
The annotations were downloaded from the https://github.com/wimlds-trojmiasto/detect-waste/tree/main/annotations repository.
We used the `annotations_train.json` and `annotations_test.json` files, which include the 7 detect-waste categories for object detection:
- bio
- glass
- metals_and_plastic
- non_recyclable
- other
- paper
- unknown

### Data Directory Preprocessing
We used the Roboflow API to create our datasets. To be able to use it, we implemented a Python script that edits the data directory in two steps (the script is run in the next cell).
1. Flattening the directory: The original directory had 15 sub-directories (batches 1-15), each containing ~100 images, with each image named as a number in [1, ~100]. We flattened the sub-directories into one directory and renamed the images, adjusting the corresponding image names in the annotation files.
2. Splitting the directory into two, Test_dir and Train_dir, based on the image partition in the annotation files.
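
The flattening step can be sketched roughly as follows. This is not the project's `preprocess_imgs.py`; the function name, its arguments, and the assumption that each `file_name` in the annotation file looks like `batch_1/<name>.jpg` are hypothetical and only illustrate the idea of keeping names unique and keeping the annotations in sync.

```python
import json
import shutil
from pathlib import Path


def flatten_batches(src_root, dst_dir, annotations_path, out_annotations_path):
    """Copy batch_*/<name> images into one folder and update COCO file names."""
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    with open(annotations_path) as f:
        coco = json.load(f)

    for image in coco["images"]:
        old_rel = Path(image["file_name"])  # assumed form: batch_1/<name>.jpg
        # Prefix the batch name so that names stay unique after flattening.
        new_name = f"{old_rel.parent.name}_{old_rel.name}"
        shutil.copy2(src_root / old_rel, dst_dir / new_name)
        # Keep the annotation file consistent with the renamed image.
        image["file_name"] = new_name

    with open(out_annotations_path, "w") as f:
        json.dump(coco, f)
    return coco
```

Because only the `images` entries are renamed and the image ids are untouched, the bounding-box annotations keep pointing at the right images.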

### Datasets creation with the Roboflow API
Once the data directory was prepared, we uploaded it to Roboflow and created two separate datasets.
1. A training set with 1182 images (~79% of the data). We split this dataset into two subsets: a train subset (1062 images, 90%) and a validation subset (118 images, 10%).
2. A test set with 317 images (~21% of the data).

With the Roboflow API, we processed the images with the following tools:
1. Resized all training images to 640x640.
2. Applied auto-orientation to correct mismatches between annotations and images.
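
Resizing an image also rescales its bounding boxes. Roboflow handles this internally; the sketch below only illustrates the arithmetic, assuming a plain stretch to the target size (no padding or letterboxing) and COCO-style `[x, y, width, height]` boxes. The function name `resize_coco_bbox` is hypothetical.

```python
def resize_coco_bbox(bbox, orig_size, new_size=(640, 640)):
    """Scale a COCO [x, y, w, h] box from orig_size to new_size.

    Both sizes are (width, height) tuples in pixels.
    """
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    sx, sy = new_w / orig_w, new_h / orig_h  # per-axis stretch factors
    x, y, w, h = bbox
    return [x * sx, y * sy, w * sx, h * sy]


# A 1280x2560 image stretched to 640x640: x/width halve, y/height quarter.
print(resize_coco_bbox([320, 240, 640, 480], orig_size=(1280, 2560)))
# [160.0, 60.0, 320.0, 120.0]
```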

We then created the dataset, which can be downloaded using the code in the following cell.

In [2]:
from roboflow import Roboflow
from ultralytics import YOLO
import project.model_training as mt

# Preprocess the data directory to fit Roboflow.
# The script is commented out because the data location on the server may vary.
# [it is said to be in `datasets/TACO-master` but we could not find it there]
# %run project/preprocess_imgs.py

# download the datasets
train_set, test_set = mt.load_datasets()

FileNotFoundError: [Errno 2] No such file or directory: 'data'

loading Roboflow workspace...
loading Roboflow project...
Dependency ultralytics<=8.0.20 is required but found version=8.0.59, to fix: `pip install ultralytics<=8.0.20`
Downloading Dataset Version Zip in TACO_train_only-1 to yolov8: 8% [10641408 / 126861221] bytes

KeyboardInterrupt: 

## Creating the Model
### YOLOv8
We chose to approach the task by custom-training the YOLOv8 model. This model is regarded as one of the leading models for image classification, detection, and segmentation. To achieve the best results, we used the largest and most accurate version of the model (YOLOv8x). We trained our model for 100 epochs, taking into consideration 3 main factors:
1. Maximizing the validation results during the training process.
2. Avoiding overfitting to the training set.
3. A cost-benefit analysis of the time and resources consumed by training [i.e., with more time and resources, it is possible to run more training sessions, perform cross-validation, etc., to reach better results].

In [3]:
# Initializing and training the model.
# This code block is commented out so that the notebook won't perform training.
# To re-create our training process, uncomment and run the next line.

# model, train_res = mt.set_model(train_set, 'yolov8x')

In [None]:
# To avoid training, load our trained model:
model = YOLO("runs/detect/train27/weights/best.pt")

### Model Architecture:
YOLOv8 consists of two main components: a backbone and a head. The backbone is a series of convolutional blocks and C2f layers (a faster variant of the CSP bottleneck block with two convolutions). The backbone extracts features, which are passed to the head; the head produces the detections, and during training these are scored by the model's loss function. A diagram of the model by [RangeKing](https://github.com/RangeKing) can be seen here.

<div>
<img src="imgs/yolov8_architecture_diagram.jpeg" width="1000"/>
</div>

The diagram also shows the sublayers that make up each block.

The architecture uses bottleneck blocks and a pyramidal structure. One example of the pyramidal structure is the spatial pyramid pooling layer (SPP/SPPF).

Some changes in this version of YOLO include:
- Not using anchor boxes for detection, which increases speed.
- A new backbone consisting of a new convolutional building block and new C2f layers, which have additional residual connections.
- New loss functions.

The full model can be seen in the [YOLOv8 repo](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/v8/yolov8.yaml).

#### Loss function:

The model uses a loss function that combines several components into a total loss.

- The first part is a Bbox loss. This component of the total loss evaluates the bounding boxes generated by the model, and it returns two separate loss values:

1. IoU loss: An intersection-over-union loss, computed with the external `bbox_iou` method (called with `CIoU=True` in the code below).

2. DFL loss: A distribution focal loss, as proposed in this [paper](https://ieeexplore.ieee.org/document/9792391). In short, this loss also measures the quality of the box locations, but it does so by treating each box side as a discrete probability distribution over possible offsets.
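
To make the IoU term concrete, here is a minimal sketch of plain intersection over union for two boxes in corner format. It is a simplification: the model's `bbox_iou` is called with `CIoU=True`, which adds further penalty terms that this sketch does not implement. The function name `box_iou` is hypothetical.

```python
def box_iou(box_a, box_b):
    """Plain IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width/height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping on half their width: IoU is 50 / 150.
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(f"IoU = {iou:.3f}, IoU loss = {1.0 - iou:.3f}")
```

The loss for a matched box is `1 - IoU`, so a perfect overlap contributes zero and disjoint boxes contribute one.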

Below is the code of the Bbox loss.


In [None]:
# class BboxLoss(nn.Module):

#     def __init__(self, reg_max, use_dfl=False):
#         super().__init__()
#         self.reg_max = reg_max
#         self.use_dfl = use_dfl

#     def forward(self, pred_dist, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask):
#         # IoU loss
#         weight = torch.masked_select(target_scores.sum(-1), fg_mask).unsqueeze(-1)
#         iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
#         loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum

#         # DFL loss
#         if self.use_dfl:
#             target_ltrb = bbox2dist(anchor_points, target_bboxes, self.reg_max)
#             loss_dfl = self._df_loss(pred_dist[fg_mask].view(-1, self.reg_max + 1), target_ltrb[fg_mask]) * weight
#             loss_dfl = loss_dfl.sum() / target_scores_sum
#         else:
#             loss_dfl = torch.tensor(0.0).to(pred_dist.device)

#         return loss_iou, loss_dfl

#     @staticmethod
#     def _df_loss(pred_dist, target):
#         # Return sum of left and right DFL losses
#         # Distribution Focal Loss (DFL) proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
#         tl = target.long()  # target left
#         tr = tl + 1  # target right
#         wl = tr - target  # weight left
#         wr = 1 - wl  # weight right
#         return (F.cross_entropy(pred_dist, tl.view(-1), reduction='none').view(tl.shape) * wl +
#                 F.cross_entropy(pred_dist, tr.view(-1), reduction='none').view(tl.shape) * wr).mean(-1, keepdim=True)
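
The `_df_loss` method above can be hard to read in batched tensor form. The following pure-Python sketch reproduces the same computation for a single box side, assuming the target distance satisfies `0 <= target < reg_max`. The function name `dfl_single` is hypothetical.

```python
import math


def dfl_single(logits, target):
    """Distribution focal loss for one box side.

    logits: reg_max + 1 raw scores over the integer bins 0..reg_max
    target: continuous distance, assumed to satisfy 0 <= target < reg_max
    """
    # Numerically stable log-softmax over the bins.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]

    left = int(target)        # nearest bin below the target
    right = left + 1          # nearest bin above the target
    w_left = right - target   # the closer bin receives the larger weight
    w_right = 1.0 - w_left
    # Weighted cross entropy against the two neighbouring bins.
    return -(w_left * log_probs[left] + w_right * log_probs[right])


# Mass concentrated around bins 1 and 2 fits a target of 1.5 far better
# than mass concentrated on the outer bins 0 and 3.
print(dfl_single([0.0, 5.0, 5.0, 0.0], target=1.5))
print(dfl_single([5.0, 0.0, 0.0, 5.0], target=1.5))
```

The loss is small only when the predicted distribution puts its mass on the two bins that bracket the true distance, which is how DFL encourages sharp, well-localized box sides.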

- The second part is a Varifocal loss, which contributes a classification component to the total loss. It is defined in this [paper](https://arxiv.org/pdf/2008.13367.pdf) as:

<div>
<img src="imgs/VFL3.png" width="500"/>
</div>
<div>
<img src="imgs/VFL2.png" width="500"/>
</div>

This loss is a variant of binary cross entropy and is explained in detail in the paper. In general, focal losses help classification when the classes are imbalanced: some examples are easy to classify and others are more difficult, and the loss focuses more on the challenging examples.

The code of the loss function also calls an existing binary cross entropy function, `binary_cross_entropy_with_logits`,

which, according to its documentation, combines a sigmoid layer with binary cross entropy.



In [2]:
# class VarifocalLoss(nn.Module):
#     # Varifocal loss by Zhang et al. https://arxiv.org/abs/2008.13367
#     def __init__(self):
#         super().__init__()

#     def forward(self, pred_score, gt_score, label, alpha=0.75, gamma=2.0):
#         weight = alpha * pred_score.sigmoid().pow(gamma) * (1 - label) + gt_score * label
#         with torch.cuda.amp.autocast(enabled=False):
#             loss = (F.binary_cross_entropy_with_logits(pred_score.float(), gt_score.float(), reduction='none') *
#                     weight).sum()
#         return loss
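
As a scalar illustration of the `VarifocalLoss.forward` code above, the sketch below computes the same weighted binary cross entropy for a single prediction, using only the standard library. The function names are hypothetical; the default `alpha` and `gamma` are taken from the code above.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logits(logit, target):
    """Binary cross entropy applied to a raw logit (stable formulation)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def varifocal_term(logit, gt_score, label, alpha=0.75, gamma=2.0):
    """Loss contribution of one (prediction, class) pair."""
    # Negatives (label=0) are down-weighted by alpha * p**gamma, so easy
    # negatives barely count; positives (label=1) are weighted by gt_score.
    weight = alpha * sigmoid(logit) ** gamma * (1 - label) + gt_score * label
    return bce_with_logits(logit, gt_score) * weight


# An easy negative (confidently rejected) versus a hard negative
# (confidently, but wrongly, accepted).
print(varifocal_term(-3.0, gt_score=0.0, label=0))
print(varifocal_term(3.0, gt_score=0.0, label=0))
```

The hard negative contributes a much larger loss than the easy one, which is the focusing behaviour described above.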

#### Optimization: 

The YOLOv8 model uses Adam as its default optimizer, with the following default hyperparameters:

Learning rate=0.001, Momentum=0.9, Decay=1e-5

We chose to use this optimizer because Adam is a widely used, well-performing optimization algorithm and the model was designed around these hyperparameters.


#### Additional evaluation metrics?
#### Accuracy:
#### Results:
#### Conclusions:

## Evaluating the Model
### Predicting on the test set
After training the model, we used the `YOLO.val()` method to predict on our test set.
The method performs object detection on the unseen test set images. After the detection, a confusion matrix is produced:
MAP values:... 
graphs: ....

In [6]:
# predict with the model on the test set for evaluation
test_res = mt.evaluate_model(test_set, model)

Ultralytics YOLOv8.0.59 🚀 Python-3.8.12 torch-1.10.1 CUDA:0 (NVIDIA GeForce RTX 2080, 7982MiB)
Model summary (fused): 268 layers, 68130309 parameters, 0 gradients, 257.4 GFLOPs
[34m[1mval: [0mScanning /home/muradek/Deep_project/TACO_test_set-1/valid/labels... 317 images, 0 backgrounds, 0 corrupt: 100%|██████████| 317/317 [00:00<00:00, 511.57it[0m
[34m[1mval: [0mNew cache created: /home/muradek/Deep_project/TACO_test_set-1/valid/labels.cache
This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 20/20 [00:27<00:00,  1.37s/it]
                   all        317        957     0.0425     

### COCO evaluation
To deepen our model evaluation, we performed another analysis using the COCO tools.
To perform the evaluation, we compared the ground-truth annotation file with the detected annotations for the dataset.
The detected annotation file is created by the `val()` method and then processed by our script (run in the cell below) to fit the `COCOeval` comparison.
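
The kind of processing involved can be sketched as follows. This is not the project's script: it assumes that each prediction is a dictionary with `image_id` (the image file name without extension), `category_id` (the model's class index), `bbox` as `[x, y, width, height]`, and `score`, and that these need to be mapped onto the integer image and category ids used in the ground-truth COCO file. The function name and the mapping argument are hypothetical.

```python
from pathlib import Path


def to_coco_results(predictions, gt_images, class_to_category_id):
    """Convert detector predictions into the COCO results format.

    predictions: list of dicts with image_id (file stem), category_id
                 (model class index), bbox [x, y, w, h] and score
    gt_images: the "images" list of the ground-truth COCO file
    class_to_category_id: maps model class indices to COCO category ids
    """
    # Look up the ground-truth image id by file name without extension.
    stem_to_id = {Path(img["file_name"]).stem: img["id"] for img in gt_images}

    results = []
    for pred in predictions:
        image_id = stem_to_id.get(str(pred["image_id"]))
        if image_id is None:
            continue  # prediction for an image absent from the ground truth
        results.append({
            "image_id": image_id,
            "category_id": class_to_category_id[pred["category_id"]],
            "bbox": [round(v, 3) for v in pred["bbox"]],
            "score": pred["score"],
        })
    return results
```

A list in this format can be written to a JSON file and loaded with the COCO API's `loadRes` before running `COCOeval`. If the ids do not match the ground truth, the evaluation silently finds no matching detections, so this mapping is worth checking first.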

In [1]:
# perform the COCO evaluation
%run project/pycocoEvalDemo.py

loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.46s).
Accumulating evaluation results...
DONE (t=0.16s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.011
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.010
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.017
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.021
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.025
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets

In [3]:
# edit the test_set to fit evaluations
%run project/edit_test.py

/home/muradek/Deep_project/project/annotations_test.json
loading annotations into memory...
Done (t=0.05s)
creating index...
index created!


## Results and discussion
### Training and Validation results
During the training process, the model validates its performance on the validation subset after each epoch.
- Reminder & Clarification: Our validation subset is part of the training dataset, but the model does not train on it explicitly, so it does not affect the weights directly. However, because we use the validation results to manually tune the model's version, size, and hyperparameters, the validation subset is *not* used for the test results (which we'll discuss later on).

In the following graph, we can see the model's performance as a function of the training epochs.
<div>   
<img src="runs/detect/train27/results.png" width="800"/>  
</div>
A few important observations:
1. As expected, during the training process the loss values (box, cls, dfl) decrease and the precision values increase for both the training and validation sets.
2. The curves have not flattened yet, indicating that we might still be able to improve the model's performance. After several training runs, we noticed that we could still improve the model's precision on the training set, but that this causes overfitting for some of the categories. Therefore, we limited the number of epochs.

In the following figure, we can observe the distribution of the predictions, broken down by category.
<div>   
<img src="runs/detect/train27/confusion_matrix.png" width="600"/>  
</div>
We can see that two categories are "overpredicted" (false positives, FP): background and metals_and_plastic.
We can lower the background FP by decreasing the confidence threshold for detection. However, this lowers the overall precision, because more low-confidence detections are accepted, and it specifically increases the metals_and_plastic FP.

Another observation is that the model rarely predicts the "other" category. We assume the reason is that, unlike the rest of the categories, "other" has no distinct definition, so the model cannot find unique patterns (weights) to detect it.
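
Per-class precision and recall can be read off a confusion matrix with a few lines of code. The sketch below assumes that rows are predicted classes, columns are true classes, and the last row and column stand for background. The function name is hypothetical, and the numbers in the example are made up for illustration; they are not our results.

```python
def per_class_precision_recall(matrix, class_names):
    """Compute (precision, recall) per class from a confusion matrix.

    matrix[p][t] counts detections predicted as class p whose true class
    is t; the last row and column are assumed to represent background.
    """
    stats = {}
    for i, name in enumerate(class_names):
        tp = matrix[i][i]
        predicted = sum(matrix[i])              # everything predicted as class i
        actual = sum(row[i] for row in matrix)  # every true instance of class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        stats[name] = (precision, recall)
    return stats


# Made-up counts for two classes plus background.
example = [
    [8, 1, 3],  # predicted glass: 8 glass, 1 paper, 3 background (FP)
    [0, 5, 1],  # predicted paper: 5 paper, 1 background (FP)
    [2, 4, 0],  # predicted background: 2 glass and 4 paper were missed
]
for name, (p, r) in per_class_precision_recall(example, ["glass", "paper"]).items():
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

In this layout, a heavy background column lowers precision (spurious detections), while a heavy background row lowers recall (missed objects).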

### YOLO test results


<div>   
<img src="runs/detect/val18/confusion_matrix.png" width="600"/>  
</div>
### COCO evaluation results