$$
\newcommand{\mat}[1]{\boldsymbol {#1}}
\newcommand{\mattr}[1]{\boldsymbol {#1}^\top}
\newcommand{\matinv}[1]{\boldsymbol {#1}^{-1}}
\newcommand{\vec}[1]{\boldsymbol {#1}}
\newcommand{\vectr}[1]{\boldsymbol {#1}^\top}
\newcommand{\rvar}[1]{\mathrm {#1}}
\newcommand{\rvec}[1]{\boldsymbol{\mathrm{#1}}}
\newcommand{\diag}{\mathop{\mathrm {diag}}}
\newcommand{\set}[1]{\mathbb {#1}}
\newcommand{\cset}[1]{\mathcal{#1}}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\bb}[1]{\boldsymbol{#1}}
\newcommand{\E}[2][]{\mathbb{E}_{#1}\left[#2\right]}
\newcommand{\ip}[3]{\left<#1,#2\right>_{#3}}
\newcommand{\given}[]{\,\middle\vert\,}
\newcommand{\DKL}[2]{\cset{D}_{\text{KL}}\left(#1\,\Vert\, #2\right)}
\newcommand{\grad}[]{\nabla}
$$

# Part 1: Mini-Project
<a id=part3></a>

In this part you'll implement a small comparative-analysis project, heavily based on the materials from the tutorials and homework.

### Guidelines

- You should implement the code which displays your results in this notebook, and add any additional code files for your implementation in the `project/` directory. You can import these files here, as we do for the homeworks.
- Running this notebook should not perform any training - load your results from some output files and display them here. The notebook must be runnable from start to end without errors.
- You must include a detailed write-up (in the notebook) of what you implemented and how. 
- Explain the structure of your code and how to run it to reproduce your results.
- Explicitly state any external code you used, including built-in pytorch models and code from the course tutorials/homework.
- Analyze your numerical results, explaining **why** you got these results (not just specifying the results).
- Where relevant, place all results in a table or display them using a graph.
- Before submitting, make sure all files which are required to run this notebook are included in the generated submission zip.
- Try to keep the submission file size under 10MB. Do not include model checkpoint files, dataset files, or any other non-essential files. Instead, include your results as images/text files/pickles/etc., and load them for display in this notebook.

## Object detection on TACO dataset

TACO is a growing image dataset of waste in the wild. It contains images of litter taken under diverse environments: woods, roads and beaches.

<center><img src="imgs/taco.png" /></center>


You can read more about the dataset here: https://github.com/pedropro/TACO

and you can explore the data distribution and how to load the data here: https://github.com/pedropro/TACO/blob/master/demo.ipynb


The stable version of the dataset, which contains 1500 images and 4787 annotations, exists in `datasets/TACO-master`.
You do not need to download the dataset.


### Project goals:

* You need to perform an Object Detection task over 7 categories of the dataset.
* The annotations for object detection can be downloaded from here: https://github.com/wimlds-trojmiasto/detect-waste/tree/main/annotations.
* The data and annotation format follows the COCO API: https://github.com/cocodataset/cocoapi (you can find a notebook showing how to perform evaluation with it here: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb)
(you need to install it.)
* If you need a beginner guide for OD with the COCO API, you can read and watch this link: https://www.neuralception.com/cocodatasetapi/

### What do I need to do?

* **Everything is in the game!** as long as your model does not require more than 8 GB of memory and you follow the Guidelines above.


### What does it mean?
* You can use data augmentation, whether you take what's implemented in the directory or use external libraries such as https://albumentations.ai/ (notice that when you create your own augmentations you need to change the annotations as well)
* You can use more data if you find it useful (for example, review https://github.com/AgaMiko/waste-datasets-review)


### What model can I use?
* Whatever you want!
You can review good models for the COCO OD task as a reference:
SOTA: https://paperswithcode.com/sota/object-detection-on-coco
Real-Time: https://paperswithcode.com/sota/real-time-object-detection-on-coco
Or you can use older models like YOLO-V3 or Faster-RCNN
* As long as you have a reason (complexity, speed, performance), you are golden.

### Tips for a good grade:
* Start as simple as possible. Dealing with APIs is not easy the first time, and I predict that this will be your main issue. Only once you have a running model that learns should you add learning tricks.
* Use the visualization of a notebook, as we did over the course: check that your input actually fits the model, that the output has the desired size, and so on.
* It is recommended to change the images to a fixed size, as shown here: https://github.com/pedropro/TACO/blob/master/detector/inspect_data.ipynb
* Please address the architecture and your loss function(s) in this notebook. If you decide to add a loss component like the Focal loss, for instance, try to show the results before and after using it.
* Plot your losses in this notebook. Any evaluation metric can be shown as a function of time, and can be analyzed per class.

Good luck!

## Implementation

**TODO**: This is where you should write your explanations and implement the code to display the results.
See guidelines about what to include in this section.

## YOLOv8
The model we chose for our project is YOLOv8, an updated version of the YOLO model that we saw in class. YOLOv8 is a state-of-the-art object detection and image segmentation model created by Ultralytics, the developers of YOLOv5, and launched on January 10, 2023.
This new version features many improvements, such as:
- A backbone based on a modified CSPDarknet53, in which YOLOv5's C3 modules are replaced by C2f modules
- An anchor-free, decoupled detection head
- A design that makes it easy to compare model performance with older models in the YOLO family
- A loss that combines a classification term with CIoU and Distribution Focal Loss (DFL) box-regression terms
- Support for data augmentations such as Mosaic and MixUp (which blends pairs of images together with their labels)

YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for our project, which is an object detection task. YOLOv8 is a modern and powerful model that can handle the complexity and diversity of the TACO dataset. We also wanted to explore the new features and improvements that YOLOv8 offers over previous versions. In this notebook, we will show you how we trained and evaluated YOLOv8 on the TACO dataset, and compare the results we obtained with different versions of the dataset. We will also discuss the challenges and limitations we faced, and suggest some possible directions for future work.

## Architecture
<figure>
    <center><img src="imgs/architecturepic.jpg" /></center>
    <figcaption>
    Courtesy of <a href="https://github.com/ultralytics/ultralytics/issues/189">RangeKing</a>
    </figcaption>
</figure>
YOLOv8 made some changes to the overall architecture of the previous models.
In general, the backbone of recent YOLO models is a modified version of the CSPDarknet53 architecture, a convolutional neural network that uses cross-stage partial connections to reduce redundancy and increase efficiency. The backbone downsamples the input in five stages, each halving the spatial resolution (strides 2 to 32). The feature maps of the last three stages (strides 8, 16 and 32) are then passed through the neck to the head for prediction.

### Activation function
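Earlier YOLO models (YOLOv4) used the Mish activation function, defined as $f(x) = x\tanh(\operatorname{softplus}(x))$, where $\operatorname{softplus}(x) = \ln(1 + e^x)$.
YOLOv8, like YOLOv5, instead uses the SiLU (Swish) activation, $f(x) = x\,\sigma(x)$, where $\sigma$ is the logistic sigmoid. Both are smooth, non-monotonic alternatives to ReLU, but SiLU is cheaper to compute because it needs only a single exponential.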
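For intuition, the two activations can be compared numerically. This is a minimal scalar sketch in plain Python written for illustration; in PyTorch the same functions are available as `torch.nn.functional.mish` and `torch.nn.functional.silu`.

```python
import math


def softplus(x: float) -> float:
    # ln(1 + e^x), rewritten as max(x, 0) + ln(1 + e^-|x|) to avoid overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mish(x: float) -> float:
    # Mish (YOLOv4): x * tanh(softplus(x))
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    # SiLU / Swish (YOLOv5, YOLOv8): x * sigmoid(x)
    return x * sigmoid(x)


for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  mish={mish(x):+.4f}  silu={silu(x):+.4f}")
```

Both curves pass through zero, dip slightly below zero for negative inputs, and approach the identity for large positive inputs.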

### Head
YOLOv8 predicts at three scales, using feature maps with strides 8, 16 and 32 relative to the input image. An SPPF (fast Spatial Pyramid Pooling) module at the end of the backbone pools features over several receptive-field sizes, and a PANet-style (Path Aggregation Network) neck fuses features top-down and bottom-up, so that every scale receives both fine and coarse information. The head is decoupled, with separate branches for classification and box regression, and it is now anchor free: each grid cell predicts the distances from its centre to the four box edges directly, instead of offsets relative to predefined anchor boxes. This removes the need to tune anchor shapes for a new dataset.
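Because the head is anchor free, the number of raw box predictions per image follows directly from the input size and the three strides. A quick sketch of that arithmetic for our 640 × 640 inputs (the function name is ours):

```python
def prediction_grid(img_size: int = 640, strides=(8, 16, 32)):
    # One anchor-free prediction per grid cell at each detection scale
    grids = {s: (img_size // s, img_size // s) for s in strides}
    total = sum(h * w for h, w in grids.values())
    return grids, total


grids, total = prediction_grid(640)
for stride, (h, w) in grids.items():
    print(f"stride {stride:2d}: {h} x {w} = {h * w} cells")
print("total predictions per image:", total)
```

This gives 8,400 candidate boxes per image before confidence filtering and non-maximum suppression; with three anchors per grid cell, as in YOLOv5, the same image would give 25,200.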

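### Mosaic augmentation
Mosaic augmentation, in which 4 images are stitched together into a single training image, was an important part of the success of YOLOv4 and YOLOv5. However, it is now understood that turning it off for the last few training epochs produces better results. In Ultralytics this is controlled by the `close_mosaic` training argument, which by default disables mosaic for the last 10 epochs.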

### Module and Convolution changes
Compared with YOLOv5, the main module-level changes are:
- Replace the C3 module with the C2f module
- Replace the first 6x6 Conv with 3x3 Conv in the Backbone
- Replace the first 1x1 Conv with 3x3 Conv in the Bottleneck
- Use decoupled head and delete the objectness branch


### Fun features
YOLOv5 used AutoAnchor, which checked the anchor box shapes against the dataset's label statistics before training and re-fitted them when they matched poorly. Because YOLOv8 is anchor free, this step is no longer needed at all: there are no anchor shapes to tune. This allows YOLOv8 to adapt to different datasets and scenarios without manual tuning, and makes it very versatile.





### Loss functions
YOLOv8 has a number of different loss functions.

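##### Box loss
Early YOLO versions regressed the boxes with a squared-error loss between the predicted box coordinates $(x, y, w, h)$ and the ground-truth box coordinates $(x^*, y^*, w^*, h^*)$, which measures how well the model can localize the objects in the image (YOLOv1 additionally took the square roots of $w$ and $h$).
In its simplest form it is computed as:

$$
\mathcal{L}_{\text{box}} = \frac{1}{N} \sum_{i=1}^{N} \left[ (x_i - x_i^*)^2 + (y_i - y_i^*)^2 + (w_i - w_i^*)^2 + (h_i - h_i^*)^2 \right]
$$
where $N$ is the number of boxes in the batch. YOLOv8 does not use this loss: the `box_loss` reported in its training logs is the CIoU loss described below.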


##### Varifocal loss
This loss compares the predicted and ground-truth class scores. Varifocal Loss (VFL) is a focal-loss variant designed for IoU-aware classification: instead of a plain class probability, the model predicts an IoU-aware Classification Score (IACS), which for a positive sample should equal the IoU between the predicted box and its ground-truth box. VFL treats positives and negatives asymmetrically. Negatives are down-weighted by $p^\gamma$, as in focal loss, so the many easy negatives contribute little to the loss; positives are not down-weighted but are weighted by their target $q$, so high-quality (high-IoU) positives contribute more. This helps with the strong imbalance between object and background locations. In the Ultralytics implementation, the default YOLOv8 classification loss (`cls_loss`) is binary cross-entropy against soft, IoU-aware target scores, and a `VarifocalLoss` class is also included in the code base.
It is computed as:
$$
\mathrm{VFL}(p,q) = \begin{cases}
-q\left(q\log p + (1-q)\log(1-p)\right) & \text{if } q > 0 \\
-\alpha p^\gamma\log(1-p) & \text{if } q = 0
\end{cases}
$$
where $p$ is the predicted IACS, $q$ is the target IACS (the IoU with the ground-truth box for positives, and 0 for negatives), and $\alpha$ and $\gamma$ are hyperparameters.

The loss was based on this [paper](https://arxiv.org/abs/2008.13367)
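A minimal scalar sketch of the VFL formula above, in plain Python. The defaults $\alpha = 0.75$ and $\gamma = 2$ follow the VarifocalNet paper; the function name and the clamping constant `eps` are our own choices.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0,
                   eps: float = 1e-12) -> float:
    """Varifocal loss for one predicted score p and target IACS q."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if q > 0:
        # Positive: BCE against the soft target q, weighted by q itself
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: focal down-weighting by p**gamma, so confident mistakes dominate
    return -alpha * p ** gamma * math.log(1.0 - p)


print(varifocal_loss(0.1, 0.0), varifocal_loss(0.9, 0.0))  # easy vs hard negative
print(varifocal_loss(0.5, 1.0))  # positive with a perfect box
```

The first line shows the focal down-weighting of negatives: a confident false positive ($p = 0.9$) costs far more than an easy negative ($p = 0.1$).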


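##### CIoU loss
Complete IoU (CIoU) loss is a bounding box regression loss that considers the overlap, the distance between the box centres, and the aspect ratio of the predicted and ground-truth boxes. Because the distance term is non-zero even when the boxes do not overlap, CIoU still penalizes badly placed predictions in proportion to how far off they are, whereas a plain IoU loss would be constant for all non-overlapping boxes.
It is computed as:
$$
\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU}(B, B^*) + \frac{\rho^2(B_c, B_c^*)}{c^2} + \alpha v
$$
where $\text{IoU}(B, B^*)$ is the intersection over union between the predicted box $B$ and the ground-truth box $B^*$, $\rho(B_c, B_c^*)$ is the Euclidean distance between their centre points $B_c$ and $B_c^*$, $c$ is the diagonal length of the smallest enclosing box that covers both $B$ and $B^*$, $v$ is a penalty term for aspect-ratio consistency and $\alpha$ is a trade-off parameter:
$$
v = \frac{4}{\pi^2}\left(\arctan\frac{w^*}{h^*} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{\left(1 - \text{IoU}(B, B^*)\right) + v}
$$
where $(w, h)$ and $(w^*, h^*)$ are the widths and heights of $B$ and $B^*$.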
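A direct plain-Python translation of the formula, assuming boxes given as corner coordinates `(x1, y1, x2, y2)`; the small `eps` guarding against division by zero is our addition:

```python
import math
from typing import Tuple

BoxXYXY = Tuple[float, float, float, float]


def ciou_loss(pred: BoxXYXY, target: BoxXYXY, eps: float = 1e-9) -> float:
    """Complete-IoU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Centre-distance term, normalised by the squared enclosing-box diagonal
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
print(ciou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # no overlap
```

In the second call the boxes do not overlap, so IoU is 0, yet the loss (1.4) still reflects how far apart the centres are.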

##### DFL loss
Distribution Focal Loss (DFL) is part of the box-regression loss, not a classification loss. Instead of regressing each of the four box-edge distances as a single number, the YOLOv8 head predicts a discrete probability distribution over $n$ bins for each distance (Ultralytics uses `reg_max = 16` bins), and the predicted distance is the expectation of that distribution, $\hat{y} = \sum_{j=0}^{n-1} j\,S_j$. For a continuous target distance $y$ that lies between two adjacent bins, $y_i \le y \le y_{i+1}$, DFL increases the probability of exactly those two bins, weighting each by how close it is to $y$. This lets the model express uncertainty about ambiguous box edges, for example under occlusion or blur.
It is computed as:
$$
\mathcal{L}_{\text{DFL}}(S_i, S_{i+1}) = -\left((y_{i+1} - y)\log S_i + (y - y_i)\log S_{i+1}\right)
$$
where $S_i$ and $S_{i+1}$ are the predicted (softmax) probabilities of bins $y_i$ and $y_{i+1}$.

The loss was based on this [paper](https://ieeexplore.ieee.org/document/9792391)
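A minimal plain-Python sketch of DFL for a single box-edge distance, measured in bin units, with 16 bins as in YOLOv8; the helper names are ours:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_loss(logits: Sequence[float], y: float) -> float:
    """DFL for one box-edge distance y (in bin units) given n bin logits."""
    n = len(logits)
    probs = softmax(logits)
    y = min(max(y, 0.0), n - 1 - 1e-6)  # keep y inside [0, n - 1)
    i = int(math.floor(y))  # left bin y_i; the right bin is y_i + 1
    w_left, w_right = (i + 1) - y, y - i
    return -(w_left * math.log(probs[i]) + w_right * math.log(probs[i + 1]))


def expected_distance(logits: Sequence[float]) -> float:
    # Decoded prediction: expectation of the bin distribution
    return sum(j * p for j, p in enumerate(softmax(logits)))


uniform = [0.0] * 16  # reg_max = 16 bins, no preference yet
print(dfl_loss(uniform, 3.3), expected_distance(uniform))
```

With uniform logits the loss is $\ln 16 \approx 2.77$ and the decoded distance is the middle of the range, 7.5; training pushes probability mass onto the two bins around each target.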

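#### Total loss
YOLOv8 has no separate objectness (confidence) loss, since the objectness branch was removed. The total loss is the weighted sum of the box-regression (CIoU), classification and DFL terms over all images in a batch. The classification term is computed over all predictions, while the CIoU and DFL terms are computed only over predictions assigned to a ground-truth object.
It is computed as:
$$
\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{CIoU}} + \lambda_{\text{cls}} \mathcal{L}_{\text{cls}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{DFL}}
$$
where $\lambda_{\text{box}}, \lambda_{\text{cls}}, \lambda_{\text{dfl}}$ are the weights for each loss term. In Ultralytics these are the `box`, `cls` and `dfl` training arguments (7.5, 0.5 and 1.5 by default), and the three terms appear as the `box_loss`, `cls_loss` and `dfl_loss` curves in the training plots.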
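The weighting itself is simple. A sketch using the Ultralytics default gains and made-up loss values, which also shows the effect of changing the classification gain (the function name is ours):

```python
from typing import Dict

# Default Ultralytics loss gains (the `box`, `cls` and `dfl` training arguments)
DEFAULT_GAINS = {"box": 7.5, "cls": 0.5, "dfl": 1.5}


def total_loss(terms: Dict[str, float], gains: Dict[str, float] = DEFAULT_GAINS) -> float:
    """Weighted sum of the CIoU ("box"), classification ("cls") and DFL ("dfl") terms."""
    return sum(gains[name] * terms[name] for name in ("box", "cls", "dfl"))


terms = {"box": 1.2, "cls": 2.0, "dfl": 1.1}  # made-up per-batch values
print(total_loss(terms))                                  # default gains
print(total_loss(terms, {**DEFAULT_GAINS, "cls": 0.25}))  # down-weighted classification
```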

# Data Preprocessing
In order to train our model, we first had to do some preprocessing on the TACO dataset. The raw TACO dataset comes as multiple batches of images with overlapping file names, while YOLO requires one folder of images for training and one for testing. First, we consolidated the images into one large folder, updating the image names in the annotation files as well.


All of this was done by running the preprocessing script we wrote on the TACO dataset:

In [None]:
# process the TACO dataset, unifying the image folders and updating the annotation files for use in Roboflow
# to run this script, provide the path to the folder containing both the TACO images
# and annotations. Two new folders will be created, train_images and test_images,
# and the annotation files will be updated in place.
# Replace DATAFOLDERPATH with the path to your data.
# DATAFOLDERPATH must contain the batches of images as downloaded from the TACO repo, and the train and test annotation files we were referred to
# %run project/preprocess.py DATAFOLDERPATH
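The core idea of the consolidation step can be sketched as follows. This is a simplified illustration rather than `project/preprocess.py` itself; it assumes COCO-style annotations whose `file_name` entries look like `batch_1/000006.jpg`, as in the TACO annotation files, and the function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict


def flat_name(file_name: str) -> str:
    # "batch_1/000006.jpg" -> "batch_1_000006.jpg": unique once flattened
    return file_name.replace("/", "_")


def consolidate(coco: Dict, src_root: Path, dst_dir: Path, copy_files: bool = True) -> Dict:
    """Give every image a unique flat name and update the COCO dict in place."""
    if copy_files:
        dst_dir.mkdir(parents=True, exist_ok=True)
    for image in coco["images"]:
        new_name = flat_name(image["file_name"])
        if copy_files:
            shutil.copy2(src_root / image["file_name"], dst_dir / new_name)
        image["file_name"] = new_name
    return coco


coco = {"images": [{"id": 0, "file_name": "batch_1/000006.jpg"},
                   {"id": 1, "file_name": "batch_2/000006.jpg"}]}
consolidate(coco, Path("TACO/data"), Path("images"), copy_files=False)
print([img["file_name"] for img in coco["images"]])
```

Writing the updated dictionary back with `json.dump` completes the step.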

# Augmentations and Train, Val and Test folders
In order to augment the dataset and split the data into the relevant folders, we used Roboflow.
Roboflow is the dataset tool recommended by Ultralytics. It provides an easy online way to augment, resize, and split custom datasets. We uploaded the processed TACO dataset and annotations to Roboflow, where we used their online tools to do the following:
- Resize all images to 640 $\times$ 640, the recommended size for YOLOv8
- Auto Orient: Roboflow automatically orients the images based on their EXIF orientation metadata
- Split the data into 90% training data and 10% validation data

Once these steps were applied, the dataset was ready for the model.
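Roboflow performed the 90/10 split for us. For reference, an equivalent local split can be sketched as below; it splits by image id, so all boxes of an image stay in the same subset. The seed and function name are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_ids(image_ids: Sequence[int], train_frac: float = 0.9,
              seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle image ids reproducibly and split them into train and validation ids."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]


train_ids, val_ids = split_ids(range(100))
print(len(train_ids), len(val_ids))
```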

# Downloading the Dataset
After using Roboflow to augment and resize the data, the new dataset can be downloaded using Roboflow's API.

In [None]:
#Use this block to download the dataset
#running this block will download the image and label files in the folder you are in
#import project.model_utils as utils
#dataset = utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-original-train-images", version=1, model="yolov8")
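A minimal sketch of a helper with the same signature as `utils.download_data`, written against the public `roboflow` pip package. It is an illustration, not necessarily the code in `project/model_utils.py`; `model` is passed through as the Roboflow export format.

```python
def download_data(apikey: str, workspace: str, project: str, version: int, model: str):
    """Download one version of a Roboflow dataset in the given export format.

    `model` is the export format, e.g. "yolov8" for YOLO labels plus data.yaml,
    or "coco" for COCO JSON annotations.
    """
    from roboflow import Roboflow  # pip install roboflow; imported lazily

    rf = Roboflow(api_key=apikey)
    proj = rf.workspace(workspace).project(project)
    # Downloads into the current working directory and returns a Dataset object
    return proj.version(version).download(model)
```

For the YOLO export, the returned object's `location` attribute is the folder that contains `data.yaml`.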

# Training the model on the original dataset
Once we have the dataset, we can begin training. This is straightforward to do using the YOLOv8 API.
Ultralytics creates a configuration file, `.config/ultralytics/settings.yaml`, that contains the path to the folder where the first download took place.
All downloads must be made to that folder for the notebook to run smoothly and automatically.
This is the full path to the folder **containing** the dataset folders, not the path of a dataset folder itself.
Roboflow creates a download folder with another configuration file called `data.yaml`; this is the path you give to the model.
If all went well in the download process, training should run smoothly.

Otherwise, you will get an error explaining what needs to be done.

In [None]:
#Here we initialize and train a model for 150 epochs, this will take a long time if run
# it will also automatically validate the model on the validation set. 
#model = utils.initialize_and_train_model(dataset, 'yolov8x')
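A minimal sketch of a helper like `utils.initialize_and_train_model`, written against the public `ultralytics` API and taking the path to `data.yaml`. The `patience=25` value mirrors the early stopping described below; the other defaults are assumptions rather than a copy of our module.

```python
from pathlib import Path


def initialize_and_train_model(data_yaml: str, model_name: str = "yolov8x",
                               epochs: int = 150, patience: int = 25, imgsz: int = 640):
    """Start from COCO-pretrained weights and fine-tune on our dataset."""
    from ultralytics import YOLO  # pip install ultralytics; imported lazily

    model = YOLO(f"{model_name}.pt")  # downloads pretrained weights on first use
    model.train(
        data=str(Path(data_yaml)),  # path to the data.yaml created by Roboflow
        epochs=epochs,
        patience=patience,          # stop after this many epochs without improvement
        imgsz=imgsz,
        # cls=0.5,                  # classification-loss gain, can be tuned
    )
    return model
```

With a Roboflow `Dataset` object, the path would be `f"{dataset.location}/data.yaml"`.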

# Training
Here are the model's per-epoch training curves (training and validation losses, and validation metrics):
<figure>
    <center><img src="imgs/resultsOriginalTrain1.png" /></center>
</figure>

# Observations
The model behaves as expected: the various loss terms decrease as training progresses through the epochs, and the evaluation metrics increase.
The model was given a limit of 150 epochs, but stopped training early after 75 epochs, because it had not improved for 25 epochs.
The validation subset does not affect the weights directly; it is used to decide when to stop, to select the best checkpoint, and for hyperparameter tuning.
During our early tests we noticed that the classification (CLS) loss tended to overfit, and we therefore adjusted the weight associated with it accordingly.


 

# Results

The confusion matrix of the best model we got:

<figure>
    <center><img src="imgs/confMatOrig.png" /></center>
</figure>

As we can see, some of the classes were not represented in the training data, and therefore have no rows.
Some classes are over-represented, which results in many false positives. These false positives are counted against background, a category that is added automatically.

We could lower the false positive rate by training for longer, but we would then run the risk of overfitting. The split of the categories is also simply not ideal: some categories are severely under-represented, and some over-represented. There is simply not enough training data for Bio to expect the model to learn anything that generalizes. Some of the categories are also inherently flawed. For instance, Other seems random, meaning there are very few defining features of the class, just an assortment of objects. Glass is mostly broken glass, which tends to appear as small fragments, and YOLO is known to have difficulty with small objects.
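The class imbalance described above can be quantified directly from the COCO annotation file. A minimal sketch; the inline example dictionary and its counts are made up for illustration, and in practice `coco` would be loaded from the training annotation JSON with `json.load`:

```python
from collections import Counter
from typing import Dict, List, Tuple


def class_counts(coco: Dict) -> List[Tuple[str, int]]:
    """Number of annotations per category, most frequent first."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    # Keep categories with zero annotations, which have no rows in the confusion matrix
    return sorted(((names[cid], counts.get(cid, 0)) for cid in names),
                  key=lambda item: item[1], reverse=True)


coco = {
    "categories": [{"id": 1, "name": "metals_and_plastic"}, {"id": 2, "name": "bio"},
                   {"id": 3, "name": "glass"}],
    "annotations": [{"category_id": 1}, {"category_id": 1}, {"category_id": 3}],
}
for name, n in class_counts(coco):
    print(f"{name:20s} {n}")
```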



# COCO evaluation
We followed the demo linked above to perform a COCO evaluation on the provided test image annotations.

## Download the test set
We used Roboflow to prepare the images for testing as above. Downloading the dataset is very similar, but the COCO-format annotations must be downloaded as well.

In [None]:
#Use this block to download the test set
#running this block will download the image and label files in the folder you are in, make sure to do so in the roboflow download dir
#import project.model_utils as utils
#utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-original-test-images", version=1, model="yolov8")
#utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-original-test-images", version=1, model="coco")

# Test set preprocessing
The test set images and labels are downloaded with the names given by Roboflow.
We wanted to evaluate using both the YOLO evaluation method and COCOeval.
To do so, the file names must be changed to the image ids in the annotation files.
Run the block below to do so, after moving the annotation files into the folder you downloaded the dataset to, and replace PATH_TO_TEST_SET_FOLDER with that folder's path.

In [None]:
# %run project/processTestset.py PATH_TO_TEST_SET_FOLDER
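The idea behind this renaming step, sketched in simplified form below, is to rename every test image to its COCO `id`. As far as we understand, Ultralytics derives the `image_id` it writes to the prediction JSON from the image file name (using the number when the name is numeric), so this keeps predictions and ground truth keyed by the same ids. The function name is hypothetical, and the sketch only updates the annotation dictionary and returns an old-to-new mapping rather than touching files:

```python
from pathlib import Path
from typing import Dict


def rename_to_ids(coco: Dict) -> Dict[str, str]:
    """Point every image entry at '<id><ext>' and return {old_name: new_name}."""
    mapping = {}
    for image in coco["images"]:
        old = image["file_name"]
        new = f"{image['id']}{Path(old).suffix}"
        mapping[old] = new
        image["file_name"] = new
    return mapping


coco = {"images": [{"id": 7, "file_name": "IMG_0042_jpg.rf.3a9c.jpg"}]}
print(rename_to_ids(coco))
```

Each image file, and for the YOLO export its label `.txt` file, would then be renamed on disk with `Path.rename` according to `mapping`.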

# Running the evaluation
We first run the YOLO evaluation, then the COCOeval function.
Replace the paths below with the path to the weights obtained during training and the path to the folder containing the test set's `data.yaml`.

In [None]:
# %run project/evaluate.py PATH_TO_WEIGHTS PATH_TO_TEST_SET_ANNOTATIONS

# YOLO Evaluation

The YOLO evaluation gave a mAP of 0.05 on the test set, which unfortunately is quite low.
The confusion matrix is not surprising, giving results similar to those on the validation set.

<figure>
    <center><img src="imgs/conMatOrigOnTest1.png" /></center>
</figure>



# Image results

Here are some of the results:
<div style="display: flex;">
  <div style="flex: 50%; padding: 10px;">
    <img src="imgs/valLabelOrigTrain1.jpg" alt="Image 1" width="100%">
    <p>Labels</p>
  </div>
  <div style="flex: 50%; padding: 10px;">
    <img src="imgs/valPredOrigTrain1.jpg" alt="Image 2" width="100%">
    <p>Predictions</p>
  </div>
</div>


As we can see, the model tends to predict Metals and Plastic, as this class is heavily over-represented.

# Coco Evaluation
To further our understanding, we also used the COCOeval functions referenced above.

In [None]:
# This script receives the paths to the ground-truth annotations for the test set downloaded from Roboflow
# and the predictions generated above
# %run project/COCOeval.py PATH_TO_TRUTH_ANNOTATIONS PATH_TO_PREDICTIONS
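A minimal sketch of a COCOeval script along these lines, following the pycocotools demo linked above. `COCO`, `loadRes`, `COCOeval`, `evaluate`, `accumulate` and `summarize` are the pycocotools API; the function name and argument handling are our assumptions. `summarize()` prints the twelve AP/AR rows reported in our COCO results tables and also stores them in `stats`.

```python
def coco_bbox_eval(gt_json: str, pred_json: str):
    """Run the standard COCO bounding-box evaluation and return the 12 summary stats."""
    from pycocotools.coco import COCO  # pip install pycocotools
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(gt_json)               # ground-truth annotations
    coco_dt = coco_gt.loadRes(pred_json)  # detections: image_id, category_id, bbox, score
    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    # Only score images that appear in the ground truth
    evaluator.params.imgIds = sorted(coco_gt.getImgIds())
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()                 # prints AP/AR at the standard settings
    return evaluator.stats                # stats[0] = AP@[.50:.95], stats[1] = AP@.50, ...
```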

# COCO results:
| Metric | IoU Range | Area | Max Dets | Value |
|--------|-----------|------|----------|-------|
| AP     | 0.50:0.95 | all  | 100      | 0.076 |
| AP     | 0.50      | all  | 100      | 0.106 |
| AP     | 0.75      | all  | 100      | 0.081 |
| AP     | 0.50:0.95 | small| 100      | 0.037 |
| AP     | 0.50:0.95 | medium| 100     | 0.124 |
| AP     | 0.50:0.95 | large | 100     | 0.112 |
| AR     | 0.50:0.95 | all  | 1        | 0.181 |
| AR     | 0.50:0.95 | all  | 10       | 0.332 |
| AR     | 0.50:0.95 | all  | 100      | 0.386 |
| AR     | 0.50:0.95 | small| 100      | 0.276 |
| AR     | 0.50:0.95 | medium| 100     | 0.496 |
| AR     | 0.50:0.95 | large | 100     | 0.572 |

 
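Above we see the results of the model using COCO evaluation.
Some notable results include an average recall of 0.572 for large objects (IoU 0.50:0.95, 100 detections), which is actually quite good comparatively. Small objects score lowest, with an AP of 0.037 and an AR of 0.276, consistent with YOLO's known difficulty with small objects.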
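To read these tables it helps to know how COCO defines the rows: AP at IoU 0.50:0.95 is averaged over ten IoU thresholds, and the small/medium/large buckets are defined by the `area` of each ground-truth annotation in pixels (small below $32^2$, medium up to $96^2$, large above). A small sketch of both definitions; the helper names are ours, and the exact handling of areas falling on a bucket boundary is simplified:

```python
# The ten IoU thresholds averaged in "AP@[0.50:0.95]": 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def coco_area_bucket(area: float) -> str:
    """COCO size bucket of an annotation, from its area in pixels."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


print(IOU_THRESHOLDS)
print(coco_area_bucket(20 * 20), coco_area_bucket(60 * 60), coco_area_bucket(200 * 150))
```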

# Augmentations
After the experiment on the original dataset we decided to augment the data to see if we could obtain better results. We again used Roboflow to do the following:
- Resize all images to 640 $\times$ 640, the recommended size for YOLOv8
- Auto Orient: Roboflow automatically orients the images based on their EXIF orientation metadata
- Greyscale: We applied greyscale to around 25% of the dataset to simulate low-light conditions
- Shear: We applied a shear of 15° horizontal and 15° vertical, to simulate various photo angles
- Flip: We applied horizontal and vertical flips, again to simulate different angles
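Flips are a good example of why the annotations must change together with the pixels. A minimal sketch for COCO-style `[x, y, w, h]` boxes (Roboflow handled this for us; the function names are ours):

```python
from typing import List

Box = List[float]  # COCO-style [x, y, w, h] in pixels


def hflip_box(box: Box, img_w: float) -> Box:
    x, y, w, h = box
    # Mirror across the vertical centre line: the new left edge is img_w - (x + w)
    return [img_w - x - w, y, w, h]


def vflip_box(box: Box, img_h: float) -> Box:
    x, y, w, h = box
    # Mirror across the horizontal centre line: the new top edge is img_h - (y + h)
    return [x, img_h - y - h, w, h]


box = [100, 50, 200, 80]
print(hflip_box(box, 640), vflip_box(box, 640))
```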

Once the augmentations were applied, the new dataset was ready for the model.
We re-downloaded the new dataset, trained a model and performed the evaluations.

In [None]:
#dataset = utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-original-train-images", version=2, model="yolov8")
#model = utils.initialize_and_train_model(dataset, 'yolov8x')


In [None]:
# %run project/evaluate.py PATH_TO_WEIGHTS PATH_TO_TEST_SET

In [None]:
# %run project/COCOeval.py PATH_TO_TRUTH_ANNOTATIONS PATH_TO_PREDICTIONS

# Augmented results

We got the following training results and confusion matrix:


<figure>
    <center><img src="imgs/resultsAugTrain1.png" /></center>
</figure>

<figure>
    <center><img src="imgs/confMatAugTrain1.png" /></center>
</figure>


We can see that this is a small improvement; however, it still isn't great, with pretty much the same problems as before. Again, Metals and Plastic simply dominate the results. The image predictions show this very well:

# Image results
Here are some of the results:
<div style="display: flex;">
  <div style="flex: 50%; padding: 10px;">
    <img src="imgs/valLabelsAugTrain1.jpg" alt="Image 1" width="100%">
    <p>Labels</p>
  </div>
  <div style="flex: 50%; padding: 10px;">
    <img src="imgs/valPredAugTrain1.jpg" alt="Image 2" width="100%">
    <p>Predictions</p>
  </div>
</div>





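# Test set results:

The YOLO evaluation on the test set gave a mAP of 0.10, an improvement. We assume that with the augmented data, Metals and Plastic had enough examples to be both properly learned and overfit, which skewed the results toward that class.
# COCO
The COCO evaluation was similar, and slightly lower on most metrics (AP of 0.066 at IoU 0.50:0.95, compared with 0.076 without augmentation), although large objects improved (AP of 0.140 compared with 0.112):

| Metric | IoU Range | Area | Max Dets | Value |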
|--------|-----------|------|----------|-------|
| AP     | 0.50:0.95 | all  | 100      | 0.066 |
| AP     | 0.50      | all  | 100      | 0.097 |
| AP     | 0.75      | all  | 100      | 0.072 |
| AP     | 0.50:0.95 | small| 100      | 0.021 |
| AP     | 0.50:0.95 | medium| 100     | 0.091 |
| AP     | 0.50:0.95 | large | 100     | 0.140 |
| AR     | 0.50:0.95 | all  | 1        | 0.161 |
| AR     | 0.50:0.95 | all  | 10       | 0.316 |
| AR     | 0.50:0.95 | all  | 100      | 0.378 |
| AR     | 0.50:0.95 | small| 100      | 0.315 |
| AR     | 0.50:0.95 | medium| 100     | 0.464 |
| AR     | 0.50:0.95 | large | 100     | 0.596 |

# Mid Summary

The YOLO model did not achieve amazing results on the TACO data set, however it did not fail abysmally either. 
It seems that with more images and a larger data set the results could be improved. 


## Changing the categories

As we thought that a major issue was that the categories were not split well, we decided to re-split the categories.

# Data Preprocessing
In order to update the categories, we had to do some preprocessing on the TACO dataset. The raw TACO dataset comes as multiple batches of images with overlapping file names, while YOLO requires one folder of images for training and one for validation. First, we consolidated the images into one large folder, updating the image names in the annotation files as well.

## Annotation updates
We ran a script to change the category names into new ones, updating the relevant fields.

All of this was done by running the preprocessing script we wrote on the TACO dataset:

In [None]:
# process the TACO dataset, unifying the image folders and categories
# to run this script, provide the path to the folder containing both the TACO images
# and annotations. A new folder called images will be created containing the images, and the
# annotations folder will be updated in place.
# Replace DATAFOLDERPATH with the path to your data.
# Run the split you want:
# %run project/preprocess_second_split.py DATAFOLDERPATH
# %run project/preprocess_third_split.py DATAFOLDERPATH
# the categories have to be changed manually in the script
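The category change itself amounts to mapping every old category name onto a new one and rewriting both the `categories` list and each annotation's `category_id`. A simplified sketch; the mapping below is a made-up three-category example, not our actual split, and the function name is hypothetical:

```python
from typing import Dict


def remap_categories(coco: Dict, old_to_new: Dict[str, str]) -> Dict:
    """Merge/rename COCO categories according to old_to_new (old name -> new name)."""
    old_names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    new_names = sorted(set(old_to_new.values()))
    new_ids = {name: i + 1 for i, name in enumerate(new_names)}  # start ids at 1, as in COCO
    coco["categories"] = [{"id": new_ids[n], "name": n} for n in new_names]
    for ann in coco["annotations"]:
        ann["category_id"] = new_ids[old_to_new[old_names[ann["category_id"]]]]
    return coco


coco = {
    "categories": [{"id": 1, "name": "Glass bottle"}, {"id": 2, "name": "Drink can"},
                   {"id": 3, "name": "Cigarette"}],
    "annotations": [{"category_id": 1}, {"category_id": 2}, {"category_id": 3}],
}
mapping = {"Glass bottle": "Bottles, cups and cans", "Drink can": "Bottles, cups and cans",
           "Cigarette": "Small waste"}
remap_categories(coco, mapping)
print(coco["categories"], [a["category_id"] for a in coco["annotations"]])
```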

# Augmentation
Augmentation was done using Roboflow again.

# Download
The datasets must be downloaded, then used to train again. We noticed that the download paths varied a lot when running this notebook, and therefore the paths to the relevant folders and downloads must be entered manually.

In [None]:
#Use this block to download the first new split
#running this block will download the image and annotation files
#import project.model_utils as utils
#dataset1 = utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-new-categories", version=2, model="yolov8")
#dataset1_coco = utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-new-categories", version=2, model="coco")

In [None]:
#Use this block to download the second new split
#running this block will download the image and annotation files
#import project.model_utils as utils
#dataset2 = utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-third", version=1, model="yolov8")
#dataset2_coco = utils.download_data(apikey="tbgnNS8bCW5iRz1lVg3O", workspace="deep-learning-q1acw",
#                   project="trash-detection-third", version=1, model="coco")

#Now you can run a model on one of the splits
#the function receives a path to the data.yaml file in the downloaded data folder
#model = utils.initialize_and_train_model(datapath, 'yolov8x')

# Process annotations and evaluate
Much like above, the annotation files must be processed, then evaluated.

In [None]:
# %run project/processTestset.py PATH_TO_TEST_SET_FOLDER

In [None]:
# %run project/evaluate.py PATH_TO_WEIGHTS PATH_TO_TEST_SET

In [None]:
# %run project/COCOeval.py PATH_TO_TRUTH_ANNOTATIONS PATH_TO_PREDICTIONS

# Results
### Attempt number 1:
After training with the provided categories, we attempted to train a model using re-split categories. The Bio category was severely under-represented and therefore not detected well; it simply added noise to the confusion matrix.
We now split the data into: Glass, Metal and Plastic, Other Plastic, Cigarette, Non Recyclable, Paper and Other. Each category was now more balanced, so no class was badly under-represented. This seemed to us a better split.
The best results we got were:




<figure>
    <center><img src="imgs/confMatSplit2.png" /></center>
    <figcaption>
    Confusion Matrix from the first attempt to split categories
    </figcaption>
</figure>




The mAP on the validation set was much better, at 0.34.

The overall COCO evaluation on the test set, however, was worse:

| Metric | IoU Range | Area | Max Dets | Value |
|--------|-----------|------|----------|-------|
| AP     | 0.50:0.95 | all  | 100      | 0.010 |
| AP     | 0.50      | all  | 100      | 0.015 |
| AP     | 0.75      | all  | 100      | 0.009 |
| AP     | 0.50:0.95 | small| 100      | 0.002 |
| AP     | 0.50:0.95 | medium| 100     | 0.008 |
| AP     | 0.50:0.95 | large | 100     | 0.027 |
| AR     | 0.50:0.95 | all  | 1        | 0.064 |
| AR     | 0.50:0.95 | all  | 10       | 0.145 |
| AR     | 0.50:0.95 | all  | 100      | 0.177 |
| AR     | 0.50:0.95 | small| 100      | 0.061 |
| AR     | 0.50:0.95 | medium| 100     | 0.172 |
| AR     | 0.50:0.95 | large | 100     | 0.289 |

It performs worse than the original split on every metric.

We noticed that Cigarette was not getting detected well at all, and assumed that the glass annotations were skewing the results, as they make up a large percentage of the dataset.
We attempted to improve this in numerous ways. We decided to run the model on larger images, assuming that cigarettes were too small to detect. This took a whole day to train, and the results were worse. We also tried YOLOv8-P2, a version of YOLOv8 with an extra high-resolution (stride 4) detection head intended for small objects; this also did not improve the results. We therefore decided to re-split the dataset.





### Attempt number 2:
The results we got above did not satisfy us, and we wanted to refine the split even more. We decided that shape and size, as well as material, should be taken into account when splitting the categories.
We now split the data into: Bags; Bottles, cups and cans; Non_recyclables; Other; Other plastic; Paper; Small waste. As each category now contained objects that were similar in many aspects, this seemed to us a better split.
The best results we got were:


<figure>
    <center><img src="imgs/confMatSplt3.png" /></center>
    <figcaption>
    Confusion Matrix from the second attempt to split categories
    </figcaption>
</figure>



The mAP again was better than with the original split, at 0.18, but worse than with the first re-split. We did, however, get a much better COCO evaluation.




# COCO EVAL
For this third split, we got the following COCO evaluation results:

| Metric | IoU Range | Area | Max Dets | Value |
|--------|-----------|------|----------|-------|
| AP     | 0.50:0.95 | all  | 100      | 0.129 |
| AP     | 0.50      | all  | 100      | 0.169 |
| AP     | 0.75      | all  | 100      | 0.144 |
| AP     | 0.50:0.95 | small| 100      | 0.067 |
| AP     | 0.50:0.95 | medium| 100     | 0.155 |
| AP     | 0.50:0.95 | large | 100     | 0.200 |
| AR     | 0.50:0.95 | all  | 1        | 0.216 |
| AR     | 0.50:0.95 | all  | 10       | 0.319 |
| AR     | 0.50:0.95 | all  | 100      | 0.337 |
| AR     | 0.50:0.95 | small| 100      | 0.280 |
| AR     | 0.50:0.95 | medium| 100     | 0.358 |
| AR     | 0.50:0.95 | large | 100     | 0.520 |

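We can see that we get slightly more balanced results, with higher overall metrics. It seems that our re-split resulted in a better overall model. Compared with the original split, AP at IoU 0.50:0.95 rises from 0.076 to 0.129 and AP at IoU 0.75 from 0.081 to 0.144, so the model is also more accurate at stricter IoU thresholds, and small-object AP improves from 0.037 to 0.067. All six AP metrics improve, while the AR metrics are mixed: recall with a single detection per image rises from 0.181 to 0.216, but recall with 100 detections is slightly lower (0.337 vs 0.386). Overall, the model is more balanced, providing better results across most metrics.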


# Summary
Overall, we did not succeed in getting groundbreaking results with our model on the TACO dataset. However, we also assert that our results are nothing to be ashamed of, given the limitations of the task. The YOLO model did quite well for such a limited task.