In [None]:
import os
from datetime import datetime
import fiftyone as fo
import fiftyone.utils.random as four
from ultralytics import YOLO

In [None]:
# Load the dataset from Hugging Face if it's your first time using it
# import fiftyone.utils.huggingface as fouh

# train_dataset = fouh.load_from_hub(
#     "Voxel51/Coursera_lecture_dataset_train", 
#     dataset_name="lecture_dataset_train", 
#     persistent=True
#     )

# test_dataset = fouh.load_from_hub(
#     "Voxel51/Coursera_lecture_dataset_test", 
#     dataset_name="lecture_dataset_test", 
#     persistent=True
#     )

In [None]:
# Because I have the datasets saved locally, I load them by name
train_dataset = fo.load_dataset("lecture_dataset_train_clone")

test_dataset = fo.load_dataset("lecture_dataset_test_clone")

In [None]:
# Work with a random 100-sample view of each dataset
train_dataset = train_dataset.take(100)
test_dataset = test_dataset.take(100)

We'll first train a model. 

The code here isn't the star of the show, but I'll briefly describe what we're doing. After this lesson, we'll simply import this code from some helper file. This code defines a pipeline for training and evaluating a YOLO object detection model using the FiftyOne library. The main steps are:

1. Export dataset to YOLO format. You can learn more about converting dataset formats [here](https://docs.voxel51.com/recipes/convert_datasets.html).

2. Train a YOLO model on the formatted dataset.

3. Run inference on the evaluation set.

4. Evaluate model performance.

You can learn more about the hyperparameters for the Ultralytics model [here](https://docs.ultralytics.com/modes/train/#train-settings).

The following code tags each sample in the dataset as `train` or `val`, then exports the dataset to YOLO format.

In [None]:
# Tag each sample as "train" (90%) or "val" (10%)
four.random_split(train_dataset, {"train": 0.90, "val": 0.10})

# Export each split separately; match_tags() keeps only the samples carrying
# that tag, so the train and val exports do not overlap
for split in ["train", "val"]:
    train_dataset.match_tags(split).export(
        export_dir="./model_training/data",
        dataset_type=fo.types.YOLOv5Dataset,
        label_field="ground_truth",
        classes=train_dataset.default_classes,
        split=split,
    )
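
To make the export step more concrete: FiftyOne stores each bounding box as relative `[top-left-x, top-left-y, width, height]`, while a YOLO label file stores one row per object as `class cx cy w h` with a normalized box center. The sketch below shows that coordinate change only. The helper name `to_yolo_row` and the six-decimal formatting are assumptions for illustration, not FiftyOne's exporter code.

```python
def to_yolo_row(class_index, bounding_box):
    """Convert a relative [x, y, w, h] box (top-left corner) to a YOLO label row.

    Hypothetical helper for illustration; the real export is done by
    Dataset.export() with fo.types.YOLOv5Dataset.
    """
    x, y, w, h = bounding_box
    cx = x + w / 2  # YOLO stores the box center, not the top-left corner
    cy = y + h / 2
    return f"{class_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A box whose top-left corner is at (0.25, 0.5), half the image wide
row = to_yolo_row(2, [0.25, 0.5, 0.5, 0.25])
print(row)
```

Width and height carry over unchanged; only the reference point moves from the corner to the center.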

We can go ahead and instantiate a model like so:

In [None]:
model = YOLO("yolov10m.pt")

### Model training hyperparameters

Here are some recommendations for training a medium-sized YOLO model (such as the `yolov10m` checkpoint loaded above) on images with small detections, similar-looking objects, possibly mixed-up labels, and a large number of detections per image:

1. Image size: Use a larger input image size to help with small object detection. Consider using `imgsz=1280` or even `1536` if your GPU memory allows.

2. Mosaic and scale augmentations: Enable strong mosaic and scale augmentations to help with small object detection and similar looking objects.

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16,
               mosaic=1.0, scale=0.9)
   ```

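3. Mask settings: Recent YOLO models such as YOLOv8 are anchor-free, so there are no anchors to tune. Note that `overlap_mask` and `mask_ratio` apply only to segmentation tasks, so they have no effect when training a detector; they are shown here at their default values:

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16,
               overlap_mask=True, mask_ratio=4)
   ```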

4. Learning rate: Use a lower initial learning rate and a cosine learning rate scheduler:

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16,
               lr0=0.001, lrf=0.01, cos_lr=True)
   ```

5. Regularization: To help with possibly mixed-up labels, use label smoothing together with weight decay (`0.0005` is the Ultralytics default; raise it for stronger regularization):

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16,
               label_smoothing=0.1, weight_decay=0.0005)
   ```

6. Data augmentation: Use strong augmentations to help with similar looking objects:

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16,
               degrees=45, translate=0.2, scale=0.9, shear=10, 
               perspective=0.001, flipud=0.5, fliplr=0.5)
   ```

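7. Distribution focal loss: `dfl` sets the weight of the distribution focal loss term, which is part of the bounding box regression loss rather than a remedy for class imbalance (`1.5` is the default):

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16,
               dfl=1.5)
   ```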

8. Mixed precision training: Enable AMP for faster training:

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=30, batch=16, amp=True)
   ```

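9. Patience and epochs: Train for more epochs, and let `patience` stop training early once validation metrics stop improving. Keep `patience` below `epochs`, otherwise early stopping can never trigger:

   ```python
   model.train(data="dataset.yaml", imgsz=1280, epochs=100, batch=16,
               patience=50)
   ```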
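
To get a feel for how `lr0` and `lrf` interact under a cosine scheduler, here is a minimal sketch of a half-cosine decay from `lr0` down to `lr0 * lrf`. The function `cosine_lr` is a hypothetical helper written for this lesson; it ignores warmup and is not guaranteed to match the Ultralytics scheduler step for step.

```python
import math


def cosine_lr(epoch, epochs, lr0=0.001, lrf=0.01):
    """Learning rate at `epoch`, decaying from lr0 to lr0 * lrf along a half cosine."""
    progress = (1 - math.cos(math.pi * epoch / epochs)) / 2  # 0 at the start, 1 at the end
    return lr0 * (1 + progress * (lrf - 1))


# One value per epoch boundary for a 30-epoch run
schedule = [cosine_lr(e, epochs=30) for e in range(31)]
print(f"start={schedule[0]:.6f} mid={schedule[15]:.6f} end={schedule[30]:.6f}")
```

The rate changes slowly at the beginning and end of training and fastest in the middle, which is the main difference from a linear decay between the same two endpoints.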

### I'm just going to combine these settings into a single training config, and I'll use the same settings throughout the course.

In [None]:
training_config = {
    # Dataset split
    "train_split": 0.9,
    "val_split": 0.1,

    # Training parameters
    "train_params": {
        "epochs": 100,
        "batch": 16,
        "imgsz": 640, # just keep in mind that your gpu might not be able to handle large image sizes
        "lr0": 0.001,
        "lrf": 0.01,
        "momentum": 0.937,
        "weight_decay": 0.0005,
        "warmup_epochs": 3.0,
        "warmup_momentum": 0.8,
        "warmup_bias_lr": 0.1,
        "box": 7.5,
        "cls": 0.5,
        "dfl": 1.5,
        "label_smoothing": 0.1,
        "nbs": 64,
        "hsv_h": 0.015,
        "hsv_s": 0.7,
        "hsv_v": 0.4,
        "degrees": 45,
        "translate": 0.2,
        "scale": 0.9,
        "shear": 10,
        "perspective": 0.001,
        "flipud": 0.5,
        "fliplr": 0.5,
        "mosaic": 1.0,
        "mixup": 0.1,
        "erasing":0.25,
        "copy_paste": 0.1,
        "amp": True,
        "overlap_mask": True,
        "mask_ratio": 4,
        "patience": 50
    }
}

In [None]:
results = model.train(
    data="./model_training/data/dataset.yaml",  # written by the YOLO export above
    **training_config['train_params']
)
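
Because the settings live in a plain dictionary, individual values can be overridden for a single run without editing the shared config. A minimal sketch of that pattern follows; `fake_train` is a hypothetical stand-in for `model.train()`, and the parameter subset and override values are illustrative only.

```python
# A small subset of the training parameters, for illustration
base_params = {"epochs": 100, "batch": 16, "imgsz": 640, "patience": 50}

# Later keys win, so the overrides replace matching entries
# without mutating base_params
smoke_test_params = {**base_params, "epochs": 1, "imgsz": 320}


def fake_train(data, **params):
    """Hypothetical stand-in for model.train(); returns the arguments it received."""
    return {"data": data, **params}


call = fake_train("dataset.yaml", **smoke_test_params)
print(call)
```

A one-epoch run like this is a cheap way to check that the data paths and config keys are valid before starting a long training job.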

We can get the best trained model like so:

In [None]:
best_model_path = str(results.save_dir / "weights/best.pt")

best_model = YOLO(best_model_path)

Once we have the best trained model, we can apply it to the evaluation set using the `apply_model` method of the Dataset object. Visit [the docs](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.apply_model) for more detail on the `apply_model` function.

In [None]:
test_dataset.apply_model(best_model, label_field="predictions")

Finally, we can evaluate the model using the built in `evaluate_detections` method of the Dataset object. You can read more about the `evaluate_detections` method [in the docs](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.evaluate_detections), and check out [this tutorial](https://docs.voxel51.com/tutorials/evaluate_detections.html) for a different perspective on evaluations.

In [None]:
detection_results = test_dataset.evaluate_detections(
    gt_field="ground_truth",
    pred_field="predictions",
    eval_key="evalrun_baseline_predictions",
    compute_mAP=True,
)

We end up with a subclass of [`DetectionResults`](https://docs.voxel51.com/api/fiftyone.core.evaluation.html#fiftyone.core.evaluation.EvaluationResults); in this case, a [`COCODetectionResults`](https://docs.voxel51.com/api/fiftyone.utils.eval.coco.html#fiftyone.utils.eval.coco.COCODetectionResults) object.

When running `evaluate_detections()` the default evaluation is COCO-style evaluation (we won't worry about other evaluation styles):

 - Predicted and ground truth objects are matched using a specified IoU threshold (default = 0.50). This threshold can be customized via the `iou` parameter.

 - By default, only objects with the same label are matched. Pass `classwise=False` to allow matches between objects of different classes.

 - Ground truth objects can have an `iscrowd` attribute that indicates whether the annotation contains a crowd of objects. Multiple predictions can be matched to crowd ground truth objects. The name of this attribute can be customized by passing the optional `iscrowd` parameter of [`COCOEvaluationConfig`](https://docs.voxel51.com/api/fiftyone.utils.eval.coco.html#fiftyone.utils.eval.coco.COCOEvaluationConfig) to `evaluate_detections()`.
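
The matching rules above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not FiftyOne's implementation: it ignores `iscrowd`, only reports which boxes were paired, and the names `iou` and `match` as well as the sample boxes are made up for this example.

```python
def iou(a, b):
    """IoU of two [x, y, width, height] boxes given by top-left corner and size."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match(gts, preds, iou_thresh=0.5, classwise=True):
    """Greedily pair predictions with ground truth, highest confidence first.

    gts: list of (label, box); preds: list of (label, box, confidence).
    Returns (pairs, unmatched_gt, unmatched_pred) as lists of indices.
    """
    free_gt = list(range(len(gts)))
    pairs, unmatched_pred = [], []
    by_confidence = sorted(range(len(preds)), key=lambda j: preds[j][2], reverse=True)
    for j in by_confidence:
        label, box, _ = preds[j]
        best, best_iou = None, iou_thresh
        for i in free_gt:
            gt_label, gt_box = gts[i]
            if classwise and gt_label != label:
                continue  # only same-class pairs are candidates
            overlap = iou(gt_box, box)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is None:
            unmatched_pred.append(j)
        else:
            free_gt.remove(best)  # each ground truth box is matched at most once
            pairs.append((best, j))
    return pairs, free_gt, unmatched_pred


gts = [("jacket", [0.1, 0.1, 0.4, 0.4]), ("shirt", [0.6, 0.6, 0.3, 0.3])]
preds = [
    ("jacket", [0.12, 0.1, 0.4, 0.4], 0.9),  # overlaps the jacket
    ("jacket", [0.6, 0.6, 0.3, 0.3], 0.8),   # sits on the shirt, wrong label
    ("shirt", [0.0, 0.0, 0.05, 0.05], 0.3),  # overlaps nothing
]
print(match(gts, preds, classwise=True))
print(match(gts, preds, classwise=False))
```

With `classwise=True` the mislabeled prediction stays unmatched; with `classwise=False` it pairs with the shirt box, which is what makes cross-class confusions visible in an evaluation.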

In [None]:
type(detection_results)

# Evaluate detections

Let's take a look at the new fields that have been added to our test dataset. You'll notice:

- `predictions`

- `evalrun_..._tp`

- `evalrun_..._fp`

- `evalrun_..._fn`

In [None]:
test_dataset

First, visually inspect the results:

In [None]:
fo.launch_app(test_dataset)

In [None]:
# Print some statistics about the total TP/FP/FN counts
print("TP: %d" % test_dataset.sum("evalrun_..._tp"))
print("FP: %d" % test_dataset.sum("evalrun_..._fp"))
print("FN: %d" % test_dataset.sum("evalrun_..._fn"))
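
These three counts are enough to derive precision, recall, and F1. A small sketch, using made-up counts rather than results from this dataset (the helper `detection_metrics` is hypothetical):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative counts only
precision, recall, f1 = detection_metrics(tp=80, fp=20, fn=40)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision penalizes false positives, recall penalizes false negatives, and F1 is the harmonic mean of the two.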

Create a view that has the samples with the most false positives first, and only includes false positive boxes in the `predictions` field:

In [None]:
from fiftyone import ViewField as F

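view = (
    test_dataset
    .sort_by("evalrun_..._fp", reverse=True)
    # Each label stores its match result ("tp", "fp" or "fn") in an
    # attribute named after the eval_key used in evaluate_detections()
    .filter_labels("predictions", F("evalrun_baseline_predictions") == "fp")
)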

In [None]:
fo.launch_app(view)

We can get the overall mean Average Precision by calling the `mAP` method on the results object.

The mAP calculation is based on [`cocoeval`](https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py).

To understand how mean Average Precision (mAP) is calculated there, let's break down the key steps:

1. Evaluation preparation:
   - The code prepares ground truth (gt) and detection (dt) data for each image and category.
   - It computes IoU (Intersection over Union) between gt and dt objects.

2. Per-image evaluation:
   - For each image, category, area range, and max detection number:
     - It matches detections to ground truths based on IoU thresholds.
     - It tracks which detections match which ground truths, and which are ignored.

3. Accumulation of results:
   - It calculates precision and recall values for various IoU thresholds, categories, area ranges, and max detection numbers.

4. Precision-Recall curve:
   - For each combination of IoU threshold, category, area range, and max detection number:
     - It sorts detections by score.
     - It computes cumulative true positives (tp) and false positives (fp).
     - It calculates precision and recall at each detection.

5. Average Precision calculation:
   - For each precision-recall curve:
     - It interpolates the precision values at specific recall thresholds (0 to 1 with step 0.01).
     - The average of these interpolated precision values gives the Average Precision (AP) for that specific setting.

6. Mean Average Precision:
   - The mAP is the mean of the AP values across the chosen dimensions.

For COCO-style evaluation, this means averaging the AP values over all classes and over the IoU thresholds from 0.50 to 0.95 in steps of 0.05.
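
Steps 4 and 5 can be sketched for a single class at a single IoU threshold. This is a simplified illustration in the spirit of `cocoeval`, not a copy of it: it assumes each detection has already been marked as a true or false positive, and the function name `average_precision` and the sample inputs are made up.

```python
def average_precision(scores, is_tp, num_gt):
    """101-point interpolated AP for one class at one IoU threshold."""
    # Step 4: walk through detections from most to least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Make precision non-increasing as recall grows (the interpolation envelope)
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Step 5: sample precision at recall = 0.00, 0.01, ..., 1.00
    total = 0.0
    for step in range(101):
        r = step / 100
        # Precision at the first point whose recall reaches r, or 0 if none does
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101


perfect = average_precision([0.9, 0.8], [True, True], num_gt=2)
mixed = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
print(perfect, mixed)
```

Step 6 is then a plain mean: compute one AP per class and IoU threshold, and average those values to get the mAP.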

In [None]:
detection_results.mAP()

We can get a report of the results for all classes:

In [None]:
detection_results.print_report()

Or, for just one class:

In [None]:
detection_results.print_report(classes=["jacket"])

We can also plot the precision-recall curves and confusion matrix like so:

In [None]:
detection_results.plot_pr_curves()

In [None]:
detection_results.plot_confusion_matrix()

We'll build on this foundation throughout the rest of the course!

Required reading for this lesson is the ['Evaluating Object Detections with FiftyOne'](https://docs.voxel51.com/tutorials/evaluate_detections.html) docs page, which you can expect to have questions about in the quiz.


If you ever need assistance, have more complex questions, or want to keep in touch, feel free to join the Voxel51 community Discord server [here](https://discord.gg/QAyfnUhfpw).