In [None]:
# pip install -U fiftyone sahi ultralytics huggingface_hub 

In [None]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Load the dataset from Hugging Face if it's your first time using it

# test_dataset = fouh.load_from_hub(
#     "Voxel51/Coursera_lecture_dataset_test", 
#     dataset_name="lecture_dataset_test", 
#     persistent=True
#     )

In [None]:
dataset = fo.load_dataset("lecture_dataset_test_clone")

# SAHI


<img src="https://raw.githubusercontent.com/obss/sahi/main/resources/sliced_inference.gif">

Image source: [SAHI GitHub Repo](https://github.com/obss/sahi)

SAHI divides large images into smaller, overlapping slices. This makes small objects appear relatively larger within each slice and therefore easier for the model to detect. The model runs detection on each slice independently, which can capture small objects that would go undetected in the full image.
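
To make "relatively larger" concrete, here is a minimal arithmetic sketch. It assumes the detector resizes whatever it receives to a fixed input size (640 pixels, matching the `image_size=640` setting used later in this notebook) and uses a hypothetical 1920-pixel-wide image containing a 20-pixel object. The helper name `apparent_size` is made up for illustration.

```python
def apparent_size(obj_px: float, region_px: float, model_input_px: float = 640) -> float:
    """Object size in pixels after the region containing it is resized to the model input."""
    return obj_px * model_input_px / region_px


# Hypothetical numbers: a 20 px object in a 1920 px wide image.
full_image_px = apparent_size(20, 1920)  # whole image squeezed down to 640 px
sliced_px = apparent_size(20, 320)       # a single 320 px slice scaled up to 640 px

print(f"Full image: {full_image_px:.1f} px, 320 px slice: {sliced_px:.1f} px")
```

Under these assumptions the same object covers several times more pixels when the model sees a slice instead of the whole frame.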

In a nutshell, here's how it works:

1. SAHI takes the full image as input.

2. The input image is divided into smaller, overlapping slices. The slice size and overlap ratio are configurable parameters.

3. The chosen object detection model performs inference on each slice independently. Note: SAHI can be integrated with various object detection models, including the YOLO series, without requiring modifications to the underlying detector.

4. The coordinates of detected objects in each slice are transformed back to the original image's coordinate system.

5. Detections from all slices are collected and combined.

6. Duplicate detections from overlapping slices are merged or filtered, often using non-maximum suppression (NMS).

7. A consolidated list of detections for the original image is produced.
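
The slicing, coordinate shift, and duplicate-suppression steps above can be sketched in plain Python. This is an illustrative approximation, not SAHI's implementation: the helper names (`slice_boxes`, `to_image_coords`, `nms`) are hypothetical, boxes are `(x1, y1, x2, y2)` pixel tuples, and SAHI's own postprocessing options may behave differently from the greedy NMS shown here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def slice_boxes(img_w, img_h, slice_w, slice_h, overlap_w, overlap_h) -> List[Tuple[int, int, int, int]]:
    """Return overlapping slice windows that cover the image, clamped to its borders."""
    step_x = max(1, round(slice_w * (1 - overlap_w)))
    step_y = max(1, round(slice_h * (1 - overlap_h)))
    windows = []
    y = 0
    while True:
        y2 = min(y + slice_h, img_h)
        y1 = max(0, y2 - slice_h)
        x = 0
        while True:
            x2 = min(x + slice_w, img_w)
            x1 = max(0, x2 - slice_w)
            windows.append((x1, y1, x2, y2))
            if x2 >= img_w:
                break
            x += step_x
        if y2 >= img_h:
            break
        y += step_y
    return windows


def to_image_coords(box: Box, window) -> Box:
    """Shift a box from slice coordinates back into full-image coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_thresh=0.5):
    """Greedy NMS over (box, score) pairs: keep the best box, drop overlapping ones."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept


windows = slice_boxes(640, 640, 320, 320, 0.2, 0.2)

# The same object, seen in two neighbouring slices (slice-local coordinates).
dets = [
    (to_image_coords((270, 100, 300, 130), windows[0]), 0.80),
    (to_image_coords((14, 100, 44, 130), windows[1]), 0.90),
]
merged = nms(dets)
print(len(windows), "slices;", len(dets), "raw detections ->", len(merged), "after NMS")
```

Both raw detections map to the same full-image box once shifted, so only the higher-scoring one survives suppression.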

Keep in mind that sliced inference takes longer than running the model once on the full image.

This is because we're running the model on multiple slices *per* image, which increases the number of forward passes the model has to make. This is a trade-off we're making to improve the detection of small objects.
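
As a rough sense of scale, the sketch below counts how many slices (and therefore forward passes) a single image produces. The `num_slices` helper and the 1920×1080 image size are hypothetical, and the count assumes a regular grid with a fixed stride, so the exact number SAHI produces may differ.

```python
import math


def num_slices(img_w, img_h, slice_w, slice_h, overlap_w=0.2, overlap_h=0.2):
    """Approximate number of slices needed to cover an image on a regular grid."""
    step_x = max(1, round(slice_w * (1 - overlap_w)))
    step_y = max(1, round(slice_h * (1 - overlap_h)))
    nx = 1 if img_w <= slice_w else math.ceil((img_w - slice_w) / step_x) + 1
    ny = 1 if img_h <= slice_h else math.ceil((img_h - slice_h) / step_y) + 1
    return nx * ny


# Hypothetical 1920x1080 image with the two slice sizes used later in this notebook.
small = num_slices(1920, 1080, 320, 320)
large = num_slices(1920, 1080, 480, 480)
print(f"320 px slices: {small} forward passes, 480 px slices: {large} forward passes")
```

Smaller slices multiply the number of forward passes quickly, which is why slice size is worth tuning rather than simply minimizing.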

To use SAHI, start by running `pip install sahi` in your terminal or notebook.

Then pass the path of your trained detection model to `AutoDetectionModel.from_pretrained()` to create a SAHI detection model.

In [None]:
import urllib.request

url = "https://huggingface.co/harpreetsahota/coursera_week1_lesson7/resolve/main/model.pt"
output_path = "./model.pt"

urllib.request.urlretrieve(url, output_path)

In [None]:
from sahi import AutoDetectionModel
from sahi.predict import get_prediction, get_sliced_prediction

ckpt_path = "model.pt"  # downloaded above; replace with the path to the best model you trained in the previous module

detection_model = AutoDetectionModel.from_pretrained(
    model_type='yolov8',  # assumption: some newer SAHI releases expect "ultralytics" here instead
    model_path=ckpt_path,
    confidence_threshold=0.25,
    image_size=640,
    # device="cuda", # if you have a GPU
)

To get a sense of what the output looks like, use SAHI's `get_prediction` function:

In [None]:
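# Use the same image as the sliced prediction below so the detection counts are comparable
result = get_prediction(dataset.skip(40).first().filepath, detection_model, verbose=0)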
print(result)

SAHI result objects have a `to_fiftyone_detections()` method, which converts the results to FiftyOne detections:

In [None]:
print(result.to_fiftyone_detections())

SAHI's `get_sliced_prediction()` function works in the same way as `get_prediction()`, with a few additional hyperparameters that let us configure how the image is sliced. In particular, we can specify the slice height and width, and the overlap between slices. Here's an example:

In [None]:
sliced_result = get_sliced_prediction(
    dataset.skip(40).first().filepath,
    detection_model,
    slice_height=320,
    slice_width=320,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)

Now compare the number of detections in the sliced predictions to the number of detections in the original predictions:

In [None]:
num_sliced_dets = len(sliced_result.to_fiftyone_detections())
num_orig_dets = len(result.to_fiftyone_detections())

print(f"Detections predicted without slicing: {num_orig_dets}")
print(f"Detections predicted with slicing: {num_sliced_dets}")

Notice the change in the number of predictions.

Next, we'll use [FiftyOne's Evaluation API](https://docs.voxel51.com/user_guide/evaluation.html) to determine if the additional predictions are valid or just false positives.

Our goal is to find good slicing hyperparameters. To do that, we'll apply SAHI to a random subset of the dataset rather than to a single image.

The function below performs sliced predictions on a sample and adds the results to a specified label field. We'll iterate over the dataset, passing each sample's filepath and slicing hyperparameters to `get_sliced_prediction()`, and then add the predictions to the sample:

In [None]:
def predict_with_slicing(sample, label_field, **kwargs):
    """
    Perform sliced prediction on a sample and add the results to a specified label field.

    This function uses SAHI's get_sliced_prediction to perform object detection on
    slices of the image, then converts the results to FiftyOne Detections and adds
    them to the sample.

    Args:
        sample (fiftyone.core.sample.Sample): The FiftyOne sample to process.
        label_field (str): The name of the field to store the predictions in.
        **kwargs: Additional keyword arguments to pass to get_sliced_prediction.

    Returns:
        None. The function modifies the sample in-place.

    Note:
        This function assumes that a global 'detection_model' object is available,
        which should be an instance of a SAHI-compatible detection model.
    """
    result = get_sliced_prediction(
        sample.filepath, detection_model, verbose=0, **kwargs
    )
    sample[label_field] = fo.Detections(detections=result.to_fiftyone_detections())

We'll keep the slice overlap fixed at $0.2$, and see how the slice height and width affect the quality of the predictions:

In [None]:
kwargs = {"overlap_height_ratio": 0.2, "overlap_width_ratio": 0.2}

small_sample = dataset.take(100, seed=51)

for sample in small_sample.iter_samples(progress=True, autosave=True):
    predict_with_slicing(sample, label_field="small_slices", slice_height=320, slice_width=320, **kwargs)
    predict_with_slicing(sample, label_field="large_slices", slice_height=480, slice_width=480, **kwargs)

Let's run an evaluation routine comparing our predictions from each of the prediction label fields to the ground truth labels. 

Using the `evaluate_detections()` method will mark each detection as a true positive, false positive, or false negative. Here we use the default IoU threshold of $0.5$, but you can adjust this as needed.
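
For intuition about the IoU threshold, here is a minimal sketch of intersection over union for boxes in FiftyOne's relative `[x, y, width, height]` format. The helper `iou_xywh` and the example boxes are hypothetical, and FiftyOne's actual matching also handles details this sketch ignores, such as class labels and one-to-one assignment between predictions and ground truth.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as [top-left-x, top-left-y, width, height]."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


ground_truth = [0.10, 0.10, 0.20, 0.20]
close_pred = [0.15, 0.10, 0.20, 0.20]  # shifted slightly to the right
far_pred = [0.25, 0.10, 0.20, 0.20]    # barely overlaps the ground truth

IOU_THRESHOLD = 0.5
for name, pred in [("close", close_pred), ("far", far_pred)]:
    score = iou_xywh(ground_truth, pred)
    verdict = "match" if score >= IOU_THRESHOLD else "no match"
    print(f"{name}: IoU={score:.2f} -> {verdict}")
```

A prediction that clears the threshold against an unmatched ground truth box counts as a true positive; predictions left unmatched become false positives, and ground truth boxes left unmatched become false negatives.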

Note that this will take some time!

In [None]:
base_results = small_sample.evaluate_detections("baseline_predictions", gt_field="ground_truth", eval_key="eval_base_model")

large_slice_results = small_sample.evaluate_detections("large_slices", gt_field="ground_truth", eval_key="eval_large_slices")

small_slice_results = small_sample.evaluate_detections("small_slices", gt_field="ground_truth", eval_key="eval_small_slices")

In [None]:
print("Base model results:")
base_results.print_report()

print("-" * 50)
print("Large slice results:")
large_slice_results.print_report()

print("-" * 50)
print("Small slice results:")
small_slice_results.print_report()

We can see that as we introduce more slices, the number of false positives increases while the number of false negatives decreases. This is expected: with more slices the model detects more objects, but it also makes more mistakes. You could apply more aggressive confidence thresholding to combat the increase in false positives, but even without doing so the $F_1$-score has significantly improved.
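
To see how more false positives and fewer false negatives can still raise the $F_1$-score, here is a small sketch that computes precision, recall, and $F_1$ from raw counts. The counts are made up for illustration and are not results from this notebook; the helper `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: slicing finds more objects (fewer FNs) at the cost of more FPs.
baseline = precision_recall_f1(tp=50, fp=10, fn=50)
sliced = precision_recall_f1(tp=80, fp=30, fn=20)

for name, (p, r, f1) in [("baseline", baseline), ("sliced", sliced)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case precision drops, but the gain in recall more than compensates, so $F_1$ goes up.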

### Evaluate performance on small objects 

Let's dive a little bit deeper into these results. We noted earlier that the model struggles with small objects, so let's see how these three approaches fare on objects smaller than $32 \times 32$ pixels. We can perform this filtering using FiftyOne's [ViewField](https://docs.voxel51.com/recipes/creating_views.html#View-expressions):
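
FiftyOne stores bounding boxes as relative `[x, y, width, height]` values in $[0, 1]$, so the pixel area has to be recovered from the image dimensions. The plain-Python sketch below mirrors that arithmetic on a hypothetical box in a hypothetical 1920×1080 image; `abs_box_area` is an illustrative helper, not a FiftyOne function.

```python
def abs_box_area(bounding_box, img_w: int, img_h: int) -> float:
    """Pixel area of a box given in relative [x, y, width, height] format."""
    _, _, rel_w, rel_h = bounding_box
    return (rel_w * img_w) * (rel_h * img_h)


# Hypothetical box: 1% of the image width and 2% of its height.
area = abs_box_area([0.50, 0.50, 0.01, 0.02], img_w=1920, img_h=1080)
is_small = area < 32**2

print(f"area = {area:.2f} px^2, small object: {is_small}")
```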

In [None]:
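# Filter for only small boxes
from fiftyone import ViewField as F

# Bounding boxes are stored as relative [x, y, width, height]
box_width, box_height = F("bounding_box")[2], F("bounding_box")[3]
rel_bbox_area = box_width * box_height

# Image dimensions come from sample metadata, which must be populated
# (e.g., via small_sample.compute_metadata())
im_width, im_height = F("$metadata.width"), F("$metadata.height")
abs_area = rel_bbox_area * im_width * im_height

# filter_labels() takes the label field name, not the embedded "detections" list
small_boxes_view = small_sample.filter_labels("ground_truth", abs_area < 32**2, only_matches=False)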

In [None]:
small_boxes_base_results = small_boxes_view.evaluate_detections("baseline_predictions", gt_field="ground_truth", eval_key="eval_small_boxes_base_model")

small_boxes_large_slice_results = small_boxes_view.evaluate_detections("large_slices", gt_field="ground_truth", eval_key="eval_small_boxes_large_slices")

small_boxes_small_slice_results = small_boxes_view.evaluate_detections("small_slices", gt_field="ground_truth", eval_key="eval_small_boxes_small_slices")

In [None]:
print("Small Box — Base model results:")
small_boxes_base_results.print_report()

print("-" * 50)
print("Small Box — Large slice results:")
small_boxes_large_slice_results.print_report()

print("-" * 50)
print("Small Box — Small slice results:")
small_boxes_small_slice_results.print_report()

This makes the value of SAHI crystal clear! The recall when using SAHI is much higher for small objects without a significant drop-off in precision, leading to an improved $F_1$-score. This is especially pronounced for `` detections, where the $F_1$-score is tripled!


### Edge cases
Now that we know SAHI is effective at detecting small objects, let's look at the places where our predictions are most confident but do not align with the ground truth labels. We can do this by creating an evaluation patches view, filtering for predictions tagged as false positives and sorting by confidence:

In [None]:
high_conf_fp_view = (
    small_sample.to_evaluation_patches(eval_key="eval_small_slices")
    .match(F("type") == "fp")
    # reverse=True puts the most confident false positives first
    .sort_by("small_slices.detection.confidence", reverse=True)
)

In [None]:
fo.launch_app(high_conf_fp_view)

Our predictions are mostly accurate, but some ground truth labels are missing. Implementing human-in-the-loop (HITL) workflows can help correct this. We can then re-evaluate our models and train new ones with the updated data.

##### Required Reading

- [Albumentations Integration](https://docs.voxel51.com/integrations/albumentations.html)


If you ever need assistance, have more complex questions, or want to keep in touch, feel free to join the Voxel51 community Discord server [here](https://discord.gg/QAyfnUhfpw)