# Detect Small Objects with `InferenceSlicer`

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow/supervision/blob/develop/docs/notebooks/small-object-detection-with-sahi.ipynb)
[![Roboflow](https://raw.githubusercontent.com/roboflow-ai/notebooks/main/assets/badges/roboflow-blogpost.svg)](https://blog.roboflow.com/detect-small-objects/)
[![arXiv](https://img.shields.io/badge/arXiv-2202.06934-b31b1b.svg)](https://arxiv.org/abs/2202.06934)


This cookbook shows how to use [Slicing Aided Hyper Inference (SAHI)](https://arxiv.org/abs/2202.06934) for small object detection with `supervision`.

!["Small Object Detection"](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/animation.gif "Small Object Detection")

Click the Open in Colab button to run the cookbook on Google Colab.

### Before you start

You'll need:

- A free Roboflow account. Don't have one? [Create one here](https://app.roboflow.com/login).
- An API key from Roboflow. Need help getting one? [Learn more here](https://docs.roboflow.com/api-reference/authentication).


## Install required packages

Let's install the dependencies for this project. Here's a list of what we'll use:

*   `inference`: a package by Roboflow for easy deployment of computer vision models.
*   `supervision`: a package by Roboflow that provides utilities for building and managing computer vision applications. The `[assets]` extra adds optional dependencies.
*   `opencv-python`: a library for computer vision tasks, including image processing and object detection.
*   `numpy`: a core library for numerical computing with powerful array and matrix operations.
*   `leafmap`: a tool for creating interactive visualizations.


In [None]:
# TODO: change to pip install supervision when https://github.com/roboflow/supervision/pull/1434 is released on 0.23.0
%pip install git+https://github.com/roboflow/supervision.git@develop

In [None]:
%pip install inference opencv-python numpy leafmap

## Crowd counting with Computer Vision

How would you go about solving the problem of counting people in crowds? After some tests, I found that the best approach is to detect people’s heads. Other body parts are likely occluded by other people, but heads are usually exposed, especially in aerial or high-angle shots.

### Using an Open-Source Public Model for People Detection

Detecting people (or their heads) is a common problem that has been addressed by many researchers in the past. In this project, we’ll use an open-source public dataset and a fine-tuned model to perform inference on images.

![Roboflow Universe](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/roboflow_universe.png "Open source model for counting people's heads")

Some details about the project ["people_counterv0 Computer Vision Project"](https://universe.roboflow.com/sit-cx0ng/people_counterv0) hosted on Roboflow Universe:

- Dataset of 4,574 images
- mAP=49.2% / Precision=74.5% / Recall=39.2%
- Model: Roboflow 2.0 Object Detection (fast)
- Checkpoint: COCOv6n
- Created by: [SIT](https://universe.roboflow.com/sit-cx0ng)

### Download the image

Run the code below to download the image we'll use in this cookbook.

In [None]:
import requests
import matplotlib.pyplot as plt
import cv2
import supervision as sv

def download_image(url):
  target_name = "human_tower.jpg"    
  try:
    print("Downloading image from wikimedia.org...")
    response = requests.get(url, headers={'User-Agent': 'YourCustomUserAgent/1.0'})
    response.raise_for_status()  # Raises an error for bad responses
        
    with open(target_name, "wb") as file:
      file.write(response.content)
      print("Download complete")
      return target_name
    
  except requests.exceptions.RequestException as e:
    print(f"Failed to download the image: {e}")
    return None

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_%2821937141066%29.jpg/2560px-4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_%2821937141066%29.jpg"
image_path = download_image(image_url)

if image_path is None:
    print("Could not download the image...")
else:
    image = cv2.imread(image_path)
    image_wh = (image.shape[1], image.shape[0])
    print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
    sv.plot_image(image)

You're looking at a castell, a human tower traditionally built at festivals in parts of Catalonia, Spain; the tradition has since spread to the Balearic Islands and the Valencian Community. The image comes from [Wikimedia Commons](https://commons.wikimedia.org/wiki/File:4_de_8_amb_l%27agulla_carregat_Castellers_de_Barcelona_(21937141066).jpg), and you can learn more about these human towers on [Wikipedia](https://en.wikipedia.org/wiki/Castell).



## Let's test our model's performance

Before we dive into the SAHI technique for small object detection, it’s useful to see how a fine-tuned model performs with the image as is—without any pre-processing or slicing. The goal is to understand when the model starts to fail so that we can progressively move towards an efficient slicing strategy.

Let’s run the model!

In [None]:
from inference_sdk import InferenceHTTPClient
import supervision as sv
import os

MODEL_ID = "people_counterv0/1"
API_KEY = os.environ["ROBOFLOW_API_KEY"]  # <--- Set your API key here

# Retrieve your API key: https://docs.roboflow.com/api-reference/authentication
# On Google Colab, you can read it from the notebook secrets instead:
# from google.colab import userdata
# API_KEY = userdata.get("ROBOFLOW_API_KEY")

client = InferenceHTTPClient(
  api_url="https://detect.roboflow.com",
  api_key=API_KEY,
)

# Run inference
results = client.infer(image_path, model_id=MODEL_ID)
detections = sv.Detections.from_inference(results)

print(f"Inference time: {results['time']} seconds")
print(f"Found {len(detections)} people")

COLOR_BBOX_PEOPLE=sv.ColorPalette.DEFAULT.colors[6]

bbox_annotator = sv.BoxAnnotator(
    color=COLOR_BBOX_PEOPLE,
    thickness=2
)

# Annotate our image with detections.
image_no_sahi = bbox_annotator.annotate(scene=image.copy(), detections=detections)

sv.plot_image(image_no_sahi)
cv2.imwrite("1x1.jpg", image_no_sahi)


The model shows strong performance in detecting people in the lower half of the image, but it struggles to accurately predict boxes in the upper half. This suggests two key insights: first, the model is proficient at identifying people’s heads from various angles, and second, using SAHI could effectively address the detection challenges in the upper portion of the image. Now, it’s time to try SAHI!
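
To put a number on that visual impression, one option is to count how many boxes fall in each half of the image. The sketch below is a minimal, library-free illustration; `count_by_half` and the example boxes are hypothetical and not part of `supervision`.

```python
def count_by_half(boxes_xyxy, image_height):
    """Count boxes whose vertical center lies in the upper vs. lower half of the image."""
    upper = lower = 0
    for x1, y1, x2, y2 in boxes_xyxy:
        center_y = (y1 + y2) / 2
        if center_y < image_height / 2:
            upper += 1
        else:
            lower += 1
    return upper, lower


# Hypothetical boxes in (x1, y1, x2, y2) format for a 1000 px tall image.
example_boxes = [(10, 100, 30, 120), (50, 700, 90, 760), (200, 820, 260, 900)]
print(count_by_half(example_boxes, image_height=1000))  # (1, 2)
```

In the notebook, you could pass `detections.xyxy` and `image.shape[0]` instead of the example values.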

## Using SAHI for small object detection

Detecting small objects with CNNs can be tricky because small pixel groups provide fewer features for deeper layers, making details easy to miss. SAHI addresses this by:

1. Image Slicing: Dividing the high-resolution image into smaller, overlapping tiles, allowing the model to “zoom in” on details.
2. Hyper Inference: Aggregating results from all slices into a coherent set of detections.

By using SAHI, models like YOLOv8 can greatly improve small object detection, overcoming the usual challenges with feature representation and resolution.
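
The two steps can be sketched without any detection library. The snippet below is a simplified illustration, not the SAHI or `supervision` implementation: it builds a grid of non-overlapping tiles and shifts a box predicted inside one tile back into full-image coordinates. The function names, image size, and box are hypothetical.

```python
def grid_tiles(image_wh, grid_rc):
    """Return (x1, y1, x2, y2) tiles for a rows x columns grid without overlap."""
    width, height = image_wh
    rows, cols = grid_rc
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x1, y1 = c * width // cols, r * height // rows
            x2, y2 = (c + 1) * width // cols, (r + 1) * height // rows
            tiles.append((x1, y1, x2, y2))
    return tiles


def to_image_coords(box_xyxy, tile_xyxy):
    """Shift a box predicted in tile coordinates back into full-image coordinates."""
    tx, ty = tile_xyxy[0], tile_xyxy[1]
    x1, y1, x2, y2 = box_xyxy
    return (x1 + tx, y1 + ty, x2 + tx, y2 + ty)


# Hypothetical 1000x800 image split into a 2x2 grid.
example_tiles = grid_tiles(image_wh=(1000, 800), grid_rc=(2, 2))
print(example_tiles[3])                                       # (500, 400, 1000, 800)
print(to_image_coords((10, 20, 40, 60), example_tiles[3]))    # (510, 420, 540, 460)
```

After every tile's boxes are shifted this way, the remaining work is deciding which boxes from neighboring tiles describe the same object.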

![SAHI](https://raw.githubusercontent.com/obss/sahi/main/resources/sliced_inference.gif "Sliced inference")

SAHI can be seen as a framework for addressing the small object detection problem. If you’re interested in learning more about the motivations behind this solution, you can read the paper [Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection](https://arxiv.org/abs/2202.06934).

## Slicing our image with `supervision`

Let’s begin by visualizing how these tiles would appear on our image. I’ll start with a 2x2 grid of tiles, with zero overlap both vertically (height) and horizontally (width). The final values of these parameters will depend on your use case, so trial and error is encouraged!

The functions below are helpers for visualizing the tiles and their overlap. In your own application, you'll only need the `calculate_tile_size` function, defined afterwards, to calculate the size of the tiles.

### Utility functions for visualizing tiles

In [None]:
import numpy as np
from inference import get_model
import math

def tile_image(image_shape, slice_wh, overlap_wh)-> np.ndarray:
    """
    Computes the coordinates and dimensions of tiles for an image with specified slicing and overlap parameters.
    
    Parameters:
    -----------
    image_shape : tuple of int
        The shape of the image to be tiled, specified as (width, height).
    
    slice_wh : tuple of int
        The dimensions (width, height) of each tile.
    
    overlap_wh : tuple of int
        The overlap dimensions (width, height) between adjacent tiles.
    
    Returns:
    --------
    np.ndarray
        An array of offsets where each row represents a tile's bounding box 
        in the format (x1, y1, x2, y2). The coordinates are rounded up to the nearest integer.
    
    """
    offsets = sv.InferenceSlicer._generate_offset(
        resolution_wh=image_shape,
        slice_wh=slice_wh,
        overlap_ratio_wh=None,
        overlap_wh=overlap_wh
    )
    
    offsets = np.ceil(offsets).astype(int)
    
    return offsets

def draw_transparent_tiles(scene: np.ndarray, x: int, y: int, w:int, h:int, index: int = None):
    """
    Draws a transparent tile with an optional index label on the given scene.

    Parameters:
    -----------
    scene : np.ndarray
        The input image on which the transparent tile will be drawn. It should be a NumPy array representing the image.

    x : int
        The x-coordinate of the top-left corner of the tile.

    y : int
        The y-coordinate of the top-left corner of the tile.

    w : int
        The width of the tile.

    h : int
        The height of the tile.

    index : int, optional, default=None
        If provided, draws the index number in the center of the tile. If None, no index number is drawn.

    Returns:
    --------
    np.ndarray
        A new image with a transparent tile and optional index label drawn on it.
    """
    alpha=0.18
    overlay_image = scene.copy()
    
    # Generate a mask for the tile
    rectangle = np.zeros((h, w, 3), dtype=np.uint8)
    rectangle.fill(255)
    
    rect = sv.Rect(x=x, y=y, width=w, height=h)
    overlay_image = sv.draw_image(scene=overlay_image, image=rectangle, opacity=alpha, rect=rect)
    
    # Draw a border around the edge of the mask
    border_color = sv.Color.BLACK
    border_thickness=2
    overlay_image = sv.draw_rectangle(scene=overlay_image,rect=sv.Rect(x=x, y=y, width=w, height=h), color=border_color, thickness=border_thickness)
    
    # Draw the index number at the center of the tile after blending
    if index is not None:
        # Use the center of the rectangle as the text anchor
        center_x = rect.x + rect.width // 2
        center_y = rect.y + rect.height // 2
        text_anchor = sv.Point(x=center_x, y=center_y)
        text_scale = 3
        text_thickness = 5
        text = str(index + 1)  # tiles are labeled starting from 1
        text_color = sv.Color.BLACK
        # Draw the text on the scene
        overlay_image = sv.draw_text(overlay_image, text, text_anchor, text_color, text_scale, text_thickness)
        
    return overlay_image

def draw_tiles(scene: np.ndarray, offsets, show_index=False):
    """
    Draws transparent tiles on a scene based on the given offsets.

    Parameters:
    -----------
    scene : np.ndarray
        The input image on which tiles will be drawn. It should be a NumPy array representing the image.
    
    offsets : list of tuples
        A list of tuples where each tuple represents the bounding box of a tile in the format (x1, y1, x2, y2).
        - (x1, y1) is the top-left corner of the tile.
        - (x2, y2) is the bottom-right corner of the tile.

    show_index : bool, optional, default=False
        If True, draws the index number of each tile on the scene. Otherwise, the tiles are drawn without indices.

    Returns:
    --------
    np.ndarray
        A new image with transparent tiles drawn on it.

    """
    tiled_image = scene.copy()
    
    for index, offset in enumerate(offsets):
        x = offset[0]
        y = offset[1]
        width = offset[2] - x
        height = offset[3] - y
    
        if (show_index):
            tiled_image = draw_transparent_tiles(scene=tiled_image, x=x, y=y, w=width, h=height, index=index)
        else:
            tiled_image = draw_transparent_tiles(scene=tiled_image, x=x, y=y, w=width, h=height, index=None)
    
    return tiled_image

def print_offsets(offsets):
    for index, offset in enumerate(offsets):
      w = (offset[2] - offset[0])
      h = (offset[3] - offset[1])
      print(f"Tile {index + 1}")
      print(f"  w={w}, h={h}, x1={offset[0]}, y1={offset[1]}, x2={offset[2]}, y2={offset[3]}, area={w*h}")

### Calculate Tile Size

The `calculate_tile_size` function computes the dimensions of each tile when dividing an image into a grid, considering both the image’s size and a specified tiling strategy. It takes the image dimensions (`image_shape`), the number of rows and columns for the grid (`tiles`), and an optional overlap ratio (`overlap_ratio_wh`) to determine how much adjacent tiles overlap. The function returns the adjusted tile size, including the overlap, and the specific overlap dimensions.

**NOTE**: As of `supervision==0.22.0` you need to provide the tile size. This function calculates it for you.
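
As a worked version of that calculation, write the image size as $W \times H$, the grid as $R$ rows by $C$ columns, and the overlap ratios as $(r_w, r_h)$. These symbols are introduced here only for illustration. The function below then computes:

$$
o_w = \left\lceil \frac{W}{C}\, r_w \right\rceil, \qquad
t_w = \left\lceil \frac{W}{C} + o_w \right\rceil, \qquad
o_h = \left\lceil \frac{H}{R}\, r_h \right\rceil, \qquad
t_h = \left\lceil \frac{H}{R} + o_h \right\rceil
$$

For the docstring example ($W = 1024$, $H = 768$, $R = C = 4$, $r_w = r_h = 0.15$):

$$
o_w = \lceil 256 \times 0.15 \rceil = \lceil 38.4 \rceil = 39, \quad
t_w = 256 + 39 = 295, \quad
o_h = \lceil 192 \times 0.15 \rceil = \lceil 28.8 \rceil = 29, \quad
t_h = 192 + 29 = 221
$$

which gives the tile size $(295, 221)$ and the overlap $(39, 29)$.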

In [None]:
def calculate_tile_size(image_shape: tuple[int, int], tiles: tuple[int, int], overlap_ratio_wh: tuple[float, float] = (0.0, 0.0)):
    """
    Calculate the size of the tiles based on the image shape, the number of tiles, and the overlap ratio.

    Parameters:
    ----------
    image_shape : tuple[int, int]
        The dimensions of the image as (width, height).
    
    tiles : tuple[int, int]
        The tiling strategy defined as (rows, columns), specifying the number of tiles along the height and width of the image.
    
    overlap_ratio_wh : tuple[float, float], optional
        The overlap ratio for width and height as (overlap_ratio_w, overlap_ratio_h). This defines the fraction of overlap between adjacent tiles. Default is (0.0, 0.0), meaning no overlap.

    Returns:
    -------
    tuple[tuple[int, int], tuple[int, int]]
        A tuple containing:
        - The size of each tile as (tile_width, tile_height), accounting for overlap.
        - The overlap dimensions as (overlap_width, overlap_height).

    Example:
    -------
    >>> image_shape = (1024, 768)
    >>> tiles = (4, 4)
    >>> overlap_ratio_wh = (0.15, 0.15)
    >>> calculate_tile_size(image_shape, tiles, overlap_ratio_wh)
    ((295, 221), (39, 29))

    Notes:
    -----
    - The function calculates the width and height of each tile based on the number of rows and columns specified.
    - It then applies the overlap ratio to adjust the tile size, ensuring that tiles overlap by a specified fraction.
    - The overlap dimensions are rounded up to the nearest integer to ensure complete coverage.
    """

    w, h = image_shape
    rows, columns = tiles
    
    tile_width = (w / columns)
    tile_height = (h / rows)
    
    overlap_ratio_w, overlap_ratio_h = overlap_ratio_wh
    overlap_wh = (math.ceil(tile_width * overlap_ratio_w), math.ceil(tile_height * overlap_ratio_h))
    
    tile_width = math.ceil(tile_width + overlap_wh[0])
    tile_height = math.ceil(tile_height + overlap_wh[1])
    tile_size = (tile_width, tile_height)
       
    return tile_size, overlap_wh

In [None]:
    
tiles = (2,2) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {slice_wh[1]}h")
print(f"Generated {len(offsets)} tiles. These are the calculated dimensions")
print_offsets(offsets)

tiled_image = draw_tiles(scene=image.copy(), offsets=offsets, show_index=True)

sv.plot_image(tiled_image)
# cv2.imwrite("2x2.jpg", tiled_image)


You can see that the image has been sliced into four tiles. Next, each tile will be processed independently by the model, and `supervision` will merge all the predictions into a coherent set of detections. Notice that we're not using any overlap at this time (more on that later).

## Run Inference on a Sliced Image With `supervision`

Running inference on slices of your image is easy with the `InferenceSlicer` class from [Supervision](https://supervision.roboflow.com/latest/detection/tools/inference_slicer/#inferenceslicer). This API from Roboflow divides a larger image into smaller slices, performs inference on each slice, and then merges the detections into a single `sv.Detections` object.


In [None]:
import time
import leafmap

# Load the model once so every slice reuses the same instance
model = get_model(model_id=MODEL_ID, api_key=API_KEY)

def callback(image_slice: np.ndarray) -> sv.Detections:
  result = model.infer(image_slice)[0]
  return sv.Detections.from_inference(result)

tiles = (2,2) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)

start_time = time.time()
detections = slicer(image)
end_time = time.time()
elapsed_time = end_time - start_time

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tile size: {slice_wh[0]}w x {slice_wh[1]}h")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Found {len(detections)} people")
print(f"Inference time: {elapsed_time} seconds")

tiled_image_with_sahi_2x2 = draw_tiles(scene=image.copy(), offsets=offsets, show_index=True)
tiled_image_with_sahi_2x2 = bbox_annotator.annotate(scene=tiled_image_with_sahi_2x2, detections=detections)

# For visualization
leafmap.image_comparison(
  img1=cv2.cvtColor(tiled_image_with_sahi_2x2, cv2.COLOR_BGR2RGB),
  img2=cv2.cvtColor(image_no_sahi, cv2.COLOR_BGR2RGB),
  label1="Slicing 2x2",
  label2="No Slicing",
  starting_position=50,
)

#cv2.imwrite("2x2_no_overlapping.jpg", bbox_annotator.annotate(scene=image.copy(), detections=detections))


Great! We’ve detected 726 people, up from the 185 we initially detected without image slicing. The model is still detecting people from different angles, but it continues to struggle with detecting people located in the farther parts of the plaza. It’s time to increase the number of tiles—in other words, zoom in so the model can capture more details of the small heads of people.

![Missing detections](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/detections.png)

### Increasing Tile Density: Moving to a 5x5 Grid

Now that we’ve seen improvements with a 2x2 grid, it’s time to push the model further. By increasing the number of tiles to a 5x5 grid, we effectively zoom in on the image, allowing the model to capture finer details, such as smaller and more distant features that might have been missed before. This approach will help us understand how well the model performs with even more zoomed-in images. Let’s explore how this change affects our detection accuracy and overall performance.
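
A rough way to see why a denser grid acts like a zoom: if the model resizes each input to a fixed width, an object's apparent size grows as the tile gets narrower. The sketch below assumes a 640 px model input and made-up image and head sizes. All three numbers and the `apparent_size` helper are hypothetical, and real preprocessing may also letterbox or preserve aspect ratio differently.

```python
def apparent_size(object_px, region_width_px, model_input_px=640):
    """Approximate object width after the region is resized to the model's input width."""
    return object_px * model_input_px / region_width_px


image_width = 2560  # hypothetical full-image width in pixels
head_width = 12     # hypothetical head width in pixels

for columns in (1, 2, 5):
    tile_width = image_width / columns
    print(columns, round(apparent_size(head_width, tile_width), 1))
# 1 3.0
# 2 6.0
# 5 15.0
```

Under these assumptions, a head that covers about 3 px when the whole image is resized covers about 15 px inside a 5-column tile.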

In [None]:
# `time` and `callback` are already defined in the 2x2 cell above and are reused here.

tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.0, 0.0) # The overlap between tiles
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  slice_wh=slice_wh,
  overlap_wh=overlap_wh,
  overlap_ratio_wh=None,
  thread_workers=4
)

start_time = time.time()
detections = slicer(image)
end_time = time.time()
elapsed_time = end_time - start_time

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {slice_wh[1]}h")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap filter: {sv.OverlapFilter.NON_MAX_SUPPRESSION}")
print(f"Found {len(detections)} people")
print(f"Inference time: {elapsed_time} seconds")

tiled_image_with_sahi_5x5 = draw_tiles(scene=image.copy(), offsets=offsets, show_index=True)
tiled_image_with_sahi_5x5 = bbox_annotator.annotate(scene=tiled_image_with_sahi_5x5, detections=detections)


# For visualization
leafmap.image_comparison(
  img1=cv2.cvtColor(tiled_image_with_sahi_5x5, cv2.COLOR_BGR2RGB),
  img2=cv2.cvtColor(tiled_image_with_sahi_2x2, cv2.COLOR_BGR2RGB),
  label1="Slicing 5x5",
  label2="Slicing 2x2",
  starting_position=50,
)

#cv2.imwrite("5x5_no_overlapping.jpg", bbox_annotator.annotate(scene=image.copy(), detections=detections))


We’ve just detected 1,494 people using a 25-tile grid (5 rows x 5 columns), a significant increase from the 726 people detected with the 4-tile (2x2) grid. However, as we increase the number of tiles, a new challenge arises: duplicate detections or missed detections along the edges of the tiles. This issue becomes evident in these examples, where overlapping or gaps between tiles lead to inaccuracies in our model’s detection.

| Example | Observations |
|----|----|
| ![Overlapping](https://github.com/ediardo/notebooks/blob/main/sahi/overlapping_1.png?raw=true "Overlapping") | False negative, incomplete bounding box |
| ![Overlapping](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/overlapping_2.png "Overlapping") | Double detection, incomplete bounding box |
| ![Overlapping](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/overlapping_3.png "Overlapping") | Incomplete bounding box |
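
The incomplete boxes have a simple geometric cause: with no overlap, each tile only sees its own part of an object that straddles a seam. The sketch below illustrates this with two hypothetical tiles and a hypothetical head box; `clip_box_to_tile` is an illustrative helper, not a `supervision` function.

```python
def clip_box_to_tile(box_xyxy, tile_xyxy):
    """Return the part of the box visible inside the tile, or None if they do not intersect."""
    x1 = max(box_xyxy[0], tile_xyxy[0])
    y1 = max(box_xyxy[1], tile_xyxy[1])
    x2 = min(box_xyxy[2], tile_xyxy[2])
    y2 = min(box_xyxy[3], tile_xyxy[3])
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1, y1, x2, y2)


# Two hypothetical side-by-side tiles with no overlap, and a head that straddles the seam at x=500.
left_tile, right_tile = (0, 0, 500, 400), (500, 0, 1000, 400)
head = (490, 100, 520, 130)

print(clip_box_to_tile(head, left_tile))   # (490, 100, 500, 130)
print(clip_box_to_tile(head, right_tile))  # (500, 100, 520, 130)
```

Each tile sees only a fragment of the head, so the model may miss it, detect it twice, or return a box that covers only part of it.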

## Improving Object Detection Near Boundaries with Overlapping

When objects, like people, appear at the edges of tiles, they might be detected twice or missed entirely if they span across two tiles. This can lead to inaccurate detection results. To solve this, we use overlapping tiles, allowing the model to see parts of adjacent tiles simultaneously. This overlap helps ensure that objects near the boundaries are fully captured, reducing duplicates and improving accuracy.

We’ll set the overlap ratio to 5% of the tile’s width and height.
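
How much overlap is enough? A useful rule of thumb is that the overlap should be at least as large as the objects you want to capture whole. The one-dimensional sketch below brute-forces this under an assumed layout in which tiles start every `stride` pixels and extend by `overlap` pixels into the next tile. The function names and all numbers are hypothetical.

```python
def tiles_1d(length, stride, overlap):
    """Tile intervals along one axis: each spans stride + overlap pixels, clipped to the image."""
    return [(start, min(start + stride + overlap, length)) for start in range(0, length, stride)]


def always_contained(length, stride, overlap, object_size):
    """True if every possible placement of the object fits entirely inside at least one tile."""
    tiles = tiles_1d(length, stride, overlap)
    return all(
        any(t0 <= a and a + object_size <= t1 for t0, t1 in tiles)
        for a in range(0, length - object_size + 1)
    )


print(always_contained(length=1000, stride=200, overlap=0, object_size=20))   # False
print(always_contained(length=1000, stride=200, overlap=10, object_size=20))  # False
print(always_contained(length=1000, stride=200, overlap=20, object_size=20))  # True
```

In this toy setup, a 20 px object is guaranteed to appear whole in some tile only once the overlap reaches 20 px, which is a reasonable starting point when tuning the overlap ratio.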

In [None]:
tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.05, 0.05) # Ratio of overlapping, width/height
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)

slicer = sv.InferenceSlicer(
  callback=callback,
  #overlap_filter=sv.OverlapFilter.NON_MAX_MERGE,
  overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
  iou_threshold=0.75,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)
start_time = time.time()
detections = slicer(image)
end_time = time.time()
elapsed_time = end_time - start_time

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tiles: {tiles}")
print(f"Tile size: {slice_wh[0]}w x {slice_wh[1]}h")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap Filter: {sv.OverlapFilter.NON_MAX_SUPPRESSION}")
print(f"Found {len(detections)} people")
print(f"Inference time: {elapsed_time} seconds")
#print_offsets(offsets)

tiled_image_with_overlapping_nms_5x5 = draw_tiles(scene=image.copy(), offsets=offsets, show_index=True)
tiled_image_with_overlapping_nms_5x5 = bbox_annotator.annotate(scene=tiled_image_with_overlapping_nms_5x5, detections=detections)

leafmap.image_comparison(
    img1=cv2.cvtColor(tiled_image_with_overlapping_nms_5x5, cv2.COLOR_BGR2RGB),
    img2=cv2.cvtColor(tiled_image_with_sahi_5x5, cv2.COLOR_BGR2RGB),
    label1="Tiles with overlapping",
    label2="Tiles with no overlapping",
    starting_position=50,
    make_responsive=True
)

#cv2.imwrite("5x5_with_overlapping_nms.jpg", bbox_annotator.annotate(scene=image.copy(), detections=detections))

## Non-Max Suppression vs Non-Max Merge

When dealing with overlapping detections, it’s essential to determine which detections represent the same object and which are unique. Non-Maximum Suppression (NMS) and Non-Maximum Merging (NMM) are two techniques commonly used to address this challenge. NMS works by eliminating redundant detections based on confidence scores, while NMM combines overlapping detections to enhance the representation of objects spanning multiple tiles. Understanding the difference between these methods helps optimize object detection, particularly near tile boundaries.

In `supervision`, the `overlap_filter` parameter specifies the strategy for handling overlapping detections across slices. This cookbook uses two of its values:

- `sv.OverlapFilter.NON_MAX_SUPPRESSION` (default): Eliminates redundant detections by keeping the one with the highest confidence score.
- `sv.OverlapFilter.NON_MAX_MERGE`: Combines overlapping detections to create a more comprehensive representation of objects spanning multiple tiles.

It’s important to note that this method is not perfect and may require further testing and fine-tuning to achieve optimal results in various use cases. You should validate the outputs and adjust parameters as needed to handle specific scenarios effectively.
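
To make the difference concrete, here is a minimal pure-Python sketch of both strategies. It is a simplified greedy version written for illustration and is not the `supervision` implementation, which may group boxes and combine confidences differently. The names `iou` and `greedy_filter` and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def greedy_filter(boxes, scores, iou_threshold, merge):
    """Group boxes around the highest-scoring one; keep it (NMS) or the group's union box (merge)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept, used = [], set()
    for i in order:
        if i in used:
            continue
        # The group is the current box plus every unused box that overlaps it enough.
        group = [i] + [
            j for j in order
            if j != i and j not in used and iou(boxes[i], boxes[j]) > iou_threshold
        ]
        used.update(group)
        if merge:
            xs1, ys1, xs2, ys2 = zip(*(boxes[j] for j in group))
            kept.append((min(xs1), min(ys1), max(xs2), max(ys2)))
        else:
            kept.append(boxes[i])
    return kept


# Two hypothetical detections of the same head from neighboring tiles, plus one unrelated box.
example_boxes = [(100, 100, 140, 140), (104, 100, 144, 140), (300, 300, 340, 340)]
example_scores = [0.9, 0.8, 0.7]

print(greedy_filter(example_boxes, example_scores, 0.75, merge=False))
# [(100, 100, 140, 140), (300, 300, 340, 340)]
print(greedy_filter(example_boxes, example_scores, 0.75, merge=True))
# [(100, 100, 144, 140), (300, 300, 340, 340)]
```

The first two boxes have an IoU of about 0.82, above the 0.75 threshold, so suppression keeps only the higher-scoring box while merging returns the box that encloses both.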

In [None]:
tiles = (5,5) # The number of tiles you want
overlap_ratio_wh = (0.05, 0.05)
slice_wh, overlap_wh = calculate_tile_size(image_wh, tiles, overlap_ratio_wh)
offsets = tile_image(image_wh, slice_wh, overlap_wh)
slicer = sv.InferenceSlicer(
  callback=callback,
  overlap_filter=sv.OverlapFilter.NON_MAX_MERGE,
  #overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
  iou_threshold=0.75,
  slice_wh=slice_wh,
  overlap_ratio_wh=None,
  overlap_wh=overlap_wh,
  thread_workers=4
)
start_time = time.time()
detections = slicer(image)
end_time = time.time()
elapsed_time = end_time - start_time

print(f"Image shape: {image_wh[0]}w x {image_wh[1]}h")
print(f"Tile size: {slice_wh[0]}w x {slice_wh[1]}h")
print(f"Overlap: {overlap_wh[0]}w x {overlap_wh[1]}h. Ratio {overlap_ratio_wh}")
print(f"Overlap Filter: {sv.OverlapFilter.NON_MAX_MERGE}")
print(f"Found {len(detections)} people")
print(f"Inference time: {elapsed_time} seconds")
#print_offsets(offsets)

tiled_image_with_overlapping_nmm_5x5 = draw_tiles(scene=image.copy(), offsets=offsets, show_index=True)
tiled_image_with_overlapping_nmm_5x5 = bbox_annotator.annotate(scene=tiled_image_with_overlapping_nmm_5x5, detections=detections)


leafmap.image_comparison(
    img1=cv2.cvtColor(tiled_image_with_overlapping_nmm_5x5, cv2.COLOR_BGR2RGB),
    img2=cv2.cvtColor(tiled_image_with_overlapping_nms_5x5, cv2.COLOR_BGR2RGB),
    label1="Overlapping with Non-Max Merging",
    label2="Overlapping with Non-Max Suppression",
    starting_position=50,
    make_responsive=True
)

# cv2.imwrite("5x5_with_overlapping_nmm.jpg", bbox_annotator.annotate(scene=image.copy(), detections=detections))

## Conclusion

In this cookbook, we’ve explored the advantages of using the SAHI technique for enhancing small object detection and the importance of experimenting with various tiling strategies to effectively zoom into images. By combining these approaches, we can improve the accuracy and reliability of object detection models, particularly in challenging scenarios where objects are small or located near the boundaries of tiles. These methods offer practical solutions to common challenges in computer vision, empowering developers to build more robust and precise detection systems.

!["Crowd Detection"](https://raw.githubusercontent.com/ediardo/notebooks/main/sahi/5x5_with_overlapping_nmm.jpg "Crowd Detection")


## More resources

- `InferenceSlicer`: https://supervision.roboflow.com/detection/tools/inference_slicer/
- Detect Small Objects: https://supervision.roboflow.com/latest/how_to/detect_small_objects/
- What is Non-Max Merging?: https://blog.roboflow.com/non-max-merging/
- How to Detect Small Objects: A Guide: https://blog.roboflow.com/detect-small-objects/
- How to Use SAHI to Detect Small Objects: https://blog.roboflow.com/how-to-use-sahi-to-detect-small-objects/
- SAHI paper: https://arxiv.org/abs/2202.06934
- C4W3L07 Nonmax Suppression, Andrew Ng: https://www.youtube.com/watch?v=VAo84c1hQX8