fix: is_empty() returns False for empty tracker arrays (issue #2195) by Zeesejo · Pull Request #2203 · roboflow/supervision

Zeesejo · 2026-04-07T13:12:16Z

Problem

sv.Detections.is_empty() returned False when tracker_id was set to an empty array np.array([]) instead of None. This happened because the previous implementation compared self == Detections.empty(), and Detections.empty() sets tracker_id=None — so the equality check failed for any instance where tracker_id=[].

Minimal repro (before fix):

import numpy as np
import supervision as sv

detections = sv.Detections(
    xyxy=np.empty((0, 4), dtype=np.float32),
    tracker_id=np.array([])  # empty array, not None
)
print(detections.is_empty())  # ❌ returned False

Fix

Replaced the equality-based check with a direct length check:

def is_empty(self) -> bool:
    return len(self) == 0

This is robust to any optional field (tracker_id, confidence, class_id, etc.) being an empty array rather than None, since __len__ is based solely on the number of bounding boxes (len(self.xyxy)).

Fixes #2195

Previously, is_empty() used equality comparison against Detections.empty() which sets tracker_id=None. When tracker_id was np.array([]) instead of None (e.g., after filtering a Detections object that had a tracker_id), the __eq__ check failed even though the detection set is genuinely empty. Fix: check len(self) == 0 directly, preserving data/metadata neutrality. Fixes roboflow#2195

CLAassistant · 2026-04-07T13:12:30Z

All committers have signed the CLA.

Copilot

Pull request overview

Fixes sv.Detections.is_empty() returning False for empty detections when optional fields (e.g. tracker_id) are present as zero-length arrays instead of None.

Changes:

Reimplemented Detections.is_empty() to return len(self) == 0 (based on xyxy length only).
Expanded the is_empty() docstring to clarify the new behavior.
Removed substantial docstring example blocks from from_lmm() / from_vlm() and adjusted the __getitem__ docstring example.

Copilot · 2026-04-07T13:15:54Z

    def is_empty(self) -> bool:
        """
-        Returns `True` if the `Detections` object is considered empty.
+        Returns `True` if the `Detections` object is considered empty,
+        i.e. contains no detections. This check is based solely on the
+        number of bounding boxes, making it robust to optional fields
+        (such as `tracker_id`) being empty arrays rather than `None`.
        """
-        empty_detections = Detections.empty()
-        empty_detections.data = self.data
-        empty_detections.metadata = self.metadata
-        return bool(self == empty_detections)
+        return len(self) == 0


Add regression tests for the updated Detections.is_empty() behavior (e.g., xyxy empty with tracker_id=np.array([]) and/or other optional fields as empty arrays) to ensure the original issue (#2195) is covered and doesn't regress.

Copilot · 2026-04-07T13:15:54Z

        Example:
            ```python
            import supervision as sv

-            detections = sv.Detections()
+            detections = sv.Detections(...)



The __getitem__ docstring example uses sv.Detections(...), which isn’t runnable and is inconsistent with other docstring examples in this module that provide concrete NumPy inputs. Consider replacing it with a minimal valid construction (e.g., a small xyxy array) so the example can be executed as documentation.

Copilot · 2026-04-07T13:15:55Z

    def from_lmm(
        cls, lmm: LMM | str, result: str | dict[str, Any], **kwargs: Any
    ) -> Detections:
        """
        !!! deprecated "Deprecated"
            `Detections.from_lmm` is **deprecated** and will be removed in `supervision-0.31.0`.
            Please use `Detections.from_vlm` instead.

        Creates a Detections object from the given result string based on the specified
        Large Multimodal Model (LMM).

        | Name                | Enum (sv.LMM)        | Tasks                   | Required parameters         | Optional parameters |
        |---------------------|----------------------|-------------------------|-----------------------------|---------------------|
        | PaliGemma           | `PALIGEMMA`          | detection               | `resolution_wh`             | `classes`           |
        | PaliGemma 2         | `PALIGEMMA`          | detection               | `resolution_wh`             | `classes`           |
        | Qwen2.5-VL          | `QWEN_2_5_VL`        | detection               | `resolution_wh`, `input_wh` | `classes`           |
        | Google Gemini 2.0   | `GOOGLE_GEMINI_2_0`  | detection               | `resolution_wh`             | `classes`           |
        | Google Gemini 2.5   | `GOOGLE_GEMINI_2_5`  | detection, segmentation | `resolution_wh`             | `classes`           |
        | Moondream           | `MOONDREAM`          | detection               | `resolution_wh`             |                     |
        | DeepSeek-VL2        | `DEEPSEEK_VL_2`      | detection               | `resolution_wh`             | `classes`           |

        Args:
            lmm: The type of LMM (Large Multimodal Model) to use.
            result: The result string containing the detection data.
            **kwargs: Additional keyword arguments required by the specified LMM.

        Returns:
            A new Detections object.

        Raises:
            ValueError: If the LMM is invalid, required arguments are missing, or
                disallowed arguments are provided.
            ValueError: If the specified LMM is not supported.
-
-        !!! example "PaliGemma"
-            ```python
-
-            import supervision as sv
-
-            paligemma_result = "<loc0256><loc0256><loc0768><loc0768> cat"
-            detections = sv.Detections.from_lmm(
-                sv.LMM.PALIGEMMA,
-                paligemma_result,
-                resolution_wh=(1000, 1000),
-                classes=['cat', 'dog']
-            )
-            detections.xyxy
-            # array([[250., 250., 750., 750.]])
-
-            detections.class_id
-            # array([0])
-
-            detections.data
-            # {'class_name': array(['cat'], dtype='<U10')}
-            ```
-
-        !!! example "Qwen2.5-VL"
-
-            ??? tip "Prompt engineering"
-
-                To get the best results from Qwen2.5-VL, use clear and descriptive prompts
-                that specify exactly what you want to detect.
-
-                **For general object detection, use this comprehensive prompt:**
-
-                ```
-                Detect all objects in the image and return their locations and labels.
-                ```
-
-                **For specific object detection with detailed descriptions:**
-
-                ```
-                Detect the red object that is leading in this image and return its location and label.
-                ```
-
-                **For simple, targeted detection:**
-
-                ```
-                leading blue truck
-                ```
-
-                **Additional effective prompts:**
-
-                ```
-                Find all people and vehicles in this scene
-                ```
-
-                ```
-                Locate all animals in the image
-                ```
-
-                ```
-                Identify traffic signs and their positions
-                ```
-
-                **Tips for better results:**
-
-                - Use descriptive language that clearly specifies what to look for
-                - Include color, size, or position descriptors when targeting specific objects
-                - Be specific about the type of objects you want to detect
-                - The model responds well to both detailed instructions and concise phrases
-                - Results are returned in JSON format with `bbox_2d` coordinates and `label` fields
-
-
-            ```python
-            import supervision as sv
-
-            qwen_2_5_vl_result = \"\"\"```json
-            [
-                {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
-                {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
-            ]
-            ```\"\"\"
-            detections = sv.Detections.from_lmm(
-                sv.LMM.QWEN_2_5_VL,
-                qwen_2_5_vl_result,
-                input_wh=(1000, 1000),
-                resolution_wh=(1000, 1000),
-                classes=['cat', 'dog'],
-            )
-            detections.xyxy
-            # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])
-
-            detections.class_id
-            # array([0, 1])
-
-            detections.data
-            # {'class_name': array(['cat', 'dog'], dtype='<U10')}
-
-            detections.class_id
-            # array([0, 1])
-            ```
-
-        !!! example "Qwen3-VL"
-
-            ```python
-            import supervision as sv
-
-            qwen_3_vl_result = \"\"\"```json
-            [
-                {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
-                {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
-            ]
-            ```\"\"\"
-            detections = sv.Detections.from_lmm(
-                sv.LMM.QWEN_3_VL,
-                qwen_3_vl_result,
-                resolution_wh=(1000, 1000),
-                classes=['cat', 'dog'],
-            )
-            detections.xyxy
-            # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])
-
-            detections.class_id
-            # array([0, 1])
-
-            detections.data
-            # {'class_name': array(['cat', 'dog'], dtype='<U10')}
-
-            detections.class_id
-            # array([0, 1])
-            ```
-
-        !!! example "Gemini 2.0"
-
-            ??? tip "Prompt engineering"
-
-                From Gemini 2.0 onwards, models are further trained to detect objects in
-                an image and get their bounding box coordinates. The coordinates,
-                relative to image dimensions, scale to [0, 1000]. You need to convert
-                these normalized coordinates back to pixel coordinates using your
-                original image size.
-
-                According to the Gemini API documentation on image prompts (see
-                https://ai.google.dev/gemini-api/docs/vision#image-input), when using a
-                single image with text, the recommended approach is to place the text
-                prompt after the image part in the contents array. This ordering has
-                been shown to produce significantly better results in practice.
-
-                For example, when calling the Gemini API directly, you can structure
-                the request like this, with the image part first and the text prompt
-                second in the `parts` list:
-
-                ```json
-                {
-                  "model": "models/gemini-2.0-flash",
-                  "contents": [
-                    {
-                      "role": "user",
-                      "parts": [
-                        {
-                          "inline_data": {
-                            "mime_type": "image/png",
-                            "data": "<BASE64_IMAGE_BYTES>"
-                          }
-                        },
-                        {
-                          "text": "Detect all the cats and dogs in the image..."
-                        }
-                      ]
-                    }
-                  ]
-                }
-                ```
-                To get the best results from Google Gemini 2.0, use the following prompt.
-
-                ```
-                Detect all the cats and dogs in the image. The box_2d should be
-                [ymin, xmin, ymax, xmax] normalized to 0-1000.
-                ```
-
-            ```python
-            import supervision as sv
-
-            gemini_response_text = \"\"\"```json
-                [
-                    {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
-                    {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
-                ]
-            ```\"\"\"
-
-            detections = sv.Detections.from_lmm(
-                sv.LMM.GOOGLE_GEMINI_2_0,
-                gemini_response_text,
-                resolution_wh=(1000, 1000),
-                classes=['cat', 'dog'],
-            )
-
-            detections.xyxy
-            # array([[543., 40., 728., 200.], [653., 352., 820., 522.]])
-
-            detections.data
-            # {'class_name': array(['cat', 'dog'], dtype='<U26')}
-
-            detections.class_id
-            # array([0, 1])
-            ```
-
-        !!! example "Gemini 2.5"
-
-            ??? tip "Prompt engineering"
-
-                To get the best results from Google Gemini 2.5, use the following prompt.
-
-                This prompt is designed to detect all visible objects in the image,
-                including small, distant, or partially visible ones, and to return
-                tight bounding boxes.
-
-                According to the Gemini API documentation on image prompts, when using
-                a single image with text, the recommended approach is to place the text
-                prompt after the image part in the `contents` array. See the official
-                Gemini vision docs for details:
-                https://ai.google.dev/gemini-api/docs/vision#multi-part-input
-
-                For example, using the `google-generativeai` client:
-
-                ```python
-                from google.generativeai import types
-
-                response = model.generate_content(
-                    contents=[
-                        types.Part.from_image(image_bytes),
-                        "Carefully examine this image and detect ALL visible objects, including "
-                        "small, distant, or partially visible ones.",
-                    ],
-                    generation_config=generation_config,
-                    safety_settings=safety_settings,
-                )
-                ```
-
-                This ordering (image first, then text) has been shown to produce
-                significantly better results in practice.
-
-                ```
-                Carefully examine this image and detect ALL visible objects, including
-                small, distant, or partially visible ones.
-
-                IMPORTANT: Focus on finding as many objects as possible, even if you are
-                only moderately confident.
-
-                Make sure each bounding box is as tight as possible.
-
-                Valid object classes: {class_list}
-
-                For each detected object, provide:
-                - "label": the exact class name from the list above
-                - "confidence": your certainty (between 0.0 and 1.0)
-                - "box_2d": the bounding box [ymin, xmin, ymax, xmax] normalized to 0-1000
-                - "mask": the binary mask of the object as a base64-encoded string
-
-                Detect everything that matches the valid classes. Do not be
-                conservative; include objects even with moderate confidence.
-
-                Return a JSON array, for example:
-                [
-                    {
-                        "label": "person",
-                        "confidence": 0.95,
-                        "box_2d": [100, 200, 300, 400],
-                        "mask": "..."
-                    },
-                    {
-                        "label": "kite",
-                        "confidence": 0.80,
-                        "box_2d": [50, 150, 250, 350],
-                        "mask": "..."
-                    }
-                ]
-                ```
-
-                When using the google-genai library, it is recommended to set
-                thinking_budget=0 in thinking_config for more direct and faster responses.
-
-                ```python
-                from google.generativeai import types
-
-                model.generate_content(
-                    ...,
-                    generation_config=generation_config,
-                    safety_settings=safety_settings,
-                    thinking_config=types.ThinkingConfig(
-                        thinking_budget=0
-                    )
-                )
-                ```
-
-                For a shorter prompt focused only on segmentation masks, you can use:
-
-                ```
-                Return a JSON list of segmentation masks. Each entry should include the
-                2D bounding box in the "box_2d" key, the segmentation mask in the "mask"
-                key, and the text label in the "label" key. Use descriptive labels.
-                ```
-
-            ```python
-            import supervision as sv
-
-            gemini_response_text = \"\"\"```json
-                [
-                    {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
-                    {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
-                ]
-            ```\"\"\"
-
-            detections = sv.Detections.from_lmm(
-                sv.LMM.GOOGLE_GEMINI_2_5,
-                gemini_response_text,
-                resolution_wh=(1000, 1000),
-                classes=['cat', 'dog'],
-            )
-
-            detections.xyxy
-            # array([[543., 40., 728., 200.], [653., 352., 820., 522.]])
-
-            detections.data
-            # {'class_name': array(['cat', 'dog'], dtype='<U26')}
-
-            detections.class_id
-            # array([0, 1])
-            ```
-
-        !!! example "Moondream"
-
-
-            ??? tip "Prompt engineering"
-
-                To get the best results from Moondream, use optimized prompts that leverage
-                its object detection capabilities effectively.
-
-                **For general object detection, use this simple prompt:**
-
-                ```
-                objects
-                ```
-
-                This single-word prompt instructs Moondream to detect all visible objects
-                and return them in the proper JSON format with normalized coordinates.
-
-
-            ```python
-            import supervision as sv
-
-            moondream_result = {
-                'objects': [
-                    {
-                        'x_min': 0.5704046934843063,
-                        'y_min': 0.20069346576929092,
-                        'x_max': 0.7049859315156937,
-                        'y_max': 0.3012596592307091
-                    },
-                    {
-                        'x_min': 0.6210969910025597,
-                        'y_min': 0.3300672620534897,
-                        'x_max': 0.8417936339974403,
-                        'y_max': 0.4961046129465103
-                    }
-                ]
-            }
-
-            detections = sv.Detections.from_lmm(
-                sv.LMM.MOONDREAM,
-                moondream_result,
-                resolution_wh=(1000, 1000),
-            )
-
-            detections.xyxy
-            # array([[1752.28,  818.82, 2165.72, 1229.14],
-            #        [1908.01, 1346.67, 2585.99, 2024.11]])
-            ```
-
-        !!! example "DeepSeek-VL2"
-
-
-            ??? tip "Prompt engineering"
-
-                To get the best results from DeepSeek-VL2, use optimized prompts that leverage
-                its object detection and visual grounding capabilities effectively.
-
-                **For general object detection, use the following user prompt:**
-
-                ```
-                <image>\\n<|ref|>The giraffe at the front<|/ref|>
-                ```
-
-                **For visual grounding, use the following user prompt:**
-
-                ```
-                <image>\\n<|grounding|>Detect the giraffes
-                ```
-
-            ```python
-            from PIL import Image
-            import supervision as sv
-
-            deepseek_vl2_result = "<|ref|>The giraffe at the back<|/ref|><|det|>[[580, 270, 999, 904]]<|/det|><|ref|>The giraffe at the front<|/ref|><|det|>[[26, 31, 632, 998]]<|/det|><|end▁of▁sentence|>"
-
-            detections = sv.Detections.from_vlm(
-                vlm=sv.VLM.DEEPSEEK_VL_2, result=deepseek_vl2_result, resolution_wh=image.size
-            )
-
-            detections.xyxy
-            # array([[ 420,  293,  724,  982],
-            #        [  18,   33,  458, 1084]])
-
-            detections.class_id
-            # array([0, 1])
-
-            detections.data
-            # {'class_name': array(['The giraffe at the back', 'The giraffe at the front'], dtype='<U24')}
-            ```
        """  # noqa: E501



This PR removes large docstring example blocks from from_lmm/from_vlm, but the PR description only discusses the is_empty() behavior change. If the documentation removal is intentional, it should be mentioned in the PR description (or split into a separate docs-focused PR) to avoid surprising downstream docs consumers.

Borda

This PR git too wild with removing a large portion of the docs, pls focus only on the described chnaged and add relevant tests

UNakade · 2026-04-09T10:42:45Z

Like I said in #2195, this isn't as simple as it looks. If people are relying on the information stored in empty arrays (like the expected shape of its elements), this PR will create breaking changes. The failing tests are some of the examples that rely on is_empty() returning False even if the sv.Detections object has len == 0.

Borda · 2026-04-09T14:19:39Z

Like I said in #2195, this isn't as simple as it looks. If people are relying on the information stored in empty arrays (like the expected shape of its elements), this PR will create breaking changes. The failing tests are some of the examples that rely on is_empty() returning False even if the sv.Detections object has len == 0.

That is fair, but still doesn't explain why you decided to remove that much docs without replacement and didn't add tests that would support correctness of your implementation...

UNakade · 2026-04-10T01:27:02Z

That is fair, but still doesn't explain why you decided to remove that much docs without replacement and didn't add tests that would support correctness of your implementation...

Dude, I didn't open this PR! I stopped working on this once I realised it could create breaking changes.

Zeesejo · 2026-04-10T07:17:17Z

Hey @Borda and @UNakade — thanks for the feedback, really appreciate you digging into this.

You're both right. I oversimplified the fix — the len == 0 approach doesn't account for cases where users might be relying on empty arrays still carrying shape/type information, and the failing tests make that pretty clear. I should've caught that before opening the PR.

On the docs removal — that was unintentional. I didn't mean to drop the from_lmm/from_vlm docstring examples, those got caught up in the diff by mistake. I'll restore those.

My plan to get this back on track:

Revert the unrelated doc changes
Add proper regression tests covering the original issue ([Bug]: Inconsistent behavior of sv.Detections.is_empty() if tracker_id is not None #2195) — specifically xyxy empty with tracker_id=np.array([]) and other optional fields as empty arrays
Revisit the is_empty() logic to handle the breaking change concern more carefully

Will push an update shortly. Sorry for the noise!

Borda · 2026-04-10T12:33:52Z

That is fair, but still doesn't explain why you decided to remove that much docs without replacement and didn't add tests that would support correctness of your implementation...

Dude, I didn't open this PR! I stopped working on this once I realised it could create breaking changes.

Sorry, I accidentally swapped you with the author since you reacted first...

Borda · 2026-04-13T10:08:55Z

@Zeesejo, it seems I cannot push to your branch to finish, so could you pls enable it or perform all the requested suggestions?

Borda · 2026-04-13T15:11:45Z

resolved in #2209

Zeesejo requested a review from SkalskiP as a code owner April 7, 2026 13:12

Copilot AI review requested due to automatic review settings April 7, 2026 13:12

Copilot started reviewing on behalf of Zeesejo April 7, 2026 13:12 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Borda requested changes Apr 8, 2026

View reviewed changes

Borda added bug Something isn't working waiting for author labels Apr 8, 2026

github-actions Bot added the has conflicts label Apr 13, 2026

Borda marked this pull request as draft April 13, 2026 10:08

Borda closed this Apr 13, 2026

Conversation

Zeesejo commented Apr 7, 2026

Problem

Fix

Uh oh!

CLAassistant commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Borda left a comment

Choose a reason for hiding this comment

Uh oh!

UNakade commented Apr 9, 2026

Uh oh!

Borda commented Apr 9, 2026

Uh oh!

UNakade commented Apr 10, 2026

Uh oh!

Zeesejo commented Apr 10, 2026

Uh oh!

Borda commented Apr 10, 2026

Uh oh!

Borda commented Apr 13, 2026

Uh oh!

Borda commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CLAassistant commented Apr 7, 2026 •

edited

Loading