fix: is_empty() returns False for empty tracker arrays (issue #2195) #2203

Zeesejo wants to merge 1 commit into roboflow:develop
Conversation
Previously, `is_empty()` used an equality comparison against `Detections.empty()`, which sets `tracker_id=None`. When `tracker_id` was `np.array([])` instead of `None` (e.g., after filtering a `Detections` object that had a `tracker_id`), the `__eq__` check failed even though the detection set is genuinely empty.

Fix: check `len(self) == 0` directly, preserving data/metadata neutrality.

Fixes roboflow#2195
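The filtering path mentioned above is easy to model with plain NumPy: boolean-mask indexing of a populated `tracker_id` yields a zero-length array, not `None` (a minimal sketch of the mechanism, independent of the supervision API):

```python
import numpy as np

# A populated tracker_id field, as produced by an object tracker.
tracker_id = np.array([7, 9])

# A filter that rejects every detection (e.g. a confidence threshold
# nothing passes) leaves a zero-length array -- not None.
keep = np.zeros(2, dtype=bool)
filtered = tracker_id[keep]

print(filtered.shape)    # (0,)
print(filtered is None)  # False
```

This is why a genuinely empty `Detections` can carry `tracker_id=np.array([])` rather than `None`, defeating the old field-wise equality check.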
Pull request overview
Fixes `sv.Detections.is_empty()` returning `False` for empty detections when optional fields (e.g. `tracker_id`) are present as zero-length arrays instead of `None`.

Changes:
- Reimplemented `Detections.is_empty()` to return `len(self) == 0` (based on `xyxy` length only).
- Expanded the `is_empty()` docstring to clarify the new behavior.
- Removed substantial docstring example blocks from `from_lmm()`/`from_vlm()` and adjusted the `__getitem__` docstring example.
```diff
 def is_empty(self) -> bool:
     """
-    Returns `True` if the `Detections` object is considered empty.
+    Returns `True` if the `Detections` object is considered empty,
+    i.e. contains no detections. This check is based solely on the
+    number of bounding boxes, making it robust to optional fields
+    (such as `tracker_id`) being empty arrays rather than `None`.
     """
-    empty_detections = Detections.empty()
-    empty_detections.data = self.data
-    empty_detections.metadata = self.metadata
-    return bool(self == empty_detections)
+    return len(self) == 0
```
Add regression tests for the updated Detections.is_empty() behavior (e.g., xyxy empty with tracker_id=np.array([]) and/or other optional fields as empty arrays) to ensure the original issue (#2195) is covered and doesn't regress.
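One possible shape for those tests, sketched with a hypothetical `FakeDetections` stand-in so the snippet runs without supervision installed (the real tests would build `sv.Detections` with an empty `xyxy` and `tracker_id=np.array([])`):

```python
import numpy as np

class FakeDetections:
    """Hypothetical stand-in mimicking the Detections fields relevant here."""

    def __init__(self, xyxy, tracker_id=None):
        self.xyxy = np.asarray(xyxy).reshape(-1, 4)
        self.tracker_id = tracker_id

    def __len__(self):
        # Mirrors Detections.__len__: the number of bounding boxes.
        return len(self.xyxy)

    def is_empty(self):
        # New behavior under test: based solely on the box count.
        return len(self) == 0

def test_is_empty_with_empty_tracker_id():
    # Regression case from #2195: empty boxes, tracker_id is an
    # empty array rather than None.
    detections = FakeDetections(np.empty((0, 4)), tracker_id=np.array([]))
    assert detections.is_empty()

def test_is_empty_false_when_boxes_present():
    detections = FakeDetections(np.array([[0.0, 0.0, 10.0, 10.0]]))
    assert not detections.is_empty()
```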
````diff
 Example:
     ```python
     import supervision as sv

-    detections = sv.Detections()
+    detections = sv.Detections(...)
````
The __getitem__ docstring example uses sv.Detections(...), which isn’t runnable and is inconsistent with other docstring examples in this module that provide concrete NumPy inputs. Consider replacing it with a minimal valid construction (e.g., a small xyxy array) so the example can be executed as documentation.
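A concrete construction along the lines the review suggests, sketched here with only the NumPy input (the hypothetical docstring line would then read `detections = sv.Detections(xyxy=xyxy)`):

```python
import numpy as np

# Minimal runnable input for a docstring example: one (N, 4) box
# array in xyxy format, as Detections expects.
xyxy = np.array([[10.0, 20.0, 30.0, 40.0]])
print(xyxy.shape)  # (1, 4)
```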
````python
def from_lmm(
    cls, lmm: LMM | str, result: str | dict[str, Any], **kwargs: Any
) -> Detections:
    """
    !!! deprecated "Deprecated"
        `Detections.from_lmm` is **deprecated** and will be removed in
        `supervision-0.31.0`. Please use `Detections.from_vlm` instead.

    Creates a Detections object from the given result string based on the
    specified Large Multimodal Model (LMM).

    | Name              | Enum (sv.LMM)       | Tasks                   | Required parameters         | Optional parameters |
    |-------------------|---------------------|-------------------------|-----------------------------|---------------------|
    | PaliGemma         | `PALIGEMMA`         | detection               | `resolution_wh`             | `classes`           |
    | PaliGemma 2       | `PALIGEMMA`         | detection               | `resolution_wh`             | `classes`           |
    | Qwen2.5-VL        | `QWEN_2_5_VL`       | detection               | `resolution_wh`, `input_wh` | `classes`           |
    | Google Gemini 2.0 | `GOOGLE_GEMINI_2_0` | detection               | `resolution_wh`             | `classes`           |
    | Google Gemini 2.5 | `GOOGLE_GEMINI_2_5` | detection, segmentation | `resolution_wh`             | `classes`           |
    | Moondream         | `MOONDREAM`         | detection               | `resolution_wh`             |                     |
    | DeepSeek-VL2      | `DEEPSEEK_VL_2`     | detection               | `resolution_wh`             | `classes`           |

    Args:
        lmm: The type of LMM (Large Multimodal Model) to use.
        result: The result string containing the detection data.
        **kwargs: Additional keyword arguments required by the specified LMM.

    Returns:
        A new Detections object.

    Raises:
        ValueError: If the LMM is invalid, required arguments are missing, or
            disallowed arguments are provided.
        ValueError: If the specified LMM is not supported.

    !!! example "PaliGemma"
        ```python
        import supervision as sv

        paligemma_result = "<loc0256><loc0256><loc0768><loc0768> cat"
        detections = sv.Detections.from_lmm(
            sv.LMM.PALIGEMMA,
            paligemma_result,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog']
        )
        detections.xyxy
        # array([[250., 250., 750., 750.]])

        detections.class_id
        # array([0])

        detections.data
        # {'class_name': array(['cat'], dtype='<U10')}
        ```

    !!! example "Qwen2.5-VL"

        ??? tip "Prompt engineering"

            To get the best results from Qwen2.5-VL, use clear and descriptive
            prompts that specify exactly what you want to detect.

            **For general object detection, use this comprehensive prompt:**

            ```
            Detect all objects in the image and return their locations and labels.
            ```

            **For specific object detection with detailed descriptions:**

            ```
            Detect the red object that is leading in this image and return its location and label.
            ```

            **For simple, targeted detection:**

            ```
            leading blue truck
            ```

            **Additional effective prompts:**

            ```
            Find all people and vehicles in this scene
            ```

            ```
            Locate all animals in the image
            ```

            ```
            Identify traffic signs and their positions
            ```

            **Tips for better results:**

            - Use descriptive language that clearly specifies what to look for
            - Include color, size, or position descriptors when targeting specific objects
            - Be specific about the type of objects you want to detect
            - The model responds well to both detailed instructions and concise phrases
            - Results are returned in JSON format with `bbox_2d` coordinates and `label` fields

        ```python
        import supervision as sv

        qwen_2_5_vl_result = \"\"\"```json
        [
            {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
            {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
        ]
        ```\"\"\"
        detections = sv.Detections.from_lmm(
            sv.LMM.QWEN_2_5_VL,
            qwen_2_5_vl_result,
            input_wh=(1000, 1000),
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )
        detections.xyxy
        # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U10')}
        ```

    !!! example "Qwen3-VL"

        ```python
        import supervision as sv

        qwen_3_vl_result = \"\"\"```json
        [
            {"bbox_2d": [139, 768, 315, 954], "label": "cat"},
            {"bbox_2d": [366, 679, 536, 849], "label": "dog"}
        ]
        ```\"\"\"
        detections = sv.Detections.from_lmm(
            sv.LMM.QWEN_3_VL,
            qwen_3_vl_result,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )
        detections.xyxy
        # array([[139., 768., 315., 954.], [366., 679., 536., 849.]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U10')}
        ```

    !!! example "Gemini 2.0"

        ??? tip "Prompt engineering"

            From Gemini 2.0 onwards, models are further trained to detect objects in
            an image and get their bounding box coordinates. The coordinates,
            relative to image dimensions, scale to [0, 1000]. You need to convert
            these normalized coordinates back to pixel coordinates using your
            original image size.

            According to the Gemini API documentation on image prompts (see
            https://ai.google.dev/gemini-api/docs/vision#image-input), when using a
            single image with text, the recommended approach is to place the text
            prompt after the image part in the contents array. This ordering has
            been shown to produce significantly better results in practice.

            For example, when calling the Gemini API directly, you can structure
            the request like this, with the image part first and the text prompt
            second in the `parts` list:

            ```json
            {
              "model": "models/gemini-2.0-flash",
              "contents": [
                {
                  "role": "user",
                  "parts": [
                    {
                      "inline_data": {
                        "mime_type": "image/png",
                        "data": "<BASE64_IMAGE_BYTES>"
                      }
                    },
                    {
                      "text": "Detect all the cats and dogs in the image..."
                    }
                  ]
                }
              ]
            }
            ```

            To get the best results from Google Gemini 2.0, use the following prompt.

            ```
            Detect all the cats and dogs in the image. The box_2d should be
            [ymin, xmin, ymax, xmax] normalized to 0-1000.
            ```

        ```python
        import supervision as sv

        gemini_response_text = \"\"\"```json
        [
            {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
            {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
        ]
        ```\"\"\"

        detections = sv.Detections.from_lmm(
            sv.LMM.GOOGLE_GEMINI_2_0,
            gemini_response_text,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )

        detections.xyxy
        # array([[543., 40., 728., 200.], [653., 352., 820., 522.]])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U26')}

        detections.class_id
        # array([0, 1])
        ```

    !!! example "Gemini 2.5"

        ??? tip "Prompt engineering"

            To get the best results from Google Gemini 2.5, use the following prompt.

            This prompt is designed to detect all visible objects in the image,
            including small, distant, or partially visible ones, and to return
            tight bounding boxes.

            According to the Gemini API documentation on image prompts, when using
            a single image with text, the recommended approach is to place the text
            prompt after the image part in the `contents` array. See the official
            Gemini vision docs for details:
            https://ai.google.dev/gemini-api/docs/vision#multi-part-input

            For example, using the `google-generativeai` client:

            ```python
            from google.generativeai import types

            response = model.generate_content(
                contents=[
                    types.Part.from_image(image_bytes),
                    "Carefully examine this image and detect ALL visible objects, including "
                    "small, distant, or partially visible ones.",
                ],
                generation_config=generation_config,
                safety_settings=safety_settings,
            )
            ```

            This ordering (image first, then text) has been shown to produce
            significantly better results in practice.

            ```
            Carefully examine this image and detect ALL visible objects, including
            small, distant, or partially visible ones.

            IMPORTANT: Focus on finding as many objects as possible, even if you are
            only moderately confident.

            Make sure each bounding box is as tight as possible.

            Valid object classes: {class_list}

            For each detected object, provide:
            - "label": the exact class name from the list above
            - "confidence": your certainty (between 0.0 and 1.0)
            - "box_2d": the bounding box [ymin, xmin, ymax, xmax] normalized to 0-1000
            - "mask": the binary mask of the object as a base64-encoded string

            Detect everything that matches the valid classes. Do not be
            conservative; include objects even with moderate confidence.

            Return a JSON array, for example:
            [
              {
                "label": "person",
                "confidence": 0.95,
                "box_2d": [100, 200, 300, 400],
                "mask": "..."
              },
              {
                "label": "kite",
                "confidence": 0.80,
                "box_2d": [50, 150, 250, 350],
                "mask": "..."
              }
            ]
            ```

            When using the google-genai library, it is recommended to set
            `thinking_budget=0` in `thinking_config` for more direct and faster
            responses.

            ```python
            from google.generativeai import types

            model.generate_content(
                ...,
                generation_config=generation_config,
                safety_settings=safety_settings,
                thinking_config=types.ThinkingConfig(
                    thinking_budget=0
                )
            )
            ```

            For a shorter prompt focused only on segmentation masks, you can use:

            ```
            Return a JSON list of segmentation masks. Each entry should include the
            2D bounding box in the "box_2d" key, the segmentation mask in the "mask"
            key, and the text label in the "label" key. Use descriptive labels.
            ```

        ```python
        import supervision as sv

        gemini_response_text = \"\"\"```json
        [
            {"box_2d": [543, 40, 728, 200], "label": "cat", "id": 1},
            {"box_2d": [653, 352, 820, 522], "label": "dog", "id": 2}
        ]
        ```\"\"\"

        detections = sv.Detections.from_lmm(
            sv.LMM.GOOGLE_GEMINI_2_5,
            gemini_response_text,
            resolution_wh=(1000, 1000),
            classes=['cat', 'dog'],
        )

        detections.xyxy
        # array([[543., 40., 728., 200.], [653., 352., 820., 522.]])

        detections.data
        # {'class_name': array(['cat', 'dog'], dtype='<U26')}

        detections.class_id
        # array([0, 1])
        ```

    !!! example "Moondream"

        ??? tip "Prompt engineering"

            To get the best results from Moondream, use optimized prompts that leverage
            its object detection capabilities effectively.

            **For general object detection, use this simple prompt:**

            ```
            objects
            ```

            This single-word prompt instructs Moondream to detect all visible objects
            and return them in the proper JSON format with normalized coordinates.

        ```python
        import supervision as sv

        moondream_result = {
            'objects': [
                {
                    'x_min': 0.5704046934843063,
                    'y_min': 0.20069346576929092,
                    'x_max': 0.7049859315156937,
                    'y_max': 0.3012596592307091
                },
                {
                    'x_min': 0.6210969910025597,
                    'y_min': 0.3300672620534897,
                    'x_max': 0.8417936339974403,
                    'y_max': 0.4961046129465103
                }
            ]
        }

        detections = sv.Detections.from_lmm(
            sv.LMM.MOONDREAM,
            moondream_result,
            resolution_wh=(3072, 4080),
        )

        detections.xyxy
        # array([[1752.28, 818.82, 2165.72, 1229.14],
        #        [1908.01, 1346.67, 2585.99, 2024.11]])
        ```

    !!! example "DeepSeek-VL2"

        ??? tip "Prompt engineering"

            To get the best results from DeepSeek-VL2, use optimized prompts that leverage
            its object detection and visual grounding capabilities effectively.

            **For general object detection, use the following user prompt:**

            ```
            <image>\\n<|ref|>The giraffe at the front<|/ref|>
            ```

            **For visual grounding, use the following user prompt:**

            ```
            <image>\\n<|grounding|>Detect the giraffes
            ```

        ```python
        from PIL import Image
        import supervision as sv

        deepseek_vl2_result = "<|ref|>The giraffe at the back<|/ref|><|det|>[[580, 270, 999, 904]]<|/det|><|ref|>The giraffe at the front<|/ref|><|det|>[[26, 31, 632, 998]]<|/det|><|end▁of▁sentence|>"

        detections = sv.Detections.from_vlm(
            vlm=sv.VLM.DEEPSEEK_VL_2, result=deepseek_vl2_result, resolution_wh=image.size
        )

        detections.xyxy
        # array([[ 420,  293,  724,  982],
        #        [  18,   33,  458, 1084]])

        detections.class_id
        # array([0, 1])

        detections.data
        # {'class_name': array(['The giraffe at the back', 'The giraffe at the front'], dtype='<U24')}
        ```
    """  # noqa: E501
````
This PR removes large docstring example blocks from from_lmm/from_vlm, but the PR description only discusses the is_empty() behavior change. If the documentation removal is intentional, it should be mentioned in the PR description (or split into a separate docs-focused PR) to avoid surprising downstream docs consumers.
Borda left a comment:
This PR got too wild with removing a large portion of the docs; please focus only on the described changes and add relevant tests.
Like I said in #2195, this isn't as simple as it looks. If people are relying on the information stored in empty arrays (like the expected shape of their elements), this PR will create breaking changes. The failing tests are some of the examples that rely on it.
That is fair, but it still doesn't explain why you decided to remove that much documentation without replacement and didn't add tests that would support the correctness of your implementation...
Dude, I didn't open this PR! I stopped working on this once I realised it could create breaking changes.
Hey @Borda and @UNakade — thanks for the feedback, really appreciate you digging into this. You're both right: I oversimplified the fix. On the docs removal — that was unintentional; I didn't mean to drop the `from_lmm()`/`from_vlm()` docstring examples. My plan to get this back on track: restore the removed docstring examples, keep the diff focused on `is_empty()`, and add regression tests covering empty optional fields. Will push an update shortly. Sorry for the noise!
Sorry, I accidentally swapped you with the author since you reacted first... |
@Zeesejo, it seems I cannot push to your branch to finish, so could you please enable it or apply all the requested suggestions?
Resolved in #2209.
Problem
`sv.Detections.is_empty()` returned `False` when `tracker_id` was set to an empty array `np.array([])` instead of `None`. This happened because the previous implementation compared `self == Detections.empty()`, and `Detections.empty()` sets `tracker_id=None` — so the equality check failed for any instance where `tracker_id=[]`.

Minimal repro (before fix):
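The failure mechanism can be modeled without supervision installed: the old code path reduces to a field-wise comparison in which `None` matches only `None` (a simplified sketch, not the actual `Detections.__eq__` implementation):

```python
import numpy as np

def fields_match(a, b):
    # Simplified model of the old field-wise equality: None only
    # matches None, so np.array([]) vs. None compares unequal even
    # though both mean "no tracker ids".
    if a is None or b is None:
        return a is b
    return np.array_equal(a, b)

print(fields_match(None, None))          # True
print(fields_match(np.array([]), None))  # False, so is_empty() was False
```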
Fix
Replaced the equality-based check with a direct length check:
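A minimal sketch of the new body, matching the diff earlier in this PR (shown as a free function here for illustration; in the class, `len(self)` delegates to `Detections.__len__`, which returns the number of `xyxy` rows):

```python
def is_empty(self) -> bool:
    # Empty iff there are no bounding boxes; optional fields such as
    # tracker_id are deliberately ignored.
    return len(self) == 0
```

Because the function only calls `len()`, any sized object demonstrates it, e.g. `is_empty([])` is `True` and `is_empty([1, 2])` is `False`.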
This is robust to any optional field (`tracker_id`, `confidence`, `class_id`, etc.) being an empty array rather than `None`, since `__len__` is based solely on the number of bounding boxes (`len(self.xyxy)`).

Fixes #2195