
Fix extreme memory usage when loading OBB datasets#2187

Merged
Borda merged 9 commits into develop from copilot/fix-extreme-memory-obb-dataset
Mar 31, 2026

Conversation

Contributor

Copilot AI commented Mar 30, 2026

OBB annotation lines have 9 values (class_id x1 y1 x2 y2 x3 y3 x4 y4). _with_seg_mask treats any line with more than 5 values as a segmentation polygon, so it returned True for OBB labels. This incorrectly set with_masks=True for OBB data and generated an (N, H, W) boolean mask array per image, which adds up to tens of GBs for high-resolution datasets.
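To make the failure mode concrete, here is a minimal sketch that assumes the check simply counts whitespace-separated tokens per line. `looks_like_segmentation` and `mask_bytes` are hypothetical names (not supervision API), and the image size and object count are illustrative, not figures from the linked issue.

```python
# Hypothetical re-creation of the "> 5 values" check described above; the
# real _with_seg_mask in supervision may differ in detail.
def looks_like_segmentation(lines: list[str]) -> bool:
    return any(len(line.split()) > 5 for line in lines)


# NumPy bool arrays store one byte per element.
BYTES_PER_BOOL = 1


def mask_bytes(num_objects: int, height: int, width: int) -> int:
    """Memory needed for one image's (N, H, W) boolean mask array."""
    return num_objects * height * width * BYTES_PER_BOOL


bbox_line = "0 0.50 0.50 0.20 0.10"  # class_id cx cy w h: 5 values
obb_line = "0 0.10 0.10 0.40 0.10 0.40 0.30 0.10 0.30"  # 9 values

print(looks_like_segmentation([bbox_line]))  # False
print(looks_like_segmentation([obb_line]))  # True: OBB mistaken for a polygon

# Illustrative numbers only: 50 objects in a 4000 x 3000 image.
per_image = mask_bytes(num_objects=50, height=3000, width=4000)
print(per_image / 1e9)  # 0.6 (GB for a single image)
print(100 * per_image / 1e9)  # 60.0 (GB if 100 such images stay in memory)
```

Because the loader returns annotations for every image at once, per-image arrays of this size accumulate across the whole dataset rather than being freed one image at a time.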

Changes

  • src/supervision/dataset/formats/yolo.py: Guard with_masks with not is_obb in load_yolo_annotations, so OBB annotations never trigger mask generation:

    # Before
    with_masks = force_masks or _with_seg_mask(lines=lines)
    # After
    with_masks = not is_obb and (force_masks or _with_seg_mask(lines=lines))
  • tests/dataset/formats/test_yolo.py: Added test_load_yolo_annotations_obb_does_not_generate_masks — creates a minimal OBB dataset on disk, loads it with is_obb=True, and asserts detection.mask is None.
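The truth table implied by the new guard can be checked in isolation. This sketch re-implements only the boolean logic; `resolve_with_masks` and `looks_like_segmentation` are hypothetical names rather than supervision API, and the sample label lines are made up.

```python
# Hypothetical stand-in for supervision's _with_seg_mask ("> 5 values" per line).
def looks_like_segmentation(lines: list[str]) -> bool:
    return any(len(line.split()) > 5 for line in lines)


def resolve_with_masks(
    lines: list[str], force_masks: bool = False, is_obb: bool = False
) -> bool:
    # Same shape as the patched expression in load_yolo_annotations:
    # with_masks = not is_obb and (force_masks or _with_seg_mask(lines=lines))
    # Because `and` short-circuits, the token scan is skipped entirely for OBB.
    return not is_obb and (force_masks or looks_like_segmentation(lines))


obb = ["0 0.10 0.10 0.40 0.10 0.40 0.30 0.10 0.30"]  # 9 values
polygon = ["0 0.10 0.10 0.40 0.10 0.40 0.30"]  # 7 values
bbox = ["0 0.50 0.50 0.20 0.10"]  # 5 values

print(resolve_with_masks(obb, is_obb=True))  # False
print(resolve_with_masks(obb, force_masks=True, is_obb=True))  # False
print(resolve_with_masks(polygon))  # True
print(resolve_with_masks(bbox))  # False
print(resolve_with_masks(bbox, force_masks=True))  # True
```

Placing `not is_obb` outside the parentheses means force_masks cannot re-enable masks for OBB data, while non-OBB behavior is unchanged.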



Copilot AI linked an issue Mar 30, 2026 that may be closed by this pull request
2 tasks
Copilot AI changed the title from "[WIP] Fix extreme memory usage loading OBB dataset" to "Fix extreme memory usage when loading OBB datasets" Mar 30, 2026
Copilot AI requested a review from Borda March 30, 2026 08:01

codecov Bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77%. Comparing base (85ba8be) to head (0a22e09).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2187   +/-   ##
=======================================
  Coverage       77%     77%           
=======================================
  Files           62      62           
  Lines         7637    7637           
=======================================
+ Hits          5903    5908    +5     
+ Misses        1734    1729    -5     

Borda and others added 3 commits March 31, 2026 02:00
- [resolve #1] RUF015: replace list(annotations.values())[0] with next(iter(annotations.values()))
- [resolve #2] E501: split assertion message to stay within 88-char limit
- [resolve #3] ruff format: committed formatted diff for test file

---
Co-authored-by: Claude Code <noreply@anthropic.com>
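The two Ruff fixes in this commit are standard Python idioms. A minimal illustration, using a made-up annotations dict (placeholder strings stand in for Detections objects) and a hypothetical assertion message:

```python
annotations = {"img_001.jpg": "detections-a", "img_002.jpg": "detections-b"}

# Before (RUF015): materializes every value just to read the first one.
first_before = list(annotations.values())[0]
# After: takes only the first value from a lazy iterator.
first_after = next(iter(annotations.values()))
print(first_before == first_after)  # True: dicts keep insertion order

# The two forms fail differently on an empty dict.
try:
    next(iter({}.values()))
except StopIteration:
    print("empty dict -> StopIteration")  # list(...)[0] would raise IndexError

# E501: a long assertion message split via implicit string concatenation.
mask = None
assert mask is None, (
    "OBB annotations should not produce masks, "
    f"but got an object of type {type(mask).__name__}"
)
```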
[resolve #6] /review finding by doc-scribe: add note to force_masks docstring
that it has no effect when is_obb=True, preventing caller confusion.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…egression

- [resolve #4] Add test asserting detection.mask is None when force_masks=True, is_obb=True
- [resolve #5] Add regression test confirming is_obb=False with polygon annotation still produces masks

---
Co-authored-by: Claude Code <noreply@anthropic.com>
@Borda Borda marked this pull request as ready for review March 31, 2026 06:05
@Borda Borda requested a review from SkalskiP as a code owner March 31, 2026 06:05
Copilot AI review requested due to automatic review settings March 31, 2026 06:05
Contributor

Copilot AI left a comment


Pull request overview

This PR prevents extreme memory usage when loading YOLO OBB datasets by ensuring OBB annotation lines (which have >5 tokens) do not trigger segmentation-mask generation in load_yolo_annotations.

Changes:

  • Disable with_masks when is_obb=True to avoid allocating per-image (N, H, W) boolean mask arrays for OBB data.
  • Add regression tests confirming OBB loading never produces masks (even with force_masks=True) and that non-OBB segmentation still produces masks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files changed:

  • src/supervision/dataset/formats/yolo.py: Prevents OBB datasets from enabling mask generation based on token count.
  • tests/dataset/formats/test_yolo.py: Adds coverage to prevent the OBB mask-memory regression and verify segmentation behavior remains intact.

Comment thread src/supervision/dataset/formats/yolo.py
Comment thread tests/dataset/formats/test_yolo.py Outdated
Borda and others added 3 commits March 31, 2026 08:13
@Borda Borda merged commit b4ae305 into develop Mar 31, 2026
27 checks passed
@Borda Borda deleted the copilot/fix-extreme-memory-obb-dataset branch March 31, 2026 08:09
Borda added a commit that referenced this pull request Mar 31, 2026
* fix: prevent mask generation for OBB annotations to avoid extreme memory usage
* docs: clarify force_masks is ignored when is_obb=True
* test: pin force_masks=True is ignored for OBB and segmentation mask regression

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Borda <6035284+Borda@users.noreply.github.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Borda Borda mentioned this pull request Apr 17, 2026

Development

Successfully merging this pull request may close these issues.

Extreme memory usage loading OBB dataset

3 participants