
Optimize polygon to mask conversion#2142

Open
AlbertvanHouten wants to merge 3 commits into develop from albert/polygon-mask-conversion

Conversation


@AlbertvanHouten AlbertvanHouten commented May 28, 2026

Problem
Issue #2140 reported a significant bottleneck in polygon-to-mask conversion. This PR resolves the slow Polars operations identified in that issue.

Root Cause
When creating instance masks with dtype=pl.Boolean(), each sample's flattened mask (4M+ boolean elements for 10 instances × 640×640) was wrapped in a pl.Series, then collected into a nested List(Boolean) Series via Polars' new_series_list, an operation that is orders of magnitude slower than the equivalent List(UInt8) path.

Changes in PR

  • Store masks as UInt8 internally during Polars Series construction, then cast to Boolean after collection. This avoids the catastrophically slow List(Boolean) construction path.
  • Pre-allocate a single (N, H, W) numpy array instead of creating N separate arrays and calling np.stack
  • Replace cv2.drawContours with cv2.fillPoly (faster for single-polygon fills)
  • Hoist repeated attribute lookups (polars_to_numpy_dtype, self.input_polygon.field.normalize) out of per-sample loops

Additionally, this PR makes the polygon-to-mask functions public so that downstream consumers can convert polygons to masks manually, at the size they need. Datumaro's automatic converters create masks at full image resolution, which is often much larger than the model requires and caused a significant slowdown when iterating over samples.

resolves #2140

Checklist

  • I have added tests to cover my changes or documented any manual tests.
  • I have updated the documentation accordingly

…andling

Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
@AlbertvanHouten AlbertvanHouten requested a review from a team as a code owner May 28, 2026 14:16
Copilot AI review requested due to automatic review settings May 28, 2026 14:16

Copilot AI left a comment


Pull request overview

This PR targets the performance regression reported in #2140 by optimizing polygon-to-mask / polygon-to-instance-mask rasterization in the experimental converters, primarily by avoiding slow Polars List(Boolean) construction and reducing per-instance allocations.

Changes:

  • Store instance-mask data as UInt8 during Polars Series construction (then cast to Boolean after collection when needed) to avoid the slow List(Boolean) path.
  • Speed up rasterization by pre-allocating a single (N, H, W) array and using cv2.fillPoly instead of cv2.drawContours.
  • Hoist repeated dtype / normalize lookups out of per-sample loops.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/datumaro/experimental/converters/mask_converters.py | Optimizes polygon→mask and polygon→instance-mask conversion paths (dtype handling, allocation strategy, OpenCV fill calls). |
| tests/unit/experimental/converters/test_mask_converters.py | Adds unit tests covering UInt8 instance-mask output and multi-sample batch behavior. |


Comment thread src/datumaro/experimental/converters/mask_converters.py Outdated

codecov-commenter commented May 28, 2026


Codecov Report

❌ Patch coverage is 93.75000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...atumaro/experimental/converters/mask_converters.py | 93.75% | 2 Missing and 1 partial ⚠️ |


…zation and add uint16 test

Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
@AlbertvanHouten AlbertvanHouten marked this pull request as draft May 28, 2026 18:17
Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
@AlbertvanHouten AlbertvanHouten marked this pull request as ready for review May 29, 2026 12:07
@AlbertvanHouten AlbertvanHouten requested a review from Copilot May 29, 2026 12:07

This comment was marked as off-topic.

@AlbertvanHouten AlbertvanHouten requested a review from Copilot May 29, 2026 12:11

This comment was marked as off-topic.



Development

Successfully merging this pull request may close these issues.

7.7x performance regression in experimental API polygon-to-mask conversion

3 participants