Optimize polygon to mask conversion #2142
Open
AlbertvanHouten wants to merge 3 commits into
…andling Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
Pull request overview
This PR targets the performance regression reported in #2140 by optimizing polygon-to-mask / polygon-to-instance-mask rasterization in the experimental converters, primarily by avoiding slow Polars List(Boolean) construction and reducing per-instance allocations.
Changes:
- Store instance-mask data as `UInt8` during Polars Series construction (then cast to `Boolean` after collection when needed) to avoid the slow `List(Boolean)` path.
- Speed up rasterization by pre-allocating a single `(N, H, W)` array and using `cv2.fillPoly` instead of `cv2.drawContours`.
- Hoist repeated dtype / normalize lookups out of per-sample loops.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/datumaro/experimental/converters/mask_converters.py` | Optimizes polygon→mask and polygon→instance-mask conversion paths (dtype handling, allocation strategy, OpenCV fill calls). |
| `tests/unit/experimental/converters/test_mask_converters.py` | Adds unit tests covering UInt8 instance-mask output and multi-sample batch behavior. |
…zation and add uint16 test Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
Problem
Issue #2140 reported a significant bottleneck in polygon-to-mask conversion. This PR resolves the slow Polars operations identified in that issue.
Root Cause
When instance masks were created with `dtype=pl.Boolean()`, each sample's flattened mask (more than 4M boolean elements for 10 instances × 640×640) was wrapped in a `pl.Series` and then collected into a nested `List(Boolean)` Series via Polars' `new_series_list`. That operation is orders of magnitude slower than the equivalent `List(UInt8)` path.
Changes in PR
Additionally, this PR publicly exposes the polygon-to-mask functions. Downstream consumers can then convert polygons to masks manually and generate them at the size they actually need. Datumaro's converters do this automatically at full image resolution, which often produces masks much larger than the model requires and significantly slows down sample iteration.
Resolves #2140