fix: enable single-scale input support in SAM3 mask decoder #45990

Closed
TheShreyanshiDwivedi wants to merge 5 commits into huggingface:main from TheShreyanshiDwivedi:fresh/sam3-single-scale-support

Conversation

@TheShreyanshiDwivedi

What does this PR do?

SAM3 and SAM3-Lite mask decoders rejected single-scale feature maps unconditionally. This fixes the forward path to accept single-scale inputs.

Root cause: Sam3MaskDecoder assumed multi-scale input and had no fallback path for single-scale feature maps. The same issue propagated to the Sam3LiteText mask decoder.

Changes

  • Enable single-scale input path in Sam3MaskDecoder
  • Propagate single-scale support to Sam3LiteText mask decoder
  • Fix docstring in Sam3LiteTextMaskDecoder to match corrected output shape

Files changed

  • src/transformers/models/sam3/modeling_sam3.py
  • src/transformers/models/sam3_lite_text/modeling_sam3_lite_text.py

Srijan Upadhyay and others added 5 commits May 15, 2026 15:17
Allow Sam3MaskDecoder.forward to receive a single tensor for
backbone_features in addition to the standard multi-scale list.
When a bare tensor is passed it is wrapped in a list internally
and a UserWarning is emitted so callers are aware they are using
the single-scale fallback path.

The existing _embed_pixels method and multi-scale FPN path are
unchanged — a single-element list already works correctly through
_embed_pixels since it operates on the last element of the list.
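
The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `normalize_backbone_features` and the warning text are assumptions, and plain Python objects stand in for tensors so the snippet runs without torch.

```python
import warnings
from typing import Any, List, Sequence, Union


def normalize_backbone_features(
    backbone_features: Union[Any, Sequence[Any]],
) -> List[Any]:
    """Hypothetical helper: always return a list of feature maps."""
    if isinstance(backbone_features, (list, tuple)):
        # Multi-scale path: already a sequence of feature maps, pass it through.
        return list(backbone_features)
    # Single-scale fallback: wrap the lone feature map and tell the caller.
    warnings.warn(
        "Received a single feature map; using the single-scale fallback path.",
        UserWarning,
        stacklevel=2,
    )
    return [backbone_features]
```

Because the wrapped input is a one-element list, any downstream code that reads the last element of the list sees the same feature map the caller passed in.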

Adds test_mask_decoder_single_scale_input to verify:
- single tensor input produces valid output and emits UserWarning
- multi-scale list input continues to work unchanged
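
The two checks listed above might look something like the sketch below. It is only an approximation under stated assumptions: `ToyMaskDecoder` is a hypothetical stand-in, not the real `Sam3MaskDecoder`, and strings stand in for feature tensors so the test runs with the standard library alone.

```python
import unittest
import warnings


class ToyMaskDecoder:
    """Hypothetical stand-in for the decoder; not the real Sam3MaskDecoder."""

    def forward(self, backbone_features):
        if not isinstance(backbone_features, (list, tuple)):
            warnings.warn(
                "single-scale fallback path in use", UserWarning, stacklevel=2
            )
            backbone_features = [backbone_features]
        # Assumption taken from the commit message: only the last element
        # of the list is consumed by the pixel-embedding step.
        return backbone_features[-1]


class SingleScaleInputTest(unittest.TestCase):
    def test_single_tensor_warns(self):
        # A bare feature map must still produce output and emit UserWarning.
        with self.assertWarns(UserWarning):
            out = ToyMaskDecoder().forward("feat")
        self.assertEqual(out, "feat")

    def test_multi_scale_list_is_silent(self):
        # Turn warnings into errors so an unexpected warning fails the test.
        with warnings.catch_warnings():
            warnings.simplefilter("error")
            out = ToyMaskDecoder().forward(["hi_res", "lo_res"])
        self.assertEqual(out, "lo_res")
```

Running the multi-scale case with warnings promoted to errors is one way to assert that the existing path stays silent, rather than only checking its output.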

Fixes huggingface#43043
Sam3LiteTextMaskDecoder inherits from Sam3MaskDecoder via the modular
conversion system. Apply the same single-tensor normalization introduced
in the parent class so the generated file stays consistent with what
check_modular_conversion expects.

Fixes check_repository_consistency CI failure caused by stale generated
modeling_sam3_lite_text.py after the parent class was updated.
…tput

Align docstring wording with what check_modular_conversion generates from
the parent Sam3MaskDecoder class, resolving the remaining consistency check
failure.
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: sam3, sam3_lite_text

@Rocketknight1
Member

Blocking for opening like 5 random Claude PRs


3 participants