
fix: pop output_* flags from kwargs in capture_outputs to prevent submodule leakage#44922

Closed
s-zx wants to merge 1 commit into huggingface:main from s-zx:fix/capture-outputs-pop-output-flags-v2

Conversation

@s-zx s-zx commented Mar 22, 2026

What does this PR do?

Fixes #44849.

When output_hidden_states=True (or output_attentions=True) is passed to model.generate(), the @capture_outputs decorator reads the flag values but leaves them in **kwargs. These flags then propagate through **kwargs chains deep into sub-models, specifically into vision encoder blocks and attention functions that don't expect them.

For Qwen3.5 (and the Qwen VL family) this causes garbled generation when output_hidden_states=True is set: the flag reaches Qwen3_5VisionBlock.attn via Qwen3_5Model.get_image_features(**kwargs) → self.visual(**kwargs) → blk(**kwargs) → self.attn(**kwargs), corrupting intermediate attention tensors and causing the model to generate repetitive image-pad tokens instead of meaningful text.
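To make the leak concrete, here is a minimal self-contained sketch (hypothetical module names, not the actual Qwen code) of a flag riding a **kwargs chain into a submodule that reacts to it:

import torch
from torch import nn

class Block(nn.Module):
    # Stand-in for a vision block whose attention never asked for output_* flags.
    def forward(self, x, **kwargs):
        if kwargs.get("output_hidden_states"):
            # A module that interprets the stray flag can change its return
            # contract or compute something the caller never wanted.
            return x, x.clone()
        return x

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.blk = Block()

    def forward(self, x, **kwargs):
        # The flag leaks one level deeper with every **kwargs hand-off.
        return self.blk(x, **kwargs)

out = Encoder()(torch.ones(2, 4), output_hidden_states=True)
print(type(out))  # tuple instead of Tensor: the stray flag changed the contract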

Root cause

In capture_outputs (in output_capturing.py), the decorator uses kwargs.get(...) to read the output flags but does not remove them from kwargs. The underlying func(self, *args, **kwargs) call therefore still sees output_hidden_states=True, which then leaks into every submodule called with **kwargs.
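Schematically, the buggy shape looks like this (a simplified sketch, not the literal output_capturing.py source):

def capture_outputs(func):
    def wrapper(self, *args, **kwargs):
        # kwargs.get reads the flag but leaves it in kwargs...
        output_hidden_states = kwargs.get("output_hidden_states", False)
        # ...so the wrapped forward, and every submodule it later calls
        # with **kwargs, still sees output_hidden_states=True.
        return func(self, *args, **kwargs)
    return wrapper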

Fix

After reading the values for all capturable flags, immediately pop them from kwargs:

for k in capturable_flags:
    kwargs.pop(f"output_{k}", None)
# cross-attentions and mask-decoder attentions are requested through the
# shared output_attentions kwarg, so pop that one as well
if "cross_attentions" in capturable_flags or "mask_decoder_attentions" in capturable_flags:
    kwargs.pop("output_attentions", None)

Since @capture_outputs already captures the requested outputs through forward hooks, the underlying forward function (and all modules it calls) does not need to receive these flags. This pop has no effect on output correctness but prevents any downstream damage.
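For reference, a minimal sketch of the corrected shape (capturable_flags and the hook plumbing here are assumptions standing in for the real output_capturing.py internals):

import functools

def capture_outputs(func):
    capturable_flags = ("hidden_states", "attentions")  # assumed set

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Read the requested flags first...
        requested = {k: kwargs.get(f"output_{k}", False) for k in capturable_flags}
        # ...then pop them so they cannot leak into submodules via **kwargs.
        for k in capturable_flags:
            kwargs.pop(f"output_{k}", None)
        # Forward hooks, registered elsewhere based on `requested`, collect
        # the outputs; the wrapped forward never needs the flags itself.
        return func(self, *args, **kwargs)

    return wrapper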

The fix applies to all models using @capture_outputs, not just Qwen3.5.

fix: pop output_* flags from kwargs in capture_outputs to prevent submodule leakage

When output_hidden_states=True (or output_attentions=True) is passed to
generate(), the capture_outputs decorator reads the flags but leaves them
in **kwargs.  These flags then propagate via **kwargs chains deep into
sub-models: vision encoder blocks, attention functions, etc.  Modules that
don't expect these flags may change their return type or produce incorrect
outputs.

For Qwen3.5 (and the whole Qwen VL family) this causes garbled generation
when output_hidden_states=True is set, because the flag reaches the vision
block attention and corrupts intermediate tensors.

Since capture_outputs already captures the requested outputs through
forward hooks, the underlying forward function does not need to receive
the output_* flags.  Pop them from kwargs right after reading their values,
which has no effect on the hook-based capture but prevents any downstream
damage.

Fixes: transformers#44849
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44922&sha=5193b8


Development

Successfully merging this pull request may close these issues.

Transformers Qwen3.5 had a bug when set output_hidden_states=True

3 participants