
fix: pop output_* flags from kwargs in capture_outputs to prevent submodule leakage#44922

Closed
s-zx wants to merge 1 commit into huggingface:main from s-zx:fix/capture-outputs-pop-output-flags-v2

Conversation

@s-zx s-zx commented Mar 22, 2026

What does this PR do?

Fixes #44849.

When output_hidden_states=True (or output_attentions=True) is passed to model.generate(), the @capture_outputs decorator reads the flag values but leaves them in **kwargs. These flags then propagate through **kwargs chains deep into sub-models, specifically into vision encoder blocks and attention functions that don't expect them.

For Qwen3.5 (and the Qwen VL family) this causes garbled generation when output_hidden_states=True is set: the flag reaches Qwen3_5VisionBlock.attn via Qwen3_5Model.get_image_features(**kwargs) → self.visual(**kwargs) → blk(**kwargs) → self.attn(**kwargs), corrupting intermediate attention tensors and causing the model to generate repetitive image-pad tokens instead of meaningful text.
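To make the leak concrete, here is a minimal self-contained sketch (hypothetical module names, not the actual Qwen code) of a flag riding a **kwargs chain into a submodule that reacts to it:

import torch
from torch import nn

class Block(nn.Module):
    # Stand-in for a vision block whose attention never asked for output_* flags.
    def forward(self, x, **kwargs):
        if kwargs.get("output_hidden_states"):
            # A module that interprets the stray flag can change its return
            # contract or compute something the caller never wanted.
            return x, x.clone()
        return x

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.blk = Block()

    def forward(self, x, **kwargs):
        # The flag leaks one level deeper with every **kwargs hand-off.
        return self.blk(x, **kwargs)

out = Encoder()(torch.ones(2, 4), output_hidden_states=True)
print(type(out))  # tuple instead of Tensor: the stray flag changed the contract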

Root cause

In capture_outputs (in output_capturing.py), the decorator uses kwargs.get(...) to read the output flags but does not remove them from kwargs. The underlying func(self, *args, **kwargs) call therefore still sees output_hidden_states=True, which then leaks into every submodule called with **kwargs.
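Schematically, the buggy shape looks like this (a simplified sketch, not the literal output_capturing.py source):

def capture_outputs(func):
    def wrapper(self, *args, **kwargs):
        # kwargs.get reads the flag but leaves it in kwargs...
        output_hidden_states = kwargs.get("output_hidden_states", False)
        # ...so the wrapped forward, and every submodule it later calls
        # with **kwargs, still sees output_hidden_states=True.
        return func(self, *args, **kwargs)
    return wrapper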

Fix

After reading the values for all capturable flags, immediately pop them from kwargs:

for k in capturable_flags:
    kwargs.pop(f"output_{k}", None)
# cross-attentions and mask-decoder attentions are requested through the
# shared output_attentions kwarg, so pop that one as well
if "cross_attentions" in capturable_flags or "mask_decoder_attentions" in capturable_flags:
    kwargs.pop("output_attentions", None)

Since @capture_outputs already captures the requested outputs through forward hooks, the underlying forward function (and all modules it calls) does not need to receive these flags. This pop has no effect on output correctness but prevents any downstream damage.
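For reference, a minimal sketch of the corrected shape (capturable_flags and the hook plumbing here are assumptions standing in for the real output_capturing.py internals):

import functools

def capture_outputs(func):
    capturable_flags = ("hidden_states", "attentions")  # assumed set

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Read the requested flags first...
        requested = {k: kwargs.get(f"output_{k}", False) for k in capturable_flags}
        # ...then pop them so they cannot leak into submodules via **kwargs.
        for k in capturable_flags:
            kwargs.pop(f"output_{k}", None)
        # Forward hooks, registered elsewhere based on `requested`, collect
        # the outputs; the wrapped forward never needs the flags itself.
        return func(self, *args, **kwargs)

    return wrapper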

The fix applies to all models using @capture_outputs, not just Qwen3.5.

fix: pop output_* flags from kwargs in capture_outputs to prevent submodule leakage

When output_hidden_states=True (or output_attentions=True) is passed to
generate(), the capture_outputs decorator reads the flags but leaves them
in **kwargs.  These flags then propagate via **kwargs chains deep into
sub-models: vision encoder blocks, attention functions, etc.  Modules that
don't expect these flags may change their return type or produce incorrect
outputs.

For Qwen3.5 (and the whole Qwen VL family) this causes garbled generation
when output_hidden_states=True is set, because the flag reaches the vision
block attention and corrupts intermediate tensors.

Since capture_outputs already captures the requested outputs through
forward hooks, the underlying forward function does not need to receive
the output_* flags.  Pop them from kwargs right after reading their values,
which has no effect on the hook-based capture but prevents any downstream
damage.

Fixes: transformers#44849
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44922&sha=5193b8


Development

Successfully merging this pull request may close these issues.

Transformers Qwen3.5 had a bug when set output_hidden_states=True

3 participants