fix: pop output_* flags from kwargs in capture_outputs to prevent submodule leakage (#44922)
Closed
s-zx wants to merge 1 commit into huggingface:main
Conversation
…module leakage

When output_hidden_states=True (or output_attentions=True) is passed to generate(), the capture_outputs decorator reads the flags but leaves them in **kwargs. These flags then propagate via **kwargs chains deep into sub-models: vision encoder blocks, attention functions, etc. Modules that don't expect these flags may change their return type or produce incorrect outputs.

For Qwen3.5 (and the whole Qwen VL family) this causes garbled generation when output_hidden_states=True is set, because the flag reaches the vision block attention and corrupts intermediate tensors.

Since capture_outputs already captures the requested outputs through forward hooks, the underlying forward function does not need to receive the output_* flags. Pop them from kwargs right after reading their values, which has no effect on the hook-based capture but prevents any downstream damage.

Fixes: transformers#44849
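The leakage described above can be illustrated with a toy sketch. This is plain Python, not transformers code: the function names (`generate`, `vision_block`, `attention`) only mimic the call chain, to show how a flag left in `**kwargs` travels unchanged into a leaf function that was never meant to receive it and silently changes its return type.

```python
def attention(x, output_attentions=False, **kwargs):
    # Leaf "module": its return type depends on a flag the top-level
    # caller never intended to send this far down.
    weights = [v * 2 for v in x]
    if output_attentions:
        return weights, "attn-weights"  # tuple instead of plain list
    return weights

def vision_block(x, **kwargs):
    # Intermediate layer forwards **kwargs verbatim: the flag leaks through.
    return attention(x, **kwargs)

def generate(x, **kwargs):
    # Top-level entry point: the caller sets output_attentions=True for
    # the outer model only, but **kwargs carries it all the way down.
    return vision_block(x, **kwargs)

print(type(generate([1, 2], output_attentions=True)))  # → <class 'tuple'>
```

Popping the flag at the top level, before forwarding `**kwargs`, is what prevents the leaf from ever seeing it.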
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44922&sha=5193b8
What does this PR do?
Fixes #44849.
When `output_hidden_states=True` (or `output_attentions=True`) is passed to `model.generate()`, the `@capture_outputs` decorator reads the flag value but leaves it in `**kwargs`. These flags then propagate through `**kwargs` chains deep into sub-models, specifically into vision encoder blocks and attention functions that don't expect them.

For Qwen3.5 (and the Qwen VL family) this causes garbled generation when `output_hidden_states=True` is set: the flag reaches `Qwen3_5VisionBlock.attn` via `Qwen3_5Model.get_image_features(**kwargs)` → `self.visual(**kwargs)` → `blk(**kwargs)` → `self.attn(**kwargs)`, corrupting intermediate attention tensors and causing the model to generate repetitive image-pad tokens instead of meaningful text.

Root cause
In `capture_outputs` (in `output_capturing.py`), the decorator uses `kwargs.get(...)` to read the output flags, but it does not remove them from `kwargs`. The underlying `func(self, *args, **kwargs)` call therefore still sees `output_hidden_states=True`, which then leaks into every submodule called with `**kwargs`.

Fix
After reading the values for all capturable flags, immediately pop them from `kwargs`. Since `@capture_outputs` already captures the requested outputs through forward hooks, the underlying forward function (and all modules it calls) does not need to receive these flags. Popping them has no effect on output correctness but prevents any downstream damage.

The fix applies to all models using `@capture_outputs`, not just Qwen3.5.
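A minimal sketch of the read-then-pop pattern described above. This is an assumption-laden illustration, not the actual transformers implementation: the real `capture_outputs` lives in `output_capturing.py`, registers forward hooks, and has a different signature; here the flag names and the `(output, flags)` return shape are invented purely to make the pop observable.

```python
import functools

# Hypothetical flag list; the real decorator derives these differently.
CAPTURABLE_FLAGS = ("output_hidden_states", "output_attentions")

def capture_outputs(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Read the flags for hook-based capture...
        flags = {name: kwargs.get(name, False) for name in CAPTURABLE_FLAGS}
        # ...then pop them so they never reach func, nor any submodule
        # that func calls with **kwargs further down.
        for name in CAPTURABLE_FLAGS:
            kwargs.pop(name, None)
        # (hook registration driven by `flags` would happen here)
        out = func(self, *args, **kwargs)
        return out, flags  # toy return shape, for demonstration only
    return wrapper

class Model:
    @capture_outputs
    def forward(self, x, **kwargs):
        # With the pop in place, the flag never reaches the wrapped forward.
        assert "output_hidden_states" not in kwargs
        return x

out, flags = Model().forward([1], output_hidden_states=True)
```

The key property is that `kwargs.pop(name, None)` is a no-op when the caller never passed the flag, so default calls are unaffected; only the leakage path changes.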