
feat: allow output_hidden_states and output_attentions to record outputs of specific layers #43213

Open
cloudhan wants to merge 2 commits into huggingface:main from cloudhan:record-specified-layers-only

Conversation

@cloudhan
Contributor

@cloudhan cloudhan commented Jan 10, 2026

What does this PR do?

This PR enables the model forward to record optional outputs at specified layers only. This is particularly useful for large models with long contexts when exploring the aesthetics of attention maps to design sparse attention. Without it, even a moderately sized model (say 7B) can easily OOM with a context of only ~1k tokens.

outputs = model.forward(input_ids, output_hidden_states=10, output_attentions=[10])

Now only outputs.attentions[10] is kept; the outputs of all other layers are set to None to save memory.
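The proposed semantics can be sketched in plain Python (a toy illustration of the filtering behavior, not the actual transformers implementation; `normalize_layer_spec` and `filter_layer_outputs` are hypothetical names):

```python
def normalize_layer_spec(spec, num_layers):
    # Hypothetical helper: accept True/False, a single layer index,
    # or a list of indices, mirroring the PR's flexible argument.
    if spec is True:
        return set(range(num_layers))
    if spec is False or spec is None:
        return set()
    if isinstance(spec, int):
        return {spec}
    return set(spec)

def filter_layer_outputs(per_layer_outputs, spec):
    # Keep only the requested layers; replace the rest with None so
    # their tensors can be freed instead of accumulating in memory.
    keep = normalize_layer_spec(spec, len(per_layer_outputs))
    return tuple(out if i in keep else None
                 for i, out in enumerate(per_layer_outputs))

# Toy per-layer "attention maps" for a 4-layer model.
attentions = ("attn0", "attn1", "attn2", "attn3")
print(filter_layer_outputs(attentions, [2]))
# (None, None, 'attn2', None)
```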

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@cloudhan cloudhan force-pushed the record-specified-layers-only branch from 678fc00 to 9f9956c on January 10, 2026 at 15:17
@Rocketknight1
Member

Hmmn - this adds some code complexity that may complicate maintenance for a relatively niche feature. Probably the easiest way to extract these without huge memory is just to make a function that extracts the layers you want and then compile it? Torch should be able to correctly drop the unwanted tensors and avoid OOM in that scenario.
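The compile-based alternative suggested here might look like the following sketch. The toy forward, the shapes, and `backend="eager"` are illustrative assumptions; whether the unwanted per-layer tensors are actually dropped depends on the compiler backend's dead-code elimination, not on this wrapper alone.

```python
import torch

def toy_forward(x):
    # Stand-in for a model forward that produces one attention-like
    # tensor per layer (4 toy "layers" here).
    return tuple(torch.softmax(x + i, dim=-1) for i in range(4))

def extract_layer_2(x):
    # Only layer 2's output escapes this function; under a compiling
    # backend, the other layers' tensors are candidates for dead-code
    # elimination, so they need not stay resident in memory.
    return toy_forward(x)[2]

extract = torch.compile(extract_layer_2, backend="eager")
attn2 = extract(torch.randn(1, 8, 8))
print(attn2.shape)  # torch.Size([1, 8, 8])
```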

@zucchini-nlp
Member

Might be useful for certain use cases, as in #33698, I guess. It depends on how much memory is being wasted on VLMs/diffusers; it's a common pattern for them to output hidden states.

@Rocketknight1
Member

I guess, but I think we should try to avoid adding features that need updates across all modeling code unless they're really important. Plus I think the compile() solution should work fine!

@zucchini-nlp
Member

Yeah, we definitely don't want to update all models. Anyway, I just wanted to link the related issues to have a full picture :)

@cloudhan
Contributor Author

@Rocketknight1 this is actually a common pattern for this feature:

_can_record_outputs = {
    "hidden_states": Qwen3DecoderLayer,
    "attentions": Qwen3Attention,
}

and it is then handled by the generic code in generic.py.

Unless you want to support all models, you don't need to update all modeling code. Only some legacy models still use the per-model pattern:

output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
    output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)

@Rocketknight1
Member

Good point, I forgot we'd factored that out into _can_record_outputs for a lot of models. If you want to attempt a PR to update the generic code there to allow specific layer selection, I guess we could consider it! No guarantees, though; I'd need to talk with the rest of the team about whether the extra complexity is worth it.

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43213&sha=25a8fa
