feat: allow output_hidden_states and output_attentions to record outputs of specific layers #43213
cloudhan wants to merge 2 commits into huggingface:main
Conversation
Hmm - this adds some code complexity that may complicate maintenance for a relatively niche feature. Probably the easiest way to extract these without huge memory use is just to write a function that extracts the layers you want and then compile it? Torch should be able to correctly drop the unwanted tensors and avoid OOM in that scenario.
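The alternative suggested above could be sketched roughly as follows. This is illustrative only, not the transformers API: `make_layer_extractor` and the toy forward function are hypothetical names, and the `torch.compile` wrapping is noted in a comment since the selection logic itself is plain Python.

```python
# Hedged sketch of the reviewer's suggestion: wrap the forward pass in a small
# function that indexes only the wanted layers, so the remaining attention
# tensors become dead values that the compiler / garbage collector can drop.
# All names here are illustrative, not the actual transformers API.
def make_layer_extractor(forward_fn, wanted_layers):
    def extract(inputs):
        # forward_fn is assumed to return one attention map per layer
        per_layer_attentions = forward_fn(inputs)
        return [per_layer_attentions[i] for i in wanted_layers]
    # In practice one would return torch.compile(extract) here, so that
    # unreferenced intermediate tensors are not kept alive.
    return extract

# Toy forward: four "layers", each producing a distinct "attention map".
toy_forward = lambda x: [x * (i + 1) for i in range(4)]
extractor = make_layer_extractor(toy_forward, wanted_layers=[2])
```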
Might be useful for certain use-cases, as in #33698, I guess. It depends on how much memory is being wasted on VLMs/diffusers; it's a common pattern for them to output hidden states.
I guess, but I think we should try to avoid adding features that need updates across all modeling code unless they're really important. Plus I think the
Yeah, we definitely don't want to update all models. Anyway, I just wanted to link related issues to have a full picture :)
@Rocketknight1 this is actually a common pattern for this feature: see transformers/src/transformers/models/qwen3/modeling_qwen3.py, lines 358 to 361 at b40b7a7, which is then handled by generic.py.
Unless you want to support all models, you don't need to update all the modeling code. Only some legacy models still use a per-model pattern, e.g. transformers/src/transformers/models/gpt_neo/modeling_gpt_neo.py, lines 449 to 452 at b40b7a7.
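The per-layer accumulation pattern being referenced looks roughly like this. It is a simplified, illustrative reconstruction of the common decoder-loop style, not code copied from the library, and `run_layers` is a hypothetical name.

```python
# Simplified sketch of the common decoder-loop pattern: optional outputs are
# accumulated layer by layer into tuples. Variable names mirror the
# transformers style, but the code is illustrative, not the actual library.
def run_layers(hidden, layers, output_attentions=False, output_hidden_states=False):
    all_hidden_states = () if output_hidden_states else None
    all_attentions = () if output_attentions else None
    for layer in layers:
        if output_hidden_states:
            # record the hidden state going *into* this layer
            all_hidden_states = all_hidden_states + (hidden,)
        hidden, attn = layer(hidden)
        if output_attentions:
            all_attentions = all_attentions + (attn,)
    if output_hidden_states:
        # record the final hidden state as well
        all_hidden_states = all_hidden_states + (hidden,)
    return hidden, all_hidden_states, all_attentions
```

A per-layer selection feature would hook in exactly at these accumulation points, which is why a generic implementation can cover most models at once.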
Good point, I forgot we'd factored that out into
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43213&sha=25a8fa
What does this PR do?
This PR enables the model forward pass to record optional outputs only at specified layers. This is particularly useful for large models with long contexts, for example when exploring attention maps to design sparse attention. Without it, even a moderately sized model (say 7B) can easily OOM at a context length of only around 1k.
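The layer-selection behavior this PR describes could be implemented along these lines. This is a hedged sketch only; the PR's actual semantics may differ, and `filter_layer_outputs` is a hypothetical helper, not part of the diff.

```python
# Hypothetical helper illustrating the described behavior: given per-layer
# outputs and a selector, keep only the requested layers and set the rest to
# None so their memory can be freed.
def filter_layer_outputs(per_layer_outputs, wanted):
    # wanted=True keeps everything (the current transformers behavior);
    # a list/set of indices keeps only those layers; anything falsy keeps none.
    if wanted is True:
        return tuple(per_layer_outputs)
    if not wanted:
        return None
    keep = set(wanted)
    return tuple(out if i in keep else None for i, out in enumerate(per_layer_outputs))
```

Keeping the container the same length while replacing unselected entries with None preserves layer indexing for downstream code.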
Now it only keeps outputs.attentions[10]; the outputs of all other layers are set to None to save memory.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.