feat: allow output_hidden_states and output_attentions to record outputs of specific layers #43213
cloudhan wants to merge 2 commits into huggingface:main
Conversation
Hmm - this adds some code complexity that may complicate maintenance for a relatively niche feature. Probably the easiest way to extract these without huge memory use is just to write a function that extracts the layers you want and then compile it? Torch should be able to correctly drop the unwanted tensors and avoid OOM in that scenario.
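The alternative suggested above could be sketched roughly as follows. This is illustrative only, not the transformers API: `make_layer_extractor` and the toy forward function are hypothetical names, and the `torch.compile` wrapping is noted in a comment since the selection logic itself is plain Python.

```python
# Hedged sketch of the reviewer's suggestion: wrap the forward pass in a small
# function that indexes only the wanted layers, so the remaining attention
# tensors become dead values that the compiler / garbage collector can drop.
# All names here are illustrative, not the actual transformers API.
def make_layer_extractor(forward_fn, wanted_layers):
    def extract(inputs):
        # forward_fn is assumed to return one attention map per layer
        per_layer_attentions = forward_fn(inputs)
        return [per_layer_attentions[i] for i in wanted_layers]
    # In practice one would return torch.compile(extract) here, so that
    # unreferenced intermediate tensors are not kept alive.
    return extract

# Toy forward: four "layers", each producing a distinct "attention map".
toy_forward = lambda x: [x * (i + 1) for i in range(4)]
extractor = make_layer_extractor(toy_forward, wanted_layers=[2])
```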
Might be useful for certain use-cases, as in #33698, I guess. It depends on how much memory is being wasted on VLMs/diffusers; it's a common pattern for them to output hidden states.
I guess, but I think we should try to avoid adding features that need updates across all modeling code unless they're really important. Plus I think the
Yeah, we definitely don't want to update all models. Anyway, I just wanted to link related issues to have a full picture :)
@Rocketknight1 this is actually a common pattern for this feature: see transformers/src/transformers/models/qwen3/modeling_qwen3.py, lines 358 to 361 at b40b7a7, which is then handled by generic.py.
Unless you want to support all models, you don't need to update all the modeling code. Only some legacy models still use a per-model pattern, e.g. transformers/src/transformers/models/gpt_neo/modeling_gpt_neo.py, lines 449 to 452 at b40b7a7.
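The per-layer accumulation pattern being referenced looks roughly like this. It is a simplified, illustrative reconstruction of the common decoder-loop style, not code copied from the library, and `run_layers` is a hypothetical name.

```python
# Simplified sketch of the common decoder-loop pattern: optional outputs are
# accumulated layer by layer into tuples. Variable names mirror the
# transformers style, but the code is illustrative, not the actual library.
def run_layers(hidden, layers, output_attentions=False, output_hidden_states=False):
    all_hidden_states = () if output_hidden_states else None
    all_attentions = () if output_attentions else None
    for layer in layers:
        if output_hidden_states:
            # record the hidden state going *into* this layer
            all_hidden_states = all_hidden_states + (hidden,)
        hidden, attn = layer(hidden)
        if output_attentions:
            all_attentions = all_attentions + (attn,)
    if output_hidden_states:
        # record the final hidden state as well
        all_hidden_states = all_hidden_states + (hidden,)
    return hidden, all_hidden_states, all_attentions
```

A per-layer selection feature would hook in exactly at these accumulation points, which is why a generic implementation can cover most models at once.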
Good point, I forgot we'd factored that out into
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43213&sha=25a8fa
What does this PR do?
This PR enables the model forward pass to record optional outputs only at specified layers. This is particularly useful for large models with long contexts, for example when exploring attention maps to design sparse attention. Without it, even a moderately sized model (say 7B) can easily OOM at a context length of only around 1k.
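The layer-selection behavior this PR describes could be implemented along these lines. This is a hedged sketch only; the PR's actual semantics may differ, and `filter_layer_outputs` is a hypothetical helper, not part of the diff.

```python
# Hypothetical helper illustrating the described behavior: given per-layer
# outputs and a selector, keep only the requested layers and set the rest to
# None so their memory can be freed.
def filter_layer_outputs(per_layer_outputs, wanted):
    # wanted=True keeps everything (the current transformers behavior);
    # a list/set of indices keeps only those layers; anything falsy keeps none.
    if wanted is True:
        return tuple(per_layer_outputs)
    if not wanted:
        return None
    keep = set(wanted)
    return tuple(out if i in keep else None for i, out in enumerate(per_layer_outputs))
```

Keeping the container the same length while replacing unselected entries with None preserves layer indexing for downstream code.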
Now it only keeps outputs.attentions[10]; the outputs of all other layers are set to None to save memory.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.