
Conversation

@gante gante (Member) commented Sep 9, 2025

What does this PR do?

(see title)

Fixes #40644
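
For context, a minimal reproduction along the lines of the linked issue (the checkpoint name is illustrative); before this fix, the generate call below could fail with "list index out of range" while initializing the cache:

from transformers import AutoTokenizer, BlenderbotForConditionalGeneration

model_id = "facebook/blenderbot-400M-distill"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BlenderbotForConditionalGeneration.from_pretrained(model_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# generate() builds the encoder-decoder cache; initializing it from the encoder
# config could give it the wrong number of layers and raise an IndexError
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))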

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante changed the title [generate] Use the right config to init encoder cache [generate] Always use decoder config to init encoder cache Sep 9, 2025
@gante gante changed the title [generate] Always use decoder config to init encoder cache [generate] Always use decoder config to init cache Sep 9, 2025
  if requires_cross_attention_cache:
      cross_attention_cache_kwargs = {
-         "config": self.config.get_text_config(encoder=True),
+         "config": self.config.get_text_config(decoder=True),
@gante gante (Member, Author) commented Sep 9, 2025

🙈 🙈 🙈 past self is derp

Member commented:

Sorry? This one should not be changed, no?

@gante gante (Member, Author) commented Sep 11, 2025

No, we want to use the decoder config here! I had the exact same thought [use the encoder config] in a recent PR :D

In a nutshell, the config is used here to:

  1. determine which types of layers are used for cross attention
  2. determine how many cross-attention layers we have

Cross attention is a layer in the 👉 decoder 👈 model: it is the attention between the encoder outputs and the data being decoded (the tokens, in an LLM).
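
For illustration, a minimal sketch of the idea (not the exact code in this PR; the checkpoint name is only an example):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/blenderbot-400M-distill")

# Cross attention lives in the decoder, so the cache holding its key/value
# states must be shaped by the decoder's text config, not the encoder's
decoder_config = config.get_text_config(decoder=True)
cross_attention_cache_kwargs = {"config": decoder_config}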

Member commented:

Oh indeed! Easy to get confused by the name of the cache (EncoderDecoderCache) haha!

@gante gante requested a review from Cyrilvallez September 9, 2025 16:24
  # If a config is passed, use it to infer the layer types and initialize accordingly
  if config is not None:
-     config = config.get_text_config()
+     config = config.get_text_config(decoder=True)
Member commented:

Could that somehow clash if we pass an encoder config to it elsewhere, i.e. for the encoder-decoder cache?

@gante gante (Member, Author) commented:

We always want the decoder config for KV-cache purposes.

The encoder runs only once to produce the encoder outputs; the decoder then consumes them autoregressively through cross attention. Both the self-attention and the cross-attention layers live in the decoder, and are therefore parameterized by the decoder config.
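
As a rough sketch (assuming the cache constructors in this version accept a config, as the diff above suggests; the checkpoint name is again just an example), both halves of the EncoderDecoderCache end up parameterized by the decoder config:

from transformers import AutoConfig, DynamicCache, EncoderDecoderCache

config = AutoConfig.from_pretrained("facebook/blenderbot-400M-distill")
decoder_config = config.get_text_config(decoder=True)

# Both sub-caches store key/value states produced inside decoder layers,
# so both take their layer count and layer types from the decoder config
self_attention_cache = DynamicCache(config=decoder_config)
cross_attention_cache = DynamicCache(config=decoder_config)
cache = EncoderDecoderCache(self_attention_cache, cross_attention_cache)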

@Cyrilvallez Cyrilvallez (Member) left a comment

LGTM!! Indeed, thanks for the fix and the added explanation!
Merging it, since the failing test is unrelated and this is ready to go!

@Cyrilvallez Cyrilvallez merged commit 6eb3255 into huggingface:main Sep 12, 2025
21 of 23 checks passed
@gante gante deleted the use_right_config branch September 12, 2025 16:39
ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
Development

Successfully merging this pull request may close these issues.

BlenderbotForConditionalGeneration errors out with list index out of range
3 participants