
Conversation

@gante (Member) commented Sep 10, 2025

What does this PR do?

🎯 part of the effort to enforce better standardization

We have been migrating past_key_values from the old tuple of tuples of tensors format to the new Cache format. However, many type hints and docstrings were not updated accordingly -- our users are getting incorrect information from these annotations 😮
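
To make the change concrete, here is a minimal sketch of the two formats (a hedged example; the `gpt2` checkpoint is just an illustration, any decoder-only model with the new cache support behaves the same way):

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = AutoTokenizer.from_pretrained("gpt2")("Hello", return_tensors="pt")

# Legacy format (what the stale annotations described): a tuple of
# `config.n_layers` tuples, each holding a key and a value tensor of shape
# (batch_size, num_heads, sequence_length, embed_size_per_head).

# New format (what this PR documents): a `Cache` instance, e.g. `DynamicCache`,
# which is both accepted as input and returned in the model output.
with torch.no_grad():
    out = model(**inputs, past_key_values=DynamicCache(), use_cache=True)
assert isinstance(out.past_key_values, DynamicCache)
```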

This PR aims to reduce incorrect information. A few notes:

  • I heavily relied on bulk changes and haven't double-checked all touched models to confirm they support `Cache`, the base class (as opposed to models like mamba). Nevertheless, even if there are a few inconsistencies, these models were previously annotated with the legacy format -- they are either models we didn't update due to low impact (and will likely deprecate soon), or their type hint was already incorrect to begin with 🤗
  • deprecated models also received bulk changes; I don't think it's worth manually reverting them 🙈
  • encoder-decoder models could have a more precise type hint and docs; I'll leave that for a future round. The updated docstring is correct for them as well.

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aria, autoformer, aya_vision, bark, bart, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, chameleon

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member) left a comment

I didn't go over all files, but I guess they're all the same. Thanks a lot for updating these; hopefully it will decrease the number of cache-related issues we get from users

  past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
-     Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
-     `(batch_size, num_heads, sequence_length, embed_size_per_head)`)
+     It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
Member

Thanks, my previous bulk-update logic was incorrect and apparently left out a lot 😆

  cross_attn_layer_head_mask (`torch.FloatTensor`): mask for cross-attention heads in a given layer of
      size `(decoder_attention_heads,)`.
- past_key_values (`Tuple(torch.FloatTensor)`): cached past key and value projection states
+ past_key_values (`Cache`): cached past key and value projection states
Member

I realized the description varies a lot from model to model. Would be nice to consolidate them all with the `auto_docstring` decorator. I like the cache docs in there; it has more details about correct usage

Member Author

Agreed!

I also noticed that we don't have perfect inheritance on some output classes (e.g. VLM output classes), which leads to redundant docstrings. I started changing them but got errors -- I decided not to do it in this PR, to keep the two different changes isolated. More auto-docstrings are something we can definitely work on.

  encoder_attention_mask: Optional[torch.Tensor] = None,
  labels: Optional[torch.Tensor] = None,
- past_key_values: Optional[list[torch.Tensor]] = None,
+ past_key_values: Optional[Cache] = None,
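
For context, a minimal sketch of what the updated annotation looks like on a forward signature (the class and the other arguments here are illustrative, not a specific model from the PR):

```python
from typing import Optional

import torch
from transformers.cache_utils import Cache


class IllustrativeDecoder(torch.nn.Module):
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        encoder_attention_mask: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        # Previously annotated as Optional[list[torch.Tensor]]; callers now
        # pass a `Cache` instance (or None) instead of raw tensors.
        past_key_values: Optional[Cache] = None,
    ):
        ...
```
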
Member

Small note on some pretrained model classes: we technically support the old cache format on them until v4.58, and then convert it to the new cache format in the base model. Though it would be painful to revert only these classes, so I guess we can keep them as is
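
(For reference, the conversion this refers to can be done with the existing `DynamicCache` helpers; a small sketch, with made-up tensor shapes:)

```python
import torch

from transformers import DynamicCache

# Legacy format: per-layer (key, value) pairs, here 2 layers with
# (batch_size=1, num_heads=4, seq_len=3, head_dim=8) tensors.
legacy = tuple(
    (torch.zeros(1, 4, 3, 8), torch.zeros(1, 4, 3, 8)) for _ in range(2)
)

cache = DynamicCache.from_legacy_cache(legacy)  # accept the old format
roundtrip = cache.to_legacy_cache()             # convert back if needed
assert torch.equal(roundtrip[0][0], legacy[0][0])
```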

Member Author

yeah, I thought it would be best to only document the non-deprecated case, to save us work 🤗

@Cyrilvallez (Member) left a comment

Thanks a lot! Very soon, we won't see anything related to those tuples anymore 😍
Merging as we still have those flaky tests!

@Cyrilvallez merged commit 93f810e into huggingface:main Sep 15, 2025
21 of 23 checks passed
@gante deleted the kv_type_docstring branch September 15, 2025 13:22
ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
…alues` (huggingface#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
…alues` (huggingface#40803)

yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
…alues` (huggingface#40803)