[docstrings / type hints] Update outdated annotations for past_key_values
#40803
Conversation
[For maintainers] Suggested jobs to run (before merge) run-slow: aria, autoformer, aya_vision, bark, bart, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bridgetower, camembert, chameleon
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I didn't go over all files, I guess they are all the same. Thanks a lot for updating these, hopefully it will decrease the number of issues we get from users about the cache
past_key_values (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
    Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
    `(batch_size, num_heads, sequence_length, embed_size_per_head)`)
    It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
Thanks, my prev bulk update logic was incorrect and left out a lot apparently 😆
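For reference, a minimal sketch of what the updated docstring describes: `past_key_values` is now expected to be a `Cache` instance such as `DynamicCache`. The model name and prompt below are placeholders, not taken from this PR.

```python
# Minimal sketch: pass a Cache instance (DynamicCache) instead of the legacy
# tuple of tuples. "gpt2" and the prompt are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello there", return_tensors="pt")
cache = DynamicCache()  # new-format cache object

with torch.no_grad():
    outputs = model(**inputs, past_key_values=cache, use_cache=True)

# On recent versions the returned past_key_values is the Cache object,
# updated with the new key/value states, not a tuple of tuples.
print(type(outputs.past_key_values))
```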
cross_attn_layer_head_mask (`torch.FloatTensor`): mask for cross-attention heads in a given layer of
    size `(decoder_attention_heads,)`.
past_key_values (`Tuple(torch.FloatTensor)`): cached past key and value projection states
past_key_values (`Cache`): cached past key and value projection states
I realized the description is quite different from model to model. It would be nice to consolidate them all with the "auto-docstring" decorator. I like the cache docs in there, it has more details about correct usage
Agreed!
I also noticed that we don't have perfect inheritance on some output classes (e.g. VLM output classes), which leads to redundant docstrings. I started by changing them, but got errors -- decided not to do them in this PR, to keep the two different changes isolated. More auto-docstrings is something we can definitely work on.
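To illustrate the consolidation idea only (this is a toy, not the actual auto-docstring machinery in transformers): a single shared description for `past_key_values` can be injected by a decorator, so the wording does not drift from model to model.

```python
# Toy decorator, NOT the real transformers auto-docstring machinery: it just
# shows how one shared past_key_values description avoids per-model drift.
PAST_KV_DOC = (
    "past_key_values (`Cache`, *optional*): pre-computed key/value states. "
    "See https://huggingface.co/docs/transformers/en/kv_cache for usage."
)

def add_past_kv_doc(fn):
    # Append the shared snippet to whatever docstring the function already has.
    fn.__doc__ = (fn.__doc__ or "") + "\n\n" + PAST_KV_DOC
    return fn

@add_past_kv_doc
def forward(input_ids, past_key_values=None, use_cache=None):
    """Toy forward pass."""

print(forward.__doc__)
```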
encoder_attention_mask: Optional[torch.Tensor] = None,
labels: Optional[torch.Tensor] = None,
past_key_values: Optional[list[torch.Tensor]] = None,
past_key_values: Optional[Cache] = None,
small note on some pretrained model classes: we technically support the old cache format on them until 4.58 and then convert it to the new cache format in the base model. Though it would be painful to revert only these classes, so I guess we can keep them as is
yeah, I thought it would be best to only document the non-deprecated case, to save us work 🤗
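A minimal sketch of the conversion path mentioned in this exchange: legacy tuple-of-tuples input can still be turned into the new format with `DynamicCache.from_legacy_cache`. All shapes below are made up for illustration.

```python
# Sketch of the legacy -> Cache conversion; shapes are illustrative only.
import torch
from transformers import DynamicCache

batch, heads, seq_len, head_dim, num_layers = 1, 4, 3, 8, 2
legacy = tuple(
    (
        torch.zeros(batch, heads, seq_len, head_dim),  # keys for layer i
        torch.zeros(batch, heads, seq_len, head_dim),  # values for layer i
    )
    for _ in range(num_layers)
)

cache = DynamicCache.from_legacy_cache(legacy)  # new-format object
legacy_again = cache.to_legacy_cache()          # round-trip back to tuples

print(cache.get_seq_length(), len(legacy_again))  # 3, 2
```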
Thanks a lot! Very soon, we won't see anything related to those tuples anymore 😍
Merging, as the only remaining failures are those flaky tests!
…alues` (huggingface#40803) * some fixes * nits * indentation * indentation * a bunch of type hints * bulk changes
What does this PR do?
🎯 part of the effort to enforce better standardization
We have been migrating `past_key_values` from the old tuple of tuples of tensors format to the new `Cache` format. However, many type hints and docstrings were not updated accordingly -- our users are getting incorrect information from these annotations 😮

This PR aims to reduce incorrect information. A few notes:
- The bulk changes assume the model uses `Cache`, the base class (as opposed to models like `mamba`). Nevertheless, even if there are a few inconsistencies, these models were previously annotated with the legacy format -- they are either models we didn't update due to low impact (and we'll likely deprecate soon), or the type hint was already incorrect to begin with 🤗
- `deprecated` models also received bulk changes; I don't think it's worth manually reverting them 🙈
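A quick way to see why the old annotations were misleading (hedged sketch; the model name is just a placeholder): on recent versions the returned `past_key_values` is a `Cache`, not the tuple of tuples the old docstrings promised.

```python
# Sketch: verify the returned past_key_values matches the new annotations.
# "gpt2" is only a placeholder model for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer, Cache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

outputs = model(**tokenizer("hi", return_tensors="pt"), use_cache=True)

print(isinstance(outputs.past_key_values, Cache))  # True on recent versions
print(isinstance(outputs.past_key_values, tuple))  # what the old docs implied
```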