cache: store StaticLayer.cumulative_length as a 0-dim scalar tensor #45997
Open
joaquinhuigomez wants to merge 1 commit into
Conversation
`StaticLayer.cumulative_length` was initialised as `torch.tensor([0])` (shape `(1,)`), so `StaticCache.get_seq_length()` returned a shape-`(1,)` tensor instead of a value consistent with `DynamicCache.get_seq_length()`, which returns a plain `int`. The two cache types weren't safely interchangeable, and downstream `int - past_len` arithmetic promoted to shape-`(1,)` tensors that propagated into slicing logic. Storing the cumulative length as a 0-dim scalar tensor preserves the compile-friendly tensor semantics that the existing comment calls out (the value still mutates in-place via `add_()`, and is still attached to the static address for `torch.compile`), but makes arithmetic against ints behave the same as with `DynamicCache`. Fixes huggingface#45987
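A minimal sketch of the difference the patch makes (the variable names here are illustrative, not the actual cache internals): a shape-`(1,)` tensor "infects" int arithmetic with its shape, while a 0-dim scalar does not, and both still support the in-place `add_()` mutation that `torch.compile` relies on.

```python
import torch

# Before the patch: length stored as a shape-(1,) tensor.
old_len = torch.tensor([0])
# After the patch: length stored as a 0-dim scalar tensor.
new_len = torch.tensor(0)

# Arithmetic against a plain int promotes to the tensor's shape:
print((10 - old_len).shape)  # torch.Size([1]) -- a shape-(1,) tensor leaks downstream
print((10 - new_len).shape)  # torch.Size([])  -- stays a 0-dim scalar

# Both forms still mutate in-place at a fixed address,
# which is the compile-friendly property the original comment preserves:
new_len.add_(4)
print(int(new_len))  # 4
```

The fix is therefore only a shape change at initialisation; the in-place update path and the static tensor address are untouched.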
Contributor: View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45997&sha=519287