Fix static generation when compiling! #28937
Merged · +85 −87 · 42 commits
2187685 wow I was scared! (ArthurZucker)
4922c92 fix everything (ArthurZucker)
56768a0 nits (ArthurZucker)
b565051 make it BC? (ArthurZucker)
99afd1a add todo (ArthurZucker)
edc498f nits (ArthurZucker)
651c4bd is_tracing should still be used to pass tracing tests (ArthurZucker)
f69626e nits (ArthurZucker)
96136ac some nits to make sure genration works with static cache uncompiled (ArthurZucker)
d5ebd80 fix sdpa (ArthurZucker)
70adcf6 fix FA2 for both static and dynamic in a better way? (ArthurZucker)
61ed4cb style (ArthurZucker)
fedc563 fix-copies (ArthurZucker)
0195d58 fix fix copies (ArthurZucker)
07f3adb fix sequential beam searcg (ArthurZucker)
9402c25 style (ArthurZucker)
86303c4 use `keys_to_ignore` (ArthurZucker)
fb9e907 nit (ArthurZucker)
9aa667e correct dtype inference when init (ArthurZucker)
68a5f29 :( the fix for FA2 is still not optimal to investigate! (ArthurZucker)
3b9969b styling (ArthurZucker)
162ab87 Merge branch 'main' of github.com:huggingface/transformers into fix-s… (ArthurZucker)
914b0d7 nits (ArthurZucker)
e79f79f nit (ArthurZucker)
ee2317d this might work better (ArthurZucker)
93b2691 add comment (ArthurZucker)
3619ed3 Update src/transformers/models/llama/modeling_llama.py (ArthurZucker)
c23cdc4 "position_ids" -> "cache_position" (ArthurZucker)
717a8e7 style (ArthurZucker)
7fe0964 Merge branch 'main' of github.com:huggingface/transformers into fix-s… (ArthurZucker)
464c463 Merge branch 'main' of github.com:huggingface/transformers into fix-s… (ArthurZucker)
80148ab nit (ArthurZucker)
c9f3c82 Remove changes that should no be propagatted just yet (ArthurZucker)
5f54d84 Apply suggestions from code review (ArthurZucker)
b3fc042 Styling (ArthurZucker)
5fdb2da make sure we raise an errir for static cache with FA2 enabled (ArthurZucker)
03edf91 move to the bottom of the signature (ArthurZucker)
b762304 style (ArthurZucker)
9fbe901 Update src/transformers/models/llama/modeling_llama.py (ArthurZucker)
7afe7d9 Update src/transformers/models/llama/modeling_llama.py (ArthurZucker)
3772d1c nit in the name (ArthurZucker)
cf0bc32 Merge branches 'fix-static-kv-cache' and 'fix-static-kv-cache' of git… (ArthurZucker)
@@ -4776,8 +4776,9 @@ def _split_model_inputs(
     # Here we can have four types of values: tensors, tuples of tensors and booleans, and encoder_outputs which is a
     # ModelOutput object.
     # bool should not be split but replicated for each split
-    bool_keys = [k for k in keys if isinstance(model_input[k], bool)]
-    non_bool_keys = [k for k in keys if not isinstance(model_input[k], bool) and not k == "encoder_outputs"]
+    bool_keys = [k for k in keys if isinstance(model_input[k], bool) or k == "cache_position"]
+    keys_to_ignore = ["cache_position", "encoder_outputs"]
+    non_bool_keys = [k for k in keys if not isinstance(model_input[k], bool) and k not in keys_to_ignore]

     # we split the tensors and tuples of tensors
     data_split_list = [

Review comment on lines -4779 to +4781:
beam search will split the cache positions otherwise
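To make the review comment concrete, here is a minimal sketch of the key classification in the hunk above. It is an illustration under stated assumptions, not the library's implementation: `split_inputs_sketch` is a hypothetical helper, plain Python lists stand in for tensors, and `encoder_outputs` is simply skipped rather than handled the way the real function handles it. The point is only that `cache_position` is grouped with the replicated values, so every split sees the same positions instead of a slice of them.

```python
from typing import Any, Dict, List


def split_inputs_sketch(
    model_input: Dict[str, Any], split_size: int, full_batch_size: int
) -> List[Dict[str, Any]]:
    """Hypothetical helper: split batch-shaped inputs, replicate the rest."""
    keys = list(model_input.keys())

    # Same classification as in the diff: booleans and cache_position are
    # copied unchanged into every split.
    bool_keys = [k for k in keys if isinstance(model_input[k], bool) or k == "cache_position"]
    keys_to_ignore = ["cache_position", "encoder_outputs"]
    non_bool_keys = [
        k for k in keys if not isinstance(model_input[k], bool) and k not in keys_to_ignore
    ]

    splits = []
    for start in range(0, full_batch_size, split_size):
        # Batch-shaped values are sliced along the batch dimension
        # (lists stand in for tensors in this sketch).
        chunk = {k: model_input[k][start : start + split_size] for k in non_bool_keys}
        # Replicated values are passed through as-is.
        chunk.update({k: model_input[k] for k in bool_keys})
        splits.append(chunk)
    return splits


inputs = {
    "input_ids": [[1, 2], [3, 4], [5, 6], [7, 8]],  # batch of 4
    "use_cache": True,
    "cache_position": [0, 1],  # one entry per sequence position, not per batch row
}
splits = split_inputs_sketch(inputs, split_size=2, full_batch_size=4)
```

Without the `cache_position` special case, the position vector would be sliced like a batch-shaped tensor, which is the failure the comment refers to for sequential beam search.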