Implementing HF Padding-Free and GraniteLM Support #257
Merged
34 commits
37de2be  only compute lengths in the token dataset when it's not already prese… (aldo-pareja)
bf7e86f  Refactor padding function to support position_ids for FlashAttention (aldo-pareja)
77a3965  logging the global gradnorm now (aldo-pareja)
be0cc71  fixing deepspeed because it's not working with the scheduler we want (aldo-pareja)
014e5e4  fixing accelerate lr_scheduler (aldo-pareja)
bf5b25e  fixing accelerate lr_scheduler (aldo-pareja)
4ac74f4  samples seen was broken because now the samples are a single line (aldo-pareja)
18182e1  find packing is wrong because when flash attention is supported paddi… (aldo-pareja)
70abd41  black formatting (aldo-pareja)
538e506  it should not fail on granite 8b models anymore (aldo-pareja)
208f396  linting (aldo-pareja)
5ed04dc  linting (aldo-pareja)
eda2641  bug on padding when creating the multipack sampler (aldo-pareja)
d8c3ac1  linter (aldo-pareja)
377d9a2  linter (aldo-pareja)
5c05b0a  Change old padding-free and granite flags to use_dolomite (Maxusmusti)
9c73f27  Add safeguards and checks for flash attention when enabled/disabled (Maxusmusti)
6a21d8d  Rework flash attention checks for better modularity (Maxusmusti)
4c431a2  Fix arg name (Maxusmusti)
8fff855  Update transformers to a version with Granite model class (Maxusmusti)
4288b28  Adding stateguards for dolomite and granite and model path check (Maxusmusti)
7a6f567  Missing update (Maxusmusti)
8e7c86a  Clean up early validation checks and move to utils (Maxusmusti)
3cd8597  Fix spelling mistake (Maxusmusti)
710ae92  Include AMD in flash attn check (Maxusmusti)
27959c7  Red-add is_padding_free with deprecation warning (Maxusmusti)
ef19e26  Make use_dolomite default false (Maxusmusti)
f777d45  this is needed because the tag <MASK> is too common and some datasets… (aldo-pareja)
041a856  added a warning in case the special tokens used for data processing a… (aldo-pareja)
f03427b  added a warning in case the special tokens used for data processing a… (aldo-pareja)
4d095c3  Update valid data filter (Maxusmusti)
a45e82b  Fix ruff formatting (Maxusmusti)
d5fe4d8  Apply review feedback (Maxusmusti)
0b46e83  Added comments (Maxusmusti)
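Several of the commits above refer to padding-free batches and to position_ids for FlashAttention. As a rough illustration only (this is not the code from this PR), a padding-free batch can be pictured as tokenized samples concatenated into a single row with no pad tokens, where the position IDs restart at 0 at each sample boundary. The function names below are hypothetical, and the sketch uses plain Python lists rather than tensors.

```python
from typing import Dict, List


def pack_padding_free(sequences: List[List[int]]) -> Dict[str, List[int]]:
    """Concatenate tokenized samples into one row without pad tokens.

    Hypothetical helper for illustration; not taken from the PR.
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    for seq in sequences:
        input_ids.extend(seq)
        # Positions restart at 0 so each sample boundary stays recoverable.
        position_ids.extend(range(len(seq)))
    return {"input_ids": input_ids, "position_ids": position_ids}


def sequence_starts(position_ids: List[int]) -> List[int]:
    """Recover the start offset of each sample from the position resets."""
    return [i for i, p in enumerate(position_ids) if p == 0]


# Three samples of different lengths, packed into a single row.
batch = pack_padding_free([[101, 7, 8], [101, 9], [101, 4, 5, 6]])
print(batch["input_ids"])
print(batch["position_ids"])
print(sequence_starts(batch["position_ids"]))
```

The resets in position_ids are what would let a variable-length attention kernel tell where one sample ends and the next begins, so that tokens do not attend across samples. How the PR itself wires this up is not shown on this page.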
Will this affect existing models? Or is this purely for training-time?
Yeah only relevant during training