The documentation is not available anymore as the PR was closed or merged.
* deepspeed version hotfix
* Update setup.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* resolving the issue! yay 🤗
* resolving circular dependency issue 😅
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix a few minor issues with example code in docs
  - enumerate is not actually used
  - variable name "labels" does not match
  - prepare method should be called
* Apply style
* Support not passing in args to launch
sgugger (Collaborator) approved these changes on Jul 26, 2022 and left a comment:
Thanks for diving into this. The durations should be in the GitHub Actions only. Further work could save them as artifacts (like in Transformers) instead of just displaying them.
torch_ccl rename
* add some useful decorators
* make on_(local_)main_process member of Accelerator
* update examples
* add on_process and on_local_process
* fixes wrong name for `on_local_process`
* Update src/accelerate/accelerator.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/accelerate/accelerator.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/accelerate/accelerator.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix: saving model weights
  checkpointing was not saving model weights when calling `accelerator.prepare_model` instead of `accelerator.prepare`
  resolves issue: #555
* fix: saving model weights for optimizer and scheduler
* checkpointing enhancements and fixes for FSDP and DeepSpeed
* resolving comments
  1. Adding deprecation args and warnings in launcher for FSDP
  2. Handling old configs to work with new launcher args wrt FSDP.
  3. Reverting changes to public methods in `checkpointing.py` and handling it in `Accelerator`
  4. Explicitly writing the defaults of various FSDP options in `dataclasses` for readability.
* fixes
  1. FSDP wrapped model being added to the `_models`.
  2. Not passing the env variables when args are None.
* resolving comments
* adding FSDP for all the collective operations
* adding deepspeed and fsdp tests
  1. Removes mrpc datafiles and directly relies on HF datasets, as it was throwing a `file not found` error when running from within the `tests` folder. Updating `moke_dataloaders` as a result.
  2. adding `test_performance.py`, `test_memory.py` and `test_checkpointing.py` for multi-gpu FSDP and DeepSpeed tests
* reverting `mocked_dataloader` changes
* adding FSDP tests
* data files revert
* excluding fsdp tests from `tests_core`
* try 2
* adding time delay to keep `torchrun` from crashing at times, which was causing flaky behaviour
* reducing the time of tests
* fixes
* fix
* fixes and reduce time further
* reduce time further and minor fixes
* adding a deepspeed basic e2e test for single gpu setup
Merged
Noticed the CI was taking 6.5 minutes or so, even with the cached process. This reduces the time it takes for the tests to pass by limiting the example scripts to two epochs, following a setup similar to how we check the mocked dataloaders.
Old time: 6.5min
New time: 4.5min
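
The epoch cap described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: the environment variable `EXAMPLE_TESTING` and the helpers `resolve_num_epochs`, `run_example`, and `run_example_under_test` are hypothetical names chosen for the sketch. The idea is that the test harness sets a flag, and the example script lowers its epoch count to two when it sees that flag.

```python
import os
from unittest import mock

MAX_TEST_EPOCHS = 2  # the two-epoch cap mentioned in the PR description


def resolve_num_epochs(config: dict) -> int:
    """Return the configured epoch count, capped when the test flag is set.

    EXAMPLE_TESTING is a hypothetical flag used only for this sketch.
    """
    if os.environ.get("EXAMPLE_TESTING") == "1":
        return min(config["num_epochs"], MAX_TEST_EPOCHS)
    return config["num_epochs"]


def run_example(config: dict) -> int:
    """Stand-in for an example script; returns how many epochs it ran."""
    epochs_run = 0
    for _ in range(resolve_num_epochs(config)):
        epochs_run += 1  # a real script would train for one epoch here
    return epochs_run


def run_example_under_test(config: dict) -> int:
    """Run the example the way a test would, with the flag patched in."""
    with mock.patch.dict(os.environ, {"EXAMPLE_TESTING": "1"}):
        return run_example(config)
```

Patching the environment inside the test keeps the example scripts unchanged for regular users, who still get the full number of epochs from their config.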