Speed up CI more #548

Closed
muellerzr wants to merge 35 commits into main from timing

Conversation

@muellerzr
Contributor

@muellerzr muellerzr commented Jul 21, 2022

I noticed the CI was taking about 6.5 minutes even with the cached process. This PR reduces the time the tests take to pass by limiting the example scripts to two epochs, following a setup similar to how we check the mocked dataloaders.

Old time: 6.5min
New time: 4.5min
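
As a rough illustration of the idea (not the actual change in this PR), capping epochs under test could look like the sketch below. The environment variable `EXAMPLE_TESTING` and the functions `resolve_num_epochs` and `train` are hypothetical names made up for this example; the real example scripts may gate this differently.

```python
import os
from unittest import mock

# Hypothetical flag name, chosen for this sketch only.
TEST_FLAG = "EXAMPLE_TESTING"


def resolve_num_epochs(config: dict) -> int:
    """Return the configured epoch count, capped at two when the test flag is set."""
    num_epochs = int(config["num_epochs"])
    if os.environ.get(TEST_FLAG) == "1":
        num_epochs = min(num_epochs, 2)
    return num_epochs


def train(config: dict) -> list:
    """Stand-in for an example script's training loop; returns the epochs it ran."""
    completed = []
    for epoch in range(resolve_num_epochs(config)):
        completed.append(epoch)
    return completed


# A test would set the flag only for the duration of the call.
with mock.patch.dict(os.environ, {TEST_FLAG: "1"}):
    epochs_in_test = train({"num_epochs": 10})
```

Scoping the flag with `mock.patch.dict` keeps the cap from leaking into other tests or into a real training run.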

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 21, 2022

The documentation is not available anymore as the PR was closed or merged.

muellerzr and others added 13 commits July 21, 2022 15:23
* deepspeed version hotfix

* Update setup.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* resolving the issue! yay 🤗

* resolving circular dependency issue 😅

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix a few minor issues with example code in docs

- enumerate is not actually used
- variable name "labels" does not match
- prepare method should be called

* Apply style
* Support not passing in args to launch
@muellerzr muellerzr requested a review from sgugger July 22, 2022 15:29
@muellerzr muellerzr added the enhancement (New feature or request) label Jul 22, 2022
@muellerzr muellerzr marked this pull request as ready for review July 22, 2022 15:29
@muellerzr muellerzr changed the title from "[Do not merge] More CI bits" to "Speed up CI more" Jul 22, 2022
Collaborator

@sgugger sgugger left a comment

Thanks for diving into this. The durations should be in the GitHub Actions only. Further work could also save them as artifacts (like in Transformers) instead of just displaying them.
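
To make the "save them as artifacts" suggestion concrete, here is a minimal sketch that writes timings to a JSON file a CI job could later upload. It is only an assumption about what such a report might look like: `record_durations` and the `durations.json` file name are hypothetical, and this is not how Transformers or this repository implement it.

```python
import json
import tempfile
import time
from pathlib import Path


def record_durations(callables: dict, report_path) -> dict:
    """Time each named callable and write the results, slowest first, as JSON."""
    durations = {}
    for name, fn in callables.items():
        start = time.perf_counter()
        fn()
        durations[name] = time.perf_counter() - start
    # Sort by elapsed time so the slowest entries appear first in the report.
    ordered = dict(sorted(durations.items(), key=lambda item: item[1], reverse=True))
    Path(report_path).write_text(json.dumps(ordered, indent=2))
    return ordered


# Example run with two stand-in "tests"; a CI step could upload the file afterwards.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "durations.json"
    result = record_durations(
        {"fast": lambda: None, "slow": lambda: time.sleep(0.05)},
        report,
    )
    saved = json.loads(report.read_text())
```

Writing the report to a file, rather than only printing it, is what lets a later workflow step keep it alongside the run.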

KimBioInfoStudio and others added 9 commits July 26, 2022 13:07
torch_ccl rename
* add some useful decorators

* make on_(local_)main_process member of Accelerator

* update examples

* add on_process and on_local_process

* fixes wrong name for `on_local_process`

* Update src/accelerate/accelerator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/accelerate/accelerator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/accelerate/accelerator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix: saving model weights

checkpointing not saving model weights if calling `accelerator.prepare_model` instead of `accelerator.prepare`
resolves issue: #555

* fix: saving model weights for optimizer and scheduler
* checkpointing enhancements and fixes for FSDP and DeepSpeed

* resolving comments

1. Adding deprecation args and warnings in launcher for FSDP
2. Handling old configs to work with new launcher args wrt FSDP.
3. Reverting changes to public methods in `checkpointing.py` and handling it in `Accelerator`
4. Explicitly writing the defaults of various FSDP options in `dataclasses` for readability.

* fixes

1. FSDP wrapped model being added to the `_models`.
2. Not passing the env variables when args are None.

* resolving comments

* adding FSDP for all the collective operations

* adding deepspeed and fsdp tests

1. Removes the mrpc datafiles and relies directly on HF datasets, as it was throwing a `file not found` error when running from within the `tests` folder. Updating `mocked_dataloaders` as a result.
2. adding `test_performance.py`, `test_memory.py` and `test_checkpointing.py` for multi-gpu FSDP and DeepSpeed tests

* reverting `mocked_dataloader` changes

* adding FSDP tests

* data files revert

* excluding fsdp tests from `tests_core`

* try 2

* adding a time delay to keep `torchrun` from crashing at times, which was causing flaky behaviour

* reducing the time of tests

* fixes

* fix

* fixes and reduce time further

* reduce time further and minor fixes

* adding a deepspeed basic e2e test for single gpu setup
@muellerzr muellerzr closed this Jul 26, 2022
@muellerzr muellerzr mentioned this pull request Jul 26, 2022
@muellerzr muellerzr deleted the timing branch July 31, 2022 18:55
Labels

enhancement New feature or request

9 participants