Speed up CI more by muellerzr · Pull Request #548 · huggingface/accelerate

muellerzr · 2022-07-21T19:11:45Z

I noticed the CI was taking about 6.5 minutes even with caching in place. This PR reduces the time it takes for the tests to pass by limiting the example scripts to two epochs, similar to how we check the mocked dataloaders.

Old time: 6.5min
New time: 4.5min
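
As a rough illustration of the epoch-limiting idea described above, here is a minimal sketch, not the actual change made in this PR. The environment variable `EXAMPLE_MAX_EPOCHS`, the default of three epochs, and the helper names are all hypothetical; the only real API used is `unittest.mock.patch.dict` from the standard library.

```python
import os
from typing import List
from unittest import mock

DEFAULT_EPOCHS = 3  # assumed default for the illustration only


def resolve_num_epochs(default: int = DEFAULT_EPOCHS) -> int:
    """Return the epoch count, capped by a (hypothetical) env variable if set."""
    cap = os.environ.get("EXAMPLE_MAX_EPOCHS")
    return default if cap is None else min(default, int(cap))


def run_example() -> List[int]:
    """Stand-in for an example script's training loop; returns the epochs it ran."""
    completed = []
    for epoch in range(resolve_num_epochs()):
        completed.append(epoch)  # a real script would train and evaluate here
    return completed


def run_example_capped(max_epochs: int = 2) -> List[int]:
    """Run the example the way a test might: with the epoch cap applied temporarily."""
    with mock.patch.dict(os.environ, {"EXAMPLE_MAX_EPOCHS": str(max_epochs)}):
        return run_example()
```

Because `patch.dict` restores the environment when the `with` block exits, the cap applies only inside the test and leaves normal runs of the script untouched.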

HuggingFaceDocBuilderDev · 2022-07-21T19:14:46Z

The documentation is not available anymore as the PR was closed or merged.

* deepspeed version hotfix
* Update setup.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* resolving the issue! yay 🤗
* resolving circular dependency issue 😅

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

sgugger

Thanks for diving into this. The durations should be in the GitHub Actions only, and further work could save them as artifacts (like in Transformers) instead of just displaying them.
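
One way the "save them as artifacts" suggestion could look is sketched below. This is an assumption-laden illustration, not what Transformers or this repository does: it assumes the test run was invoked with pytest's `--durations` option and that each report line has the shape `1.23s call     path::test_name`. The function names and the JSON layout are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Dict, List

# Assumed shape of one line in pytest's "slowest durations" report, e.g.
# "1.23s call     tests/test_examples.py::test_nlp_example"
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(.+?)\s*$")


def parse_durations(report: str) -> List[Dict[str, object]]:
    """Extract (seconds, phase, test id) rows from captured pytest output."""
    rows = []
    for line in report.splitlines():
        match = DURATION_LINE.match(line)
        if match:
            seconds, phase, test_id = match.groups()
            rows.append({"seconds": float(seconds), "phase": phase, "test": test_id})
    return rows


def save_durations(report: str, path: str) -> int:
    """Write the parsed rows to a JSON file and return how many rows were saved."""
    rows = parse_durations(report)
    Path(path).write_text(json.dumps(rows, indent=2))
    return len(rows)
```

The resulting JSON file could then be uploaded by a workflow step, so the timings persist after the run instead of only appearing in the log.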

Makefile

* add some useful decorators
* make on_(local_)main_process member of Accelerator
* update examples
* add on_process and on_local_process
* fixes wrong name for `on_local_process`
* Update src/accelerate/accelerator.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/accelerate/accelerator.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/accelerate/accelerator.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* checkpointing enhancements and fixes for FSDP and DeepSpeed
* resolving comments
  1. Adding deprecation args and warnings in launcher for FSDP
  2. Handling old configs to work with new launcher args wrt FSDP.
  3. Reverting changes to public methods in `checkpointing.py` and handling it in `Accelerator`
  4. Explicitly writing the defaults of various FSDP options in `dataclasses` for readability.
* fixes
  1. FSDP wrapped model being added to the `_models`.
  2. Not passing the env variables when args are None.
* resolving comments
* adding FSDP for all the collective operations
* adding deepspeed and fsdp tests
  1. Removes mrpc datafiles and directly relies on HF datasets as it was throwing `file not found` error when running from within `tests` folder. Updating `moke_dataloaders` as a result.
  2. adding `test_performance.py`, `test_memory.py` and `test_checkpointing.py` for multi-gpu FSDP and DeepSpeed tests
* reverting `mocked_dataloader` changes
* adding FSDP tests
* data files revert
* excluding fsdp tests from `tests_core`
* try 2
* adding a time delay to keep `torchrun` from crashing at times, which was causing flaky behaviour
* reducing the time of tests
* fixes
* fix
* fixes and reduce time further
* reduce time further and minor fixes
* adding a deepspeed basic e2e test for single gpu setup

Add durations

0fb43e9

muellerzr and others added 13 commits July 21, 2022 15:23

Reduce test by one epoch

5756bbe

Should speed up the rest

2521f63

Disable example diff for now

23a7f37

fix test?

9a91ff0

Rename all the caches

63682ab

Don't cache models

f92aac6

Store site packages

b7c18b5

Fix a few minor issues with example code in docs (#551)

b08ae97

* Fix a few minor issues with example code in docs
  - enumerate is not actually used
  - variable name "labels" does not match
  - prepare method should be called
* Apply style

Create good defaults in accelerate launch (#553)

6c4edc3

* Support not passing in args to launch

Speed up ci

e703cdf

Change hash

81b805a

Clean

d02e37c

muellerzr requested a review from sgugger July 22, 2022 15:29

muellerzr added the enhancement (New feature or request) label Jul 22, 2022

muellerzr marked this pull request as ready for review July 22, 2022 15:29

muellerzr changed the title ~~[Do not merge] More CI bits~~ Speed up CI more Jul 22, 2022

unpin datasets (#563)

5391412

sgugger approved these changes Jul 26, 2022

KimBioInfoStudio and others added 9 commits July 26, 2022 13:07

Update imports.py (#554)

f90ec52

torch_ccl rename

Fix wrong indentation

cc10071

fix: saving model weights (#556)

91ff425

* fix: saving model weights
  checkpointing not saving model weights if calling `accelerator.prepare_model` instead of `accelerator.prepare`
  resolves issue: #555
* fix: saving model weights for optimizer and scheduler

Remove timings

076cf1b

Fix clean (#569)

5e25edd

Add durations

75fd158

Reduce test by one epoch

ba5d7eb

muellerzr added 11 commits July 26, 2022 09:28

Should speed up the rest

fc11d13

Disable example diff for now

8b1df02

fix test?

8a78a37

Rename all the caches

a10cf6e

Don't cache models

0985c15

Store site packages

89f83d8

Speed up ci

e077c7c

Change hash

5299b16

Clean

ebea01e

Remove timings

ec2f4c7

Merge branch 'timing' of https://github.com/huggingface/accelerate in…

99948a6

…to timing

muellerzr closed this Jul 26, 2022

muellerzr mentioned this pull request Jul 26, 2022

Speed up main CI #571

Merged

muellerzr deleted the timing branch July 31, 2022 18:55

Speed up CI more #548
muellerzr wants to merge 35 commits into main from timing

9 participants
