Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi-gpu testing to test_algorithm_resumption #2016

Merged
merged 11 commits into from
Mar 1, 2023

Conversation

eracah
Copy link
Contributor

@eracah eracah commented Feb 28, 2023

What does this PR do?

Resuming some model-surgery algorithms using DDP in multi-gpu settings can cause issues due to module renaming.
There is a unit test (test_algorithm_resumption.py::test_algorithm_resumption) that checks for every algorithm whether you can train a model with that algorithm, save a checkpoint, then resume the training from that checkpoint with the algorithm. However, that unit test only checks when using one gpu.

In this PR, we:

  • extend test_algorithm_resumption.py::test_algorithm_resumption to include a 2 gpu setting.
  • To enable that setting, we also modify algorithm_settings.get_alg_dataloader to include support for distributed samplers
  • Add an xfail for the algorithm SWA for the 2-gpu setting since it has a known issue that we won't fix at the moment

What issue(s) does this change relate to?

fix CO-1848

@eracah eracah marked this pull request as ready for review February 28, 2023 22:41
@eracah eracah requested a review from dakinggg February 28, 2023 23:11
Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left a few nits on error messages to make them a little cleaner

composer/algorithms/gyro_dropout/gyro_dropout.py Outdated Show resolved Hide resolved
composer/algorithms/sam/sam.py Outdated Show resolved Hide resolved
composer/algorithms/stochastic_depth/stochastic_depth.py Outdated Show resolved Hide resolved
composer/algorithms/swa/swa.py Outdated Show resolved Hide resolved
Copy link
Contributor

@nik-mosaic nik-mosaic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

composer/algorithms/gyro_dropout/gyro_dropout.py Outdated Show resolved Hide resolved
eracah and others added 5 commits February 28, 2023 16:48
Co-authored-by: nik-mosaic <101217697+nik-mosaic@users.noreply.github.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com>
@mvpatel2000 mvpatel2000 merged commit 431723b into mosaicml:dev Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants