[feat] Gossip/SlowMo #378
Conversation
…a/stochastic_gradient_push
We may need to automate this part with pre-commit.
Looking good overall, just a bunch of small comments to clarify a few points.
Minor polish suggestions
docs/source/tutorials/slowmo_ddp.rst
```python
outputs = model(data)
loss = loss_fn(outputs, target)
loss.backward()
optimizer.step()
if use_slowmo:
    model.zero_grad()
```
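To make the ordering being discussed concrete, here is a toy, stand-alone sketch in plain Python (these classes are hypothetical stand-ins, not the real torch/fairscale APIs). It assumes `perform_slowmo()` is invoked after `optimizer.step()` and before `model.zero_grad()`, which is my reading of this thread, not something the snippet above states explicitly:

```python
# Record the order in which the training-loop hooks fire.
calls = []

class ToyOptimizer:
    """Hypothetical stand-in for a torch optimizer."""
    def step(self):
        calls.append("step")

class ToySlowMoModel:
    """Hypothetical stand-in for a SlowMo-wrapped model."""
    def perform_slowmo(self, optimizer):
        # Placeholder for the slow-momentum update; the real method
        # lives in fairscale and is discussed later in this thread.
        calls.append("perform_slowmo")

    def zero_grad(self):
        calls.append("zero_grad")

model, optimizer = ToySlowMoModel(), ToyOptimizer()
use_slowmo = True

for _ in range(2):  # two toy iterations; forward/backward omitted
    optimizer.step()
    if use_slowmo:
        model.perform_slowmo(optimizer)
        model.zero_grad()

print(calls)
```

The point of the sketch is only the call order per iteration: `step`, then `perform_slowmo`, then `zero_grad`.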
I assume it is important to have it here for SlowMo, but I don't remember why; it would be good to explain this in the docstring of `perform_slowmo()` and refer to that doc here.
Thanks for the update, a couple of follow-up comments on this point:

- Minor: would it be possible for the doc link to point directly to the `perform_slowmo` part of the page? (no big deal if not possible)
- How does this save memory? According to the doc (https://pytorch.org/docs/stable/generated/torch.nn.Module.html), it won't flush the tensors unless `set_to_none` is set to `True`.
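The distinction being raised can be illustrated with a minimal plain-Python stand-in (this is not torch's actual implementation, just a sketch of the `set_to_none` semantics described in the linked doc):

```python
class Param:
    """Hypothetical stand-in for a parameter with a gradient buffer."""
    def __init__(self):
        self.grad = [1.0, 2.0]  # pretend gradient buffer

def zero_grad(params, set_to_none=False):
    for p in params:
        if set_to_none:
            p.grad = None  # drop the buffer entirely; memory can be freed
        elif p.grad is not None:
            p.grad = [0.0] * len(p.grad)  # buffer kept alive, just zeroed

params = [Param(), Param()]
zero_grad(params)                    # default: buffers stay allocated
print([p.grad for p in params])      # [[0.0, 0.0], [0.0, 0.0]]
zero_grad(params, set_to_none=True)  # buffers released
print([p.grad for p in params])      # [None, None]
```

In the default case the gradient storage survives the call, which is why zeroing alone does not save memory.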
- Have fixed this. The link is a little ugly, but it has very little chance of breaking in the future, so it might be good to go ahead with it.
- Ahh, nice catch, I've fixed that. In the fairseq repo, setting to `None` was the default behavior of `zero_grad`, so I got confused about that.
Thanks, looking good! I have a few small suggestions below -- feel free to pick the ones you like and drop the others (the main point is to simplify the example a bit by keeping the `zero_grad()` call in the same place).
Thank you for the PR @vtantia !
Congrats on the PR @vtantia, and thank you!
Summary:

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version. It also adds tests for SlowMo.

Note: This PR is currently for review. It will be merged at a later date, once SlowMo has been updated in Fairscale. SlowMo is being merged to Fairscale as part of [a PR](facebookresearch/fairscale#378), so once that PR is merged to Fairscale, this PR on Fairseq will be ready for merge.

## PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues, there's a high chance it will not be merged.

## Did you have fun?

Make sure you had fun coding 🙃

Pull Request resolved: #3996
Reviewed By: dianaml0
Differential Revision: D32280163
Pulled By: vtantia
fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
Before submitting
What does this PR do?
Disclaimer: I (@lefaudeux) am not the author, Vinayak (@vtantia) is. Just testing the CI and putting up a draft PR.
TODOs:
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃