[feat] Gossip/SlowMo #378
Conversation
…a/stochastic_gradient_push
We may need to automate this part with pre-commit.
Looking good overall, just a bunch of small comments to clarify a few points.
Minor polish suggestions
docs/source/tutorials/slowmo_ddp.rst
```python
outputs = model(data)
loss = loss_fn(outputs, target)
loss.backward()
optimizer.step()
if use_slowmo:
    model.zero_grad()
```
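To make the ordering being discussed concrete, here is a toy, stand-alone sketch in plain Python (these classes are hypothetical stand-ins, not the real torch/fairscale APIs). It assumes `perform_slowmo()` is invoked after `optimizer.step()` and before `model.zero_grad()`, which is my reading of this thread, not something the snippet above states explicitly:

```python
# Record the order in which the training-loop hooks fire.
calls = []

class ToyOptimizer:
    """Hypothetical stand-in for a torch optimizer."""
    def step(self):
        calls.append("step")

class ToySlowMoModel:
    """Hypothetical stand-in for a SlowMo-wrapped model."""
    def perform_slowmo(self, optimizer):
        # Placeholder for the slow-momentum update; the real method
        # lives in fairscale and is discussed later in this thread.
        calls.append("perform_slowmo")

    def zero_grad(self):
        calls.append("zero_grad")

model, optimizer = ToySlowMoModel(), ToyOptimizer()
use_slowmo = True

for _ in range(2):  # two toy iterations; forward/backward omitted
    optimizer.step()
    if use_slowmo:
        model.perform_slowmo(optimizer)
        model.zero_grad()

print(calls)
```

The point of the sketch is only the call order per iteration: `step`, then `perform_slowmo`, then `zero_grad`.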
I assume it is important to have it here for SlowMo, but I don't remember why; it would be good to explain this in the docstring of `perform_slowmo()` and refer to that doc here.
Thanks for the update, a couple of follow-up comments on this point:

- Minor: would it be possible for the doc link to point directly to the `perform_slowmo` part of the page? (no big deal if not possible)
- How does this save memory? According to the doc (https://pytorch.org/docs/stable/generated/torch.nn.Module.html), it won't flush the tensors unless `set_to_none` is set to `True`.
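The distinction being raised can be illustrated with a minimal plain-Python stand-in (this is not torch's actual implementation, just a sketch of the `set_to_none` semantics described in the linked doc):

```python
class Param:
    """Hypothetical stand-in for a parameter with a gradient buffer."""
    def __init__(self):
        self.grad = [1.0, 2.0]  # pretend gradient buffer

def zero_grad(params, set_to_none=False):
    for p in params:
        if set_to_none:
            p.grad = None  # drop the buffer entirely; memory can be freed
        elif p.grad is not None:
            p.grad = [0.0] * len(p.grad)  # buffer kept alive, just zeroed

params = [Param(), Param()]
zero_grad(params)                    # default: buffers stay allocated
print([p.grad for p in params])      # [[0.0, 0.0], [0.0, 0.0]]
zero_grad(params, set_to_none=True)  # buffers released
print([p.grad for p in params])      # [None, None]
```

In the default case the gradient storage survives the call, which is why zeroing alone does not save memory.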
- Have fixed this. The link is a little ugly, but it has very little chance of breaking in the future, so it might be good to go ahead with it.
- Ahh, nice catch, I've fixed that. In the fairseq repo, setting to `None` was the default behavior of `zero_grad`, so I got confused about that.
Thanks, looking good! I have a few small suggestions below -- feel free to pick the ones you like and drop the others (the main point is to simplify the example a bit by keeping the `zero_grad()` call in the same place).
Thank you for the PR @vtantia !
Congrats on the PR @vtantia, and thank you!
Summary:

- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?

SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version. It also adds tests for SlowMo.

Note: This PR is currently for review. It will be merged at a later date, once SlowMo has been updated in Fairscale. SlowMo is being merged to Fairscale as part of [a PR](facebookresearch/fairscale#378), so once that PR is merged to Fairscale, this PR on Fairseq will be ready for merge.

## PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues, there's a high chance it will not be merged.

## Did you have fun?

Make sure you had fun coding 🙃

Pull Request resolved: #3996
Reviewed By: dianaml0
Differential Revision: D32280163
Pulled By: vtantia
fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
Before submitting
What does this PR do?
Disclaimer: I (@lefaudeux) am not the author, Vinayak (@vtantia) is. Just testing the CI and putting up a draft PR.
TODOs:
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃