[feat] Gossip/SlowMo #378

Merged: 123 commits merged into main on Nov 8, 2021

Conversation

@blefaudeux (Contributor) commented on Feb 11, 2021:

Before submitting

  • Was this discussed/approved via a GitHub issue? (not required for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Disclaimer: I (@blefaudeux) am not the author; Vinayak (@vtantia) is. I'm just testing the CI and putting up a draft PR.

TODOs:

  • Write documentation
  • Fix the licensing
  • Make sure that the unit tests run with the global pytest runner
  • Refactor the unit tests a little (cleanup/auto-generation)
  • Add tutorial

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

Need to maybe automate this part with pre-commit
(review thread on CHANGELOG.md, resolved)
@vtantia self-assigned this on Nov 4, 2021
@odelalleau left a comment:

Looking good overall, just a bunch of small comments to clarify a few points.

(10 review threads on docs/source/deep_dive/slowmo_ddp.rst, mostly resolved)
@odelalleau left a comment:

Minor polish suggestions.

On docs/source/deep_dive/slowmo_ddp.rst:
outputs = model(data)
loss = loss_fn(outputs, target)
loss.backward()
optimizer.step()
if use_slowmo:
    # With SlowMo, gradients are zeroed on the model after the optimizer step
    model.zero_grad()

@odelalleau commented:

I assume it is important to have it here for SlowMo, but I don't remember why: it would be good to explain it in the docstring of perform_slowmo() and refer to this doc here.
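
For context while reading this thread, here is a minimal sketch of where the snippet above sits in a full SlowMo training loop. The wrapper name SlowMoDistributedDataParallel and the perform_slowmo(optimizer) call come from the docs under review, but the import path, the placeholder names (MyModel, loader, loss_fn), and the exact placement of perform_slowmo() relative to zero_grad() are assumptions, not the reviewed code:

import torch
from fairscale.experimental.nn.data_parallel import SlowMoDistributedDataParallel

# Assumes torch.distributed is already initialized (e.g. via init_process_group).
model = SlowMoDistributedDataParallel(MyModel().cuda())  # MyModel is a placeholder
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for data, target in loader:  # `loader` and `loss_fn` are placeholders
    outputs = model(data)
    loss = loss_fn(outputs, target)
    loss.backward()
    optimizer.step()
    model.zero_grad()                # zero grads on the model, as in the snippet above
    model.perform_slowmo(optimizer)  # slow-momentum / parameter-averaging step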

@odelalleau followed up:

Thanks for the update, a couple of follow-up comments on this point:

  1. Minor: would it be possible for the doc link to directly point to the perform_slowmo part of the page? (no big deal if not possible)

  2. How does this save memory? According to the doc (https://pytorch.org/docs/stable/generated/torch.nn.Module.html) it won't flush the tensors unless set_to_none is set to True

@vtantia (Contributor) replied:

  1. Have fixed this. The link is a little ugly, but it has very little chance of breaking in the future, so it might be good to go ahead with it.

  2. Ahh, nice catch, I've fixed that. In the fairseq repo, setting gradients to None was the default behavior of zero_grad, so I got confused about that.
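
To make the memory question concrete, here is a small standalone illustration of the two zero_grad modes (an editorial sketch, not code from the PR; set_to_none=False was the default when this exchange happened, while later PyTorch releases flipped the default to True):

import torch

model = torch.nn.Linear(4, 2)
model(torch.randn(3, 4)).sum().backward()

# set_to_none=False: gradients are zeroed in place, so the tensors stay allocated.
model.zero_grad(set_to_none=False)
print(model.weight.grad)  # tensor of zeros, memory still held

model(torch.randn(3, 4)).sum().backward()

# set_to_none=True drops the references so the allocator can reclaim the memory.
model.zero_grad(set_to_none=True)
print(model.weight.grad)  # None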

(review thread on docs/source/tutorials/slowmo_ddp.rst, resolved)
@odelalleau left a comment:
Thanks, looking good! I have a few small suggestions below -- feel free to pick those you like and drop the others.

(The main point is to simplify the example a bit by keeping the zero_grad() call in the same place.)

(5 resolved review threads on docs/source/tutorials/slowmo_ddp.rst)
@anj-s (Contributor) left a comment:

Thank you for the PR @vtantia!

@vtantia merged commit 21464e0 into main on Nov 8, 2021
@blefaudeux (Contributor, Author) commented:

Congrats on the PR @vtantia, and thank you!

facebook-github-bot pushed a commit to facebookresearch/fairseq that referenced this pull request on Nov 9, 2021:
Summary:
- [ ] Was this discussed/approved via a GitHub issue? (not required for typos or doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
SlowMo is being moved to [Fairscale](https://fairscale.readthedocs.io/en/latest/). This commit updates the implementation of SlowMo to the Fairscale version. It also adds tests for SlowMo.
Note: This PR is currently up for review. It will be merged at a later date, once SlowMo has landed in Fairscale. SlowMo is being merged into Fairscale as part of [a PR](facebookresearch/fairscale#378); once that PR is merged into Fairscale, this PR on Fairseq will be ready for merge.

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

## Did you have fun?
Make sure you had fun coding 🙃

Pull Request resolved: #3996

Reviewed By: dianaml0

Differential Revision: D32280163

Pulled By: vtantia

fbshipit-source-id: 70c97b04a7cdc90ada7099375c2a31b0c978ba70
@min-xu-ai deleted the slowmo_ben branch on September 23, 2022 at 21:33
Labels: CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)

Closes: [feat] Integration of SlowMo into FairScale

10 participants