fairseq[-hydra]-train torchrun compatibility: default device_id set to LOCAL_RANK if exists #4351

@colinclement (Contributor) commented Apr 13, 2022

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #4302 (issue).
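
For readers skimming the thread, here is a minimal sketch of the behaviour named in the PR title, not the actual fairseq diff. It assumes only that torchrun exports a `LOCAL_RANK` environment variable to each worker process; the helper name `default_device_id` and its `fallback` parameter are hypothetical and chosen for illustration.

```python
import os
from typing import Mapping, Optional


def default_device_id(env: Optional[Mapping[str, str]] = None, fallback: int = 0) -> int:
    """Pick a default device id for this worker (hypothetical helper).

    If the launcher exported LOCAL_RANK (as torchrun does), use it;
    otherwise fall back to `fallback`.
    """
    # Allow a custom mapping to be injected so the logic is easy to test.
    env = os.environ if env is None else env
    local_rank = env.get("LOCAL_RANK")
    if local_rank is None:
        return fallback
    return int(local_rank)


if __name__ == "__main__":
    # Under torchrun each worker would print its own local rank;
    # in a plain `python` run this prints the fallback.
    print(default_device_id())
```

Passing the environment in as a mapping keeps the lookup side-effect free, so the default can be checked without launching a distributed job.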

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

I had fun when I figured out why torchrun was failing :)

@kiukchung left a comment

Thank you so much for doing this!

@facebook-github-bot (Contributor)

@dianaml0 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@colinclement deleted the dev/colin/infer-device-id-for-torchrun branch May 4, 2022 19:13
lzzk pushed a commit to lzzk/fairseq that referenced this pull request Jul 24, 2022
…o LOCAL_RANK if exists (facebookresearch#4351)

Summary:
# Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?

## What does this PR do?
Fixes facebookresearch#4302 (issue).

## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

## Did you have fun?
I had fun when I figured out why torchrun was failing :)

Pull Request resolved: facebookresearch#4351

Reviewed By: shruti-bh

Differential Revision: D35784181

Pulled By: dianaml0

fbshipit-source-id: 560c7af12b2f9278cba6c85711b98b9e043d0ec9
Successfully merging this pull request may close these issues.

Make fairseq_cli.train compatible with torch.distributed.run