This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Fix warm up all reduce call #382

Closed

Conversation

mannatsingh
Contributor

Summary: The warm-up `dist.all_reduce()` call ran before the CUDA device was set, so every worker used device 0. This caused crashes and hangs, as reported in https://fb.workplace.com/groups/1309000715937050/permalink/1621428588027593/

Differential Revision: D30005438
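
To illustrate the ordering this fix is about, here is a minimal sketch, not the actual change in this PR. The function names `local_device_index` and `init_worker` are hypothetical, and the sketch assumes one process per GPU, the NCCL backend, and `env://` initialization (so `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` are set by the launcher).

```python
def local_device_index(local_rank: int, device_count: int) -> int:
    """Map a worker's local rank to the GPU index it should own.

    Hypothetical helper: assumes one process per GPU on each node.
    """
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"local_rank {local_rank} is out of range for {device_count} device(s)"
        )
    return local_rank


def init_worker(local_rank: int) -> None:
    """Hypothetical per-process setup showing the intended call order."""
    # Imported lazily so the helper above is usable without PyTorch installed.
    import torch
    import torch.distributed as dist

    # 1. Bind this process to its own GPU first.
    device = local_device_index(local_rank, torch.cuda.device_count())
    torch.cuda.set_device(device)

    # 2. Join the process group (env:// reads the launcher's environment).
    dist.init_process_group(backend="nccl", init_method="env://")

    # 3. Warm-up collective. The tensor is allocated on the current CUDA
    #    device, which step 1 already set, so workers do not all use device 0.
    warm_up = torch.zeros(1, device="cuda")
    dist.all_reduce(warm_up)
```

If step 3 ran before step 1, `device="cuda"` would resolve to device 0 in every process, which matches the failure mode described in the summary.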

fbshipit-source-id: ba4fd28bdb9e6142dd0a077ab69c903730ce2353
@facebook-github-bot added the CLA Signed and fb-exported labels on Jul 30, 2021
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D30005438

@facebook-github-bot
Contributor

This pull request has been merged in 8800989.

Labels: CLA Signed, fb-exported, Merged