reproduce UCC error at exit #28

Closed
wants to merge 2 commits into from

kingchc

@kingchc kingchc commented Sep 29, 2021

Summary:
[This is for debugging purposes only, DO NOT MERGE]

The error occurs when more than two process groups (PGs) have been created and they are not explicitly destroyed at exit.

Sample reproducer: `mpirun -np 16 <any MPI variables> python comms.py --backend ucc --device cuda --collective all_to_allv --e 1M --collective-pair all_to_allv --collective-pair-size 8K --overlap-pair-pgs 1 --pair 1`
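
For reference, explicit teardown of every created process group could look roughly like the sketch below. It is not taken from this PR: `managed_groups`, `create_group`, and `destroy_group` are hypothetical names, and the create/destroy callables are injected so the snippet runs without any distributed backend. Whether explicit teardown avoids the UCC error is an assumption, not something verified here.

```python
from contextlib import contextmanager


@contextmanager
def managed_groups(create_group, destroy_group, rank_sets):
    """Create one group per rank set and destroy all of them on exit.

    create_group and destroy_group are caller-supplied callables
    (hypothetical stand-ins for the real backend calls).
    """
    groups = []
    try:
        for ranks in rank_sets:
            groups.append(create_group(ranks))
        yield groups
    finally:
        # Tear down in reverse creation order, even if the body raised.
        for group in reversed(groups):
            destroy_group(group)


if __name__ == "__main__":
    # Stand-in callables: a "group" is just a tuple of ranks here.
    destroyed = []
    with managed_groups(tuple, destroyed.append, [[0, 1], [2, 3], [0, 2]]) as pgs:
        print("created:", pgs)
    print("destroyed:", destroyed)
```

With PyTorch, the callables would presumably be `lambda ranks: torch.distributed.new_group(ranks=ranks)` and `torch.distributed.destroy_process_group`, with every rank entering the context in the same order.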

Differential Revision: D31291835

Summary:
fix runtime error when reporting pair collective
  - fix typo when recording latency

Differential Revision: D31291699

fbshipit-source-id: d51f1c7cc981efba06d374b0e3bc6b1237273cfa
Summary:
[This is for debugging purposes only, **DO NOT MERGE**]

The error occurs when more than two process groups (PGs) have been created and they are not explicitly destroyed at exit.

Sample reproducer: `mpirun -np 16 <any MPI variables> python comms.py --backend ucc --device cuda --collective all_to_allv --e 1M --collective-pair all_to_allv --collective-pair-size 8K --overlap-pair-pgs 1 --pair 1`

Differential Revision: D31291835

fbshipit-source-id: 9f298d44d9e1e596f244d42e37ea31f3cdf6ef42
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D31291835

@kingchc kingchc added the bug and DO NOT MERGE labels Sep 29, 2021
@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 30, 2021
@kingchc kingchc closed this Dec 6, 2021