Conversation

kwen2501
Contributor

@kwen2501 kwen2501 commented Nov 27, 2023

@pytorch-bot pytorch-bot bot added the release notes: distributed (c10d) release notes category label Nov 27, 2023

pytorch-bot bot commented Nov 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114597

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3eb9cc0 with merge base 624f202 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@malfet malfet left a comment

LGTM, but can we add a test of some sort? (Are there distributed C++ tests that just build?)

Please note that this flag would be set to false for extensions, which probably defeats the purpose.
Please consider depending on the macros defined in, say, https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cuda/CUDAConfig.h.in or in https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Config.h.in (the former is preferable, IMO).
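
To illustrate the suggestion, here is a minimal sketch of guarding backend-specific code with a macro that comes from a config header rather than from a compiler flag. The macro name `EXAMPLE_HAS_NCCL` and the function `trace_backend` are hypothetical and are not taken from this PR or from the PyTorch headers; the "config header" is simulated inline so that the snippet compiles on its own.

```cpp
#include <string>

// Hypothetical stand-in for a generated config header (for example, one
// produced from a *.h.in template and installed next to the public headers).
// Because the value lives in a header, an out-of-tree extension that includes
// it sees the same value the library was built with.
#ifndef EXAMPLE_HAS_NCCL
#define EXAMPLE_HAS_NCCL 0  // assumption: this build has no NCCL support
#endif

#if EXAMPLE_HAS_NCCL
// Only compiled when the config header reports NCCL support, so builds
// without NCCL never reference NCCL-only symbols.
inline std::string trace_backend() { return "nccl"; }
#else
// Fallback for builds (or third-party backends) without NCCL.
inline std::string trace_backend() { return "none"; }
#endif
```

A flag that is passed only on the command line while building the library is not visible to an extension compiled later, which is why a macro carried by an installed header may be the more robust guard.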

@kwen2501
Contributor Author

Thanks for the pointers @malfet !
The DebugInfoWriter and NCCLTraceBuffer were recently added for use by ProcessGroupNCCL only, and they were moved out to TraceUtils.h to keep ProcessGroupNCCL.cpp clean. (TraceUtils.h itself is primarily used by ProcessGroupNCCL only as well.) Indeed, there would be some more work if we were to generalize the infra in TraceUtils.h for other backends' use.

@kwen2501
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 27, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@albanD albanD added oncall: distributed Add this issue/PR to distributed oncall triage queue and removed module: distributed labels Dec 8, 2023
@github-actions github-actions bot deleted the fix_114575 branch February 19, 2024 02:00

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category

Development

Successfully merging this pull request may close these issues.

Third-party devices implement can't find symbols when include TraceUtils.h

4 participants