
[Kineto][NCCL][5/n] Populate in/out split size info for all_to_all from CPU to CUDA kernel #112308

Closed
wants to merge 1 commit

Conversation

yoyoyocmu (Contributor)

Summary: This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Test Plan:
Trace example:

https://pxl.cl/3H2nd

Differential Revision: D50762093
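
For orientation, here is a minimal sketch (not the PR's actual code) of how per-collective split sizes could be serialized into the profiler's string-keyed metadata map, in the style of the saveNcclMeta excerpt reviewed below; the helper names and key strings are hypothetical.

```cpp
// Hypothetical sketch: serialize per-rank split sizes into the profiler's
// string-keyed metadata map. Names (recordSplitSizes, formatSplitSizes,
// "In split size", "Out split size") are illustrative, not the PR's.
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Render a list of split sizes as a JSON-style array string, e.g. "[1, 2, 3]".
static std::string formatSplitSizes(const std::vector<int64_t>& sizes) {
  std::ostringstream oss;
  oss << '[';
  for (size_t i = 0; i < sizes.size(); ++i) {
    if (i != 0) {
      oss << ", ";
    }
    oss << sizes[i];
  }
  oss << ']';
  return oss.str();
}

// Record in/out split sizes only when the collective actually provides them
// (e.g. all_to_all); other collectives leave these fields absent.
static void recordSplitSizes(
    std::unordered_map<std::string, std::string>& map,
    const std::vector<int64_t>& inSplitSizes,
    const std::vector<int64_t>& outSplitSizes) {
  if (!inSplitSizes.empty()) {
    map.emplace("In split size", formatSplitSizes(inSplitSizes));
  }
  if (!outSplitSizes.empty()) {
    map.emplace("Out split size", formatSplitSizes(outSplitSizes));
  }
}
```

In the PR itself this logic sits inside saveNcclMeta, alongside the kInMsgSize/kOutMsgSize fields shown in the excerpt below.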

pytorch-bot bot commented Oct 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112308

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b53fc36 with merge base a50f6d3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D50762093

@@ -365,12 +366,18 @@ std::unordered_map<std::string, std::string> saveNcclMeta(
      kDtype, fmt::format("\"{}\"", c10::toString(debugInfo->getDType())));
  map.emplace(kInMsgSize, std::to_string(debugInfo->getInMessageSize()));
  map.emplace(kOutMsgSize, std::to_string(debugInfo->getOutMessageSize()));
  map.emplace(
  auto& inSplitSizes = debugInfo->getInputSplitSizes();
  if (!inSplitSizes.empty() && inSplitSizes.size() <= kTruncatLength) {
Member

Should we still record the first kTruncatLength in the list?

yoyoyocmu (Contributor, Author) commented Oct 30, 2023

Good point. I was debating whether to keep the first 1 or 30 elements in the list, or to skip recording entirely when the length is too long. Any recommended rule to follow?

I updated the code to record the first element when the total length is > 30.

Member

It would make sense to record the first 30 elements and then append a "..." when the length is greater than 30. Then we can claim to show all elements up to the first 30.

Member

We can address this in the future, if someone is interested.
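
To make the suggestion above concrete, here is a minimal sketch of the proposed truncation, assuming kTruncatLength is 30 as discussed in this thread; it is illustrative only, not the code that landed in the PR.

```cpp
// Hypothetical sketch of the truncation suggested above: show at most the
// first kTruncatLength entries and append "..." when the list is longer.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed value, per the discussion in this thread.
constexpr size_t kTruncatLength = 30;

static std::string formatTruncatedSplitSizes(const std::vector<int64_t>& sizes) {
  std::ostringstream oss;
  oss << '[';
  const size_t shown = std::min(sizes.size(), kTruncatLength);
  for (size_t i = 0; i < shown; ++i) {
    if (i != 0) {
      oss << ", ";
    }
    oss << sizes[i];
  }
  if (sizes.size() > kTruncatLength) {
    oss << ", ..."; // marker: the recorded list was truncated
  }
  oss << ']';
  return oss.str();
}
```

Recording a bounded prefix keeps the trace metadata small while still letting a reader distinguish balanced from skewed all_to_all splits.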

yoyoyocmu added a commit to yoyoyocmu/kineto that referenced this pull request on Oct 30, 2023 (pytorch#822)

Summary:
X-link: pytorch/pytorch#112308

This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Reviewed By: idning

Differential Revision: D50762093
yoyoyocmu added a commit to yoyoyocmu/pytorch that referenced this pull request on Oct 30, 2023: [Kineto][NCCL][5/n] Populate in/out split size info for all_to_all from CPU to CUDA kernel (pytorch#112308)

Summary:
X-link: pytorch/kineto#822

This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Test Plan:
**Trace example**:
- For non-all_to_all collective functions: https://fburl.com/perfdoctor/4nobsu15 and https://pxl.cl/3GNVb
- For all_to_all: https://fburl.com/perfdoctor/f418goys and https://pxl.cl/3H2nd

Reviewed By: idning

Differential Revision: D50762093
facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D50762093

aaronenyeshi (Member) left a comment

LGTM!

yoyoyocmu added a commit to yoyoyocmu/kineto that referenced this pull request on Oct 30, 2023 (pytorch#822)

Summary:
X-link: pytorch/pytorch#112308

This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Reviewed By: aaronenyeshi, idning

Differential Revision: D50762093
yoyoyocmu added a commit to yoyoyocmu/pytorch that referenced this pull request on Oct 30, 2023: [Kineto][NCCL][5/n] Populate in/out split size info for all_to_all from CPU to CUDA kernel (pytorch#112308)

Summary:
X-link: pytorch/kineto#822

This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Test Plan:
**Trace example**:
- For non-all_to_all collective functions: https://fburl.com/perfdoctor/4nobsu15 and https://pxl.cl/3GNVb
- For all_to_all: https://fburl.com/perfdoctor/f418goys and https://pxl.cl/3H2nd

Reviewed By: aaronenyeshi, idning

Differential Revision: D50762093
facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D50762093

yoyoyocmu added a commit to yoyoyocmu/kineto that referenced this pull request on Nov 6, 2023 (pytorch#822)

Summary:
X-link: pytorch/pytorch#112308

This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Reviewed By: aaronenyeshi, idning

Differential Revision: D50762093
facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D50762093

facebook-github-bot pushed a commit to pytorch/kineto that referenced this pull request on Nov 7, 2023 (#822)

Summary:
X-link: pytorch/pytorch#112308

Pull Request resolved: #822

This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

bypass-github-pytorch-ci-checks

Reviewed By: aaronenyeshi, idning

Differential Revision: D50762093

fbshipit-source-id: a118b9e2623ca0ac6b5f9e30cd554666a4c01a12
facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Nov 7, 2023
pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request on Nov 14, 2023: [Kineto][NCCL][5/n] Populate in/out split size info for all_to_all from CPU to CUDA kernel (pytorch#112308)

Summary: This diff populates the all_to_all input and output split sizes from the CPU op to the GPU kernel when valid.

Test Plan:
**Trace example**:
- For non-all_to_all collective functions: https://fburl.com/perfdoctor/4nobsu15 and https://pxl.cl/3GNVb
- For all_to_all: https://fburl.com/perfdoctor/f418goys and https://pxl.cl/3H2nd

Differential Revision: D50762093

Pull Request resolved: pytorch#112308
Approved by: https://github.com/aaronenyeshi
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, Merged, release notes: profiler