fix gloo cuda sparse_allreduce dispatch #111485
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111485
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 02946df with merge base 971f67c. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase -s
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased.
    self.rank, self.world_size, num_inputs=1
)
for (inputs, outputs) in tests:
    tensors = inputs[-1].clone().cuda()
N00b question: does this mean CUDA tensors can work on gloo? So we first move tensors from GPU to CPU, do the communication, and then move them back to CUDA?
gloo supports all_reduce and broadcast for CUDA tensors (https://pytorch.org/docs/master/distributed.html#backends). In the ProcessGroupGloo implementation of all_reduce, it copies the CUDA tensors to pinned CPU tensors and then performs the allreduce. So pg_gloo.all_reduce(cpu_tensor) and pg_gloo.all_reduce(cuda_tensor) are both supported.
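To make the behavior above concrete, here is a minimal sketch of calling all_reduce through the public API on a gloo group. It assumes a torch build with distributed support; it uses a single-process group (world_size=1) so it runs without multiple ranks, and a CPU tensor so it runs without a GPU (swap in `.cuda()` to exercise the pinned-CPU copy path the comment describes).

```python
import tempfile
import torch
import torch.distributed as dist

# Single-process gloo group; a file:// init method avoids needing
# environment variables or a rendezvous server for this sketch.
init_file = tempfile.NamedTemporaryFile(delete=False)
dist.init_process_group(
    backend="gloo",
    init_method=f"file://{init_file.name}",
    rank=0,
    world_size=1,
)

t = torch.ones(4)   # use t.cuda() on a GPU machine; gloo copies CUDA
                    # tensors to pinned CPU memory for the collective
dist.all_reduce(t)  # public API -> dispatcher -> gloo backend
                    # with world_size=1, the summed result equals the input
dist.destroy_process_group()
```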
Just have a n00b question for my own learning, otherwise it looks good to me.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch#111422

allreduce_sparse_cuda gets dispatched to allreduce_sparse, which doesn't exist for gloo. However, gloo has an existing implementation, so this just fixes the dispatching to point at it.

The reason CI didn't catch this is that we were calling the backend directly. Added a test which calls the public API (dist.XYZ) and goes through the dispatcher.

Pull Request resolved: pytorch#111485
Approved by: https://github.com/fduwjj
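The distinction the description draws — calling the backend object directly versus going through the public API — can be sketched as follows. This is a hedged illustration, not the PR's actual test: it uses a single-process gloo group and a CPU sparse COO tensor (the bug itself involved the CUDA sparse path, which needs a GPU to hit).

```python
import tempfile
import torch
import torch.distributed as dist

# Single-process gloo group for illustration.
init_file = tempfile.NamedTemporaryFile(delete=False)
dist.init_process_group(
    backend="gloo",
    init_method=f"file://{init_file.name}",
    rank=0,
    world_size=1,
)

# A sparse COO tensor; on a GPU machine, sp = sp.cuda() would route
# through allreduce_sparse_cuda, the dispatch entry this PR fixes.
indices = torch.tensor([[0, 2]])
values = torch.tensor([1.0, 2.0])
sp = torch.sparse_coo_tensor(indices, values, (4,))

# Public API -> dispatcher -> gloo sparse allreduce. Calling the
# ProcessGroup object directly would bypass the dispatcher and
# would not have caught the missing dispatch entry.
dist.all_reduce(sp)

dist.destroy_process_group()
```

With world_size=1 the summed result equals the input, so the densified tensor should still be [1, 0, 2, 0].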