[GPU] Use NCCL user buffers for collective permute and all-to-all #8874

trevor-m · 2024-01-26T22:24:52Z

This PR enables XLA to take advantage of NCCL user buffers for ncclSend/ncclRecv when `--xla_gpu_enable_nccl_user_buffers=true` is used. Requires NCCL 2.20.

kamaljeeti · 2024-02-15T05:54:14Z

Hi @cheshire, could you take a look at this? Thanks.

kamaljeeti · 2024-03-12T05:51:44Z

Hi @cheshire, an internal CI build is failing. Could you take a look? Thanks.

…to-all

Imported from GitHub PR openxla/xla#8874

This PR enables XLA to take advantage of NCCL user buffers for ncclSend/ncclRecv when `--xla_gpu_enable_nccl_user_buffers=true` is used. Requires NCCL 2.20

Copybara import of the project:

-- 8de2786d3242c76bed385235b5655156ee187e5f by Trevor Morris <tmorris@nvidia.com>:
Use NCCL user buffers for ncclSend/ncclRecv ops

-- 56ceecb1b7fc1606dd00b514bbdb7d039e787b8c by Trevor Morris <tmorris@nvidia.com>:
Include memory space in buffers for collective permute and send/recv

-- 64711757e48b619b9e2d322fc49714a94194d8f1 by Trevor Morris <tmorris@nvidia.com>:
Don't offload send, recv

Merging this change closes #8874

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#8874 from trevor-m:p2p-user-buffers 64711757e48b619b9e2d322fc49714a94194d8f1
PiperOrigin-RevId: 615104094

akuegel · 2024-03-14T11:42:50Z

xla/service/gpu/gpu_memory_space_assignment.h

+        // opcode or async wrapped opcode is in kSupportedOpcodes.
+        if (kSupportedOpcodes->contains(alias->instruction()->opcode()) ||
+            (alias->instruction()->opcode() == HloOpcode::kAsyncStart ||
+             alias->instruction()->opcode() == HloOpcode::kAsyncDone) &&


This causes a warning which we treat as error:

error: '&&' within '||' [-Werror,-Wlogical-op-parentheses]

cheshire · Mar 15, 2024

CC @ddunl for reconciling warnings (given that we use Clang in both places now, why can't we have an identical set of warnings?)

trevor-m · Mar 15, 2024

Thanks for letting me know, I fixed the conditional.
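For reference, Clang reports `-Wlogical-op-parentheses` when `&&` appears inside `||` without explicit grouping, because `&&` binds tighter and the written layout can suggest a different meaning. Below is a minimal sketch of a parenthesized form. It uses hypothetical stand-in types (`Opcode`, `UsesCollectiveMemory`) rather than the actual XLA classes, and since the quoted hunk is cut off after `&&`, the wrapped-opcode check is an assumption based on the code comment, not necessarily what the fix commit does.

```cpp
#include <set>

// Hypothetical stand-in for HloOpcode; not the real XLA enum.
enum class Opcode { kCollectivePermute, kAllToAll, kAsyncStart, kAsyncDone, kAdd };

// Returns true when `op` itself is supported, or when `op` is an async
// start/done wrapper around a supported `wrapped` opcode.
// `wrapped` is an assumed parameter; the original condition is truncated.
bool UsesCollectiveMemory(Opcode op, Opcode wrapped,
                          const std::set<Opcode>& supported) {
  const bool is_async =
      op == Opcode::kAsyncStart || op == Opcode::kAsyncDone;
  // The explicit parentheses spell out `a || (b && c)`, which is how the
  // unparenthesized expression already parses, and silence the warning.
  return supported.count(op) > 0 ||
         (is_async && supported.count(wrapped) > 0);
}
```

Naming the `is_async` sub-condition keeps each line to a single kind of operator, so the grouping is visible without counting parentheses.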

…to-all

Imported from GitHub PR openxla/xla#8874

This PR enables XLA to take advantage of NCCL user buffers for ncclSend/ncclRecv when `--xla_gpu_enable_nccl_user_buffers=true` is used. Requires NCCL 2.20

Copybara import of the project:

-- 8de2786d3242c76bed385235b5655156ee187e5f by Trevor Morris <tmorris@nvidia.com>:
Use NCCL user buffers for ncclSend/ncclRecv ops

-- 56ceecb1b7fc1606dd00b514bbdb7d039e787b8c by Trevor Morris <tmorris@nvidia.com>:
Include memory space in buffers for collective permute and send/recv

-- b3e776cb8486f2952dcb60a753dcea3c11da4d87 by Trevor Morris <tmorris@nvidia.com>:
Don't offload send, recv

Merging this change closes #8874

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#8874 from trevor-m:p2p-user-buffers b3e776cb8486f2952dcb60a753dcea3c11da4d87
PiperOrigin-RevId: 615104094

…to-all

Imported from GitHub PR openxla/xla#8874

This PR enables XLA to take advantage of NCCL user buffers for ncclSend/ncclRecv when `--xla_gpu_enable_nccl_user_buffers=true` is used. Requires NCCL 2.20

Copybara import of the project:

-- 98acdf27d4eba6b19652a76d3f7dcd6630349fc5 by Trevor Morris <tmorris@nvidia.com>:
Use NCCL user buffers for ncclSend/ncclRecv ops

-- bcc289b49bcf2086b50a86a2381ea1b80acd3dd2 by Trevor Morris <tmorris@nvidia.com>:
Include memory space in buffers for collective permute and send/recv

-- 4a83d8906b6b5e305dad23fc1d8b9a5069637279 by Trevor Morris <tmorris@nvidia.com>:
Don't offload send, recv

-- 0083a418c4ab119ed5a0eb061113104980476943 by Trevor Morris <tmorris@nvidia.com>:
Fix conditional

Merging this change closes #8874

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#8874 from trevor-m:p2p-user-buffers 0083a418c4ab119ed5a0eb061113104980476943
PiperOrigin-RevId: 615104094

…to-all

Imported from GitHub PR openxla/xla#8874

This PR enables XLA to take advantage of NCCL user buffers for ncclSend/ncclRecv when `--xla_gpu_enable_nccl_user_buffers=true` is used. Requires NCCL 2.20

Copybara import of the project:

-- 98acdf27d4eba6b19652a76d3f7dcd6630349fc5 by Trevor Morris <tmorris@nvidia.com>:
Use NCCL user buffers for ncclSend/ncclRecv ops

-- bcc289b49bcf2086b50a86a2381ea1b80acd3dd2 by Trevor Morris <tmorris@nvidia.com>:
Include memory space in buffers for collective permute and send/recv

-- 4a83d8906b6b5e305dad23fc1d8b9a5069637279 by Trevor Morris <tmorris@nvidia.com>:
Don't offload send, recv

-- 0083a418c4ab119ed5a0eb061113104980476943 by Trevor Morris <tmorris@nvidia.com>:
Fix conditional

Merging this change closes #8874

PiperOrigin-RevId: 617140675

cheshire · 2024-03-21T20:15:55Z

I'm actually seeing crashes from this: it checks the layout on recv, but the recv shape is a tuple, which doesn't have a layout.
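To illustrate the failure mode described here, the following is a minimal sketch of a layout query that tolerates tuple shapes by recursing into their elements instead of reading a layout from the tuple itself. The `Shape` struct and `AnyLeafInMemorySpace` are hypothetical stand-ins written for this example. They are not the XLA `Shape` API, and this is not necessarily how the crash was or should be fixed.

```cpp
#include <vector>

// Hypothetical stand-in for a shape: either an array that may carry a layout
// (reduced here to a memory space id), or a tuple of nested shapes.
struct Shape {
  bool is_tuple = false;
  bool has_layout = false;      // meaningful only for array shapes
  int memory_space = 0;         // meaningful only when has_layout is true
  std::vector<Shape> elements;  // used only when is_tuple is true
};

// Returns true if any array leaf of `shape` is placed in memory space `space`.
// A tuple is never asked for a layout; its elements are visited instead.
bool AnyLeafInMemorySpace(const Shape& shape, int space) {
  if (shape.is_tuple) {
    for (const Shape& element : shape.elements) {
      if (AnyLeafInMemorySpace(element, space)) return true;
    }
    return false;
  }
  return shape.has_layout && shape.memory_space == space;
}
```

Guarding on the tuple case first means a recv-style result, modeled here as a tuple holding a data array plus bookkeeping elements, is handled without touching a layout that does not exist.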

…nd all-to-all

Imported from GitHub PR openxla#8874

This PR enables XLA to take advantage of NCCL user buffers for ncclSend/ncclRecv when `--xla_gpu_enable_nccl_user_buffers=true` is used. Requires NCCL 2.20

Copybara import of the project:

-- 98acdf2 by Trevor Morris <tmorris@nvidia.com>:
Use NCCL user buffers for ncclSend/ncclRecv ops

-- bcc289b by Trevor Morris <tmorris@nvidia.com>:
Include memory space in buffers for collective permute and send/recv

-- 4a83d89 by Trevor Morris <tmorris@nvidia.com>:
Don't offload send, recv

-- 0083a41 by Trevor Morris <tmorris@nvidia.com>:
Fix conditional

Merging this change closes openxla#8874

COPYBARA_INTEGRATE_REVIEW=openxla#8874 from trevor-m:p2p-user-buffers 0083a41
PiperOrigin-RevId: 617140675

github-actions bot assigned kamaljeeti and xla-rotation Jan 26, 2024

kamaljeeti requested review from cheshire and jvstokes January 29, 2024 05:12

trevor-m changed the title ~~WIP: [GPU] Use NCCL user buffers for ncclSend/ncclRecv ops (Requires NCCL 2.20)~~ WIP: [GPU] Use NCCL user buffers for ncclSend/ncclRecv ops Jan 29, 2024

trevor-m force-pushed the p2p-user-buffers branch from dfe2c44 to a13ff43 on February 6, 2024 19:04

trevor-m changed the title ~~WIP: [GPU] Use NCCL user buffers for ncclSend/ncclRecv ops~~ WIP: [GPU] Use NCCL user buffers for collective permute and all-to-all Feb 9, 2024

trevor-m changed the title ~~WIP: [GPU] Use NCCL user buffers for collective permute and all-to-all~~ [GPU] Use NCCL user buffers for collective permute and all-to-all Feb 15, 2024

cheshire approved these changes Mar 5, 2024

trevor-m force-pushed the p2p-user-buffers branch 2 times, most recently from 4cb441a to 6471175 on March 12, 2024 17:32

copybara-service bot mentioned this pull request Mar 12, 2024

PR #8874: [GPU] Use NCCL user buffers for collective permute and all-to-all tensorflow/tensorflow#63537

Merged

trevor-m force-pushed the p2p-user-buffers branch from 6471175 to b3e776c on March 12, 2024 18:14

akuegel reviewed Mar 14, 2024

ddunl approved these changes Mar 15, 2024

trevor-m added 4 commits March 18, 2024 14:06

Use NCCL user buffers for ncclSend/ncclRecv ops

98acdf2

Include memory space in buffers for collective permute and send/recv

bcc289b

Don't offload send, recv

4a83d89

Fix conditional

0083a41

trevor-m force-pushed the p2p-user-buffers branch from cc2522d to 0083a41 on March 18, 2024 21:07

ddunl approved these changes Mar 18, 2024

copybara-service bot closed this in 49e6dba Mar 19, 2024

cheshire Mar 15, 2024

trevor-m Mar 15, 2024
