
[Traceable FSDP2] Add all_gather_into_tensor out variant #126334

Closed
yf225 wants to merge 4 commits

Conversation

yf225
Contributor

@yf225 yf225 commented May 15, 2024

This PR adds torch.ops._c10d_functional.all_gather_into_tensor_out.

It's important for tracing FSDP2: FSDP2 pre-allocates the AllGather output buffer, makes the input buffer an alias of it, and expects both buffers to be reused to keep memory usage low. If we don't preserve this behavior and instead functionalize the AllGather op, the op will allocate a brand-new output buffer (instead of reusing the pre-allocated one), significantly increasing memory usage.

The expectation is that an FX pass in the Inductor post-grad stage will "re-inplace" the AllGather op by switching it to this out variant, so the API is not intended to be called directly by users.
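As an illustration of the pre-allocation and aliasing pattern described above, here is a minimal sketch (not taken from this PR) of calling the new out variant against a pre-allocated buffer. The group setup, the `group_name` value, and the tensor sizes are assumptions; only the op schema comes from the registration added in this PR.

```python
import torch
import torch.distributed as dist

# Assumptions: one GPU per rank, NCCL backend, and the process group is registered
# under `group_name` (the exact name is an assumption, not from the PR).
dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank)
group_name = "default"
shard_numel = 1024

# Pre-allocate the all-gather output once; the local input is a view (alias) of this
# rank's slice of that buffer, so no separate input allocation is needed.
out = torch.empty(world_size * shard_numel, device="cuda")
inp = out.narrow(0, rank * shard_numel, shard_numel)
inp.normal_()  # fill this rank's shard in place

# Gather every rank's shard directly into the pre-allocated buffer instead of letting
# the functionalized op allocate a brand-new output tensor. Schema from this PR:
# all_gather_into_tensor_out(Tensor input, int group_size, str group_name, *, Tensor(a!) out)
torch.ops._c10d_functional.all_gather_into_tensor_out(inp, world_size, group_name, out=out)
torch.ops._c10d_functional.wait_tensor(out)  # the collective runs asynchronously
```

Because `inp` is just a view into `out`, reusing `out` across iterations avoids the extra allocation a purely functional AllGather would introduce, which is the memory behavior the re-inplacing pass is meant to preserve.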

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @chauhang @d4l3k

@yf225 yf225 requested a review from yifuwang May 15, 2024 21:45
@pytorch-bot pytorch-bot bot added the oncall: distributed and release notes: distributed (c10d) labels May 15, 2024

pytorch-bot bot commented May 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126334

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6722c5c with merge base c312cd8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@@ -321,6 +334,13 @@ TORCH_LIBRARY(_c10d_functional, m) {
c10::DispatchKey::CompositeExplicitAutograd, ::all_reduce_coalesced_),
{at::Tag::pt2_compliant_tag});

m.def(
"all_gather_into_tensor_(Tensor(a!) output, Tensor input, int group_size, str group_name) -> Tensor(a!)",
Contributor

I would really love this to be an out-variant API. From an operator perspective this is not an in-place allgather; it's an out-variant op, so we should follow the ATen naming convention and make it an actual out variant.

Contributor Author

Sounds great! Updated.

@yf225 yf225 changed the title from Add inplace all_gather_into_tensor to [Compile FSDP2] Add inplace all_gather_into_tensor May 15, 2024
@yf225 yf225 changed the title from [Compile FSDP2] Add inplace all_gather_into_tensor to [Compile FSDP2] Add all_gather_into_tensor out variant May 15, 2024
@yf225 yf225 changed the title from [Compile FSDP2] Add all_gather_into_tensor out variant to [Traceable FSDP2] Add all_gather_into_tensor out variant May 15, 2024
@@ -321,6 +334,13 @@ TORCH_LIBRARY(_c10d_functional, m) {
c10::DispatchKey::CompositeExplicitAutograd, ::all_reduce_coalesced_),
{at::Tag::pt2_compliant_tag});

m.def(
"all_gather_into_tensor_out(Tensor input, int group_size, str group_name, *, Tensor(a!) out) -> Tensor(a!)",
Contributor

Nice! We can establish the convention that:

  • Collectives that modify the input in place are postfixed with _
  • Out variants of collectives are postfixed with _out, and the out argument is keyword-only (as sketched below)
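A minimal sketch (not part of this PR) of how the two spellings differ at the call site, assuming torch.distributed is already initialized and the group is registered as `group_name` with `group_size` ranks. The `all_reduce_` call is only an assumed example of the in-place convention; the `all_gather_into_tensor_out` schema is the one added in this PR.

```python
import torch

group_name = "default"  # assumption: registered name of an initialized process group
group_size = 8          # assumption: number of ranks in that group

t = torch.randn(1024, device="cuda")
gathered = torch.empty(group_size * t.numel(), device="cuda")  # pre-allocated out buffer

# In-place collective, postfixed with "_": mutates its input argument.
# (Signature assumed: all_reduce_(Tensor(a!) input, str reduce_op, str group_name))
torch.ops._c10d_functional.all_reduce_(t, "sum", group_name)

# Out variant, postfixed with "_out": the result is written to the keyword-only `out`.
torch.ops._c10d_functional.all_gather_into_tensor_out(t, group_size, group_name, out=gathered)

# Both collectives run asynchronously; wait before reading the results.
torch.ops._c10d_functional.wait_tensor(t)
torch.ops._c10d_functional.wait_tensor(gathered)
```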

Contributor

@yifuwang yifuwang left a comment

Looks great!

Contributor

@wanchaol wanchaol left a comment

lgtm!

@yf225
Contributor Author

yf225 commented May 16, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label May 16, 2024
@yf225 yf225 added the topic: not user facing label May 16, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
[Traceable FSDP2] Add all_gather_into_tensor out variant (pytorch#126334)

Pull Request resolved: pytorch#126334
Approved by: https://github.com/yifuwang, https://github.com/wanchaol
Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • Merged
  • oncall: distributed (Add this issue/PR to distributed oncall triage queue)
  • release notes: distributed (c10d) (release notes category)
  • topic: not user facing (topic category)