[dtensor][4/n] don't use make_fx for strategy propagation #108262
Conversation
Stack from ghstack (oldest at bottom):

We were using make_fx for strategy-based propagation so that we could get a graph and the shape-related metadata, but that is overkill for the sharding propagation purpose. This change refactors strategy propagation to remove the graph-based path; instead we use the op to index directly into the strategy functions. We also use a fake shape prop instead of relying on fx tracing for shape/stride propagation. A possible future decomposed propagation would exercise a different codepath.

NOTE that this also greatly reduces latency:
1. First-time DTensor operations when populating the cache: the first iter becomes fast again.
2. test_dtensor_ops.py runtime: the whole test suite finishes within 2-3 minutes again.
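For illustration, here is a minimal sketch of the dispatch scheme the description refers to: a plain registry that maps each operator to its strategy function, indexed directly during sharding propagation instead of building and tracing an fx graph. The registry name, decorator, and `propagate` helper are hypothetical, not the actual DTensor internals:

```python
from typing import Callable, Dict

import torch

# Hypothetical registry: operator overload -> strategy function.
op_strategy_funcs: Dict[torch._ops.OpOverload, Callable] = {}

def register_op_strategy(op: torch._ops.OpOverload):
    """Decorator that records the strategy function for one op."""
    def wrapper(strategy_func: Callable) -> Callable:
        op_strategy_funcs[op] = strategy_func
        return strategy_func
    return wrapper

@register_op_strategy(torch.ops.aten.mm.default)
def mm_strategy(op_schema):
    # Compute and return sharding strategies for aten.mm here.
    ...

def propagate(op_schema):
    # A direct dict lookup replaces tracing a graph with make_fx.
    strategy_func = op_strategy_funcs[op_schema.op]
    return strategy_func(op_schema)
```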
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108262
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 5bdc291 with merge base 6dc56d3: UNSTABLE - the following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
        return None

    def _wrap_output_spec_tensor_meta(
        self, output_spec: OutputSpecType, output_tensor_meta: object
```
Does this mean output_spec is optional?
I think it probably won't be None. I'll submit a follow-up PR to add an assertion here instead.
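A rough sketch of what the follow-up assertion described in that reply might look like. The body here (the `tensor_meta` attribute and the multi-output handling) is an assumption based only on the signature shown in the diff, not the actual implementation:

```python
def _wrap_output_spec_tensor_meta(self, output_spec, output_tensor_meta):
    # Assert instead of silently accepting a missing spec.
    assert output_spec is not None, "expected a non-None output_spec"
    if isinstance(output_spec, (list, tuple)):
        # Multi-output op: pair each spec with its tensor metadata.
        for spec, meta in zip(output_spec, output_tensor_meta):
            spec.tensor_meta = meta
    else:
        output_spec.tensor_meta = output_tensor_meta
```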
```python
        with FakeTensorMode():
            fake_args = op_schema.gen_fake_args()
            fake_kwargs = op_schema.gen_fake_kwargs()
            fake_out = op_schema.op(*fake_args, **fake_kwargs)
```
This will help the view, split, and chunk ops a lot.
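For context, a self-contained illustration of the fake shape prop idea the diff above relies on: under `FakeTensorMode`, ops run on tensors that carry shape/stride/dtype but allocate no storage, so output metadata for shape-sensitive ops like view/split/chunk is computed cheaply without tracing. This is a standalone sketch, not the DTensor code itself:

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    # No real memory is allocated for these tensors.
    x = torch.empty(4, 8)
    y = x.view(2, 16)
    chunks = torch.chunk(x, 2, dim=0)

# Shape and stride metadata are still fully propagated.
print(y.shape)             # torch.Size([2, 16])
print(chunks[0].stride())  # (8, 1)
```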
LGTM
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.