
Fix inductor <> ddp_optimizer issue #108081

Closed
wants to merge 4 commits

Conversation

@bdhirsh (Contributor) commented Aug 28, 2023

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305 @shunting314 this is a tentative fix for #107362. More discussion at #107362 (comment).

@wconstab pointed out that inductor found a graph with 6 input mutations and only 1 output, and seemed to be (incorrectly) chopping off the first "6" outputs from the graph (even though there is only 1). It looks like this is because:

(1) AOTAutograd has special handling for input mutations in inference vs. training graphs. In a training graph, whenever AOTAutograd sees an input mutation, it will add an extra output to the graph, corresponding to the updated input (and then at runtime, it will grab the updated input, and perform the actual mutation outside of the graph).

In inference, AOTAutograd is smarter and can leave the input mutations directly in the graph for inductor to optimize (doing this in training is harder). In inference, AOTAutograd will not add any extra graph outputs for input mutations.

It looks like inductor was unconditionally assuming that input mutations counted as extra outputs in the graph, which is wrong for the inference case.
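
To make the two cases concrete, here is a minimal sketch (the function below and the paraphrased graphs in its comments are illustrative only, not captured from a real run):

```
import torch

def f(x):
    x.mul_(2)        # input mutation
    return x + 1     # the single "real" output

# Training-style tracing (autograd enabled): AOTAutograd returns the updated
# input as an EXTRA graph output and replays the mutation outside the graph,
# roughly:
#     def forward(arg0):
#         mul = arg0 * 2
#         add = mul + 1
#         return (mul, add)    # 2 outputs: updated input + real output
#
# Inference-style tracing (e.g. under torch.no_grad()): the mutation can stay
# in the graph as a copy_ for inductor to optimize, and no extra output is
# added, roughly:
#     def forward(arg0):
#         mul = arg0 * 2
#         copy_ = arg0.copy_(mul)
#         add = mul + 1
#         return (add,)        # 1 output only
compiled_f = torch.compile(f)
with torch.no_grad():
    print(compiled_f(torch.ones(3)))   # tensor([3., 3., 3.])
```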


@pytorch-bot (bot) commented Aug 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108081

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Merge Blocking SEV

There is 1 active merge blocking SEV.

If you must merge, use @pytorchbot merge -f.

✅ No Failures

As of commit c33fed0 with merge base a16b0aa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bdhirsh added a commit that referenced this pull request Aug 28, 2023
ghstack-source-id: 3404582932e2237cabc5c88c46eb896a11c81a0e
Pull Request resolved: #108081
@wconstab (Contributor) left a comment

thanks for PR'ing this fix! would be good to add a test, i can turn the orig PR repro into one in a follow up.

@bdhirsh (Contributor, Author) commented Aug 28, 2023

Sounds good - let me try to make a simpler test (should be doable just by using inductor + inference graph + input mutations + layout optimization)

@bdhirsh (Contributor, Author) commented Aug 28, 2023

Oh I'll also add an assertion that we don't go beyond the number of graph outputs (which would also have caught this issue instead of silently not preserving strides for certain outputs)
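
To make the shape of that check concrete, a minimal sketch (the variable names and values are hypothetical, chosen to mirror the 6-mutations / 1-output case above; this is not inductor's actual code):

```
# Hypothetical values mirroring the reported case: an inference graph with
# 1 real output, where inductor wrongly assumed 6 extra "mutation" outputs.
num_graph_outputs = 1
output_indices_to_preserve = range(6)

# The proposed guard: never try to preserve strides for an output index that
# does not exist. With the values above this assert fires, instead of
# silently skipping stride preservation for some outputs.
assert all(idx < num_graph_outputs for idx in output_indices_to_preserve), (
    "trying to preserve strides for a graph output index that does not exist"
)
```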

@bdhirsh (Contributor, Author) commented Aug 28, 2023

Hmm I tried creating a repro by taking the example from the linked issue, removing the DDP code, and manually inserting a graph break at the same place that the ddp optimizer would. I confirmed that between the original repro and my modified version the AOTAutograd graphs that we send to inductor are identical, but for some reason I can't trigger the issue.

I'll try inspecting the generated inductor code and see if it's any different.
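
For reference, this is the kind of manual graph-break experiment described above; a minimal sketch, where the toy module and the split point are made up rather than taken from the original repro:

```
import torch
import torch._dynamo

class TwoStage(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(128, 128, 3, padding=1)
        self.conv2 = torch.nn.Conv2d(128, 128, 3, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        # Force a graph break roughly where the ddp optimizer would split the
        # module, so each half is compiled as a separate inductor graph.
        torch._dynamo.graph_break()
        return self.conv2(x)

with torch.no_grad():
    out = torch.compile(TwoStage())(torch.randn(2, 128, 8, 8))
```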

@shunting314 (Contributor) commented

@bdhirsh that may be expected, since the graph break caused by DDPOptimizer has a subtle difference compared to other graph breaks in dynamo. Check #102591 for a bit more detail.

@bdhirsh (Contributor, Author) commented Aug 29, 2023

Hmm @shunting314 - Will pointed out to me that the issue you linked only affects code that's run under TORCHINDUCTOR_KEEP_OUTPUT_STRIDE=0. But my repro isn't using that env var, so that issue is probably unrelated?

@shunting314 (Contributor) commented

@bdhirsh my point is that for manually injected graph breaks, the downstream graph already gets the output of the OPTIMIZED upstream graph, so whether we pick the correct output tensors to preserve strides for does not matter. Even if the compiler changes an output tensor's strides, the downstream graph will get tensors with those updated strides. I just want to explain why you could not repro earlier; I think we have to use DDPOptimizer to repro the issue.

@wconstab (Contributor) commented

I'm not sure how Brian did the repro, but I would expect you could repro without DDPOptimizer and without graph breaks, just by:

  1. creating a model with some input mutations
  2. running its forward inside no_grad, so inductor incorrectly ignores preserving output strides
  3. asserting that the output tensors from the model have the same strides as eager had when running the model

Why is DDPOptimizer needed for the repro?

I do understand that when graph breaks are introduced by dynamo, dynamo works around this during compilation, ensuring the downstream graph will match expectations.

```
@@ -3270,6 +3270,29 @@ def f():

self.assertEqual(f(), _GLOBAL_CPU_TENSOR + _GLOBAL_CPU_TENSOR)

@torch._inductor.config.patch("layout_optimization", True)
```
@bdhirsh (Contributor, Author) commented on this line:
@wconstab got a repro! It just took a bit more work than I thought in order to get inductor to actually change the output stride of the convolution. I needed:

  • layout_optimization = True
  • channels dim to be > 64 (from the inductor code)
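
Putting those two conditions together, a minimal sketch of this kind of repro test (the module, shapes, and device handling below are illustrative assumptions, not the actual test added in this PR):

```
import torch
import torch._inductor.config as inductor_config

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # channels > 64 so inductor's layout optimization switches the conv
        # to channels_last and the output strides can differ from eager.
        self.conv = torch.nn.Conv2d(128, 128, 3, padding=1)

    def forward(self, x, buf):
        buf.add_(1)            # input mutation, kept in the inference graph
        return self.conv(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
m = M().to(device)
x = torch.randn(2, 128, 16, 16, device=device)
buf = torch.zeros(2, 128, 16, 16, device=device)

with torch.no_grad():
    eager_out = m(x, buf.clone())
    with inductor_config.patch("layout_optimization", True):
        compiled_out = torch.compile(m)(x, buf.clone())

# Before this fix, inductor could miscount the mutated inputs as extra graph
# outputs and fail to restore the eager strides on the real output.
assert eager_out.stride() == compiled_out.stride()
```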

Contributor reply:

great, thanks @bdhirsh!!

Contributor reply:
yea, we may disable layout optimization on the fly if certain patterns are found.

pytorchmergebot pushed a commit that referenced this pull request Sep 5, 2023
…stride=False together (#108235)

From talking to @wconstab, we agreed that because of the way DDPOptimizer is written, it is (sort of) incompatible with inductor's `keep_output_stride=False` optimizations (and will cause silent correctness problems if you use them together). Added an assertion.

Pull Request resolved: #108235
Approved by: https://github.com/wconstab
ghstack dependencies: #108081
pytorchmergebot pushed a commit that referenced this pull request Sep 5, 2023
Fixes #105327. The problem is that `lift_fresh_copy()`'s functionalization implementation currently assumes that the input is always not functional. This is apparently too limiting: when you have "user" code like this (which can potentially come from exporting a model and then running compile on the resulting graph):
```
tensor_constant0 = torch.tensor(2)
lift_fresh = torch.ops.aten.lift_fresh_copy.default(tensor_constant0)
```

When we run this through AOTAutograd, the first call (torch.tensor(2)) will **already** be lifted into a functional tensor wrapper - so the `lift_fresh_copy` call doesn't need to do any "lifting" anymore - it just needs to do a clone.
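
A minimal sketch of that pattern end to end (the function name is made up; this only illustrates the shape of the exported-style code, not the original repro from #105327):

```
import torch

def exported_style_fn():
    # The kind of code an exported graph can contain: a tensor constant
    # followed by an explicit lift_fresh_copy on it.
    tensor_constant0 = torch.tensor(2)
    lift_fresh = torch.ops.aten.lift_fresh_copy.default(tensor_constant0)
    return lift_fresh + 1

# Under torch.compile, AOTAutograd has already wrapped the constant in a
# functional tensor by the time lift_fresh_copy runs, so the op only needs
# to clone (the behavior this follow-up PR implements).
print(torch.compile(exported_style_fn)())   # tensor(3)
```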

Pull Request resolved: #108243
Approved by: https://github.com/albanD
ghstack dependencies: #108081, #108235
@facebook-github-bot deleted the gh/bdhirsh/452/head branch September 9, 2023 14:21