Add min cut partitioner for AOT+nvFuser #88204
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88204. Note: links to docs will display an error until the docs builds have been completed. ❌ 1 failure as of commit 91c148b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM, minor comments
```python
# First we trace the graph conditionally decomposing nodes
# that can be sent to the nvfuser executor
with TorchRefsNvfuserCapabilityMode():
```
Does this mean that later we are speculatively lowering the partitioned graph to nvprims again in `prims_executor`?
Should we just skip the speculative lowering, or is the second lowering there to catch some other decomposed op?
Unfortunately yes.
Initially I left this only in the partitioner code, but AOT Autograd has two different code paths depending on whether at least one of the inputs requires grad or none of them do. In the latter case `partition_fn` is not used at all, and there's no way to pass that information along without modifying the AOT code. All inputs to the `fw_compiler` function are detached and do not require grad, so it's not possible to determine this from within it.
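For context, the tracing pattern in question looks roughly like the sketch below. This is a minimal, hypothetical example (the `func` and inputs are made up), assuming the PyTorch 1.13-era import paths for `TorchRefsNvfuserCapabilityMode` and `make_fx`; the nvprims path was removed in later releases.

```python
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx

def func(a, b):
    # plain eager ops; nothing nvFuser-specific in the user code
    return torch.sigmoid(a) * torch.tanh(b)

joint_inputs = (torch.randn(8, 8), torch.randn(8, 8))

# Trace the graph, conditionally decomposing nodes that the nvFuser executor
# can handle into torch.ops.nvprims.* calls; unsupported ops stay as aten ops.
with TorchRefsNvfuserCapabilityMode():
    prim_gm = make_fx(func)(*joint_inputs)

print(prim_gm.graph)  # expect nvprims.* calls for sigmoid/tanh/mul
```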
Okay, with a bit of monkey patching the unnecessary lowering step is avoided in 2a1103c.
Looks good~ thx for patching this~
A quick follow-up question.
So now that we are not lowering to nvprims after aot_autograd, does this mean post-autograd decompositions won't be lowered to nvprims anymore 🤯
I don't know exactly when post-autograd decomposition happens. Hopefully that's not the case and we can still use it.
Can we get a CI test to guard/verify that behavior?
We don't use decompositions (`aot_module_simplified(decompositions=None)`). Even if we did, they would be applied before the partitioning function is called.
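To make the flow concrete, here is a rough sketch of how a custom min-cut partitioner plugs into AOT Autograd, assuming the `functorch.compile` API of that era (the no-op `fw_compiler`/`bw_compiler` and `fn` below are illustrative stand-ins, not the actual nvFuser prims executor; min-cut partitioning also needs `networkx` installed):

```python
import torch
from functorch.compile import aot_function, min_cut_rematerialization_partition

def fw_compiler(gm, example_inputs):
    # stand-in compiler: run the traced forward graph as-is
    print("forward graph:\n", gm.graph)
    return gm

def bw_compiler(gm, example_inputs):
    # stand-in compiler for the backward graph
    print("backward graph:\n", gm.graph)
    return gm

def fn(a, b):
    return (torch.sigmoid(a) * torch.tanh(b)).sum()

compiled_fn = aot_function(
    fn,
    fw_compiler=fw_compiler,
    bw_compiler=bw_compiler,
    partition_fn=min_cut_rematerialization_partition,
    decompositions=None,  # as noted above, no extra decompositions are used
)

a = torch.randn(8, 8, requires_grad=True)
b = torch.randn(8, 8, requires_grad=True)
# partition_fn is only exercised on this grad path; if no input required
# grad, AOT Autograd takes its inference-only path and never calls it.
compiled_fn(a, b).backward()
```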
```python
prim_gm = make_fx(func)(*joint_inputs)
```

```python
# all nvprims for now
recomputable_ops = {
```
Wondering if we should define this `recomputable_ops` inside `nvfuser_prims.py`, where new nvprims are added? Just to avoid accidentally adding more normalization/reduction ops and having them recomputed by default.
Or maybe add compulsory markers to nvprims: this function is a reduction, and this one is a normalization, and so on.
That sounds like a better idea. So let's put a TODO here and we can clean it up afterwards.
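A hypothetical sketch of what those markers could look like (the names and registry below are illustrative, not the actual `nvfuser_prims.py` code): each nvprim records its kind at registration time, and `recomputable_ops` is derived from the tags, so a newly added reduction or normalization op cannot silently become recomputable.

```python
from enum import Enum, auto

class OpKind(Enum):
    POINTWISE = auto()
    REDUCTION = auto()
    NORMALIZATION = auto()

# In nvfuser_prims.py, each nvprim registration would record its kind.
NVPRIM_KIND = {
    "add": OpKind.POINTWISE,
    "mul": OpKind.POINTWISE,
    "tanh": OpKind.POINTWISE,
    "sum": OpKind.REDUCTION,
    "var_mean": OpKind.NORMALIZATION,
}

# Only pointwise nvprims are cheap enough to recompute in the backward pass.
recomputable_ops = {
    name for name, kind in NVPRIM_KIND.items() if kind is OpKind.POINTWISE
}

assert "sum" not in recomputable_ops
```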
@jansel can you please approve to help merge changes to Dynamo's
@pytorchbot merge -g
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 additional jobs have failed, first few of them are: trunk, trunk / macos-12-py3-arm64-mps / Run MPS tests. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "mps failure is unrelated"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Here we mark most of `torch.ops.nvprims` as something that can be recomputed in the backward passes (and hopefully fused).

TODO:
- [x] Add a test after pytorch#88186 is merged

Pull Request resolved: pytorch#88204
Approved by: https://github.com/jjsjann123, https://github.com/jansel
cc @kevinstephano @jjsjann123 @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire