[ca][aot] use GraphModule CodeGen instead of GmWrapper for inputs flattening #141641
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141641
Note: links to docs will display an error until the docs builds have completed.
✅ You can merge normally! (2 unrelated failures) As of commit dd3e09f with merge base 78491d6:
BROKEN TRUNK - the following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - the following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
def mark_compiled_autograd_gm(gm: torch.fx.GraphModule):
    assert "_compiled_autograd" not in gm.meta
    gm.meta["_compiled_autograd"] = True
A bit uncertain why this is needed
CA previously skipped try_get_metadata_from_dynamo because the lookup logic couldn't find the graph module fields on the GmWrapper. With this PR, we pass the try_get_metadata_from_dynamo checks, but that's a problem because the lookup doesn't understand that the dynamo graph's first input is boxed while the aot graph's first inputs are unpacked.
Other than splitting out try_get_metadata_from_dynamo support for CA, we have a couple of places in the stack using in_compiled_autograd_region,
like skipping the aot backward's cache, actualizing the aot backward's lazy module, etc., which should be moved to the graph level to work well with reentrant autograd.
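To make the mismatch concrete, here is a minimal pure-Python sketch (all names invented; no real torch types) of why a positional metadata lookup misaligns when the dynamo graph boxes its first input but the AOT graph takes inputs unpacked:

```python
# Hypothetical sketch: the dynamo graph has ONE placeholder (a boxed list),
# while the equivalent AOT graph has one placeholder per tensor.

def dynamo_graph(inputs):      # single boxed placeholder
    x0, x1 = inputs            # unpacked inside the graph body
    return x0 + x1

def aot_graph(x0, x1):         # two unpacked placeholders
    return x0 + x1

# Metadata recorded per dynamo placeholder: one entry for `inputs`.
dynamo_placeholder_meta = ["inputs"]
aot_placeholders = ["x0", "x1"]

# A naive positional pairing misaligns: only the first AOT placeholder
# gets metadata, and it's metadata for the whole boxed list.
paired = list(zip(dynamo_placeholder_meta, aot_placeholders))
```

Both graphs compute the same value, but a metadata lookup keyed by placeholder position cannot be reused across the two calling conventions without special handling.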
assert type(gm.graph._codegen) is torch.fx.graph.CodeGen
assert gm.graph._codegen._body_transformer is None
boxed_inputs_count = len(self.example_inputs()[0])
gm.graph._codegen = torch.fx.graph._CompiledAutogradCodeGen(
Design-wise, is it true that there should be no mutations to the CodeGen object once it is set?
I guess I would have expected something like: instead of mark_compiled_autograd_gm, just test whether the codegen is _CompiledAutogradCodeGen.
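A sketch of that alternative, with illustrative stand-in classes rather than the real torch.fx types: detect a CA graph by the type of its codegen instead of a separate meta flag.

```python
# Stand-in classes; in the real codebase these would be the fx CodeGen
# hierarchy and GraphModule. The point is the isinstance check at the end.

class CodeGen:
    pass

class _CompiledAutogradCodeGen(CodeGen):
    pass

class _Graph:
    def __init__(self, codegen):
        self._codegen = codegen

class _GraphModule:
    def __init__(self, codegen):
        self.graph = _Graph(codegen)

def is_compiled_autograd_gm(gm):
    # No flag to keep in sync with the codegen: the codegen type IS the marker.
    return isinstance(gm.graph._codegen, _CompiledAutogradCodeGen)

ca_gm = _GraphModule(_CompiledAutogradCodeGen())
plain_gm = _GraphModule(CodeGen())
```

The design upside is that the marker can never disagree with the codegen that actually rewrites the graph body; the discussion below covers where this is and isn't sufficient.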
That works for passing the graph from dynamo to aotdispatch.
I'm not sure it works from CA to dynamo. My current approach is to let dynamo tracing know about the boxed input so that it generates the post-graph bytecode properly for the grad mutations; _CompiledAutogradCodeGen would flatten the inputs while we're tracing.
# inputs is a list of activations and params
gradients = graph(inputs, ...)
# inputs is now an empty list
inputs[i].grad = gradients[i]  # need this to work
and currently we have some aliases generated in the bytecode:
# inputs is a list of activations and params
inputs_ref_0 = inputs[0]
gradients = graph(inputs, ...)
# inputs is now an empty list
inputs_ref_0.grad = gradients[i]
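The aliasing trick above can be demonstrated with a runnable pure-Python sketch (FakeTensor and boxed_graph are invented stand-ins): the boxed call steals and clears the inputs list, so indexing into it afterwards would fail, but an alias taken before the call keeps the object reachable for the .grad mutation.

```python
# Minimal stand-in for a tensor with a mutable .grad attribute.
class FakeTensor:
    def __init__(self):
        self.grad = None

def boxed_graph(inputs):
    stolen = list(inputs)
    inputs.clear()                          # boxed convention: callee empties the list
    return [FakeTensor() for _ in stolen]   # stand-in "gradients"

inputs = [FakeTensor(), FakeTensor()]
inputs_ref_0 = inputs[0]       # alias generated in the post-graph bytecode
gradients = boxed_graph(inputs)
# inputs is now empty, so `inputs[0].grad = ...` would raise IndexError;
# the alias still reaches the original tensor:
inputs_ref_0.grad = gradients[0]
```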
# https://github.com/pytorch/pytorch/pull/122535/files#r1560096481
if isinstance(
    mod, torch._dynamo.utils.GmWrapper
) or torch._dynamo.utils.is_compiled_autograd_gm(mod):
what's the new invariant now? Are these two conditions always guaranteed to be both true / both false?
They're not related. GmWrapper is used any time there are non-flat ("bumpy") inputs, e.g. from a non-dynamo frontend. The only time dynamo itself allows bumpy inputs is with a compiled autograd graph.
# However, we still want compile-time analysis to be done
# on unpacked inputs as we don't have first class support
# for lists. Hence, we unflatten the inputs here.
return (args[: self._boxed_inputs_count], *args[self._boxed_inputs_count :])
Why aren't all inputs just boxed, wouldn't that be simpler?
It's easier to handle a list of only tensors in dynamo variables; the rest of the inputs are symints, python callables, etc.
Another Q, for my information: do you end up with an actual ListVariable in Dynamo when you trace the boxed input?
yep, but we don't use the ListVariable after we unpack it into TensorVariables
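The re-boxing step from the snippet above is easy to sketch in isolation (the function name is illustrative; the real logic lives in the CodeGen body transformer): the first `boxed_inputs_count` args are regrouped into a list, while the trailing symints and callables stay as separate positional args.

```python
def unflatten_args(args, boxed_inputs_count):
    # First N entries become the boxed list; the rest pass through unchanged.
    return (args[:boxed_inputs_count], *args[boxed_inputs_count:])

# Three stand-in "tensors", then a symint and a callable name.
flat = ["t0", "t1", "t2", 7, "hook_fn"]
reboxed = unflatten_args(flat, 3)
```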
- if torch._dynamo.compiled_autograd.in_compiled_autograd_region:
+ if is_compiled_autograd_gm(gm):
This is more of an "educate me about what the code used to do" situation. Suppose I graph break in the middle of a compiled autograd region. When I resume compilation on the resumption function, I imagine I wouldn't have a compiled autograd gm anymore, right? So is it OK to not go into this condition? Maybe the argument is that the boxed arguments only occur on entry to the very top of the compiled autograd graph? But then, when I graph break and resume, I will have a lot of intermediate stack entries that get fed in as non-boxed inputs; will these get promptly deallocated?
Like maybe something like #122512 ?
> When I resume compilation on the resumption function, I imagine I wouldn't have a compiled autograd gm anymore, right? So is it OK to not go into this condition?

The resume function takes non-boxed inputs, so we don't call flatten_graph_inputs at all. Even if we did change the resume function to take boxed inputs, it shouldn't be possible for a graph break to happen before we unpack the boxed inputs (the first nodes after the placeholders in the graph).

> When I graph break and resume, I will have a lot of intermediate stack entries that will get fed in as non-boxed inputs, will these get promptly deallocated?

No, these won't deallocate until the end of that graph.
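The deallocation contrast can be sketched in pure Python (names invented; assumes CPython's reference counting): a boxed callee pops entries out of the shared list as it consumes them, so nothing else keeps the activations alive, whereas non-boxed arguments stay pinned by the caller's frame until the call returns.

```python
import gc
import weakref

class Activation:
    pass

def boxed_call(inputs):
    grads = []
    while inputs:
        act = inputs.pop(0)      # drop the list's reference as soon as it's consumed
        grads.append(object())   # stand-in gradient "computed from" act
        del act                  # no lingering local reference either
    return grads

acts = [Activation(), Activation()]
refs = [weakref.ref(a) for a in acts]
grads = boxed_call(acts)
gc.collect()
freed = [r() is None for r in refs]   # both activations were freed mid-call
```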
# For overhead reasons, this is not the default wrapper, see comment:
# https://github.com/pytorch/pytorch/pull/122535/files#r1560096481
if isinstance(
    mod, torch._dynamo.utils.GmWrapper
I guess you're going to delete GmWrapper later?
We can't unless we deprecate non-dynamo frontends that produce graphs with inputs that aren't flat.
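What such a shim does can be sketched in pure Python (GmWrapperSketch and its helper are illustrative, not the real torch._dynamo.utils.GmWrapper API): it flattens nested ("bumpy") inputs at the boundary and calls the inner module with a single flat list.

```python
def flatten(nested):
    # Recursively flatten lists/tuples into a single flat list of leaves.
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

class GmWrapperSketch:
    def __init__(self, inner):
        self.inner = inner

    def __call__(self, *nested_args):
        # The inner graph only ever sees a flat list of inputs.
        return self.inner(flatten(nested_args))

inner = lambda flat_inputs: sum(flat_inputs)
wrapped = GmWrapperSketch(inner)
result = wrapped([1, 2], 3, (4, [5]))
```

This boundary-flattening is exactly what frontends with non-flat inputs rely on, which is why GmWrapper can't simply be deleted.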
gm.graph._codegen = torch.fx.graph._CompiledAutogradCodeGen(
    boxed_inputs_count
)
mark_compiled_autograd_gm(gm)
I'm not entirely sure what's going on in this block of code.
This seems fine but this PR has mostly told me I don't really understand how the CA specific Dynamo carveouts work lol
Stack from ghstack (oldest at bottom):
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @chenyang78 @kadeng @chauhang @amjames @yf225