
Conversation


pytorch-bot bot commented Nov 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141641

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit dd3e09f with merge base 78491d6:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

… inputs flattening"

cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv voznesenskym penguinwu Guobing-Chen XiaobingSuper zhuhaozhe blzheng jiayisunx chenyang78 kadeng chauhang amjames yf225

[ghstack-poisoned]
@xmfan xmfan marked this pull request as ready for review November 28, 2024 01:50
@xmfan xmfan requested a review from bdhirsh as a code owner November 28, 2024 01:50
@xmfan xmfan requested a review from jansel November 28, 2024 01:50

def mark_compiled_autograd_gm(gm: torch.fx.GraphModule):
    # Tag the graph module so later passes can recognize compiled autograd graphs.
    assert "_compiled_autograd" not in gm.meta
    gm.meta["_compiled_autograd"] = True
Contributor

A bit uncertain why this is needed

Member Author

CA previously skipped try_get_metadata_from_dynamo because the lookup logic couldn't find the graph module fields on the GmWrapper. With this PR we now pass the try_get_metadata_from_dynamo checks, but that is a problem because the check doesn't understand that the dynamo graph's first input is boxed while the aot graph's first inputs are unpacked.

Aside from splitting out CA support for try_get_metadata_from_dynamo, we have a couple of places in the stack that use in_compiled_autograd_region (skipping the aot backward's cache, actualizing the aot backward's lazy module, etc.), which should be replaced with a graph-level check to work well with reentrant autograd.
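For illustration only, a rough sketch of the mismatch described above (the signatures and names below are assumptions, not the actual generated graphs): the dynamo graph traced under compiled autograd takes a single boxed list of tensors, while the aot graph takes the same tensors as individual, unpacked arguments.

```python
import torch

# Hypothetical calling conventions, purely to illustrate the mismatch:
def dynamo_graph(inputs: list[torch.Tensor], sym_size_0: int):
    # `inputs` is boxed: one list holding the activations and params.
    ...

def aot_graph(activation_0: torch.Tensor, param_0: torch.Tensor, sym_size_0: int):
    # The same tensors, but unpacked into individual arguments, which is
    # what the try_get_metadata_from_dynamo lookup implicitly assumes.
    ...
```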

xmfan added a commit that referenced this pull request Nov 28, 2024
assert type(gm.graph._codegen) is torch.fx.graph.CodeGen
assert gm.graph._codegen._body_transformer is None
boxed_inputs_count = len(self.example_inputs()[0])
gm.graph._codegen = torch.fx.graph._CompiledAutogradCodeGen(
    boxed_inputs_count
)
Member Author

Design-wise, is it true that there should be no mutations to the CodeGen object once it is set?

Contributor

I guess I would have expected that, instead of mark_compiled_autograd_gm, we would just test whether the codegen is _CompiledAutogradCodeGen.

Member Author

That works for passing the graph from dynamo to aotdispatch.

I'm not sure it works from CA to dynamo. My current approach is to let dynamo tracing know about the boxed input so that it generates the post-graph bytecode properly for the grad mutations; _CompiledAutogradCodeGen would flatten the inputs while we're tracing.

# inputs is a list of activations and params
gradients = graph(inputs, ...)
# inputs is an empty list
inputs[i].grad = gradients[i]  # need this to work

and currently we have some aliases generated in the bytecode

# inputs is a list of activations and params
inputs_ref_0 = inputs[0]
gradients = graph(inputs, ...)
# inputs is an empty list
inputs_ref_0.grad = gradients[i]

# https://github.com/pytorch/pytorch/pull/122535/files#r1560096481
if isinstance(
    mod, torch._dynamo.utils.GmWrapper
) or torch._dynamo.utils.is_compiled_autograd_gm(mod):
Contributor

What's the new invariant now? Are these two conditions always guaranteed to be both true / both false?

Member Author

@xmfan xmfan Dec 2, 2024

They're not related. GmWrapper is used any time there are bumpy (non-flat) inputs, e.g. from a non-dynamo frontend. The only time dynamo allows bumpy inputs is with a compiled autograd graph.
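As a rough illustration of that role (a hypothetical wrapper, not the actual GmWrapper implementation): a module whose real forward takes a boxed list can be wrapped so that callers pass flat tensor arguments.

```python
import torch

class FlatteningWrapper(torch.nn.Module):
    """Hypothetical stand-in for the GmWrapper role described above: accept
    flat tensor arguments and re-box the first `boxed_count` of them into the
    list that the wrapped module expects."""

    def __init__(self, inner: torch.nn.Module, boxed_count: int):
        super().__init__()
        self.inner = inner
        self.boxed_count = boxed_count

    def forward(self, *flat_args):
        boxed = list(flat_args[: self.boxed_count])
        rest = flat_args[self.boxed_count :]
        return self.inner(boxed, *rest)
```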

# However, we still want compile-time analysis to be done
# on unpacked inputs as we don't have first class support
# for lists. Hence, we unflatten the inputs here.
return (args[: self._boxed_inputs_count], *args[self._boxed_inputs_count :])
Contributor

Why aren't all inputs just boxed? Wouldn't that be simpler?

Member Author

It's easier to handle a list of only tensors in dynamo variables; the rest of the inputs are symints and python callables.
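A minimal sketch of how that splits out (all names and the dummy math are illustrative assumptions, not the real compiled autograd signature):

```python
import torch

def compiled_autograd_graph(inputs, sym_size_0, hook_0):
    # `inputs` is the boxed list containing only tensors (activations/params);
    # symints and python callables such as hooks are passed unboxed alongside it.
    x, w = inputs
    return [x * sym_size_0, hook_0(w)]

# Usage with dummy values, purely for illustration:
grads = compiled_autograd_graph([torch.ones(2), torch.ones(2)], 3, lambda t: t * 2)
```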

Contributor

One more question for my information: do you end up with an actual ListVariable in Dynamo when you trace the boxed input?

Member Author

Yep, but we don't use the ListVariable after we unpack it into TensorVariables.

]

-if torch._dynamo.compiled_autograd.in_compiled_autograd_region:
+if is_compiled_autograd_gm(gm):
Contributor

This is more of an "educate me about what the code used to do" situation. Let's suppose I graph break in the middle of a compiled autograd region. When I resume compilation on the resumption function, I imagine I wouldn't have a compiled autograd gm anymore, right? So is it OK to not go into this condition? Maybe the argument is that the boxed arguments only occur on entry at the very top of the compiled autograd graph? But then, when I graph break and resume, I will have a lot of intermediate stack entries that will get fed in as non-boxed inputs; will these get promptly deallocated?

Contributor

Like maybe something like #122512 ?

Member Author

> When I resume compilation on the resumption function, I imagine I wouldn't have a compiled autograd gm anymore, right? So is it OK to not go into this condition?

The resume function takes non-boxed inputs, so we don't call flatten_graph_inputs at all. Even if we did change the resume function to take boxed inputs, it shouldn't be possible for a graph break to happen before we unpack the boxed inputs (the unpacking is done by the first nodes after the placeholder in the graph).

> when I graph break and resume, I will have a lot of intermediate stack entries that will get fed in as non-boxed inputs, will these get promptly deallocated?

No, these won't be deallocated until the end of that graph.
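For illustration, a minimal torch.fx sketch (not the actual compiled autograd graph) of the point above: when the boxed list is indexed right away, the unpacking getitem nodes are the first nodes after the placeholder, so nothing can graph break before them.

```python
import torch

def boxed_fn(inputs):
    # `inputs` stands in for the boxed list of tensors; indexing it is traced
    # as operator.getitem calls immediately after the placeholder.
    x0 = inputs[0]
    x1 = inputs[1]
    return x0 + x1

gm = torch.fx.symbolic_trace(boxed_fn)
print(gm.graph)  # placeholder(inputs) -> getitem -> getitem -> add -> output
```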

# For overhead reasons, this is not the default wrapper, see comment:
# https://github.com/pytorch/pytorch/pull/122535/files#r1560096481
if isinstance(
    mod, torch._dynamo.utils.GmWrapper
Contributor

I guess you're going to delete GmWrapper later?

Member Author

We can't, unless we deprecate the non-dynamo frontends that produce graphs with inputs that aren't flat.

gm.graph._codegen = torch.fx.graph._CompiledAutogradCodeGen(
    boxed_inputs_count
)
mark_compiled_autograd_gm(gm)
Contributor

I'm not entirely sure what's going on in this block of code.

Contributor

@ezyang ezyang left a comment

This seems fine, but this PR has mostly told me that I don't really understand how the CA-specific Dynamo carveouts work, lol


github-actions bot commented Feb 3, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Feb 3, 2025
@jansel jansel removed their request for review February 18, 2025 19:47
@github-actions github-actions bot closed this Mar 20, 2025
@github-actions github-actions bot deleted the gh/xmfan/140/head branch April 20, 2025 02:18