[aota] Allow some mutations in backward #128409
Conversation
requires_grad: bool
keep_input_mutations: bool
# JointFn mutation info; filled in only after jointFn tracing.
joint_mutates_data: Optional[bool] = None
We only actually need to support joint_mutates_data
- if any of the other 3 types of mutation below happen during the backward, we should raise an error during tracing and say we don't support it (we technically could support it, but metadata mutation / storage mutation are all kind of niche, and supporting them in the backward adds complexity that we probably don't care about right now)
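A minimal sketch of that tracing-time guard; `info` stands in for a per-input metadata record, and the flag names below are assumptions drawn from this discussion, not the PR's actual code:

```python
# Sketch only: data mutations in the backward are allowed; any other
# mutation kind raises immediately during joint tracing.
def reject_unsupported_backward_mutations(idx: int, info) -> None:
    unsupported = {
        "metadata mutation": info.mutates_metadata,
        "storage metadata mutation": info.mutates_storage_metadata,
        "mutation hidden from autograd": info.mutations_hidden_from_autograd,
    }
    for kind, happened in unsupported.items():
        if happened:
            raise RuntimeError(
                f"Input {idx} saw a {kind} during the backward; only data "
                "mutations are supported in the backward."
            )
```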
If these are only going to be set when we need autograd (i.e. trace_joint = True), what do you think about keeping them in an optional list of indices or bools instead of directly on InputInfo? That way you can avoid needing to assert that the joint_mutates_* is not None each time.
Agreed, let's just make it a (non-optional?) list of indices, similar to indices_of_inps_to_detach (code).
This is actually pretty important for runtime perf: models can have 100+ inputs to the forward and backward graph, and looping through all 100 of them can have noticeable runtime overhead.
We can add index i to the list only if input i gets a backward mutation and it requires grad. That way, our assertion at runtime only has to check if the list is non-empty to raise an error (but it can still print the actual indices for a good error message).
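A sketch of the trace-time construction this describes; `input_infos` and the flag names are assumptions standing in for the real metadata:

```python
def compute_bw_mutation_error_indices(input_infos):
    # An input lands in the list only if it is mutated in the backward
    # (and not already in the forward) and requires grad.
    return [
        i
        for i, info in enumerate(input_infos)
        if info.joint_mutates_data and not info.mutates_data and info.requires_grad
    ]
```

The runtime hot path then reduces to a single emptiness check; the stored indices are used only to build a helpful error message on the failure path.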
# Backward with forward input mutations is not supported in double backward.
if torch.is_grad_enabled() and any(
i.joint_mutates_data and not i.mutates_data
hmm, two things about this assertion:
(1) if we have a graph that mutates a tensor that does not require grad during the backward, this assert will fail incorrectly (even if is_grad_enabled() == True during the backward and we see an input mutation, as long as that input does not require grad it is ok to keep it in the graph)
(2) this assert will never fire if we have a tensor that is mutated in both the forward and the backward, but that seems wrong.
About (2): potentially, in the future we could retrace backward_module after partitioning and use the version counter to determine whether the backward mutated a tensor that was also mutated in the forward.
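A sketch of the version-counter idea, assuming we snapshot each input's internal `_version` after the forward and compare after the backward (illustrative only, not the PR's code):

```python
import torch

def snapshot_versions(tensors):
    # torch bumps a tensor's internal `_version` on every in-place mutation.
    return [t._version for t in tensors]

def indices_mutated_since(tensors, snapshot):
    return [
        i for i, (t, v) in enumerate(zip(tensors, snapshot)) if t._version != v
    ]

# Example: an in-place add bumps the version counter.
x = torch.zeros(3)
snap = snapshot_versions([x])
x.add_(1)
assert indices_mutated_since([x], snap) == [0]
```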
# This class tells us info about user inputs.
@dataclass(frozen=True)
Could we keep this frozen and use dataclasses.replace to add these values instead? Unless the copy is going to be too much of a perf hit, the benefit of keeping the data structure immutable will be really helpful. Most pre-compile wrappers create a new copy of ViewAndMutationData anyway.
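A minimal sketch of the pattern, using an illustrative frozen dataclass (class and field names assumed):

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class InputInfoSketch:
    requires_grad: bool
    joint_mutates_data: Optional[bool] = None

info = InputInfoSketch(requires_grad=True)
# info.joint_mutates_data = True   # would raise dataclasses.FrozenInstanceError
info = dataclasses.replace(info, joint_mutates_data=True)
assert info.joint_mutates_data
```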
Thanks, I did not know about dataclasses.replace - applied.
Moved the fields from InputAliasInfo to a field on ViewAndMutationMeta, keeping InputAliasInfo immutable.
return tuple(out)

# Backward with forward input mutations is not supported in double backward.
if torch.is_grad_enabled() and any(
this still has the runtime performance issue from before: we need to loop through the entire list at runtime (where the number of entries scales with the number of inputs to the backward).
I would just make a new field on ViewAndMutationMeta that is something like indices_of_inputs_that_require_grad_with_mutations_in_bw: List[int], that only contains the indices of inputs that we might need to error on.
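A sketch of the resulting runtime guard, where `meta` stands in for a ViewAndMutationMeta instance and the field name comes from this review:

```python
import torch

def check_backward_mutations_allowed(meta) -> None:
    # Cheap hot path: a truthiness check on a (usually empty) list.
    bad = meta.indices_of_inputs_that_require_grad_with_mutations_in_bw
    if torch.is_grad_enabled() and bad:
        raise RuntimeError(
            "Mutations in the backward on inputs that require grad are not "
            "supported when grad mode is enabled (create_graph=True); "
            f"offending input indices: {bad}"
        )
```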
tokens: Dict[Any, torch.Tensor] = field(default_factory=dict)

# Filled in after jointFn tracing.
# Kept for runtime checks of whether those mutations are allowed.
Can you add more to this comment - something like:
Only filled in if/when we trace the joint function.
If an input requires grad and is mutated in the backward, it is only safe to keep the mutation in the graph if gradients are disabled while the backward runs (grad mode is disabled by default when users run the backward, but can be turned on with create_graph=True).
At runtime during the backward, we use this list of indices to error properly if we find out that it was not safe to include a backward mutation in the graph.
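As a sketch, the field with the expanded comment might read as follows (class and field names assumed from this review):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ViewAndMutationMetaSketch:
    # Only filled in if/when we trace the joint function.
    # If an input requires grad and is mutated in the backward, it is only
    # safe to keep the mutation in the graph if gradients are disabled while
    # the backward runs (grad mode is off by default during .backward(), but
    # can be turned on with create_graph=True).
    # At runtime during the backward, this list of indices lets us raise a
    # proper error if keeping a backward mutation turns out to be unsafe.
    indices_of_inputs_that_require_grad_with_mutations_in_bw: List[int] = field(
        default_factory=list
    )
```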
AssertionError, "input that requires_grad and was mutated in the backward"
):
self.verify_aot_autograd(f, inp_grad, test_mutation=True)
self.verify_aot_autograd(f, inp_grad, test_mutation=True)
nit: can you add a quick comment in the test explaining that we can properly handle keeping the backward mutation in the graph in this test because the backward is running under no_grad? (someone is gonna look at this test in a year and have to squint really hard to figure that out)
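A sketch of how the requested comment could read; the exact shape of the test (which call sits inside the assertRaises context, what `f` and `inp_grad` are) is assumed here:

```python
# Sketch: `f`, `inp_grad`, and verify_aot_autograd come from the
# surrounding test file; the split between the two calls is illustrative.
with self.assertRaisesRegex(
    AssertionError, "input that requires_grad and was mutated in the backward"
):
    # With create_graph=True, grad mode stays on during the backward, so a
    # mutation on an input that requires grad must error at runtime.
    self.verify_aot_autograd(f, inp_grad, test_mutation=True)

# By default .backward() runs under no_grad, so here it is safe to keep the
# backward input mutation in the compiled graph and the test passes.
self.verify_aot_autograd(f, inp_grad, test_mutation=True)
```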
nice!
@pytorchbot merge
Stack from ghstack (oldest at bottom):
#127572
Allow mutations in backward on forward inputs, if:
1/ the mutation is not a metadata mutation
Enforced at compilation time.
2/ if create_graph=True: the mutated input does not require_grad
Enforced at runtime, where create_graph mode can be detected by checking torch.is_grad_enabled().
Adding input_joint_info to track mutations of inputs during the joint function.
Created as a separate field in ViewAndMutationMeta, as it is filled in only after joint fn tracing.
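For illustration, an eager-mode sketch of the pattern this PR makes traceable: a backward that performs a data mutation on a forward input (all names here are illustrative, not from the PR):

```python
import torch

class ScaleAndCount(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, counter):
        ctx.save_for_backward(counter)
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        (counter,) = ctx.saved_tensors
        counter.add_(1)  # data mutation of a forward input, in the backward
        return grad_out * 2, None

x = torch.randn(3, requires_grad=True)
counter = torch.zeros(())  # plain buffer; does not require grad

ScaleAndCount.apply(x, counter).sum().backward()
assert counter.item() == 1.0
```

Because grad mode is off during a default .backward(), the mutation on `counter` is safe to keep in the compiled backward graph; per the checks described above, a mutated input that requires grad combined with create_graph=True raises at runtime instead.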