Fix decomp behaviour in export training IR #134801
Conversation
@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Subset of the changes in #132901; the previous PR can't land because it is too complicated. The rest of the change will be implemented as a follow-up after the export design meeting. This part just makes the training IR -> inference IR decomp follow the same path as normal export. Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525)
This pull request was exported from Phabricator. Differential Revision: D62000525
torch/_export/utils.py
Outdated
fake_val = node.meta["val"]
if fake_val is not None and isinstance(fake_val, torch.Tensor):
    fake_vals.append(fake_val)
I think this makes sense, but could you explain what prompted the change? The previous implementation seemed equivalent and only required one loop, right?
node.meta[k] = v
_populate_param_buffer_metadata_to_new_gm(
    params_buffers_to_node_meta, gm, export_graph_signature
)
nice
return gm, new_graph_signature


def _remove_unneccessary_copy_op_pass(
noob question: why is this op special?
This is because ep.module() adds copy_ nodes at the end to update the buffers. When we retrace, we functionalize these nodes and they show up as extra nodes at the end. We don't actually need them, because aot_export_module takes care of returning the extra updated buffers.
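A hedged sketch of the idea behind `_remove_unneccessary_copy_op_pass`: drop `aten.copy_` nodes that only write updated buffer values back, redirecting their users to the source value instead. The toy graph built below is a stand-in for the real export graph, and `remove_copy_ops` is an illustrative helper, not the actual pass.

```python
import torch
import torch.fx as fx


def remove_copy_ops(gm: fx.GraphModule) -> fx.GraphModule:
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target is torch.ops.aten.copy_.default:
            # copy_(dst, src) evaluates to dst holding src's values,
            # so downstream users can read src directly.
            node.replace_all_uses_with(node.args[1])
            gm.graph.erase_node(node)
    gm.recompile()
    return gm


# Build a toy graph: out = copy_(buf, x + x)
g = fx.Graph()
x = g.placeholder("x")
buf = g.placeholder("buf")
add = g.call_function(torch.ops.aten.add.Tensor, (x, x))
cp = g.call_function(torch.ops.aten.copy_.default, (buf, add))
g.output((cp,))
gm = remove_copy_ops(fx.GraphModule(torch.nn.Module(), g))
```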
Pull Request resolved: #134801 @imported-using-ghimport Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525/) ghstack-source-id: 240844924
lgtm after discussing offline
# When aot_export lifts the params, we lose metadata (e.g. source_fn_stack, stack_trace)
# from the param nodes as they are treated as fresh inputs
# Therefore, we manually extract them before calling into aot_export
# params_buffers_to_node_meta = _collect_param_buffer_metadata(gm_torch_level)
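A hedged mock of the metadata-preservation idea the comment describes: record `node.meta` for parameter accesses before aot_export lifts the params to inputs, keyed by attribute name, so the metadata can later be replayed onto the new graph. `collect_param_meta` is a made-up helper, not the actual `_collect_param_buffer_metadata`.

```python
import torch
import torch.fx as fx


def collect_param_meta(gm: fx.GraphModule) -> dict:
    params = dict(gm.named_parameters())
    saved = {}
    for node in gm.graph.nodes:
        if node.op == "get_attr" and node.target in params:
            # Copy the meta dict so it survives later graph rewrites.
            saved[node.target] = dict(node.meta)
    return saved


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.ones(2))

    def forward(self, x):
        return x + self.w


gm = fx.symbolic_trace(M())
# Simulate metadata that would otherwise be lost when params are lifted.
for n in gm.graph.nodes:
    if n.op == "get_attr":
        n.meta["stack_trace"] = "example trace"

meta = collect_param_meta(gm)
```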
delete?
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch/pytorch#134801 @imported-using-ghimport Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525/) ghstack-source-id: 15a3e01
Pull Request resolved: pytorch#134801. Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi