Conversation

@zhxchen17 (Contributor) commented Nov 28, 2023:

Summary: Refactor torch.export to separate the strict part from the non-strict part, adding an option to torch.export called `strict=True`.

Test Plan: buck2 test mode/opt caffe2/test:test_export -- -r non_strict
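
For orientation, a minimal usage sketch of the new flag. This assumes the torch.export.export signature shown in the review hunks below; treat it as an illustration rather than the PR's final API.

    import torch
    from torch.export import export

    class M(torch.nn.Module):
        def forward(self, x):
            return x.sin() + x.cos()

    args = (torch.randn(3, 4),)

    # Default: strict mode traces the module through TorchDynamo.
    ep_strict = export(M(), args, strict=True)

    # strict=False skips Dynamo and traces the module's Python code directly.
    ep_non_strict = export(M(), args, strict=False)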

cc @avikchaudhuri @gmagogsfm @tugsbayasgalan @angelayi @suo @ydwu4

@pytorch-bot bot commented Nov 28, 2023:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114697

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (12 Unrelated Failures)

As of commit c4f4800 with merge base 3b7d60b:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot commented:

This pull request was exported from Phabricator. Differential Revision: D51604074

@zhxchen17 zhxchen17 requested review from angelayi, suo and ydwu4 November 28, 2023 20:17
@zhxchen17 zhxchen17 added the topic: not user facing topic category label Nov 28, 2023
zhxchen17 added a commit to zhxchen17/pytorch that referenced this pull request Nov 28, 2023
…#114697)

Summary:

Refactor torch.export to separate the strict part from the non-strict part, adding an option to torch.export called `strict=True`.

Test Plan: buck2 test mode/opt caffe2/test:test_export -- -r non_strict

Differential Revision: D51604074
@facebook-github-bot commented:

This pull request was exported from Phabricator. Differential Revision: D51604074

@ydwu4 (Contributor) left a comment:


Overall looks good to me! Except it's a bit large and difficult to review, lol. I was distracted by some of the BE modifications and had a hard time relating the old implementation to the new implementation. Left a few minor comments.

*,
constraints: Optional[List[Constraint]] = None,
dynamic_shapes: Optional[Union[Dict[str, Any], Tuple[Any]]] = None,
strict: bool = True,
Reviewer comment:

Since "strict" is user-facing, we might need to add warnings and docs for this "strict" keyword. It would be better if we could mention the observable consequences/implications of using strict.

Reviewer comment:

Maybe we should make it "_strict"?

@zhxchen17 (Author) replied:

I don't think this matters, because non-strict mode is already unsafe by design, and we just need to fix forward any bugs that come up.

gm = res.graph_module

assert orig_out_spec is not None
tensor_constants = lift_constant_tensor_pass(gm, export_graph_signature, params_buffers)
Reviewer comment:

Hmm... why do we need to remove _replace_sym_size_ops_pass?

@zhxchen17 (Author) replied:

We moved it into _export_non_strict directly, because this pass is pretty generic.

fake_args,
_reorder_kwargs_by_names(orig_args, fake_args, fake_kwargs),
fake_params_buffers,
transform=_process_user_inputs
@ydwu4 commented Nov 28, 2023:

Non-blocking: do we really need the transform kwarg? It looks like a wrapper. Seeing the word transform, I thought it was some pass over the graph module. Can we just do the wrapping eagerly and then pass the wrapped module into _export_non_strict? That would be a bit more readable, I guess.

@zhxchen17 (Author) replied:

Yeah, good point. I might remove this at some point, but right now I need it to support user input mutations, which requires some code to run before and after aot_export_module, so it has a sandwich structure. I can add a TODO here.
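
For readers following along, a minimal sketch of that sandwich structure. The names _tuplify_outputs and _aot_export_non_strict appear in the diff hunks below, but these bodies are illustrative assumptions, not the PR's actual implementation.

    def _tuplify_outputs(aot_export):
        # The `transform` kwarg wraps the underlying export function, so code
        # can run both before and after aot_export_module: the "sandwich".
        def _aot_export_non_strict(mod, args, **kwargs):
            # pre-processing: e.g. wrap `mod` so its outputs are always a tuple
            gm, sig = aot_export(mod, args, **kwargs)
            # post-processing: e.g. strip the wrapper's prefix from the signature
            return gm, sig

        return _aot_export_non_strict

    # _export_non_strict then calls roughly:
    #   gm, sig = transform(aot_export_module)(mod, fake_args, trace_joint=False)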

inp = ([torch.ones(1, 3)], torch.ones(1, 3))
self._test_export_same_as_eager(f, inp)

def test_basic_non_strict(self):
@ydwu4 commented Nov 29, 2023:

Can we also add a test for fake tensor mode and fake tensor inputs for non-strict mode?

@zhxchen17 (Author) replied:

Sure.
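
For reference, such a test might look roughly like this; the test name and body here are assumptions, not necessarily what was added:

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    def test_non_strict_fake_tensor_input(self):
        class M(torch.nn.Module):
            def forward(self, x):
                return x + 1

        fake_mode = FakeTensorMode()
        fake_x = fake_mode.from_tensor(torch.randn(2, 2))

        # Non-strict export with a fake input; strict=False skips Dynamo.
        ep = torch.export.export(M(), (fake_x,), strict=False)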

Comment on lines 655 to 656
# Note: aot_export_module doesn't accept kwargs, we'd like to reorder the kwargs as an OrderedDict
# to follow the order in orig_args and correctly call module
Reviewer comment:

Is this comment still relevant?

@zhxchen17 (Author) replied:

Nice catch.

kwargs = kwargs or {}

if not strict:
assert isinstance(f, torch.nn.Module)
Reviewer comment:

Why do we assert this if we just wrap f with another module?

@zhxchen17 (Author) replied:

You're right, but I'd prefer to keep it here until we actually need to relax it.

sig.buffers_to_mutate = pytree.tree_map(strip_root, sig.buffers_to_mutate)
return gm, sig
return _aot_export_non_strict
ep_non_strict = _export_non_strict(f, args, {}, f.state_dict(), transform=_tuplify_outputs)
Reviewer comment:

Do we need to fakeify the args? It seems like _export_non_strict takes fake args? Or we could move the _convert_input_to_fake call into _export_non_strict?

@zhxchen17 (Author) replied:

I think I will leave this part for later discussion when we're supporting dynamic shapes. Right now it shouldn't matter.

if id(dynamo_buffer) in buffer_lookup:
param_buffer_table[dynamo_name] = buffer_lookup[id(dynamo_buffer)].pop()

if isinstance(f, torch.nn.Module):
Reviewer comment:

For the lines above related to the param_buffer_table, should that also be added to _export_non_strict? Since we probably want to keep the param/buffer names the same in both cases, right?

@zhxchen17 (Author) replied:

param_buffer_table is only needed for Dynamo, which messes up the state dict. If we're using aot_export_module directly, we don't need to do anything special here.
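
To spell out the mechanism under discussion, a sketch of the id-based matching the quoted hunk performs. build_param_buffer_table is a hypothetical helper name; the real code does this inline.

    def build_param_buffer_table(original_module, dynamo_module):
        # Dynamo renames parameters/buffers, so match each dynamo-side buffer
        # back to its original fully-qualified name by object identity.
        buffer_lookup = {}
        for name, buf in original_module.named_buffers():
            buffer_lookup.setdefault(id(buf), []).append(name)

        param_buffer_table = {}
        for dynamo_name, dynamo_buffer in dynamo_module.named_buffers():
            if id(dynamo_buffer) in buffer_lookup:
                param_buffer_table[dynamo_name] = buffer_lookup[id(dynamo_buffer)].pop()
        return param_buffer_table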

Reviewer comment:

Oh, OK. Then maybe we should move that into export_to_torch_ir (not related to this PR).

o: b for o, b in graph_signature.buffers_to_mutate.items() if b not in names
}
graph_signature.user_inputs = list(reversed(new_node_names.values())) # type: ignore[arg-type]
graph_signature.user_inputs.extend(new_node_names.values())
Reviewer comment:

Why does the reversed not matter here?

@zhxchen17 (Author) replied:

I put it at line 625

super().__init__()
self._export_root = mod

def forward(self, *args, **kwargs):
Reviewer comment:

Why not just:

is_scalar = False  # lives in an enclosing scope, hence the `nonlocal` below
def forward(self, *args, **kwargs):
    inner = self._export_root(*args, **kwargs)
    if not isinstance(inner, (list, dict, tuple)):
        nonlocal is_scalar
        is_scalar = True
        return (inner,)  # wrap the single output in a tuple
    return inner

@zhxchen17 (Author) replied:

Sounds like pytree is more foolproof?
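
For contrast with the hand-written check above, a sketch of the pytree approach. TupleOutputWrapper is a hypothetical name, not the PR's actual wrapper.

    import torch
    import torch.utils._pytree as pytree

    class TupleOutputWrapper(torch.nn.Module):
        # Flatten whatever the root module returns so the traced output is
        # always a tuple, and keep the spec to restore the structure later.
        def __init__(self, mod):
            super().__init__()
            self._export_root = mod
            self.out_spec = None

        def forward(self, *args, **kwargs):
            out = self._export_root(*args, **kwargs)
            flat, spec = pytree.tree_flatten(out)
            self.out_spec = spec  # records scalars and arbitrary nesting alike
            return tuple(flat)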

@facebook-github-bot commented:
@zhxchen17 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot commented:
@zhxchen17 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Reviewer comment:

Can you leave a comment here explaining that we don't actually use aot_export_module's out_spec, so it is up to us to manipulate it however we want?

@zhxchen17 (Author) commented:
@pytorchbot merge -f "fbgemm errors"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort, and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator

Details for Dev Infra team: raised by workflow job.

@zhxchen17 (Author) commented:

@pytorchbot merge -f "fbgemm errors"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort, and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator

Details for Dev Infra team: raised by workflow job.

@zhxchen17 (Author) commented:

@pytorchbot merge -f "fbgemm errors"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort, and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Dec 7, 2023
)

Current non-strict test cases (added in #114697) are already supported by strict mode, so they can't demonstrate the incremental value of non-strict mode. How about adding test cases that fail in strict mode but pass in non-strict mode?

Test Plan:
python test/export/test_export.py -k test_external_call_non_strict_real_tensor
Pull Request resolved: #115245
Approved by: https://github.com/tugsbayasgalan, https://github.com/zhxchen17
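
A hedged guess at what such a test looks like: the real one is test_external_call_non_strict_real_tensor in test/export/test_export.py, per the commit note above; this sketch only illustrates the idea.

    import torch

    class ExternalHelper:
        def add(self, x):
            return x + x

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.helper = ExternalHelper()

        def forward(self, x):
            # An opaque call into arbitrary Python may trip strict (Dynamo-based)
            # export, while non-strict mode simply runs the code under tracing.
            return self.helper.add(x)

    # Passes in non-strict mode even if the strict version of the export fails.
    ep = torch.export.export(M(), (torch.randn(3),), strict=False)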
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
…#114697)

Summary: Refactor torch.export to separate the strict part from the non-strict part, adding an option to torch.export called `strict=True`.

Test Plan: buck2 test mode/opt caffe2/test:test_export -- -r non_strict

Pull Request resolved: pytorch#114697
Approved by: https://github.com/ydwu4, https://github.com/tugsbayasgalan
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
…rch#115245)

Current non-strict test cases (added in pytorch#114697) are already supported by strict mode, so they can't demonstrate the incremental value of non-strict mode. How about adding test cases that fail in strict mode but pass in non-strict mode?

Test Plan:
python test/export/test_export.py -k test_external_call_non_strict_real_tensor
Pull Request resolved: pytorch#115245
Approved by: https://github.com/tugsbayasgalan, https://github.com/zhxchen17