
fix(fx): make all make_fx invocations isolated (opaque to higher make_fx invocations) by default #93290

Conversation

@jon-chuang (Collaborator) commented Jan 30, 2023

Fixes #88996 (comment)

Example code:

import torch
from torch.fx.experimental.proxy_tensor import make_fx, wrapper_and_args_for_make_fx

@torch.fx.wrap
def func(a, b):
    return b.expand([1, a.shape[0], b.shape[-1]])

a = torch.randn(3, 4)
b = torch.randn(4)

class TestMode(torch.overrides.TorchFunctionMode):
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if torch.overrides.resolve_name(func) in ["torch.Tensor.expand"]:
            print(f"TestMode: {func} {args} {kwargs}")
            # Trace the intercepted call with a nested make_fx. Before this PR,
            # the nodes created here leaked into the enclosing symbolic trace.
            wrapped, all_args = wrapper_and_args_for_make_fx(func, args, kwargs)
            gm = make_fx(wrapped, tracing_mode="real")(all_args)

        return func(*args, **kwargs)

with TestMode():
    gm = make_fx(func, tracing_mode="symbolic")(a, b)

gm.graph.print_tabular()

Before:

opcode         name        target               args                              kwargs
-------------  ----------  -------------------  --------------------------------  --------
placeholder    a_1         a_1                  ()                                {}
placeholder    b_1         b_1                  ()                                {}
call_function  detach      aten.detach.default  (b_1,)                            {}
call_function  detach_1    aten.detach.default  (detach,)                         {}
call_function  sym_size    aten.sym_size        (a_1, 0)                          {}
call_function  sym_size_1  aten.sym_size        (b_1, 0)                          {}
call_function  expand      aten.expand.default  (b_1, [1, sym_size, sym_size_1])  {}
call_function  detach_2    aten.detach.default  (expand,)                         {}
call_function  expand_1    aten.expand.default  (b_1, [1, sym_size, sym_size_1])  {}
output         output      output               (expand_1,)                       {}

After:

opcode         name        target               args                              kwargs
-------------  ----------  -------------------  --------------------------------  --------
placeholder    a_1         a_1                  ()                                {}
placeholder    b_1         b_1                  ()                                {}
call_function  sym_size    aten.sym_size        (a_1, 0)                          {}
call_function  sym_size_1  aten.sym_size        (b_1, 0)                          {}
call_function  expand      aten.expand.default  (b_1, [1, sym_size, sym_size_1])  {}
output         output      output               (expand,)                         {}

@pytorch-bot bot commented Jan 30, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93290

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9b7b95d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ezyang (Contributor) commented Jan 31, 2023

I... guess we can do this? It kind of feels better not to do this by default (and to let get_isolated_graphmodule be used for this case), because if you do it this way there is no way to have a single trace get recorded by multiple proxy tensor modes at once. This may seem like a weird thing to want, but in fact functorch nested grad does something like this (nested grad levels get recorded onto multiple tapes, one per grad level).

The countervailing argument is that make_fx doesn't return its outputs, so it is not true compute and shouldn't get traced. I could be convinced by this.

@jon-chuang (Collaborator, Author) commented Jan 31, 2023

Actually, when we call make_fx nested inside another make_fx, the inner make_fx now behaves exactly like get_isolated_graphmodule, modulo the kwargs wrapping.

So this PR isolates graph modules by default.

I believe this is a good change: making a tape of making a tape doesn't seem sensible, and it produces the weird artifacts observed above.

(Unless an inner tape is used in the definition of the exterior function?)
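
For concreteness, a rough sketch of the equivalence being claimed (this assumes get_isolated_graphmodule is the helper in torch.fx.experimental.proxy_tensor; names and signatures as I recall them, so treat it as illustrative):

import torch
from torch.fx.experimental.proxy_tensor import (
    get_isolated_graphmodule,
    make_fx,
    wrapper_and_args_for_make_fx,
)

def f(a, b):
    return b.expand([1, a.shape[0], b.shape[-1]])

args, kwargs = (torch.randn(3, 4), torch.randn(4)), {}

# Explicit isolation: flattens args/kwargs and traces f without leaking
# nodes into any enclosing make_fx trace.
gm_isolated = get_isolated_graphmodule(f, args, kwargs)

# With this PR, doing the same thing with a nested make_fx call (from inside
# another make_fx trace) is equivalent, apart from the kwargs flattening step:
wrapped, all_args = wrapper_and_args_for_make_fx(f, args, kwargs)
gm_nested = make_fx(wrapped, tracing_mode="real")(all_args)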

@jon-chuang changed the title fix(fx): disable other make_fx traces for nested make_fx → fix(fx): disable all make_fx traces except current for nested make_fx Jan 31, 2023
@jon-chuang changed the title fix(fx): disable all make_fx traces except current for nested make_fx → fix(fx): make all make_fx invocations isolated (opaque to higher make_fx invocations) by default Jan 31, 2023
@jon-chuang (Collaborator, Author) commented Jan 31, 2023

(Unless an inner tape is used in the definition of the exterior function?)

This is actually impossible: make_fx arrives at its output through mutation, so its output has no functional dependence on its inputs. Furthermore, it does not return any tensors, as you point out.
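
To illustrate (a trivial, purely illustrative snippet):

import torch
from torch.fx.experimental.proxy_tensor import make_fx

# The inner make_fx hands back a GraphModule built by mutating the tracer's
# graph; it returns no tensors that could feed the surrounding computation.
inner_gm = make_fx(lambda t: t * 2, tracing_mode="real")(torch.randn(2))
print(type(inner_gm))  # a GraphModule, not a Tensor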

@jon-chuang (Collaborator, Author) commented Jan 31, 2023

functorch nested grad does something like this (nested grad levels get recorded onto multiple tapes, one per grad level).

Could you point to an example? I tried constructing several examples where things might fail, including nesting torch.func.grad and torch.autograd.grad, but:

  1. for the former, make_fx is only ever called once (see the sketch below); the invocation produces the compiled function as a side effect.
  2. for the latter, autograd.grad is actually unnestable and produces a None result...

There is basically no compilation pathway that has nested tapes relying directly on an interior tape, because torch.autograd itself does not rely on tapes from make_fx; it uses the C++ torch._C._ImperativeEngine.
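
A minimal sketch of the former case (illustrative only; assumes torch.func from a recent PyTorch build):

import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return x ** 3

x = torch.randn(())  # scalar input so grad-of-grad stays scalar-valued

# The nested grads compose into a single function; make_fx is invoked only
# once, at the outermost level, so no nested tapes are produced.
gm = make_fx(torch.func.grad(torch.func.grad(f)), tracing_mode="real")(x)
gm.graph.print_tabular()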

@ezyang (Contributor) commented Jan 31, 2023

Could you point to an example? I tried constructing several examples where things might fail, including nesting torch.func.grad and torch.autograd.grad, but:

It's not make_fx per se, but tape recording (which is like FX, but a bit different). See https://github.com/albanD/subclass_zoo/blob/main/simple_functorch.ipynb and search for "to compute higher order gradients".

But yeah, consider me convinced; we can land this.

@ezyang (Contributor) commented Jan 31, 2023

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Jan 31, 2023
@pytorchmergebot (Collaborator)

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot (Collaborator)

Merge failed
Reason: 1 mandatory check(s) failed (Rule superuser).

@ezyang (Contributor) commented Jan 31, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot (Collaborator)

Merge failed
Reason: 1 mandatory check(s) failed (Rule superuser).

@ezyang (Contributor) commented Feb 1, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).

Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: fx
Projects: None yet
Linked issue: Nested FX tracing doesn't work when outer tracing mode is symbolic
4 participants