[Dynamo] Support torch.Tensor.fn as TorchVariable, not UserDefinedObjectVariable, preventing graph break #93243

Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93243
Note: Links to docs will display an error until the doc builds have been completed.
✅ No Failures as of commit 96e2e3a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @jansel, please take a look at whether it makes sense to support this.
| "to_sparse": {f32, f64}, | ||
| "uniform": {f16, f32, f64}, | ||
| # AssertionError: Tensor-likes are not close! | ||
| "uniform": {f16}, |
Interesting that these started working. I'd expect randomness to not work with opinfo testing.
I'm not sure why eager and inductor randomness match -- it isn't clear whether the default generator is global or configured separately, with the backend setting the seed from the default generator.
Here's a simple reproducer:

import torch
import torch._dynamo as dynamo

op = torch.Tensor.bernoulli_
args = torch.empty(10000)

def foo(args):
    # torch.manual_seed(42)
    return op(args)

opt_foo = dynamo.optimize("inductor")(foo)
y = foo(args)
y_ = opt_foo(args)
print(y)
print(y_)
print(torch.allclose(y, y_))
Is there a graph break here? Try changing it to dynamo.optimize("inductor", nopython=True).
No, there's no graph break with this PR. I could reproduce the randomness match with nopython=True as well. I've generally noticed they match for CPU fp32 and fp64, even though the warning is raised: [WARNING] using triton random, expect difference from eager
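For reference, a minimal sketch of that nopython check against the reproducer above (a hypothetical snippet, not part of the PR; nopython=True makes Dynamo raise instead of silently falling back on a graph break):

import torch
import torch._dynamo as dynamo

op = torch.Tensor.bernoulli_
args = torch.empty(10000)

def foo(args):
    return op(args)

# With a graph break, this would raise torch._dynamo.exc.Unsupported;
# with this PR, compilation succeeds end to end.
opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
y_ = opt_foo(args)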
torch/_dynamo/variables/builder.py (Outdated)

    guards=make_guards(GuardBuilder.FUNCTION_MATCH),
)
elif (
    not is_allowed(value)
Can we just make is_allowed() return True for these?
Should be able to modify:
pytorch/torch/_dynamo/allowed_functions.py
Line 129 in 74592a4
def _allowed_function_ids():

with something like:

for name in dir(torch.Tensor):
    method = getattr(torch.Tensor, name)
    if isinstance(method, types.MethodDescriptorType):
        torch_object_ids[id(method)] = f"torch.Tensor.{name}"
Hmm, torch.Tensor.{name} objects are method descriptors, not modules or methods, so they don't have a __module__ attribute. This issue propagates to FX graph generation when it looks up the module of the method:
Line 40 in e7ace1f
def _find_module_of_method(orig_method: Callable[..., Any]) -> str:
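A quick illustration of the problem (a sketch; the printed types follow what the discussion above describes):

import types
import torch

method = torch.Tensor.bernoulli_
print(type(method))                                     # a C-level method descriptor, not a plain function
print(isinstance(method, types.MethodDescriptorType))   # expected True per the check suggested above
print(getattr(method, "__module__", "<missing>"))       # no __module__ for FX to resolve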
Could we change FX to handle that? I suspect it just involves adding a case to the code printer.
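A minimal sketch of one possible fallback (not the actual FX change; the real _find_module_of_method differs), assuming the method descriptor exposes __objclass__:

from typing import Any, Callable

def _find_module_of_method(orig_method: Callable[..., Any]) -> str:
    # Regular torch functions carry __module__ and resolve as before.
    module = getattr(orig_method, "__module__", None)
    if module is not None:
        return module
    # Hypothetical fallback: C-level method descriptors such as
    # torch.Tensor.bernoulli_ have no __module__, but __objclass__ names the
    # owning class, so the code printer can emit e.g. "torch.Tensor.bernoulli_".
    objclass = getattr(orig_method, "__objclass__", None)
    if objclass is not None:
        return f"{objclass.__module__}.{objclass.__name__}"
    raise RuntimeError(f"cannot find module for {orig_method}")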
Thanks @jansel, your suspicion was right. Please take another look.
Co-authored-by: Jason Ansel <jansel@jansel.net>
Thanks @jansel for the help with root-causing and review.
# warn_only=False correctly raises RuntimeError: put_ does not have a deterministic implementation
# warn_only=True logs warning from the FallbackKernel: torch.ops.aten.put_.default, instead of as UserWarning:
# [W Context.cpp:%(lineno)] Warning: put_ does not have a deterministic implementation
@skipIfTorchInductor("warning is logged from the FallbackKernel: torch.ops.aten.put_.default when warn_only=True")
Hi @jansel, skipping the following test. With the graph break eliminated, the warning for warn_only=True is correctly raised from the fallback kernel.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few of them are: trunk / win-vs2019-cuda11.6-py3 / test (functorch, 1, 1, windows.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Edit: for Proof of Concept on #94324.
Merge failed. Reason: 1 job has failed; the first few of them are: trunk / win-vs2019-cuda11.6-py3 / test (functorch, 1, 1, windows.g5.4xlarge.nvidia.gpu).
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
As found in #92709, thanks to @ngimel and @jansel, currently torch.Tensor.fn points to UserDefinedObjectVariable rather than TorchVariable. The root cause is described in #92709 (review). This issue propagates to torch.Tensor.fn calls, causing a graph break with nopython=True. To prevent this, build a TorchVariable for torch.Tensor.fn pointing to torch.ops.aten.fn.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire
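A minimal sketch of the behavior this change targets (a hypothetical snippet; torch.Tensor.abs stands in for any unbound Tensor method attribute): previously the call through the unbound attribute was wrapped as a UserDefinedObjectVariable and graph-broke, so nopython=True raised; with the fix it is treated as a TorchVariable and compiles.

import torch
import torch._dynamo as dynamo

op = torch.Tensor.abs  # an unbound torch.Tensor method attribute

def foo(x):
    return op(x)

# Before this PR: graph break (an error under nopython=True); after: compiles.
opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
x = torch.randn(8)
print(torch.allclose(foo(x), opt_foo(x)))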