[Dynamo] Support torch.Tensor.fn as TorchVariable, not UserDefinedObjectVariable, preventing graph break #93243

Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93243
Note: Links to docs will display an error until the doc builds have been completed.
✅ No Failures as of commit 96e2e3a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @jansel, please take a look at whether it makes sense to support this.
| "to_sparse": {f32, f64}, | ||
| "uniform": {f16, f32, f64}, | ||
| # AssertionError: Tensor-likes are not close! | ||
| "uniform": {f16}, |
Interesting that these started working. I'd expect randomness to not work with opinfo testing.
I'm not sure why eager and inductor randomness match -- it isn't clear whether the default generator is global or configured separately, with the backend setting the seed from the default generator.
Here's a simple reproducer:

import torch
import torch._dynamo as dynamo

op = torch.Tensor.bernoulli_
args = torch.empty(10000)

def foo(args):
    # torch.manual_seed(42)
    return op(args)

opt_foo = dynamo.optimize("inductor")(foo)
y = foo(args)
y_ = opt_foo(args)
print(y)
print(y_)
print(torch.allclose(y, y_))
Is there a graph break here? Try changing it to dynamo.optimize("inductor", nopython=True).
No, there's no graph break with this PR. I could reproduce the randomness match with nopython=True as well. I've generally noticed they match for CPU fp32 and fp64, even though the warning is raised: [WARNING] using triton random, expect difference from eager
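For reference, a minimal sketch of that nopython check against the reproducer above (a hypothetical snippet, not part of the PR; nopython=True makes Dynamo raise instead of silently falling back on a graph break):

import torch
import torch._dynamo as dynamo

op = torch.Tensor.bernoulli_
args = torch.empty(10000)

def foo(args):
    return op(args)

# With a graph break, this would raise torch._dynamo.exc.Unsupported;
# with this PR, compilation succeeds end to end.
opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
y_ = opt_foo(args)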
torch/_dynamo/variables/builder.py (Outdated)

    guards=make_guards(GuardBuilder.FUNCTION_MATCH),
)
elif (
    not is_allowed(value)
Can we just make is_allowed() return True for these?
Should be able to modify:
pytorch/torch/_dynamo/allowed_functions.py
Line 129 in 74592a4
def _allowed_function_ids():

with something like:

for name in dir(torch.Tensor):
    method = getattr(torch.Tensor, name)
    if isinstance(method, types.MethodDescriptorType):
        torch_object_ids[id(method)] = f"torch.Tensor.{name}"
Hmm, torch.Tensor.{name} objects are method descriptors, not modules or methods, so they don't have a __module__ attribute. This issue propagates to FX graph generation when it looks up the module of the method:
Line 40 in e7ace1f
def _find_module_of_method(orig_method: Callable[..., Any]) -> str:
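A quick illustration of the problem (a sketch; the printed types follow what the discussion above describes):

import types
import torch

method = torch.Tensor.bernoulli_
print(type(method))                                     # a C-level method descriptor, not a plain function
print(isinstance(method, types.MethodDescriptorType))   # expected True per the check suggested above
print(getattr(method, "__module__", "<missing>"))       # no __module__ for FX to resolve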
Could we change FX to handle that? I suspect it just involves adding a case to the code printer.
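A minimal sketch of one possible fallback (not the actual FX change; the real _find_module_of_method differs), assuming the method descriptor exposes __objclass__:

from typing import Any, Callable

def _find_module_of_method(orig_method: Callable[..., Any]) -> str:
    # Regular torch functions carry __module__ and resolve as before.
    module = getattr(orig_method, "__module__", None)
    if module is not None:
        return module
    # Hypothetical fallback: C-level method descriptors such as
    # torch.Tensor.bernoulli_ have no __module__, but __objclass__ names the
    # owning class, so the code printer can emit e.g. "torch.Tensor.bernoulli_".
    objclass = getattr(orig_method, "__objclass__", None)
    if objclass is not None:
        return f"{objclass.__module__}.{objclass.__name__}"
    raise RuntimeError(f"cannot find module for {orig_method}")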
Thanks @jansel, your suspicion was right. Please take another look.
Co-authored-by: Jason Ansel <jansel@jansel.net>
Thanks @jansel for the help with root-causing and review.
# warn_only=False correctly raises RuntimeError: put_ does not have a deterministic implementation
# warn_only=True logs warning from the FallbackKernel: torch.ops.aten.put_.default, instead of as UserWarning:
# [W Context.cpp:%(lineno)] Warning: put_ does not have a deterministic implementation
@skipIfTorchInductor("warning is logged from the FallbackKernel: torch.ops.aten.put_.default when warn_only=True")
Hi @jansel, skipping the following test. With the graph break eliminated, the warning for warn_only=True is correctly raised from the fallback kernel.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few of them are: trunk / win-vs2019-cuda11.6-py3 / test (functorch, 1, 1, windows.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: raised by workflow job.
The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Edit: for Proof of Concept on #94324.
Merge failed. Reason: 1 job has failed; the first few of them are: trunk / win-vs2019-cuda11.6-py3 / test (functorch, 1, 1, windows.g5.4xlarge.nvidia.gpu).
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
… dropdown" We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
We create a new type of error `MergeError` which effectively does what `handle_exception` was doing before in creating an error, however, it lets you add extra internal input. We use this new error type in order to move the broken rule during merges to the dev infra dropdown. Nit: Also minor fix MandatoryChecksMissingError now actually uses the broken rule Addresses: pytorch/test-infra#1081 Result should be something in the vein of #93243 (comment) (albeit with a different error message/summary) [ghstack-poisoned]
As found in #92709, thanks to @ngimel and @jansel, currently torch.Tensor.fn points to UserDefinedObjectVariable rather than TorchVariable. The root cause is described in #92709 (review). This issue propagates to torch.Tensor.fn calls, causing a graph break with nopython=True. To prevent this, build a TorchVariable for torch.Tensor.fn pointing to torch.ops.aten.fn.

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire
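A minimal sketch of the behavior this change targets (a hypothetical snippet; torch.Tensor.abs stands in for any unbound Tensor method attribute): previously the call through the unbound attribute was wrapped as a UserDefinedObjectVariable and graph-broke, so nopython=True raised; with the fix it is treated as a TorchVariable and compiles.

import torch
import torch._dynamo as dynamo

op = torch.Tensor.abs  # an unbound torch.Tensor method attribute

def foo(x):
    return op(x)

# Before this PR: graph break (an error under nopython=True); after: compiles.
opt_foo = dynamo.optimize("inductor", nopython=True)(foo)
x = torch.randn(8)
print(torch.allclose(foo(x), opt_foo(x)))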