Calling `torch.ops.aten.add_` is ludicrously slow #74943
Comments
From @albanD
Hmm.. perhaps the issue here is that we try to parse_args for the inputs, it throws an error that prints out the tensor, and then gets caught?
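The hypothesis above can be sketched in plain Python. This is a toy model, not PyTorch code: `FakeTensor`, `parse_args_strict`, and `resolve_overload` are hypothetical stand-ins. The point it illustrates is that building an error message forces a full repr of the tensor even when the exception is immediately caught and discarded.

```python
import time

class FakeTensor:
    """Stand-in for a large tensor whose repr is expensive to build."""
    def __init__(self, n):
        self.data = list(range(n))
    def __repr__(self):
        # Mimics tensor printing: formats every element.
        return "FakeTensor([" + ", ".join(f"{x:.4f}" for x in self.data) + "])"

def parse_args_strict(x):
    # Simulates a schema check that rejects the input and embeds its
    # repr in the error message -- the repr is built unconditionally.
    raise TypeError(f"expected int, got {x!r}")

def resolve_overload(x):
    # Trial-based resolution: the failed attempt still pays the full
    # cost of stringifying the argument, even though the error is caught.
    try:
        parse_args_strict(x)
    except TypeError:
        pass  # fall through to the "matching" overload
    return x.data[0] + 1  # pretend the second overload matched

t = FakeTensor(100_000)

start = time.perf_counter()
resolve_overload(t)
slow = time.perf_counter() - start

start = time.perf_counter()
t.data[0] + 1  # direct call, no failed parse
fast = time.perf_counter() - start

print(f"with failed parse: {slow:.4f}s, direct: {fast:.6f}s")
```

The failed-parse path is dominated by `__repr__`, which is pure overhead: nothing ever reads the message.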
Are the overloaded variants ( If the performance does matter then we should consider overhauling the overload resolution mechanism (or having new python bindings into torch.ops.aten), because this is not the first time it has caused us problems (remember TorchScript registering extra overloads that caused incorrect behavior?)
I had the same question as @zou3519 -- do we actually need to use these bindings for any non-TorchScript use case?
From @ngimel's comments offline: In 2 places in pybind utils we are getting a string representation of an unlucky tensor that cannot be shoehorned into the schema, with all the slicing that's required for that; this diff kinda fixes it.
I don't think we need these anymore now that torch_dispatch moved to use the overload-specific functions.
@gchanan Yeah... it's possible we can just always use the overloaded versions. It is a little bit finicky right now, though, since we can't torchscript the overloads. Also, if we go down this path, we should probably just make the non-overloaded version not callable?
Making the non-overloaded version non-callable would be very BC-breaking? But luckily not many users are using torch.ops
torch_dispatch is now using the direct overloads and so doesn't have this issue. |
Resolved now - if anything, the
@Chillee oh, is it? How?
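The "direct overloads" discussed in this thread can be illustrated with a toy model. This is pure Python, not the real torch.ops machinery: `OpOverloadPacket`, `add__Tensor`, and `add__Scalar` are hypothetical stand-ins. Calling the packet itself goes through trial-and-error resolution (and each failed match may stringify its arguments), while fetching a named overload dispatches directly with no failed attempts.

```python
class OpOverloadPacket:
    """Sketch of a packet like torch.ops.aten.add_: calling the packet
    tries each overload in turn; attribute access returns the specific
    overload, which dispatches directly."""
    def __init__(self, overloads):
        self._overloads = overloads
    def __getattr__(self, name):
        try:
            return self._overloads[name]
        except KeyError:
            raise AttributeError(name)
    def __call__(self, *args):
        # Trial-based resolution: failed matches build error messages
        # (including argument reprs) that are then thrown away.
        for fn in self._overloads.values():
            try:
                return fn(*args)
            except TypeError:
                continue
        raise TypeError("no overload matched")

def add__Tensor(a, b):
    if not isinstance(a, list) or not isinstance(b, list):
        raise TypeError(f"add_.Tensor expected lists, got {a!r}, {b!r}")
    return [x + y for x, y in zip(a, b)]

def add__Scalar(a, b):
    if not isinstance(b, (int, float)):
        raise TypeError(f"add_.Scalar expected a number, got {b!r}")
    return [x + b for x in a]

add_ = OpOverloadPacket({"Tensor": add__Tensor, "Scalar": add__Scalar})

print(add_([1, 2], 3))         # packet call: Tensor fails, then Scalar matches -> [4, 5]
print(add_.Scalar([1, 2], 3))  # direct overload: no failed attempts -> [4, 5]
```

This is why moving torch_dispatch to the overload-specific functions sidesteps the problem: the direct path never builds and discards error messages.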
🐛 Describe the bug
Calling `torch.ops.aten.add_`, whether on CPU or CUDA, is orders of magnitude slower than calling `Tensor.add_`.

Getting a profile reveals that the tensor appears to be getting printed somehow.
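One stdlib-only way to see such stray printing in a profile is sketched below. This is a toy reproduction, not the original report's setup: `Noisy` and `op` are hypothetical stand-ins, and the real issue was presumably profiled against the actual aten op. The expensive `__repr__` shows up prominently in the cProfile output even though the operation itself is cheap.

```python
import cProfile
import io
import pstats

class Noisy:
    """Stand-in for a tensor whose repr is expensive to build."""
    def __init__(self, n):
        self.data = list(range(n))
    def __repr__(self):
        return "Noisy([" + ", ".join(str(x) for x in self.data) + "])"

def op(x):
    # Simulates the binding: a schema mismatch builds an error message
    # containing repr(x), which is then caught and thrown away.
    try:
        raise TypeError(f"bad schema: {x!r}")
    except TypeError:
        pass
    return sum(x.data)

x = Noisy(50_000)
prof = cProfile.Profile()
prof.enable()
op(x)
prof.disable()

buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats()
report = buf.getvalue()
print("__repr__" in report)  # → True: the stray printing is visible in the profile
```

Seeing a tensor's `__repr__` (or the string-formatting helpers under it) near the top of a profile for a simple in-place add is the telltale sign described in this issue.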
cc: @jansel, @anijain2305, @albanD, @Anja
Versions
N/A
cc @albanD @zou3519