[discussion] Have PyTorch functions support Python scalars (like NumPy) + introduce convenience constants like `torch.pi` and `torch.e`, and maybe an analogue of the `scipy.constants` namespace #110636
Comments
The idea sounds interesting to me, but without being wrapped in a Tensor, I think AD and other tracers would not work in this case?
There is some precedent for this, but we diverge fairly significantly from NumPy's behavior on operators that do support Python scalars. For example:
I don't remember if the NumPy devs consider NumPy scalars to be a mistake. It'd be a pretty big modeling change for PyTorch, though; it doesn't feel like we're likely to do it.
Yes, we do consider them a mistake. 0-D arrays/tensors have turned out to work just fine in PyTorch, CuPy, JAX & co., and avoid a ton of the complexity that comes with NumPy's array scalars (which are instances of dtypes).
This is indeed a good idea, and very easy to do. +1 for adding
For PyTorch, my main motivation was more polymorphic code, to avoid multiple distinct but very similar code paths for dealing with Python scalars and tensors (examples are found mainly in the optimizer code).
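For readers unfamiliar with the optimizer code being referenced, this is a hedged, torch-free sketch of the kind of duplicated scalar-vs-tensor branching being described (the function name and the duck-typed tensor path are invented for illustration; they are not actual `torch.optim` code):

```python
import math

def bias_corrected_lr(lr, beta2, step):
    """Hypothetical example: same math, two code paths.

    Python scalars must go through math.* (fast, no kernel dispatch),
    while tensor hyperparameters need tensor ops.
    """
    if isinstance(lr, (int, float)):
        # Python-scalar path
        return lr / math.sqrt(1 - beta2 ** step)
    # tensor path (duck-typed here; real code would call torch ops)
    return lr / (1 - beta2 ** step) ** 0.5

print(bias_corrected_lr(0.001, 0.999, 10))
```

If `torch.sqrt` and friends accepted Python scalars, the two branches above could collapse into one.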
Well, if you're OK with the operators promoting their results into 0-d tensors, I don't think it's the worst idea to extend which ops take Python scalars (in fact, we are quite a bit better about this in PrimTorch refs), though it can be pretty annoying to teach our codegen to do it.
I think, in the optimizers' code specifically, these multiple code paths exist because processing Python scalars is currently faster than 0-d CPU tensors. So promotion to 0-d tensors should be fine, but ideally it should make it possible to remove these duplicate code paths from the optimizers...
yeah, makes sense. Pretty hard to fix eager-mode 0d perf :(
But it's also a bit surprising that this overhead is noticeable (given that much larger tensors are usually being manipulated in the optimizers, so these scalar norm calculations are only a small fraction of the work).
What do you think of the proposal in this issue, @janeyx99?
So maybe, in eager mode, torch.sqrt and some other functions could produce Python scalars when given Python scalars (I see this would diverge from NumPy) - as a hack. And maybe some time in the future they could be changed to produce 0-d tensors, if that's found not to affect perf too badly.
> `torch.pi` and `torch.e`
I'd oppose having torch.sqrt return a Python scalar, because torch.add(2, 3) doesn't return a Python scalar.
@vadimkantorov The reason Python scalar math is so much faster is that we don't need to dispatch into kernels. If we are trying to move everything to be torch ops, I feel the perf hit still remains, and it would still be valuable to keep hyperparams as strict Python scalars. I'm not sure I see how the proposal would be better even if we had torch ops taking and returning scalars, because the kernel dispatch would still exist, right? In fact, we've been migrating to the (not quite) "opposite" approach as more people adopt torch.compile() or play with CUDA graphs--we want to accept more ScalarTensors instead of just Scalars for many of our foreach ops, to enable dynamism. I agree the branching in our code adds confusion, but the perf tradeoff is too strong.
Regarding the scalars, it just feels strange that this dispatch overhead is noticeable in the optimizers / overall perf, given that a lot of number-crunching happens in optimizers :)

Regarding constants, maybe it's worth duplicating the scipy.constants namespace in PyTorch (plus the constants from Python's math module): https://docs.scipy.org/doc/scipy/reference/constants.html

Again, I don't know whether PyTorch should include these as plain Python numbers or as some collection of scalar PyTorch tensors allocated/cached lazily on all devices :)
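If the "plain Python numbers" option were chosen, such a namespace could be a thin re-export of stdlib values. A minimal sketch (the name `constants` and the selection of values are hypothetical, not an actual PyTorch API; a `scipy.constants` analogue would add physical constants on top):

```python
import math
from types import SimpleNamespace

# Hypothetical torch.constants-style namespace built from Python's
# math module; each attribute is an ordinary Python float.
constants = SimpleNamespace(
    pi=math.pi,
    e=math.e,
    tau=math.tau,
    inf=math.inf,
    nan=math.nan,
)

print(constants.pi)  # 3.141592653589793
```

The tensor-per-device variant discussed above would instead need lazy caching keyed on (dtype, device), which is considerably more machinery.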
…5026) Fixes #65307 For consistency with Python Array API (https://data-apis.org/array-api/latest/API_specification/constants.html) and NumPy (https://numpy.org/devdocs/reference/constants.html), I added `torch.newaxis = None`. Note that the consistency is directly mentioned also in the `__init__.py`, right above the added export. The `torch.newaxis` is also mentioned in #110636. Pull Request resolved: #125026 Approved by: https://github.com/lezcano
A status update:
This was all done. The first four are available in 2.3.0, and
+1, it would yield easier to read (and write) code. This hasn't moved much recently. With 2.3.0:

```python
>>> t = torch.tensor([2, 3])
>>> torch.add(t, 1)
tensor([3, 4])
>>> torch.atan2(t, 1)
...
TypeError: atan2(): argument 'other' (position 2) must be Tensor, not int
>>> torch.maximum(t, 1)
...
TypeError: maximum(): argument 'other' (position 2) must be Tensor, not int
```

The ergonomic benefits are useful - doing this in user code in a generic way yields things like:

```python
torch.maximum(t, torch.tensor(1, dtype=t.dtype, device=t.device))
```

which is quite verbose.
@rgommers What do you think about adding something like scipy.constants to torch? Was it actually useful to / used by scipy users?
🚀 The feature, motivation and pitch
OP: #110351 (comment)
As an example, it would be nice to have torch.sqrt(python scalar) -> python scalar, without having to dispatch between torch.sqrt and math.sqrt, and to enable a bit more polymorphic code.
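For illustration, this is the kind of manual dispatch the proposal would make unnecessary (the helper name `sqrt_any` is hypothetical; the tensor branch is duck-typed so the sketch stays torch-free):

```python
import math

def sqrt_any(x):
    """Hypothetical helper: today, code accepting both Python scalars
    and tensors has to branch explicitly between math.* and tensor ops.
    """
    if isinstance(x, (int, float)):
        return math.sqrt(x)   # Python-scalar path
    return x.sqrt()           # tensor path (any object with .sqrt())

print(sqrt_any(4.0))  # 2.0
```

Under the proposal, `torch.sqrt` itself would accept the scalar and the helper would disappear.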
Another idea is to also have torch.pi and other constants (like NumPy), to avoid importing numpy or math just to get these constants.

Please close this if it's a duplicate. I tried to search for similar previous discussions, but the keywords are a bit too generic :(
For torch.sqrt specifically, a polymorphic equivalent currently exists: `x ** 0.5`, which works for both tensor and Python-scalar inputs. But it would be useful to have this behavior for many (at least simple) functions like torch.exp and so forth.

Alternatives
No response
Additional context
No response
cc @mruberry @rgommers @albanD