# Better Numpy API (interoperability between ML frameworks) #94779
## Comments
Another NumPy compat issue on supporting string dtypes: #40568 (comment) (useful for porting existing numpy code). And an existing issue for `append`: #64359.
cc @jisaacso
Hi @Conchylicultor, thank you for the detailed write-up! There are two separate parts here: the big-picture "PyTorch compatibility with NumPy", and the presence and behavior of individual objects in the API. For the big-picture part:
+1
These are a mix of things that should be added to PyTorch, and things that would be nice to add/change but may be challenging because of backwards compatibility.
> Some common methods (present in `np`, `jnp`, `tf.numpy`) are missing from `torch`:
This one I'm not 100% sure about either way. We left naming the array object out of the array API standard on purpose, because there's a bunch of different names floating around.
Yes, this is a gap. I'll note that the array API standard has it as an `astype` function rather than a method.
Could be added indeed. Not high-prio I think.
This would be nice to add as an alias indeed, and easy to do. I'll note that there are quite a few functions like these in numpy.

> Behavior:
There doesn't seem to be much engagement on gh-40568. It could be done, but it's kind of weird of course. Having the same names should be enough, just like for other objects (like regular functions) in the namespaces. The intended pattern here is to use the dtype objects from the namespace you are working with (e.g. `xp.int32` rather than `np.int32`).
Similar - could be added, and would be fairly pragmatic, but also technically incorrect. With numpy dtypes you can for example create scalars (`np.int32(3)` is a valid scalar), which has no `torch.dtype` equivalent. After a lot of discussion, we settled on `isdtype` in the array API standard. So comparisons like the ones you want here can be written with `isdtype` instead.
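For illustration, a minimal stand-in for that kind of check, sketched for torch only (the helper name and dtype list are assumptions, not an existing torch or array API function):

```python
import torch

def is_integer_dtype(dtype) -> bool:
    # Hypothetical helper mimicking the array API's isdtype(dtype, "integral").
    return dtype in (torch.uint8, torch.int8, torch.int16,
                     torch.int32, torch.int64)

x = torch.zeros((), dtype=torch.int32)
assert is_integer_dtype(x.dtype)  # framework-local check, no np.dtype needed
```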
This is a fairly annoying UX papercut in PyTorch indeed. It'd be great if all integer tensors could be used for indexing.
This one I've wanted to see fixed for a long time; unfortunately it's bc-breaking as well as not high-prio.
Pass it as a positional argument to avoid this - that's more idiomatic anyway, and the array API standard makes these arguments positional-only, so this will not be an issue.
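For illustration (behavior as of current torch and numpy releases):

```python
import numpy as np
import torch

torch.ones(())           # works: shape passed positionally
np.ones(())              # works everywhere
# torch.ones(shape=())   # TypeError: torch's keyword is size=, not shape=
```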
> Casting:

Would be nice to fix, but is a minor papercut. I'll also note that the array API standard mandates only floating-point dtype support for `mean`.
This would be good to fix up indeed.

> Other differences (but not critical to fix):
That does work for operators. I'm not sure it can be improved upon without accepting numpy arrays to every PyTorch function (which I don't think is a good idea).
That seems useful indeed; I think it works in many places, but probably not 100% consistently.
This should be made to work via every array/tensor object supporting the DLPack protocol (`__dlpack__`).
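For illustration, a sketch of the DLPack route, assuming reasonably recent torch and numpy (`torch.from_dlpack` and `np.from_dlpack` are the relevant entry points):

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32)
t = torch.from_dlpack(a)       # zero-copy: t shares a's memory
t[0] = 42.0
assert a[0] == 42.0            # mutation is visible through both views

b = np.from_dlpack(torch.ones(3))  # works in the other direction too
```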
Thanks for the answer.
This feels strange to me. Is there a place to report such issues?
Unfortunately it doesn't work with:

```python
xp.zeros((), dtype=xp.bool_)  # torch has no attribute `torch.bool_`
```

It looks like the numpy API will have `bool` instead of `bool_`.
Overall, I was able to fix those issues by adding a small compatibility wrapper around `torch`. My feeling is that multi-framework support today is way more complicated than it should be. Because of all those small issues, it actually requires quite a lot of effort to make code compatible with torch/TF, and often requires custom wrappers/helpers.
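For reference, a minimal sketch of that kind of shim (a hand-rolled `xnp`-style namespace; the names are hypothetical, not the actual etils implementation):

```python
import torch

class TorchXnp:
    """Numpy-flavored facade over torch (illustrative only)."""
    ndarray = torch.Tensor   # so isinstance(x, xnp.ndarray) works
    bool_ = torch.bool       # alias for the missing torch.bool_

    @staticmethod
    def array(obj, dtype=None):
        return torch.as_tensor(obj, dtype=dtype)

    @staticmethod
    def expand_dims(x, axis):
        return torch.unsqueeze(x, axis)

xnp = TorchXnp()
x = xnp.array([1, 2, 3])
assert isinstance(x, xnp.ndarray)
```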
Yes, the issue tracker of the standard. Here is the relevant PR for reference: data-apis/array-api#290.
Yes, indeed - NumPy really needs to reinstate `np.bool`.
I expect that we'll add
Yes, you're completely right. That's why we've spent so much effort on standardization; that now needs to be rolled out in the main namespaces. Some issues are going to remain though, due to backwards compatibility concerns.
## Context
For better interoperability between ML frameworks, it would be great if the `torch` API matched the `numpy` API more closely (like `tf.experimental.numpy` and `jax.numpy`).

This is a highly requested feature, e.g. #2228 (~100 upvotes), #50344, #38349, ... Those issues have since been closed even though many problems remain.
This is even more relevant with the numpy API standard (NEP 47): The goal is to write functions once, then reuse them across frameworks:
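For instance, a sketch of that write-once pattern, assuming each backend exposes a numpy-like namespace that is passed in as `xnp`:

```python
import numpy as np

def normalize(x, xnp=np):
    # Works with any numpy-like namespace: np, jnp, tf.experimental.numpy, ...
    return (x - xnp.mean(x)) / xnp.std(x)

print(normalize(np.array([1.0, 2.0, 3.0])))  # [-1.2247  0.  1.2247]
```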
Our team has multiple universal libraries which support `numpy`, `jax`, and `TF` (like `dataclass_array` or `visu3d`). We've been experimenting with adding `torch` support recently but encountered quite a lot of issues (while `tf.numpy` and `jax` worked (mostly) out of the box). Here are all the issues we encountered:

## numpy API issues
Some common methods (present in `np`, `jnp`, `tf.numpy`) are missing from `torch`:

- `torch.array`: like `x = xnp.array([1, 2, 3])` (alias of `torch.tensor`)
- `torch.ndarray`: like `isinstance(x, xnp.ndarray)` (alias of `torch.Tensor`)
- `Tensor.astype`: like `x = x.astype(np.uint8)`
- `torch.append` ([feature request] `numpy.append`/`numpy.insert`/`numpy.delete` equivalents and implement dynamic arrays (reallocate storage with a surplus) #64359): https://numpy.org/doc/stable/reference/generated/numpy.append.html
- `torch.expand_dims`: https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html (alias of `torch.unsqueeze`)
- `torch.around`: https://numpy.org/doc/stable/reference/generated/numpy.around.html
- `torch.concatenate`: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
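A quick repro of a few of these gaps (behavior at the time of writing; newer torch releases may have filled some in):

```python
import torch

print(hasattr(torch, "array"))               # False: no np.array alias
print(hasattr(torch, "expand_dims"))         # False: torch.unsqueeze instead
print(hasattr(torch.tensor([1]), "astype"))  # False: use Tensor.to(dtype)
```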
**Behavior:**

- `np.dtype` accepted everywhere `torch.dtype` is valid (e.g. `torch.ones((), dtype=np.int32)`) (see: Converting NumPy dtype to Torch dtype when using `as_tensor` #40568).
- `torch.dtype` should be comparable with `np.dtype`: `tf.int32 == np.int32` but `torch.int32 != np.int32`. This allows agnostic comparisons like `x.dtype == np.uint8` to work for all frameworks.
- `x[y]` fails for `y.dtype == int32` (raises `IndexError: tensors used as indices must be long, byte or bool tensors`) but works in TF, Jax, Numpy (Inconsistency between index_select and `__get_item__` #83702).
- `axis=()` (like in `x.mean(axis=())`) returns a scalar for `torch` but is a no-op for `np`, `jnp`, `tf` (besides returning float for int arrays, see below). (torch.sum(tensor, dim=()) is different from np.sum(arr, axis=()) #29137)
- `torch.ones(shape=())` fails (expects `size=`), but `xnp.ones(shape=())` works in other frameworks. Same for `torch.zeros`, ...
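Two of these are easy to reproduce (a sketch; exact error text depends on the torch version):

```python
import numpy as np
import torch

print(torch.int32 == np.int32)  # False, while tf.int32 == np.int32 is True

x = torch.arange(5)
idx = torch.tensor([1, 3], dtype=torch.int32)
try:
    print(x[idx])               # raised IndexError when this was filed
except IndexError as e:
    print(e)
```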
**Casting:**

- `x.mean()` currently requires an explicit dtype when `x.dtype == int32` (`x.mean(dtype=torch.float32)`). Other frameworks default to the default float type (`float32`).
- `torch.allclose(x, y)` currently fails if `x.dtype != y.dtype`, which is inconsistent with `np`, `jnp`, `tf` (this is very convenient in tests: `np.allclose(x, [1, 2, 3])`).
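A repro sketch for the `mean` papercut and its workaround:

```python
import numpy as np
import torch

print(np.arange(4).mean())          # 1.5: numpy upcasts int -> float
x = torch.arange(4)
try:
    print(x.mean())                 # torch refuses integer inputs
except RuntimeError as e:
    print(e)
print(x.mean(dtype=torch.float32))  # explicit dtype works: tensor(1.5000)
```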
**Other differences (but not critical to fix):**
- Mixing `torch` tensors and `np.array` fails (Mixing Numpy's arrays and PyTorch tensors #46829). Both TF and Jax support `tf.Tensor + np.array`.
- It would be nice if `torch.asarray` supported `jax` and `tf` tensors (and vice-versa).

## Testing and experimenting
Those issues have been found in real production code. Our projects have an `@enp.testing.parametrize_xnp` decorator to run the same unittests on `tf`, `jax`, `numpy`, and `torch` to make sure our code works on all backends. For example: https://github.com/google-research/visu3d/blob/89d2a6c9cb3dee1d63a2f5a8416272beb266510d/visu3d/math/rotation_utils_test.py#L29
In order to make our tests pass with `torch`, we had to mock `torch` to fix some of those behaviors: https://github.com/google/etils/blob/main/etils/enp/torch_mock.py

Having a universal standard API that all ML frameworks implement would be a great step forward. I hope this issue is a small step toward that goal.
cc @mruberry @rgommers