PyTorch with numpy syntax? #2228
Comments
This is something that'd be useful, but we can't commit yet to fully supporting such a package. We're first sorting out not having differences between Tensors and Variables. @petered do you have a more concrete design proposal? Do you see yourself using it just for Tensor, or also for Variable?
So I'm a bit new to PyTorch and haven't fully wrapped my head around why Variable and Tensor are two different things (from the user's perspective, anyway), though I see why the concepts should be separated in code.
I would want this to work with autograd, if that's what you mean, so Variable. More concretely, it would be great to work towards a
This is, IMHO, the biggest stopper in teaching (otherwise close-to-perfect) PyTorch as a first deep learning framework.
Really looking forward to the day PyTorch matches the NumPy API. +1
As mentioned in pytorch/tutorials#197.
Note that the NumPy changes planned in NEP-18 could be helpful here: http://www.numpy.org/neps/nep-0018-array-function-protocol.html
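For context, NEP-18 lets array-like classes intercept NumPy-API calls. A minimal sketch of the protocol (my own illustration, not PyTorch code; requires NumPy ≥ 1.17):

```python
import numpy as np

class WrappedArray:
    """Toy array-like class that hooks into NumPy via NEP-18."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Called when e.g. np.sum(WrappedArray(...)) is invoked.
        if func is np.sum:
            return self.data.sum()
        return NotImplemented

print(np.sum(WrappedArray([1, 2, 3])))  # 6
```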
Is there still a plan to provide a NumPy-like API? It's really annoying to write two separate functions for NumPy and PyTorch.
I think there is a plan, @umanwizard might be working on it.
I opened a gist with some thoughts on this, which we have been discussing. But it seems a gist is not the best format for open discussion, so I am reposting it here. @gchanan, can you paste your comments from the gist into this issue?
**Summary**

The goal of this project is to create an alternative Python API for PyTorch which is as similar as possible to NumPy, while retaining distinguishing features like simple GPU support and gradient computation, as illustrated below.

Basic Example:

```python
>>> import torch.numpy as np
>>> a = np.arange(10)
>>> a.sum()
tensor(45)
```

**Goals**
**Data Model**

The torch.numpy.ndarray class is implemented similarly to torch.Tensor. Tensors and ndarrays can be freely converted to one another:

```python
>>> import torch, torch.numpy as np
>>> a = torch.randn(10)._np_compat()
>>> b = torch.randn(10)._np_compat()
>>> c = np.hypot(a, b)
>>> type(c)
<class 'torch.numpy.ndarray'>
>>> t = c._torch()
>>> type(t)
<class 'torch.Tensor'>
```
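To make the data model concrete, here is a rough Python sketch of the wrapper idea (my own illustration; `_np_compat` and `_torch` are the proposal's names, everything else is assumed):

```python
import torch

class ndarray:
    """Hypothetical sketch of torch.numpy.ndarray: a thin wrapper
    around a torch.Tensor sharing the same underlying storage."""
    def __init__(self, tensor):
        self._tensor = tensor          # wrap: no data copied

    def _torch(self):
        return self._tensor            # unwrap: no data copied

    def sum(self, axis=None):
        # Translate NumPy's `axis` keyword to PyTorch's `dim`.
        t = self._tensor
        return t.sum() if axis is None else t.sum(dim=axis)

def _np_compat(tensor):
    return ndarray(tensor)

a = _np_compat(torch.arange(10))
print(a.sum())                  # tensor(45)
print(type(a._torch()))         # <class 'torch.Tensor'>
```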
**Binding Generation**

Code generation logic has been extended to allow NumPy API bindings to be defined in native_functions.yaml similarly to any other native function bindings. For example:

```yaml
- func: sum(np.ndarray a) -> np.ndarray
  variants: function, method
  np_compat: True
```

Setting np_compat: True causes a new signature to be added to the argument parser in the generated binding function. In order to distinguish between the two cases, we make the generated binding function accept a template parameter. Other than the bindings, this declaration is handled like any other.
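A toy Python model of the dual-mode parsing this enables (names hypothetical; the real implementation lives in the generated C++ bindings):

```python
class DualParser:
    """Hypothetical model: one parser holding both signature lists."""
    def __init__(self, torch_sigs, numpy_sigs):
        self.torch_sigs = torch_sigs   # traditional API
        self.numpy_sigs = numpy_sigs   # np_compat API

    def match(self, kwargs, numpy_compat=False):
        # Old mode ignores the new-API signatures entirely.
        sigs = self.torch_sigs + (self.numpy_sigs if numpy_compat else [])
        for sig in sigs:
            if set(kwargs) <= set(sig):
                return sig
        raise TypeError(f"no signature matches {sorted(kwargs)}")

parser = DualParser(torch_sigs=[("input", "dim")], numpy_sigs=[("a", "axis")])
print(parser.match({"a": 0, "axis": 0}, numpy_compat=True))   # ('a', 'axis')
```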
**Argument parsing and translation**

The argument parsing logic is extended to support the new compatibility mode. Parsers are now initialized with two separate lists of signatures: one for the traditional API and one for the new one. When invoked in the old mode, the new-API signatures are ignored, and everything works the same as always. When invoked in the new compatibility mode, argument parsing works in two steps. First, the arguments are parsed against the compatibility signatures. If a match is found, the argument names are translated into their PyTorch equivalents (e.g., shape becomes size). A set of common argument name translations is applied by default; additional per-function translations can be specified in the YAML:

```yaml
- func: ones(int[1] shape, np.dtype dtype=float) -> np.ndarray
  variants: function
  np_compat: True
  additional_translations:
    shape: size
```
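The translation step itself could be modeled like this (the default mapping below, `a` → `input` and `axis` → `dim`, is assumed from the examples later in this thread):

```python
import torch

COMMON_TRANSLATIONS = {"a": "input", "axis": "dim"}  # assumed defaults

def translate_kwargs(kwargs, additional_translations=None):
    """Rename NumPy-style keyword arguments to their PyTorch equivalents."""
    table = {**COMMON_TRANSLATIONS, **(additional_translations or {})}
    return {table.get(k, k): v for k, v in kwargs.items()}

# NumPy-style call, translated and forwarded to the existing torch function:
print(torch.sum(**translate_kwargs({"a": torch.arange(10), "axis": 0})))
# tensor(45)
```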
**Adding new functions**

Obviously, if a function is supported by NumPy and not by PyTorch, we need to actually implement it, not just rely on argument translation magic. The most straightforward way to do this is to create a PyTorch binding, mark it as hidden, and then define a NumPy compatibility binding depending on it. For example:

```yaml
- func: hypot(Tensor input, Tensor b) -> Tensor
  variants: function
  hidden: True
  dispatch:
    CPU: hypot
    CUDA: hypot

- func: hypot(np.ndarray a, np.ndarray b) -> np.ndarray
  variants: function
  np_compat: True
```

The required underlying function still needs to be implemented in native code, like any other dispatched function.
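For intuition, the underlying op could be approximated in pure Python today; this is a stand-in for the native `hypot` kernel the hidden binding would dispatch to, not the actual implementation:

```python
import torch

def hypot(input, b):
    # Elementwise sqrt(input**2 + b**2), matching np.hypot's semantics.
    return torch.sqrt(input * input + b * b)

print(hypot(torch.tensor([3.0]), torch.tensor([4.0])))  # tensor([5.])
```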
**CUDA support**

A `_cuda()` method converts an ndarray to one backed by CUDA memory (underscore-prefixed because it has no NumPy equivalent):

```python
>>> import torch.numpy as np
>>> cpu = np.arange(10)
>>> cpu.sum()
tensor(45)
>>> gpu = cpu._cuda()
>>> gpu.sum()
tensor(45, device='cuda:0')
```
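This mirrors what plain torch already does today, which `_cuda()` would presumably delegate to:

```python
import torch

t = torch.arange(10)
print(t.sum())             # tensor(45)
if torch.cuda.is_available():
    g = t.cuda()           # the Tensor-level move that _cuda() would wrap
    print(g.sum())         # tensor(45, device='cuda:0')
```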
**Differentiation support**

Not yet implemented. Keeping with the convention of prefixing API extensions that don't exist in NumPy with underscores, this functionality would be accessed via underscore-prefixed functions.

**NumPy concepts not existing in PyTorch**

None of these are implemented yet in my proof of concept.

**dtypes**

NumPy supports a very rich notion of dtype, allowing complex structures, whereas PyTorch tensors are made of scalars: float, double, and so on. Unless we decide that it's worth making a fundamental refactor of PyTorch in order to support this, it is out of scope.

**Type promotion**

Some work has already been done on designing and implementing NumPy-like type promotion in PyTorch: #5795 and #9515. Now that we are implementing this NumPy compatibility layer, the importance of that project increases.
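For reference, the NumPy promotion behavior those issues aim to reproduce (plain NumPy, runnable today):

```python
import numpy as np

a = np.arange(3, dtype=np.int32)
b = np.array([0.5], dtype=np.float64)
print((a + b).dtype)                          # float64
print(np.result_type(np.int32, np.float64))   # float64
```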
**Multiple packages**

NumPy functionality is spread throughout many different packages, to a much greater extent than PyTorch. For example, while PyTorch has randn at the top level, NumPy puts it in np.random. We can specify the target package with an option in the YAML defining the binding:

```yaml
- func: randn(int[] size)
  np_compat: True
  package: 'random'
```

Implementing this is straightforward: we will define a set of packages and place each generated binding in the one named by its package option.

**Common NumPy parameters**

NumPy ufuncs have a few parameters with no real PyTorch equivalent.
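A toy sketch of how generated bindings might be routed into sub-packages (module layout assumed; the real mechanism would live in the code generator):

```python
import types
import torch

np_compat = types.ModuleType("torch.numpy")
np_compat.random = types.ModuleType("torch.numpy.random")

# (package, name) -> implementation; package=None means the top level.
BINDINGS = {(None, "sum"): torch.sum, ("random", "randn"): torch.randn}
for (package, name), fn in BINDINGS.items():
    target = getattr(np_compat, package) if package else np_compat
    setattr(target, name, fn)

print(np_compat.random.randn(2).shape)  # torch.Size([2])
```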
- Just to provide some context here, the concern was (1) having multiple APIs that "live forever", and (2) downstream components having to worry about the tensor mode, which doesn't seem necessary.
- Doesn't this example already work (besides the torch.numpy part)? Maybe you can use an example that doesn't work today?
- Are you suggesting you'll be able to have numpy arrays and pytorch tensors interoperate without having to do conversion?
- What's the rationale behind this?
- It would be nice to rewrite this section a bit in the "new world" where there is only a single tensor type.
- Does this still apply if we only have one tensor type? What changes?
- If we force the user to convert to pytorch tensors (and don't support arbitrary interop), don't we get differentiation for free?
- If we don't have a separate tensor type, what changes here?
- We already have a concept for this: python_module.
- Do all of the numpy functions that we want to implement that have an order parameter have an out variant? Also cc @VitalyFedyunin, who is working on order now.
Better example showing param translation:

```python
>>> import torch.numpy as np
>>> arr = np.arange(10)
>>> np.sum(a=arr, axis=0)
tensor(45)
```
- No, I was suggesting we would have functions that let you convert between the types for "free" (i.e., with only some overhead of unwrapping/re-wrapping Python objects, and no copying of data). But this is irrelevant if we decide to go with only one tensor type.
- Making it obvious at a glance that the code will not work in stock NumPy.
- Yes, it will become significantly simpler.
- We will need to think about the exact API here (in particular, whether it makes sense to do it the "numpy way" by default, especially in cases where that would be much less performant). I think we definitely need to provide user control over this somehow, as I can easily imagine any choice we make causing problems for some subset of users.
- Great, thanks for the pointer.
- All numpy "universal functions" have both order and out parameters (and a variety of other common parameters): https://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments
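For concreteness, the two parameters under discussion, shown in plain NumPy:

```python
import numpy as np

a = np.arange(6.0)
out = np.empty_like(a)
np.add(a, 1.0, out=out)          # `out`: write the result into an existing array
b = np.ones((2, 3), order='F')   # `order`: Fortran-contiguous memory layout
print(out)                       # [1. 2. 3. 4. 5. 6.]
print(b.flags['F_CONTIGUOUS'])   # True
```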
How difficult would it be to get the list of methods on ndarray which we literally cannot support without introducing an ndarray type? I feel that this would help people make a more informed decision about the cost of an ndarray type.
@umanwizard This project sounds great... how can we try it?
@peteroconnor-bc We made some changes to improve NumPy compatibility, but are still far from full compatibility. Sorry, but I am not working at Facebook anymore, so I'm not sure what the current status is. @gchanan or @jeffreyksmithjr can let you know whether work is still going on to improve NumPy compatibility.
We don't have anyone whose primary focus is on numpy compatibility, but we gladly accept PRs in this area :).
While numpy-equivalent functions are gradually being added through pull requests, could someone point to a third-party package, if any, that lets users select the backend (either numpy or pytorch) behind a numpy-like API?
@dizcza Not exactly what you are asking for, but take a look at https://eagerpy.jonasrauber.de/
Since this issue is older, and our strategy for NumPy compatibility has been greatly expanded and altered, I'd like to close it in favor of more modern issues. See #50344, in particular, which includes links to more detailed issues capturing all levels of suggestions for how to improve PyTorch's NumPy compatibility. I encourage everyone interested in this issue to please review that issue and its sub-issues and add their thoughts.
Hi all,
So there are some great things about PyTorch, but one of the not-great things is that it uses a mostly-but-not-fully different API from the one used by numpy, theano, and tensorflow. I find myself having to consult a lookup table when I want to run familiar commands.
Are there any plans to make a numpytorch, where the API matches numpy's as closely as possible?
An advantage would be that many functions written for numpy could also be used by pytorch. It would also be nice to be able to call matplotlib functions (plot, imshow, etc.) directly with torch variables, without having to do any hacking of matplotlib.
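A couple of concrete examples of the kind of mismatch meant here (my own illustrations):

```python
import numpy as np
import torch

a = np.ones((2, 3))
t = torch.ones(2, 3)

np.concatenate([a, a], axis=0)   # NumPy
torch.cat([t, t], dim=0)         # PyTorch: different name, `dim` not `axis`

a.clip(0, 1)                     # NumPy method
t.clamp(0, 1)                    # PyTorch's equivalent is named differently
```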
cc @ezyang @gchanan @zou3519 @mruberry @rgommers @heitorschueroff