Field | Value
---|---
Author | Hameer Abbasi <habbasi@quansight.com>
Author | Ralf Gommers <rgommers@quansight.com>
Status | Draft
Type | Standards Track
Created | 2019-08-22
This NEP proposes to make all of NumPy's public API overridable via a backend mechanism, using a library called `uarray` [1]. `uarray` provides global and context-local overrides, as well as a dispatch mechanism similar to NEP-18 [2]. First experiences with `__array_function__` show that it is necessary to be able to override NumPy functions that do not take an array-like argument, and hence aren't overridable via `__array_function__`. The most pressing need is for array creation and coercion functions; see e.g. NEP-30 [9].
This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22 [3], and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
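The global and context-local overrides mentioned above can be illustrated with a small toy model. This is not `uarray`'s API: `set_backend`, `set_global_backend`, `multimethod` and `TupleBackend` are hypothetical names, and the sketch assumes a backend is simply any object exposing functions as attributes. It shows the key point from the abstract: a function such as `zeros`, which takes no array-like argument, can still be redirected, either globally or only inside a `with` block.

```python
import contextlib
import contextvars

# Toy model only: these names are hypothetical and do not mirror uarray's API.
_global_backend = None
_local_backends = contextvars.ContextVar("local_backends", default=())


def set_global_backend(backend):
    """Install a process-wide backend, consulted after any context-local ones."""
    global _global_backend
    _global_backend = backend


@contextlib.contextmanager
def set_backend(backend):
    """Give `backend` priority for the duration of a `with` block only."""
    token = _local_backends.set(_local_backends.get() + (backend,))
    try:
        yield backend
    finally:
        _local_backends.reset(token)


def multimethod(name, default):
    """Build a function whose implementation is looked up on the active backends."""
    def dispatcher(*args, **kwargs):
        # Innermost context-local backend first, then the global backend.
        candidates = list(reversed(_local_backends.get()))
        if _global_backend is not None:
            candidates.append(_global_backend)
        for backend in candidates:
            impl = getattr(backend, name, None)
            if impl is not None:
                result = impl(*args, **kwargs)
                if result is not NotImplemented:
                    return result
        # No backend handled the call: fall back to the default implementation.
        return default(*args, **kwargs)

    dispatcher.__name__ = name
    return dispatcher


# `zeros` takes no array-like argument, so argument-based dispatch
# (as in __array_function__) has nothing to dispatch on.
zeros = multimethod("zeros", lambda n: [0.0] * n)


class TupleBackend:
    """Hypothetical backend that represents arrays as tuples."""

    @staticmethod
    def zeros(n):
        return (0.0,) * n


before = zeros(3)              # default: [0.0, 0.0, 0.0]
with set_backend(TupleBackend):
    inside = zeros(3)          # context-local override: (0.0, 0.0, 0.0)
after = zeros(3)               # override no longer active: [0.0, 0.0, 0.0]
```

Storing the local backends in a `contextvars.ContextVar` rather than a plain global list keeps a `with set_backend(...)` override confined to the current thread or asyncio task, which is one reasonable reading of "context-local".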
The motivation behind `uarray` is threefold. First, there have been several attempts to allow dispatch of parts of the NumPy API, most prominently the `__array_ufunc__` protocol in NEP-13 [4] and the `__array_function__` protocol in NEP-18 [2]. However, these have shown the need for further protocols to be developed, including a protocol for coercion (see [5]). The reasons these overrides are needed have been extensively discussed in the references, and this NEP will not attempt to go into the details of why they are needed. Another pain point requiring yet another protocol is the duck-array protocol (see [9]).
This NEP takes a more holistic approach: it assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid designing a new protocol each time this is required.
The second is to ease the creation of new duck-arrays by providing default implementations of many functions that can be easily expressed in terms of others, as well as a repository of utility functions that most duck-array implementations would require.
The third is the existence of actual third-party dtype packages, and their desire to blend into the NumPy ecosystem (see [6]). This is a separate issue from the C-level dtype redesign proposed in [7]: it is about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations.
This NEP proposes the following: that `unumpy` [8] becomes the recommended override mechanism for the parts of the NumPy API not yet covered by `__array_function__` or `__array_ufunc__`, and that `uarray` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making `scipy.fft` overridable (see [10]).
_Note that this section will not attempt to explain the specifics or the mechanism of `uarray`; that is explained in the `uarray` documentation [1]. However, the NumPy community will have input into the design of `uarray`, and any backward-incompatible changes will be discussed on the mailing list._
The way we propose the overrides will be used by end users is:

    from numpy import unumpy
    # TODO
And a library that implements a NumPy-like API will use it like:
TODO: example corresponding to NEP 30 `duckarray`
The only change this NEP proposes at its acceptance is to make `unumpy` the officially recommended way to override NumPy. `unumpy` will remain a separate repository/package, which we propose to vendor rather than depend on for the time being, to avoid a hard dependency; the separate `unumpy` package will be used only if it is installed. It will be developed primarily with the input of duck-array authors and, secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:
- Faster iteration in the case of bugs or issues.
- Faster design changes, in the case of needed functionality.
- `unumpy` will work with older versions of NumPy as well.
- The user and library author opt in to the override process, rather than breakages happening when they are least expected. In simple terms, bugs in `unumpy` mean that `numpy` remains unaffected.
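The vendoring arrangement described above (use a separately installed `unumpy` if present, otherwise a copy shipped inside NumPy) amounts to an import with a fallback. A minimal sketch follows; `import_preferred` is a hypothetical helper, and the vendored module path in the comment is an assumption, since this NEP does not fix where the vendored copy would live.

```python
import importlib
from types import ModuleType


def import_preferred(preferred: str, fallback: str) -> ModuleType:
    """Import `preferred` if it is installed, otherwise import `fallback`.

    Mirrors the proposed policy: a separately installed package wins, and the
    vendored copy is only a fallback, so there is no hard dependency.
    """
    try:
        return importlib.import_module(preferred)
    except ImportError:  # includes ModuleNotFoundError
        return importlib.import_module(fallback)


# Hypothetical usage inside NumPy (the vendored path is an assumption):
#     unumpy = import_preferred("unumpy", "numpy._vendored.unumpy")
#
# Runnable demonstration with stdlib stand-ins: the first name does not
# exist, so the fallback module is returned.
mod = import_preferred("surely_not_an_installed_package_xyz", "json")
```

Catching `ImportError` also hides errors raised while importing an installed but broken `unumpy`; a real implementation might want to distinguish that case from the package simply being absent.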
FIXME: this section doesn't match the proposal in the abstract and motivation anymore.
Once maturity is achieved, `unumpy` will be moved into the NumPy organization, and NumPy will become the reference implementation for `unumpy`.
`unumpy` offers a number of advantages over the approach of defining a new protocol for every problem encountered: whenever there is something requiring an override, `unumpy` will be able to offer a unified API with very minor changes. For example:
- `ufunc` objects can be overridden via their `__call__`, `reduce` and other methods.
- `dtype` objects can be overridden via the dispatch/backend mechanism, going as far as to allow `np.float32` et al. to be overridden by overriding `__get__`.
- Other functions can be overridden in a similar fashion.
- `np.asduckarray` goes away, and becomes `np.array` with a backend set.
- The same holds for array creation functions such as `np.zeros`, `np.empty` and so on.
This also holds for the future: making something overridable would require only minor changes to `unumpy`.
Another promise `unumpy` holds is that of default implementations. Default implementations can be provided for any multimethod, in terms of others. This allows one to override a large part of the NumPy API by defining only a small part of it.
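Default implementations can be pictured as fallbacks written in terms of other multimethods. In this hypothetical toy (not `unumpy`'s actual mechanism, and with the backend passed explicitly to keep it short), a backend provides only `sum` and `size`; `mean` then works automatically because its default is expressed through those two.

```python
# Hypothetical toy: each multimethod is looked up on the backend first and
# falls back to a default written in terms of *other* multimethods.


def dispatch(backend, name, default, *args):
    impl = getattr(backend, name, None)
    if impl is not None:
        return impl(*args)
    if default is None:
        raise NotImplementedError(
            f"{type(backend).__name__} does not implement {name!r}"
        )
    return default(backend, *args)


def sum_(backend, x):
    return dispatch(backend, "sum", None, x)


def size(backend, x):
    return dispatch(backend, "size", None, x)


def mean(backend, x):
    # Default: mean(x) = sum(x) / size(x), usable by any backend with sum and size.
    return dispatch(backend, "mean", lambda b, v: sum_(b, v) / size(b, v), x)


class MinimalBackend:
    """Hypothetical backend that only implements `sum` and `size`."""

    def sum(self, x):
        return sum(x)

    def size(self, x):
        return len(x)


b = MinimalBackend()
result = mean(b, [1.0, 2.0, 3.0, 6.0])   # 3.0, via the default implementation
```

A backend that does define `mean` would have its own version used directly, since the backend is consulted before the default.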
The third and last benefit is a clear way to coerce to a given backend, and a protocol for coercing not only arrays, but also `dtype` and `ufunc` objects, into similar ones from other libraries.
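A coercion protocol can be pictured as a per-backend conversion hook that the dispatcher calls before running the backend's implementation. In this sketch the hook name `convert`, its `coerce` flag and the `TupleArray` type are assumptions for illustration only; `uarray` exposes a comparable hook, `__ua_convert__`, whose real signature is given in its documentation [1]. With `coerce=False` the backend accepts only its own type and declines foreign objects; with `coerce=True` it converts them.

```python
# Hypothetical sketch of backend coercion; names are illustrative only.


class TupleArray(tuple):
    """Stand-in for a third-party array type."""


class TupleBackend:
    def convert(self, value, coerce):
        """Return `value` as a TupleArray, or NotImplemented if it cannot be accepted."""
        if isinstance(value, TupleArray):
            return value
        if coerce and isinstance(value, (list, tuple)):
            return TupleArray(value)
        return NotImplemented

    def double(self, arr):
        return TupleArray(2 * v for v in arr)


def double(backend, value, coerce=False):
    """Toy multimethod: coerce the argument via the backend, then dispatch."""
    converted = backend.convert(value, coerce)
    if converted is NotImplemented:
        raise TypeError(
            f"{type(backend).__name__} cannot handle {type(value).__name__}"
        )
    return backend.double(converted)


backend = TupleBackend()
coerced = double(backend, [1, 2, 3], coerce=True)   # a TupleArray equal to (2, 4, 6)
try:
    double(backend, [1, 2, 3])                      # coerce=False: the list is rejected
    rejected = False
except TypeError:
    rejected = True
```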
Existing NumPy-like array implementations:

- Dask: https://dask.org/
- CuPy: https://cupy.chainer.org/
- PyData/Sparse: https://sparse.pydata.org/
- Xnd: https://xnd.readthedocs.io/
- Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays:

- Dask: https://dask.org/
- scikit-learn: https://scikit-learn.org/
- Xarray: https://xarray.pydata.org/
- TensorLy: http://tensorly.org/

Existing alternate dtype implementations:

- `ndtypes`: https://ndtypes.readthedocs.io/en/latest/
- Datashape: https://datashape.readthedocs.io
- Plum: https://plum-py.readthedocs.io/
The implementation of this NEP will require the following steps:

- Implementation of `uarray` multimethods corresponding to the NumPy API, including classes for overriding `dtype`, `ufunc` and `array` objects, in the `unumpy` repository.
- Moving backends from `unumpy` into the respective array libraries.
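Moving a backend into an array library means the library ships an object implementing the backend protocol. `uarray` backends are objects with a `__ua_domain__` string and a `__ua_function__(method, args, kwargs)` method that returns `NotImplemented` for calls it does not handle [1]. The sketch below borrows those two names but pairs them with a toy dispatcher instead of `uarray` itself; the library name `mylib`, the domain string `"numpy"` and the `call` helper are assumptions for illustration.

```python
# Toy dispatcher using uarray-style dunder names; not uarray itself.


class MyLibBackend:
    """What a backend shipped by a hypothetical array library `mylib` might look like."""

    __ua_domain__ = "numpy"  # assumed domain name for NumPy multimethods

    _implementations = {
        "ones": lambda n: ["mylib-1"] * n,
    }

    def __ua_function__(self, method, args, kwargs):
        impl = self._implementations.get(method.__name__)
        if impl is None:
            return NotImplemented  # let the next backend (or the default) try
        return impl(*args, **kwargs)


def call(method, backends, *args, **kwargs):
    """Try each backend registered for the assumed domain, in order."""
    for backend in backends:
        if backend.__ua_domain__ != "numpy":
            continue
        result = backend.__ua_function__(method, args, kwargs)
        if result is not NotImplemented:
            return result
    return method(*args, **kwargs)  # fall back to the reference implementation


def ones(n):
    """Reference implementation, standing in for NumPy's own."""
    return [1.0] * n


def zeros(n):
    return [0.0] * n


with_backend = call(ones, [MyLibBackend()], 2)   # ["mylib-1", "mylib-1"]
unhandled = call(zeros, [MyLibBackend()], 2)     # [0.0, 0.0]: the backend declined
```

Returning `NotImplemented` for unsupported functions lets a library ship a partial backend without breaking the rest of the API.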
There are no backward incompatible changes proposed in this NEP.
The current alternative to this problem is NEP-30 plus additional protocols (not yet specified). Even then, some parts of the NumPy API will remain non-overridable, so it is only a partial alternative.
The main alternative to vendoring `unumpy` is to simply move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals; however, we prefer to keep it a separate package for now.
Related discussions:

- The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
- NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
- Dask issue #4462: dask/dask#4462
- PR #13046: numpy#13046
- Dask issue #4883: dask/dask#4883
- Issue #13831: numpy#13831
[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
[2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
[3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html
[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html
[7] The epic dtype cleanup plan: numpy#2899
[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
[10] SciPy FFT backend control: http://scipy.github.io/devdocs/fft.html#backend-control
This document has been placed in the public domain.