ENH: Add shape to *_like() array creation #13046

Merged: 24 commits, Apr 25, 2019

Conversation

@pentschev (Contributor, Author):
Please note I've optimistically added the versionadded tag to the documentation as version 1.17. :)

@rgommers (Member):

adding shape=None to *_like seems reasonable to me, +1
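For context, the Python-level API proposed here would be used as follows (a minimal sketch, assuming the keyword lands as shape= in NumPy 1.17, as this PR proposes):

```python
import numpy as np

# Prototype array: dtype (and, by default, memory layout) are inherited.
proto = np.arange(6, dtype=np.int32).reshape(2, 3)

# New in this PR: override the shape while keeping dtype from the prototype.
out = np.zeros_like(proto, shape=(4, 5))

print(out.shape, out.dtype)  # (4, 5) int32
```

The same keyword is proposed for empty_like, ones_like, and full_like.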

This change maintains backwards compatibility, rather than passing new
arguments to PyArray_NewLikeArray().
@pentschev (Contributor, Author):

Alright, I think with the changes I just pushed, we're not breaking the API. Should I also update cversions.txt and setup_common.py, since now the API version needs to be incremented?

@charris (Member) commented Feb 27, 2019:

> Should I also update cversions.txt and setup_common.py

Yes.

(Resolved review threads on numpy/core/src/multiarray/ctors.c.)
Arrays created with new shapes should not take the original array's strides into
consideration, and ndim == 0 should not be a case that ignores a new shape, as
the caller may request a 0-d array.
* Add comments to cversions.txt (new PyArray_NewLikeArrayWithShape function)
* Increment C_API_VERSION to 1.17 in setup_common.py
@charris changed the title from "Add shape to *_like() array creation" to "ENH: Add shape to *_like() array creation", Feb 28, 2019
pentschev added a commit to pentschev/cupy that referenced this pull request Mar 4, 2019
This change has a few benefits compared to the original proposal:
* Its format is compatible with NumPy's copy of empty sliced arrays, as
  formats must now match
* Avoid accidental overwriting of cupy_array when formats don't match
* The current CUDA stream may be used implicitly

This change implies that cupy_array is created beforehand. Eventually, we can
benefit from numpy/numpy#13046 combined with this change.
pentschev added a commit to pentschev/dask-glm that referenced this pull request Mar 4, 2019
@mrocklin (Contributor) commented Mar 4, 2019:

It looks like this is ready for another round of review.

@mattip (Member) commented Mar 6, 2019:

Do we need to expose this in the public NumPy C API, or can it be done only at the Python level? It would be nice to keep the C API to the minimum required, and this seems to be yet another wrapper around PyArray_NewFromDescr.

@pentschev (Contributor, Author):

@mattip we need to expose the new PyArray_NewLikeArrayWithShape(); since the implementation was moved into it from PyArray_NewLikeArray(), we could instead stop exposing the latter. Is that what you mean?

@mattip (Member) commented Mar 12, 2019:

We cannot currently remove already published API functions. Is there a need to expose the new function at the C-API level? Do we expect C-extension writers to use PyArray_NewLikeArrayWithShape from C or can we get by with the python-level additions at this point? I would prefer not to expand the already massive C-API with another function.

@hameerabbasi (Contributor):

@pentschev What @mattip means is that we should simply remove the function from the header and make it internal to NumPy. That way we avoid changing the C API.

@pentschev (Contributor, Author):

> We cannot currently remove already published API functions. Is there a need to expose the new function at the C-API level? Do we expect C-extension writers to use PyArray_NewLikeArrayWithShape from C or can we get by with the python-level additions at this point? I would prefer not to expand the already massive C-API with another function.

Not at the moment, but what is normally the C-API policy of NumPy? Should it avoid exposing new functionality if there's no known immediate use case? Sorry if this is a question that has been answered before.

@pentschev (Contributor, Author):

> @pentschev What @mattip means is that we should simply remove the function from the header and make it internal to NumPy. That way we avoid changing the C API.

Sorry, I didn't see that comment before. I understand it, and that's doable. My previous question is a general one: should we always prefer not exposing C-API, or is there a general rule for when we should choose to expose it?

@charris (Member) commented Mar 13, 2019:

I don't recall that we have an explicit policy apart from not removing functions from the API, although they may be deprecated. Generally, new functions are added along with new basic functionality, particularly when the new functionality starts on the C side. The addition should also be discussed on the mailing list, and these days we might want to have an NEP as well. Needs some discussion.

@pentschev (Contributor, Author):

@mrocklin I could use a suggestion from you here: for our use case with Dask and libraries like CuPy and Sparse, do you see us needing to pass shape to *_like() functions via the C-API in the foreseeable future? If not, maybe we can skip NumPy's C-API change.

@mrocklin (Contributor):

From an applications perspective I do not anticipate the libraries above needing to use the C-API. If dropping that accelerates things, then I'm for dropping it.

@pentschev (Contributor, Author):

Ok, it looks like order=None means the same as order='K' on the C side of things. So your suggestion is to get rid of raising a ValueError and just imply order='C' when the dimensions change and order='K', am I right @eric-wieser?
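As a sketch of the fallback being discussed (assuming the behavior that was eventually merged: order='K' is honored when the layout can be kept, and C order is implied when the number of dimensions changes):

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64, order='F')

# Same shape: order='K' (the default) can honor the Fortran layout.
kept = np.empty_like(a)

# Different number of dimensions: 'K' cannot be kept, so C order is
# implied instead of raising a ValueError.
changed = np.empty_like(a, shape=(2, 3, 4))

print(kept.flags['F_CONTIGUOUS'], changed.flags['C_CONTIGUOUS'])
```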

@rgommers added this to the 1.17.0 release milestone, Apr 17, 2019
@mrocklin (Contributor):

Checking in here. Is there anything we can do to accelerate this?

- If order='K' try to keep, otherwise, order='C' is implied
- Do not raise ValueError anymore
@pentschev (Contributor, Author):

I pushed some changes which I believe now handle @eric-wieser's latest request.

@eric-wieser (Member):

What I was hoping for is a distinction between order=None and order='K' at the C level, and some feedback from other maintainers on whether that's a good idea.

@pentschev (Contributor, Author):

> What I was hoping for is a distinction between order=None and order='K' at the C level, and some feedback from other maintainers on whether that's a good idea.

I guess that would probably require changing the signature of either empty_like() (currently order=None) or all the other *_like() functions (currently order='K'), which probably isn't something everyone would be keen to have done here. Plus, the change you're suggesting feels much broader to me, and likely to have a larger impact, than what we intend with this PR.

@pentschev (Contributor, Author):

It seems to me that the failures are unrelated to this PR. Could anyone confirm this?

@mattip (Member) commented Apr 21, 2019:

@eric-wieser could we merge this PR as-is and open an issue to fix order in empty_like, or are there implications that prevent merging this?

@pentschev mentioned this pull request Apr 24, 2019
@stefanv (Contributor) commented Apr 25, 2019:

This has been approved by two core contributors, and seems to address @eric-wieser's major concerns. Eric, do you want to go ahead and merge?

@eric-wieser self-requested a review, Apr 25, 2019 05:44
@eric-wieser (Member):

I started writing a justification for changing the default to None, but then found that we document K as

/* An order as close to the inputs as possible */

Given this, I think we should do one of three things for when len(shape) != ndim:

  1. Try harder to be as close as possible before merging this patch:
     i. Interpret order='K' to mean "keep the order of out.strides[:m] compared to in_.strides[:m]", where m = min(ndim, len(shape)).
     ii. Interpret order='K' to mean "keep the order of out.strides[-m:] compared to in_.strides[-m:]", where m = min(ndim, len(shape)) (consistent with broadcasting).
  2. Accept this patch as is, declare 1. a backward-compatible change, and schedule it for some point in the future without needing any kind of deprecation cycle.
  3. Accept this patch, declare 1. backwards compatible, and never fix it.

My preference here is 1.ii, but really my goal is to ensure we don't pick 3.

@@ -1259,14 +1267,14 @@ PyArray_NewLikeArray(PyArrayObject *prototype, NPY_ORDER order,
for (idim = ndim-1; idim >= 0; --idim) {
npy_intp i_perm = strideperm[idim].perm;
strides[i_perm] = stride;
stride *= shape[i_perm];
stride *= dims[i_perm];
}
@eric-wieser (Member), review comment:

My 1.ii proposal would look like changing everything between the new line 1261 and here to:

if (PyArray_NDIM(prototype) >= ndim) {
    int leading_dims = PyArray_NDIM(prototype) - ndim;

    /* Use only the trailing strides */
    PyArray_CreateSortedStridePerm(ndim,
                                   PyArray_STRIDES(prototype) + leading_dims,
                                   strideperm);
    /* Build the new strides */
    stride = dtype->elsize;
    for (idim = ndim-1; idim >= 0; --idim) {
        npy_intp i_perm = strideperm[idim].perm;
        strides[i_perm] = stride;
        stride *= shape[i_perm];
    }
}
else {
    int leading_dims = ndim - PyArray_NDIM(prototype);
    /* Use all the strides */
    PyArray_CreateSortedStridePerm(PyArray_NDIM(prototype),
                                   PyArray_STRIDES(prototype),
                                   strideperm);

    /* Build the new trailing strides */
    stride = dtype->elsize;
    for (idim = PyArray_NDIM(prototype)-1; idim >= 0; --idim) {
        npy_intp i_perm = strideperm[idim].perm + leading_dims;
        strides[i_perm] = stride;
        stride *= shape[i_perm];
    }

    /* Create remaining leading strides as C order;
       leading indices run 0..leading_dims-1 */
    for (idim = leading_dims - 1; idim >= 0; --idim) {
        strides[idim] = stride;
        stride *= shape[idim];
    }
    }
}

@eric-wieser (Member), review comment:

Or trading branching for state:

/* not sure if this is clearer via min/max */
int leading_src_dims = 0;  /* max(src.ndim - dst.ndim, 0) */
int leading_dst_dims = 0;  /* max(dst.ndim - src.ndim, 0) */
int shared_dims;           /* min(src.ndim, dst.ndim) */
if (PyArray_NDIM(prototype) >= ndim) {
    shared_dims = ndim;
    leading_src_dims = PyArray_NDIM(prototype) - ndim;
}
else {
    shared_dims = PyArray_NDIM(prototype);
    leading_dst_dims = ndim - PyArray_NDIM(prototype);
}

/* Use only the trailing strides from the source */
PyArray_CreateSortedStridePerm(shared_dims,
                               PyArray_STRIDES(prototype) + leading_src_dims,
                               strideperm);

/* Build the destination trailing strides */
stride = dtype->elsize;
for (idim = ndim-1; idim >= 0; --idim) {
    npy_intp i_perm = strideperm[idim].perm + leading_dst_dims;
    strides[i_perm] = stride;
    stride *= shape[i_perm];
}

/* Create remaining leading strides as C order;
   leading indices run 0..leading_dst_dims-1 */
for (idim = leading_dst_dims - 1; idim >= 0; --idim) {
    strides[idim] = stride;
    stride *= shape[idim];
}

@eric-wieser (Member):

An example of what I'd expect from the behavior of 1.ii is, I think:

# note: number of * shows stride order
a = np.zeros((2, 3, 5, 7), dtype=np.uint8, order='F')
assert a.strides == (1, 1*2, 1*2*3, 1*2*3*5)
a_t = a.transpose((2, 0, 3, 1))
assert a_t.strides == (1*2*3, 1, 1*2*3*5, 1*2)

b = np.zeros_like(a, shape=a.shape[1:], dtype=a.dtype)
assert b.strides == (1, 1*3, 1*3*5)  # note: just remove the 2 factors
b_t = np.zeros_like(a_t, shape=a_t.shape[1:], dtype=a_t.dtype)
assert b_t.strides == (1, 1*2*3, 1*2)  # note: just remove the 5 factors

c = np.zeros_like(a, shape=(11,) + a.shape, dtype=a.dtype)
assert c.strides == (1*2*3*5*7,) + a.strides  # C order beyond matching dimensions
c_t = np.zeros_like(a_t, shape=(11,) + a_t.shape, dtype=a_t.dtype)
assert c_t.strides == (1*2*3*5*7,) + a_t.strides

@rgommers (Member):

> Accept this patch as is, declare 1. as a backward-compatible change, and schedule it at some point in future without needing any kind of deprecation cycle

I'm fine with this - order='K' basically has always meant "we try our best, and in practice it just means you get C or F, that's all".

Such detailed stride behavior has not, AFAIK, been considered before in relation to the order keyword, and if external code really depends on it then I'd consider that code to be relying on implementation details.

I would recommend merging this right now, and approving a future change without deprecation cycle. Requiring changes almost ad infinitum before merging is potentially discouraging for contributors, especially if there's back and forth between maintainers who don't agree on small details.

@eric-wieser (Member):

> "we try our best, and in practice it just means you get C or F, that's all"

I think we've actually managed to do a good job at doing better than picking between C/F

> if external code really depends on it

To be clear, I'm suggesting a performance dependency, not a behavior dependency.

> I would recommend merging this right now, and approving a future change without deprecation cycle.

Sounds good to me

@eric-wieser merged commit 0c3d18f into numpy:master, Apr 25, 2019
@seberg (Member) commented Apr 25, 2019:

Great, thanks guys. Yes, merging as-is is good; I would have been fine with the error path for now, even. Should we discuss where to go from here?

I would be on board with the 1.ii choice (the undefined part is C order, except when everything is already F order, I suppose). There is maybe some question of what to do if the array is almost F (i.e. F-ordered, but not contiguous). It seems the closest we will get to "keeping" some layout. In general, frankly, I doubt it will be used much (with someone caring about the result) anyway.
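For illustration, an "almost F" array in the sense described above (Fortran-ordered strides, but not F-contiguous) can be produced by slicing; a small sketch, assuming float64 (itemsize 8):

```python
import numpy as np

x = np.zeros((5, 4), order='F')   # F-contiguous, strides (8, 40)
y = x[:3, :]                      # slicing keeps the parent strides (8, 40)

# y still has ascending (Fortran-style) strides, but the column stride no
# longer matches the sliced shape, so the array is not contiguous either way.
print(y.strides, y.flags['F_CONTIGUOUS'])  # (8, 40) False
```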

@pentschev (Contributor, Author):

Thanks so much everyone for the reviews!

FWIW, I've tried testing the code you suggested as-is, @eric-wieser, and it didn't seem to work for all cases. I admit I don't really understand all this complex striding, though, so maybe I did something wrong. Either way, feel free to ping me if I can help with any of the fixes.

@rgommers (Member):

Thanks for the contribution @pentschev!
