Skip to content

Loading…

don't copy data when changing shape #145

Closed
wants to merge 7 commits into from

5 participants

@tecki

When changing the shape of an array, by writing, say,

a.shape = 10,
an AttributeError is raised if the new shape is incompatible
with the old one. This is the intended behaviour and no
problem.

Under the hood, however, what's happening is that
in this case the array is copied as it would be
if one had written

a = a.reshape((10,))
and once it is realized that this doesn't work, the
result is discarded again, and the said AttributeError
is raised.

I discovered that while working with humongous
matrices, which nearly fill my memory. I used
the a.shape=something method because it
was supposed to be a memory no-op. Unfortunately
it isn't, as numpy tried and failed to copy the array
and raised a MemoryError instead of an AttributeError.

I consider this all a bug, not a feature, so I fixed it.
With this pull request, no data is ever copied when
changing the shape of an array.

tecki added some commits
@tecki tecki Use PyArray_Newshape instead of PyArray_Reshape
as per the comment in shape.c, the use of PyArray_Reshape is not
recommended. Change set_shape accordingly in getset.c.
96c3bb2
@tecki tecki Do not copy array if shape is set
If a shape of an array is set, but the shape is incompatible,
the array is copied, and the copy immediately discarded.

This commit changes this behaviour so that no copy is ever
generated.
40df395
@tecki tecki avoid doubling code
Move code of PyArray_Newshape and PyArray_Newshape_Nocopy into
one function.
82b3c0d
@charris charris commented on an outdated diff
numpy/core/src/multiarray/getset.c
((5 lines not shown))
/* Assumes C-order */
- ret = (PyArrayObject *)PyArray_Reshape(self, val);
- if (ret == NULL) {
- return -1;
+ if (!PyArray_IntpConverter(val, &newdims)) {
+ return -1;
@charris NumPy member
charris added a note

Indentation. Are you using tabs by any chance? Wrong indent setting?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on an outdated diff
numpy/core/src/multiarray/shape.c
@@ -255,6 +245,12 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims,
self = (PyArrayObject *)new;
flags = PyArray_FLAGS(self);
}
+ else {
+ PyErr_SetString(PyExc_AttributeError,
@charris NumPy member
charris added a note

Indent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on an outdated diff
numpy/core/src/multiarray/shape.c
@@ -163,19 +163,9 @@ PyArray_Resize(PyArrayObject *self, PyArray_Dims *newshape, int refcheck,
return Py_None;
}
-/*
- * Returns a new array
- * with the new shape from the data
- * in the old array --- order-perspective depends on order argument.
- * copy-only-if-necessary
- */
-
-/*NUMPY_API
- * New shape for an array
- */
-NPY_NO_EXPORT PyObject *
-PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims,
- NPY_ORDER order)
+static PyObject *
@charris NumPy member
charris added a note

Please document this function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris
NumPy member

Can this be done by adding a function that checks if the array can be reshaped in place? Something like

int can_reshape_inplace(...)

I suppose there also needs to be a function that also checks if the array can be reshaped with a copy.

tecki added some commits
@tecki tecki white space corrected
Accidentally, I was using tabs not spaces. (Yes, I do have
correct tab settings, tab is eight characters, but yes,
spaces are better anyhow).
7e33e79
@tecki tecki add comments for shape.c:_Newshape f7f409b
@tecki tecki simpler version for nocopy-reshape
Instead of introducing a new function PyArray_Reshape_Nocopy,
the original function PyArray_Reshape gets extended.
The parameter `order` may be set to a new value `NPY_NOCOPY`,
which prevents PyArray_Reshape from copying.
f220ecf
@tecki
@tecki

This commit is a simpler? solution to the same problem.
Which version is more desirable?

@mwiebe mwiebe commented on an outdated diff
numpy/core/include/numpy/ndarraytypes.h
@@ -187,7 +187,10 @@ enum NPY_TYPECHAR {
/* Fortran order */
NPY_FORTRANORDER=1,
/* An order as close to the inputs as possible */
- NPY_KEEPORDER=2
+ NPY_KEEPORDER=2,
+ /* Don't copy data, so order obviously is kept.
+ * used as special value in PyArray_Newshape */
+ NPY_NOCOPY=3
@mwiebe NumPy member
mwiebe added a note

I think this may be a bad idea. This is a general enum in the publicly exposed API, so a special value for one API function isn't appropriate. Further, NOCOPY seems to have roughly the same meaning as KEEPORDER.

@tecki
tecki added a note

No, NOCOPY has a very different meaning from KEEPORDER. The latter means that the order ought to
be kept (hence the name) while copying, while NOCOPY means that the data should not be copied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@mwiebe
NumPy member

This looks like a worthy bug to fix, but I don't think the way it's being fixed is good. Fixing it should not require changing the exposed NumPy API.

@tecki

So, I just changed this pull request not to expose NPY_NOCOPY into
the API, by just declaring NPY_NOCOPY internally.

@teoliphant
NumPy member

I agree with tecki that this implementation of array_shape_set needs improvement so that the copy does not happen if it doesn't need to. But, there should be some function that tests this (it looks like there is a related function to such a function in current master _attempt_nocopy_reshape).

The NOCOPY flag to NewShape is an approach --- perhaps called KEEPORDER_AND_DATA

@charris
NumPy member

@tecki Needs rebase.
@seberg Sound like you thought this worthy. It does seem that there might be a function that performs the check available.

@seberg
NumPy member

I do think it makes sense to fix. The _attempt_nocopy_reshape could be called more directly maybe, since it does almost exactly what is needed (it does need a slight addition, but that is easy and I actually have it already). To be honest, I don't know what approach is best. Just a note for the direct call approach: you have to remember finalizing the array, or some temporary one like it happens at the moment (needed for matrix support).

@charris
NumPy member

Closing on account of stale. @seberg Thoughts?

@charris charris closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Aug 24, 2011
  1. @tecki

    Use PyArray_Newshape instead of PyArray_Reshape

    tecki committed
    as per the comment in shape.c, the use of PyArray_Reshape is not
    recommended. Change set_shape accordingly in getset.c.
  2. @tecki

    Do not copy array if shape is set

    tecki committed
    If a shape of an array is set, but the shape is incompatible,
    the array is copied, and the copy immediately discarded.
    
    This commit changes this behaviour so that no copy is ever
    generated.
  3. @tecki

    avoid doubling code

    tecki committed
    Move code of PyArray_Newshape and PyArray_Newshape_Nocopy into
    one function.
Commits on Oct 2, 2011
  1. @tecki

    white space corrected

    tecki committed
    Accidentally, I was using tabs not spaces. (Yes, I do have
    correct tab settings, tab is eight characters, but yes,
    spaces are better anyhow).
  2. @tecki
  3. @tecki

    simpler version for nocopy-reshape

    tecki committed
    Instead of introducing a new function PyArray_Reshape_Nocopy,
    the original function PyArray_Reshape gets extended.
    The parameter `order` may be set to a new value `NPY_NOCOPY`,
    which prevents PyArray_Reshape from copying.
Commits on Mar 14, 2012
  1. @tecki

    don't expose NPY_NOCOPY in API

    tecki committed
Showing with 25 additions and 12 deletions.
  1. +6 −7 numpy/core/src/multiarray/getset.c
  2. +15 −5 numpy/core/src/multiarray/shape.c
  3. +4 −0 numpy/core/src/multiarray/shape.h
View
13 numpy/core/src/multiarray/getset.c
@@ -44,17 +44,16 @@ array_shape_set(PyArrayObject *self, PyObject *val)
{
int nd;
PyArrayObject *ret;
+ PyArray_Dims newdims;
/* Assumes C-order */
- ret = (PyArrayObject *)PyArray_Reshape(self, val);
- if (ret == NULL) {
+ if (!PyArray_IntpConverter(val, &newdims)) {
return -1;
}
- if (PyArray_DATA(ret) != PyArray_DATA(self)) {
- Py_DECREF(ret);
- PyErr_SetString(PyExc_AttributeError,
- "incompatible shape for a non-contiguous "\
- "array");
+ ret = (PyArrayObject *)PyArray_Newshape(self, &newdims, NPY_NOCOPY);
+ PyDimMem_FREE(newdims.ptr);
+
+ if (ret == NULL) {
return -1;
}
View
20 numpy/core/src/multiarray/shape.c
@@ -167,7 +167,11 @@ PyArray_Resize(PyArrayObject *self, PyArray_Dims *newshape, int refcheck,
* Returns a new array
* with the new shape from the data
* in the old array --- order-perspective depends on order argument.
- * copy-only-if-necessary
+ * The data is copied only if it is necessary. Otherwise a view
+ * on the original array is returned. Copying can be prevented by setting
+ * `order` to the special value `NPY_NOCOPY`, which is equivalent to
+ * `NPY_CORDER`, just that an `AttributeError` is flagged if reshaping
+ * is impossible without copying.
*/
/*NUMPY_API
@@ -236,16 +240,17 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims,
(((PyArray_CHKFLAGS(self, NPY_ARRAY_C_CONTIGUOUS) &&
order == NPY_FORTRANORDER) ||
(PyArray_CHKFLAGS(self, NPY_ARRAY_F_CONTIGUOUS) &&
- order == NPY_CORDER)) && (PyArray_NDIM(self) > 1))) {
+ (order == NPY_CORDER || order == NPY_NOCOPY)
+ )) && (PyArray_NDIM(self) > 1))) {
int success = 0;
- success = _attempt_nocopy_reshape(self,n,dimensions,
- newstrides,order);
+ success = _attempt_nocopy_reshape(self,n,dimensions, newstrides,
+ order == NPY_NOCOPY ? NPY_CORDER : order);
if (success) {
/* no need to copy the array after all */
strides = newstrides;
flags = PyArray_FLAGS(self);
}
- else {
+ else if (order != NPY_NOCOPY) {
PyObject *new;
new = PyArray_NewCopy(self, order);
if (new == NULL) {
@@ -255,6 +260,11 @@ PyArray_Newshape(PyArrayObject *self, PyArray_Dims *newdims,
self = (PyArrayObject *)new;
flags = PyArray_FLAGS(self);
}
+ else {
+ PyErr_SetString(PyExc_AttributeError,
+ "incompatible shape for a non-contiguous array");
+ return NULL;
+ }
}
/* We always have to interpret the contiguous buffer correctly */
View
4 numpy/core/src/multiarray/shape.h
@@ -15,4 +15,8 @@ NPY_NO_EXPORT void
PyArray_CreateSortedStridePerm(PyArrayObject *arr,
_npy_stride_sort_item *strideperm);
+/* Make sure the following value does not coincide with
+ * any value in the enum NPY_ORDER */
+#define NPY_NOCOPY 42
+
#endif
Something went wrong with that request. Please try again.