Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Refactor updateifcopy #9639

Merged
merged 16 commits into from
Nov 8, 2017
Merged

Conversation

mattip
Copy link
Member

@mattip mattip commented Sep 1, 2017

Replaces pull requests #9269 and #9451, see extensive previous discussions there. Summary:

  • moves the code to resolve UPDATEIFCOPY semantics from array_dealloc into a new API function, call the new function both from array_dealloc and from all the places that previously relied on Py_DECREF. This is to support implementations with a different GC strategy, where Py_DECREF may not immediately cause calling array_dealloc and UPDATEIFCOPY resolution
  • In order to alert downstream NumPy C-API consumers who use the UPDATEIFCOPY semantics of this change, add a new flag NPY_ARRAY_WRITEBACKIFCOPY and a new function PyArray_SetWritebackIfCopyBase. Use of either of those will raise a FutureWarning.
  • This pull request does not touch nditer, with its own special opflag updateifcopy.That will be handled in a future pull request. As a consequence a warning on resolution of the UPDATEIFCOPY semantics in array_dealloc will only happen now if the compiler flag -DDEPRECATE_UPDATEIFCOPY is used. On PyPy that is the default.

@charris
Copy link
Member

charris commented Sep 1, 2017

You should close older PRs if you intend to replace them with this. You can always resubmit them.

@mattip mattip force-pushed the refactor-updateifcopy2 branch 3 times, most recently from f14c0ee to 059f070 Compare September 2, 2017 18:19
@mattip
Copy link
Member Author

mattip commented Sep 2, 2017

@njsmith could you review? I reworked the pull request:

  • there is now WRITEBACKIFCOPY with the single letter X in flags, use of the old UPDATEIFCOPY (U) will issue a FutureWarning.
  • Added tests for the new flag and that the old one does indeed issue a warning,
  • Addeda test for a FUTUREWARNING if using UPDATEIFCOPY in the array creation routines.
  • Document the new behavior and deprecation of UPDATEIFCOPY

@charris charris changed the title Refactor updateifcopy2 MAINT: Refactor updateifcopy Sep 2, 2017
doc/CAPI.rst.txt Outdated
This is a special flag that is set if this array represents a copy
made because a user required certain flags in ``PyArray_FromAny`` and
a copy had to be made of some other array (and the user asked for
this flag to be set in such a situation). The base attribute then
points to the "misbehaved" array (which is set read_only). When
the array with this flag set is deallocated, it will copy its
``PyArray_ResoveWritebackIfCopy`` is called, it will copy its
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing l in Resolve.

Text should probably say something like "if you use this flag, you have to later call PyArray_ResolveWriteBackIfCopy on the array. If the the array has the NPY_WRITEBACKIFCOPY flag set, it will copy its contents back ... Otherwise, this operation is a no-op."

doc/CAPI.rst.txt Outdated
``NPY_UPDATEIFCOPY``
Similar to ``NPY_WRITEBACKIFCOPY``, but deprecated since it copied the
contents back when the array is deallocated, which is not explicit and
relies on refcount semantics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably go ahead and mention PyPy here by name, since not everyone knows why refcount semantics matter :-). "which is unreliable on alternative Python implementations like PyPy that don't use refcounting.", maybe.

ultimate owner of the memory exposes a writeable buffer interface,
or is a string. (The exception for string is made so that unpickling
can be done without copying memory.)
The (deprecated) UPDATEIFCOPY and WRITEBACKIFCOPY flags can never be set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The WRITEBACKIFCOPY and (deprecated) UPDATEIFCOPY", to make clear that we're not saying both are deprecated.

@@ -894,11 +895,11 @@ typedef int (PyArray_FinalizeFunc)(PyArrayObject *, PyObject *);
#define NPY_ARRAY_IN_ARRAY (NPY_ARRAY_CARRAY_RO)
#define NPY_ARRAY_OUT_ARRAY (NPY_ARRAY_CARRAY)
#define NPY_ARRAY_INOUT_ARRAY (NPY_ARRAY_CARRAY | \
NPY_ARRAY_UPDATEIFCOPY)
NPY_ARRAY_WRITEBACKIFCOPY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it turns out third party code is using INOUT_ARRAY (e.g. scipy/interpolate/src/_interpolate.cpp), so I guess we can't just change its value :-(. We could define some new version like NPY_ARRAY_INOUT_ARRAY2, or maybe we should just deprecate it and tell people that they should write NPY_ARRAY_CARRY | NPY_ARRAY_WRITEBACKIFCOPY if that's what they want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For backward compatibility I prefer the first option. I will extend the warning notice to reflect the possible flags that could trigger it.

#define NPY_ARRAY_IN_FARRAY (NPY_ARRAY_FARRAY_RO)
#define NPY_ARRAY_OUT_FARRAY (NPY_ARRAY_FARRAY)
#define NPY_ARRAY_INOUT_FARRAY (NPY_ARRAY_FARRAY | \
NPY_ARRAY_UPDATEIFCOPY)
NPY_ARRAY_WRITEBACKIFCOPY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever we do for NPY_ARRAY_INOUT_ARRAY we should do the same here.

@mattip
Copy link
Member Author

mattip commented Sep 3, 2017

changes pushed in response to review, once approved I will squash to a single commit

@@ -2923,7 +2923,7 @@ npyiter_allocate_arrays(NpyIter *iter,
/* If the data will be written to, set UPDATEIFCOPY */
if (op_itflags[iop] & NPY_OP_ITFLAG_WRITE) {
Py_INCREF(op[iop]);
if (PyArray_SetUpdateIfCopyBase(temp, op[iop]) < 0) {
if (PyArray_SetWritebackIfCopyBase(temp, op[iop]) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change and the next one, inside the nditer code, correct? I thought we were keeping nditer using UPDATEIFCOPY for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal was for nditer to not print any warning after this pull request on CPython. The following logic allows that:

  • all calls to PyArray_SetUpdateIfCopyBase issue a FutureWarning,
  • to maintain backward compatiblity, PyArray_ResolveWritebackIfCopy is called in array_dealloc if arr->base exists
    • if PyArray_ResolveWritebackIfCopyactually does something, it will currently warn, but only on PyPy or if compiled with ``-DDEPRECATE_UPDATEIFCOPY"

By using PyArray_SetWritebackIfCopyBase, nditer will neither warn when creating its operands nor when resolving in array_dealloc. When we deal with nditer in a future pull request, we can enable the warning in array_dealloc on all implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all calls to PyArray_SetUpdateIfCopyBase issue a FutureWarning

This should be a DeprecationWarning. (Rule: DeprecationWarning is for when we're trying to remove an API, FutureWarning is for when we're going to keep it but change the semantics.) And I think this is where we should do the "if PyPy or -DDEPRECATE_UPDATEIFCOPY" check.

array_dealloc

I think I'd prefer the array_dealloc logic be:

  • If WRITEBACKIFCOPY is set, then issue a RuntimeWarning and that's it (you're supposed to call ResolveUpdateIfCopy or ClearUpdateIfCopy before the array is GCed).

  • If UPDATEIFCOPY is set, then resolve it up like we always did. We don't need to issue a warning, because we already did that when UPDATEIFCOPY was set.

This then suggests that nditer should keep using UPDATEIFCOPY until we're ready to fix it properly, which seems logical, since it will in fact be broken until then, and if you're on PyPy or have -DDEPRECATE_UPDATEIFCOPY set then it's probably useful to see that your program is in fact using UPDATEIFCOPY (via nditer).

assert_equal(self.a.flags['U'], False)
assert len(sup) == 2
assert_equal(self.a.flags.writebackifcopy, False)
assert_equal(self.a.flags['X'], False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use numpy.testing.assert_warns here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I should wrap each test in a assert_warns context manager? (likewise line 4323)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that? Honestly I don't remember the details, I just remember that @seberg had to go on like some ridiculous epic quest in order to get reliable warning tests working, so we should make use of his work :-). (@seberg, any hints?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing special there, the epic quest is just to never use "ignore" filter (and we have a test to make sure you don't do that). Assert warns as context filter is great, if you want to use the message text, you can use suppress_warnings directly.

@njsmith
Copy link
Member

njsmith commented Sep 5, 2017

OK, I finally reached the bottom! Sorry for the drawn-out review :-)

Overall it's looking quite reasonable. What state do you think it's at -- ready to go in?

The one remaining high level thing that occurred to me is that PyArray_XDECREF_ERR is pretty weird with the new semantics. Maybe we should have a PyArray_ClearWritebackIfCopyBase function clears the flag and ->base, basically the inverse of SetWritebackIfCopyBase?

@mattip
Copy link
Member Author

mattip commented Sep 5, 2017

@njsmith thanks for the non-trivial review and discussion process. Except for the assert_warns tweak, it seems ready, and does what I expect on PyPy as well.

If I understand correctly, PyArray_XDECREF_ERR seems to be a last-ditch effort to salvage arr->base if something else crashed, and arr should be discarded. IMO it should not touch arr at all, since its state is screwed up anyway, and it is about to be thrown away, so the less code the better. (EDIT) Since currently the UPDATEIFCOPY flag is cleared, the pull request already adds code to clear the new WRITEBACKIFCOPY flag as well

What next? Should I squash the commits to a single one?

Edit to mention that PyArray_XDECREF_ERR does clear the flags

@njsmith
Copy link
Member

njsmith commented Sep 5, 2017

Re: PyArray_XDECREF_ERR: my point is that with the old UPDATEIFCOPY API, the way you eventually resolve the whole view/copy/UPDATEIFCOPY situation is to either call Py_DECREF or PyArray_XDECREF_ERR on the array. This is problematic and makes the API design part of my brain itch, because it overloads the normal DECREF semantics to have this other meaning, but at least it's consistent – you always DECREF and the DECREF is always overloaded to resolve the UPDATEIFCOPY.

With this PR currently, the WRITEBACKIFCOPY API is that you have to eventually call either PyArray_ResolveWritebackIfCopy or else PyArray_XDECREF_ERR. This is weird and inconsistent – what does DECREF have to do with anything? It could work, but my preference would be to clean it up by deprecating PyArray_XDECREF_ERR and replacing it with a PyArray_DiscardWritebackIfCopy or similar that's like PyArray_ResolveWritebackIfCopy in that neither of them do the DECREF.

@mattip
Copy link
Member Author

mattip commented Sep 5, 2017

Changing the logic exposed a missing call to PyArray_ResolveWritebackIfCopy in the ufunc iterator code. Now fixed.

The PyArray_XDECREF_ERR -> PyArray_DiscardWritebackIfCopy change is still not in the pull request

@ghost
Copy link

ghost commented Nov 8, 2017

@njsmith Ready.

@mattip
Copy link
Member Author

mattip commented Nov 8, 2017

@njsmith if I could get a thumbs-up on this someone else can do the merge. Can you at some point give a stronger approval than the "I think this can go in" above?

@ahaldane
Copy link
Member

ahaldane commented Nov 8, 2017

OK so I just read through the problem. I'm not sure I caught all the subtleties, but at least I see the point and I see why this is an improvement.

So if this gets into 1.14, is the plan also to implement #9714 in time for 1.14?

And if I understand correctly, the discussion at this point is about exactly when to emit deprecation warnings. In 1.14 the old updateifcopy flag will still "work", just emit warnings. Is that right?

@ahaldane
Copy link
Member

ahaldane commented Nov 8, 2017

Also, PyArray_SetUpdateIfCopyBase may be part of the API, but it doesn't seem to be documented on the api doc page.

@mattip
Copy link
Member Author

mattip commented Nov 8, 2017

@ahaldane the fix for issue #9714 is a much smaller so if there is consensus on the direction it could be ready for 1.14. Please take a look and tell me if we should deprecate the use case or make nditer into a context manager.

The old flag and behaviour is still supported, and will emit warnings, except for the cases mentioned in issue #9714 (for a more complete discussion see this part of this pull request

#define NPY_ARRAY_IN_FARRAY (NPY_ARRAY_FARRAY_RO)
#define NPY_ARRAY_OUT_FARRAY (NPY_ARRAY_FARRAY)
#define NPY_ARRAY_INOUT_FARRAY (NPY_ARRAY_FARRAY | \
NPY_ARRAY_UPDATEIFCOPY)
#define NPY_ARRAY_INOUT_FARRAY2 (NPY_ARRAY_FARRAY | \
NPY_ARRAY_WRITEBACKIFCOPY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So presumably this 2 version is only needed during the deprecation phase, afterwards we will redefine NPY_ARRAY_INOUT_FARRAY to use NPY_ARRAY_WRITEBACKIFCOPY.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's one option, but it's a bit risky since we don't have any way to know when all the old code using NPY_ARRAY_INOUT_FARRAY is gone. We might just keep the 2 version forever – it's not pretty but it's safe and works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we can leave the 2 version forever, but after the deprecation phase no one needs to use it any more.

@ahaldane
Copy link
Member

ahaldane commented Nov 8, 2017

What do you think of changing the release note as follows:

``UPDATEIFCOPY`` semantics deprecated in favor of ``WRITEBACKIFCOPY``
--------------------------------------------------------------------
The ``UPDATEIFCOPY`` flag is deprecated, and replaced by a new flag
``WRITEBACKIFCOPY``. In the C-api, the ``NPY_ARRAY_UPDATEIFCOPY`` flag is
deprecated, and should be replaced with ``NPY_ARRAY_WRITEBACKIFCOPY``, 
with new semantics.

Code using these flags must be updated to account for changed semantics, as
follows: When ndarrays are created with either the deprecated ``UPDATEIFCOPY``
or new ``WRITEBACKIFCOPY`` flags, a temporary copy of the data is created which
is eventually written back to the original array. For ``UPDATEIFCOPY``, the
writeback to the original array occurs at ndarray deallocation. For
``WRITEBACKIFCOPY``, a new ``PyArray_ResolveWritebackIfCopy`` function must be
explicitly called before deallocation (or, if an error occurred,
``PyArray_DiscardWritebackIfCopy``). Numpy now mostly uses the
``WRITEBACKIFCOPY`` mechanism internally, but still uses ``UPDATEIFCOPY`` in
nditer use for back-compatibility of the python nditer interface.

In python code, if numpy is compiled with ``-DDEPRECATE_UPDATEIFCOPY`` or if
run on PyPy, ``DeprecationWarning`` warnings will be issued on use of
``UPDATEIFCOPY`` (eg, when nditer is used) if writeback resolution has not
occurred before calling ``array_dealloc``. In the future this warning will
always be issued, once nditer stops using ``UPDATEIFCOPY``.

In C code, calling ``PyArray_SetUpdateIfCopyBase`` will now always issue a
``DeprecationWarning`` and should be replaced by
``PyArray_SetWritebackIfCopyBase``. Similarly, calls to ``PyArray_XDECREF_ERR``
should be replaced by ``PyArray_DiscardWritebackIfCopy``.  Calling ndarray
creation functions such as ``PyArray_FromArray`` where flags uses
``NPY_ARRAY_UPDATEIFCOPY`` will also raise a ``DeprecationWarning``. 

Note that during the deprecation period, the ``NPY_ARRAY_INOUT_ARRAY`` and
``NPY_ARRAY_INOUT_FARRAY`` flags should be replaced by
``NPY_ARRAY_INOUT_ARRAY2`` and ``NPY_ARRAY_INOUT_FARRAY2``, which behave
similarly but use the new `WRITEBACKIFCOPY` semantics.

I want to make it easy for downstream projects to know what to change.

@njsmith
Copy link
Member

njsmith commented Nov 8, 2017

@ahaldane: Are you happy?

@ghost
Copy link

ghost commented Nov 8, 2017

And if I understand correctly, the discussion at this point is about exactly when to emit deprecation warnings. In 1.14 the old updateifcopy flag will still "work", just emit warnings. Is that right?

That's correct. Generally the deprecation procedure is to immediately emit warnings as soon as the deprecations occur. We can't do that here because there is still some internal code that calls this, but that doesn't work on pypy so we do that anyway. IMHO the next PR should rename the function so that warnings can also be emitted on CPython.

@ahaldane
Copy link
Member

ahaldane commented Nov 8, 2017

Yeah looks good to me.

@njsmith
Copy link
Member

njsmith commented Nov 8, 2017

Okay, let's do this...

@njsmith njsmith merged commit 1368cbb into numpy:master Nov 8, 2017
@mattip mattip deleted the refactor-updateifcopy2 branch November 8, 2017 21:24
:c:data:`NPY_ARRAY_UPDATEIFCOPY` flag. Also resets the
:c:data:`NPY_ARRAY_WRITEABLE` flag on the base object. This is
useful for recovering from an error condition when
writeback semantics are used, but will lead to wrong results.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wording isn't great here - I read this as "This is useful ..., but will lead to wrong results", which is just confusing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "useful for recovering from an exception when writeback semantics are used. Pending changes to the base array will be thrown away"

@@ -36,5 +36,6 @@
0x0000000a = 9b8bce614655d3eb02acddcb508203cb

# Version 11 (NumPy 1.13) Added PyArray_MapIterArrayCopyIfOverlap
# Version 11 (NumPy 1.14) No Change
# Version 11 (NumPy 1.14) Added PyArray_ResolveWritebackIfCopy,
# PyArray_SetWritebackIfCopyBase and deprecate PyArray_SetUpdateIfCopyBase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this get a new hash on the last line? Or does that only happen at release time? @charris?

At any rate, I think the comment should go below the last hash in the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets checked and updated at release time, but should be updated on the fly. IIRC, that was how it was done with the last addition in 1.13.

s = PyUnicode_FromString(msg);
#endif
if (s) {
PyErr_WriteUnraisable(s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct usage - the argument should be the context - the message is already printed along with the exception type. Right now, the message is printed twice.

Off the top of my head, (PyObject *)&PyArray_Type would be a safe context to pass. In theory you could pass self, but calling array_repr sounds a little too risky.

if nothing else, passing NULL would cause the message alone to be printed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we used the py3 tp_finalize slot, then we could pass ndarray.__del__ as the context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, the message is printed twice

This code should only run if DEPRECATE (which is PyErr_WarnEx ) raises an exeption, in which case the exception will be ignored since this is a deallocator. In order to be sure the error is emitted, I wanted to print it to stderr, and the documentation seems to indicate that is what PyErr_WriteUnraisable does. Should I print to stderr and then emit PyErr_WriteUnraisable as well, or simply skip it altogether? Since there was no example in NumPy I looked at how PyErr_WriteUnraisable is used in Pandas as a hint.

Copy link
Member

@eric-wieser eric-wieser Dec 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the documentation seems to indicate that is what PyErr_WriteUnraisable does

Correct, but it does this by looking at sys.exc_info, not by looking at its argument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand, the entire if() clause should be replaced by the one-liner

PyErr_WriteUnraisable((PyObject *)&PyArray_Type);

How should I proceed since this pull request has been merged? Issue another pull request based off HEAD?

Copy link
Member Author

@mattip mattip Dec 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it seems cpython prints both the context from PyErr_Fetch and repr(obj) for a call to PyErr_WriteUnrasiable but PyPy only prints the repr(obj) without the context. I will file an issue there

edit: wrong, PyPy does do the right thing, and double-prints the current message as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally go for a new PR based on the last commit of this PR, but going off head would work fine too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants