Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate truth-testing on empty arrays #9583

Closed
ExpHP opened this issue Aug 19, 2017 · 13 comments
Closed

Deprecate truth-testing on empty arrays #9583

ExpHP opened this issue Aug 19, 2017 · 13 comments

Comments

@ExpHP
Copy link
Contributor

ExpHP commented Aug 19, 2017

I tested the waters with this on the numpy-discussion newsgroup earlier this week yesterday, and the general response seemed to be that this is actionable, so I am making an issue for further discussion.

The long and short is that truth-testing on empty arrays is dangerous, misleading, and not in any way useful, and should be deprecated.

The next post will explain the rationale for deprecation more in-depth.

@ExpHP
Copy link
Contributor Author

ExpHP commented Aug 19, 2017

Rationale

Overview

It is impossible to take advantage of the fact that empty arrays are False, because an array can be False for other reasons.

bool(arr) on an ndarray currently has the following behavior:

  • if arr has size=0, return False.
  • if arr has size=1, call bool() on the only element.
  • if arr has size>2, raise ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Unfortunately, this tries to follow two different incompatible conventions at once. Somebody typing bool(np.array([])) at the REPL might wrongly get the impression that truth testing tests for emptiness:

>>> import numpy as np
>>> bool(np.array([]))
False
>>> # but this is not a good way to test for emptiness, because...
... bool(np.array([0]))
False

Personally, this behavior made me waste a good hour barking up the wrong tree debugging a function after I changed a list variable with 0 or 1 members into an array type, causing an if list: statement to suddenly start failing for the list [0].

Proposal

Based on comments in the newsgroup thread, the best proposal seemed to be as follows:

  • Deprecate __nonzero__/__bool__ on arrays with size=0, with a message recommending the use of arr.size if the user wants to test for emptiness. (the message could also mention any()/all())
  • Possibly in some future version, change this deprecation to an error, so that all non-size=1 arrays produce an error recommending any()/all() when boolean-evaluated.

One possible addition is to deprecate this for non-scalar arrays as well; (e.g. stop accepting bool() on arrays of shape (1,) and instead require arr.squeeze()); however, unlike the behavior for size=0 arrays, this behavior does not seem actively dangerous. (discussion in #9885)

pv warns against this addition:

While the intention of making it harder to write code with bugs is good,
it should not come at the cost of having everyone fix their old scripts,
which worked correctly previously, but then suddenly stop working.

To address some possible concerns

Why take away our ability to test for emptiness?

We aren't!

The fact of the matter is, the truthiness of a numpy array is not a reliable test for emptiness to begin with! As shown above, bool(array([0])) is False, and you'll get an error for any array with more than one element. The recommendation is to do your truth-testing on arr.size instead. (In fact, this is in the SciPy FAQ)

But isn't truthiness the accepted way to test for emptiness in Python? Why should numpy be different?

Eric Firing presents the stance that the behavior ought to be changed in the other direction:

Nothing you have shown is inconsistent with "Falseness is emptiness", which is quite fundamental
in Python. The inconsistency is in distinguishing between 1 element and
more than one element. To be consistent, bool(array([0])) and
bool(array([0, 1])) should both be True.

While this does have it's advantages (in particular, it facilitates duck typing in polymorphic code that wants to work with Numpy and non-numpy types), it is a pretty tough sell, since it changes the semantics of working, currently bug-free code. Unfortunately, Numpy has made a number of fundamental design decisions that are simply incompatible with using truth testing to determine emptiness.

@eric-wieser puts it as follows:

In idiomatic code, numpy arrays have semantics closer to scalars than to sequences - iteration is usually a red flag. Another example of how arrays are not like sequences - the + operator is element-wise addition, not sequence concatenation.

Another way in which arrays behave like scalars is in the fact that comparison operators return arrays. The implications of this are huge! Here's some things that would happen if we redefined bool(arr) to be equivalent to bool(arr.size) while retaining the current semantics of comparison operators:

>>> bool(array([1,2,3]) == array([7,8,9]))    # currently an error
True
>>> bool(array([1]) != array([1]))  # currently False
True
>>> 2 < array([0,7]) < 5   # currently an error
True
>>> bool(array([]) == array([]))  # actually, this is False even today
False

The first example is particularly sinister. Imagine unit tests that trivially succeed without actually testing anything!

My concern isn't addressed!

Please share it! Deprecations in such a popular library should be taken seriously. Also share if you have any existing code that legitimately/correctly makes use of the current behavior, as my currently held belief is that no such code exists.

@mhvk
Copy link
Contributor

mhvk commented Aug 20, 2017

Thanks for the nice & detailed summary!

@ExpHP ExpHP changed the title Truth testing on empty arrays is dangerous Deprecate truth-testing on empty arrays Aug 20, 2017
@eric-wieser
Copy link
Member

This is a trivial fix on numpy/core/src/multiarray/number.c:800 - if you don't want to make the patch yourself, would you like to suggest a deprecation message, @ExpHP?

@hemildesai
Copy link

Hi, I am looking to start contributing, and I would love to work on adding the patch (deprecation warning) based on the proposal described if it is still desired and no one is currently working on it.

@eric-wieser
Copy link
Member

@hemildesai: go for it - most of the work is in coming up with the right message to show, which you may as well propose here first

@eric-wieser eric-wieser added this to the 1.14.0 release milestone Sep 19, 2017
@ExpHP
Copy link
Contributor Author

ExpHP commented Sep 19, 2017

Cool by me, I've been busy with other things.

When I was trying to write a message earlier, I was looking through other deprecations and was disappointed to find that they don't link to anything; this is what I was planning to do with my "Rationale" post.

@hemildesai
Copy link

Being consistent with the error for arrays with size > 1, how about The truth value of an empty array is ambigous and will result in an error in the future. Use a.size? I could add a link to the post if that is suggested.

Also, is it suggested to use DEPRECATE or DEPRECATE_FUTUREWARNING?

@ExpHP
Copy link
Contributor Author

ExpHP commented Sep 19, 2017

Consistency is a decent goal and helps enforce a consistent mental model, though my 2 cents is that I don't think it's so important for deprecation warnings. There is a human element here; there likely exists code out there which uses this feature, and it may be intentional or accidental. I would try to consider how such code could arise and what message we want to communicate to the author in each case.

I can think of these scenarios:

  • Author's intent was to test an array as a scalar, and is unaware that the array they are testing is sometimes empty; it may originate from code they didn't write. (They might not even realize it is an array, since broadcasting papers over most common issues!)
    • Goal: communicate that the value is an empty array (:heavy_check_mark: the message does this)
    • Goal: communicate how this test used to behave, or at least make it easy to figure out
      • ❓ so long as the user asks themselves what would an empty array used to have evaluated to?, the answer is pretty easy to guess. The question then is whether or not their attention is effectively drawn to this detail
  • Author's intent was to test emptiness, and the arrays happened to always have size 0 or 1.
    • Goal: remind the user that boolean tests on arrays don't work like that
      (:heavy_check_mark: message is evocative of the error for size > 1, which every user has encountered)
    • Goal: suggest an alternative for emptiness testing (:heavy_check_mark:)
  • Author understood precisely how truth testing works and their code properly took all edge cases into account.
    • Goal: communicate that this will become an error (:heavy_check_mark:)

So by these measures, at least, I do think the message is good.

Edit: I added one more goal, which I'm not certain is effectively met. I'm trying to think of how one could do this while keeping the message terse.

@hemildesai
Copy link

@ExpHP Thanks for the feedback. That is an excellent way to think about deprecation warnings and I will definitely use this thought process wherever possible.

Regarding the last goal, how about -
Returning False for the truth/bool value of an empty array, but this is ambiguous and will result in an error in the future. Use a.size for testing emptiness of an array

Based off of https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L753-L754

@ExpHP
Copy link
Contributor Author

ExpHP commented Sep 19, 2017

I don't like this one as much because it feels like it says a lot. In particular, the "this is ambiguous" bit feels awkward now and my brain doesn't quite as easily make the connection to the size > 2 error message.
(putting myself in the shoes of scenario 2, I feel like saying "it's not ambiguous! It's false like every other python type!")

IMO, the best error messages are ones that communicate a lot while appearing to say very little. The same words just happen to communicate the right information to the right people.


(have I mentioned that I have a tendency to overthink things?
Because I very much have a tendency to overthink things. 😛)

@hemildesai
Copy link

I agree that the iterated message is a bit verbose.
How about -
Returning False. However, the truth value of an empty array is ambiguous and will result in an error in the future. Use a.size
?

@ExpHP
Copy link
Contributor Author

ExpHP commented Sep 19, 2017

Returning False. However, the truth value of an empty array is ambiguous and will result in an error in the future. Use a.size

I think this works.

@mhvk
Copy link
Contributor

mhvk commented Sep 19, 2017

Maybe Use a.size > 0 (I know it is the same thing, but I think it makes clearer why this is a good substitute).

ankostis added a commit to pandalone/pandalone that referenced this issue May 13, 2020
as recommended in numpy/numpy#9583,
to handle np.arrays's new bool-behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants