
Size-1 arrays should be indexable by numpy.bool_ objects (Trac #823) #1421

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 13 comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/823 on 2008-06-17 by @teoliphant, assigned to unknown.

The numpy.bool_ scalar should be allowed for masked-indexing of size-1 arrays. Currently, if x is a 0-d array, then x>3 is a numpy.bool_ object, and indexing a 0-d array y as y[x>3] will fail; it should succeed as if numpy.bool_ were a 0-d array of boolean dtype.

This is related to Ticket #1319. There are some questions about whether or not 0-d arrays should be "maskable". I think they should be maskable and return a size-0 array if the mask does not succeed.
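A quick sketch of the requested behavior, checked against a recent NumPy (where this was eventually implemented; very old versions raise instead):

```python
import numpy as np

x = np.array(5.0)        # 0-d array
mask = x > 3             # np.bool_ scalar, not an ndarray
print(type(mask))        # a numpy.bool_ scalar type

# Indexing the 0-d array with the boolean scalar:
print(x[mask].shape)     # (1,)  -- mask is True, one element selected
print(x[x > 10].shape)   # (0,)  -- mask is False, size-0 result
```

Note that the True case yields a size-1 array and the False case a size-0 array, matching the behavior requested above.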

@numpy-gitbot
Author

Milestone changed to Unscheduled by @cournape on 2009-03-02

@SylvainCorlay

If x is a 0-d array, wouldn't it be more consistent if x>3 then returned a 0-d array of booleans instead of a numpy.bool_?

Generally, the functions

numpy.logical_and
numpy.logical_or
numpy.logical_not

actually return numpy.bool_ values when called on 0-d arrays. It would probably be more consistent to let them return a 0-d array.
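This is easy to check directly (with a recent NumPy; the behavior for these functions still holds):

```python
import numpy as np

a = np.array(True)            # 0-d boolean array
b = np.array(False)           # 0-d boolean array

r = np.logical_and(a, b)
print(type(r))                # a numpy.bool_ scalar, not a 0-d ndarray
print(isinstance(r, np.ndarray))  # False
```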

@SylvainCorlay

Hello, here is a message I sent to the numpy-discussion mailing list regarding this issue. I would be very glad if someone were interested in addressing these consistency problems with 0-d arrays.

Hello everyone,

0d arrays are very convenient because they allow us to write functions that are valid for both array and scalar-like arguments, thanks to Boolean indexing.

However, most special functions in numpy (and scipy) and most Boolean operators, when called on 0d arrays, return scalars rather than 0d arrays, which is quite annoying.

For example, numpy.exp called on a 0d array containing a float number returns a float, rather than a 0d array, and if x is a 0d array, x > 0 returns a Boolean, rather than a 0d array containing a Boolean.
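Both behaviors described here can be observed directly (a quick check with a recent NumPy):

```python
import numpy as np

x = np.array(2.0)     # 0-d float array

y = np.exp(x)
print(type(y))        # a numpy.float64 scalar, not a 0-d array

c = x > 0
print(type(c))        # a numpy.bool_ scalar, not a 0-d boolean array
```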

What I would expect is the following.

- If x is a 0d array containing a float, I expect numpy.exp(x) to return a 0d array, and x>0 to return a 0d array containing a Boolean.

- If x is a scalar, numpy.exp(x) returns the expected scalar type, and x>0 returns the expected Boolean.

Here is an example of a simple function that suffers from this issue (a corrected version is proposed later):

import numpy
from scipy.stats import norm

def normal_time_value(sig, m, strikes):
    """
    The three arguments are array-like and have the same shape.
    Consider a random variable G ~ N (m , sig^2)
    The function returns          E[(G-K)+] - (E[G]-K)_+
    which is also equal to         E[(K-G)+] - (K-E[G])_+
    """
    sig = numpy.array(sig)
    strikes = numpy.array(strikes)
    m = numpy.array(m)
    tv = numpy.zeros(strikes.shape)
    tv[sig < 0] = numpy.nan         # sig must be nonnegative
    non0 = sig > 0.0
    dev = numpy.where(non0, (m - strikes) / sig, numpy.nan)
    tv[non0] = numpy.where(
        strikes > m,
        (m - strikes) * norm.cdf(dev) + sig * norm.pdf(dev),
        (strikes - m) * norm.cdf(-dev) + sig * norm.pdf(dev),
    )[non0]
    return tv

This function does not work with scalars or 0d arrays. To make it work, we need to modify it as follows: re-convert intermediate Boolean results to 0d arrays to take advantage of Boolean indexing.

import numpy
from scipy.stats import norm

def normal_time_value(sig, m, strikes):
    """
    The three arguments are array-like and have the same shape.
    Consider a random variable G ~ N (m , sig^2)
    The function returns          E[(G-K)+] - (E[G]-K)_+
    which is also equal to        E[(K-G)+] - (K-E[G])_+
    """
    sig = numpy.array(sig)
    strikes = numpy.array(strikes)
    m = numpy.array(m)
    tv = numpy.zeros(strikes.shape)
    tv[numpy.array(sig < 0)] = numpy.nan         # sig must be nonnegative
    non0 = numpy.array(sig > 0.0)
    dev = numpy.where(non0, (m - strikes) / sig, numpy.nan)
    tv[non0] = numpy.where(
        numpy.array(strikes > m),
        (m - strikes) * norm.cdf(dev) + sig * norm.pdf(dev),
        (strikes - m) * norm.cdf(-dev) + sig * norm.pdf(dev),
    )[non0]
    return tv
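The essence of the workaround can be isolated in a minimal sketch that drops the scipy dependency: re-wrap each scalar Boolean result as a 0-d array before using it as a mask.

```python
import numpy as np

sig = np.array(2.0)           # 0-d input
raw = sig > 0.0               # np.bool_ scalar, not an ndarray
non0 = np.array(raw)          # 0-d boolean array, valid for Boolean indexing
print(non0.ndim, non0.dtype)  # 0 bool
```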

This problem also affects functions like logical_and, logical_or and logical_not, which all return a numpy.bool_ rather than a 0d array of dtype bool.

Best,

@seberg
Member

seberg commented May 9, 2013

You are suggesting not to silently convert 0-d arrays to scalars. I think that is a very much non-trivial change (and it may even be difficult to know whether it would break things), though maybe it is a long-term plan. On the other hand, allowing scalar bools to index 0-d arrays: I don't see a problem with that. It is just another special case (or arguably not even one, since there is no point in casting True/False to 1/0 for indexing purposes inside numpy).

@SylvainCorlay

Sure, I can see that it is quite a significant change, but having better logic in the way 0d arrays work would probably help avoid bugs in the future.
If you guys were to tackle this issue, you could do it function by function to minimize the impact. For example:

  1. The comparison of a 0d array with a scalar or another 0d array should return a 0d array of bools (the example of T. Oliphant). Very useful for mask indexing.
  2. logical_and, logical_or, logical_not
  3. numpy.exp, numpy.sqrt, etc. (all vectorized scalar functions)

Cheers,

@njsmith
Member

njsmith commented May 9, 2013

The problem is that there are many, many places where numpy should in principle return either a 0d array or a scalar depending on whether its input is a 0d array or a scalar, and right now the logic to keep track of which was used as input just doesn't exist; instead we sort of assume that people usually want scalars. (And people would get confused if they started getting 0d arrays in unexpected places too, so we can't just switch the default to assuming people usually want 0d arrays, even for individual functions.)

We could fix this in principle, but it would be a lot of work, and it's not clear that it's even the right idea. A better idea might be to just get rid of scalars and use 0d arrays everywhere, and that might be a better place to put effort. Either way a significant chunk of work needs to be done, and in particular, if we want to get rid of scalars entirely, a few things need to happen first, like some optimization of small-array operations...


@seberg
Member

seberg commented May 9, 2013

There are two points here. The first is to make this boolean indexing work. That is simple (though honestly I will not try it, because if I look at indexing again it will turn into a complete rewrite, which is already half done; the schedule is open).

The other point is the suggestion of actually retaining 0-d arrays. That is far more complex and dangerous: since arrays are, for example, mutable and can be views, it could create huge bugs out in the wild. Note that you don't actually need this part; for your case, adding another check for np.bool_ to the 0-d special case(s) in mapping.c is sufficient.

@SylvainCorlay

Regarding the last comment of njsmith, I would find it more logical to keep track of the input type and choose the return type accordingly.
Regarding the last comment of seberg, I don't see why a 0d-array should allow boolean indexing by anything other than a 0d-array. It would make it a very special kind of array, with different semantics, right?

@seberg
Member

seberg commented May 10, 2013

From a purity point of view you are probably right, but practicality beats purity (and even more so if the pure solution might break user code).

I disagree that it's very special (other than implementation-wise). First, converting the np.bool_ scalar into an array does not feel special to me at all, since we accept scalars for 0-d arrays everywhere; np.bool_ does not make sense as an index by itself, and in that case trying to convert it to an array is standard procedure.

Second, we are talking about the "full boolean index" case, i.e. where the boolean index has exactly the same shape as the array being indexed. You could probably define something like array[np.array(False), ...] for non-0-d arrays as well (the result would be np.empty((0,) + array.shape, dtype=array.dtype)), and if you do that, the "full boolean index" special case is much less special. Thinking about it, I like the idea of allowing it, but I nevertheless really think that the full boolean index is special enough to, at least for now, add the 0-d special case only to it and not to fancy indexing. If you want to add that special case to fancy indexing too, I don't mind, but that would probably not be easy in the current code base and would give little functional gain.
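The array[np.array(False), ...] behavior sketched in this comment was in fact later added: in a recent NumPy, a 0-d boolean index prepends an axis of length 1 (for True) or 0 (for False), matching the np.empty((0,) + array.shape) description above.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# A 0-d boolean index prepends an axis of length 1 (True) or 0 (False):
print(a[np.array(True)].shape)   # (1, 2, 3)
print(a[np.array(False)].shape)  # (0, 2, 3)
```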

@SylvainCorlay

By the way, in what kind of case could returning 0d-arrays rather than scalars break things, for 0d-array inputs? Aren't all scalar operations also valid for 0d-arrays?

@seberg
Member

seberg commented May 15, 2013

I am probably painting it as more dangerous than it is, since in normal usage I guess nothing bad can happen. But hashing, for example, is not valid for arrays, and in-place operations behave differently (mutable vs. immutable). I am simply wary of assuming that nobody has code weird enough to trigger such a difference.

This is nonsense and certainly weird code, but it would break badly:

if array.ndim == 0:
    i = stop = array - 10   # i and stop are the same 0-d array object
    stop += array           # in-place: i is incremented as well
    while i < stop:         # never true, since i *is* stop
        i += 1
        # do something less boring
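
The scalar-versus-array behavioral differences mentioned above can be demonstrated concretely (with a recent NumPy):

```python
import numpy as np

s = np.float64(1.0)      # scalar: hashable, immutable
x = np.array(1.0)        # 0-d array: unhashable, mutable

hash(s)                  # fine
try:
    hash(x)
except TypeError as e:
    print("0-d arrays are unhashable:", e)

y = x
y += 1                   # in-place: mutates x through the other name
print(x)                 # x now holds 2.0
```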

@seberg
Member

seberg commented Feb 10, 2014

This is now implemented in master by the new index machinery from gh-3798.

@seberg seberg closed this as completed Feb 10, 2014
@SylvainCorlay

Should a separate issue be opened to carry on the discussion of the behavior of ufuncs (regarding whether they should return 0d-arrays for 0d-array inputs)?
