Size-1 arrays should be indexable by numpy.bool_ objects (Trac #823) #1421
If x is a 0-d array, wouldn't it be more consistent if x > 3 returned a 0-d array of booleans instead of a numpy.bool_? More generally, functions like numpy.logical_and actually return numpy.bool_ values when called on 0-d arrays. It would probably be more consistent to let them return 0-d arrays.
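A minimal illustration of the inconsistency described above (a sketch added for clarity, not part of the original comment):

```python
import numpy as np

x = np.array(3.5)   # a 0-d array
res = x > 3         # the comparison collapses to a numpy boolean scalar,
                    # not a 0-d boolean array
print(type(res))                    # a numpy bool scalar type
print(isinstance(res, np.ndarray))  # False
```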
Hello, here is a message I sent to the numpy-discussion mailing list regarding this issue. I would be very glad if someone was interested in addressing these problems of consistency with 0-d arrays.

Hello everyone,

0-d arrays are very convenient because they allow us to write functions that are valid for both arrays and scalar-like arguments, thanks to boolean indexing. However, most special functions in numpy (and scipy) and most boolean operators, when called on 0-d arrays, return scalars rather than 0-d arrays, which is quite annoying. For example, numpy.exp called on a 0-d array containing a float returns a float rather than a 0-d array, and if x is a 0-d array, x > 0 returns a boolean rather than a 0-d array containing a boolean. What I would expect is the following:

- If x is a 0-d array containing a float, I expect numpy.exp(x) to return a 0-d array, and x > 0 to return a 0-d array containing a boolean.
- If x is a scalar, numpy.exp(x) returns the expected scalar type, and x > 0 returns the expected boolean.

Here is an example of a simple function that suffers from this issue (a corrected version is proposed later):

```python
import numpy
from scipy.stats import norm

def normal_time_value(sig, m, strikes):
    """
    The three arguments are array-like and have the same shape.
    Consider a random variable G ~ N(m, sig^2).
    The function returns E[(G-K)_+] - (E[G]-K)_+
    which is also equal to E[(K-G)_+] - (K-E[G])_+
    """
    sig = numpy.array(sig)
    strikes = numpy.array(strikes)
    m = numpy.array(m)
    tv = numpy.zeros(strikes.shape)
    tv[sig < 0] = numpy.nan  # sig must be nonnegative
    non0 = sig > 0.0
    dev = numpy.where(non0, (m - strikes) / sig, numpy.nan)
    tv[non0] = numpy.where(strikes > m,
                           (m - strikes) * norm.cdf(dev) + sig * norm.pdf(dev),
                           (strikes - m) * norm.cdf(-dev) + sig * norm.pdf(dev))[non0]
    return tv
```

This function does not work with scalars or 0-d arrays. To make it work, we need to modify it in the following fashion: reconvert intermediate results to 0-d arrays to take advantage of boolean indexing.

```python
import numpy
from scipy.stats import norm

def normal_time_value(sig, m, strikes):
    """
    The three arguments are array-like and have the same shape.
    Consider a random variable G ~ N(m, sig^2).
    The function returns E[(G-K)_+] - (E[G]-K)_+
    which is also equal to E[(K-G)_+] - (K-E[G])_+
    """
    sig = numpy.array(sig)
    strikes = numpy.array(strikes)
    m = numpy.array(m)
    tv = numpy.zeros(strikes.shape)
    tv[numpy.array(sig < 0)] = numpy.nan  # sig must be nonnegative
    non0 = numpy.array(sig > 0.0)
    dev = numpy.where(non0, (m - strikes) / sig, numpy.nan)
    tv[non0] = numpy.where(numpy.array(strikes > m),
                           (m - strikes) * norm.cdf(dev) + sig * norm.pdf(dev),
                           (strikes - m) * norm.cdf(-dev) + sig * norm.pdf(dev))[non0]
    return tv
```

This problem also affects functions like logical_and, logical_or and logical_not, which all return the numpy.bool_ type rather than a 0-d array of dtype bool.

Best,
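To make the re-wrapping trick in the corrected version concrete, here is a reduced sketch of it on a 0-d input (an illustration added for clarity, not part of the original message). At the time of this report a bare numpy boolean scalar could not index a 0-d array; wrapping it back into a 0-d array made boolean indexing work (and still works today):

```python
import numpy as np

sig = np.array(2.0)        # 0-d array input
raw = sig > 0.0            # a numpy scalar bool, not a 0-d array
mask = np.array(raw)       # re-wrap it into a 0-d boolean array
tv = np.zeros(sig.shape)   # 0-d output
tv[mask] = 1.0             # boolean indexing on a 0-d array now works
print(tv)                  # array(1.)
```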
You are suggesting not to silently convert 0-d arrays to scalars. I think that is a very much non-trivial change (and it may even be difficult to know whether it would break things), though maybe it is a long-term plan. On the other hand, allowing scalar bools to index 0-d arrays -- I don't see a problem with that -- it is just another special case (or not even really one, since there is no point in casting True/False to 1/0 for indexing purposes inside numpy).
Sure, I can see that it is quite a significant change, but having better logic in the way 0-d arrays work would probably help avoid bugs in the future.
The problem is that there are many, many places where numpy in principle [...]. We could fix this in principle but it would be a lot of work, and it's not [...]

On Thu, May 9, 2013 at 3:45 PM, Sylvain Corlay <notifications@github.com> wrote: [quoted reply omitted]
There are two points here. The first is making this boolean indexing work. That is simple (but honestly I will not try it, because if I look again at indexing it will be a complete rewrite, which is already half done, schedule open though). The other point is the suggestion of actually retaining the 0-d arrays. That is far more complex and dangerous: since, for example, arrays are mutable and can be views, it could create huge bugs out in the wild. Note that you don't actually need this part; instead, for your case, adding another check for np.bool_ to the 0-d special case(s) in mapping.c is sufficient.
Regarding the last comment of njsmith, I would find it more logical to keep track of the input type and choose the return type accordingly.
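A user-level sketch of what "keeping track of the input type" could look like. This is a hypothetical wrapper written for illustration, not an actual numpy API; the name `exp_preserving` is invented:

```python
import numpy as np

def exp_preserving(x):
    """Hypothetical wrapper: like np.exp, but returns a 0-d array
    when the input is a 0-d array, instead of collapsing to a scalar."""
    out = np.exp(x)
    if isinstance(x, np.ndarray) and x.ndim == 0:
        out = np.asarray(out)  # re-wrap the scalar np.exp produced
    return out

r = exp_preserving(np.array(1.0))
print(type(r), r.ndim)      # a 0-d ndarray for 0-d array input
print(exp_preserving(1.0))  # a numpy scalar for scalar input
```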
From a purity point of view you are probably right, but practicality beats purity (and even more so if the pure solution might break user code). I disagree that it's very special (just implementation-wise). First, converting the np.bool_ scalar into an array is something I do not feel is special at all: we accept scalars for 0-d arrays everywhere, np.bool_ does not make sense as an index by itself, and in that case trying to convert to an array is standard procedure. Second, we are talking about the "full boolean index" case, i.e. where the boolean index has exactly the same shape as the array being indexed. Now you probably can define something like [...]
By the way, in which kind of case could it break things to return 0-d arrays rather than scalars in the case of 0-d array inputs? Aren't all scalar operations valid for 0-d arrays?
I am probably painting it as more dangerous than it is, since in normal usage I guess nothing bad can happen. But hashing, for example, is not valid for arrays, and in-place operations behave differently (mutable vs. immutable). I am simply wary of assuming that nobody has code weird enough to trigger such a difference. This is nonsense and certainly weird code, but it would break badly:
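The snippet this comment referred to did not survive in the thread; here is a hypothetical example of the kind of breakage meant, assuming 0-d array operations started returning 0-d arrays instead of scalars:

```python
import numpy as np

x = np.array(2.0)
s = np.exp(x)          # today: a numpy scalar -- immutable and hashable
cache = {s: "cached"}  # valid, because scalars can be dict keys
print(cache[s])

# A 0-d array, in contrast, is mutable and unhashable, so the dict
# above would raise TypeError if np.exp returned a 0-d array:
a = np.asarray(s)
try:
    hash(a)
except TypeError as exc:
    print("unhashable:", exc)
```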
This is now implemented in master by the new index machinery from gh-3798. |
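With the gh-3798 index machinery in place, the behavior from the original report can be checked directly (assuming NumPy 1.9 or later):

```python
import numpy as np

y = np.array(5.0)          # a 0-d array
x = np.array(4.0)
mask = x > 3               # still a numpy scalar bool, not a 0-d array
print(y[mask])             # acts as a full boolean mask: a size-1 result
print(y[np.bool_(False)])  # a size-0 result when the mask is False
```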
Should a separate issue be opened to carry on the discussion of the behavior of ufuncs? (Regarding whether they should return 0-d arrays in the case of 0-d array inputs.)
Original ticket http://projects.scipy.org/numpy/ticket/823 on 2008-06-17 by @teoliphant, assigned to unknown.
The numpy.bool_ scalar should be allowed for masked indexing of size-1 arrays. Currently, if x is a 0-d array, then x > 3 is a numpy.bool_ object, and for a 0-d array y, y[x > 3] will fail, while it should succeed as if numpy.bool_ were a 0-d array of boolean dtype.
This is related to Ticket #1319. There are some questions about whether or not 0-d arrays should be "maskable". I think they should be maskable and return a size-0 array if the mask does not succeed.