
BUG: np.piecewise not working for scalars #8194

Merged
merged 1 commit into from
Nov 9, 2016


alvarosg

@alvarosg alvarosg commented Oct 21, 2016

I was using np.piecewise to create a piecewise lambda function, and I realized that, the way it is written, it only works with ndarrays and not with a list of scalars for the condlist argument.

Like many other functions in np that handle both ndarrays and native types, this one should too.

I am making the following function:

import numpy as np

def buildPieceWise(x):
    condlist=[x<-1,(x>=-1)*(x<1),x>1]
    funclist=[lambda x: -x, 0, lambda x:x]
    return np.piecewise(x,condlist,funclist)

This works fine:

>>> print(buildPieceWise(np.linspace(-2,2,11)))
[ 2.   1.6  1.2  0.   0.   0.   0.   0.   1.2  1.6  2. ]

But this fails:

>>> print(buildPieceWise(-1.5))
C:\Anaconda\lib\site-packages\numpy\lib\function_base.pyc in piecewise(x, condlist, funclist, *args, **kw)
    773     if (n != n2):
    774         raise ValueError(
--> 775                 "function list and condition list must be the same")
ValueError: function list and condition list must be the same

The reason is that, for x = -1.5, my condition list is effectively [True, False, False] (only x < -1 holds), and this part of the code:

if (isscalar(condlist) or not (isinstance(condlist[0], list) or
                               isinstance(condlist[0], ndarray))):
    condlist = [condlist]

makes len(condlist)==1.
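A stripped-down model of that check in plain Python shows the effect for x = -1.5. This is a hypothetical sketch, not NumPy's code: `wrap_condlist` is an illustrative name, and it only tests for `list`, whereas the real check also uses `isscalar` and tests for `ndarray`.

```python
def wrap_condlist(condlist):
    """Simplified model of the pre-fix check in np.piecewise (hypothetical helper).

    A flat list of scalar conditions is treated as ONE condition and wrapped.
    """
    if not isinstance(condlist, list) or not isinstance(condlist[0], list):
        return [condlist]
    return condlist


# For a scalar input, each condition evaluates to a plain scalar.
x = -1.5
condlist = [x < -1, (x >= -1) * (x < 1), x > 1]   # [True, 0, False]
funclist = [lambda v: -v, 0, lambda v: v]

wrapped = wrap_condlist(condlist)
print(len(wrapped), len(funclist))  # 1 3
```

One wrapped condition against three functions is exactly the mismatch that triggers the ValueError above.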

I have replaced that condition by:

if (isscalar(condlist) or not (isinstance(condlist[0], list) or
                               isinstance(condlist[0], ndarray))):
    if not isscalar(condlist) and x.size == 1 and len(x.shape) == 0:
        condlist = [[c] for c in condlist]
    else:
        condlist = [condlist]

This makes it work in both cases:

>>> print(buildPieceWise(np.linspace(-2,2,11)))
[ 2.   1.6  1.2  0.   0.   0.   0.   0.   1.2  1.6  2. ]
>>> print(buildPieceWise(-1.5))
1.5
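The patched rule can be modelled the same way. Again a hypothetical sketch: `wrap_condlist_patched` and its `x_is_0d` flag are illustrative stand-ins for the real check, which derives the 0-d test from `x.size == 1 and len(x.shape) == 0`.

```python
def wrap_condlist_patched(condlist, x_is_0d):
    """Simplified model of the patched check (hypothetical helper).

    Only `list` is tested here; NumPy also uses `isscalar` and tests for `ndarray`.
    """
    if not isinstance(condlist, list) or not isinstance(condlist[0], list):
        if isinstance(condlist, list) and x_is_0d:
            # 0-d input: each scalar condition becomes its own one-element list
            return [[c] for c in condlist]
        # otherwise keep the old behaviour: the flat list is a single condition
        return [condlist]
    return condlist


scalar_case = wrap_condlist_patched([True, 0, False], x_is_0d=True)
array_case = wrap_condlist_patched([True, False], x_is_0d=False)
print(scalar_case)  # [[True], [0], [False]]
print(array_case)   # [[True, False]]
```

For a scalar input there are now three conditions for three functions, while non-scalar inputs keep the old wrapping.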

@alvarosg
Author

I just noticed that the changes are making this test fail:

np.piecewise([0, 0], [True, False], [1])

But I do not understand why the expected behavior is for this not to fail, but to return [1, 0] instead. It must be undocumented behaviour, since the documentation says:

condlist : list of bool arrays
Each boolean array corresponds to a function in funclist. Wherever condlist[i] is True, funclist[i](x) is used as the output value.
Each boolean array in condlist selects a piece of x, and should therefore be of the same shape as x.
The length of condlist must correspond to that of funclist. If one extra function is given, i.e. if len(funclist) - len(condlist) == 1, then that extra function is the default value, used wherever all conditions are false.

In that example, the number of functions is neither equal to nor one more than the number of conditions.
In reality, the call is interpreted as if it were:

np.piecewise([0, 0], [[True, False]], [1])

which in fact also returns [1, 0].
However, the documentation clearly says that condlist is a LIST of boolean arrays, not a single list of booleans. So if, as in this case, there is only one function and one boolean array, it should be passed as [[True, False]].
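A quick way to compare the two spellings, assuming the installed NumPy version still treats a flat list of booleans as a single condition for non-scalar x, as described here:

```python
import numpy as np

x = [0, 0]
flat = np.piecewise(x, [True, False], [1])      # flat list of booleans
nested = np.piecewise(x, [[True, False]], [1])  # list holding one boolean array
print(flat, nested)                              # [1 0] [1 0]
```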

@seberg
Member

seberg commented Oct 21, 2016

I did not check, but it is quite possible that the test documents existing behaviour rather than intended behaviour (there might even be a comment saying it should be deprecated). It is, however, a hint that even if the current behaviour is bad, a deprecation/FutureWarning cycle should likely happen before changing it.

@alvarosg
Author

alvarosg commented Oct 21, 2016

@seberg
I think I may be able to make it work in both cases by comparing the shape of the input with the shape of the conditions, but I am not sure whether other tests will still fail.

@alvarosg
Author

alvarosg commented Oct 22, 2016

@seberg

I have found a way to do this that changes the behaviour only in the cases where it was failing, so it is 100% backwards compatible.

Essentially I have modified my original condition to be:

    if (isscalar(condlist) or not (isinstance(condlist[0], list) or
                                   isinstance(condlist[0], ndarray))):
        if not isscalar(condlist) and x.size == 1 and len(x.shape) == 0:
            condlist = [[c] for c in condlist]
        else:
            condlist = [condlist]

and before my fork it was:

    if (isscalar(condlist) or not (isinstance(condlist[0], list) or
                                   isinstance(condlist[0], ndarray))):
        condlist = [condlist]

This means that the behaviour only changes when the condition list is not a scalar (not isscalar(condlist)), the condition list is also not a list of lists or arrays (neither isinstance(condlist[0], list) nor isinstance(condlist[0], ndarray) holds), and the input is a scalar (x.size == 1 and len(x.shape) == 0).

The only circumstance in which the result can differ from before is when the input is a single value and the condition list has the form [True, False, True, ..., False]. The only subset of these cases that overlaps with the behaviour I described in my previous comment is when both the input and the condition list have length one, in which case condlist = [condlist] and condlist = [[c] for c in condlist] give the same result.
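The length-one argument can be checked with plain lists; this is an illustrative sketch with made-up condition values, not a test from the PR.

```python
one = [True]
many = [False, True, False]

# One condition: both transformations give the same nested list.
same_for_one = [one] == [[c] for c in one]

# Several conditions: they differ, which is exactly the case the fix changes.
wrapped_many = [many]              # [[False, True, False]]
split_many = [[c] for c in many]   # [[False], [True], [False]]
print(same_for_one, wrapped_many == split_many)  # True False
```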

It should now be good to go :)

@alvarosg
Author

@seberg @charris

I actually realized, based on one of the old existing tests:

x = 3
piecewise(x, [x <= 3, x > 3], [4, 0])  # Should succeed.

that the behaviour I was implementing was already expected to work. So this is not really an enhancement, but a bug fix.

Essentially, that test was not failing because, if only two intervals are given, then after condlist = [condlist], len(condlist) == 1, which is still allowed for len(funclist) == 2 (the extra function is taken as the default). However, as soon as there are three intervals, len(condlist) is still 1 but len(funclist) == 3, so it stops working.
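The arithmetic can be sketched with a hypothetical helper, `lengths_accepted`, that models the documented rule (one function per condition, or exactly one extra default function); it is not NumPy's actual code.

```python
def lengths_accepted(n_conditions, n_functions):
    """Hypothetical model of the length check: one function per condition,
    or exactly one extra function used as the default."""
    return n_functions in (n_conditions, n_conditions + 1)


# After condlist = [condlist], a flat list of scalar conditions counts as one.
n_after_wrap = 1
two_intervals_ok = lengths_accepted(n_after_wrap, 2)    # True: one condition plus a default
three_intervals_ok = lengths_accepted(n_after_wrap, 3)  # False: three functions, one condition
```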

@seberg
Member

seberg commented Oct 23, 2016

@tkamishima is this the same bug you were going to fix in gh-7800?

@alvarosg
Author

alvarosg commented Oct 23, 2016

@seberg It does look like the same problem. In fact, I just realized that the current implementation in 1.11 does not raise an error as I suggested in my first post (I was using 1.10 when I first tried it, my bad), but it still gives the wrong value, as @tkamishima pointed out (I was not aware of that previous pull request at all).

However, I still think my solution is a bit less invasive, as it only changes behavior in a very particular subcase and leaves the function untouched in all other cases. This should be better than moving n = len(condlist), which may also work, but for which it is much harder to argue that behavior does not change in other cases.

@tkamishima
Contributor

@seberg @alvarosg This patch fixes the bug I wanted to fix, and I confirmed that @alvarosg's patch passes the unit tests added in my patch.

I leave it up to @seberg to decide which patch to use.

Note: I tried to remove the hack as @seberg suggested, but could not simplify it; it's too complicated.

@seberg
Member

seberg commented Oct 24, 2016

Frankly, I am not sure this fix is correct. Can we take a step back and make sure we agree on what the right behaviour is? I would still like to get rid of that zerod hack, which I think is the reason for this confusing condlist manipulation...

There are three basic things you can do with condlist when it has fewer than n+1 dimensions (n being x.ndim):

  1. Assume that there is only a single condition.
  2. Use broadcasting-like boolean indexing rules (ignore the missing dimensions). (Warning: this is not broadcasting; for broadcasting, the indexing operation would have to be y[..., condlist[k]].)
  3. Raise an error (or at least a warning).
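The difference between the boolean indexing in option 2 and indexing from the last axis, as in `y[..., condlist[k]]`, shows up on a small 2-d array. This is an illustrative sketch; `y`, `z` and the masks are made up for the example.

```python
import numpy as np

# Option 2: a 1-d mask on a 2-d array selects along the FIRST axis,
# so the missing trailing dimension is simply ignored (whole rows are picked).
y = np.zeros((2, 3))
row_mask = np.array([True, False])
y[row_mask] = 1          # y is now [[1, 1, 1], [0, 0, 0]]

# Applying the mask to the last axis for every row needs y[..., mask] instead.
z = np.zeros((2, 3))
col_mask = np.array([True, False, True])
z[..., col_mask] = 1     # z is now [[1, 0, 1], [1, 0, 1]]
```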

Now what actually happens? (this might be wrong):

  • If condlist.ndim == 0: fine, you can make it 1-D if you really want, since 0-D does not really make sense in any case (though we could deprecate that as well). (This is unrelated to the above.)
  • If condlist.ndim == 1: all is fine for x.ndim == 0 (after your fix); for x.ndim > 0 we get behaviour 1.
  • If condlist.ndim > 1: we always get behaviour 2.

I don't like the fact that we get two very different types of behaviour depending on the dimensionality of the inputs. One option might be to deprecate it and go with 3. Or we could try to consolidate it (which would need FutureWarnings)? To me option 2 seems slightly more sensible, but only very slightly, because boolean indexing does not truly broadcast, so I am actually tempted to think option 3 is best...

My simplified piecewise code:

import numpy as np

def piecewise(x, condlist, funclist):
    x = np.asanyarray(x)
    # Boolean condition array, no copy unless needed; a 0-d condition becomes 1-d
    cond = np.atleast_1d(np.asarray(condlist, dtype=bool))
    condlist = list(cond)
    # One extra function is the default, used wherever no condition holds
    if len(funclist) == len(condlist) + 1:
        condlist.append(~np.logical_or.reduce(cond, axis=0))
    y = np.zeros(x.shape, x.dtype)
    for k in range(len(condlist)):
        item = funclist[k]
        if not callable(item):
            y[condlist[k]] = item
        else:
            vals = x[condlist[k]]
            if vals.size > 0:
                y[condlist[k]] = item(vals)
    return y

This function always uses option 2 (plus the "just make it 1-d if it's 0-d" logic). But I think adding the axis insertions after the cond array creation here is probably much more straightforward (especially since there is no weird x.ndim == 0 avoidance logic).

@seberg
Member

seberg commented Oct 24, 2016

It would also be good to raise actual errors when funclist is too long...

@alvarosg
Author

alvarosg commented Oct 24, 2016

@seberg
I agree with you; taking a step back is the best option. If we can find a way to make it more elegant than it is right now while keeping backwards compatibility, we should do it. Basically, the "hack" was not very well implemented...

On the other hand, maybe we should treat this PR as a bug fix, not an enhancement.
With the current implementation:

>>> np.piecewise(3, [False, True, False], [1, 2, 3])
array(0)
>>> np.piecewise(3, [False, True, False], [1, 2, 3, 4])
array(0)

It completely ignores everything after the first condition and the first function. Furthermore, in all of these cases condlist has exactly n+1 dimensions, so the right result should be a no-brainer.

With my tiny fix, we get:

>>> np.piecewise(3, [False, True], [1, 2])
array(2)
>>> np.piecewise(3, [False, False], [1, 2, 3])
array(3)
>>> np.piecewise(3, [False, True, False], [1, 2, 3])
array(2)
>>> np.piecewise(3, [False, False, False], [1, 2, 3])
array(0)
>>> np.piecewise(3, [False, True, False], [1, 2, 3, 4])
array(2)
>>> np.piecewise(3, [False, False, False], [1, 2, 3, 4])
array(4)

Because of the way it is implemented, we know it only behaves differently from the previous implementation when x.ndim == 0, condlist.ndim == 1, and len(condlist) > 1, and almost all possible tests under those conditions are included.
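To make these expected results easier to check, here is a hypothetical pure-Python model of piecewise for a scalar x (the name `piecewise_scalar` is made up). It applies the conditions in order, like the `y[condlist[k]] = ...` loop, so a later matching condition overwrites an earlier one, and it returns 0 when nothing matches, like the zero-initialised output array. Raising ValueError on a length mismatch is an assumption of the sketch.

```python
def piecewise_scalar(x, condlist, funclist):
    """Hypothetical pure-Python model of np.piecewise for a scalar x."""
    condlist = list(condlist)
    if len(funclist) == len(condlist) + 1:
        # One extra function: it is the default, used when no condition holds
        condlist.append(not any(condlist))
    elif len(funclist) != len(condlist):
        raise ValueError("expected len(condlist) or len(condlist) + 1 functions")
    result = 0  # like the zero-initialised output array
    for cond, func in zip(condlist, funclist):
        if cond:
            # later matches overwrite earlier ones, as in the y[cond] = ... loop
            result = func(x) if callable(func) else func
    return result


# The six calls from the comment above
print(piecewise_scalar(3, [False, True], [1, 2]))                # 2
print(piecewise_scalar(3, [False, False], [1, 2, 3]))            # 3
print(piecewise_scalar(3, [False, True, False], [1, 2, 3]))      # 2
print(piecewise_scalar(3, [False, False, False], [1, 2, 3]))     # 0
print(piecewise_scalar(3, [False, True, False], [1, 2, 3, 4]))   # 2
print(piecewise_scalar(3, [False, False, False], [1, 2, 3, 4]))  # 4
```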

So my proposal would be to merge this as is and start a separate PR to make it nicer, including exceptions and as many tests as we can write to make sure it stays consistent with the previous behaviour. If that makes it into the next release, great. If not, at least we will have fixed the bug.

@seberg
Member

seberg commented Oct 25, 2016

Yes, I am OK with doing only the bug fix, but I had to take a deeper look to see what exactly your code changes anyway. I guess we can put this in as is. Could you squash the commits and use a commit message starting with "BUG: ...", etc., as in the dev guidelines?

@seberg
Member

seberg commented Oct 25, 2016

As an example, does the code

    if condlist.shape[-1] != 1:
        condlist = condlist.T

actually do anything...?

@alvarosg
Author

alvarosg commented Oct 25, 2016

@seberg

EDIT: The squash is done.

Great, I will squash the commits and include an appropriate message. A couple of questions:

  • Do you want me to rebase onto master? Currently this is based on the maintenance branch.
  • Is there something similar to SciPy's Thanks.txt to acknowledge contributions? I saw that there is an equivalent file here, but it does not seem to be updated for small contributions like this, and hasn't been for 6 years, hehehe.

I will get back to you about your latest question later, when I have access to a computer!

@alvarosg alvarosg changed the title np.piecewise to also work with lists of scalars BUG: np.piecewise not working for scalars Oct 25, 2016
@alvarosg
Author

@seberg

About this:

As an example, does the code
if condlist.shape[-1] != 1:
    condlist = condlist.T
actually do anything...?

The reason for these lines is actually related to what I have done.

Essentially, when the input is zero-dimensional, my implementation prevents the function from transforming a one-dimensional array of conditions into something like [[cond1, cond2, ..., condn]] (A), and instead transforms it into [[cond1], [cond2], ..., [condn]] (B).

Those two lines were doing exactly the same thing, but after the fact: first the code turns it into (A), and then, if the input was zero-dimensional, it transposes it into (B). The problem is that it calculates n = len(condlist) before transposing, which leads to the wrong behavior; hence @tkamishima's solution.

The good thing is that after this fix we will never end up in that case (only when condlist.shape is (1,), which becomes (1, 1), and in that case transposing does nothing), so we can now just remove those two lines. I will do that.
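The shape argument about (A) and (B) can be illustrated with plain lists, using a hypothetical `transpose` helper as a stand-in for `ndarray.T` on a 2-d array:

```python
def transpose(rows):
    """Transpose a list of lists (illustrative stand-in for ndarray.T on 2-d data)."""
    return [list(col) for col in zip(*rows)]


a = [[False, True, False]]   # (A): one "condition" holding all n values
b = transpose(a)             # (B): n conditions with one value each
print(len(a), len(b))        # 1 3: an n computed on (A) is too small

single = [[True]]
print(transpose(single) == single)  # True: shape (1, 1) is unchanged by transposing
```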

@seberg
Member

seberg commented Oct 26, 2016

Yes, please always start off against master (I'm not sure you can change the PR in that regard; you may need to make a new PR). I'm not sure about thanks.txt either; we have not really been using it, but we list contributors in the individual release notes. Luckily, with version control you basically get a list of contributors anyway.

@alvarosg alvarosg changed the base branch from maintenance/1.11.x to master October 26, 2016 08:48
@alvarosg
Author

@seberg

The rebase onto master is done, and everything is working. For future reference, there is an option in pull requests to change the base branch, so I just had to cherry-pick my last commit on top of master.

I did not edit the release notes; based on your message and the commit history, I assumed you would do that.

@seberg
Member

seberg commented Oct 26, 2016

We add the contributors at release time from the git history, and a bug fix like this does not need a release note about changed behaviour, so it should be fine. I will look over it later though; no time now.

@alvarosg
Author

@seberg

Looking further into a more elegant fix, I tried a different approach to make scalars and arrays behave more similarly: cast scalars into arrays with flatten, and then back to scalars with reshape.

Essentially I made this wrapper:

import numpy as np

def piecewise(x, condlist, *args, **kwargs):
    x = np.asarray(x)
    shape = x.shape
    if x.ndim==0:
        if hasattr(condlist, '__iter__'):
            condlist = [np.asarray(c).flatten() for c in condlist]
        else:
            condlist = [[condlist]]
    x = x.flatten()
    xout = np.piecewise(x, condlist, *args, **kwargs)
    return xout.reshape(shape)

This passes all the tests, because it always puts the input in the expected form: x is an array and condlist is a list of arrays (so no further reshaping is required inside piecewise). The downside may be efficiency, since the input always has to be cast into an array, but I am sure we can find a balance.

If you like this approach, I can explore it further. It would take some time, since we should add more tests of the old behaviour to make sure it really is backwards compatible, so I would still prefer to do it in a separate PR (essentially so that the tests we just added in this PR are already in place).
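The flatten/reshape round trip that the wrapper relies on can be checked on its own; a minimal sketch with an arbitrary 0-d input:

```python
import numpy as np

x = np.asarray(-1.5)           # 0-d input, shape ()
flat = x.flatten()             # 1-d copy with shape (1,)
back = flat.reshape(x.shape)   # restore the original 0-d shape
print(x.shape, flat.shape, back.shape)  # () (1,) ()
```

`flatten` always returns a copy; `np.ravel` returns a view when possible, which might address part of the efficiency concern.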

@alvarosg
Author

alvarosg commented Nov 6, 2016

@seberg

I had some time this weekend to look into this, and came up with the following implementation of piecewise, which passes all the tests without the previously existing hack. The idea is to first homogenize the input so that x is always a 1-d array and the list of conditions is a list of 1-d arrays (a 2-d array), following the existing behavior, and then reshape the output at the end to match the input.

It is similar to what you proposed above, but it takes into account some of the undocumented behaviour and the 0-d case (I tried your version as is, and lots of the tests failed).

import numpy as np
from numpy import array, asarray, zeros

def piecewise(x, condlist, funclist, *args, **kw):
    x = asarray(x)
    condlist = array(condlist, dtype=bool)

    # Make sure that piecewise([0,1],[True,False])
    # is interpreted as piecewise([0,1],[[True,False]]),
    # according to previous undocumented behavior
    if x.ndim == condlist.ndim:
        condlist = [condlist]

    # We flatten everything; this way, 0-d arrays
    # can be treated in exactly the same way as n-d arrays
    condlist = array([np.asarray(c).flatten() for c in condlist])
    shape = x.shape
    x = x.flatten()

    # We look at the lengths only after normalizing the input
    nf = len(funclist)
    nc = len(condlist)

    # Add the default case when there is exactly one more function than conditions
    if nf == nc + 1:
        totlist = np.logical_or.reduce(condlist, axis=0)
        condlist = np.vstack([condlist, ~totlist])
        nc += 1

    # Calculate the output
    y = zeros(x.shape, x.dtype)
    for k in range(nc):
        item = funclist[k]
        if not callable(item):
            y[condlist[k]] = item
        else:
            vals = x[condlist[k]]
            if vals.size > 0:
                y[condlist[k]] = item(vals, *args, **kw)

    return y.reshape(shape)

It is shorter than the previous version (after removing the comments), and in my opinion it is clearer about what each part does.

It may still need some work to avoid unnecessary array copies, and more tests (especially with higher-dimensional arrays, for which there are none right now) to check against the previous behavior. I would also need to add some exceptions.

I am happy to work on this as long as you think it is worth it and will be merged. Please, let me know :)

@seberg
Member

seberg commented Nov 6, 2016

If there are enough new tests, we are happy to merge cleanups, if you
are interested in making this nicer.

The way I did it with the 0-d arrays, at the very least the next stage
in the indexing deprecations would have to be done first, I guess
(possibly more, but high_d_array[True], etc. won't work currently
IIRC). If that would make things nicer, I think we might be able to do
that next step after 1.12 is branched off for good. But more things
might be wrong with my approach....

@alvarosg
Author

alvarosg commented Nov 6, 2016

The way I did it with the 0-d arrays, at the very least the next stage
in the indexing deprecations would have to be done first, I guess
(possibly more, but high_d_array[True], etc. won't work currently
IIRC). If that would make things nicer, I think we might be able to do
that next step after 1.12 is branched off for good.

Yes, I guess it may be better to do the cleanup after 1.12, so we can count on, e.g. high_d_array[True] working.
Would it then be better to first fix the bug by merging the code in this PR as is for 1.12, so the bug is gone in the next release, and do the cleanup afterwards directly on 1.12, probably with a hybrid of your code and the code I proposed before?

But more things
might be wrong with my approach....

I think the only other thing your approach would then be missing is what to do when x.ndim == condlist.ndim, which, per the current behaviour, should be interpreted as if condlist were [condlist].

@seberg
Member

seberg commented Nov 9, 2016

OK, I will merge it as is then. @charris, it might make sense to still squeeze this into 1.12, though it is probably not a big deal either.

@seberg seberg merged commit 0268680 into numpy:master Nov 9, 2016
@alvarosg
Author

alvarosg commented Nov 9, 2016

Great, just let me know if/when you want me to create a separate PR to refactor piecewise

@charris
Member

charris commented Nov 10, 2016

OK, will put it in. I'm planning on releasing the beta this weekend.


4 participants