Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: (closes #4352) any and all applied to object arrays should return bool #11857

Closed
wants to merge 3 commits into from

Conversation

Cheukting
Copy link

BUG: (closes #4352) numpy any and all applied to object arrays should return bool

Fix problem of np.any and np.all not always return bool.

(from EuroSciPy 2018 sprints)

@Cheukting Cheukting changed the title BUG: (#4352) numpy any and all applied to object arrays should return… BUG: (closes #4352) numpy any and all applied to object arrays should return… Sep 1, 2018
@Cheukting Cheukting changed the title BUG: (closes #4352) numpy any and all applied to object arrays should return… BUG: (closes #4352) any and all applied to object arrays should return bool Sep 1, 2018
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, unfortunately this misses the arr.any(). There are also C-level equivalent functions IIRC, so those should be fixed up as well :/. Although it seems that numpy.core._methods includes _any so it is probably easy to give it a similar treatment (in python code).

Overall, while it seems easy, I am not sure this is actually easy to get right.

numpy/core/fromnumeric.py Outdated Show resolved Hide resolved
numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved
@Cheukting
Copy link
Author

@seberg thanks for the review. Right now any() and all() does not call _any at _method but directly call the ufuncs. Now sure if it will change in future version. Will have a look at arr.any() as well, thanks for reminding.

@seberg
Copy link
Member

seberg commented Sep 3, 2018 via email

@seberg
Copy link
Member

seberg commented Sep 3, 2018 via email

@Cheukting
Copy link
Author

so what would be expected if it's
np.any(np.array([object(), 2.4], dtype=object)) and np.any(np.array([np.array([3]), 2.4], dtype=object)) ? In this case should object() to be converted to None and np.array([3]) to 3? These 2 cases is a bit extreme so want to be sure.

@Cheukting
Copy link
Author

unfortunately dtype is not supported in any and all ufunc so dtype=np.bool_ will not work

@eric-wieser
Copy link
Member

unfortunately dtype is not supported in any and all ufunc so dtype=np.bool_ will not work

[citation needed]. Just change

result = _wrapreduction(a, np.logical_or, 'any', axis, None, out,
                            keepdims=keepdims)

to

result = _wrapreduction(a, np.logical_or, 'any', axis, np.bool_, out,
                            keepdims=keepdims)

@eric-wieser
Copy link
Member

@seberg

dtype=np.bool_ probably works, although it
will probably cast all inputs to bool from the start (and not just
force output dtype selection I think)

As far as I know, it's impossible in python for bool(a or b) to produce a different value to bool(a) or bool(b). The only difference is that bool will be evaluated non-lazily.

I think the way to go here would be:

  • Pass dtype=bool under the hood to any and all
  • Add a dtype argument to any / all in case the user really wants to override it back to np.object_

@Cheukting
Copy link
Author

#11857 (comment)

@eric-wieser I have tried that and it causes an error in the _wrapreduction as the reduction it called in this if statement:

    if type(obj) is not mu.ndarray:
        try:
            reduction = getattr(obj, method)
        except AttributeError:
            pass
        else:
            # This branch is needed for reductions like any which don't
            # support a dtype.
            if dtype is not None:
                return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
            else:
                return reduction(axis=axis, out=out, **passkwargs)

should be without the dtype

@eric-wieser
Copy link
Member

eric-wieser commented Sep 6, 2018

I don't understand what you mean. What error do you get if you make the change I proposed above?

@Cheukting
Copy link
Author

@eric-wieser the error would be as follow:

    def _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs):
        passkwargs = {}
        for k, v in kwargs.items():
            if v is not np._NoValue:
                passkwargs[k] = v

        if type(obj) is not mu.ndarray:
            try:
                reduction = getattr(obj, method)
            except AttributeError:
                pass
            else:
                # This branch is needed for reductions like any which don't
                # support a dtype.
                if dtype is not None:
>                   return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
E                   TypeError: all() got an unexpected keyword argument 'dtype'

@eric-wieser
Copy link
Member

What is type(obj) when that happens?

@Cheukting
Copy link
Author

obj = matrix([[ True,  True,  True],
        [ True,  True,  True],
        [ True,  True,  True]])

@eric-wieser
Copy link
Member

Then matrix.all needs changing too.

@Cheukting
Copy link
Author

I think it also happened to MaskedArray.all ...... actually applying np.bool_ will cause around 70 fails in the tests. Not sure changing all methods is the best way to apply this fix... please advice.

@eric-wieser
Copy link
Member

Perhaps this PR should come in two parts then - adding support for a dtype argument to all the functions, then a follow-up that attempts to change the default

@seberg
Copy link
Member

seberg commented Sep 6, 2018

Doing the first step would be very nice in any caes maybe (though did not think about it too much). We also have to worry a bit about changing someone code. I am not really worried about it, but I don't see a nice way to warn here.
That shouldn't stop us, but it might delay actually changing it for a bit I think.

EDIT: Sorry that numpy has "traps" like this. Things sometimes look easy on first sight and then get sticky very fast when it gets being careful about breaking things and just the details.

@Cheukting
Copy link
Author

Ok, I will give part 1 a try then. I will leave it here after I finish part 1 then. Thanks for being patient with me and giving advice.

@Cheukting
Copy link
Author

Cheukting commented Sep 19, 2018

The problem was pin pointed, in core/code_generators/generate_umath.py:

'logical_and':
    Ufunc(2, 1, One,
          docstrings.get('numpy.core.umath.logical_and'),
          'PyUFunc_SimpleBinaryComparisonTypeResolver',
          TD(nodatetime_or_obj, out='?', simd=[('avx2', ints)]),
          TD(O, f='npy_ObjectLogicalAnd'),
          ),
'logical_not':
    Ufunc(1, 1, None,
          docstrings.get('numpy.core.umath.logical_not'),
          None,
          TD(nodatetime_or_obj, out='?', simd=[('avx2', ints)]),
          TD(O, f='npy_ObjectLogicalNot'),
          ),
'logical_or':
    Ufunc(2, 1, Zero,
          docstrings.get('numpy.core.umath.logical_or'),
          'PyUFunc_SimpleBinaryComparisonTypeResolver',
          TD(nodatetime_or_obj, out='?', simd=[('avx2', ints)]),
          TD(O, f='npy_ObjectLogicalOr'),
          ),

For dtype = 'O' it will fall into the last td (e.g. TD(O, f='npy_ObjectLogicalOr')) which will require npy_ObjectLogicalNot (and adding out='?' only does NOT work)

which is implemented in core/src/umath/funcs.inc.src:

/* Emulates Python's 'a or b' behavior */
static PyObject *
npy_ObjectLogicalOr(PyObject *i1, PyObject *i2)
{
    if (i1 == NULL) {
        Py_XINCREF(i2);
        return i2;
    }
    else if (i2 == NULL) {
        Py_INCREF(i1);
        return i1;
    }
    else {
        int retcode = PyObject_IsTrue(i1);
        if (retcode == -1) {
            return NULL;
        }
        else if (retcode) {
            Py_INCREF(i1);
            return i1;
        }
        else {
            Py_INCREF(i2);
            return i2;
        }
    }
}

which is mimicking the python and, not sure if changing it to always return bool is a good thing... please advice

@hameerabbasi
Copy link
Contributor

I think #11857 (comment) is a simple fix that will work. If needed, I can make a PR.

@gimseng
Copy link

gimseng commented Oct 8, 2020

Any progress on this? What's the tldr of the discussion so far? Thanks !

Base automatically changed from master to main March 4, 2021 02:04
@rgommers
Copy link
Member

rgommers commented May 9, 2021

@Cheukting I resolved the merge conflicts and rebased to get a sense of what's going on with this. Locally all tests pass for me.

Argh, I was almost done writing a really long comment, and now I lost it due to a UI glitch. Let me summarize only:

  • Current tests look good and pass
  • your try-except is correct but needs to be made a bit more specific, or changed to hasattr(a, 'astype')
  • the reason that @eric-wieser's suggestion for using np.bool_ cannot be done instead of the try-except is that that would not be backwards-compatible (_wrapreduction will call obj.any/obj/all for third-party obj which is not guaranteed to have a dtype keyword; so fixing up np.matrix is not enough)
  • the bigger issue is that ndarray.any/all still have the old behaviour
  • you want to write some new tests that verify the equivalence of functions and methods. ideally with @pytest.mark.parametrize looping over all dtypes. I'd expect only object dtype to fail
  • then we should get back to the question you ended with in your last comment.

@rgommers
Copy link
Member

rgommers commented May 9, 2021

which is mimicking the python and, not sure if changing it to always return bool is a good thing

I think any and all should always return a bool. I don't see why they should follow and, which has different semantics:

>>> 4 and np
<module 'numpy' from '/home/rgommers/code/numpy/numpy/__init__.py'>
>>> all([3, 4, np, [1]])
True

@Cheukting
Copy link
Author

Hi, @rgommers sorry I was having some issues with my computer yesterday so I have not much progress. I will try again this coming weekend and let you know how it goes. Thanks for helping me out.

@rgommers
Copy link
Member

That's too bad, the sprint was fun! Anyway, I'll keep an eye on this, would be nice to get it merged:)

@charris
Copy link
Member

charris commented Jun 9, 2022

@Cheukting This needs a rebase.

@Cheukting
Copy link
Author

Hi @charris

Thanks for reminding me about this. I am afraid this is too old and would not be relevant anymore. I am happy to close it.

Looking forward to work on something else in the future :-)

@Cheukting Cheukting closed this Jun 9, 2022
@seberg
Copy link
Member

seberg commented Jun 9, 2022

Still very relevant, but we should maybe decide whether we want to change this for any/all, or directly in np.logical_or.reduce().
And it might be a "larger" change (in the sense that it is hard to add a warning, but might break someone who relies on the current behavior). The change itself is fairly straight forward...

@Cheukting
Copy link
Author

Cheukting commented Jun 9, 2022

Ok, I will reopen and rebase it to keep this discussion. However, I would need more information and guidance to move forward. I would really love to get this done and move on 😅

@Cheukting Cheukting reopened this Jun 9, 2022
@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Jun 9, 2022
@seberg
Copy link
Member

seberg commented Jun 9, 2022

We need to make a decision, lets see if we can manage in the next triage meeting. My suspicion is that:

  • The change here (as proposed) modifying only any and all may just fly.
  • The larger change would likely be nice, but maybe should be part of a 2.0 release (which might be the next release, but its hard to plan)

@Cheukting Cheukting changed the base branch from main to maintenance/1.0.3.x June 9, 2022 16:51
@Cheukting Cheukting changed the base branch from maintenance/1.0.3.x to main June 9, 2022 16:51
@seberg
Copy link
Member

seberg commented Dec 14, 2022

Mailing list ping to see if anyone voices concerns about trying this: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/6JE4FGYCRTCURGPSDZV4X7HKY77PUZMU/

I need to take a closer look, because right now I am confused about the None special path, and not sure we should bother about returning a Python bool for scalars (unless we currently do, which may be!).

@melissawm
Copy link
Member

@seberg would you mind taking a look again or guiding @Cheukting through on what needs to be done here? If the best idea is to start from scratch or for someone else to take over, that is ok too. Cheers!

@seberg
Copy link
Member

seberg commented Jul 17, 2023

OK, lets close it... We could aim for logical_or directly nowadays, but not sure if that is even better.

In either case, this doesn't look terrible or should be complicated. The dtype=np.bool_ solution was always the right approach, NumPy arrays support it already, but the main thing is that _wrapreduction doesn't support that not passing anything should default to using it.

We can add dtype, but maybe it doesn't even matter: We should just set it implicitly in the array method (change default) and inline _wrapreduction to pass it explicitly.

(I am not convicned that any/all need dtype=, if you really want that, using np.logical_or.reduce() seems OK; although we are missing a reference to it.)

@seberg seberg closed this Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: numpy._core triage review Issue/PR to be discussed at the next triage meeting
Projects
Development

Successfully merging this pull request may close these issues.

numpy any and all applied to object arrays should return booleans.
8 participants