New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

MAINT: Removed duplicated code around `ufunc->identity` #8952

Merged

mhvk merged 1 commit into numpy:master from eric-wieser:identity-refactor

Dec 18, 2017

Member

eric-wieser commented Apr 16, 2017 •

edited

There didn't seem to be any value to a assign_identity function - all we
actually care about is the value to assign.

This paves the way for:

easily adding more values (ufunc identity of -inf #7702)
using the identity in more places (reduceat cornercase (Trac #236) #834)

This also fixes #8860 as a side-effect, but introduces a different problem elsewhere:

np.add.reduce(np.array([], object)) - previously False, now 0 (good)
np.logical_or.reduce(np.array([], object)) - previously False, now 0 (not so good)

The latter of these is fixed by #8955 which can be merged after this PR.

This seems to be better behaviour, since at least 0 is what np.logical_or.identity returns. In a future commit, I think we should distinguish between True and 1 in the identity, but I don't want to change too much at once, so I'm leaving that for another PR

eric-wieser added 03 - Maintenance component: numpy._core labels

eric-wieser commented

View reviewed changes

numpy/core/src/umath/reduction.h Outdated

    
            @@ -147,7 +147,7 @@ PyUFunc_ReduceWrapper(PyArrayObject *operand, PyArrayObject *out,
          
                                    npy_bool *axis_flags, int reorderable,

                                    int keepdims,

                                    int subok,

                                    PyArray_AssignReduceIdentityFunc *assign_identity,

                                    PyObject *assign_identity,

Member Author

eric-wieser Apr 16, 2017 •

edited

This stopped being public API in 3626d0c, despite the name suggesting it is - so this change is fine

numpy/core/src/umath/ufunc_object.c

    
                  switch(ufunc->identity) {

                  case PyUFunc_One:

                      *reorderable = 1;

                      return PyInt_FromLong(1);

Member Author

eric-wieser Apr 16, 2017

This used to be True when used internally, and 1 when viewed externally.

The upshot is that this fixes #8860.

Ideally, we'd use True for functions like logical_or, and 1 for functions like add

numpy/core/src/umath/ufunc_object.c

    
                  case PyUFunc_MinusOne:

                      *reorderable = 1;

                      return PyInt_FromLong(-1);

Member Author

eric-wieser Apr 16, 2017

Previously, we cached these values in a static variable. Is this something we should keep doing?

numpy/core/src/umath/ufunc_object.c Outdated

    
                  default:

                      PyErr_Format(PyExc_ValueError, "ufunc has an invalid identity");

                      return NULL;

Member Author

eric-wieser Apr 16, 2017

Could include the ufunc name in the error message, but really we should merge
#8876 before that to pick the right name.

numpy/core/src/umath/ufunc_object.c

    
                              }

                              PyArray_FillWithScalar(op[i], identity);

Member Author

eric-wieser Apr 16, 2017

assign_identity was never anything but this operation

numpy/core/src/umath/ufunc_object.c

    
                                  "ufunc %s has an invalid identity for reduction",

                                  ufunc_name);

                          return NULL;

                  if (_get_bufsize_errmask(NULL, "reduce", &buffersize, &errormask) < 0) {

Member Author

eric-wieser Apr 16, 2017 •

edited

Moving this check to occur first makes cleanup a little easier.

numpy/core/src/umath/ufunc_object.c

    
                      Py_DECREF(identity);

                      identity = Py_None;

                      Py_INCREF(identity);

                  }

Member Author

eric-wieser Apr 16, 2017

I don't think this behaviour is justified, but this is a maintenance commit, so for now, we should leave it in

eric-wieser force-pushed the identity-refactor branch from 7dded6e to 802e447 Compare

April 17, 2017 12:24

Contributor

mhvk commented Apr 17, 2017

It would seem a function to assign the identity is not in itself a crazy idea, as it allows one to calculate it based on *data. However, it doesn't seem a regular user could in fact make use of this, unless there were some plans to make it possible for ufunc->identity itself to be a function (but even if so, I think that could be accommodated just as easily in your scheme).

Member Author

eric-wieser commented Apr 17, 2017 •

edited

a function to assign the identity is not in itself a crazy idea, as it allows one to calculate it based on *data

I think you're confusing "assign" and "calculate" here. A function to calculate the identity from the data (or just the dtype) might make sense, but it doesn't make any sense to me that it should also do the actual assigning - that sounds like a recipe for refcounting errors to me.

Contributor

mhvk commented Apr 17, 2017

Yes, I meant it in the calculate sense, and wondered whether that was ever intended since the function was passed *data (which, as I understand it, are auxiliary data pointers, not the actual arrays). Anyway, overall, I think our conclusion is the same: as is, any such calculation was not possible anyway, and I think your work makes the code quite a bit more readable, which is not unimportant!

Member Author

eric-wieser commented Apr 17, 2017 •

edited

which, as I understand it, are auxiliary data pointers, not the actual arrays

You're correct, I overlooked this. AFAICT, these pointers are for implementing loop-specific behaviour, so would match the "different identity for different dtype" cases.

If we do want this in future, I think it should be performed at a higher level - as a data argument to _get_identity, with perhaps a PyUFunc_IdentityFromData enum value to use it.

The only situation I can construct where loop-specific identities would be useful is if np.add has a specialized string loop to concatenate, in which case the identity would want to be "". But then np.add.identity wouldn't have a good value to return.

For all the numeric types, casting is sufficient to produce the different values

Contributor

mhvk commented Apr 17, 2017

I think you're right it would mostly be useful to make things dtype-dependent (including perhaps structured arrays). In any case, it seems the code as is doesn't really allow that, and the current PR does not preclude adding that later.

This was referenced Apr 17, 2017

ENH: Allow ufunc.identity to be any python object #8955

Merged

reduceat cornercase (Trac #236) #834

Open

Member Author

eric-wieser commented Apr 26, 2017

@mhvk: Any more comments? Does this look ready to merge to you?

Contributor

mhvk commented Apr 26, 2017

I'm still on an observing run, and while this all looks OK, I haven't really thought more about it, so it may be good to have someone else have a look.

eric-wieser requested a review from jaimefrio

April 26, 2017 23:49

Contributor

homu commented May 1, 2017

☔ The latest upstream changes (presumably #8876) made this pull request unmergeable. Please resolve the merge conflicts.

eric-wieser force-pushed the identity-refactor branch 2 times, most recently from 5192858 to 81b5d9e Compare

May 1, 2017 10:47

Member Author

eric-wieser commented Jul 8, 2017

Any chance someone could take a look at this?

Member

charris commented Oct 2, 2017

Needs rebase.

Member Author

eric-wieser commented Oct 3, 2017

Relatedly, what happened to @homu?

eric-wieser force-pushed the identity-refactor branch from 81b5d9e to fb2f67c Compare

October 3, 2017 04:02

Member Author

eric-wieser commented Oct 3, 2017

Man, kdiff3 was awful for rebasing that. Sorted now.

eric-wieser force-pushed the identity-refactor branch 2 times, most recently from e2fed71 to 551a12d Compare

October 3, 2017 04:30


          MAINT: Removed duplicated code around ufunc->identity

31fd25d

There didn't seem to be any value to a `assign_identity` function - all we
actually care about is the value to assign.

This also fixes numpy#8860 as a side-effect, and paves the way for:

* easily adding more values (numpy#7702)
* using the identity in more places (numpy#834)

eric-wieser force-pushed the identity-refactor branch from 551a12d to 31fd25d Compare

October 3, 2017 07:22

Member Author

eric-wieser commented Oct 6, 2017

@charris: Does this one look ready?

Member Author

eric-wieser commented Nov 3, 2017

eric-wieser mentioned this pull request

BUG, MAINT: Ufunc reduce reference leak #10194

Merged

mhvk reviewed

View reviewed changes

Contributor

mhvk left a comment

This does all look OK, so unless @charris or @jaimefrio objects in the next day or so, I'll merge.

Just to be sure, am I correct to assume that given NPY_NOEXPORT, we do not have to worry about anyone using this directly?

Member Author

eric-wieser commented Dec 11, 2017 •

edited

You can see it was removed from the API in this commit

Member Author

eric-wieser commented Dec 18, 2017

@mhvk: looks like there are no objections

mhvk merged commit c18ab2c into numpy:master

Contributor

mhvk commented Dec 18, 2017

OK, merged!

This was referenced Feb 20, 2018

min/max of empty arrays #5032

Closed

Sum/product of empty object array is False/True #10639

Closed

EricCousineau-TRI mentioned this pull request

User-defined dtype: Can be confusing to know which conversions to provide for compatibility with core methods #10904

Open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment