Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Removed duplicated code around ufunc->identity #8952

Merged
merged 1 commit into from Dec 18, 2017

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Apr 16, 2017

There didn't seem to be any value to a assign_identity function - all we
actually care about is the value to assign.

This paves the way for:

This also fixes #8860 as a side-effect, but introduces a different problem elsewhere:

  • np.add.reduce(np.array([], object)) - previously False, now 0 (good)
  • np.logical_or.reduce(np.array([], object)) - previously False, now 0 (not so good)

The latter of these is fixed by #8955 which can be merged after this PR.

This seems to be better behaviour, since at least 0 is what np.logical_or.identity returns. In a future commit, I think we should distinguish between True and 1 in the identity, but I don't want to change too much at once, so I'm leaving that for another PR

@@ -147,7 +147,7 @@ PyUFunc_ReduceWrapper(PyArrayObject *operand, PyArrayObject *out,
npy_bool *axis_flags, int reorderable,
int keepdims,
int subok,
PyArray_AssignReduceIdentityFunc *assign_identity,
PyObject *assign_identity,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stopped being public API in 3626d0c, despite the name suggesting it is - so this change is fine

switch(ufunc->identity) {
case PyUFunc_One:
*reorderable = 1;
return PyInt_FromLong(1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be True when used internally, and 1 when viewed externally.

The upshot is that this fixes #8860.

Ideally, we'd use True for functions like logical_or, and 1 for functions like add


case PyUFunc_MinusOne:
*reorderable = 1;
return PyInt_FromLong(-1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we cached these values in a static variable. Is this something we should keep doing?


default:
PyErr_Format(PyExc_ValueError, "ufunc has an invalid identity");
return NULL;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could include the ufunc name in the error message, but really we should merge
#8876 before that to pick the right name.

}
PyArray_FillWithScalar(op[i], identity);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assign_identity was never anything but this operation

"ufunc %s has an invalid identity for reduction",
ufunc_name);
return NULL;
if (_get_bufsize_errmask(NULL, "reduce", &buffersize, &errormask) < 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this check to occur first makes cleanup a little easier.

Py_DECREF(identity);
identity = Py_None;
Py_INCREF(identity);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this behaviour is justified, but this is a maintenance commit, so for now, we should leave it in

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2017

It would seem a function to assign the identity is not in itself a crazy idea, as it allows one to calculate it based on *data. However, it doesn't seem a regular user could in fact make use of this, unless there were some plans to make it possible for ufunc->identity itself to be a function (but even if so, I think that could be accommodated just as easily in your scheme).

@eric-wieser
Copy link
Member Author

eric-wieser commented Apr 17, 2017

a function to assign the identity is not in itself a crazy idea, as it allows one to calculate it based on *data

I think you're confusing "assign" and "calculate" here. A function to calculate the identity from the data (or just the dtype) might make sense, but it doesn't make any sense to me that it should also do the actual assigning - that sounds like a recipe for refcounting errors to me.

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2017

Yes, I meant it in the calculate sense, and wondered whether that was ever intended since the function was passed *data (which, as I understand it, are auxiliary data pointers, not the actual arrays). Anyway, overall, I think our conclusion is the same: as is, any such calculation was not possible anyway, and I think your work makes the code quite a bit more readable, which is not unimportant!

@eric-wieser
Copy link
Member Author

eric-wieser commented Apr 17, 2017

which, as I understand it, are auxiliary data pointers, not the actual arrays

You're correct, I overlooked this. AFAICT, these pointers are for implementing loop-specific behaviour, so would match the "different identity for different dtype" cases.

If we do want this in future, I think it should be performed at a higher level - as a data argument to _get_identity, with perhaps a PyUFunc_IdentityFromData enum value to use it.

The only situation I can construct where loop-specific identities would be useful is if np.add has a specialized string loop to concatenate, in which case the identity would want to be "". But then np.add.identity wouldn't have a good value to return.

For all the numeric types, casting is sufficient to produce the different values

@mhvk
Copy link
Contributor

mhvk commented Apr 17, 2017

I think you're right it would mostly be useful to make things dtype-dependent (including perhaps structured arrays). In any case, it seems the code as is doesn't really allow that, and the current PR does not preclude adding that later.

@eric-wieser
Copy link
Member Author

@mhvk: Any more comments? Does this look ready to merge to you?

@mhvk
Copy link
Contributor

mhvk commented Apr 26, 2017

I'm still on an observing run, and while this all looks OK, I haven't really thought more about it, so it may be good to have someone else have a look.

@homu
Copy link
Contributor

homu commented May 1, 2017

☔ The latest upstream changes (presumably #8876) made this pull request unmergeable. Please resolve the merge conflicts.

@eric-wieser eric-wieser force-pushed the identity-refactor branch 2 times, most recently from 5192858 to 81b5d9e Compare May 1, 2017 10:47
@eric-wieser
Copy link
Member Author

Any chance someone could take a look at this?

@charris
Copy link
Member

charris commented Oct 2, 2017

Needs rebase.

@eric-wieser
Copy link
Member Author

Relatedly, what happened to @homu?

@eric-wieser
Copy link
Member Author

Man, kdiff3 was awful for rebasing that. Sorted now.

@eric-wieser eric-wieser force-pushed the identity-refactor branch 2 times, most recently from e2fed71 to 551a12d Compare October 3, 2017 04:30
There didn't seem to be any value to a `assign_identity` function - all we
actually care about is the value to assign.

This also fixes numpy#8860 as a side-effect, and paves the way for:

* easily adding more values (numpy#7702)
* using the identity in more places (numpy#834)
@eric-wieser
Copy link
Member Author

@charris: Does this one look ready?

@eric-wieser
Copy link
Member Author

@charris: Ping

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does all look OK, so unless @charris or @jaimefrio objects in the next day or so, I'll merge.

Just to be sure, am I correct to assume that given NPY_NOEXPORT, we do not have to worry about anyone using this directly?

@eric-wieser
Copy link
Member Author

eric-wieser commented Dec 11, 2017

You can see it was removed from the API in this commit

@eric-wieser
Copy link
Member Author

@mhvk: looks like there are no objections

@mhvk mhvk merged commit c18ab2c into numpy:master Dec 18, 2017
@mhvk
Copy link
Contributor

mhvk commented Dec 18, 2017

OK, merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: ufunc.reduce(ndarray(dtype=object)) returns bool
4 participants