-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: DOC: improving error messages for casting errors in ufuncs #14828 #14843
base: main
Are you sure you want to change the base?
Conversation
catch up to numpy
catch up to master
In principle this looks like an improvement. Could you show a direct comparison of old-to-new? Also, is there a short-hand notation we could use to keep the error message length from getting out of hand, perhaps |
PyTuple_SetItem(froms, j, (PyObject *)PyArray_DESCR(operands[j])); | ||
PyTuple_SetItem(tos, j, (PyObject *)dtypes[j]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing Py_INCREF(value)
here - PyTuple_SetItem
steals references.
Also, you might as well use PyTuple_SETITEM
, it's faster and no more dangerous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your helpful comments so far!
I included the Py_INCREF
on the dtypes[j]
object and it removed a segfault in an edge case I found. I don't think there needs to be one with regards to the PyArray_DESCR(operands[j])
object because PyArray_DESRC
returns a borrowed reference, correct me if i misunderstand something there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are borrowed references, and this is exactly why you do need to incref them an extra time - when you call a function like SETITEM
that steals a reference, you must make sure the reference is yours to give (ie not borrowed).
These lines will likely need to change too: Lines 83 to 96 in dcc1fc2
|
I think it might be worth renaming the python class members to be something like: @_display_as_base
class _UFuncCastingError(UFuncTypeError):
def __init__(self, ufunc, casting, call_dtypes, loop_dtypes):
super().__init__(ufunc)
self.casting = casting
self.call_dtypes = call_dtypes
self.loop_dtypes = loop_dtypes
@_display_as_base
class _UFuncInputCastingError(_UFuncCastingError):
# other members as before
@property
def from_(self):
return self.call_dtypes[self.in_i]
@property
def to(self):
return self.loop_dtypes[self.in_i]
@_display_as_base
class _UFuncOutputCastingError(_UFuncCastingError):
# other members as before
@property
def from_(self):
return self.loop_dtypes[self.ufunc.nin + self.out_i]
@property
def to(self):
return self.call_dtypes[self.ufunc.nin + self.out_i] This has the nice benefit of not changing the meaning of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle this looks like an improvement. Could you show a direct comparison of old-to-new? Also, is there a short-hand notation we could use to keep the error message length from getting out of hand, perhaps
dtype.str
would be more compact
For the original (mentioned in issue #14843) example:
import numpy as np
x = np.arange(5, dtype='u8')
y = np.arange(5, dtype='i8')
x += y
The result for the current codebase is:
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('uint64') with casting rule 'same_kind'
The result from this change is:
UFuncTypeError: Output of ufunc add(uint64, int64, out=uint64) resolved to the add(float64, float64, out=float64) loop, but np.can_cast(np.float64, np.uint64, casting='same_kind') is False
I agree that that a shorter message is better, but I opted for the np.* form because it allows the code from the message to be directly copy and pasted into a REPL to play with it; if shorter takes precedence I can take the suggestion and make it less verbose.
PyTuple_SET_ITEM(froms, j, Py_None); | ||
} else { | ||
PyTuple_SET_ITEM(froms, j, (PyObject *)PyArray_DESCR(operands[j])); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check seems necessary in some cases where the operands
are not all available, is there a chance that the dtypes
will contain NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I don't know the answer
As only a regular user of numpy the mixed (with respect to the presence of
It's true that you can't directly throw the last call into a REPL if you only have |
I totally see your point with the consistency. Any other opinions out there? If it were my choice I would want to have |
numpy/core/_exceptions.py
Outdated
i_str = "{} ".format(self.out_i) if self.ufunc.nout != 1 else "" | ||
if self.ufunc.nout > 1: | ||
from_outs = ", ".join(["out{}={}".format(i+1, f.name) for i, f in enumerate(self.from_) if i>=self.ufunc.nin]) | ||
to_outs = ", ".join(["out{}={}".format(i+1, t.name) for i, t in enumerate(self.to) if i>=self.ufunc.nin]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't produce a legal call, it should be:
to_outs = ", ".join(["out{}={}".format(i+1, t.name) for i, t in enumerate(self.to) if i>=self.ufunc.nin]) | |
to_outs = "out=({})".format(", ".join(t.name for t in self.to[self.ufunc.nin:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same for the other three lines)
Numpy is not the only module capable of defining ufuncs, so prepending |
@micha2718l, do you think you'll have time to come back to this? |
@micha2718l it would be nice to finish this up. We are nearing the cutoff for the next release. |
merge origin nump/numpy
@mattip @eric-wieser |
I only have the same trivial remark as earlier: the mixture of But this might be bikeshedding territory, and I don't have the perspective to tell whether my preference is actually better (but unless I missed it I don't think we got a clear core dev statement on the matter, beyond that |
I missed that comment when I returned to this (and it had been a while so I had forgot about your comment). I still think there are valid points on either side, if any core have an opinion it is welcome. I'm open to any of the three potential options here, all np, no np, or the hybrid as is; which balances brevity and usefulness IMHO |
Thanks for coming back to this! I think this comment still applies: #14843 (comment) |
What are your thoughts on this comment: #14843 (comment)? |
numpy/core/_exceptions.py
Outdated
else: | ||
from_outs = "out={}".format(self.from_[-1]) | ||
to_outs = "out={}".format(self.to[-1]) | ||
|
||
return ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth a helper function somewhere,
def _ufunc_call_sig(ufunc, arg_dtypes):
assert len(arg_dtypes) == ufunc.nin + ufunc.nout
if ufunc.nout > 1:
outs = "out=({})".format(", ".join(f.name for f in arg_dtypes[ufunc.nin:]))
else:
outs = "out={}".format(arg_dtypes[-1])
return f"{}({}, {})".format(ufunc.__name__, ", ".join(arg_dtypes[:ufunc.nin]), outs)
which you can then call as _ufunc_call_sig(self.ufunc, self.from)
and _ufunc_call_sig(self.ufunc, self.to)
in the error messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like this concept, I'm putting it together now to see how that works out. Helper function likely good, but either way something to have the meanings of from/to not switching around is a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm putting it together now to see how that works out.
Were you able to finish this up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eric-wieser @mattip Sorry about the 4 year delay, I had some time and was reminded of this. I do believe the current state should reflect the merged changes from main
as well as the helper function being in place. I know there is a lot going on currently with the 2.0 release, but when you do get a minute please let me know if anything looks like it is still missing or should change with this PR.
ENH: DOC: improving error messages for casting errors in ufuncs
This PR adds functionallity to create better error messages in the case that
casting can not be completed in application of ufuncs.
There are probably still some bugs or edge cases, as well as formatting issues;
but the functions seem to work to give messages such as:
UFuncTypeError: Output of ufunc add(uint64, int64, out=uint64) resolved to the add(float64, float64, out=float64) loop, but np.can_cast(np.float64, np.uint64, casting='same_kind') is False
Fixes #14828