Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: DOC: improving error messages for casting errors in ufuncs #14828 #14843

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

micha2718l
Copy link

@micha2718l micha2718l commented Nov 6, 2019

ENH: DOC: improving error messages for casting errors in ufuncs

This PR adds functionallity to create better error messages in the case that
casting can not be completed in application of ufuncs.
There are probably still some bugs or edge cases, as well as formatting issues;
but the functions seem to work to give messages such as:
UFuncTypeError: Output of ufunc add(uint64, int64, out=uint64) resolved to the add(float64, float64, out=float64) loop, but np.can_cast(np.float64, np.uint64, casting='same_kind') is False

Fixes #14828

@mattip
Copy link
Member

mattip commented Nov 7, 2019

In principle this looks like an improvement. Could you show a direct comparison of old-to-new? Also, is there a short-hand notation we could use to keep the error message length from getting out of hand, perhaps dtype.str would be more compact

Comment on lines 169 to 170
PyTuple_SetItem(froms, j, (PyObject *)PyArray_DESCR(operands[j]));
PyTuple_SetItem(tos, j, (PyObject *)dtypes[j]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Py_INCREF(value) here - PyTuple_SetItem steals references.

Also, you might as well use PyTuple_SETITEM, it's faster and no more dangerous

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your helpful comments so far!
I included the Py_INCREF on the dtypes[j] object and it removed a segfault in an edge case I found. I don't think there needs to be one with regards to the PyArray_DESCR(operands[j]) object because PyArray_DESRC returns a borrowed reference, correct me if i misunderstand something there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are borrowed references, and this is exactly why you do need to incref them an extra time - when you call a function like SETITEM that steals a reference, you must make sure the reference is yours to give (ie not borrowed).

@eric-wieser
Copy link
Member

eric-wieser commented Nov 7, 2019

These lines will likely need to change too:

# try to deal with broken casting rules
try:
return ufunc(*args, out=out, **kwargs)
except _exceptions._UFuncOutputCastingError as e:
# Numpy 1.17.0, 2019-02-24
warnings.warn(
"Converting the output of clip from {!r} to {!r} is deprecated. "
"Pass `casting=\"unsafe\"` explicitly to silence this warning, or "
"correct the type of the variables.".format(e.from_, e.to),
DeprecationWarning,
stacklevel=2
)
return ufunc(*args, out=out, casting="unsafe", **kwargs)

@eric-wieser
Copy link
Member

eric-wieser commented Nov 7, 2019

I think it might be worth renaming the python class members to be something like:

@_display_as_base
class _UFuncCastingError(UFuncTypeError):
    def __init__(self, ufunc, casting, call_dtypes, loop_dtypes):
        super().__init__(ufunc)
        self.casting = casting
        self.call_dtypes = call_dtypes
        self.loop_dtypes = loop_dtypes

@_display_as_base
class _UFuncInputCastingError(_UFuncCastingError):
    # other members as before
    @property
    def from_(self):
        return self.call_dtypes[self.in_i]
    @property
    def to(self):
        return self.loop_dtypes[self.in_i]

@_display_as_base
class _UFuncOutputCastingError(_UFuncCastingError):
    # other members as before
    @property
    def from_(self):
        return self.loop_dtypes[self.ufunc.nin + self.out_i]
    @property
    def to(self):
        return self.call_dtypes[self.ufunc.nin + self.out_i]

This has the nice benefit of not changing the meaning of the to and from properties

Copy link
Author

@micha2718l micha2718l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip

In principle this looks like an improvement. Could you show a direct comparison of old-to-new? Also, is there a short-hand notation we could use to keep the error message length from getting out of hand, perhaps dtype.str would be more compact

For the original (mentioned in issue #14843) example:

import numpy as np
x = np.arange(5, dtype='u8')
y = np.arange(5, dtype='i8')
x += y

The result for the current codebase is:
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('uint64') with casting rule 'same_kind'

The result from this change is:
UFuncTypeError: Output of ufunc add(uint64, int64, out=uint64) resolved to the add(float64, float64, out=float64) loop, but np.can_cast(np.float64, np.uint64, casting='same_kind') is False

I agree that that a shorter message is better, but I opted for the np.* form because it allows the code from the message to be directly copy and pasted into a REPL to play with it; if shorter takes precedence I can take the suggestion and make it less verbose.

PyTuple_SET_ITEM(froms, j, Py_None);
} else {
PyTuple_SET_ITEM(froms, j, (PyObject *)PyArray_DESCR(operands[j]));
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check seems necessary in some cases where the operands are not all available, is there a chance that the dtypes will contain NULL?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I don't know the answer

@adeak
Copy link
Contributor

adeak commented Apr 2, 2020

As only a regular user of numpy the mixed (with respect to the presence of np) error message seems a bit weird to me. The first two function calls and their arguments are without np, and I think it would look both shorter and better to omit them altogether:

UFuncTypeError: Output of ufunc add(uint64, int64, out=uint64) resolved to the add(float64, float64, out=float64) loop, but can_cast(float64, uint64, casting='same_kind') is False

It's true that you can't directly throw the last call into a REPL if you only have numpy imported as np, but you can alternatively import those three names instead. Or trust the interpreter when it says the result is False :)

@micha2718l
Copy link
Author

As only a regular user of numpy the mixed (with respect to the presence of np) error message seems a bit weird to me. The first two function calls and their arguments are without np, and I think it would look both shorter and better to omit them altogether:

UFuncTypeError: Output of ufunc add(uint64, int64, out=uint64) resolved to the add(float64, float64, out=float64) loop, but can_cast(float64, uint64, casting='same_kind') is False

It's true that you can't directly throw the last call into a REPL if you only have numpy imported as np, but you can alternatively import those three names instead. Or trust the interpreter when it says the result is False :)

I totally see your point with the consistency. Any other opinions out there? If it were my choice I would want to have np on all parts, though it does create longer messages. I think there are valid points on both sides.

i_str = "{} ".format(self.out_i) if self.ufunc.nout != 1 else ""
if self.ufunc.nout > 1:
from_outs = ", ".join(["out{}={}".format(i+1, f.name) for i, f in enumerate(self.from_) if i>=self.ufunc.nin])
to_outs = ", ".join(["out{}={}".format(i+1, t.name) for i, t in enumerate(self.to) if i>=self.ufunc.nin])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't produce a legal call, it should be:

Suggested change
to_outs = ", ".join(["out{}={}".format(i+1, t.name) for i, t in enumerate(self.to) if i>=self.ufunc.nin])
to_outs = "out=({})".format(", ".join(t.name for t in self.to[self.ufunc.nin:]))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same for the other three lines)

@eric-wieser
Copy link
Member

Any other opinions out there? If it were my choice I would want to have np on all parts

Numpy is not the only module capable of defining ufuncs, so prepending np. before add would most likely be incorrect for errors from, say, scipy ufuncs

@eric-wieser
Copy link
Member

@micha2718l, do you think you'll have time to come back to this?

@mattip
Copy link
Member

mattip commented Oct 16, 2020

@micha2718l it would be nice to finish this up. We are nearing the cutoff for the next release.

@micha2718l
Copy link
Author

@mattip @eric-wieser
I think I took care of the outstanding issues with this PR. Let me know if there is any other details to think about.

@adeak
Copy link
Contributor

adeak commented Oct 28, 2020

I only have the same trivial remark as earlier: the mixture of np.can_cast and np./non-np. types in that error message look a bit weird to me :)

But this might be bikeshedding territory, and I don't have the perspective to tell whether my preference is actually better (but unless I missed it I don't think we got a clear core dev statement on the matter, beyond that np.add would definitely be off). I'm sorry if I just missed it.

@micha2718l
Copy link
Author

I only have the same trivial remark as earlier: the mixture of np.can_cast and np./non-np. types in that error message look a bit weird to me :)

But this is might be bikeshedding territory, and I don't have the perspective to tell whether my preference is actually better (but unless I missed it I don't think we got a clear core dev statement on the matter, beyond that np.add would definitely be off). I'm sorry if I just missed it.

I missed that comment when I returned to this (and it had been a while so I had forgot about your comment). I still think there are valid points on either side, if any core have an opinion it is welcome. I'm open to any of the three potential options here, all np, no np, or the hybrid as is; which balances brevity and usefulness IMHO

@eric-wieser
Copy link
Member

Thanks for coming back to this! I think this comment still applies: #14843 (comment)

@eric-wieser
Copy link
Member

What are your thoughts on this comment: #14843 (comment)?

else:
from_outs = "out={}".format(self.from_[-1])
to_outs = "out={}".format(self.to[-1])

return (
Copy link
Member

@eric-wieser eric-wieser Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a helper function somewhere,

def _ufunc_call_sig(ufunc, arg_dtypes):
    assert len(arg_dtypes) == ufunc.nin + ufunc.nout
    if ufunc.nout > 1:
        outs = "out=({})".format(", ".join(f.name for f in arg_dtypes[ufunc.nin:]))
    else:
        outs = "out={}".format(arg_dtypes[-1])
    return f"{}({}, {})".format(ufunc.__name__, ", ".join(arg_dtypes[:ufunc.nin]), outs)

which you can then call as _ufunc_call_sig(self.ufunc, self.from) and _ufunc_call_sig(self.ufunc, self.to) in the error messages

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this concept, I'm putting it together now to see how that works out. Helper function likely good, but either way something to have the meanings of from/to not switching around is a good thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm putting it together now to see how that works out.

Were you able to finish this up?

Copy link
Author

@micha2718l micha2718l Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wieser @mattip Sorry about the 4 year delay, I had some time and was reminded of this. I do believe the current state should reflect the merged changes from main as well as the helper function being in place. I know there is a lot going on currently with the 2.0 release, but when you do get a minute please let me know if anything looks like it is still missing or should change with this PR.

Base automatically changed from master to main March 4, 2021 02:04
@InessaPawson InessaPawson added 52 - Inactive Pending author response and removed 61 - Stale labels Jun 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Pending author's response
Status: Pending authors' response
Development

Successfully merging this pull request may close these issues.

Improving the error message when adding i8 array to u8 array
6 participants