-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Raise error in np.einsum_path
when output subscript is specified multiple times
#25230
Conversation
68e363d
to
a891dc9
Compare
np.einsum_path
when multiple output subscripts are specifiednp.einsum_path
when output subscript is specified multiple times
This seems sensible to me but let's ping @dgasmith to look at this small behavior change for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. The two should have identical parsing and parametrizing behavior.
@@ -714,6 +714,9 @@ def _parse_einsum_input(operands): | |||
|
|||
# Make sure output subscripts are in the input | |||
for char in output_subscript: | |||
if output_subscript.count(char) != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall the exact logical pathways. If we have moved this logic here- can we remove the duplicate logic from np.einsum
which is causing the non-uniform error responses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the error message is thrown from C (which is why np.einsum_path
currently doesn't throw):
numpy/numpy/_core/src/multiarray/einsum.c.src
Lines 174 to 180 in f209869
if (memchr(subscripts + i + 1, label, length - i - 1) != NULL) { | |
PyErr_Format(PyExc_ValueError, | |
"einstein sum subscripts string includes " | |
"output subscript '%c' multiple times", | |
(char)label); | |
return -1; | |
} |
I guess it makes sense to keep it given that also the other error cases are checked again in the C code, or do you disagree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah- it's in C. I would keep both.
I wonder how hard it would be to refactor the C parsing code to expose a private python function and get rid of the python implementation in That said, that's a significant bump in complexity for this PR and the status quo is we have a Python and C implementation that we have to remember to keep in sync, so the solution in this PR is probably fine IMO unless you're willing to take on the bigger C-level refactoring. Exposing C code to python isn't terribly hard but it's not very obvious what to do if you've never done it before, doubly so if you're unfamiliar with C. Happy to help out with that process if you're interested. |
OK, let's merge then. Thanks @lgeiger! |
currently throws an error since the output subscript
j
is specified multiple times. Howeverwould still return an einsum path despite the wrong einsum equation. This might lead to subtle errors if a user ends up relying on this behaviour by accident.
This PR raises the error already during parsing of the einsum equation which ensures that
np.einsum_path
matches the behaviour ofnp.einsum
. Or am I missing a use case for when it would still be useful to return an einsum path despite having multiple output subscripts defined?I recommend reviewing this PR commit by commit with whitespace hidden. 6855caa refactors the unittests to use
pytest.mark.parameterize
and doesn't include any changes in behaviour. I'm happy to split this into it's own PR to reduce the diff if necessary.