Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Raise error in np.einsum_path when output subscript is specified multiple times #25230

Merged
merged 3 commits into from
Dec 1, 2023

Conversation

lgeiger
Copy link
Contributor

@lgeiger lgeiger commented Nov 22, 2023

np.einsum("ij->jij", [[0, 0], [0, 0]])
# ValueError: einstein sum subscripts string includes output subscript 'j' multiple times

currently throws an error since the output subscript j is specified multiple times. However

np.einsum_path("ij->jij", [[0, 0], [0, 0]])

would still return an einsum path despite the wrong einsum equation. This might lead to subtle errors if a user ends up relying on this behaviour by accident.

This PR raises the error already during parsing of the einsum equation which ensures that np.einsum_path matches the behaviour of np.einsum. Or am I missing a use case for when it would still be useful to return an einsum path despite having multiple output subscripts defined?

I recommend reviewing this PR commit by commit with whitespace hidden. 6855caa refactors the unittests to use pytest.mark.parameterize and doesn't include any changes in behaviour. I'm happy to split this into it's own PR to reduce the diff if necessary.

@lgeiger lgeiger changed the title BUG: Raise error in np.einsum_path when multiple output subscripts are specified BUG: Raise error in np.einsum_path when output subscript is specified multiple times Nov 22, 2023
@ngoldbaum
Copy link
Member

This seems sensible to me but let's ping @dgasmith to look at this small behavior change for einsum_path to better match einsum.

Copy link
Contributor

@dgasmith dgasmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. The two should have identical parsing and parametrizing behavior.

@@ -714,6 +714,9 @@ def _parse_einsum_input(operands):

# Make sure output subscripts are in the input
for char in output_subscript:
if output_subscript.count(char) != 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the exact logical pathways. If we have moved this logic here- can we remove the duplicate logic from np.einsum which is causing the non-uniform error responses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the error message is thrown from C (which is why np.einsum_path currently doesn't throw):

if (memchr(subscripts + i + 1, label, length - i - 1) != NULL) {
PyErr_Format(PyExc_ValueError,
"einstein sum subscripts string includes "
"output subscript '%c' multiple times",
(char)label);
return -1;
}

I guess it makes sense to keep it given that also the other error cases are checked again in the C code, or do you disagree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah- it's in C. I would keep both.

@ngoldbaum
Copy link
Member

I wonder how hard it would be to refactor the C parsing code to expose a private python function and get rid of the python implementation in _parse_einsum_inputs.

That said, that's a significant bump in complexity for this PR and the status quo is we have a Python and C implementation that we have to remember to keep in sync, so the solution in this PR is probably fine IMO unless you're willing to take on the bigger C-level refactoring. Exposing C code to python isn't terribly hard but it's not very obvious what to do if you've never done it before, doubly so if you're unfamiliar with C. Happy to help out with that process if you're interested.

@ngoldbaum
Copy link
Member

OK, let's merge then. Thanks @lgeiger!

@ngoldbaum ngoldbaum merged commit b72c24e into numpy:main Dec 1, 2023
60 checks passed
@lgeiger lgeiger deleted the einsum-errors branch December 2, 2023 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants