BUG: Raise error in `np.einsum_path` when output subscript is specified multiple times #25230

lgeiger · 2023-11-22T22:05:11Z

```python
import numpy as np

np.einsum("ij->jij", [[0, 0], [0, 0]])
# ValueError: einstein sum subscripts string includes output subscript 'j' multiple times
```

currently raises an error because the output subscript `j` is specified multiple times. However,

```python
np.einsum_path("ij->jij", [[0, 0], [0, 0]])
```

would still return an einsum path despite the invalid einsum equation. This might lead to subtle errors if a user ends up relying on this behaviour by accident.

This PR raises the error during parsing of the einsum equation, which ensures that `np.einsum_path` matches the behaviour of `np.einsum`. Or am I missing a use case where it would still be useful to return an einsum path even though an output subscript is specified multiple times?
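The idea of moving the check into the parser can be illustrated with a minimal sketch. `parse_subscripts` and `toy_einsum_path` below are hypothetical stand-ins, not NumPy's real functions, and the sketch only handles explicit `->` equations; the operands are placeholders because only parsing is shown. Because validation happens at parse time, any entry point that goes through the parser rejects a repeated output label in the same way.

```python
from collections import Counter


def parse_subscripts(subscripts, n_operands):
    """Hypothetical parser for explicit-mode equations such as "ij,jk->ik"."""
    if "->" not in subscripts:
        raise ValueError("this sketch only handles explicit '->' equations")
    lhs, output = subscripts.replace(" ", "").split("->")
    inputs = lhs.split(",")
    if len(inputs) != n_operands:
        raise ValueError("number of subscript terms does not match operands")
    # Validate at parse time, so every caller (path planning or the
    # contraction itself) rejects a repeated output label identically.
    for label, count in Counter(output).items():
        if count > 1:
            raise ValueError(
                "einstein sum subscripts string includes "
                f"output subscript '{label}' multiple times"
            )
    return inputs, output


def toy_einsum_path(subscripts, *operands):
    """Hypothetical planner: parse first, then return a naive one-step path."""
    parse_subscripts(subscripts, len(operands))
    # Contract all operands in a single step (a trivial but valid plan).
    return [tuple(range(len(operands)))]


print(toy_einsum_path("ij,jk->ik", "A", "B"))  # [(0, 1)]
try:
    toy_einsum_path("ij->jij", "A")
except ValueError as exc:
    print(exc)
```

With the check living in the shared parser, there is no way for the planner to hand back a path for `"ij->jij"` while the contraction rejects it.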

I recommend reviewing this PR commit by commit with whitespace hidden. 6855caa refactors the unit tests to use `pytest.mark.parametrize` and doesn't include any changes in behaviour. I'm happy to split this into its own PR to reduce the diff if necessary.


ngoldbaum · 2023-11-28T22:12:00Z

This seems sensible to me, but let's ping @dgasmith to look at this small behavior change that makes `einsum_path` better match `einsum`.

dgasmith

Makes sense. The two should have identical parsing and parametrizing behavior.

dgasmith · 2023-12-01T12:09:46Z

numpy/_core/einsumfunc.py

```diff
@@ -714,6 +714,9 @@ def _parse_einsum_input(operands):
     # Make sure output subscripts are in the input
     for char in output_subscript:
+        if output_subscript.count(char) != 1:
```
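The hunk above is truncated after the new `if` line, so the following is only a sketch of how such a per-label loop could look in context. `check_output_subscripts` is a hypothetical helper, the membership check is assumed from the `# Make sure output subscripts are in the input` comment, and the error texts are placeholders rather than NumPy's exact messages.

```python
def check_output_subscripts(input_subscripts, output_subscript):
    """Hypothetical helper mirroring the loop shape shown in the diff hunk."""
    # Make sure output subscripts are in the input
    for char in output_subscript:
        # Uniqueness check (the line added in the hunk): each output label
        # must occur exactly once in the output.
        if output_subscript.count(char) != 1:
            raise ValueError(f"output subscript '{char}' appears more than once")
        # Membership check, assumed from the comment above the loop.
        if char not in input_subscripts:
            raise ValueError(f"output subscript '{char}' does not appear in the input")


check_output_subscripts("ij", "ji")  # valid: no exception
for bad in ("jij", "jk", "jkj"):
    try:
        check_output_subscripts("ij", bad)
    except ValueError as exc:
        print(bad, "->", exc)
```

Because both checks run per character, `"jkj"` reports the duplicated `j` before the loop ever reaches the missing `k`. Using `str.count` inside the loop is quadratic in the length of the output, which is harmless for einsum subscripts of a handful of characters.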


I don't recall the exact logical pathways. If we have moved this logic here, can we remove the duplicate logic from `np.einsum` that is causing the inconsistent error behaviour?

lgeiger · 2023-12-01

Currently the error is raised from C (which is why `np.einsum_path` currently doesn't raise):

numpy/numpy/_core/src/multiarray/einsum.c.src

Lines 174 to 180 in f209869

```c
if (memchr(subscripts + i + 1, label, length - i - 1) != NULL) {
    PyErr_Format(PyExc_ValueError,
                 "einstein sum subscripts string includes "
                 "output subscript '%c' multiple times",
                 (char)label);
    return -1;
}
```
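For readers less familiar with C, the `memchr` test above can be rendered in Python as a rough sketch. It assumes, as the error text suggests, that the loop scans the output part of the equation; `find_repeated_label` and `check_output_labels` are hypothetical names, and any other handling the surrounding C function does (such as skipping ellipses) is not reproduced.

```python
def find_repeated_label(subscripts):
    """Return the first label that reappears later in `subscripts`, else None."""
    for i, label in enumerate(subscripts):
        # str.find(sub, start) searches from `start` to the end of the string,
        # like memchr(subscripts + i + 1, label, length - i - 1) in the C code.
        if subscripts.find(label, i + 1) != -1:
            return label
    return None


def check_output_labels(output_subscripts):
    """Raise the same kind of error as the C snippet for a repeated label."""
    label = find_repeated_label(output_subscripts)
    if label is not None:
        raise ValueError(
            "einstein sum subscripts string includes "
            f"output subscript '{label}' multiple times"
        )


print(find_repeated_label("jij"))  # j
print(find_repeated_label("ij"))   # None
```

Like the `count`-based check in the Python parser, this forward scan is quadratic in the number of labels, which does not matter at einsum sizes; the point of the PR is that both implementations reject the same equations.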

I think it makes sense to keep it, given that the other error cases are also checked again in the C code. Or do you disagree?

dgasmith · 2023-12-01

Ah, it's in C. I would keep both.

ngoldbaum · 2023-12-01T16:44:39Z

I wonder how hard it would be to refactor the C parsing code to expose a private Python function and get rid of the Python implementation in `_parse_einsum_input`.

That said, that's a significant increase in complexity for this PR. The status quo is that we have a Python and a C implementation that we have to remember to keep in sync, so the solution in this PR is probably fine IMO unless you're willing to take on the bigger C-level refactoring. Exposing C code to Python isn't terribly hard, but it's not obvious what to do if you've never done it before, doubly so if you're unfamiliar with C. Happy to help out with that process if you're interested.

ngoldbaum · 2023-12-01T22:52:14Z

OK, let's merge then. Thanks @lgeiger!

github-actions bot added the 00 - Bug label Nov 22, 2023

lgeiger added 3 commits November 22, 2023 22:09

MAINT: Parameterize TestEinsum::test_einsum_errors

6855caa

MAINT: Test errors of np.einsum_path

1166ef8

BUG: Raise error in np.einsum_path when multiple output subscripts are specified

a891dc9

lgeiger force-pushed the einsum-errors branch from 68e363d to a891dc9 November 22, 2023 22:12

lgeiger changed the title ~~BUG: Raise error in np.einsum_path when multiple output subscripts are specified~~ BUG: Raise error in np.einsum_path when output subscript is specified multiple times Nov 22, 2023

This was referenced Nov 24, 2023

Raise error in parse_einsum_input when output subscript is specified multiple times dgasmith/opt_einsum#222

Merged

Handle error in case outputs subscripts of xeinsum are not unique google/jax#18670

Open

dgasmith approved these changes Dec 1, 2023


ngoldbaum merged commit b72c24e into numpy:main Dec 1, 2023
60 checks passed

lgeiger deleted the einsum-errors branch December 2, 2023 01:02
