simplify: Simplification of non-commutative expressions #14520

valglad · 2018-03-16T18:05:30Z

This PR introduces nc_simplify - a function for simplifying expressions with non-commutative symbols. Specifically, it simplifies the terms of the expression that involve only multiplication and raising to a power by grouping together repeated subterms, e.g. nc_simplify(a*b*a*b) == (a*b)**2.

At present, SymPy doesn't really do much to simplify such expressions so that simplify(a*b*a*b) == a*b*a*b. On this branch, simplify uses nc_simplify if the expression is non-commutative, before attempting further simplifications as it normally would.

Other examples of nc_simplify with non-commutative symbols a, b, c are:

>>> nc_simplify(a**3*b*a*b*a*b)
a**2*(a*b)**3
>>> nc_simplify(a*b*a**2*b*a**2*b*a)
(a*b*a)**3
>>> nc_simplify(a*b*a*b*c*(a*b)**2*c)
((a*b)**2*c)**2

More examples are included in the docstring and the tests.

This commit introduces nc_simplify - a function for simplifying expressions with non-commutative symbols. To be specific, it simplifies the terms of the expression that involve only multiplication and raising to a power by grouping together repeated subterms, e.g. nc_simplify(a*b*a*b) == (a*b)**2.

asmeurer · 2018-03-16T20:55:24Z

Nice. Your algorithm could probably be useful for #10271. It tried your examples there and they aren't as good.

>>> cse(a**3*b*a*b*a*b)
([], [a**3*(b*a*b)*a*b])
>>> cse(a*b*a**2*b*a**2*b*a)
([(x0, a**2)], [a*b*x0*b*x0*b*a])
>>> cse(a*b*a*b*c*(a*b)**2*c)
([(x0, a*b)], [x0**2*c*x0**2*c])

(side note: I can't believe that pull request is still open. I should probably just fix the merge conflicts and get it in already)

valglad · 2018-03-17T14:28:57Z

I was wondering if there were any related issues or PRs trying to do similar things. The algorithm I used is mostly naive, the only not-entirely-naive thing about it is a large if clause to modify the subterm we're trying to match by splitting up the powers. E.g. if we are grouping subterms of length 2 in a**2*b*a*b*a*b, we'd start with a**2*b (the first two arguments) and immediately see that there is no match for it (the next two arguments give a*b) - but if we took only one a from a**2, then we'd have a match. There a couple more cases like that.

asmeurer · 2018-03-17T20:37:13Z

I think there must be an issue somewhere in my cse code. The underlying algorithm should be able to recursively get the same results as yours

>>> shortest_repeated_subsequence((a, a, a, b, a, b, a, b))
(a, b)
>>> shortest_repeated_subsequence((a, b, a, a, b, a, a, b, a))
(a, b, a)
>>> shortest_repeated_subsequence((a, b, a, b, c, a, b, a, b, c))
(a, b)

I haven't had a look at it in a long time so I'm not sure what is going on. I thought maybe it was that it doesn't recognize powers as products, but it fails even with Mul(a, b, a, a, b, a, a, b, a, evaluate=False). Although the last one does work when you do that

>>> cse(Mul(a, b, a, b, c, a, b, a, b, c, evaluate=False))
([(x0, a*b), (x1, x0**2*c)], [x1**2])

I might do some more debugging later.

Let me know what you think of the "shortest repeated subsequence" idea. It can probably be extended to work with powers implicitly, so that they don't have to be manually extended in the sequence (that would be important for very large powers).

At some point I got bogged down in @smichr's suggestions, and it turned out that I didn't need cse on matrix expressions as much as I thought I did, so my focus moved to other places.

asmeurer · 2018-03-17T20:40:24Z

Although cse perhaps has a different goal than nc_simplify. For cse, a*b*c*a*b can be simplified, because a*b can be pulled out and computed only once. But for simplify, there is nothing to be done here, because the result is only a single expression. So likely there isn't complete overlap here.

asmeurer · 2018-03-17T20:53:06Z

Ah, cse is destroying unevaluated Muls here. And it's a known issue that cse doesn't automatically expand powers (apparently nontrivial, because it was fixed once, but had to be reverted because of performance issues).

valglad · 2018-03-17T23:09:31Z

shortest_repeated_subsequence sounds reasonable but why go for the shortest one? If I'm understanding correctly, you will have:

>>> shortest_repeated_subsequence((a, b, c, d, b, b, c, d, b, a, b))
(a, b)

rather than (b, c, d, b) (the longest one). If you are using shortest_repeated_subsequence iteratively to simplify the sequence, then using the longest one will give a shorter tuple in the next iteration and probably a better simplification. E.g., if you picked (b, c, d, b) first, then you'd immediately have (a, b, x_0, x_0, a, b) then (x_1, x_0, x_0, x_1) then (x_2, x_2) - with (a, b), it's (x_0, c, d, b, b, c, d, b, x_0) then (x_0, x_1, b, x_1, x_0) then (x_2, b, x_2) and you are stuck.

It's true that if you go for the longest one, there might be shorter repeated subsequences inside so you'll have to run the function on it as well, but you could probably reuse the information from the matrix you've computed for the whole expression and then it shouldn't be too inefficient (maybe)?

asmeurer · 2018-03-19T21:06:29Z

The x2 in your example isn't correct (it's (x1, x0) or (x0, x1)). Also, it should be (a, x0, x0, a, b)

For your example (corrected), we have

x0 = b*c*d*b
expr = a*x0*x0*a*b

vs.

x0 = a*b
x1 = c*d*b
expr = x0*x1*b*x1*x0

From the point of view of cse, they are the same, because they both have 7 multiplications in total, down from the original 10. For simplification, your version is clearly better, because you can combine the x0 part into a power (the second version doesn't really do anything for simplify).

Maybe there's a better example where the total number of multiplications isn't the same. But I'm thinking the cse algorithm isn't directly applicable for simplify, because cse has to care about repeated subsequences anywhere in the expression, whereas for simplify you want to combine repeated subsequences that are next to each other (into a power). I don't think shortest or longest matters without taking that into account. But maybe some of the ideas, like the matrix, could be reused. I'm not sure.

This also doesn't take into account at all inverses and simplifying things like a*b*c*(b*c)**-2. I remember discussing this last summer in the context of free groups, so maybe there are already algorithms in that module to handle this?

asmeurer · 2018-03-19T21:15:51Z

Another thing that can be done regarding inverses would be to minimize the number of negative powers (this seems to me to be a reasonable heuristic for "more simple", at least assuming other things like combining don't occur). For instance,

(a**-1 * b**-1)**-1 -> b * a
(a * b**-1)**-1 -> b * a**-1
a**-1 * b**-1 -> (b * a)**-1

valglad · 2018-03-20T11:17:44Z

The x2 in your example isn't correct (it's (x1, x0) or (x0, x1))

Oh yeah... Oops.
I did actually find an example where the shortest sequence gives fewer multiplications ((a, b, c, d, b, c, d, a, b, c d) assuming I did it right this time) and it feels like there could be examples where the longest gives fewer so it probably really doesn't matter for cse.

This also doesn't take into account at all inverses and simplifying things like a*b*c*(b*c)**-2. I remember discussing this last summer in the context of free groups, so maybe there are already algorithms in that module to handle this?

In the group theory module, grouped powers are always expanded automatically, so that a*b*c*(b*c)**-2 becomes a*b*c**-1*b**-2. Come to think of it, I somehow forgot about inverse powers in this PR, so nc_simplify wouldn't detect the simplification to a*(b*c)**-1. I should probably do something about it.

Another thing that can be done regarding inverses would be to minimize the number of negative powers

And this also sounds like something I should do. Thanks for the suggestion 👍

Generally, the group theory module still doesn't have a simplifying algorithm for group elements and this PR is me finally getting around to it since summer. I thought that I might just as well sort out non-commutative expressions first.

So I guess I should do a bit more work here, to include the negative powers and see if using a matrix will make things more efficient. Might do this later this week.

asmeurer · 2018-04-24T20:00:50Z

Any objections to merging this? I think it's a good start for noncommutative simplification.

valglad · 2018-04-25T10:06:24Z

I've rewritten it from scratch and the new version is much better: it takes into account the things we talked about, can choose simplifications a bit more cleverly and is essentially as efficient as this one. I've just been debugging and commenting the code quite slowly but it's essentially ready now. I'll run some more tests and hopefully will commit it in a couple of days - that would be better to merge.

This is a new version of nc_simpliy. Now inverses are handled so that nc_simplify(a*b*(b**-1*a**-1)**2) = (a*b)**-1 and nc_simplify(c**-1*b**-1*a**-1) = (a*b**c)**-1. Negative powers are generally handled better, including this sort of thing: nc_simplify(a*b**3*a*b**3*a*b) = (a*b**3)**3*b**-2. Additionally, simplifications are now chosen on the basis of what will leave the fewest arguments rather than the earliest and longest possible repeated subterm as before. The repetitions themselves are found by computing a matrix of overlaps of prefixes and suffixes of the word instead of repeated scanning the word for each possible repeated subterm length.

valglad · 2018-04-26T17:35:17Z

Just as an example, if you define:

expr = a**3*b*a**4*b*a**4*b*a**2*b*a**2*b*a**2*b*a**2*b*a**2*b*a**2
for i in range(10):
    expr *= a*b

the latest version simplifies it to a**3*(b*a**4)**2*(b*a**2)**6*(a*b)**10 in 0.010651111602783203 seconds while the previous one gave a*(a**2*b*a**2)**2*a*(a*b*a)**6*a*((a*b)**4*a*b)**2 in 0.008729934692382812. On some examples the latest version is faster but it often takes between 0.001 and 0.005 seconds more on expression I tried - overall though, it seems to be of the same order and does actually give better simplifications.

There are some specific cases the function doesn't handle right now. E.g. it can simplify a*b*a*b*a*b**2 to (a*b)**3*b but leaves a*b*a*b**2 unsimplified and though nc_simplify(b**-1*a**-1*(a*b)**2) = (a*b)**-1, nc_simplify(c**-1*b**-1*a**-1*(a*b)**2) = (a*b*c)**-1*(a*b)**2. They just don't naturally fit into the flow of the algorithm but I will think about how to make them work as well later. These are quite specific and nc_simplify can do decently on most expressions so should be good to merge when the tests pass.

Abdullahjavednesar · 2018-04-29T18:50:00Z

@asmeurer, I think this is ready to merge can you please have a look.

asmeurer · 2018-04-30T20:01:25Z

I ran a coverage test (./bin/coverage_report.py sympy/simplify/tests/test_simplify.py) and a few lines here aren't covered.

Also maybe the tests could be structured so that they are self checking, like

def _check(expr, simplified):
    assert nc_simplify(expr) == simplified)
    assert expand(expr) == expand(simplified)


assert check(a*b*a*b*a*b*c*(a*b)**3*c, ((a*b)**3*c)**2)
...

valglad · 2018-05-13T19:42:41Z

I'm not sure I understand about the coverage. I had never run this before but now I tried with and without the test I added, and the numbers in sympy/covhtml/index.html in the row for test_simplify didn't change (it says 97%, whatever that means...). An auxiliary function for testing does make sense though.

asmeurer · 2018-05-13T23:19:11Z

The percentage is just the number of lines that are covered, which isn't helpful since it's looking at the whole file. What you should do is open the covhtml/sympy_simplify_simplify.html file and see which lines in the function aren't covered by any tests.

This commit makes sure the test covers all lines in the new code and writes the test in a slightly shorter way via an auxiliary subfunction.

valglad · 2018-05-14T17:37:56Z

Ah, got it. The new commit should deal with that.

asmeurer · 2018-05-15T20:58:17Z

sympy/simplify/tests/test_simplify.py

+    x = Symbol('x')
+
+    def _check(expr, simplified, deep=True):
+        assert nc_simplify(expr, deep=deep) == simplified


Were you going to add an expand sanity check here?

At first I was, but then expand((c*a)**-1) == (c*a*)**-1 and not expand((c*a)**-1 == a**-1*c**-1 so I'd have to add a keyword argument to ignore the expand check for some of the tests. I didn't in the end because all of the expressions I used are sufficiently short that I know with certainty the simplified version is correct. But I guess I could add an ignore keyword argument, and have the check anyway just in case, if you think it's worth it. Or, alternatively, expand((c*a)**-1 should probably be a**-1*c**-1 - I think I could write the PR for it today already, and then if it's merged first, I wouldn't need an additional keyword argument.

Wait, I don't even need a separate PR, I can do it right here.

There is a test in core/tests/test_expand.py that does expand(A*B*(A*B)**-1) == A*B*(A*B)**-1 for non-commutative A, B after this issue. So I'm somewhat unsure if that was intentional. After the change I'm proposing to do it'll be expand(A*B*(A*B)**-1) == 1.

Looks like the bug was a wrong result, so the fix was likely to just disable noncommutative expanding for negative powers. But doing the correct thing and actually expanding seems better. I agree expand(A*B*(A*B)**-1) should give 1.

This commit makes sure negative powers are expanded properly, e.g. expand((a*b)**-1) == b**-1*a**-1, and adds an expand check to the tests for nc_simplify.

asmeurer · 2018-05-25T18:17:53Z

Can this be merged? Everything looks good on my end but I want to double check that there wasn't anything else you planned to do.

valglad · 2018-05-25T21:46:52Z

Yep, this is the final version (though I might do bit more work on this later, after it's merged).

homocomputeris · 2018-07-17T22:53:48Z

If NC expressions are now simplified, maybe it is easier to get their coefs? #14833

valglad · 2018-07-18T10:18:15Z

@homocomputeris I don't see why simplification would make it harder. Is there some reason to believe this might be the case?

homocomputeris · 2018-07-18T10:28:07Z

@valglad I had a typo: if there are now simplified, I believe, it shouldn't be hard to get correctly their coefs from an expression.

valglad · 2018-07-18T10:40:51Z

@homocomputeris Ah, I see. I don't think that would help. Have a look at my first comment in the issue - this 0 was an intended feature when there are conflicting coefficients, and simplifying the multiplicative terms wouldn't resolve the conflict.

valglad added 2 commits March 16, 2018 20:36

don't do anything to special functions

b5fbe2c

Abdullahjavednesar added the PR: sympy's turn label Apr 27, 2018

increasing test coverage

d4bcf21

This commit makes sure the test covers all lines in the new code and writes the test in a slightly shorter way via an auxiliary subfunction.

asmeurer reviewed May 15, 2018

View reviewed changes

expand negative powers and add expand check to tests

312e7d0

This commit makes sure negative powers are expanded properly, e.g. expand((a*b)**-1) == b**-1*a**-1, and adds an expand check to the tests for nc_simplify.

asmeurer merged commit a262f1c into sympy:master May 26, 2018

valglad deleted the nc_simplify branch July 18, 2018 10:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

simplify: Simplification of non-commutative expressions #14520

simplify: Simplification of non-commutative expressions #14520

valglad commented Mar 16, 2018 •

edited

Loading

asmeurer commented Mar 16, 2018

valglad commented Mar 17, 2018

asmeurer commented Mar 17, 2018

asmeurer commented Mar 17, 2018

asmeurer commented Mar 17, 2018

valglad commented Mar 17, 2018 •

edited

Loading

asmeurer commented Mar 19, 2018 •

edited

Loading

asmeurer commented Mar 19, 2018 •

edited

Loading

valglad commented Mar 20, 2018

asmeurer commented Apr 24, 2018

valglad commented Apr 25, 2018

valglad commented Apr 26, 2018 •

edited

Loading

Abdullahjavednesar commented Apr 29, 2018

asmeurer commented Apr 30, 2018

valglad commented May 13, 2018 •

edited

Loading

asmeurer commented May 13, 2018

valglad commented May 14, 2018

asmeurer May 15, 2018

valglad May 16, 2018

valglad May 16, 2018

valglad May 16, 2018

asmeurer May 16, 2018

asmeurer commented May 25, 2018

valglad commented May 25, 2018

homocomputeris commented Jul 17, 2018 •

edited

Loading

valglad commented Jul 18, 2018

homocomputeris commented Jul 18, 2018

valglad commented Jul 18, 2018 •

edited

Loading

simplify: Simplification of non-commutative expressions #14520

simplify: Simplification of non-commutative expressions #14520

Conversation

valglad commented Mar 16, 2018 • edited Loading

asmeurer commented Mar 16, 2018

valglad commented Mar 17, 2018

asmeurer commented Mar 17, 2018

asmeurer commented Mar 17, 2018

asmeurer commented Mar 17, 2018

valglad commented Mar 17, 2018 • edited Loading

asmeurer commented Mar 19, 2018 • edited Loading

asmeurer commented Mar 19, 2018 • edited Loading

valglad commented Mar 20, 2018

asmeurer commented Apr 24, 2018

valglad commented Apr 25, 2018

valglad commented Apr 26, 2018 • edited Loading

Abdullahjavednesar commented Apr 29, 2018

asmeurer commented Apr 30, 2018

valglad commented May 13, 2018 • edited Loading

asmeurer commented May 13, 2018

valglad commented May 14, 2018

asmeurer May 15, 2018

Choose a reason for hiding this comment

valglad May 16, 2018

Choose a reason for hiding this comment

valglad May 16, 2018

Choose a reason for hiding this comment

valglad May 16, 2018

Choose a reason for hiding this comment

asmeurer May 16, 2018

Choose a reason for hiding this comment

asmeurer commented May 25, 2018

valglad commented May 25, 2018

homocomputeris commented Jul 17, 2018 • edited Loading

valglad commented Jul 18, 2018

homocomputeris commented Jul 18, 2018

valglad commented Jul 18, 2018 • edited Loading

valglad commented Mar 16, 2018 •

edited

Loading

valglad commented Mar 17, 2018 •

edited

Loading

asmeurer commented Mar 19, 2018 •

edited

Loading

asmeurer commented Mar 19, 2018 •

edited

Loading

valglad commented Apr 26, 2018 •

edited

Loading

valglad commented May 13, 2018 •

edited

Loading

homocomputeris commented Jul 17, 2018 •

edited

Loading

valglad commented Jul 18, 2018 •

edited

Loading