feat: add complexity measure on expressions #2389

gwhitney · 2022-01-15T23:31:53Z

As suggested by @josdejong in #1913, it would be useful to have a function measuring the complexity of mathematical expressions (i.e., Node objects). Besides its possible use for the purposes of simplification, it could potentially also be of benefit in ordering subexpressions, or for clients wanting to process multiple expressions in order of complexity.

In particular for simplification, we could imagine marking a rule as applying only if the eventual result produced by the application is of lower complexity than the eventual result produced by continuing without the rule application. There could even be "bidirectional" rules. A conceivable example of such: n1*(n2 + n3) <-> n1*n2 + n1*n3. If either side (or both sides!) of the rule matches, then the matching side is replaced, and simplification proceeds both with the original node and the replacement node(s), with the eventually least complex winning. (I think we must be cautious about combinatorial explosion here in the search space, but it's not possible to experiment until there is a complexity measure.) Some real-ish examples of wanting to go either way in this rule: x*(3x^3 + 5x^2) would generally be written out as 3x^4 + 5x^3, whereas I think x*z + x *y would usually be simplified to x*(y+z). (Note that even the very simplistic complexity measure of "operator count" guides which way to go with this distributive law "correctly" in these two examples.)

It seems as though https://www.mdpi.com/2073-8994/13/2/188 provides something of an overview of possible complexity measures with references to a number of papers giving specific details of certain measures. Comments as to which measure(s) of complexity would be most useful for mathjs would be very welcome.

The text was updated successfully, but these errors were encountered:

josdejong · 2022-01-23T10:19:19Z

Thanks for starting this proposal Glen, well explained!

We can start with a simple complexity rule (like count the number of terms and operations or so), and refine that afterwards where needed.

Like you state, what is most difficult to me is to determine how many steps in a certain direction we should try out (very computationally heavy), since it is possible that an intermediate step may increase the complexity, but a simplification step afterwards will simplify very well. Just thinking aloud here: maybe it is possible that we label specific rules to allow temporary increasing the complexity, but not all rules, to prevent an explosion of possibilities that needs to be tried out.

It would be nice to collect a set of real-world examples to get a better understanding of complexity and temporary increase of complexity.

@ericman314 do you have any thoughts on this matter?

ericman314 · 2022-01-23T14:17:40Z

Interesting, the complexity metric could be used in an A* type algorithm for searching for a simplified expression while still allowing intermediate expressions which are more complex, but would search more preferably toward lower complexity.

Also, regarding simplification itself, operations like factoring are hard (it's what makes modern cryptography work, after all). Berlekamp's algorithm is widely used in computer algebra systems but it looks complicated (to me at least). The pattern matching framework might not be able to factor every expression. Sometimes an alternating mixture of expanding and then factoring would also be needed. And @gwhitney is right to say that sometimes factoring results in a simpler expression, and sometimes expanding does. Sometimes it's subjective, which is why some CAS softwares have separate functions for both.

It could be more performant to have explicit "expand" and "collect" functions which, rather than using the generic pattern matching routine, operate on a simpler set of rules and perform just one kind of simplification. Then, the simplify routine could try these functions when the pattern matching gets stuck.

josdejong · 2022-01-23T16:29:50Z

It could be more performant to have explicit "expand" and "collect" functions

🤔 yes, that makes sense indeed. When doing these things manually your know when doing an expanding step which in the end can be simplified again, so we could add that information to the rules I think.

gwhitney · 2022-01-27T21:14:31Z

I agree polynomial factorization is beyond the scope of the current simplify() and that it's not clear that it belongs there; perhaps factor(), expand(), and collect() should be separate functions. (I don't have the bandwidth at the moment to attack any of these three; if you want (a) placeholder issue(s) for any/all of them, I could file it/them if you like.

On the specific topic of this issue, sounds good: I will start with a PR for a leafComplexity function in simplify.utils (once it's exposed per #2282) that just counts the number of leaf nodes of an expression tree. I do think that generally, but not always, people prefer the expression with fewer leaves as "simpler." That does lead to the question: with simplify() as it currently stands, if the output would be of higher complexity, should it change behavior to just return the original input? At the moment I don't necessarily think so, it seems to me that simplify() is canonicalizing reasonably well.

josdejong · 2022-01-30T09:02:50Z

that just counts the number of leaf nodes of an expression tree

Sounds good, starting with the simplest possible implementation👍💪

gwhitney · 2022-02-04T01:27:14Z

I was thinking about the naming here. "leafComplexity" seems a bit cumbersome for simply the number of leaves of an expression tree. One possibility would be to overload math.size() for Node objects and document that it gives the number of leaves. Or have the function be simplify.utils.leaves() (or whatever is analogous to the final scheme adopted in #2282). Or if leaves() seems like it should return a list of the leaves, it could be simplify.utils.leafCount(). When you get a chance to let me know what name you'd like and where in the api it should go, I will submit the PR for this function. Thanks.

josdejong · 2022-02-06T11:01:35Z

Interesting ideas. Using math.size() would be nice if there is only one logical explanation of what the "size" of a Node is. I think though that there could be various interpretations (the total of all recursive leaves, or the number of arguments in case of A FunctionNode, ...). So I think it would be best to to use math.size() for this but introduce a new function leafCount() for this. We could also think though if we want to create a method Node.prototype.leafCount() but let's keep in mind that we not want to bloat these classes. If it is just a one liner and saves a lot of other overhead it could be ok.

This function returns the number of leaves in the parse tree of an expression. The motivation is to provide an initial complexity measure that could be used to decide whether or not to apply some simplification rules. Docs, embedded docs, and test cases are provided. Resolves josdejong#2389.

gwhitney · 2022-02-07T18:10:49Z

As you can see in the PR, I opted for a regular math.leafCount() function because I thought that would be easier to pass as an option value to simplify() than a class method, presuming we get to the point that the objective function for simplification can be specified.

This function returns the number of leaves in the parse tree of an expression. The motivation is to provide an initial complexity measure that could be used to decide whether or not to apply some simplification rules. Docs, embedded docs, and test cases are provided. Resolves #2389.

josdejong · 2022-02-16T10:14:13Z

Yes, I like it, thanks Glen!

Looking forward to the next step, actually using the function to do smart things in simplify 😁

gwhitney mentioned this issue Jan 15, 2022

feat: add notation for array simplification rules #2390

Open

josdejong added the feature label Jan 23, 2022

gwhitney mentioned this issue Feb 7, 2022

feat: Add leafCount function #2411

Merged

josdejong closed this as completed in #2411 Feb 16, 2022

gwhitney mentioned this issue Feb 16, 2022

Allow tentative/bidirectional simplify rules #2423

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add complexity measure on expressions #2389

feat: add complexity measure on expressions #2389

gwhitney commented Jan 15, 2022

josdejong commented Jan 23, 2022

ericman314 commented Jan 23, 2022

josdejong commented Jan 23, 2022

gwhitney commented Jan 27, 2022

josdejong commented Jan 30, 2022 •

edited

Loading

gwhitney commented Feb 4, 2022

josdejong commented Feb 6, 2022

gwhitney commented Feb 7, 2022

josdejong commented Feb 16, 2022

feat: add complexity measure on expressions #2389

feat: add complexity measure on expressions #2389

Comments

gwhitney commented Jan 15, 2022

josdejong commented Jan 23, 2022

ericman314 commented Jan 23, 2022

josdejong commented Jan 23, 2022

gwhitney commented Jan 27, 2022

josdejong commented Jan 30, 2022 • edited Loading

gwhitney commented Feb 4, 2022

josdejong commented Feb 6, 2022

gwhitney commented Feb 7, 2022

josdejong commented Feb 16, 2022

josdejong commented Jan 30, 2022 •

edited

Loading