Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add complexity measure on expressions #2389

Closed
gwhitney opened this issue Jan 15, 2022 · 9 comments · Fixed by #2411
Closed

feat: add complexity measure on expressions #2389

gwhitney opened this issue Jan 15, 2022 · 9 comments · Fixed by #2411
Labels

Comments

@gwhitney
Copy link
Collaborator

As suggested by @josdejong in #1913, it would be useful to have a function measuring the complexity of mathematical expressions (i.e., Node objects). Besides its possible use for the purposes of simplification, it could potentially also be of benefit in ordering subexpressions, or for clients wanting to process multiple expressions in order of complexity.

In particular for simplification, we could imagine marking a rule as applying only if the eventual result produced by the application is of lower complexity than the eventual result produced by continuing without the rule application. There could even be "bidirectional" rules. A conceivable example of such: n1*(n2 + n3) <-> n1*n2 + n1*n3. If either side (or both sides!) of the rule matches, then the matching side is replaced, and simplification proceeds both with the original node and the replacement node(s), with the eventually least complex winning. (I think we must be cautious about combinatorial explosion here in the search space, but it's not possible to experiment until there is a complexity measure.) Some real-ish examples of wanting to go either way in this rule: x*(3x^3 + 5x^2) would generally be written out as 3x^4 + 5x^3, whereas I think x*z + x *y would usually be simplified to x*(y+z). (Note that even the very simplistic complexity measure of "operator count" guides which way to go with this distributive law "correctly" in these two examples.)

It seems as though https://www.mdpi.com/2073-8994/13/2/188 provides something of an overview of possible complexity measures with references to a number of papers giving specific details of certain measures. Comments as to which measure(s) of complexity would be most useful for mathjs would be very welcome.

@josdejong
Copy link
Owner

Thanks for starting this proposal Glen, well explained!

We can start with a simple complexity rule (like count the number of terms and operations or so), and refine that afterwards where needed.

Like you state, what is most difficult to me is to determine how many steps in a certain direction we should try out (very computationally heavy), since it is possible that an intermediate step may increase the complexity, but a simplification step afterwards will simplify very well. Just thinking aloud here: maybe it is possible that we label specific rules to allow temporary increasing the complexity, but not all rules, to prevent an explosion of possibilities that needs to be tried out.

It would be nice to collect a set of real-world examples to get a better understanding of complexity and temporary increase of complexity.

@ericman314 do you have any thoughts on this matter?

@ericman314
Copy link
Collaborator

Interesting, the complexity metric could be used in an A* type algorithm for searching for a simplified expression while still allowing intermediate expressions which are more complex, but would search more preferably toward lower complexity.

Also, regarding simplification itself, operations like factoring are hard (it's what makes modern cryptography work, after all). Berlekamp's algorithm is widely used in computer algebra systems but it looks complicated (to me at least). The pattern matching framework might not be able to factor every expression. Sometimes an alternating mixture of expanding and then factoring would also be needed. And @gwhitney is right to say that sometimes factoring results in a simpler expression, and sometimes expanding does. Sometimes it's subjective, which is why some CAS softwares have separate functions for both.

It could be more performant to have explicit "expand" and "collect" functions which, rather than using the generic pattern matching routine, operate on a simpler set of rules and perform just one kind of simplification. Then, the simplify routine could try these functions when the pattern matching gets stuck.

@josdejong
Copy link
Owner

It could be more performant to have explicit "expand" and "collect" functions

🤔 yes, that makes sense indeed. When doing these things manually your know when doing an expanding step which in the end can be simplified again, so we could add that information to the rules I think.

@gwhitney
Copy link
Collaborator Author

I agree polynomial factorization is beyond the scope of the current simplify() and that it's not clear that it belongs there; perhaps factor(), expand(), and collect() should be separate functions. (I don't have the bandwidth at the moment to attack any of these three; if you want (a) placeholder issue(s) for any/all of them, I could file it/them if you like.

On the specific topic of this issue, sounds good: I will start with a PR for a leafComplexity function in simplify.utils (once it's exposed per #2282) that just counts the number of leaf nodes of an expression tree. I do think that generally, but not always, people prefer the expression with fewer leaves as "simpler." That does lead to the question: with simplify() as it currently stands, if the output would be of higher complexity, should it change behavior to just return the original input? At the moment I don't necessarily think so, it seems to me that simplify() is canonicalizing reasonably well.

@josdejong
Copy link
Owner

josdejong commented Jan 30, 2022

that just counts the number of leaf nodes of an expression tree

Sounds good, starting with the simplest possible implementation👍💪

@gwhitney
Copy link
Collaborator Author

gwhitney commented Feb 4, 2022

I was thinking about the naming here. "leafComplexity" seems a bit cumbersome for simply the number of leaves of an expression tree. One possibility would be to overload math.size() for Node objects and document that it gives the number of leaves. Or have the function be simplify.utils.leaves() (or whatever is analogous to the final scheme adopted in #2282). Or if leaves() seems like it should return a list of the leaves, it could be simplify.utils.leafCount(). When you get a chance to let me know what name you'd like and where in the api it should go, I will submit the PR for this function. Thanks.

@josdejong
Copy link
Owner

Interesting ideas. Using math.size() would be nice if there is only one logical explanation of what the "size" of a Node is. I think though that there could be various interpretations (the total of all recursive leaves, or the number of arguments in case of A FunctionNode, ...). So I think it would be best to to use math.size() for this but introduce a new function leafCount() for this. We could also think though if we want to create a method Node.prototype.leafCount() but let's keep in mind that we not want to bloat these classes. If it is just a one liner and saves a lot of other overhead it could be ok.

gwhitney added a commit to gwhitney/mathjs that referenced this issue Feb 7, 2022
  This function returns the number of leaves in the parse tree of an
  expression. The motivation is to provide an initial complexity measure
  that could be used to decide whether or not to apply some simplification
  rules. Docs, embedded docs, and test cases are provided.

  Resolves josdejong#2389.
@gwhitney
Copy link
Collaborator Author

gwhitney commented Feb 7, 2022

As you can see in the PR, I opted for a regular math.leafCount() function because I thought that would be easier to pass as an option value to simplify() than a class method, presuming we get to the point that the objective function for simplification can be specified.

josdejong pushed a commit that referenced this issue Feb 16, 2022
This function returns the number of leaves in the parse tree of an
  expression. The motivation is to provide an initial complexity measure
  that could be used to decide whether or not to apply some simplification
  rules. Docs, embedded docs, and test cases are provided.

  Resolves #2389.
@josdejong
Copy link
Owner

Yes, I like it, thanks Glen!

Looking forward to the next step, actually using the function to do smart things in simplify 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants