Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed evaluate=False from the Lark-based LaTeX parser #25515

Merged
merged 8 commits into from Aug 16, 2023

Conversation

wermos
Copy link
Contributor

@wermos wermos commented Aug 15, 2023

References to other Issues or PRs

Addresses #24116 for the Lark-based LaTeX parser.

Brief description of what is fixed or changed

Originally, the ANTLR LaTeX parser returned unevaluated expressions. There were concerns that unevaluated expressions can create all sorts of issues , so we wanted to change this behavior.

I made it so that both evaluate=False and evaluate=True work with the Lark-based LaTeX parser, with evaluate=True being the default.

Other comments

Release Notes

  • parsing
    • The Lark-based LaTeX parser now evaluates the input string and returns the result. For example, "\sqrt{\frac{12}{6}}" will re turn \sqrt(2) now. The previous behavior can achieved by using the evaluate(False) context manager.

Added evaluated expression pair lists for simple expressions, fractional expressions, and power expressions.

Prepended "unevaluated" to all the existing expression lists.

Removed a few unnecessary tests from `MISCELLANEOUS_EXPRESSION_PAIRS` because they test for features we won't support.

Added tests for the evaluated expression pair lists for simple expressions and fractional expressions.
…uare root expressions, factorial expressions, sum expressions, and common function expressions, and binomial expressions.
@sympy-bot
Copy link

sympy-bot commented Aug 15, 2023

Hi, I am the SymPy bot (v170). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • parsing
    • The Lark-based LaTeX parser now evaluates the input string and returns the result. For example, "\sqrt{\frac{12}{6}}" will re turn \sqrt(2) now. The previous behavior can achieved by using the evaluate(False) context manager. (#25515 by @wermos)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.13.

Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->
Addresses #24116 for the Lark-based LaTeX parser.

#### Brief description of what is fixed or changed

Originally, the ANTLR LaTeX parser returned unevaluated expressions. There were concerns that unevaluated expressions can [create all sorts of issues ](https://github.com/sympy/sympy/issues/24116#issue-1398475575), so we wanted to change this behavior.

I made it so that both `evaluate=False` and `evaluate=True` work with the Lark-based LaTeX parser, with `evaluate=True` being the default.

#### Other comments


#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers. Formerly, `log(-x)` incorrectly gave `-log(x)`.

* physics.units
  * Corrected a semantical error in the conversion between volt and statvolt which
    reported the volt as being larger than the statvolt.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
* parsing
    * The Lark-based LaTeX parser now evaluates the input string and returns the result. For example, `"\sqrt{\frac{12}{6}}"` will re turn `\sqrt(2)` now. The previous behavior can achieved by using the `evaluate(False)` context manager.
<!-- END RELEASE NOTES -->

Update

The release notes on the wiki have been updated.

@wermos
Copy link
Contributor Author

wermos commented Aug 15, 2023

cc @sylee957 and @Upabjojr for review

…ols instead of the `sympy` functions like `Add`, `Mul` and `Pow`.
@github-actions
Copy link

github-actions bot commented Aug 15, 2023

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speed up and greater than 1 means a slowdown. Green lines
beginning with + are slowdowns (the PR is slower then master or
master is slower than the previous release). Red lines beginning
with - are speedups.

Significantly changed benchmark results (PR vs master)

Significantly changed benchmark results (master vs previous release)

       before           after         ratio
     [8059df73]       [77010f3b]
     <sympy-1.12^0>                 
-      87.1±0.8ms       56.5±0.7ms     0.65  integrate.TimeIntegrationRisch02.time_doit(10)
-      87.1±0.6ms       55.0±0.4ms     0.63  integrate.TimeIntegrationRisch02.time_doit_risch(10)
-     2.11±0.01ms        655±0.9μs     0.31  polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse')
-     10.3±0.02ms      1.95±0.01ms     0.19  polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse')
-       366±0.8μs       81.6±0.3μs     0.22  polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse')
-     4.82±0.01ms        362±0.9μs     0.08  polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse')
-      10.6±0.4ms         1.09±0ms     0.10  polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse')
-     6.25±0.01ms      3.94±0.01ms     0.63  polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse')
-     27.2±0.02ms      12.0±0.02ms     0.44  polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse')
-     6.97±0.03ms         1.17±0ms     0.17  polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse')
-     16.1±0.02ms      9.16±0.01ms     0.57  polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse')
-      210±0.08ms       70.5±0.1ms     0.34  polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse')
-     6.37±0.01ms          534±2μs     0.08  polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse')
-     27.7±0.04ms          856±1μs     0.03  polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse')
-         611±2μs        197±0.9μs     0.32  polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse')
-     6.46±0.01ms        202±0.7μs     0.03  polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse')
-     17.0±0.02ms        206±0.8μs     0.01  polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse')
-       161±0.3μs       88.5±0.2μs     0.55  solve.TimeMatrixOperations.time_rref(3, 0)
-       309±0.4μs        107±0.4μs     0.35  solve.TimeMatrixOperations.time_rref(4, 0)
-      31.7±0.1ms      13.7±0.05ms     0.43  solve.TimeSolveLinSys189x49.time_solve_lin_sys

Full benchmark results can be found as artifacts in GitHub Actions
(click on checks at the top of the PR).

@wermos
Copy link
Contributor Author

wermos commented Aug 15, 2023

@sylee957 Some tests for the evaluate=False portion are failing. Could it be due to the changes I made like converting sympy.Add to + and sympy.Mul to * and so on?

…cal symbols instead of the `sympy` functions like `Add`, `Mul` and `Pow`."

This reverts commit b703a70.
@wermos
Copy link
Contributor Author

wermos commented Aug 15, 2023

@sylee957 I've reverted the change that you suggested because tests were failing due to it. I'm not convinced that we should remove support for evaluate=False yet, so it's important that we support both for now.

@sylee957
Copy link
Member

sylee957 commented Aug 15, 2023

The reason that we don’t have to support that is because evaluate(false) context manager can support that behavior orthogonally.

I don’t think that we should really add any options for sake of the future. And we are also very unsure supporting evaluate=False in the future as well, because that doesn’t work with lots of other sympy features, so I don’t want to make the surface area more larger for maintanance.

As you see, you always have to drill evaluate=False everywhere to support that. This is a mess, and although drilling down the property seems easy, it is one of the red flag that you are not factoring out the features well.

@wermos
Copy link
Contributor Author

wermos commented Aug 15, 2023

The reason that we don’t have to support that is because evaluate(false) context manager can support that behavior orthogonally.

When I removed the sympy.Add constructor call and added + to the transformer for the add node, the tests were failing. Things like "1 + 1" were being parsed as "2" even with evaluate=False. So I don't think that the evaluate(False) context manager is working as intended in those cases. I'm not sure if this is a bug in my code or something to do with the context manager.

And we are also very unsure supporting evaluate=False in the future as well, because that doesn’t work with lots of other sympy features, so I don’t want to make the surface area more larger for maintanance.

That's a good point. While I personally have no issue with making the change, I'd feel better if we got Aaron and Francesco's thoughts on this. Perhaps we can discuss the matter on the Matrix chat. I'll bring it up there, and also cc them here.

cc @asmeurer @Upabjojr for thoughts: Do we want to support evaluate=False at all?

@sylee957
Copy link
Member

sylee957 commented Aug 15, 2023

I also recognize that some of the code path is not consistently returning SymPy objects


return int(tokens[0])

>>> from sympy import *
>>> from sympy.parsing.latex.lark import parse_latex_lark
>>> parse_latex_lark('1').__class__
<class 'int'>

My opinion is that these should be the analysis why x + y and Add(x, y) behave differently, and I also suggest to change that to return sympy Integer consistently, such that SymPy parser always returns sympy object anyway, and users don't get some surprise.

@oscarbenjamin
Copy link
Contributor

The reason that we don’t have to support that is because evaluate(false) context manager can support that behavior orthogonally.

I think that evaluate(false) is something that should be discouraged except when used around a block that does nothing more than create a simple expression so something like

with evaluate(False):
      expr = parse_latex(text)

is probably a bad idea. The evaluate(False) might break all kinds of things if any part of the parse_latex implementation tries to create expressions as part of computing anything.

Writing any nontrivial code for the internals of SymPy that could reasonably be expected to work under the impact of evaluate(False) is not easy.

@sylee957
Copy link
Member

sylee957 commented Aug 16, 2023

The evaluate(False) might break all kinds of things if any part of the parse_latex implementation tries to create expressions as part of computing anything.

Passing the keyword evaluate=False everywhere has same effect, and breaks the form of the expression if someone really changes the transformer code. I'd try to avoid the worst to make the code more complicated, and in fact, it was more worse inconsistent before, for example, I recognize that things like Eq didn't get evaluate=evaluate consistently propagated.

I really don't get what's the problem of shoving the option evaluate=False, unless someone pays million dollar to implement evaluate=False option here, and even if it so, I'd try to keep the code simple such that we can find a better way of reusability

In fact, I don't really had a need to use evaluate=False when parsing SymPy expressions,
and I really don't want to make parser more complicated with options, than a thin wrapper of lark with grammar and transformer.

@oscarbenjamin
Copy link
Contributor

Passing the keyword evaluate=False everywhere has same effect

It is not the same effect because using evaluate(False) applies globally to all expressions that are created e.g.:

In [5]: cat t.py
def func(a, b, evaluate=None):
    stuff1 = a + b
    stuff2 = Add(a, b, evaluate=evaluate)
    return stuff1, stuff2

In [8]: func(x, x)
Out[8]: (2x, 2x)

In [9]: func(x, x, evaluate=False)
Out[9]: (2x, x + x)

In [10]: with evaluate(False):
    ...:     y = func(x, x)
    ...: 

In [11]: y
Out[11]: (x + x, x + x)

I agree that there are problems with unevaluated expressions but I think that evaluate(False) is much worse than passing evaluate=False explicitly because it can break any code that does not expect it from outside. I think that the use of evaluate(False) should definitely not be encouraged. If we want to support non-evaluation then passing evaluate=False should be the way to do it.

@sylee957
Copy link
Member

I ask that high level functions of SymPy shouldn't that SymPy needs evaluate=False option at all,
or it is better to make the SymPy classes don't evaluate by default, or have same interface of constructing the objects,
than implementing options outside that supports the behavior.

It already looks like having that as user option, is very unsatisfactory because the behavior it is complicated if evaluate=False is used shallowly or deeply.
and I'd rather introduce users with DIY approach, than using options of SymPy functions that skeptically implements that.

@sylee957
Copy link
Member

sylee957 commented Aug 16, 2023

And anyway I think that your example is artificial though
Semantically, if I see code like that, I rather criticize the code about what the *** are you using + than using Add(a, b) if you support evaluate= option.
(And in fact, using context manager had corrected some other inconsistency that you can't pass evaluate=False to +)

And in fact, this shows the reason that having options about evaluate in SymPy high level functions, is very prone to mistakes, and we shouldn't do it.

def func(a, b, evaluate=None):
    stuff1 = a + b
    stuff2 = Add(a, b, evaluate=evaluate)
    return stuff1, stuff2

@sylee957
Copy link
Member

sylee957 commented Aug 16, 2023

The major problem about this is the roadmap problem, than its implementation detail.
And we really don't need to support legacy options like evaluate=False at all because we are building something new.

The real problem about roadmap is that we are not sure about we are not sure about how SymPy is going to handle eager evaluation,
And we have no real analysis about if that option is wanted from users, or that is worth the cost, or if users are even more unhappy about that sort of controls and want more.
(For example, we had a request before about the evaluation could be configured more selectively, and that will be nightmare for you because the option you think, isn't general enough to be happy)

You should drop The previous behavior can achieved by using the evaluate(False) context manager. from the release notes if that also isn't worth to be announced.

@oscarbenjamin
Copy link
Contributor

I think that we should take this discussion to another issue that is specifically for discussing what to do about evaluate=False etc.

I do want to respond to this though:

And we have no real analysis about if that option is wanted from users

It is very clear that many users want the option to prevent automatic evaluation somehow and that they would like SymPy to be able to make that work without everything just breaking. Especially there are many users who want this in the context of parsing. Those users really want to be able to parse an expression and get something that looks like an exact representation of what was parsed and that can then be printed back out in such a way that still looks exactly the same so e.g. (modulo whitespace, and braces etc) they want the following invariant:

>>> latex(parse_latex(string)) == string
True

@wermos
Copy link
Contributor Author

wermos commented Aug 16, 2023

I think that we should take this discussion to another issue that is specifically for discussing what to do about evaluate=False etc.

Just thought I'd chime in: The relevant issue for that is #24116.

@sylee957
Copy link
Member

sylee957 commented Aug 16, 2023

My opinion is that ‘we start minimal’
Adding things that seems trivial can be done anytime, but removing them because that doesn’t look good is more difficult.
I don’t want to make the discussion more longer because of attempts to shove the option here.

However, as far as I have done during previous job, which actually had involved parsing SymPy expressions unevaluated.

I’ve just done by referencing the Transformer and overriding the rules individually,
Users know how to code, so we don’t have to supply everything. In fact, just a thin wrapper of transformer and grammar is enough for it, and just using evaluate=False was somewhat limited for me.

@wermos
Copy link
Contributor Author

wermos commented Aug 16, 2023

they want the following invariant:

>>> latex(parse_latex(string)) == string

While I got the sentiment, I'd like to point out that, as it's stated, this is not a meaningful metric:

>>> string = r"\int\limits_0^1 \sin x dx"
>>> sympy_repr = parse_latex(string)
>>> print(sympy_repr)
Integral(sin(x), (x, 0, 1))
>>> latex_string = latex(sympy_repr)
>>> print(latex_string)
\int\limits_{0}^{1} \sin{\left(x \right)}\, dx
>>> latex_string == string
False

@sylee957
Copy link
Member

I may request to fix the inconsistency issue of integers here

#25515 (comment)

Try to change int to Integer. We should always return SymPy objects anyway.
Python numbers behave differently in division.

@wermos
Copy link
Contributor Author

wermos commented Aug 16, 2023

I may request to fix the inconsistency issue of integers here

Alright, I'll open a separate PR for that. I was originally planning to do that as a part of this PR, but it seems that we have a few blocking discussions that we need to resolve before this PR can find its way into master.

@sylee957
Copy link
Member

I don’t think that this PR is blocking though. We agree to drop the evaluate=False as default, and we don’t have to add options here.

@wermos
Copy link
Contributor Author

wermos commented Aug 16, 2023

I don’t think that this PR is blocking though. We agree to drop the evaluate=False as default, and we don’t have to add options here.

And what about sympy.Add vs +, etc.?

@sylee957
Copy link
Member

I think that you need to fix the issues about integer returns before trying that.
i think that there was a gain from testing out the arithmetic operator though, it revealed such hidden issue

@sylee957 sylee957 merged commit 229da78 into sympy:master Aug 16, 2023
57 checks passed
@asmeurer
Copy link
Member

Should the issue be closed?

@asmeurer
Copy link
Member

Passing evaluate=False through everywhere does suck, because it's not the default so it has to manually be disabled everywhere. It would be much better if the classes just didn't evaluate, so the parser could just use Add(...), Mul(...) and so on, then have a single call to doit() at the end to evaluate everything. This is how things work in the matrix expressions, but not anywhere else. Changing this design in the core is not easy, though. The best we have right now is the evaluate(False) context manager which basically forces the core to work like this. Of course, we could have also fixed this by keeping the evaluate=False everywhere, but just evaluating by default at the end, and having a flag to disable it. But this fix ends up being cleaner and you can still achieve the same result with evaluate(False).

At any rate, I don't see an issue with with evaluate(False): parse_latex(...) as a supported pattern. evaluate(False) breaks things that try to do symbolic computation, but the latex parser shouldn't be doing that, and if it does it's probably a design mistake. It's probably worth at least having some tests for this. In particular, we do need to make sure that we are constructing expressions that would exactly correspond to the input when evaluate(False) is used, and that we aren't doing any symbolic math inside of the parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants