Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expression builder pattern for evaluation control #25620

Open
sylee957 opened this issue Sep 2, 2023 · 11 comments
Open

Expression builder pattern for evaluation control #25620

sylee957 opened this issue Sep 2, 2023 · 11 comments
Labels
evaluate unevaluated Issues related to unevaluated expressions

Comments

@sylee957
Copy link
Member

sylee957 commented Sep 2, 2023

Problems of evaluate=False

Passing evaluate=evaluate causes mistake

I have some examples taken from the ongoing development of LaTeX parser:

    def relation(self, tokens):
        relation_type = tokens[1].type
        if relation_type == "EQUAL":
            return sympy.Eq(tokens[0], tokens[2])
        ...

    def add(self, tokens):
        return sympy.Add(tokens[0], tokens[2], evaluate=False)

where all other methods have evaluate implemented,
but forgot to implement at Eq

And also, the other mistake not easy to notice is that:

        elif len(tokens) == 3:
            return sympy.Add(tokens[0], -tokens[2], evaluate=False)

    def mul(self, tokens):
        if len(tokens) == 2:
            return sympy.Mul(tokens[0], tokens[1], evaluate=False)
        elif len(tokens) == 3:
            return sympy.Mul(tokens[0], tokens[2], evaluate=False)

where unary negation -tokens[2] is used inside sympy.Add(tokens[0], -tokens[2], evaluate=False),
which causes that specific part to ignore the option and evaluate regardless.

So the major problem is

  • evaluate keyword argument is optional, so someone can forget to implement that. It is logical error and cannot be detected by any means of lint, type checking, or runtime error.
  • evaluate keyword cannot be used with arithmetic operators, +, -, *, so we can't use arithmetic operators of SymPy in this context. If someone is using that, it is always an logical error.

Passing evaluate=evaluate is verbose

For example

def dot_product(v1: list, v2: list):
    return Add(*(Mul(x, y) for x, y in zip(v1, v2)))

becomes

def dot_product(v1: list, v2: list, evaluate=True):
    return Add(*(Mul(x, y, evaluate=evaluate) for x, y in zip(v1, v2)), evaluate=evaluate)

And this has 4 different sympy functions, so

def pythagorian(x):
    return Add(Pow(sin(x), 2), Pow(cos(x), 2))

becomes much more worse like:

def pythagorian(x, evaluate=False):
    return Add(Pow(sin(x, evaluate=evaluate), 2, evaluate=evaluate), Pow(cos(x, evaluate=evaluate), 2, evaluate=evaluate))

because there are 4 different sympy functions used,
And there isn't any way to eliminate the verbosity, in any means, either by assigning to temporarily variables

sinxpow2 = Pow(sin(x, evaluate=evaluate), 2, evaluate=evaluate)
cosxpow2 = Pow(cos(x, evaluate=evaluate), 2, evaluate=evaluate)
return Add(...)

or either by assigning to temporary lambda

mysin = lambda x: sin(x, evaluate=False)
mycos = ...
...
return myadd(mypow(...), mypow(...))

Suggestion: Expression builder

Introduction

  1. Try to write the following base classes sympy/sandbox/abstract.py
from typing import Protocol
from sympy import Expr


class AritBuilder(Protocol):
    def add(self, *args) -> Expr:
        ...

    def mul(self, *args) -> Expr:
        ...

    def pow(self, base, expo, /) -> Expr:
        ...


class FunctionBuilder(AritBuilder, Protocol):
    def sin(self, arg, /) -> Expr:
        ...

    def cos(self, arg, /) -> Expr:
        ...

    def sqrt(self, arg, /) -> Expr:
        ...

    def log(self, base, expo, /) -> Expr:
        ...

Although they use the type annotations, it is not scary and can always be optional
You can switch to Any than Expr if SymPy has some problems of inferring some types.

  1. Try to write the modules

sympy/sandbox/evaluatefalse.py

import sympy

def add(*args):
    return sympy.Add(*args, evaluate=False)

def mul(*args):
    return sympy.Mul(*args, evaluate=False)

def pow(base, expo):
    return sympy.Pow(base, expo, evaluate=False)

def sin(arg):
    return sympy.sin(arg, evaluate=False)

def cos(arg):
    return sympy.cos(arg, evaluate=False)

def sqrt(arg):
    return sympy.sqrt(arg, evaluate=False)

def log(base, expo):
    return sympy.log(base, expo, evaluate=False)

sympy/sandbox/evaluatetrue.py

import sympy

def add(*args):
    return sympy.Add(*args)

def mul(*args):
    return sympy.Mul(*args)

def pow(base, expo):
    return sympy.Pow(base, expo)

def sin(arg):
    return sympy.sin(arg)
# sin = sympy.sin
# from sympy import sin

def cos(arg):
    return sympy.cos(arg)

def sqrt(arg):
    return sympy.sqrt(arg)

def log(base, expo):
    return sympy.log(base, expo)

What we done so far, is that we split the sympy functions into evaluate=False and evaluate=True.

  1. Write your functions that supports the optional evaluation

For example, dot_product can be implemented as:

from sympy.sandbox.abstract import AritBuilder, FunctionBuilder
from sympy.sandbox import evaluatefalse, evaluatetrue
import sympy


def dot_product(B: AritBuilder, v1: list, v2: list):
    return B.add(*(B.mul(x, y) for x, y in zip(v1, v2)))


dot_product(evaluatefalse, [1, 2, 3], [4, 5, 6])
# 1*4 + 2*5 + 3*6
dot_product(evaluatetrue, [1, 2, 3], [4, 5, 6])
# 32

The thing you notice is that module itself, goes to the function signature
and it allows that you can just switch the modules like evaluatefalse to give different effects of evaluation.

Similarly, the pythagorian identity function can also be implemented like this

def pythagorian(B: FunctionBuilder, x):
    return B.add(B.pow(B.sin(x), 2), B.pow(B.cos(x), 2))

pythagorian(evaluatefalse, sympy.S.Pi / 3)
# cos(pi/3)**2 + sin(pi/3)**2
pythagorian(evaluatetrue, sympy.S.Pi / 3)
# 1

The only thing you notice is that everything is changed to B.add, B.mul, ...
which may slightly be verbose, however, it should be familar for some people who may already used sympy.sin, sympy.cos.

However, the point I have is that, if we use evaluate=False, it should always be applied simultaneously, all true or all false.
So we can always substitute that keyword argument, with some form of object oriented programming, to encapsulate that keyword argument.

Customization

We can also give the very general solution for user customizability,

For example, if some user want to evaluate only on sin, cos, pow but not on other functions,
users can customize the behavior very generally, and SymPy have little cost to support that generality.

sympy/sandbox/evaluatesome.py

from sympy.sandbox.evaluatefalse import add, mul, sqrt, log
from sympy.sandbox.evaluatetrue import sin, cos, pow

__all__ = ['add', 'mul', 'pow', 'sin', 'cos', 'sqrt', 'log']
from sympy.sandbox import evaluatesome

pythagorian(evaluatesome, sympy.S.Pi / 3)
# 1/4 + 3/4

Users may be able to do much more general thing, about changing the algebra of SymPy itself, than a simple control of evaluation, like:

sympy/sandbox/evaluatetropical.py

import sympy

def add(*args):
    return sympy.Min(*args)

def mul(*args):
    return sympy.Add(*args)

def pow(base, n, /):
    return mul(*([base] * n))
from sympy.sandbox import evaluatetropical

dot_product(evaluatetropical, [1, 2, 3], [4, 5, 6])
# 5

Advantages

Safety:

  • The pattern keeps everything about evaluate= cohesively in one place, like in a module or in a class, which makes it less prone to mistakes.
  • This pattern is strongly type checkable, and can be more powerful if SymPy is static typed. If you can play around with pyright or mypy with the example, you may notice that it quickly raises warnings about type incompatibility, if evaluate... does not have some required function, or functions does not match the number of arguments, return types, ...

Generality:

  • We achieve very general customization, and users can get detailed hint about how to customize that, if we use type system.

Performance:

  • Except for having some additional function calls, there should be little or no performance cost for using this.

Verbosity:

  • Although it seems like we need to write every 2 variants of SymPy functions, which evaluates and which does not,
    however, if we properly support evaluate=False in SymPy, like in example return Add(Pow(sin(x, evaluate=evaluate), 2, evaluate=evaluate), Pow(cos(x, evaluate=evaluate), 2, evaluate=evaluate))
    there is no way to eliminate the verbosity in any means,
    So keeping the variants in one place, like modules, is optimal in that respect.

Backward Compatibility:

  • It does not need to change any of the existing SymPy classes, and subsumes the existing evaluate=False option.

Notes

  • The usage of module systems, Protocol may look esoteric for some readers.
    I can easily implement my suggestion with traditional inheritance, ABC, ... however, I didn't like to do that
    because I think that protocols and module systems are minimal with respect to other object-oriented solutions, and keeps many interesting properties (being singleton, need no construction, …) that starts to break when you use traditional object oriented programming.

  • Although we can implement in more dynamic way, like inspecting sympy module by __dir__ or __dict__, this can have some drawback (like losing the static type systm) and we can encounter other issues that happens dynamically, like false positives, naming conflicts, cyclic imports.
    So I'd just suggest to define the builder functions for every list of sympy functions. Even if it looks a bit stupid, it should be very foolhardy.

  • We can try to substitute bad function namings of SymPy, like class sin, class Heaviside, and make them consistent with snake_case if someone implements this.

This is my technical suggestion to replace evaluate(False).
I may receive any questions, if this fits the implementation like making latex_parser more cleaner,
or could apply in dirty corners of evaluate in SymPy.

@sylee957 sylee957 added evaluate unevaluated Issues related to unevaluated expressions labels Sep 2, 2023
@oscarbenjamin
Copy link
Collaborator

This is my technical suggestion to replace evaluate(False).
I may receive any questions, if this fits the implementation like making latex_parser more cleaner,

I think something like this is exactly what is needed for end users to specify how to create unevaluated expressions. We should add something like this now and make use of it for things like parse_latex if that helps to support parsing without evaluation. It can also just be used internally so the user still does

# uses ExprBuilder internally:
expr = parse_latex(text, evaluate=True/False)

Ultimately what is needed though is that an ExprBuilder could return an expression type with none of its methods using automatic evaluation e.g. this creates an expression without evaluation

def add(*args):
    return sympy.Add(*args, evaluate=False)

but all of the methods like subs etc would still evaluate. Longer term I would want to have an analogue of ExprBuilder that returns expressions that only ever evaluate explicitly (e.g. with .doit or other more fine grained methods). This can just be done by making a separate ExprBuilder for true unevaluated expressions though so there is no problem in adding an ExprBuilder like what is suggested above and then also adding more different builders later. The important point for now is just how the interface looks from an end user perspective.

@Upabjojr
Copy link
Contributor

Upabjojr commented Sep 4, 2023

Usage of nested lists is way easier to use and maintain, i.e. Mul(x, y) ==> [Mul, x, y].

Have a look at the Mathematica parser.

@sylee957
Copy link
Member Author

sylee957 commented Sep 4, 2023

I think that the S-expressions are more useful for serializing the symbolic expressions.
For example, MathJSON uses the idea of S-expressions.

For parsers, or evaluation controls, I'm afraid that S-expression can just be making another intermediate language that needs to be evaluated later, and this may not may not solve the problem.
I think that it's just the dictionary that defines the evaluation for mathematica parser

_node_conversions = {
"Times": Mul,
"Plus": Add,
"Power": Pow,
"Log": lambda *a: log(*reversed(a)),
"Log2": lambda x: log(x, 2),
"Log10": lambda x: log(x, 10),
"Exp": exp,
"Sqrt": sqrt,
"Sin": sin,
"Cos": cos,
"Tan": tan,
"Cot": cot,
"Sec": sec,
"Csc": csc,
"ArcSin": asin,
"ArcCos": acos,
"ArcTan": lambda *a: atan2(*reversed(a)) if len(a) == 2 else atan(*a),
"ArcCot": acot,
"ArcSec": asec,
"ArcCsc": acsc,
"Sinh": sinh,
"Cosh": cosh,
"Tanh": tanh,
"Coth": coth,
"Sech": sech,
"Csch": csch,
"ArcSinh": asinh,
"ArcCosh": acosh,
"ArcTanh": atanh,
"ArcCoth": acoth,
"ArcSech": asech,
"ArcCsch": acsch,
"Expand": expand,
"Im": im,
"Re": sympy.re,
"Flatten": flatten,
"Polylog": polylog,
"Cancel": cancel,
# Gamma=gamma,
"TrigExpand": expand_trig,
"Sign": sign,
"Simplify": simplify,
"Defer": UnevaluatedExpr,
"Identity": S,
# Sum=Sum_doit,
# Module=With,
# Block=With,
"Null": lambda *a: S.Zero,
"Mod": Mod,
"Max": Max,
"Min": Min,
"Pochhammer": rf,
"ExpIntegralEi": Ei,
"SinIntegral": Si,
"CosIntegral": Ci,
"AiryAi": airyai,
"AiryAiPrime": airyaiprime,
"AiryBi": airybi,
"AiryBiPrime": airybiprime,
"LogIntegral": li,
"PrimePi": primepi,
"Prime": prime,
"PrimeQ": isprime,
"List": Tuple,
"Greater": StrictGreaterThan,
"GreaterEqual": GreaterThan,
"Less": StrictLessThan,
"LessEqual": LessThan,
"Equal": Equality,
"Or": Or,
"And": And,
"Function": _parse_Function,
}

and the difference between this or lark transformer, or mathematica parser, is very subtle like dictioary vs function vs object vs ...

@asmeurer
Copy link
Member

asmeurer commented Sep 5, 2023

People are going to want to be able to create expressions with operators. No one wants to type out Add(Pow(sin(x), 2), Pow(cos(x), 2)) instead of sin(x)**2 + cos(x)**2.

@oscarbenjamin
Copy link
Collaborator

People are going to want to be able to create expressions with operators.

For that true unevaluated expressions are needed i.e. expressions whose methods do not evaluate (so it should not be Basic).

No one wants to type out Add(Pow(sin(x), 2), Pow(cos(x), 2)) instead of sin(x)**2 + cos(x)**2

Actually in many contexts like e.g. the LaTeX parser you do generally want to handle all of these programmatically and so this is fine. Many users would not like it but anyone doing programmatic things would probably be fine with it and it would still be useful within SymPy itself.

@Upabjojr
Copy link
Contributor

Upabjojr commented Sep 6, 2023

For that true unevaluated expressions are needed i.e. expressions whose methods do not evaluate (so it should not be Basic).

There have long been suggestions to make SymPy classes unevaluated by default. Unfortunately this requires a lot of core rewriting and is unlikely to happen.

Matthew Rocklin made some attempts at creating a parallel system of expressions costructors and destructors years ago, have a look at the unify module.

I believe the easiest path to unevaluated expressions is to just use lists.

@oscarbenjamin
Copy link
Collaborator

There have long been suggestions to make SymPy classes unevaluated by default. Unfortunately this requires a lot of core rewriting and is unlikely to happen.

I don't think that we can just make the classes unevaluated by default because they are widely used and it would break compatibility. What is needed is a class that represents inert non-evaluating expressions that is separate from the existing classes. Evaluated expressions (which many users do want even if they are not always desirable) can be implemented over the top of unevaluating expressions.

@sylee957
Copy link
Member Author

sylee957 commented Sep 6, 2023

No one wants to type out Add(Pow(sin(x), 2), Pow(cos(x), 2)) instead of sin(x)**2 + cos(x)**2.

There is also another gotcha or limitation of using arithmetic under evaluate(False).
For example, Python always puts the addition in left-associated forms:

with evaluate(False):
    expr = a + b + c + d
sstr(expr, order='none')
# '((a + b) + c) + d'

And it is not possible to generate Add(a, b, c, d) with any means of using parenthesis

@asmeurer
Copy link
Member

asmeurer commented Sep 6, 2023

That's another good point. I think what we would really want for evaluate(False) is a macro system, that rewrites the code under the block. That isn't actually possible in Python, though, so we just fake it. It would probably be possible to do this with something like https://github.com/lihaoyi/macropy, but that's an extension to the language, so I don't think we really want to go down that path.

Actually, even with a completely unevaluated core, it's impossible to distinguish x + y + z and (x + y) + z. They're syntactically equivalent. So the core would have to choose between always left associate or always flatten for expressions created with operators.

@oscarbenjamin
Copy link
Collaborator

Actually, even with a completely unevaluated core, it's impossible to distinguish x + y + z and (x + y) + z. They're syntactically equivalent. So the core would have to choose between always left associate or always flatten for expressions created with operators.

With unevaluated expressions the Python expression x + y + z should just become Add(Add(x, y), z) because that literally represents the way that the Python expression evaluates. There are many situations where flattening is not desirable. For one thing if you want the expression to represent an actual computation then in most contexts addition is necessarily computed as a binary operation. In e.g. floating point addition is not associative so there is a need to be able to distinguish (x + y) + z and x + (y + z). A completely unevaluated core should not flatten automatically.

There can be an explicit flatten function for flattening and other explicit methods that do obvious things like 0 + x -> x etc. You would have a bunch of different elementary functions like that but then also convenience functions that combine the most commonly desired combinations including one function that does the equivalent of the evaluation that the automatically evaluating expressions would do.

@Upabjojr
Copy link
Contributor

Upabjojr commented Sep 8, 2023

With unevaluated expressions the Python expression x + y + z should just become Add(Add(x, y), z) because that literally represents the way that the Python expression evaluates.

Well, Add(Add(x, y), z) and Add(x, y, z) are equivalent mathematically. Though I would recommend keeping the possibility of distinct expression trees for the two expressions, because we may wish to apply structural tree-algorithms in some cases, so distinguishing a node with three arguments from a nested node may still be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
evaluate unevaluated Issues related to unevaluated expressions
Projects
None yet
Development

No branches or pull requests

4 participants