Skip to content
This repository has been archived by the owner on Sep 26, 2021. It is now read-only.

sync elementary assumptions on expressions with Sage #81

Closed
rwst opened this issue Jul 20, 2015 · 8 comments
Closed

sync elementary assumptions on expressions with Sage #81

rwst opened this issue Jul 20, 2015 · 8 comments
Labels
Milestone

Comments

@rwst
Copy link
Member

rwst commented Jul 20, 2015

Ignoring Sage's assume statements is a bug.

sage: assume(x>0)
sage: x.is_positive()
False

sage: u = function('u',x); y = x*u; DE = x*diff(y,x) == y + sqrt(x**2 + y**2)
sage: desolve(DE, u)
x == _C*e^(x*arcsinh(x^2*u(x)/sqrt(x^4))/sqrt(x^2))
sage: assume(x>0)
sage: desolve(DE, u)
x == _C*e^(arcsinh(x^2*u(x)/sqrt(x^4)))
(which is a bug and should give...)
x == _C*e^arcsinh(u(x))

The Sage side is http://trac.sagemath.org/ticket/19035
As to assumptions without Maxima, try to leave most of the handling in Sage, just copy to Pynac and use. Although this concerns Sage only(?), let's have a look at what SymPy does beforehand.
sympy/sympy#2508
https://github.com/sympy/sympy/wiki/Assumptions

In principle the full spectrum of hypothesis-testing methods ranges from simple flag/binary result (like in GiNaC) to a full blown automatic proof system with Yes/No/undecidable/unknown, trading speed for correctness/knowledge. Also, there seems no way to "just have a fast system that does not know much, but knows when it knows nothing", i.e. a fast system without false negatives.

SymPy's new assumption code uses SAT algorithms and a knowledge base with a facade decoupled from the objects in question; it still hard-coded meta facts like Pynac; it gives back True/False/None; their Python implementation reportedly is already too slow even before they will include relation handling. Also (from shiprabanga's comment from Apr 5 2014) the newest implementation does not distinguish between known undecidable and unknown, both are None.

The additional constraints to any effort in Pynac are:

  • compatibility with Sage/Maxima assumptions (symbolic/assumptions.py); both systems have to coexist until Maxima is completely removed, i.e. never, because it will certainly be kept as alternative
  • any restriction from Python functions? probably not

On the up side, with respect to SymPy the API is fixed (symbolic/assumptions.py), there are no aspirations to present reasoning capablities to the user, apart from the Expression.is_... methods, bool(relation) and the assumption consistency check. This way the (slow) buildup of constraints and the consistency check on addition of assumptions is confined to Sage's assume() command while Pynac's finite questioning API should allow for efficient caching of results. Finally the user won't expect bool(rel) to be super fast (it's not atm). This means the consistency check bears the burden of producing useful facts for efficient later questioning. I'm completely speculating here. However, it seems safe to assume we can transfer assumptions from Sage in a useful way as a first step.

@rwst rwst added this to the 0.5 milestone Jul 20, 2015
@rwst rwst changed the title implement assumptions on expressions sync elementary assumptions on expressions with Sage Jul 28, 2015
@rwst rwst added bug and removed enhancement labels Jul 28, 2015
@asmeurer
Copy link

The SymPy assumptions system is still in a state of flux. You should take good ideas from it, but don't assume that it is, in its current state, well-designed or even what we would like to have. I'm definitely convinced in the approach I take in sympy/sympy#2508 in terms of its deductive power, but the performance is such that I still think some of the implementation details could be improved.

I'm fairly convinced that a layered approach is the right way to go. Full featured algorithms should be balanced against faster but less capable algorithms. This is something we are still trying to fully figure out how to do right in SymPy. I suspect that it's probably good to expose this to the user somehow.

In this thread, I talk about distinguishing between mathematical and structural algorithms. You can get away with x == 0 => False by saying that == is defined to mean structural equality. This also lets you say that == always gives True or False. There is no "undecided". This is unlike something like equals() which would do mathematical equality checking, which is undecidable in general.

Probably having separate assumptions "entry points" is smart as well. Sometimes, when you write, say ask(expr > 0), you just want to know if expr is trivially positive (like expr = 1.0), and you actually don't care if it returns "unknown" or "undecided" when it could really be true. In other cases, you want it to try as hard as possible. An answer of "unknown" is undesirable. We are trying to remove usage of assumptions from SymPy's core, so that automatic evaluation of expressions at object creation time doesn't call the assumptions system, which can in general do arbitrary deductions. For instance, we might want sqrt(x**2) to detect when x is positive and return x in that case. But if x is a large expression that takes a lot of deductions to deduce is positive, we don't want to do that when we create the object; we only want to do it if we explicitly call simplify() on it (or in SymPy, refine()). On the other hand, sqrt(pi**2) should probably reduce to pi, because it doesn't take any deduction to know that pi is positive; we just hard-code that fact in the object. There is a line to be drawn somewhere, and I'm not even sure if simple symbolic numbers like pi is the right place in this case, but you get the idea.

Regarding the notion of making a distinction between "known undecidable" and "unknown", for logical purposes, the two are identical. I can't imagine any algorithm that

requires knowledge of which is the case for a None output to be tractable for any practical purposes.

It is possible, I believe, to guess an answer about this, although some additional algorithms are needed which "know" more about the assumptions themselves. What you need is something that will find a model for the given query and its negation. I don't know of any smart algorithms to do this beyond random value checking, perhaps using the SAT solver model as a guide.

This may just be a question of API. For instance, say f(x) is some expression and suppose that solve(f(x) = 0) returns [] (meaning the empty set, or no solutions). Depending on your algorithms, this could mean that f(x) has no zeros, or it could mean that the solve heuristics couldn't find them (or that they can't be represented symbolically, like solve(cos(x) - x = 0)). If you tie this into the assumptions solver, it's probably a useful idea to know which is which. One case means you know Or(f(x) = 0, f(x) != 0) (basically, nothing), and another case means you know f(x) != 0 (the negation of the predicate you are solving for). The solve function needs an API that can return a logical predicate indicating what it knows that it knows about the solution it returns. This is a bit fuzzy because I haven't thought too deeply about it yet. Let me know if you have any thoughts on it.

@rwst
Copy link
Member Author

rwst commented Jul 29, 2015

Have you had a look at SMT solvers? They have all the layers, from boolean to linear to nonlinear and mixed.
http://www.issac-symposium.org/2015/Slides/Abraham.pdf
http://leodemoura.github.io/slides.html
https://en.wikipedia.org/wiki/Satisfiability_modulo_theories
http://www.dagstuhl.de/de/programm/kalender/semhp/?semnr=15471

As to exposing the functionality to users I found a Python interface to these solvers,
https://github.com/pysmt/pysmt

So, while this answers how to find a model for a given query (copy the methods developed by SMT solvers), the question remains how to include the SMT solver functionality in the CAS. In Sage it's probably easier because we are not restricted to Python, but we won't simply attach yet another package either because lot of functionality needed already is there (SAT, Gröbner bases, LP, CAD) or at least optionally installable, no need to duplicate it.

It remains to define the restricted but fast interface for internal use. I'm not clear on this at the moment.

@asmeurer
Copy link

I haven't had time to think too hard about it yet, but I definitely think an SMT solver would make certain things easier.

@asmeurer
Copy link

I just read the Abraham slides. I think this is definitely the way forward. The only thing is we have to make basic deductions very fast (like y^2 + 1 < 0 -> UNSAT from the presentation).

@rwst
Copy link
Member Author

rwst commented Jul 30, 2015

For fast basic deductions we have in GiNaC/Pynac class-wide info member functions that hardcode logic for classes add/mul/power/symbol/function/numeric and the usual flags which are binary, i.e. false can mean unknown. As they are class-wide and not cached they just walk the expression tree for every request, which is fast enough because most of the time for any typical computation (95-99%) is spent outside Pynac, mostly within Python or Maxima.

@asmeurer
Copy link

i.e. false can mean unknown

That's a bad idea. If you do that, then you can't do any kind of non-trivial logical deductions. Remember that x = false is the same as not x = true. So in this case, both true and false can mean "unknown", i.e., actually the other. To put that more practically, say you have a fact x <-> y and you get that x = false. Normally, you'd be able to deduce that y = false as well. If you have a second fact not y -> z you would be able to deduct z = true. But if x = false can also mean x = unknown then you also have z = unknown, i.e., you haven't really deducted anything.

You have to treat true and false and facts and their negations on equal footing. Either can be useful for deducting things, even deducting things in the positive (because facts themselves can have negations in them).

@rwst
Copy link
Member Author

rwst commented Jul 30, 2015

You would be right, but this is really a quite basic implementation. Really only the true answer is trusted and if we would need e.g. to know if something is not real then we would have to implement an irreal info flag. So effectively there are four values, true/unknown for x, true/unknown for not x. The only Sage assumptions that will probably have an effect on this subsystem will be variable/function domain declarations and assumptions of form variable in domain/anonymous function in domain (where domain is from an open but limited vocabulary) and their equivalents as relations like x>0. Everything else I dare to trust to a future SMT solver. This looks limited but (with a recent commit) this

sage: x = var('x', domain='real')
sage: x.is_real()
True
sage: (x^2+1).is_positive()
True
sage: ((1+x+x^2)^100).is_positive()
True
sage: p=((1+x+x^2)^100).expand()
sage: %timeit ret=p.is_positive()
The slowest run took 17.23 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.16 µs per loop
sage: %timeit ret=p.is_positive()
The slowest run took 23.00 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.04 µs per loop

@asmeurer
Copy link

Separating out "is this true" from "is this false" can potentially be a performance optimization, although I'm personally skeptical about it, one, because you in general need to know both anyway to make deductions, and two, because it introduces some major gotchas.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants