New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept for the new assumptions #2508
Conversation
This is a basic prototype of a new ask, currently accessible as sympy.assumptions.newask.newask(), that uses satisfiable() entirely to determine the truth or falsehood of the given proposition under the given assumptions. See https://code.google.com/p/sympy/issues/detail?id=3929 for more details on the idea. As a first prototype, this is still in a very proof-of-concept stage. In particular: - I don't reuse any of the existing ask logic or machinery. As such, almost everything that works currently with ask() won't work with newask(). - I have so far only implemented the fact positive implies real. Unless a given proposition can be deduced from this fact, or is a tautology or a contradiction, newask will return None. - I have in no way attempted to make a nice API for this. In fact, the current API is rather terrible: it's just a chain of if statements. But the basic idea is there, namely, that ask() should extract all relevant facts based on the predicates present in the proposition and the assumptions, and then use satisfiable() to determine satisfiable(And(proposition, assumptions, relevant_facts)) and satisfiable(And(~proposition, assumptions, relevant_facts)). If one is True and the other False, then we know that proposition is True (or False); if both are True, we return None, and if both are False, we conclude that the assumptions and the relevant_facts give a contradiction. At this point, I would like some input on how to structure this API so that it becomes scalable, extensible, and readable.
This is required because some relevant facts may in turn generate new relevant facts, which didn't appear before. This happens as soon as you start to add non-trivial facts, like Equivalent(Q.zero(x*y), Q.zero(x) | Q.zero(y)), because they add new predicates to the system.
We always add relevant facts because they might be needed. For instance, if we only added the fact Equivalent(Q.zero(x*y), ~Q.nonzero(x*y)) when both predicates appeared in the system, we would never be able to compute newask(Q.zero(x) | Q.zero(y), Q.nonzero(x*y)), because it requires going from two steps, first, adding the fact Equivalent(Q.nonzero(x*y), ~Q.zero(x*y)), and second, adding the fact Equivalent(Q.zero(x*y), Q.zero(x) | Q.zero(y)). This second fact can only be added after the first is added, because only at that point do we have Q.zero(x*y). Without this, we would have to add many redundant facts (like Equivalent(Q.nonzero(x*y), ~Q.zero(x) & ~Q.zero(y))). On the other hand, we now avoid adding completely redundant implications. The Q.zero fact, aside from the obvious implications on the already existing Q.positive and Q.real, and the new Q.nonzero, includes one nontrivial fact, namely Q.zero(Mul) is equivalent to Or([Q.zero(arg) for arg in Mul.args]). This fact, while easy to add, allows for very powerful new assumptions, all of which don't work with ask(), but more importantly, are either very difficult or impossible to add to ask() because the way the current handler system works, it is not so easy to add handler to make things like ask(Or(x, y)) give True when neither ask(x) nor ask(y) would give True (as is the case for ask(Q.zero(x) | Q.zero(y), Q.zero(x*y))). For instance: >>> from sympy.assumptions.newask import newask >>> newask(Q.zero(x) | Q.zero(y), Q.zero(x*y)) True >>> newask(Implies(Q.zero(x), Q.zero(x*y))) True >>> newask(Q.zero(x) | Q.zero(y), Q.nonzero(x*y)) False >>> ask(Q.zero(x) | Q.zero(y), Q.zero(x*y)) >>> ask(Implies(Q.zero(x), Q.zero(x*y))) >>> ask(Q.zero(x) | Q.zero(y), Q.nonzero(x*y)) >>> Notice how the use of satisfiable() lets us ask essentially equivalent things in different ways, and still get the right answer, even though we only told the system about the fact in one way.
… ask() And add one test that I forgot (that does work with ask(), since it's so simple).
while relevant_facts != old_relevant_facts: | ||
old_relevant_facts, relevant_facts = (relevant_facts, | ||
get_relevant_facts(proposition, assumptions & old_relevant_facts, | ||
context)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So newask works by forward chaining? How efficient is this as the number of active propositions increases? What is performance like? I can imagine the set of facts becoming large as we extend this system to all of Q.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need a dependency graph between predicates. Does even-ness help you to predict zero-ness? Sparsity in this graph would help performance.
We could also group predicates by valid classes on which they relate. Can MatrixExprs be positive? Can Bools be even?
What is performance like? Have you benchmarked? How does SymPy's SAT solver compare to other C-based SAT solvers. Is this something we could include as an optional dependency like matplotlib? |
The main issue I see is that this method forward chains all known facts at runtime. As you've demonstrated this is very powerful but I'm concerned that it will become very slow as we add all of the predicates. I wonder if there is a way to restrict the set of active predicates intelligently (or by hand). I also wonder if this is important or not. |
In general though I'm excited. The code is much more understandable and capable of a number of things which felt broken in old-new assumptions. |
Thanks for your comments Matthew. Yes, you are right, this is forward chaining. I'm glad you pointed that out, because I think with this realization we can look into how other systems manage their data structures for forward chaining. The important thing to note here is that we are not forward chaining on assumptions. This is basically what the old assumptions do (and I guess basically the new ones as well). So for instance, if we know that Regarding the data-structure, yes, it absolutely needs to be represented in data. I stated that the current API is terrible. Not only is it non-scalable and non-extensible, we may be missing out on some things that we can only do if we are represented as data (similar to how the current handlers misses out on things by not representing facts as data). For instance, if we represent our facts as data correctly, it should be possible to generate the graph you mentioned automatically. This means we need to be careful about our data structures, though. If we represent "Q.zero(Mul(*args)) iff any(Q.zero(arg))" as a I think the Regarding performance, I didn't look at it at all, for a few reasons:
Also, from what I've heard, with good SAT solvers, you should just dump everything you know into them, and they will do a pretty good job of ignoring the irrelevant stuff. We'll have to wait until we have much more relevant fact extraction to see if this really is true, but if it is, then trying to avoid inserting irrelevant facts may actually waste time, and worse, may prevent some answer from being deduced that otherwise would be. I also have no idea what really would be the performance bottlenecks. I basically put one piece of heavy machinery, You may be right though that the actual fact extraction will be the bottleneck. I can't really speak to the performance of this yet because I know that the data structures used for this will be completely different than my
It's much slower. Compare |
By the way, it would be more like the optional dependency of |
Another thing that requires some thought is how to properly handle handlers that aren't really logical facts, but are rather computations. A simple example is https://github.com/sympy/sympy/blob/master/sympy/assumptions/handlers/order.py#L121. A more complicated example might be using |
Previously it wasn't really finding the fixed point. Now it does the right thing.
Also add a test that uses several levels of forward chaining.
The answer to this is yes. Knowing about even-ness doesn't mean that we know something is even. It could also mean that we know something is not even, in which case we know that it is not zero. Two concepts in the graph should be connected if there is an implication in either direction, and in fact it's pointless to make the graph directed. Remember that |
# infinite loop in the future. | ||
relevant_facts = True | ||
old_relevant_facts = False | ||
while relevant_facts != old_relevant_facts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we may run into problems here at some point. The only reason we hit a fixed point is that And
removes (exact) duplicate arguments automatically. But what happens when we start adding an infinite number of duplicate arguments none of which look exactly like each other? Or even if not infinite, many? We already are starting to do this: the relevant_facts will always contain both Equivalent(Q.zero(x), ~Q.nonzero(x))
and Equivalent(Q.nonzero(x), ~Q.zero(x))
if it contains one. Maybe we don't need to care, if the SAT solver really is good at ignoring redundancies, but I highly suspect that we will.
I think the more rigorous way (if we run into this problem) will be to use satisfiable
to determine if the new part really does contain new logical information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason we hit a fixed point is that And removes (exact) duplicate arguments automatically
Do we want And
to perform logic here? If we're doing a lot of this each round it may make sense to drop out of SymPy within a lot of Assumptions. I'm imagining a polys type infrastructure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only reason And does this is that it uses a frozenset internally, so by design it cannot hold duplicate arguments.
I don't know what the code of using full expressions is here. I do know that in the SAT solver, everything is converted to a much more efficient set of integers representation, so you don't need to worry about that.
This gives the simplest example yet of something that doesn't work with ask() (simplest in the sense that there are no logical operators involved) >>> from sympy.assumptions.newask import * >>> ask(Q.zero(x), Q.zero(x**2)) >>> newask(Q.zero(x), Q.zero(x**2)) True This one probably could be made to work with the ask() handlers system, but it would require digging through the expressions to find the x**2, and it would be messy.
I just started playing with using Also, apparently |
I can imagine a few major relations that you'd want to be able to describe including
But we're sure to want more and we're almost certainly not going to get all of them at design time (presumably this was a problem with oldnew assumptions). Which means that we need some sort of interface or infrastructure to describe these later on. Or at worst we need the part of the code in which all of these systems are implemented to be easy to add to in the vein of Mul.flatten. |
I respect this, but I also think that some long-range thinking about what might be or not be feasible is, I think, worth thinking about now to make sure that this is a path worth pursuing. Such thoughts might also push us in the right directions. |
I'm not worried about the SAT solver's performance. I'm worried about how much time it will take Python/SympPy to construct all of the facts. I'm worried about the linear time part of this algorithm, not the NP hard part :) If we use more complex fact finders (e.g. things that depend on solve as you mention) then we'll really want to be intelligent about which paths we pursue. |
But it's kind of awesome that it told you. |
I guess I'm worried about the SAT solver performance because to me, that's the harder one to improve.
Yes, though it didn't actually tell me why. I only knew that was what was wrong because I had already noticed it.
To me, these are the same thing (one uses Again, remember that we need to do forward chaining against both directions of implications, because we don't know if an assumption will end up being true or false. In terms of classical logic, we need to think about both modus ponens and modus tollens. |
Update this branch with master, fix merge conflicts and replace deprecated is_bounded with is_finite
@@ -129,6 +131,9 @@ def ask(proposition, assumptions=True, context=global_assumptions): | |||
else: | |||
key, expr = Q.is_true, sympify(proposition) | |||
|
|||
if isinstance(proposition, (bool, BooleanAtom)): | |||
return sympify(proposition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why BooleanTrue
instead of bool
?
This makes the output less consistent. In both branches, master and this one, ask
returns bool
.
This change is making ask(True) is True
False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why I made this change. I agree that ask should return a bool, since it doesn't make sense in a boolean context (
Lines 90 to 93 in 98459c5
# The rule of thumb is: | |
# "If the boolean in question can be replaced by an arbitrary symbolic | |
# Boolean, like Or(x, y) or x > 1, use S.true. Otherwise, use True" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. You changed two tests to respect this change. https://github.com/sympy/sympy/pull/2508/files#diff-abc5430bcb3f79fa33a7df36470717fcR1950
Fix formatting and typos
ask(bool) now returns bool instead of BooleanAtom
A singly applied predicate is a free predicate applied everywhere to a | ||
single expression. For instance, Q.zero(x) and Or(Q.positive(x*y), | ||
Q.negative(x*y)) are singly applied, but Or(Q.positive(x), Q.negative(y)) | ||
and Or(Q.positive, Q.negative(y)) are not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo? Or(Q.positive, Q.negative(y))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. I'm pointing out that this predicate is not singly applied because Q.positive is not applied and Q.negative(y) is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with master
Everything in this test doesn't work with ask, and most things would be | ||
very difficult or impossible to make work under the current handlers | ||
model. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely clear about this docstring. I'm assuming that it's saying that everything in the test doesn't work with the old assumptions, but it does with the new assumptions? If so, I think it could be clearer. Especially since the definition of the word "current" once this is merged is ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this must have been written before I integrated the stuff with ask
itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good. I'm okay with the merge now.
On 6 Jun 2015, at 19:34, Aaron Meurer wrote:
assert satask(~Q.positive(x)) is False
~Q.positive(x)))raises(ValueError, lambda: satask(Q.real(x),
+- assert satask(Q.zero(x), Q.nonzero(x)) is False
- assert satask(Q.positive(x), Q.zero(x)) is False
- assert satask(Q.real(x), Q.zero(x)) is True
- assert satask(Q.zero(x), Q.zero(x*y)) is None
- assert satask(Q.zero(x*y), Q.zero(x))
+def test_zero():
- """
- Everything in this test doesn't work with ask, and most things
would be- very difficult or impossible to make work under the current
handlers- model.
- """
I modified it.
Reply to this email directly or view it on GitHub:
https://github.com/sympy/sympy/pull/2508/files#r31872394
Aside from the few comments above, I'm okay with this. |
I'll merge this in some time if no objections are raised. |
Proof of concept for the new assumptions
@asmeurer In SymPy's SAT solver have we included any optional dependency like suggested here? Also, if you have some suggestion of any dependency which we may use, please tell me, I will research on it.
|
Yes, I believe we currently use pycosat as an optional dependency. |
@asmeurer Great. Could you tell me the PRs where it is implemented? |
@asmeurer Also, in this PR thread could you point out if any issue is not solved which is of immediate attention for us? |
The biggest question here is how far we can get this to scale without changing the way the algorithm works. Last year's GSoC project helped a lot, but I still don't know if it will scale to the full set of assumptions used by SymPy. I think the general idea of writing down facts as logical expressions is a good one, although I think we should use a better syntax for them (#2508 (comment)). It's also unclear to me how to represent certain facts without having a combinatorial explosion of clauses (like certain facts that use XOR). We also will eventually want some way to extend the facts externally, so that not everything has to go in sathandlers.py, though that should probably be left until we know it can scale. |
Hi all,
I read maybe 0.1% of all the conversations coming through my inbox for
sympy, but this one caught my eye.
It's also unclear to me how to represent certain facts without having a
combinatorial explosion of clauses (like certain facts that use XOR).
@aaron: you can manage the encoding to CNF quite readily
- https://en.wikipedia.org/wiki/Tseytin_transformation
Hope you are all doing well!
Cheers,
Christian
…On Tue, Apr 21, 2020 at 3:27 PM Aaron Meurer ***@***.***> wrote:
The biggest question here is how far we can get this to scale without
changing the way the algorithm works. Last year's GSoC project helped a
lot, but I still don't know if it will scale to the full set of assumptions
used by SymPy.
I think the general idea of writing down facts as logical expressions is a
good one, although I think we should use a better syntax for them (#2508
(comment)
<#2508 (comment)>). It's
also unclear to me how to represent certain facts without having a
combinatorial explosion of clauses (like certain facts that use XOR). We
also will eventually want some way to extend the facts externally, so that
not everything has to go in sathandlers.py, though that should probably be
left until we know it can scale.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2508 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABW56ZJJ6QZ7CAEHRLP4BTRNXXTRANCNFSM4AIRMJTQ>
.
|
It's good to see that you still follow SymPy, if only 0.1%. Yes, we've had an open issue for Tseitin for a while. It just hasn't been implemented. I'm also not sure if the extra variables from it would end up causing their own problems. Maybe the solution is to only use it for instances where it really saves a lot, like XOR, and not bother for simple things like small expressions that can just be distributed into CNF. |
Problems in solving or in actual handling? If you aren't using model
counting, but just SAT checks (with solution extraction) you should be able
to safely project away those auxiliary variables. Though you are right in
terms of the problems with solving -- the extra variables may make it more
difficult to solve.
Crude estimates of the expanded formula should be easy enough to compute,
and perhaps you could use that to only partially Tseiten-ize...also an
option to expand until you hit a limit and then Tseiten from then on. But
exploration here, and exploration on improving the CNF encoding (in the
other thread related to this) *all* should be informed by proper data:
i.e., what actually needs improving. Solving time? Encoding time? Something
else? Need some decent profiling before diving in I reckon.
Cheers!
…On Tue, Apr 21, 2020 at 3:46 PM Aaron Meurer ***@***.***> wrote:
It's good to see that you still follow SymPy, if only 0.1%.
Yes, we've had an open issue for Tseitin for a while. It just hasn't been
implemented. I'm also not sure if the extra variables from it would end up
causing their own problems. Maybe the solution is to only use it for
instances where it really saves a lot, like XOR, and not bother for simple
things like small expressions that can just be distributed into CNF.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2508 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABW564TIDED43ZFFCY6Z53RNXZ23ANCNFSM4AIRMJTQ>
.
|
Here is a proof of concept of how I think the new assumptions should be done,
as per https://code.google.com/p/sympy/issues/detail?id=3929. Take a read of
the commit messages. The basic idea is that instead of writing handlers that
look for expressions like Q.zero(x*y) and return new expressions that it knows
are true like Q.zero(x) and Q.zero(y), the handler system returns a set of
predicate statements, and everything is done using satisfiable(). This not
only makes things cleaner, and gives us a single point of contact to make the
assumptions faster (the dpll algorithm), it also allows computing things that
are impossible to do under the current handler system. For instance
It's impossible to write a handler that says "x is zero or y is zero, but we
don't know which one".
I would like some feedback at this point. This code is completely proof of
concept. The API is terrible (basically nonexistant). What I've got now is
completely unscalable. It already is getting out of hand, and I've only got
four assumptions, real, positive, zero, and nonzero.
I've completely ignored the existing ask() code, because I didn't want to
spend a lot of time at this point trying to understand it (it's quite
complicated), so I basically wrote the prototype from scratch. Even so, it's
quite simple, and I think once you grok how the satisfiable bit works, it is
easy to understand the rest.
So my biggest questions now are, do people like this idea? Do you foresee any
major issues with it? Most importantly, what would be a good way to organize
the new-style "handlers" (relevant fact extraction) so that it is scalable,
extensible, and readable?
Update:
newask
has been changed tosatask