
More efficient to_cnf #7174

Open

asmeurer opened this issue Oct 28, 2013 · 8 comments

Comments

@asmeurer (Member) commented Oct 28, 2013

to_cnf needs to be more efficient. For some expressions, it takes forever and produces huge results, which tend to contain many unnecessary clauses (lots of x | ~x).

I googled, and http://gauss.ececs.uc.edu/Courses/c626/lectures/BDD/st.pdf looks like it might be a suitable algorithm. Any kind of optimization would probably help; right now, to_cnf just does the naive distribution.
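For concreteness, the blow-up described here can be reproduced with a small standalone sketch (not SymPy's actual code) that represents a CNF as a set of clauses of integer literals and does the naive distribution:

```python
from itertools import product

def cnf_and(*cnfs):
    # Conjunction of CNFs: just union the clause sets.
    return frozenset().union(*cnfs)

def cnf_or(a, b):
    # Naive distribution of Or over And: every clause of `a` is
    # merged with every clause of `b`, so the clause count is
    # len(a) * len(b) -- the source of the exponential blow-up.
    return frozenset(c1 | c2 for c1, c2 in product(a, b))

def lit(n):
    # A single literal as a one-clause CNF (negative int = negation).
    return frozenset({frozenset({n})})

# (x1 & x2) | (x3 & x4) | ... : n two-literal conjuncts under an Or
# distribute into 2**n clauses.
n = 10
disjuncts = [cnf_and(lit(2*i + 1), lit(2*i + 2)) for i in range(n)]
result = disjuncts[0]
for d in disjuncts[1:]:
    result = cnf_or(result, d)
print(len(result))  # 2**10 = 1024 clauses
```

Each of the n disjuncts doubles the clause count, so a 20-literal input already yields over a thousand clauses.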

@asmeurer (Member, Author) commented Oct 28, 2013

The algorithm apparently works by adding new symbols to the expression, with the guarantee that the new expression is satisfiable if and only if the original was, and by the same model. This is a little different from what to_cnf does, so this should probably be a separate function, or a flag to to_cnf. It would probably be a little surprising if to_cnf returned an expression that is not exactly logically equivalent to its input, but this is definitely what we need in the call to to_cnf in satisfiable().

The algorithm in the paper claims to be O(m), where m is the number of operators (And(x, y, z) counts as two operators). A modified algorithm that efficiently handles duplicate subexpressions is O(m*log(m)). The naive algorithm of distributing Ands and Ors is, I believe, O(2**m) (or something like that). See for instance https://en.wikipedia.org/wiki/Conjunctive_normal_form#Conversion_into_CNF.
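A minimal sketch of the auxiliary-variable idea (a generic Tseitin-style encoding, not the specific algorithm from the paper; negation is assumed pushed down to the literals, written as negative ints):

```python
from itertools import count

def tseitin_cnf(expr):
    """Equisatisfiable CNF via fresh auxiliary variables.

    `expr` is a tuple tree: an int literal (negative = negated),
    ('and', a, b), or ('or', a, b).  Returns a list of clauses
    (tuples of int literals).  Each binary operator contributes a
    constant number of clauses, so the output size is linear in
    the operator count -- at the cost of new variables.
    """
    clauses = []

    def max_var(e):
        return abs(e) if isinstance(e, int) else max(max_var(s) for s in e[1:])

    fresh = count(max_var(expr) + 1)

    def encode(e):
        if isinstance(e, int):
            return e
        a, b = encode(e[1]), encode(e[2])
        v = next(fresh)
        if e[0] == 'and':   # clauses for v <-> (a & b)
            clauses.extend([(-v, a), (-v, b), (v, -a, -b)])
        else:               # clauses for v <-> (a | b)
            clauses.extend([(-v, a, b), (v, -a), (v, -b)])
        return v

    root = encode(expr)
    clauses.append((root,))   # assert the whole formula
    return clauses

cnf = tseitin_cnf(('or', ('and', 1, 2), ('and', 3, 4)))
print(len(cnf))  # 10 clauses: 3 per operator plus the root unit clause
```

Note that naive distribution of the same formula would also be small here; the linear bound only pays off on deeply alternating And/Or trees.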

@haz (Contributor) commented Oct 28, 2013

Any conversion to CNF using Tseitin (or a related approach) will introduce auxiliary variables. The encoding time goes down, but the solving time increases (sometimes substantially). Probably good to keep that in mind...

@asmeurer (Member, Author) commented Oct 28, 2013

Oh I didn't know that. So we would need to find a heuristic to balance the two.

@asmeurer (Member, Author) commented Oct 28, 2013

Do you know of any better references for this, aimed more toward implementation? It seems like in many (most?) cases, adding new variables is unnecessary. For instance, if I have a conjunction of a bunch of implications or double implications, that is basically already in CNF. As a somewhat less trivial example, if I have Implies(p, x), where p is in DNF and x is a variable, the conversion is not so bad: ~p is in CNF, and you then just have to distribute the x into each clause.
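The Implies(p, x) special case described above can be sketched standalone (assuming the DNF is represented as a list of cubes of integer literals, not SymPy objects):

```python
def implies_dnf_to_cnf(dnf, x):
    """CNF of Implies(p, x) where p is given in DNF.

    `dnf` is a list of cubes (tuples of int literals, negative =
    negated); `x` is a single literal.  Since p -> x is ~p | x, and
    ~p is already in CNF (one clause per cube, by De Morgan), we
    only need to append x to each clause -- no blow-up, no new
    variables.
    """
    return [tuple(-lit for lit in cube) + (x,) for cube in dnf]

# p = (a & b) | (c & ~d), x = e  ->  (~a | ~b | e) & (~c | d | e)
print(implies_dnf_to_cnf([(1, 2), (3, -4)], 5))
# [(-1, -2, 5), (-3, 4, 5)]
```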

By the way, another question: if the input to satisfiable is an Or, would it be more efficient to just recursively call satisfiable on each argument until an answer is found? This seems like it might be more efficient than converting it to CNF, especially since if it happens to be in DNF, the answer would be found instantly. Even if you have a huge Or that ends up being unsatisfiable (the worst case), it still seems like this would be more efficient than converting to CNF, at least given the current naive to_cnf. But I didn't test it, so I could be wrong.
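The proposed short-circuit recursion over Or can be sketched with a toy brute-force satisfiability check standing in for the real solver (same tuple-tree formulas as above; hypothetical names, nothing from SymPy):

```python
from itertools import product

def variables(e):
    if isinstance(e, int):
        return {abs(e)}
    return set().union(*(variables(s) for s in e[1:]))

def evaluate(e, model):
    if isinstance(e, int):
        return model[abs(e)] == (e > 0)
    if e[0] == 'and':
        return evaluate(e[1], model) and evaluate(e[2], model)
    return evaluate(e[1], model) or evaluate(e[2], model)

def brute_sat(e):
    # Stand-in for a real solver: try every assignment.
    vs = sorted(variables(e))
    return any(evaluate(e, dict(zip(vs, bits)))
               for bits in product([False, True], repeat=len(vs)))

def sat_split_or(e):
    # Recurse across a top-level Or: the disjunction is satisfiable
    # iff any disjunct is, and each disjunct is a smaller problem,
    # so we can short-circuit on the first satisfiable one.
    if isinstance(e, tuple) and e[0] == 'or':
        return sat_split_or(e[1]) or sat_split_or(e[2])
    return brute_sat(e)

print(sat_split_or(('or', ('and', 1, -1), 2)))  # True (second disjunct)
```

As the comment above notes, the downside is that the short-circuiting branch only examines one disjunct, so it does not yield a full model for the whole formula.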

@haz (Contributor) commented Oct 28, 2013

Better references? There should be, but I just know it as general lore in the SAT solving community. Your best bet is to analyze the types of input you are expecting: if it's closer to DNF, flatten to that and use model enumeration to answer your query; if it's closer to CNF, flatten to that and use a SAT solver. If it really is stuck halfway in between (and flattening to either causes a combinatorial explosion), then what I've seen used is either Tseitin encodings or circuit solvers. The former introduces the new variables, and the latter just runs DPLL on top of the arbitrary formula.

Because flattening is a recursive procedure, it shouldn't be so hard to write a dynamic program that estimates the final CNF (or DNF) size without actually computing it. That could be used at run time to decide which technique is best.
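One way such an estimate could look (a sketch, not SymPy code; input assumed in negation normal form, and without the memoization that shared subexpressions would need to make it a true DP):

```python
def cnf_clause_estimate(e):
    """Clause count of the naive CNF expansion, without computing it.

    Input is an NNF tuple tree (int literals, ('and', a, b),
    ('or', a, b)).  And sums clause counts; Or multiplies them,
    since naive distribution crosses every clause of one side with
    every clause of the other.
    """
    if isinstance(e, int):
        return 1
    l, r = cnf_clause_estimate(e[1]), cnf_clause_estimate(e[2])
    return l + r if e[0] == 'and' else l * r

def dnf_term_estimate(e):
    # Dual estimate for DNF: Or sums terms, And multiplies.
    if isinstance(e, int):
        return 1
    l, r = dnf_term_estimate(e[1]), dnf_term_estimate(e[2])
    return l * r if e[0] == 'and' else l + r

# (x1 & x2) | (x3 & x4) | (x5 & x6): 3 terms as DNF, 8 clauses as CNF,
# so a dispatcher would pick the DNF route here.
f = ('or', ('and', 1, 2), ('or', ('and', 3, 4), ('and', 5, 6)))
print(cnf_clause_estimate(f), dnf_term_estimate(f))  # 8 3
```

These counts are upper bounds: duplicate and tautological clauses (the x | ~x mentioned above) would shrink the actual result.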

@asmeurer (Member, Author):

**Blocking:** #7175

Referenced issues: #7175
Original comment: http://code.google.com/p/sympy/issues/detail?id=4075#c6
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer (Member, Author) commented Nov 5, 2013

Is it a bad idea for satisfiable() to always recurse across an Or (even ones that are easy to flatten)? It seems to me like the analogue of solve() always recursing across a Mul. One disadvantage I can think of is that you will no longer be guaranteed to get a full model, as DPLL2 seems to do. Could it ever be far less efficient than converting to CNF first and running the normal solver, as it currently does?

@asmeurer (Member, Author) commented Aug 1, 2015

There are two pull requests that do this: #8495 and #2852.
