More efficient to_cnf #7174
The algorithm apparently works by adding new symbols to the expression. The guarantee is that the new expression is satisfiable if and only if the original was, and by the same model. This is a little different from what to_cnf does, so this should probably be a separate function, or a flag on to_cnf. It would be a little surprising if to_cnf returned an expression that is not exactly logically equivalent to the input, but an equisatisfiable result is exactly what we need in the call to to_cnf in satisfiable().

The algorithm in the paper claims to be O(m), where m is the number of operators (And(x, y, z) is counted as two operators). A modified algorithm that efficiently handles duplicate subexpressions is O(m*log(m)). The naive algorithm of distributing ands and ors is, I believe, O(2**m) (or something like that). See for instance https://en.wikipedia.org/wiki/Conjunctive_normal_form#Conversion_into_CNF .
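For concreteness, the Tseitin-style encoding described above can be sketched like this. The tuple-based formula representation and the function name are illustrative assumptions, not SymPy's API; each operator node gets a fresh auxiliary variable defined by a constant number of clauses, so the output is linear in the number of operators and equisatisfiable with (not equivalent to) the input:

```python
def tseitin_cnf(expr):
    """Encode a formula as equisatisfiable CNF over DIMACS-style ints.

    Formulas are nested tuples: ('var', name), ('not', e),
    ('and', a, b), ('or', a, b) (binary operators only, for brevity).
    """
    clauses, var_ids, counter = [], {}, [0]

    def fresh():
        counter[0] += 1
        return counter[0]

    def lit(e):
        op = e[0]
        if op == 'var':
            if e[1] not in var_ids:
                var_ids[e[1]] = fresh()
            return var_ids[e[1]]
        if op == 'not':
            return -lit(e[1])
        a, b = lit(e[1]), lit(e[2])
        x = fresh()                          # fresh auxiliary variable
        if op == 'and':                      # clauses for x <-> (a & b)
            clauses.extend([[-x, a], [-x, b], [x, -a, -b]])
        else:                                # 'or': clauses for x <-> (a | b)
            clauses.extend([[-x, a, b], [x, -a], [x, -b]])
        return x

    clauses.append([lit(expr)])              # assert the top-level formula
    return clauses
```

Encoding (p & q) | r this way yields three clauses per operator plus one unit clause asserting the root, regardless of how deeply the operators nest.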
Any conversion to CNF using Tseitin (or a related approach) will introduce auxiliary variables. The encoding time goes down, but the solving time increases (sometimes substantially). Probably good to keep that in mind...
Oh, I didn't know that. So we would need to find a heuristic to balance the two.
Do you know of any better references for this, aimed more toward implementation? It seems like in many (most?) cases, adding new variables is unnecessary. For instance, if I have a conjunction of a bunch of implications or double implications, that is basically already in cnf. As a somewhat less trivial example, if I have Implies(p, x), where p is in dnf and x is a variable, then the conversion is not so bad, because ~p is in cnf, and you then just have to distribute the x to each term.

By the way, another question. If the input to satisfiable is an Or, would it be more efficient to just recursively call satisfiable on each argument until an answer is found? This seems like it might be more efficient than converting it to cnf, especially since if it happens to be in dnf, then the answer would be found instantly. Even if you have a huge Or that ends up being unsatisfiable (the worst case), it still seems like this would be more efficient than converting to cnf, at least given the current naive to_cnf. But I didn't test it, so I could be wrong.
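The Implies(p, x) special case above is cheap because negating a dnf term-by-term yields a cnf, and Or-ing a single literal into a cnf just appends that literal to every clause. A minimal sketch, assuming DIMACS-style integer literals (the encoding and helper name are illustrative, not SymPy's):

```python
def implies_dnf_to_cnf(dnf_terms, x):
    """CNF of Implies(p, x), where p is an Or of And-terms (lists of int literals).

    ~p turns each And-term into a single clause of negated literals, and the
    trailing "| x" adds x to each of those clauses -- no blowup at all.
    """
    return [frozenset(-lit for lit in term) | {x} for term in dnf_terms]
```

So Implies((a & b) | (c & d), x) becomes (~a | ~b | x) & (~c | ~d | x), with exactly one clause per dnf term.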
Better references? There should be, but I just know it as general lore around the SAT solving community. Your best bet is to analyze the types of input you are expecting -- if it's more towards DNF, then flatten to that and use model enumeration to solve your query. If it's more towards CNF, then flatten to that and use a SAT solver. If it really is stuck half-way in between (and flattening to either causes a combinatorial explosion), then what I've seen is either Tseitin encodings or circuit solvers. The former introduces the new variables, and the latter just runs DPLL on top of the arbitrary formula.

Because flattening is a recursive procedure, it shouldn't be so hard to create a DP that estimates the final CNF (or DNF) size without actually computing it. That could be used at run time to determine which technique is best.
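The size-estimating DP suggested above fits in a few lines: under naive distribution of a formula in negation normal form, an And just concatenates its arguments' clause lists, while an Or pairs up one clause from each argument. The tuple encoding and function name are assumptions for illustration:

```python
def cnf_clause_count(e):
    """Estimate the clause count naive distribution would produce.

    Assumes negation normal form: ('not', ...) only wraps variables, so both
    ('var', n) and ('not', v) count as a single unit clause. Operators may be
    n-ary tuples: ('and', a, b, ...), ('or', a, b, ...).
    """
    op = e[0]
    if op in ('var', 'not'):
        return 1                 # a literal is one unit clause
    counts = [cnf_clause_count(a) for a in e[1:]]
    if op == 'and':
        return sum(counts)       # And concatenates clause lists
    prod = 1                     # Or: distribution crosses every combination
    for c in counts:
        prod *= c
    return prod
```

An Or of n two-clause Ands estimates to 2**n, which is exactly the combinatorial explosion to detect before choosing a flattening strategy.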
Referenced issues: #7175
Is it a bad idea for satisfiable() to always recurse across an Or (even ones that are easy to flatten)? It seems to me like the analogue of solve() always recursing across a Mul. One disadvantage I can think of is that you will no longer be guaranteed to get a full model, as DPLL2 seems to do. Could it ever be far less efficient than converting to CNF first and running the normal solver, as it currently does?
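The Or-recursion being discussed can be sketched with a hypothetical helper. Here tuple formulas and a brute-force enumerator stand in for the real solver; none of this is SymPy's actual satisfiable(), it just shows the short-circuiting structure:

```python
from itertools import product

def free_vars(e):
    """Collect variable names from a tuple formula ('var'/'not'/'and'/'or')."""
    if e[0] == 'var':
        return {e[1]}
    return set().union(*(free_vars(a) for a in e[1:]))

def ev(e, env):
    """Evaluate a tuple formula under an assignment {name: bool}."""
    op = e[0]
    if op == 'var':
        return env[e[1]]
    if op == 'not':
        return not ev(e[1], env)
    results = (ev(a, env) for a in e[1:])
    return all(results) if op == 'and' else any(results)

def brute_sat(e):
    """Stand-in for the CNF-based solver: try every assignment."""
    vs = sorted(free_vars(e))
    return any(ev(e, dict(zip(vs, bits)))
               for bits in product([False, True], repeat=len(vs)))

def sat_split_or(e):
    """A top-level Or is satisfiable iff some argument is, so check each
    disjunct separately and stop at the first success."""
    if e[0] == 'or':
        return any(sat_split_or(a) for a in e[1:])
    return brute_sat(e)
```

The win is that each disjunct is solved over only its own variables, and a single satisfiable disjunct ends the search; the cost is that an unsatisfiable Or forces one solver call per argument.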
to_cnf needs to be more efficient. For some expressions, it takes forever and produces huge expressions, which tend to have tons of unnecessary clauses (lots of x | ~x).
I googled, and http://gauss.ececs.uc.edu/Courses/c626/lectures/BDD/st.pdf looks like it describes a suitable algorithm. Any kind of optimization would probably help; right now, it just does the naive distribution.
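The "lots of x | ~x" clauses mentioned above are tautologies: a clause containing both a literal and its negation is always true and can be dropped without changing the formula. A minimal sketch over DIMACS-style integer clauses (this clause representation is an assumption for illustration, not what to_cnf uses internally):

```python
def drop_tautologies(clauses):
    """Remove clauses that contain a complementary pair of literals."""
    return [c for c in clauses if not any(-lit in c for lit in c)]
```

Filtering these out as distribution proceeds, rather than afterward, would also shrink the intermediate expressions.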