These are SAT solvers I wrote for teaching the COMP4418 class at UNSW. The code should be more accessible compared to real-world SAT solvers and illustrate the improvements of unit propagation and conflict-driven clause learning.
A visualisation of the search tree as well as the lecture slides are here, too.
This repository contains three different SAT solvers:
sat-naive.cc
implements the naive algorithm from Slide 21.sat-up.cc
implements the algorithm from Slide 25 with the Watched-Literal Scheme.sat-cdcl.cc
implements the algorithm from Slide 31 with the Watched-Literal Scheme and the FirstUIP scheme for choosing conflict clauses.
The files represent the evolution of our SAT algorithm in the lecture. Each of the files is stand-alone.
They are in written in C++ 2011. On Linux you can compile them with
$ c++ -O3 -DNDEBUG -Wall -std=c++11 sat-naive.cc -o sat-naive
$ c++ -O3 -DNDEBUG -Wall -std=c++11 sat-up.cc -o sat-up
$ c++ -O3 -DNDEBUG -Wall -std=c++11 sat-cdcl.cc -o sat-cdcl
The flag -O3
tells the compiler to optimise the program, -DNDEBUG
disables
assertions (the assert()
statements in the code), -Wall
tells the compiler
to warn about potential mistakes, and -std=c++11
enables the C++ 2011
standard.
You may get some warnings regarding comparison between signed and unsigned
integers because I've used int
throughout the code whereas the return type
of std::vector<T>::size()
is the unsigned integer std::size_type
. You can
ignore these warnings.
Then run ./sat-[naive|up|cdcl] [-e] <cnf-file>
. The optional flag -e
makes the solver enumerate all solutions. The input file must be in DIMACS CNF
format.
There are probably still a few bugs in the solvers. Let me know if you find one.
All three solvers are structured as follows and actually largely identical:
- Type definitions
- Global variables
- Solver functions
- Main function
Here's some further explanation of the types and functions.
Map<Key, Value>
: In SAT solving, maps (that is, key-value pairs) are very handy. Luckily the keys are integers or variables or literals, which can be represented as integers.Map<Key, Value>
is an abstraction layer for such a array-based map whereKey
is a type that can be converted to int andValue
is an arbitrary type.PriorityQueue<T, Less>
: A priority queue allows to insert elements and take out the element with highest priority in an efficient way. This implementation uses a min-heap. You may safely skip this data structure. All that's important is that it provides a functiontop()
to access the highest-priority element, and functionsInsert()
andRemove()
to add or remove elements from the queue. If you're interested in how a min-heap works, Wikipedia is probably a good resource.Var
: A propositional variable is just represented as a positive integer (the integer0
is used to represent a 'null' value). This type allows for a little abstraction so we don't confuse other integers with variables.Value
: The value to which a partial interpretation may map a variable: true, false, or unassigned.Lit
: A variable or its negation is represented as a non-zero integer (the integer0
is used to represent a 'null' value).Clause
: A clause is a set of literals. We implement it as an array. To make sure no literal occurs twice, we normalise it, that is, we sort it and remove any duplicates.ClauseRef
: A reference to a clause is a positive integer that identifies a clause (the integer0
is used to represent a 'null' value). We will store clauses in a global array. AClauseRef
is the index of a clause in this array. The also provide aclause()
function to obtain the clause associated with a reference. If you're familiar with C or C++ pointers: we could use pointers just as well. 32 bit integers should suffice though and safe a little space over 64 bit pointers.
Watchers
[sat-up.cc
andsat-cdcl.cc
only]: In the Watched-Literal Scheme there are two distinguished literals for each clause (except for the empty and unit clauses, of course). When we set a literal x to False and want to propagate this assignment, it suffices to check those clauses which watch x. The Watchers data structure allows us to register a clause as a watcher for a literal. It is implemented as an array of arrays: for each Lit, there is an array of ClauseRefs.
clauses_
: Is the list of non-empty and non-unit clauses in our formula, that is, of all clauses with at least two literals.empty_clause_
: Is true when the formula contains the empty clause, in which case it is unsatisfiable.var_order_
: Is a priority queue of variables. For our implementations, the ordering itself it not important. All that matters is that we have way of selecting variables to make decisions. Choosing a smarter order (see Slide 34) is one way to improve our solvers.trail_
: The literals in the order of assignment. Implemented as array of literals. That is, it corresponds to the partial interpretation /I/ in our algorithms, but it additionally represents the ordering in which the assignments were made.level_size_
: The size of the different decision levels. For each decision, a new level is created to ease backtracking. The size of the level is the number of literals that were assigned at the level. In the naive algorithm, there is exactly one literal per decision level. The other two algorithms do unit propagation after each assignment, which may lead to additional assignments after a decision. We want to memorize to which level which assignment belongs for backtracking.trail_head_
[sat-up.cc
andsat-cdcl.cc
only]: The index of the first element of trail_ that has not been propagated yet. Newly assigned literals enter a queue for unit propagation, the head of which is wheretrail_head_
points to.model_
: The values of variables. Represents the same data astrail_
, but instead of being ordered chronologically,model_
allows for fast access of the value of a given variable.level_
[sat-cdcl.cc
only]: The level at which a variable was set. Represents the similar data aslevel_size_
together withtrail_
, but instead of being ordered chronologically,level_
allows for fast access of the level of a given variable. Useful for conflict analysis.cause_
[sat-cdcl.cc
only]: The clause that caused a given variable to be set by unit propagation. Useful for conflict analysis.watchers_
[sat-up.cc
andsat-cdcl.cc
only]: Provides fast access to clauses that are watching a given literal.analyze_queue_
[sat-cdcl.cc
only]: Auxiliary array for conflict analysis, which is only global for performance reasons.ROOT_LEVEL
: The root level of the decision tree.
current_level()
andNewLevel()
: We associate with every decision a new level, which simplifies backtracking.AddLiteral()
: When an assignment is made, the corresponding global variables are updated:trail_
andmodel_
and, insat-cdcl.cc
, alsolevel_
andcause_
.AddClause()
: Adds a clause from the input formula, or, insat-up.cc
andsat-cdcl.cc
, newly learnt conflict clauses. If the clause is empty,empty_clause_
is set, and if it is a unit clause, it is handed toAddLiteral()
. Otherwise, if it contains at least two literals, it is added toclauses_
. Insat-up.cc
andsat-cdcl.cc
, the clause is furthermore registered as watcher for its first two literalsSelectVar()
: Takes the next variable out ofvar_order_
. If there is none, the nullVar
is returned.Backtrack()
: Undoes the assignments made since the given level. That is, thetrail_
andlevel_size_
and, insat-up.cc
andsat-cdcl.cc
,trail_head_
and need to be shrinked. And for each un-assigned literal, the entry for that variable inmodel_
is reset.Propagate()
[sat-up.cc
andsat-cdcl.cc
only]: Propagates all literals that haven't been propagated yet, that is, those that have been added totrail_
aftertrail_head_
. This itself may enqueue new literals on the trail, which are also propagated iteratively. Propagating a literal works by inspecting all clauses that watch the negation of that literal, usingwatches_
.Analyze()
[sat-cdcl.cc
only]: Analyses a conflict and returns a new conflict clause. The conflict clause is organised in a way that after backtracking to the destination level, the first literal of the clause is not falsified, but all the other ones are. (This first literal is the unique implication point.)Solve()
[different forsat-naive.cc
,sat-up.cc
,sat-cdcl.cc
]: Implements the algorithm from Slide 21, 25, and 31, respectively.
ReadCnf()
andSkipComments()
: Parse the DIMACS CNF file and add the read clauses withAddClause()
.main()
: Open and process the DIMACS CNF file, run the solver, print the solution.
The watched-literal scheme and clause learning are the most important concepts of SAT solvers, but they are not the only ingredients that make modern SAT solvers fast.
The next conceptual improvement would be to use a smarter variable selection
heuristic. All you need to do is to replace the std::less<Var>
parameter in
PriorityQueue<Var, std::less<Var>>
with a different ranking.
Using std::less<Var>
means that at the moment variables with a small IDs are
preferred over ones with higher IDs -- which is, of course, completely
unjustified.
A better heuristic would be to score variables by their activity: assign them
some numeric scores which are increased when the variable plays a role in
a conflict or unit propagation, for example.
Other potential improvements are phase-saving or randomized restarts. As with the variable selection heuristic, these additions are relatively easy to implement. However, tweaking the parameters (e.g., when to restart) might be tricky.
Finally, another way to improve performance is to improve the code. The most
promising function to start with is probably Propagate()
. I didn't try to
optimise it in any way, but since the solver spends most of its time on unit
propagation, every improvement here should make a big difference.
I haven't profiled the solvers though and there may (also) be other high-impact
optimisations.
You can play around with the visualisation of the search tree. You can install it locally as well with Emscripten, a compiler back end for Clang that produces Javascript.
There are also the lecture slides for which I wrote these solvers.