COMP 3649 Assignment 1
– Solver for 3 - SAT, Implemented Imperatively
A Note about the Assignments
This course includes three major programming assignments. This one will use an imperative language. The second will use a logic language (Prolog). The third will use a functional language (Haskell). You will implement solutions to the same problem three times in three different languages.
See the course outline for important information about the assignment component of the course. This includes grading policy, late policy, etc.
All assignments are to be completed individually. A student’s submission must be their own work. If you are having trouble, consult with your instructor.
Background: Boolean Satisfiability
A Boolean expression^1 (“formula”) is said to be satisfiable if there exists an assignment or partial assignment of truth values to its variables such that the entire expression is true.
For example, given the following expression E:
xz’+ (x’ + y)’z + z(x + y)
... the assignment:
x = true, y = false, z = true
... satisfies the formula. Hence, E is satisfiable.
The partial assignment:
x = true, z = false
... also shows E to be satisfiable (in this case, the value of y is irrelevant).
(^1) For the purposes of this assignment, we denote variables by single lowercase letters. Boolean “not” is denoted by (’), Boolean “and” by juxtaposition, and Boolean “or” by (+). Parentheses are used to group sub-expressions.
Of course, not all expressions are satisfiable. As a trivial example, consider:
An unsatisfiable formula is logically equivalent to “false”, regardless of truth assignments to its constituent variables. Given a more complex expression, it is not always immediately clear whether or not it is satisfiable. A lengthy process may be required to either (1) find a satisfying truth assignment, or (2) exhaust all possibilities, in which case the formula is shown to be unsatisfiable.
Although it is not proved here, note that all Boolean expressions can be converted to Conjunctive Normal Form (CNF).
For example, the following formula is in CNF:
(w’ + x + y’ + z)(w + y)(z)(x + y + z)
A variable (e.g. x) or its negation (e.g. x’) is called a clause. A formula is in CNF when it is expressed as the conjunction of one or more terms, where each clause is the disjunction of one or more literals.
Although it is not proved here, note also that all Boolean expressions can be converted to 3 - Conjunctive Normal Form (3-CNF). A formula is in 3-CNF when it is in CNF and every clause contains a disjunction of exactly three literals.
For example, although the previous formula is not in 3-CNF, the following is:
(x + y + z’)(x’ + y + z’)(x + y’ + z)
Quick exercise: to ensure you understand the 3-CNF satisfiability problem (which we will abbreviate as 3-SAT), find a truth assignment (or partial assignment) which shows the above to be satisfiable, or demonstrate that it is unsatisfiable.
The 3-SAT and SAT (for general CNF) problems are actually quite important in CS. First, they are of great theoretical importance, since they were the first two problems to be proved NP-complete^2. Second, they are central to fields such as AI, automated theorem proving, and automated verification of hardware and software.
(^2) If you haven’t yet run across this clause, you probably will soon.
Your task is to implement a 3-SAT solver. Given a valid 3-CNF expression, it must either find a truth assignment (or partial truth assignment) which satisfies the expression, or, finding none, report that the formula is unsatisfiable. Note that there may be multiple satisfying assignments, but the program is only to report one, should any exist.
The Executable Program:
You may work in any development environment you like, but for submission and marking purposes the final version must be build-able (easily) and executable on INS^3. The program is to read its input from the standard input stream (e.g. the keyboard or a redirected file). The program must write its results to the standard output^4.
An example invocation might look like:
$ solver (x + y + z’)(x’ + y + z’)(x + y’ + z) .
. output written here . $
$ solver < testFormulaFile .
. output written here . $
The program processes one valid input expression per run.
(^3) If you are unsure how to log in to and work on MRU’s INS Linux server, please contact your instructor or an instructional assistant (IA). (^4) Alternatively, in the event of invalid input, error messages (and error messages only) may be written to the standard error stream.
An input expression must be a valid 3-CNF expression. That is, it must consist of a sequence of zero^5 or more parenthesized terms, where each clause is the disjunction of exactly three literals (variable or negation of variable). A variable is denoted by a single lowercase letter.
Whitespace may be inserted at any point in the input, and is ignored. Note that an expression may span multiple lines. Input is terminated by end-of-file (EOF).
The program must perform full error handling. Any input which does not match the above description exactly must be rejected by the program, with a suitable error message.
In the case of unsatisfiability, the program must simply write the following messageto the standard output and terminate:
Otherwise, the program must write one satisfying truth assignment (or partial assignment) and terminate. The assignment must be formatted as illustrated in the example below, sorted by variable name in ascending alphabetical order^6 :
satisfiable by [w=true, x=false, y=false, z=true]
Choice of Language
Your solution must be developed using an imperative programming language. In particular, you are restricted to using one of C, C++ or Java.
It is expected that if you choose C, you will develop your solution in a procedural style. Conversely, if you choose Java then you are expected to follow an object-oriented approach. Solutions written in C++ can follow either a procedural or an object-oriented style. It is suggested that you research the meanings of the terms “procedural programming” and “object-oriented programming”. Consult with your instructor if you are unsure.
You may use any language features or libraries which are standard to the language, as long as you are comfortable with them and can use them to good effect. Other third-party libraries must not be used.
(^5) Empty input (apart from whitespace), i.e. a zero-clause formula, is valid input and is considered trivially satisfiable. (^6) If the input is a zero-clause formula, the output would be satisfiable by 
Most of the design and implementation decisions are left to you. Your instructor may devote some class time to discussing some design issues. Students are encouraged to ask questions, after thinking about the issues themselves. The nature of such questions will influence the quality of in-class discussions.
Note that you are discouraged from attempting a simple, exhaustive (“brute force”) approach. In this scenario, a program would successively generate assignments and try each one in turn. This approach works, but it performs poorly. Note that for a formula with n variables, there are 2n possibilities to try. Thus, determining unsatisfiability is O(2n), which is very poor!
Although your program is not expected to perform in a perfectly optimal way, obviously poor algorithm or data structure choices will be penalized.
The recommended approach is to utilize a backtracking algorithm, which can be implemented recursively. The idea is to pick an unassigned variable and choose its assignment (either true or false). Then, any clause which becomes satisfied by this assignment can be removed from the expression! Conversely, a clause which becomes false under this variable assignment can be removed from all the terms which contain it. The new expression can then be processed recursively. If this process eventually leads to a 0 - clause formula, the expression has been satisfied (and an assignment or partial assignment has been generated). On the other hand, if a dead-end is reached, the process can backtrack to a previous assignment choice (say, true) and recursively attempt the other (say, false). A dead-end is reached when a clause becomes empty, i.e. when none of its original literals can be satisfied under the candidate assignment. Although the worst- case behaviour in this design is still exponential, the average-case behaviour will be better because more than one possibility can be ruled out per step.
Note that various techniques and heuristics can be used to speed up this algorithm^7. One technique is called “unit propagation”. In this approach, 1-clause terms are identified. Obviously, their assignments are forced to be either true or false, and they are removed. Doing this may produce further 1-clause terms which can then also be stripped. So, performing unit propagation at each recursive step in the backtracking algorithm can, in many cases, prune the set of possibilities which must be considered in subsequent steps. This technique is part of the well-known DPLL algorithm for 3-SAT.
It is suggested that you research backtracking algorithms and maybe the DPLL algorithm before proceeding with your design.
The design of the program must be modular, so decompose the problem properly. Individual module (function, method, class, etc.) cohesion must be high, meaning that
(^7) It is strongly recommended that you attempt any such algorithmic refinements only after solving the problem correctly and well using a more straightforward backtracking algorithm. Get a simpler but reasonably efficient working first! Then, save it in case you need to revert back later.
each must have one clear purpose. Coupling between modules must be low. Subroutine length must not exceed approximately 30 lines of code, and the internal algorithm design of each must be clear. Poor, non-modular designs will incur severe penalties.
Documentation and Coding
There is no official course standard for documentation. However, your code must be clearly documented. Exactly how to do this is left to your discretion, but seek guidance from your instructor if you are unsure. Poorly documented code will be penalized.
The exact method by which each procedure/method works must be concisely but clearly and fully explained.
Code which is hard to read will receive a heavy penalty. Your code must be self- documenting, well formatted, logically structured and easy to modify.
Your instructor may release a small number of sample inputs close to the due date. In the mean time, it is your responsibility to devise test cases for your code. Use previously learned skills of black- and white-box analysis, automated unit testing, etc.
You are welcome to share your test cases (but never your code) with other students. In fact, pooling of good test data is encouraged.
Submission and Grading
You are being given over three weeks to complete this assignment. Start early and budget your time wisely. Students who procrastinate should expect a low mark.
Detailed submission instructions will be given a few days before the due date.
Your submission will be graded on a variety of factors, including:
program functionality, including the ability to generate satisfying assignments, detect unsatisfiability, and handle invalid input; design and code quality; effective use of control, data and modularization constructs in the chosen language; documentation quality and code readability.
The speed of program execution is not the main marking criterion. However, programs must not be unnecessarily slow. Programs which perform particularly poorly (e.g. due to bad algorithm decisions like implementing a brute force approach, bad data representation decisions or silly coding practices) will be penalized. Thoughtful data structure and algorithm design choices will be rewarded.
Davis-Putnam procedure (DP)
 The algorithm known as the Davis-Putnam procedure is described in Algorithm 1. The algorithm works on a propositional formula φ which is in CNF form. The algorithm treats φ as a set of clauses that can be manip- ulated. There are some rules that define the algorithm, which I’ve denoted with I, II and III.
Rule I deals with simplifying the formula or determining satisfiability by analysing one-literal clauses. There are three cases to be distinguished.
An atomic formula p occurs as a one-literal clause in both positive and negated form. In this case the resulting formula φ is unsatisfiable, because the empty clause can be deduced by means of resolution.
Case 1 does not apply and an atomic formula p occurs as a one- literal clause. Then all clauses that contain p can be deleted and ¬p can be removed from the remaining clauses.
Case 1 does not apply and an atomic formula ¬p occurs as a one- literal clause. Then all clauses that contain ¬p can be deleted and p can be removed from the remaining clauses. In modern literature this rule is now usually referred to as the Unit Propagation rule.
Rule II deals with redundant clauses. In modern literature this is usually referred to as the Pure Literal rule and nowadays done as a preprocessing step for efficiency reasons. If only p or only ¬p occurs in the formula φ then all clauses which contain p or ¬p may be deleted.
Rule III uses the inference rule known as resolution in mathematical logic. Resolution can be applied to two clauses that have complementary literals L and L' , such that L ≡ ¬L' . Via a resolution step the two clauses can be combined in a single clause, called the resolvent that contains all the literals without L and L' .
In general this means that (C ∨ L) ∧ (C' ∨ L' ) is reduced to (C ∨ C'), with C and C' the remainder of the clause that contains L and L' respectively. Resolution may be applied at only one complementary literal pair at once and repeating literals in the result may be removed.
Intuitively, the validity of this rule can be explained by the fact both clauses that contain C and C' have to be true in order to satisfy the instance. If the resolvent is true, then this means that at least C or C' has to be true, thus one of the two clauses is satisfied. In either case you can initialize the variable that belongs to L and L' in such a way that the other clause is satisfied as well. Therefore, L and L' can be eliminated in the resolvent.