
# Answer Selected Questions
## 19 March 2021

## 1

The question is one I'm asking you, after showing you something amazing --- that **multivariable polynomials can simulate algorithms**!

In 1976, James Jones, Daihachiro Sato, Hideo Wada, and Douglas Wiens published a polynomial with 26 variables (all the letters of the alphabet!).

Amazingly, as the variables range over the nonnegative integers, the set of **positive values** of this polynomial is exactly the set of all prime numbers!


### Here it is!

$$\begin{array}{rcl}
(k + 2)(1 &-& [wz + h + j - q]^2\\
          &-& [(gk + 2g + k + 1)(h + j) + h - z]^2\\
	  &-& [16(k + 1)^3(k + 2)(n + 1)^2 + 1 - f^2]^2\\
	  &-& [2n + p + q + z - e]^2\\
	  &-& [e^3(e + 2)(a + 1)^2 + 1 - o^2]^2\\
	  &-& [(a^2 - 1)y^2 + 1 - x^2]^2\\
	  &-& [16r^2 y^4(a^2 - 1) + 1 - u^2]^2\\
	  &-& [n + l + v - y]^2\\
	  &-& [((a + u^2(u^2 - a))^2 - 1)(n + 4dy)^2 + 1 - (x + cu)^2]^2\\
	  &-& [(a^2 - 1)l^2 + 1 - m^2]^2\\
	  &-& [q + y(a - p - 1) + s(2ap + 2a - p^2 - 2p - 2) - x]^2\\
	  &-& [z + pl(a - p) + t(2ap - p^2 - 1) - pm]^2\\
	  &-& [ai + k + 1 - l - i]^2\\
	  &-& [p + l(a - n - 1) + b(2an + 2a - n^2 - 2n - 2) - m]^2)\end{array}$$


#### Question to Answer

Given $k=0$, how would you find some possible values for $w, z, h, j, q$, and $g$, that *could* result in the value $2$, the first prime number, for the entire polynomial?

Recall from Chapter 1, you can write one line of Python code that searches over all x, y, z in range(3) such that the equation below is true:

$3x^2 - 2xy - y^{2}z - 7 = 0.$

```python
print({(x, y, z) for x in range(3) for y in range(3) for z in range(3) if 3*x*x - 2*x*y - y*y*z - 7 == 0 })
```
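
One possible way to start on the question, offered as a sketch rather than a full answer: with $k = 0$ the leading factor $(k + 2)$ is already $2$, so the polynomial can equal $2$ only if every squared bracket is $0$. The search below checks only the first two brackets, which are the ones that mention $w$, $h$, $j$, and $g$. The bound `range(3)` is an arbitrary choice borrowed from the Chapter 1 example, and the name `candidates` is made up for illustration. Any tuple it finds would still have to make the remaining brackets vanish.

```python
from itertools import product

k = 0  # given in the question, so the leading factor (k + 2) equals 2

# Keep the tuples (w, z, h, j, q, g) for which the first two brackets are zero.
candidates = {
    (w, z, h, j, q, g)
    for w, z, h, j, q, g in product(range(3), repeat=6)
    if w*z + h + j - q == 0
    and (g*k + 2*g + k + 1)*(h + j) + h - z == 0
}
print(len(candidates), sorted(candidates)[:5])
```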

## 2

How does every language in NP reduce to 3SAT? Theorem 16.5 on page 255, the Cook-Levin theorem, is hard for me to understand, and Ganesh doesn't explain it very well. Please?

### Answer

You're right, it's pretty confusing.

On page 256 Ganesh says **Here is how SAT enters the picture**, and then goes on to say
* We can capture the evolution from ID$_i$ to ID$_{i+1}$ through a *3CNF* Boolean formula of polynomial length $\phi_i$. The construction of this formula is described in many references [42] in this field, and we don't repeat that. Fortunately, this single formula can capture *all the nondeterministic evolutions* from layer $i$ to layer $i+1$ in one shot.
* We can also introduce formula $\phi_0$ to capture the constraints on ID$_0$ and formula $\phi_{ACC}$ to capture the constraints on the final ID containing the accepting ID.
* Thus, the entire "pile of IDs" depicted in Figure 16.8 can be captured by a formula:
  $$\Phi = \phi_0\wedge \phi_1\wedge \ldots\wedge \phi_i\wedge\ldots\wedge \phi_{ACC}$$
  This formula encodes all the nondeterministic evolutions from start to finish of the NDTM. *All the NDTM paths* are rolled into this single formula.

This is an amazing claim! Let's walk through some of the argument that Ganesh leaves to the references:

The upshot of Cook's Theorem is that *SAT* is **AT LEAST** as hard as any other problem in *NP*!
 
In his proof, Cook gave the details of the transformation (reduction) that consists of Sudoku-encoding-like systems of clauses that express in logical form the complete operation of a Turing machine.
 
The trick is to express, in the logical language of predicates and propositions, **EVERYTHING** this Turing machine can and cannot do!


#### What Can an NP TM Do (and Not Do)?

It can compute for at most $p(n)$ steps, where $p$ is some polynomial and $n$ is the problem size.
 
At any time $t$, $0 \le t \le p(n)$, it can be in only one state, not two or more.

Let $S(t,q) =$ "at time $t$, TM is in state $q$ (where $q \in Q$)."
 
This type of expression in the predicate calculus is equivalent to but somewhat easier to "get" than a purely propositional formula.
 
Expressed logically, we would say:
 
$S(t,q) \rightarrow \lnot S(t,q^{\prime})\ \mbox{for}\ q,q^{\prime} \in Q, q \ne q^{\prime}, t = 0\ldots p(n).$
 
How many of these clauses are there?

$(p(n) + 1) \cdot |Q| \cdot (|Q| - 1).$

Remember that these clauses are easily and efficiently (in *polynomial* time) convertible to CNF, as SAT requires. 
 
Using the logical equivalence $A \rightarrow B \equiv \lnot A \lor B$, conditionals like $S(t,q) \rightarrow \lnot S(t,q^{\prime})$ can be converted to the equivalent desired clause form: $\lnot S(t,q) \lor \lnot S(t,q^{\prime})$.
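
To make the count concrete, here is a minimal sketch that generates the state-uniqueness clauses in the form $\lnot S(t,q) \lor \lnot S(t,q^{\prime})$ for a toy machine. The function name, the tuple encoding of literals, and the values `p_n=4` and three states are all assumptions chosen for illustration. Like the formula in the text, it counts ordered pairs $q \ne q^{\prime}$, so each symmetric clause shows up twice.

```python
from itertools import permutations

def state_uniqueness_clauses(p_n, states):
    """Return clauses (not S(t,q) or not S(t,q')) for every t and ordered pair q != q'."""
    clauses = []
    for t in range(p_n + 1):                      # t = 0 .. p(n)
        for q, q_prime in permutations(states, 2):
            # A literal is (is_positive, predicate_name, arguments).
            clauses.append([(False, "S", (t, q)), (False, "S", (t, q_prime))])
    return clauses

clauses = state_uniqueness_clauses(p_n=4, states=["q0", "q1", "q2"])
print(len(clauses))  # (4 + 1) * 3 * 2 = 30
```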


#### Only the Beginning

What else can a TM do?
 
The TM's cells can hold only one symbol at a time.
 
$T(t,c,s) = $ "at time $t$, cell $c$ contains symbol $s$."
 
How do we express uniqueness of cell contents?
 
$T(t,c,s) \rightarrow  \lnot T(t,c,s^{\prime})$

where $t = 0\ldots p(n)$ as before; and
 
$c = -p(n), -p(n) + 1, \ldots, -1, 0, 1, \ldots, p(n)$; and
 
$s,s^{\prime} \in A$ (the tape Alphabet), $s \ne s^{\prime}$.
 
Note that cells run from $-p(n)$ to $p(n)$, which simply means that in the time allotted to the TM, it cannot scan any farther left than the cell at $-p(n)$ or any farther right than cell number $p(n)$.

How many of these clauses are there?

$(p(n) + 1) \cdot (2p(n) + 1) \cdot |A| \cdot (|A| - 1).$


#### Single Scanning

The TM's read/write head can scan, at any time $t$, one and only one cell.
 
$H(t,c) = $ "at time $t$, the Head scans cell number $c$."
 
$H(t,c) \rightarrow \lnot H(t,c^{\prime})$
 
$t = 0\ldots p(n)$;
 
$c = -p(n), -p(n) + 1, \ldots, -1, 0, 1, \ldots, p(n)$;
 
$c \ne c^{\prime}$.
 
How many of these clauses are there?
 
$(p(n) + 1) \cdot (2p(n) + 1) \cdot (2p(n)).$
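
As a sanity check on what these clauses say, the sketch below fixes one time $t$ and a toy $p(n) = 2$ (both assumptions for illustration) and tries every truth assignment to $H(t,c)$. It confirms that the pairwise clauses $\lnot H(t,c) \lor \lnot H(t,c^{\prime})$ all hold exactly when at most one cell is marked as scanned. The helper name `satisfies_exclusions` is hypothetical.

```python
from itertools import product, permutations

cells = range(-2, 3)  # cells -p(n) .. p(n) for a toy p(n) = 2, at one fixed time t

def satisfies_exclusions(head):
    """head maps cell -> bool for H(t, c); check every clause not H(t,c) or not H(t,c')."""
    return all(not head[c] or not head[c2] for c, c2 in permutations(cells, 2))

# Try all 2^5 assignments: the clauses hold exactly when at most one cell is scanned.
for bits in product([False, True], repeat=len(cells)):
    head = dict(zip(cells, bits))
    assert satisfies_exclusions(head) == (sum(bits) <= 1)

print(len(list(permutations(cells, 2))))  # (2p(n) + 1) * 2p(n) = 5 * 4 = 20 per time step
```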

#### Guaranteed Prohibited

So much for what a TM is supposed to do.
 
How about guaranteeing the TM will **NOT** do what it's **NOT** supposed to do?
 
For example, it cannot change a symbol that it is not currently scanning.
 
$(T(t,c,s) \land \lnot H(t,c)) \rightarrow  T(t + 1,c,s);$
 
$t = 0\ldots p(n)$;
 
$c = -p(n), -p(n) + 1, \ldots, -1, 0, 1, \ldots, p(n)$;
 
$s \in A$.
 
At time $t$ cell $c$ may contain symbol $s$, and yet the head is not scanning cell $c$, so the symbol remains unchanged at time $t + 1$.
 
(Remember, using the logical equivalence of the conditional and De Morgan's laws, we can still very easily convert this type of clause to CNF: here it becomes $\lnot T(t,c,s) \lor H(t,c) \lor T(t + 1,c,s)$.)
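
A quick truth-table check of that conversion, as a sketch: the three Boolean names below stand for $T(t,c,s)$, $H(t,c)$, and $T(t+1,c,s)$, and the clause form $\lnot T(t,c,s) \lor H(t,c) \lor T(t+1,c,s)$ is what the conditional rule plus De Morgan's laws give. The variable and function names are made up for illustration.

```python
from itertools import product

def implies(a, b):
    """Material conditional: a -> b is (not a) or b."""
    return (not a) or b

rows = []
for T_now, H_now, T_next in product([False, True], repeat=3):
    conditional = implies(T_now and not H_now, T_next)   # (T and not H) -> T'
    clause = (not T_now) or H_now or T_next              # not T or H or T'
    rows.append(conditional == clause)

all_match = all(rows)
print(len(rows), all_match)  # 8 True
```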
 
You determine how many of these clauses there are!

#### Conventions

* At time $t = 0$, the TM (by convention) scans cell 0.

$H(0, 0)$

* At time $t = 0$, the TM (again, by convention) is in state 0.

$S(0, 0)$

* At time $t = 0$, cells 1 through $n$ contain an instance of the problem at hand.

$T(0,i,s_i)\ \mbox{for}\ i = 1,2,\ldots,n$

At time 0, cell $i$ contains $s_i$, the $i^{th}$ symbol in the instance string the generic transformation takes as input.

* If the TM has $m$ states then (assume without loss of generality that) state $m$ is the accept state.

$S(p(n),m)$


#### Termination and Legal Transitions

* The TM must have halted by time $p(n)$, and it must have answered "yes" $(1)$.

$T(p(n), 0, 1)$.

This means at time $p(n)$ cell 0 has symbol 1 ("yes").

* The TM makes legal state transitions according to its "program" ($\delta$ transition function) --- which we can represent as a set of quintuples $\langle q, s, q^{\prime}, s^{\prime}, d \rangle$.
 
$q, q^{\prime} \in Q, s, s^{\prime} \in A, d \in \{L, R\}$
 
Recall what each quintuple means: whenever the TM is in state $q$ and reads the symbol $s$ on the tape, it enters state $q^{\prime}$, writes an $s^{\prime}$ in place of the $s$, and then moves one cell in the direction $d$.

#### Final Three

Three separate sets of clauses specify the three possible effects of being in a given state and reading a particular symbol.  A new state must be entered, a new symbol written, and a direction taken by the read/write head.

1. $(S(t,q) \land T(t,c,s) \land H(t,c)) \rightarrow S(t + 1, q^{\prime})$
2. $(S(t,q) \land T(t,c,s) \land H(t,c)) \rightarrow T(t + 1, c, s^{\prime})$
3. $(S(t,q) \land T(t,c,s) \land H(t,c)) \rightarrow H(t + 1, c^{\prime})$

 Again, $t = 0\ldots p(n), c = -p(n),\ldots,p(n), q \in Q$ and $s \in A.$
 
For each combination of $t$, $c$, $q$, and $s$, there is a unique value for $q^{\prime}, s^{\prime},$ and $c^{\prime}$ specified by the program. The transformation computes these values by looking them up in a table containing the set of quintuples.
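
The table lookup can be sketched with a few nested loops. Everything named here is an assumption for illustration: the two-rule `program` is invented, literals are plain tuples with a leading `-` marking negation, and the time loop stops at $p(n) - 1$ so that $t + 1$ never exceeds $p(n)$. Each conditional is emitted directly in clause form, with the three premises negated.

```python
def transition_clauses(p_n, quintuples):
    """Yield CNF clauses for the three families, one triple per (t, c, quintuple)."""
    step = {"L": -1, "R": +1}
    for t in range(p_n):                      # assumption: t + 1 must stay <= p(n)
        for c in range(-p_n, p_n + 1):
            for q, s, q_new, s_new, d in quintuples:
                # Negated premise: not S(t,q) or not T(t,c,s) or not H(t,c)
                premise = [("-S", t, q), ("-T", t, c, s), ("-H", t, c)]
                yield premise + [("S", t + 1, q_new)]            # new state
                yield premise + [("T", t + 1, c, s_new)]         # new symbol
                yield premise + [("H", t + 1, c + step[d])]      # new head position

# Hypothetical two-rule program, invented for illustration only.
program = [(0, "a", 1, "b", "R"), (1, "b", 0, "a", "L")]
clauses = list(transition_clauses(p_n=3, quintuples=program))
print(len(clauses))  # 3 families * 3 times * 7 cells * 2 quintuples = 126
```

At the two edge cells the third family names a cell just outside the range. In this sketch that is left alone, since a head that starts at cell 0 and moves one cell per step cannot be at an edge cell before time $p(n)$.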
 
This completes the transformation.

#### What We Have

* A system of clauses generated by the generic transformation given a specific program instance and problem instance.
* The clauses can be generated in polynomial time --- using a computer running a high-level language, a series of nested loops with indices limited as above is perfectly capable of churning out these clauses about as quickly as it can print them.
* Correctness of the transformation is proved (rigorously) by a much more detailed method, in Cook's original proof.
* The key point --- the clauses limit the TM to behaving just as it should!

#### If So

* If this is the case, the clauses can be satisfied only if the machine halts before the polynomial time limit with a 1 occupying cell 0.  This, in turn, is possible only if the original problem instance has the answer "yes".
* The extremely large (but still polynomial-sized) set of clauses generated by Cook's transformation not only shows that SAT is NP-complete, but also demonstrates the use of both predicate and propositional calculus as encoding languages.
* The great difficulty in solving the Satisfiability Problem is surely due to this encoding power.
* Perhaps Satisfiability encodes other processes that could not possibly themselves be solved in polynomial time!

#### By the Way

Since you may have tried (and failed) to find a Sudoku solution with ```pycryptosat``` ---

see

http://swtv.kaist.ac.kr/courses/cs453-fall12/sudoku.pdf

and

https://www.researchgate.net/publication/228566840_Optimized_CNF_encoding_for_sudoku_puzzles

for more information.


#### 3SAT is NP-Complete

Since SAT is NP-Complete and can be transformed into 3SAT in polynomial time, and 3SAT is itself in NP, 3SAT is NP-Complete too. The transformation works clause by clause:

* Each clause with one or two literals becomes one with a literal replicated until the number is 3.
 
E.g., $(x_1 \lor x_2)$ becomes $(x_1 \lor x_2 \lor x_2)$.

* Each clause with more than three literals gets split into several clauses by adding dummy variables that preserve the satisfiability or unsatisfiability of the original clause.
 
E.g., $(x_1 \lor x_2 \lor x_3 \lor x_4)$ becomes $(x_1 \lor x_2 \lor z) \land (\lnot z \lor x_3 \lor x_4).$
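
The two rules above can be sketched in a few lines. The encoding is an assumption: a clause is a non-empty list of nonzero integers, with `-v` standing for $\lnot x_v$, and dummy variables take the next unused indices. The function name `to_3sat` is made up for illustration.

```python
def to_3sat(clauses, num_vars):
    """Convert CNF clauses (lists of nonzero ints, -v meaning not v) to 3-literal clauses."""
    result = []
    next_var = num_vars + 1               # first unused index, for dummy variables
    for clause in clauses:
        clause = list(clause)
        while len(clause) < 3:            # pad short clauses by repeating the last literal
            clause.append(clause[-1])
        while len(clause) > 3:            # split long clauses with a fresh dummy variable z
            z, next_var = next_var, next_var + 1
            result.append([clause[0], clause[1], z])
            clause = [-z] + clause[2:]
        result.append(clause)
    return result, next_var - 1           # new clauses and the new variable count

print(to_3sat([[1, 2], [1, 2, 3, 4]], num_vars=4))
# ([[1, 2, 2], [1, 2, 5], [-5, 3, 4]], 5)
```

Here variable 5 plays the role of the dummy $z$ in the example.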

## 3

This is a question about 16.7.1 NP-Hard Problems can be Undecidable (Pitfall in Proofs).

Could you go over the proof of Theorem 16.7.1 that the language *Diophantine* is NP-Hard?

### Answer

Sure!

Let's abbreviate *Diophantine*, and call it the language $D = \{\langle p \rangle\ |\ p$ is a polynomial in several variables having an integral root$\}$.

A sample polynomial: $6x^3z^2 + 3xy^2 - x^3 - 10.$

Recall that $D$ was proven to be undecidable.

Let's see how to prove it is also NP-Hard --- which means that all problems in NP are polynomial-time reducible to it, even though it may not be in NP itself.

So, how do we show that all problems in NP are polynomial-time reducible to $D$?

#### The Proof

There is a polytime mapping reduction from SAT to D, as follows:

Consider a CNF boolean formula $\phi$:
* Each positive literal $x$ of $\phi$ maps to the integer variable $x$.
* Each literal $\overline{x}$ maps to integer expression $(1 - x)$.
* Each $\lor$ in a clause maps to $*$ (times).
* Each clause is mapped as above, and then **squared**.
* Each $\land$ maps to $+$ (plus).
* The resulting equation is set to 0.

For example: $\phi = (x \lor y) \land (x \lor \overline{y}) \land (\overline{x} \lor \overline{y})$ maps to

$E = (xy)^2 + (x(1-y))^2 + ((1-x)(1-y))^2 = 0.$
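
As a sketch of the mapping, the code below evaluates the left-hand side of $E$ for a CNF formula at a given integer point, following the rules listed: a literal becomes $x$ or $(1 - x)$, $\lor$ becomes a product, each clause is squared, and $\land$ becomes a sum. The integer encoding of literals (`-v` for $\overline{x_v}$) and the function names are assumptions for illustration.

```python
def literal_value(lit, assignment):
    """Positive literal v maps to the integer x_v; negated literal maps to (1 - x_v)."""
    value = assignment[abs(lit)]
    return value if lit > 0 else 1 - value

def evaluate_E(cnf, assignment):
    """Sum over clauses of (product of literal values) squared."""
    total = 0
    for clause in cnf:
        prod = 1
        for lit in clause:
            prod *= literal_value(lit, assignment)
        total += prod ** 2
    return total

# phi = (x or y) and (x or not y) and (not x or not y), with x = 1, y = 2
phi = [[1, 2], [1, -2], [-1, -2]]
print(evaluate_E(phi, {1: 0, 2: 1}))   # 0, so x = 0, y = 1 is an integral root
print(evaluate_E(phi, {1: 1, 2: 1}))   # 1, so x = 1, y = 1 is not a root
```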

To show this is a mapping reduction, show $\phi$ is satisfiable **if and only if** $E$ has integral roots.

#### Forward Direction

$\phi$ is satisfied $\rightarrow E$ has integral roots:

For any assignment of a variable $v$:

* If $v$ is true, assign $v$ in the integer domain the value 0;
* If $v$ is false, assign $v$ the integer value 1.

In the example, $x$ = true, $y$ = false, satisfies $\phi$, and so the integer assignment $x = 0, y = 1$ ensures that

* $(xy)^2$ is zero;
* $(x(1-y))^2$ is zero;
* $((1-x)(1-y))^2$ is zero;

The sum of these three expressions is zero, thus satisfying $E$.

#### Reverse Direction

$E$ has integral roots $\rightarrow \phi$ is satisfied:

Note that $E = 0$ means that each product term in the integer domain is 0, since squares can't be negative.

For example, $x = 45, y = 0,$ still means $(xy)^2$ is 0.

The boolean assignment is found as follows:

* For every integer variable $x$ that is **zero**, assign the corresponding boolean variable $x$ to **true**.
* For every integer variable $x$ that is **non-zero**, assign the corresponding boolean variable $x$ to **false**.

For example, if $x = 0$ in a product term $(xy)$, assigning $x =$ **true** ensures $(x \lor y)$ is true.  Also, in $(x(1 - y))$, if $x = 45$ and $y = 1$, assigning $y =$ **false** and $x =$ **false** ensures that $(x \lor \overline{y})$ is true, because the zero factor $(1 - y)$ corresponds to the literal $\overline{y}$.

This construction ensures that $E = 0$ has an integral solution **exactly when** the corresponding formula $\phi$ has a satisfying assignment.
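
Both directions can be spot-checked by brute force on the example formula. This is only a sketch: the search window `range(-3, 4)` is an arbitrary assumption, so it illustrates the argument rather than proving it, and the helper names `E` and `satisfied` are made up.

```python
from itertools import product

def E(cnf, x):
    """Evaluate the sum of squared clause products; x maps variable index -> integer."""
    total = 0
    for clause in cnf:
        prod = 1
        for lit in clause:
            prod *= x[abs(lit)] if lit > 0 else 1 - x[abs(lit)]
        total += prod ** 2
    return total

def satisfied(cnf, truth):
    """True when every clause has at least one literal made true by the assignment."""
    return all(any(truth[abs(l)] == (l > 0) for l in clause) for clause in cnf)

phi = [[1, 2], [1, -2], [-1, -2]]      # the example formula, with x = 1, y = 2
variables = [1, 2]

# Forward: every satisfying assignment gives a root (true -> 0, false -> 1).
for bits in product([False, True], repeat=2):
    truth = dict(zip(variables, bits))
    if satisfied(phi, truth):
        assert E(phi, {v: 0 if truth[v] else 1 for v in variables}) == 0

# Reverse: every root in a small window gives a satisfying assignment (zero -> true).
roots = [dict(zip(variables, vals)) for vals in product(range(-3, 4), repeat=2)
         if E(phi, dict(zip(variables, vals))) == 0]
assert all(satisfied(phi, {v: r[v] == 0 for v in variables}) for r in roots)
print(roots)  # [{1: 0, 2: 1}]
```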

#### Emphasize the Pitfall

NP-Hard problems can be undecidable!

What happens if someone shows a language $L$ to be in NP-Hard but neglects to show $L$ is in NP, and yet claims $L$ is in NP-Complete?

That neglect means that someone may be claiming that something undecidable is decidable!

Recall that **all** NP-Complete problems are decidable.

NP-Completeness proofs **cannot** be correct unless the language in question is shown to belong to NP.