# Test-Driven Development in ACL2

Professional programmers know a simple fact: If your code isn't **extensively** tested, it's not working. In fact, this is the biggest reason why code reuse is so important. It's not that writing a linked list is hard. You learned how to do that in your first or second programming class. The reason why we use `List<Integer>` instead of writing our own linked list of integers in Java is that `List<Integer>` has been tested extensively, so we can be confident that it works.

Let me be very clear. Testing is not a thing that you do in class because your professors make you. **Testing is something that you should always do as a programmer.** In fact, learning how to test is an essential part of becoming a professional. It's what separates us from the folks who write code like this:

<img src="https://miro.medium.com/max/500/0*vsvLVt-w4WivlTOn.jpg">

To illustrate how to test functions in ACL2, let's begin with the definitions of two functions from a previous tutorial:

In [1]:
(defsnapshot triangular-definition)

(definec triangular (n :nat) :nat
  (if (zp n)
      0
      (+ (triangular (- n 1)) n)))

(defsnapshot zeta2-definition)

(definec zeta2 (n :nat) :rational
  (if (zp n)
      0
      (+ (zeta2 (- n 1)) 
         (/ 1 (* n n)))))

ACL2S !>>(DEFSNAPSHOT TRIANGULAR-DEFINITION)

Summary
Form:  ( DEFLABEL TRIANGULAR-DEFINITION ...)
Rules: NIL
Time:  0.00 seconds (prove: 0.00, print: 0.00, other: 0.00)
 TRIANGULAR-DEFINITION
ACL2S !>>(DEFINEC TRIANGULAR (N NAT)
                  NAT
                  (IF (ZP N)
                      0 (+ (TRIANGULAR (- N 1)) N)))

Form:  ( TEST-DEFINITION TRIANGULAR ... )
Form:  ( TEST-BODY-CONTRACTS TRIANGULAR... ) 
Form:  ( TEST-FUNCTION-CONTRACT TRIANGULAR ...) 
Testing: Done 
Elapsed Run Time: 0.55 seconds
Form:  ( ADMIT-DEFINITION TRIANGULAR ... )
Time:  0.01 seconds (prove: 0.00, print: 0.00, other: 0.01)
Form:  ( PROVE-FUNCTION-CONTRACT TRIANGULAR ... )
Time:  0.08 seconds (prove: 0.03, print: 0.00, other: 0.05)
Form:  ( PROVE-BODY-CONTRACTS TRIANGULAR ... )
Time:  0.00 seconds (prove: 0.00, print: 0.00, other: 0.00)
Elapsed Run Time: 0.14 seconds
Function Name : TRIANGULAR 
Termination proven -------- [*] 
Function Contract proven -- [*] 
Body Contracts proven ----- [*]
 T
AC

## Unit Tests

At the very least, you should always write several **unit tests** for each of your functions. Professional programmers routinely write dozens of unit tests per function, but I will only ask you to write five for each.

A **unit test** simply checks that the function returns the right value for some given inputs. I usually test that the function does the right thing for a few inputs that I compute by hand. Later, if I discover a bug in the code, I add that particular input to my collection of tests, so the collection of tests is always growing.

Here is a simple unit test for `triangular`. We know that `(triangular 3) = 1+2+3 = 6`. In ACL2, you can perform this unit test using the function `check-expect`, as follows:

In [4]:
(check-expect (triangular 3) 6)

The output should look like this:

    ACL2S !>>(CHECK-EXPECT (TRIANGULAR 3) 6)
     :PASSED

The `:PASSED` is the indication that the test passes, as expected. 

To see what happens when ACL2 discovers an error, you can change the value "6" in that `check-expect`:

In [5]:
(check-expect (triangular 3) -9999999)

ACL2S !>>(CHECK-EXPECT (TRIANGULAR 3) -9999999)

Error in CHECK-EXPECT: Check failed (values not equal).
First value:  6
Second value: -9999999


It can sometimes be more convenient to leave the expected value as an expression without fully computing it, as in the following:

In [6]:
(check-expect (triangular 5) (+ 1 2 3 4 5))

ACL2S !>>(CHECK-EXPECT (TRIANGULAR 5)
                       (+ 1 2 3 4 5))
 :PASSED
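The same approach works for `zeta2`, whose values are exact rationals. Here is a minimal sketch of a five-test collection, assuming the `triangular` and `zeta2` definitions from the first cell have already been admitted in the session. The expected values were computed by hand; for example, $1 + \frac{1}{4} + \frac{1}{9} = \frac{49}{36}$.

```lisp
;; Unit tests; these assume triangular and zeta2 (defined earlier)
;; are already admitted in the current ACL2s session.
(check-expect (triangular 0) 0)    ; the empty sum
(check-expect (triangular 1) 1)
(check-expect (zeta2 0) 0)         ; the empty sum
(check-expect (zeta2 2) 5/4)       ; 1 + 1/4
(check-expect (zeta2 3) 49/36)     ; 1 + 1/4 + 1/9
```

Including the boundary input `0` is a deliberate choice: base cases are a common place for bugs to hide.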

## Properties and Randomized Testing

Most programming languages have a unit-testing framework, e.g., JUnit for Java. These are great for checking what your code does for specific inputs, but as programmers we often have more general beliefs about our programs that should also be checked. For example, you may believe that no matter what the input to your function, it will **never** return null. To test properties like this, you need to test with a large number of inputs, and this is where testing with **random inputs** makes sense.

For instance, you may remember that $1+2+\cdots+n=\frac{n(n+1)}{2}$. ACL2 provides the command `test?` to test properties like this. What ACL2 will do is to try many different random values of $n$ and see if the property holds for each value. Let's see how it works!


In [8]:
(test? (equal (triangular n) 
              (/ (* n (+ n 1)) 2)))

The result probably looks something like this:

    **Summary of Cgen/testing**
    We tested 223 examples across 1 subgoals, of which 222 (222 unique)
    satisfied the hypotheses, and found 3 counterexamples and 219 witnesses.
    
    We falsified the conjecture. Here are counterexamples:
     [found in : "top"]
     -- ((N 8/49))
     -- ((N #C(2/3 1)))
     -- ((N -45))
    
    Cases in which the conjecture is true include:
     [found in : "top"]
     -- ((N '((T) T)))
     -- ((N '(A)))
     -- ((N '(1 . T)))
    
    Test? found a counterexample.

This is not what we expected! ACL2 discovered that our property is not true. (Actually, a cynic would say that this **is** as expected. Programmers are often mistaken in their beliefs about their programs.)

ACL2 proactively helps us debug the program by giving us random inputs where the program failed to satisfy the property. In particular, we can see above that the program does not work correctly when `n` is `8/49`, `#c(2/3 1)` (which is a complex number), or `-45`. This should give you a very good idea of what went wrong. We had intended `n` to be a natural number, but ACL2 is finding that our property is not true in some cases where `n` is a rational, or a complex number, or a negative integer. What we can see is that our property is buggy, though if we're lucky the program is correct.

Let's fix the property. We believe that $1+ 2+\cdots+n=\frac{n(n+1)}{2}$, but only when $n$ is a natural number. This is where logic comes in. We can express this property using logical implication. Oh, one more thing: The ACL2 built-in `(natp n)` is true precisely when `n` is a natural number.

In [10]:
(test? (implies (natp n)
                (equal (triangular n) 
                       (/ (* n (+ n 1)) 2))))

When you submit that test to ACL2, you should see something like the following:

    **Summary of Cgen/testing**
    We tested 3000 examples across 3 subgoals, of which 2939 (2939 unique)
    satisfied the hypotheses, and found 0 counterexamples and 2939 witnesses.
    
    Cases in which the conjecture is true include:
     [found in : "top"]
     -- ((N 732))
     -- ((N 14))
     -- ((N 767))
    
    Test? succeeded. No counterexamples were found.

The important line is the last one. It says that no counterexamples were found, which means that every random value of `n` that ACL2 tried satisfied the property; they all passed! In the first paragraph, ACL2 tells us that it generated 3,000 random examples, of which 2,939 (all of them distinct) satisfied the hypothesis `(natp n)` and so actually exercised the property. Among those 2,939 values were 732, 14, and 767. They all passed, so that's 2,939 out of 2,939. You should probably feel pretty confident that the property is right. It's still possible that it fails in some strange cases, but the odds are in your favor.
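Other beliefs about `triangular` can be tested in the same way. As a sketch, the two properties below (chosen here for illustration, assuming `triangular` is already defined) say that each triangular number exceeds the previous one by exactly `n`, and that `triangular` never returns less than its input. Note that the first one uses the hypothesis `(posp n)`, so that `(- n 1)` is still a natural number.

```lisp
;; Consecutive triangular numbers differ by exactly n.
(test? (implies (posp n)
                (equal (- (triangular n) (triangular (- n 1)))
                       n)))

;; triangular never returns less than its input.
(test? (implies (natp n)
                (<= n (triangular n))))
```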

We can do the same thing with `(zeta2 n)`. Euler discovered that $\zeta(2) = \pi^2/6$. $\zeta(2)$ is an infinite sum, and `(zeta2 n)` approximates $\zeta(2)$ by adding up the first $n$ terms of that sum. Since every term is positive, we have that

$$(zeta2\, n) \le \zeta(2) = \frac{\pi^2}{6} \le \frac{3.1416^2}{6}$$
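It helps to know what the right-hand bound is as an exact number. ACL2 reads the decimal `3.1416` as the exact rational $3927/1250$, so the bound works out as follows:

$$\frac{3.1416^2}{6} = \frac{(3927/1250)^2}{6} = \frac{15421329}{9375000} = \frac{5140443}{3125000} \approx 1.644942$$

This is slightly larger than $\pi^2/6 \approx 1.644934$, as the inequality requires.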

We can test this with ACL2:

In [12]:
(test? (<= (zeta2 n) (/ (expt 3.1416 2) 6)))

This is a mixed success. ACL2 reports that

    **Summary of Cgen/testing**
    We tested 3000 examples across 3 subgoals, of which 2762 (2762 unique)
    satisfied the hypotheses, and found 0 counterexamples and 2762 witnesses.
    
    Cases in which the conjecture is true include:
     [found in : "top"]
     -- ((N '(-9/14 1/2 -1)))
     -- ((N '(2 . T)))
     -- ((N NIL))
    
    Test? succeeded. No counterexamples were found.

On the one hand, ACL2 did not find any counterexamples, so all random values of `n` succeeded. On the other hand, when you look at some of those random values of `n`, it's clear that ACL2 is **not** testing with appropriate values. After all, we intend that `n` is a natural number, so why are we testing when `n=NIL`?

Again, we missed a hypothesis in our property. Let's fix it.

In [13]:
(test? (implies (natp n)
                (<= (zeta2 n) (/ (expt 3.1416 2) 6))))

ACL2S !>>(TEST? (IMPLIES (NATP N)
                         (<= (ZETA2 N)
                             (/ (EXPT 3927/1250 2) 6))))

**Summary of Cgen/testing**
We tested 3000 examples across 3 subgoals, of which 2882 (2882 unique)
satisfied the hypotheses, and found 0 counterexamples and 2882 witnesses.

Cases in which the conjecture is true include:
 [found in : "top"]
 -- ((N 86))
 -- ((N 30))
 -- ((N 3))

Test? succeeded. No counterexamples were found.

That's more like it! This time ACL2 tested our property with 2,882 values of `n`, and the property passed all those tests. And, each one of those random values of `n` was a natural number, like 86, 30, and 3. It's still possible that the property fails for some other values of `n`, but this does make me confident that the property is true.

**Beware of overconfidence!** We said above that "this does make me confident that the property is true," but we did not say "this does make me confident that the program works." The property we mentioned, namely that `(zeta2 n)` is less than or equal to $\pi^2/6$, is not enough by itself to guarantee that the program works. For example, it could be the case that the function `(zeta2 n)` always returns 0, and this property would be true! In the testing community, these properties are known as **little theories**. They can help you gain confidence in your program, even though they do not give you a total guarantee of success. Such properties are very common in practice, but you should always remember their limitations. It's possible to give more reassuring properties. For example, if we could show that both of these properties are true, I would feel very confident in the program:

* $\zeta_2(n) \le \pi^2/6$
* $\epsilon > 0 \rightarrow [(\exists N)\, (n>N \rightarrow \pi^2/6 - \epsilon \le \zeta_2(n))]$

When it really matters that your program works correctly, you may need to test both of the properties above, but testing the second property is much harder than testing the first, because of the $\exists N$. (You may recall how much easier it is to prove properties with $\forall x$ than with $\exists x$, since for an $\exists x$ you actually have to *find* an $x$ with the right property.)
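To make these limitations concrete, here is a minimal sketch. The function `zeta2-zero` is a hypothetical, deliberately wrong implementation introduced only for illustration, and the values $\epsilon = 1/100$ and $N = 100$ are hand-picked assumptions. The sketch assumes `zeta2` is already defined. The upper-bound property cannot tell `zeta2-zero` apart from `zeta2`, whereas a single fixed instance of the second property can.

```lisp
;; Hypothetical buggy implementation: always returns 0.
(definec zeta2-zero (n :nat) :rational
  (* 0 n))

;; The upper bound also holds for the buggy function, so this test
;; cannot reveal the bug.
(test? (implies (natp n)
                (<= (zeta2-zero n) (/ (expt 3.1416 2) 6))))

;; One instance of the second property, with epsilon = 1/100 and
;; N = 100 chosen by hand. Using 3.1416^2/6 (an upper bound on pi^2/6)
;; makes the tested inequality slightly stronger than the original.
(test? (implies (and (natp n) (> n 100))
                (<= (- (/ (expt 3.1416 2) 6) 1/100)
                    (zeta2 n))))

;; The same instance applied to the buggy function is false for every
;; n, so test? should report counterexamples here.
(test? (implies (and (natp n) (> n 100))
                (<= (- (/ (expt 3.1416 2) 6) 1/100)
                    (zeta2-zero n))))
```

Choosing $N$ by hand is exactly the "find a witness" step that makes the existential property hard to test in general: a different $\epsilon$ needs a different $N$.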

## Properties and Proofs

When it really matters that your program works correctly, randomized testing is better than just unit tests, but even randomized testing may not be enough. This is when you consider going the extra mile and **proving** that your property is true.

ACL2 is more than a programming language; it is a state-of-the-art theorem prover. In industry, ACL2 is used in various settings where correctness is essential, e.g., in the design of hardware chips or mission-critical software. We'll use ACL2 to (try to) prove the properties we tested above.

To prove a property in ACL2, you have two choices:

* `(thm ...property...)`
* `(defthm name ...property...)`

The first form tries to prove the property and then essentially forgets about it. The second form gives the property a name, and remembers the property. If ACL2 remembers a property, it is able to use it later when proving other properties. This is the way that ACL2 is used in industry. It is almost never the case that ACL2 discovers the proof of your property automatically. So what you have to do is prove a sequence of theorems that lead up to the property you want. Industrial proof efforts will often require hundreds (or thousands) of these intermediate theorems!

But let's just try to prove the first property and see what ACL2 can do.

In [14]:
(thm (implies (natp n)
              (equal (triangular n) 
                     (/ (* n (+ n 1)) 2))))

ACL2S !>>(THM (IMPLIES (NATP N)
                       (EQUAL (TRIANGULAR N)
                              (/ (* N (+ N 1)) 2))))

risk has been detected for a call of function ACL2::TEST-CHECKPOINT
(as possibly leading to an ill-guarded call of CGEN::UI); see :DOC
invariant-risk.


By the simple :definition NATP and the simple :rewrite rule 
ACL2::|(* (* x y) z)| we reduce the conjecture to

Goal'
(IMPLIES (AND (INTEGERP N) (<= 0 N))
         (EQUAL (TRIANGULAR N)
                (* N (+ N 1) 1/2))).

risk has been detected for a call of function ACL2::TEST-CHECKPOINT
(as possibly leading to an ill-guarded call of CGEN::UI); see :DOC
invariant-risk.


This simplifies, using the :definition SYNP, the :executable-counterpart
of BINARY-* and the :rewrite rules ACL2::|(* x (+ y z))|, ACL2::|(* x x)|,
ACL2::|(* y (* x z))|, ACL2::|(* y x)|, ACL2::|(+ y x)|, 
ACL2::BUBBLE-DOWN-*-MATCH-1 and ACL2::NORMALIZE-FACTORS-GATHER-EXPONENTS,
to

Goal''
(IMPLIES (AND (INTEGERP N) (<= 0 N))
         (E

Success! The last lines of the output look like this:

    **Summary of Cgen/testing**
    We tested 2000 examples across 2 subgoals, of which 1930 (1930 unique)
    satisfied the hypotheses, and found 0 counterexamples and 1930 witnesses.

    Cases in which the conjecture is true include:
     [found in : "Goal"]
     -- ((N 667))
     -- ((N 7))
     -- ((N 1))
    
    Proof succeeded.

The last line is the most important one; it says that the proof succeeded, which means that our conjecture about `(triangular n)` is true. As you can see from the output, ACL2 first used randomized testing to check the property. After all, if testing revealed that the property is false, there's no point in trying to find a proof.

The output from ACL2 is actually quite verbose. Before getting this summary at the end, ACL2 does two things. First, it describes the proof that it found. In this particular case, ACL2 used mathematical induction on the variable `n`. Second, it lists all the facts that it used in the proof. For example, this proof used the fact called `|(expt (+ x y) 2)|`, which is the familiar result from algebra that $(x+y)^2 = x^2 + 2xy + y^2$. That's what we meant earlier when we said that ACL2 uses previously-proved properties as it's trying to prove a new property.

If you use `thm` to prove a property, ACL2 will essentially forget it after it's done with the proof, so it will not be able to use this fact later, as it tries to prove something else. If you want ACL2 to remember the property so it can be used later, you use `defthm` instead of `thm`. Let's do that!

In [15]:
(defthm triangular-formula
  (implies (natp n)
           (equal (triangular n) 
                  (/ (* n (+ n 1)) 2))))

ACL2S !>>(DEFTHM TRIANGULAR-FORMULA
                 (IMPLIES (NATP N)
                          (EQUAL (TRIANGULAR N)
                                 (/ (* N (+ N 1)) 2))))

risk has been detected for a call of function ACL2::TEST-CHECKPOINT
(as possibly leading to an ill-guarded call of CGEN::UI); see :DOC
invariant-risk.


By the simple :definition NATP and the simple :rewrite rule 
ACL2::|(* (* x y) z)| we reduce the conjecture to

Goal'
(IMPLIES (AND (INTEGERP N) (<= 0 N))
         (EQUAL (TRIANGULAR N)
                (* N (+ N 1) 1/2))).

risk has been detected for a call of function ACL2::TEST-CHECKPOINT
(as possibly leading to an ill-guarded call of CGEN::UI); see :DOC
invariant-risk.


This simplifies, using the :definition SYNP, the :executable-counterpart
of BINARY-* and the :rewrite rules ACL2::|(* x (+ y z))|, ACL2::|(* x x)|,
ACL2::|(* y (* x z))|, ACL2::|(* y x)|, ACL2::|(+ y x)|, 
ACL2::BUBBLE-DOWN-*-MATCH-1 and ACL2::NORMALIZE-FACTORS-GATHER-EXPONENTS,
to

Goal''
(IM

Again, the proof succeeds. The output is similar to what we saw with `thm` (ACL2 did find the same proof, after all), but instead of finishing with "Proof succeeded", ACL2 finished with the name of the theorem. That's the name that it will use to remember this fact, `TRIANGULAR-FORMULA` in this case.
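Because `TRIANGULAR-FORMULA` is now stored, later proof attempts can use it. As a hedged sketch, the corollary below (an illustrative property chosen here) is one that ACL2 may be able to prove by rewriting `(triangular n)` with the stored formula and then simplifying the arithmetic. The proof ACL2 actually finds may differ.

```lisp
;; Assumes triangular-formula has already been proved with defthm.
(thm (implies (natp n)
              (equal (* 2 (triangular n))
                     (* n (+ n 1)))))
```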

That worked very well! Let's do it again, this time with the property about `zeta2`:

In [17]:
(thm (implies (natp n)
              (<= (zeta2 n) (/ (expt 3.1416 2) 6))))

ACL2S !>>(THM (IMPLIES (NATP N)
                       (<= (ZETA2 N)
                           (/ (EXPT 3927/1250 2) 6))))

risk has been detected for a call of function ACL2::TEST-CHECKPOINT
(as possibly leading to an ill-guarded call of CGEN::UI); see :DOC
invariant-risk.


By the simple :definition NATP and the :executable-counterparts of
BINARY-* and EXPT we reduce the conjecture to

Goal'
(IMPLIES (AND (INTEGERP N) (<= 0 N))
         (<= (ZETA2 N) 5140443/3125000)).

risk has been detected for a call of function ACL2::TEST-CHECKPOINT
(as possibly leading to an ill-guarded call of CGEN::UI); see :DOC
invariant-risk.


Name the formula above *1.

Perhaps we can prove *1 by induction.  Two induction schemes are suggested
by this conjecture.  Subsumption reduces that number to one.  

We will induct according to a scheme suggested by (ZETA2 N).  This
suggestion was produced using the :induction rules ZETA2 and 
ZETA2-INDUCTION-SCHEME.  If we let (:P N) denote *1 above then the
induct

We were not as lucky this time! The proof 

    ******** FAILED ********

as ACL2 puts it. I.e., ACL2 was unable to find a proof of our property. However, you can see from the output that ACL2 tested the property 4,000 times, and each time the property worked. What we see here is that randomized testing hasn't found a bug, but ACL2 cannot find a proof that there aren't any bugs.

This is very common. The truth is we got quite lucky with the proof of `TRIANGULAR-FORMULA` above. As we mentioned earlier, this is where you would have to come up with some intermediate properties that ACL2 can use to find the proof of the property we want. The process requires you to look at the proof attempt, try to understand why it failed, and use that to discover a key lemma that may be useful. Often, the "Key Checkpoints" that ACL2 mentions at the end of the failed proof attempt are good places to start. The tutorial on "The Method" describes this process in more detail. For this specific property, however, finding a proof in ACL2 would actually be quite difficult, so we will simply move on. If you're really interested in this, a proof of this fact (called the Basel Problem) would make a wonderful Master's thesis!
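As an illustration of what a candidate intermediate property might look like, here is a sketch that tests (but does not prove) a weaker, classical bound: for $n \ge 1$, the partial sum is at most $2 - 1/n$. This lemma is an assumption chosen for illustration, and it only bounds `zeta2` by 2 rather than by $\pi^2/6$. The sketch assumes `zeta2` is already defined.

```lisp
;; Candidate lemma: for n >= 1, zeta2(n) <= 2 - 1/n.
;; Tested only; whether ACL2 can prove it unaided is not claimed here.
(test? (implies (posp n)
                (<= (zeta2 n)
                    (- 2 (/ 1 n)))))
```

Testing a candidate lemma with `test?` before attempting `defthm` is a cheap way to avoid spending effort trying to prove something that is false.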