---
title: Using the C Bounded Model Checker as a TLA+
date: 2024-10-07
---

[TLA+](https://learntla.com/) and Alloy are lower-barrier-to-entry software verification tools. They are typically used on system- or protocol-level models rather than on the exact source code. Many bugs appear at this level, and these tools are super useful for clarifying your thinking.

CBMC is a tool I'm pretty bullish on. It is a bounded model checker for C code. It more or less unrolls all loops in normal compilable C to some depth.

What really makes this great is that there is no new language for software engineers to learn or accept. C has already made it in. People accept C as a useful, pragmatic language.

Ultimately, there is surprisingly little difference between ordinary programming and a logic if you are trying to see the similarities. Writing a logical spec is not that different from writing a programmatic check. CBMC takes its specs largely in the form of regular asserts.

I think these advantages are also useful even if you aren't trying to prove things about C code. For all our bellyaching about C undefined behavior and so on, if you stick to a boring subset of the language and avoid pointers, it is a reasonable "Imp" (the toy imperative language of programming-language textbooks). I kind of think it is more structured or typed than TLA+ is in its design.

So I think it's interesting to consider using C via CBMC for tasks that you might normally use TLA+ for. 
You get all sorts of tooling, training, and idioms for free.

Instead of writing a logical specification of the allowable transitions, you write a simulator. Instead of writing a logical specification of bad behavior, throw in some asserts.

This simulator is executable / compilable by default, and in fact it runs fast enough that you can fuzz it instead if you don't want to use CBMC.
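To make the "simulator plus asserts" idea concrete before the real examples, here is a minimal sketch. The turnstile model and every name in it (`Turnstile`, `turnstile_step`, `turnstile_run`) are made up for illustration and are not from the TLA+ examples. The transition function plays the role of the next-state relation, and the `assert` plays the role of the invariant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a coin-operated turnstile. */
typedef enum { COIN, PUSH } Event;

typedef struct {
    int locked;   /* 1 while the arm is locked */
    int coins;    /* coins accepted so far */
    int entries;  /* people let through so far */
} Turnstile;

/* The "allowable transitions", written as an ordinary step function. */
void turnstile_step(Turnstile *t, Event e) {
    switch (e) {
    case COIN:
        if (t->locked) { t->locked = 0; t->coins++; }
        break;
    case PUSH:
        if (!t->locked) { t->locked = 1; t->entries++; }
        break;
    }
    /* The "bad behavior" spec as an assert: nobody enters without paying. */
    assert(t->entries <= t->coins);
}

/* Replay a concrete event sequence from the initial (locked) state. */
Turnstile turnstile_run(const Event *events, size_t n) {
    Turnstile t = { 1, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        turnstile_step(&t, events[i]);
    }
    return t;
}
```

Natively you feed `turnstile_run` a concrete or random event array. Under CBMC the events would instead come from a function declared without a body, which CBMC treats as returning an arbitrary value.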

Follow along on colab https://colab.research.google.com/github/philzook58/philzook58.github.io/blob/master/pynb/2024-10-07-high_cbmc.ipynb

# Die Hard

This is a collection of TLA+ examples https://github.com/tlaplus/Examples

One example is a water puzzle from Die Hard 3. https://github.com/tlaplus/Examples/blob/master/specifications/DieHard/DieHard.tla



In [None]:
%%file /tmp/DieHarder.tla

------------------------------ MODULE DieHard ------------------------------- 
(***************************************************************************)
(* In the movie Die Hard 3, the heroes must obtain exactly 4 gallons of     *)
(* water using a 5 gallon jug, a 3 gallon jug, and a water faucet.  Our    *)
(* goal: to get TLC to solve the problem for us.                           *)
(*                                                                         *)
(* First, we write a spec that describes all allowable behaviors of our    *)
(* heroes.                                                                  *)
(***************************************************************************)
EXTENDS Naturals
  (*************************************************************************)
  (* This statement imports the definitions of the ordinary operators on   *)
  (* natural numbers, such as +.                                           *)
  (*************************************************************************)
  
(***************************************************************************)
(* We next declare the specification's variables.                          *)
(***************************************************************************)
VARIABLES big,   \* The number of gallons of water in the 5 gallon jug.
          small  \* The number of gallons of water in the 3 gallon jug.


(***************************************************************************)
(* We now define TypeOK to be the type invariant, asserting that the value *)
(* of each variable is an element of the appropriate set.  A type          *)
(* invariant like this is not part of the specification, but it's          *)
(* generally a good idea to include it because it helps the reader         *)
(* understand the spec.  Moreover, having TLC check that it is an          *)
(* invariant of the spec catches errors that, in a typed language, are     *)
(* caught by type checking.                                                *)
(*                                                                         *)
(* Note: TLA+ uses the convention that a list of formulas bulleted by /\   *)
(* or \/ denotes the conjunction or disjunction of those formulas.         *)
(* Indentation of subitems is significant, allowing one to eliminate lots  *)
(* of parentheses.  This makes a large formula much easier to read.        *)
(* However, it does mean that you have to be careful with your indentation.*)
(***************************************************************************)
TypeOK == /\ small \in 0..3 
          /\ big   \in 0..5


(***************************************************************************)
(* Now we define of the initial predicate, that specifies the initial      *)
(* values of the variables.  I like to name this predicate Init, but the   *)
(* name doesn't matter.                                                    *)
(***************************************************************************)
Init == /\ big = 0 
        /\ small = 0

(***************************************************************************)
(* Now we define the actions that our hero can perform.  There are three   *)
(* things they can do:                                                     *)
(*                                                                         *)
(*   - Pour water from the faucet into a jug.                              *)
(*                                                                         *)
(*   - Pour water from a jug onto the ground.                              *)
(*                                                                         *)
(*   - Pour water from one jug into another                                *)
(*                                                                         *)
(* We now consider the first two.  Since the jugs are not calibrated,      *)
(* partially filling or partially emptying a jug accomplishes nothing.     *)
(* So, the first two possibilities yield the following four possible       *)
(* actions.                                                                *)
(***************************************************************************)
FillSmallJug  == /\ small' = 3 
                 /\ big' = big

FillBigJug    == /\ big' = 5 
                 /\ small' = small

EmptySmallJug == /\ small' = 0 
                 /\ big' = big

EmptyBigJug   == /\ big' = 0 
                 /\ small' = small

(***************************************************************************)
(* We now consider pouring water from one jug into another.  Again, since  *)
(* the jugs are not calibrated, when pouring from jug A to jug B, it      *)
(* makes sense only to either fill B or empty A. And there's no point in   *)
(* emptying A if this will cause B to overflow, since that could be        *)
(* accomplished by the two actions of first filling B and then emptying A. *)
(* So, pouring water from A to B leaves B with the lesser of (i) the water *)
(* contained in both jugs and (ii) the volume of B. To express this        *)
(* mathematically, we first define Min(m,n) to equal the minimum of the    *)
(* numbers m and n.                                                        *)
(***************************************************************************)
Min(m,n) == IF m < n THEN m ELSE n

(***************************************************************************)
(* Now we define the last two pouring actions.  From the observation       *)
(* above, these definitions should be clear.                               *)
(***************************************************************************)
SmallToBig == /\ big'   = Min(big + small, 5)
              /\ small' = small - (big' - big)

BigToSmall == /\ small' = Min(big + small, 3) 
              /\ big'   = big - (small' - small)

(***************************************************************************)
(* We define the next-state relation, which I like to call Next.  A Next   *)
(* step is a step of one of the six actions defined above.  Hence, Next is *)
(* the disjunction of those actions.                                       *)
(***************************************************************************)
Next ==  \/ FillSmallJug 
         \/ FillBigJug    
         \/ EmptySmallJug 
         \/ EmptyBigJug    
         \/ SmallToBig    
         \/ BigToSmall    

(***************************************************************************)
(* We define the formula Spec to be the complete specification, asserting  *)
(* of a behavior that it begins in a state satisfying Init, and that every *)
(* step either satisfies Next or else leaves the pair <<big, small>>       *)
(* unchanged.                                                              *)
(***************************************************************************)
Spec == Init /\ [][Next]_<<big, small>> 
-----------------------------------------------------------------------------

(***************************************************************************)
(* Remember that our heroes must measure out 4 gallons of water.            *)
(* Obviously, those 4 gallons must be in the 5 gallon jug.  So, they have  *)
(* solved their problem when they reach a state with big = 4.  So, we      *)
(* define NotSolved to be the predicate asserting that big # 4.            *)
(***************************************************************************)
NotSolved == big # 4

(***************************************************************************)
(* We find a solution by having TLC check if NotSolved is an invariant,    *)
(* which will cause it to print out an "error trace" consisting of a       *)
(* behavior ending in a states where NotSolved is false.  Such a           *)
(* behavior is the desired solution.  (Because TLC uses a breadth-first    *)
(* search, it will find the shortest solution.)                            *)
(***************************************************************************)
=============================================================================

In [14]:
%%file /tmp/diehard.c
#include <assert.h>


typedef enum {
    FILL_SMALL,
    FILL_BIG,
    EMPTY_SMALL,
    EMPTY_BIG,
    SMALL_TO_BIG,
    BIG_TO_SMALL,
} Action;

int min(int a, int b){
    return a < b ? a : b;
}


extern Action rand_action();



int main(){
    int big = 0;
    int small = 0;
    for(;;){
        assert(big >= 0 && big <= 5);
        assert(small >= 0 && small <= 3);
        assert(big != 4); // solved state     
        switch (rand_action()){
            case FILL_SMALL:
                small = 3;
                break;
            case FILL_BIG:
                big = 5;
                break;
            case EMPTY_SMALL:
                small = 0;
                break;
            case EMPTY_BIG:
                big = 0;
                break;
            case SMALL_TO_BIG: {
                // braces: before C23 a declaration cannot directly follow a case label
                int old_big = big;
                big = min(big + small, 5);
                small -= big - old_big;
                break;
            }
            case BIG_TO_SMALL: {
                int old_small = small;
                small = min(big + small, 3);
                big -= small - old_small;
                break;
            }
        }
    }
}

Overwriting /tmp/diehard.c
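Since the simulator is plain C, a candidate answer can also be replayed natively without CBMC. The sketch below restates the same six actions as a pure step function and replays a fixed trace. The names `Jugs`, `jugs_step` and `replay` are mine, and the six-step trace in the usage note is one solution I checked by hand, not necessarily the one CBMC prints.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    FILL_SMALL, FILL_BIG, EMPTY_SMALL, EMPTY_BIG, SMALL_TO_BIG, BIG_TO_SMALL
} Action;

typedef struct { int big; int small; } Jugs;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Pure transition function: returns the state after taking action a in j. */
Jugs jugs_step(Jugs j, Action a) {
    Jugs n = j;
    switch (a) {
    case FILL_SMALL:  n.small = 3; break;
    case FILL_BIG:    n.big = 5;   break;
    case EMPTY_SMALL: n.small = 0; break;
    case EMPTY_BIG:   n.big = 0;   break;
    case SMALL_TO_BIG:
        n.big = min_int(j.big + j.small, 5);
        n.small = j.small - (n.big - j.big);
        break;
    case BIG_TO_SMALL:
        n.small = min_int(j.big + j.small, 3);
        n.big = j.big - (n.small - j.small);
        break;
    }
    return n;
}

/* Replay a trace from empty jugs. Returns the number of steps taken when
   big first equals 4, or -1 if the trace never gets there. */
int replay(const Action *trace, size_t n) {
    Jugs j = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        j = jugs_step(j, trace[i]);
        assert(j.big >= 0 && j.big <= 5);      /* TypeOK for big */
        assert(j.small >= 0 && j.small <= 3);  /* TypeOK for small */
        if (j.big == 4) return (int)(i + 1);
    }
    return -1;
}
```

For example, replaying `FILL_BIG, BIG_TO_SMALL, EMPTY_SMALL, BIG_TO_SMALL, FILL_BIG, BIG_TO_SMALL` reaches `big == 4` on the sixth step. The same `replay` function is a natural target for a fuzzer that generates the action array.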


Get CBMC https://github.com/diffblue/cbmc/releases/tag/cbmc-6.3.1 . There are prepackaged versions (and a docker version).

In [None]:
! wget https://github.com/diffblue/cbmc/releases/download/cbmc-6.3.1/ubuntu-20.04-cbmc-6.3.1-Linux.deb && dpkg -i ubuntu-20.04-cbmc-6.3.1-Linux.deb

In [132]:
! cbmc /tmp/diehard.c --unwind 7 --no-unwinding-assertions --trace

CBMC version 6.0.1 (cbmc-6.0.1-5-g54c20cdb91) 64-bit x86_64 linux
Type-checking diehard
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Starting Bounded Model Checking
Passing problem to propositional reduction
converting SSA


Running propositional reduction
SAT checker: instance is SATISFIABLE
Building error trace
Running propositional reduction
SAT checker: instance is UNSATISFIABLE

** Results:
/tmp/diehard.c function main
[main.assertion.1] line 23 assertion big >= 0 && big <= 5: SUCCESS
[main.assertion.2] line 24 assertion small >= 0 && small <= 3: SUCCESS
[main.assertion.3] line 25 assertion big != 4: FAILURE
[main.overflow.1] line 41 arithmetic overflow on signed + in big + small: SUCCESS
[main.overflow.2] line 42 arithmetic overflow on signed - in big - old_big: SUCCESS
[main.overflow.3] line 42 arithmetic overflow on signed - in small - (big - old_big): SUCCESS
[main.overflow.4] line 46 arithmetic overflow on signed + in big + small: SUCCESS
[main.overflow.5] line 47 arithmetic overflow on signed - in small - old_small: SUCCESS
[main.overflow.6] line 47 arit

Ok, but many of the things I might want to use TLA+ for are concurrent processes.

If you write your C simulator in the right style, this is not hard to model.

TLA+ offers an imperative syntax called PlusCal that compiles down to the logical specification. You have to write your C simulator in a related style.

Each process has a state which _crucially_ includes a "program counter". It is easy to forget sometimes that a program counter is a thing because C and other high level languages make it implicit. But it is there and it is useful.

https://learntla.com/intro/conceptual-overview.html
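Here is a minimal sketch of what "a process with an explicit program counter" can look like in C. It is a made-up example (two processes doing a non-atomic increment of a shared counter), and the names `Pc`, `Proc`, `World`, `proc_step` and `run_schedule` are mine. Each label is one atomic step, and a schedule is just the sequence of process ids that get to move.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit program counter labels, like PlusCal labels. */
typedef enum { READ, WRITE, DONE } Pc;

typedef struct {
    Pc pc;    /* where this process is in its code */
    int tmp;  /* process-local copy of the counter */
} Proc;

typedef struct {
    int counter;    /* shared state */
    Proc procs[2];  /* per-process state */
} World;

/* One atomic step of process i: everything between two labels. */
void proc_step(World *w, size_t i) {
    Proc *p = &w->procs[i];
    switch (p->pc) {
    case READ:
        p->tmp = w->counter;
        p->pc = WRITE;
        break;
    case WRITE:
        w->counter = p->tmp + 1;
        p->pc = DONE;
        break;
    case DONE:
        break;  /* stuttering step */
    }
}

/* Replay a schedule of process ids and return the final counter value. */
int run_schedule(const size_t *schedule, size_t n) {
    World w = { 0, { { READ, 0 }, { READ, 0 } } };
    for (size_t k = 0; k < n; k++) {
        proc_step(&w, schedule[k] % 2);
    }
    return w.counter;
}
```

The schedule `0, 0, 1, 1` gives a final counter of 2, while the interleaved schedule `0, 1, 0, 1` loses an update and gives 1. A model checker's job is to try all such schedules for you, which is what a nondeterministic scheduler choice under CBMC amounts to.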

In [83]:
%%file /tmp/wire.tla

---- MODULE wire ----
EXTENDS TLC, Integers

People == {"alice", "bob"}
Money == 1..10
NumTransfers == 2

(* --algorithm wire
variables
  acct \in [People -> Money];

define
  NoOverdrafts ==
    \A p \in People:
      acct[p] >= 0
end define;

process wire \in 1..NumTransfers
variable
  amnt \in 1..5;
  from \in People;
  to \in People
begin
  Check:
    if acct[from] >= amnt then
      Withdraw:
        acct[from] := acct[from] - amnt;
      Deposit:
        acct[to] := acct[to] + amnt;
    end if;
end process;
end algorithm; *)

====

Writing /tmp/wire.tla


Here I'm going to run the TLA+ tools from the command line, as described at https://learntla.com/topics/cli.html

In [None]:
%%file /tmp/wire.cfg
SPECIFICATION Spec

INVARIANT NoOverdrafts


In [77]:
! cd /tmp && wget https://github.com/tlaplus/tlaplus/releases/download/v1.7.4/tla2tools.jar #install tla+


In [85]:
%%bash
cd /tmp
java -cp tla2tools.jar pcal.trans wire.tla
java -jar tla2tools.jar -config wire.cfg wire.tla

pcal.trans Version 1.11 of 31 December 2020
Parsing completed.
Translation completed.
New file wire.tla written.
New file wire.cfg written.


TLC2 Version 2.19 of 08 August 2024 (rev: 5a47802)
Running breadth-first search Model-Checking with fp 68 and seed -4391951418103122557 with 1 worker on 16 cores with 15752MB heap and 64MB offheap memory [pid: 594435] (Linux 6.5.0-1027-oem amd64, Ubuntu 21.0.4 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /tmp/wire.tla
Parsing file /tmp/TLC.tla
Parsing file /tmp/Integers.tla
Parsing file /tmp/Naturals.tla
Parsing file /tmp/Sequences.tla
Parsing file /tmp/FiniteSets.tla
Semantic processing of module Naturals
Semantic processing of module Sequences
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module Integers
Semantic processing of module wire
Starting... (2024-10-07 20:41:25)
Computing initial states...
Computed 2 initial states...
Computed 4 initial states...
Computed 8 initial states...
Computed 16 initial states...
Computed 32 initial states...
Computed 64 initial states...
Computed 128 initial states...
Computed 256 initial s

CalledProcessError: Command 'b'cd /tmp\njava -cp tla2tools.jar pcal.trans wire.tla\njava -jar tla2tools.jar -config wire.cfg wire.tla\n'' returned non-zero exit status 12.

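The example here just has two transfers that can both withdraw from Alice. Pretty simple. As far as I can tell, exit status 12 is TLC's code for a violated safety property, so the `CalledProcessError` above means TLC found an overdraft rather than that the run broke.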

Here is an attempt at a similar C program.

It has to be written in a somewhat odd style.

Finagling C ints is actually super painful. It is also a bit verbose.

In [130]:
%%file /tmp/bank.c

#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define NumTransfers 2


#define PeopleCount 2
typedef enum { ALICE = 0, BOB = 1 } People;

typedef int8_t Money;

typedef enum { Check, Withdraw, Deposit, Done } Action;

typedef struct {
    People from;
    People to;
    Money amnt;
    Action label;
} procstate_t;

// Global state variables
// Basically arrays seem like ok stand ins for TLA+ key value maps
Money acct[PeopleCount];

procstate_t proc_states[NumTransfers];

extern uint8_t rand_proc();
extern Money rand_money() __CPROVER_ensures(__CPROVER_return_value >= 0 && __CPROVER_return_value <= 5);

// Function to initialize the process states
void initialize_processes() {
    for(int p = 0; p < PeopleCount; p++) {
        Money m = rand_money();
        m = m < 0 ? 0 : m % 10;
        acct[p] = m;
    }
    for (int i = 0; i < NumTransfers; i++) {
        //proc_states[i].from %= PeopleCount;
        //proc_states[i].to %= PeopleCount;
        proc_states[i].label = Check;
        proc_states[i].amnt = rand_money() % 5 + 4;   
    }
}


void wire(uint8_t procnum) 
__CPROVER_requires(procnum < NumTransfers)
{
    procstate_t *p = &(proc_states[procnum]);
    //printf("procnum: %d, from: %d, to: %d, amnt: %d, label: %d\n", procnum, p->from, p->to, p->amnt, p->label);
    switch (p->label) {
        case Check:
            if (acct[p->from] >= p->amnt) {
                p->label = Withdraw;
            } else {
                p->label = Done;
            }
            break;
        case Withdraw:
            acct[p->from] -= p->amnt;
            p->label = Deposit;
            break;
        case Deposit:
            acct[p->to] += p->amnt;
            p->label = Done;
            break;
        case Done:
            break;
    }
}

void check_no_overdrafts() {
    for (int p = 0; p < PeopleCount; p++) {
        // No account should have a negative balance
        assert(acct[p] >= 0);
    }
}

int main() {
    initialize_processes();
    for(;;){
        wire(rand_proc() % NumTransfers);
        check_no_overdrafts();
    }
}


Overwriting /tmp/bank.c
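One way to tame the integer finagling in `initialize_processes` is a small helper that maps an arbitrary raw byte into a closed range, so that a declaration like TLA+'s `amnt \in 1..5` becomes a single call. This is a sketch, not what the code above does, and `into_range` is a name I made up. Starting from an unsigned byte avoids the negative results that `%` can produce on signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Map an arbitrary raw byte into the closed range [lo, hi].
   Assumes lo <= hi. The arithmetic is done in int so the width
   (at most 256) cannot overflow. */
int8_t into_range(uint8_t raw, int8_t lo, int8_t hi) {
    int width = (int)hi - (int)lo + 1;
    return (int8_t)((int)lo + (int)raw % width);
}
```

Under CBMC the raw byte would come from a bodiless function such as the existing `rand_proc()`. The alternative is to take an unconstrained value `x` and prune it with `__CPROVER_assume(x >= lo && x <= hi)`, which reads closer to the TLA+ but is not executable natively.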


In [128]:
!cbmc /tmp/bank.c --unwind 10

CBMC version 6.0.1 (cbmc-6.0.1-5-g54c20cdb91) 64-bit x86_64 linux
Type-checking bank
file /tmp/bank.c line 52 function wire: function 'printf' is not declared
Generating GOTO Program
Adding CPROVER library (x86_64)
file <builtin-library-printf> line 14: implicit function declaration 'printf'
old definition in module bank file /tmp/bank.c line 52 function wire
signed int (void)
new definition in module <built-in-library> file <builtin-library-printf> line 14
signed int (const char *format, ...)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Starting Bounded Model Checking


Passing problem to propositional reduction
converting SSA
Running propositional reduction
SAT checker: instance is SATISFIABLE
Running propositional reduction
SAT checker: instance is SATISFIABLE
Running propositional reduction
SAT checker inconsistent: instance is UNSATISFIABLE

** Results:
/tmp/bank.c function check_no_overdrafts
[check_no_overdrafts.overflow.1] line 75 arithmetic overflow on signed + in p + 1: SUCCESS
[check_no_overdrafts.array_bounds.1] line 77 array 'acct' lower bound in acct[(signed long int)p]: SUCCESS
[check_no_overdrafts.array_bounds.2] line 77 array 'acct' upper bound in acct[(signed long int)p]: SUCCESS
[check_no_overdrafts.assertion.1] line 77 assertion acct[p] >= 0: FAILURE

/tmp/bank.c function initialize_processes
[initialize_processes.overflow.1] line 34 arithmetic overflow on signed + in p + 1: SUCCESS
[initialize_processes.no-body.rand_money] line 35 no body f

We can narrow things down to the only property we care about. This trace is much harder to read than the one above.

In [131]:
!cbmc /tmp/bank.c  --unwind 10 --property  check_no_overdrafts.assertion.1 --no-unwinding-assertions --no-standard-checks --compact-trace

CBMC version 6.0.1 (cbmc-6.0.1-5-g54c20cdb91) 64-bit x86_64 linux
Type-checking bank
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Starting Bounded Model Checking


Passing problem to propositional reduction
converting SSA
Running propositional reduction
SAT checker: instance is SATISFIABLE
Building error trace

** Results:
/tmp/bank.c function check_no_overdrafts
[check_no_overdrafts.assertion.1] line 77 assertion acct[p] >= 0: FAILURE

Trace for check_no_overdrafts.assertion.1:
  25: acct[0l]=0 (00000000)
  25: acct[1l]=0 (00000000)
  27: proc_states[0l]={ .from=/*enum*/ALICE, .to=/*enum*/ALICE, .amnt=0,
    .$pad3=0, .label=/*enum*/Check } ({ 00000000 00000000 00000000 00000000, 00000000 00000000 00000000 00000000, 00000000, 00000000 00000000 00000000, 00000000 00000000 00000000 00000000 })
  27: proc_states[0l].from=/*enum*/ALICE (00000000 00000000 00000000 00000000)
  27: proc_states[0l].to=/*enum*/ALICE (00000000 00000000 00000000 00000000)
  27: proc_states[0l].amnt=0 (00000000)
  27: proc_states[0l].$pad3=0 (00000000

# Bits and Bobbles

Ok. Interesting experiment. It was not nearly as easy to do the second puzzle as I'd hoped.

I think one could get in the groove and clean up the C code to be more like the TLA+ code.

One can use `__CPROVER_assume` annotations to do the TLA+ style transition relation instead of an imperative style also.
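A sketch of what that relational style might look like for Die Hard: write `Next` as a boolean predicate over a (current, next) pair of states, then let CBMC pick an arbitrary next state and assume the predicate holds. The names `Jugs`, `jugs_next` and `nondet_jugs` are mine. I am assuming here that CBMC defines the `__CPROVER__` macro and treats the bodiless `nondet_jugs` as returning an arbitrary value, so the harness is guarded and the predicate itself still compiles and runs as plain C.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int big; int small; } Jugs;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Next(s, t): true when t is an allowed successor of s.
   Each line mirrors one disjunct of the TLA+ Next formula. */
bool jugs_next(Jugs s, Jugs t) {
    bool fill_small   = t.small == 3 && t.big == s.big;
    bool fill_big     = t.big == 5 && t.small == s.small;
    bool empty_small  = t.small == 0 && t.big == s.big;
    bool empty_big    = t.big == 0 && t.small == s.small;
    bool small_to_big = t.big == min_int(s.big + s.small, 5)
                     && t.small == s.small - (t.big - s.big);
    bool big_to_small = t.small == min_int(s.big + s.small, 3)
                     && t.big == s.big - (t.small - s.small);
    return fill_small || fill_big || empty_small || empty_big
        || small_to_big || big_to_small;
}

#ifdef __CPROVER__
/* No body: assumed to yield an arbitrary Jugs value under CBMC. */
Jugs nondet_jugs(void);

int main(void) {
    Jugs s = { 0, 0 };                      /* Init */
    for (;;) {
        assert(s.big != 4);                 /* NotSolved */
        Jugs t = nondet_jugs();             /* guess any next state */
        __CPROVER_assume(jugs_next(s, t));  /* keep only legal steps */
        s = t;
    }
}
#endif
```

The tradeoff is that this version is closer to the TLA+ text but is no longer a simulator: without CBMC you can only check given (current, next) pairs against `jugs_next`, not generate them.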

Maybe Kani (the Rust equivalent of CBMC) would be easier?





There is a spectrum of different levels of assurance in software verification.

The hard core stuff is using interactive theorem provers like Lean, Coq, Isabelle. It is maximally expressive with lots of proof burden.

You can also try to really connect your model up to your implementation at different levels of fineness.

High level models are often easier to check and prove things about. For concurrent or distributed systems, even these high level models can easily hold bugs, and there is benefit to exercising them.

Sometimes you can think of this as being at the protocol level. The sort of thing that might be a diagram in some document rather than the source code itself.

CBMC is the C Bounded Model Checker. It is designed to symbolically execute C source code and check for bugs. It aims to be sound (with assumptions and caveats). If CBMC finishes all green, there really shouldn't be certain classes of bugs in the program.

Fuzzing tools are very effective, but it is a weaker guarantee if they come back all green. 


