---
title: DWARF Verification via Ghidra and CBMC
---

DWARF is debug data. It is [underappreciated](https://www.philipzucker.com/dwarf-patching/).

Debug data is a translation validation artifact. By design, it leaves breadcrumbs of how the low code connects to the high code.

I've been working on a tool, pcode2c, which uses Ghidra's P-code lifting to generate a specialized C interpreter for a given binary. In other words, static binary translation to C. This is not really decompilation, because I make zero effort to produce readable idiomatic C, recover loops, or recover types. All of that weakens the connection to the original binary.


There are interesting papers on checking DWARF data. One approach is to compare `-O0` execution/debug data with higher optimization levels. Presumably `-O0` is the least likely to omit or incorrectly generate debug data.

You could also run an assembly program in a debugger in sync with a C interpreter (like [kframework/c-semantics](https://github.com/kframework/c-semantics), [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/), or [the ones in this Reddit thread](https://www.reddit.com/r/ProgrammingLanguages/comments/hf441y/an_interpreter_for_c/)). Set a breakpoint at every position where the debug data should have an opinion and compare the two states there.

These are all viable, but because of the particular hammer I've been forging, it makes sense to do this same process using CBMC, the C bounded model checker.

One annoying thing is that DWARF annotates with respect to syntactic positions. The line table can encode line and column information in the source, which is nice because it is highly generic with respect to language, but obviously a bummer because it can be very unclear semantically what a particular syntactic position corresponds to.
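To make the "syntactic position" point concrete, here is a minimal sketch of a line table as plain C data. The rows are transcribed by hand from the `objdump -l` listing of `equal` shown later in this post, so they are an approximation of the real DWARF line program, not a decoding of it. The names `LineRow`, `equal_rows` and `line_for_addr` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified line table: the first address of a run of
   instructions, and the source line and discriminator it maps to. */
typedef struct {
    uint64_t addr;
    int line;
    int discriminator;
} LineRow;

/* Hand-transcribed from the objdump -l output for equal(). */
static const LineRow equal_rows[] = {
    {0x1169, 3, 0},
    {0x116d, 4, 0},
    {0x1174, 4, 2},
    {0x1177, 4, 1},
    {0x117b, 5, 0},
    {0x1188, 6, 0},
    {0x118d, 10, 0},
};
static const size_t equal_nrows = sizeof equal_rows / sizeof equal_rows[0];

/* Return the source line for addr, or -1 if addr precedes the table.
   The row with the greatest start address <= addr wins. There is no
   end-of-sequence marker in this sketch, so any address past the last
   row is attributed to the last row. */
int line_for_addr(uint64_t addr) {
    int line = -1;
    for (size_t i = 0; i < equal_nrows; i++) {
        /* The lookup relies on the rows being sorted by address. */
        assert(i == 0 || equal_rows[i - 1].addr < equal_rows[i].addr);
        if (equal_rows[i].addr > addr) {
            break;
        }
        line = equal_rows[i].line;
    }
    return line;
}
```

Note that line 4 shows up three times with different discriminators: the initialization, the increment and the test of the `for` loop all live on the same source line, which is exactly the kind of ambiguity that makes a bare line number hard to interpret semantically.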


It is kind of cool, though, that if I want to annotate the original source, it is in principle a very easy syntactic insertion. I've been writing a tool to do this but have lost steam, so I think it is best to just get the idea out by showing how it can be done manually.
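As a rough sketch of what such a manual insertion could look like, here is the `equal` function from the example file below with a single observation call added at the position of line 5. The `observe` helper, the `Obs` record and the global log are hypothetical names invented for this illustration; they are not part of any tool. The line number is passed explicitly, assuming the annotation is meant to refer to the line numbering of the unmodified file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical observation log: one entry each time control passes an
   annotated source position. */
#define MAX_OBS 64
typedef struct {
    int line;
    int value;
} Obs;

Obs obs_log[MAX_OBS];
size_t obs_len = 0;

/* Record the value of a variable at a given source line. */
void observe(int line, int value) {
    assert(obs_len < MAX_OBS);
    obs_log[obs_len].line = line;
    obs_log[obs_len].value = value;
    obs_len++;
}

/* Same body as equal() in all_equal.c, plus one inserted call. */
bool equal_annotated(int n, int a[], int b[]) {
    for (int i = 0; i < n; i++) {
        observe(5, i); /* position of "if(a[i] != b[i])" in the original */
        if (a[i] != b[i]) {
            return false;
        }
    }
    return true;
}
```

Running `equal_annotated(3, a, a)` fills the log with the values 0, 1, 2 at line 5. That is the high-level side of the comparison: the low-level side has to produce the same sequence whenever it reaches an address the line table maps to line 5.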


https://www.mimuw.edu.pl/~alx/konstruowanie/ACSL-by-Example.pdf




In [19]:
%%file /tmp/all_equal.c
#include <stdbool.h>
#include <assert.h>
bool equal(int n, int a[], int b[]){
    for(int i = 0; i < n; i++){
        if(a[i] != b[i]){
            return false;
        }
    }
    return true;
}

int main(){
    int a[10]; // uninitialized: CBMC treats the contents as nondeterministic (alternative: = {42})
    assert(equal(10, a, a));
}




In [20]:
!gcc -Og -g -Wall -Wextra -std=c11 /tmp/all_equal.c -o /tmp/all_equal

In [22]:
!objdump -l -d /tmp/all_equal | grep -A 20 '<equal>:'

0000000000001169 <equal>:
equal():
/tmp/all_equal.c:3
    1169:	f3 0f 1e fa          	endbr64 
/tmp/all_equal.c:4
    116d:	b8 00 00 00 00       	mov    $0x0,%eax
    1172:	eb 03                	jmp    1177 <equal+0xe>
/tmp/all_equal.c:4 (discriminator 2)
    1174:	83 c0 01             	add    $0x1,%eax
/tmp/all_equal.c:4 (discriminator 1)
    1177:	39 f8                	cmp    %edi,%eax
    1179:	7d 13                	jge    118e <equal+0x25>
/tmp/all_equal.c:5
    117b:	48 63 c8             	movslq %eax,%rcx
    117e:	44 8b 04 8a          	mov    (%rdx,%rcx,4),%r8d
    1182:	44 39 04 8e          	cmp    %r8d,(%rsi,%rcx,4)
    1186:	74 ec                	je     1174 <equal+0xb>
/tmp/all_equal.c:6
    1188:	b8 00 00 00 00       	mov    $0x0,%eax
/tmp/all_equal.c:10
    118d:	c3                   	ret    
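
To give a feel for what "a specialized C interpreter for that binary" means, here is a hand-written model of the listing above. It is not pcode2c output, only a sketch under several assumptions: each `cmp`/`jcc` pair is fused into one case so flags are not modeled, only the registers the listing touches exist, and the block at `0x118e` is cut off by `grep -A 20`, so I assume it loads 1 into `%eax` and returns (matching `return true`). Reading the listing, `%eax` appears to hold `i`, so the model records `%eax` every time it reaches `0x117b`, the first address of line 5. The names `Regs`, `hits` and `run_equal_lowlevel` are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file holding only what the listing touches. */
typedef struct {
    uint64_t pc;
    int32_t eax;        /* loop counter, later the return value */
    int32_t edi;        /* first argument: n */
    const int32_t *rsi; /* second argument: a */
    const int32_t *rdx; /* third argument: b */
    int64_t rcx;
    int32_t r8d;
} Regs;

#define MAX_HITS 64
int32_t hits[MAX_HITS]; /* eax at each visit to the line 5 breakpoint */
size_t nhits = 0;

/* Interpret the listing of equal() until ret and return the final eax. */
int32_t run_equal_lowlevel(int32_t n, const int32_t *a, const int32_t *b) {
    Regs r = {.pc = 0x1169, .edi = n, .rsi = a, .rdx = b};
    for (;;) {
        if (r.pc == 0x117b) { /* breakpoint: first address of line 5 */
            assert(nhits < MAX_HITS);
            hits[nhits++] = r.eax;
        }
        switch (r.pc) {
        case 0x1169: r.pc = 0x116d; break;                /* endbr64 */
        case 0x116d: r.eax = 0; r.pc = 0x1172; break;     /* mov $0x0,%eax */
        case 0x1172: r.pc = 0x1177; break;                /* jmp 1177 */
        case 0x1174: r.eax += 1; r.pc = 0x1177; break;    /* add $0x1,%eax */
        case 0x1177: /* cmp %edi,%eax ; jge 118e */
            r.pc = (r.eax >= r.edi) ? 0x118e : 0x117b;
            break;
        case 0x117b: r.rcx = r.eax; r.pc = 0x117e; break; /* movslq %eax,%rcx */
        case 0x117e: /* mov (%rdx,%rcx,4),%r8d */
            r.r8d = r.rdx[r.rcx];
            r.pc = 0x1182;
            break;
        case 0x1182: /* cmp %r8d,(%rsi,%rcx,4) ; je 1174 */
            r.pc = (r.rsi[r.rcx] == r.r8d) ? 0x1174 : 0x1188;
            break;
        case 0x1188: r.eax = 0; r.pc = 0x118d; break;     /* mov $0x0,%eax */
        case 0x118d: return r.eax;                        /* ret */
        case 0x118e: return 1; /* ASSUMPTION: not in the listing; presumably mov $0x1,%eax ; ret */
        default:
            assert(0 && "pc outside the modeled listing");
            return -1;
        }
    }
}
```

The sequence in `hits` is the low-level counterpart of observing `i` at line 5 in the source. Because this is ordinary C, a harness that calls both the source function and the model and asserts that the observations agree could be handed to CBMC; its `--function` option can select such a harness as the entry point.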


In [16]:
! cbmc --unwinding-assertions --bounds-check --pointer-check /tmp/all_equal.c

CBMC version 5.95.1 (cbmc-5.95.1) 64-bit x86_64 linux
Parsing /tmp/all_equal.c
Converting
Type-checking all_equal
file /tmp/all_equal.c line 13 function main: function 'assert' is not declared
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
Unwinding loop equal.0 iteration 1 file /tmp/all_equal.c line 3 function equal thread 0
Unwinding loop equal.0 iteration 2 file /tmp/all_equal.c line 3 function equal thread 0
Unwinding loop equal.0 iteration 3 file /tmp/all_equal.c line 3 function equal thread 0
Unwinding loop equal.0 iteration 4 file /tmp/all_equal.c line 3 function equal thread 0
Unwinding loop equal.0 iteration 5 file /tmp/all_equal.c line 3 function equal thread 0
Unwinding loop equal.0 iteration 6 file /tmp/all_equal.c line 3 function equal thread 0
Unwinding loop equal.0 iteration 7 file /tmp/all_equal


# PCode2C pt 2: DWARF Verification via Ghidra and CBMC

Comparative verification is a really useful paradigm:

- Spec writing. It is easier for users to grok that you are comparing two programs rather than bringing in a complicated logic.
- Compiler correctness

My basic model of a DWARF correctness property is that it specifies optionally observable effects.

`{impl_defined} -> State -> Option State`
`{impl_defined} -> Interaction`
`type Interaction = Out Value Interaction | Input (Value -> Interaction)`

`type compiler = prog1 : HighCode -> (prog2 : LowCode, exec_high prog1 ~ exec_low prog2)`

What is `~` though?
Effects
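
One candidate reading of `~`, offered only as a sketch: treat each run as a trace of observations at annotated positions, where the low-level side may decline to report a value (modeling a variable the debug data marks as unavailable). Two traces are related if they visit the same lines in the same order and agree wherever the low side does report a value. The `Obs` record and `traces_related` are hypothetical names, and this is an assumption about what "optionally observable" could mean, not a settled definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One observation at an annotated position. has_value == false models
   debug info that reports the variable as unavailable there. */
typedef struct {
    int line;
    bool has_value;
    int value;
} Obs;

/* Candidate for `~`: same lines in the same order, and wherever the low
   side reports a value it must equal the high side's value. */
bool traces_related(const Obs *high, size_t nhigh,
                    const Obs *low, size_t nlow) {
    if (nhigh != nlow) {
        return false;
    }
    for (size_t i = 0; i < nhigh; i++) {
        if (high[i].line != low[i].line) {
            return false;
        }
        if (low[i].has_value && low[i].value != high[i].value) {
            return false;
        }
    }
    return true;
}
```

This version is deliberately strict about control flow: it does not allow the low side to skip or reorder positions, which an optimizing compiler may well do. Loosening that is part of what the question about `~` is really asking.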
