---
title: "Verified Assembly 2: Memory, RISC-V, Cuts for Invariants, and Ghost Code"
date: 2025-07-29
---

I've been hitting a good stride moving forward on this, which is good because I've been pretty mentally constipated on doing anything that I can even pretend is useful.

The idea is to make an assembly verification system that is as unopinionated as possible. By adding annotations inline to an assembly file, which remains GAS assemblable (https://www.philipzucker.com/assembly_verify/), we can get something like a Dafny or Frama-C experience.

Rather than going all Hoare or weakest precondition or whatever, verification is done via symbolic execution with a twist. I hope that this directness makes it more palatable and understandable to an audience that is only mildly interested in formal methods. Each piece of annotation is there because I quite directly need it. I've tried to stay unopinionated (HA!) and have not invented any new syntax, reusing SMTLIB to describe properties. I've avoided syntactic niceties like inferring labels, keeping it as clunky and spartan as assembly itself.

I hope this format can be a target for both manual assembly and for compiler writers to output more info. The info should go in the binary metadata. https://www.philipzucker.com/dwarf-patching/

This week, 

- I turned on being able to talk about memory.
- I completely rearranged the symbolic executor so that it emits verification conditions rather than solving them on the spot.
- `kd_cut` gives us invariants.
- `kd_prelude` lets you inject SMTLIB `define-fun`, `define-const`, `declare-const` and `define-fun-rec` annotations which may help you write specs.
- Did a light rearrangement to support other archs like RISC-V.

# Memory

Not too much to say here. The only thing I needed to do is inject a special variable called `ram` into the `substitute` function that turns user facing names into their internal pcode equivalents at that program state. The pcode execution model uniformly represents machine state as byte arrays, including the registers.


In [3]:
%%file /tmp/stack.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start " true"
    movq     $42, (%rsp)
kd_exit _start_end "(= (select ram RSP) (_ bv42 8))"
    ret

Overwriting /tmp/stack.s


In [3]:
import kdrag.contrib.pcode as pcode
from kdrag.contrib.pcode.asmspec import assemble_and_check
assemble_and_check("/tmp/stack.s").successes

stmts []
Executing 0x400000/8: MOV qword ptr [RSP],0x2a at (4194304, 0) PCODE IMARK ram[400000:8]
Executing 0x400000/8: MOV qword ptr [RSP],0x2a at (4194304, 1) PCODE unique[6e80:8] = 0x2a
Executing 0x400000/8: MOV qword ptr [RSP],0x2a at (4194304, 2) PCODE *[ram]RSP = unique[6e80:8]
stmts [Exit(label='_start_end', addr=4194312, expr=ram[RSP] == 42)]
finish Exit(label='_start_end', addr=4194312, expr=ram[RSP] == 42) 0x400008


["[✅] VC(Entry(label='_start', addr=4194304, expr=True), ['0x400000'], Exit(label='_start_end', addr=4194312, expr=ram[RSP] == 42), {})"]

This is the substitution routine: https://github.com/philzook58/knuckledragger/blob/0a039c4539e6dba2b03c79e1e995d14d9e081977/kdrag/contrib/pcode/__init__.py#L544 . Only substituting for constants that actually appear in the expression was a very large optimization in terms of the construction time of properties.


```python
    def substitute(self, memstate: MemState, expr: smt.ExprRef) -> smt.ExprRef:
        """
        Substitute the values in memstate into the expression `expr`.
        `expr` uses the short register names and `ram`
        """
        # Much faster typically to pull only those constants that are used in the expression
        consts = {t.decl().name(): t for t in kd.utils.consts(expr)}
        substs = [
            (t, self.get_reg(memstate, regname))
            for regname, t in consts.items()
            if regname in self.ctx.registers
        ]
        if "ram" in consts:
            substs.append((smt.Array("ram", BV64, BV8), memstate.mem.ram))
        if "ram64" in consts:
            addr = smt.BitVec("addr", 64)
            substs.append(
                (
                    smt.Array("ram64", BV64, BV64),
                    smt.Lambda([addr], memstate.getvalue_ram(addr, 8)),
                )
            )

        return smt.substitute(expr, *substs)
```

Pcode uniformly represents everything as byte addressed memory, but it is very useful to be able to pull out 32 or 64 bits at a time. This was a serious ergonomics problem in cbat https://github.com/draperlaboratory/cbat_tools/blob/master/wp/resources/sample_binaries/loop_invariant/break/loop_invariant.smt where we often manually concatenated bitvectors in the SMTLIB. I think I fixed it by adding a special constant `ram64`, which becomes a lambda that pulls out 8 bytes at once and concatenates them. I'll also add `ram16` and `ram32`.
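
To make the `ram` versus `ram64` distinction concrete, here is a minimal sketch on plain Python integers rather than z3 terms. The helper names `read_le` and `write_le`, the dict-based memory, and the stack pointer value are all made up for illustration, and it assumes a little-endian target (as x86-64 is). The real `ram64` builds a symbolic lambda instead of computing a number.

```python
def write_le(ram: dict[int, int], addr: int, value: int, nbytes: int) -> None:
    """Store `value` as `nbytes` little-endian bytes starting at `addr`."""
    for i in range(nbytes):
        ram[addr + i] = (value >> (8 * i)) & 0xFF


def read_le(ram: dict[int, int], addr: int, nbytes: int) -> int:
    """Read `nbytes` bytes starting at `addr` and concatenate them little-endian."""
    value = 0
    for i in range(nbytes):
        byte = ram.get(addr + i, 0)  # unwritten bytes read as 0 in this toy model
        value |= byte << (8 * i)
    return value


ram: dict[int, int] = {}
rsp = 0x7FFF0000                      # hypothetical stack pointer value
write_le(ram, rsp, 12345678, 8)       # models: movq $12345678, (%rsp)

ram64_at_rsp = read_le(ram, rsp, 8)   # models: (select ram64 RSP)
ram8_at_rsp = read_le(ram, rsp, 1)    # models: (select ram RSP), a single byte
print(hex(ram64_at_rsp), hex(ram8_at_rsp))
```

The byte-level view only sees the low byte of the stored quadword, which is why a spec about a 64-bit store wants the wider accessor.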

In [4]:
%%file /tmp/mem.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start "true"
    movq     $12345678, (%rsp)
kd_exit _start_end "(= (select ram64 RSP) (_ bv12345678 64))"
    ret
    

Writing /tmp/mem.s


In [1]:
from kdrag.contrib.pcode.asmspec import assemble_and_check
assemble_and_check("/tmp/mem.s").successes

stmts []
Executing 0x400000/8: MOV qword ptr [RSP],0xbc614e at (4194304, 0) PCODE IMARK ram[400000:8]
Executing 0x400000/8: MOV qword ptr [RSP],0xbc614e at (4194304, 1) PCODE unique[6e80:8] = 0xbc614e
Executing 0x400000/8: MOV qword ptr [RSP],0xbc614e at (4194304, 2) PCODE *[ram]RSP = unique[6e80:8]
stmts [Exit(label='_start_end', addr=4194312, expr=ram64[RSP] == 12345678)]
finish Exit(label='_start_end', addr=4194312, expr=ram64[RSP] == 12345678) 0x400008


["[✅] VC(Entry(label='_start', addr=4194304, expr=True), ['0x400000'], Exit(label='_start_end', addr=4194312, expr=ram64[RSP] == 12345678), {})"]

# Rearranging the SymExec

I had sort of let myself build my symexec step by step, following my nose. I realized that I wanted a couple of things to change. First off, I wanted to kind of "array of structures" the thing into lists of spec statements indexed by address. This is because I want them ordered, so that I know the order in which to execute assumes, assigns, asserts, etc. This is more natural than having their order undefined.
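
A rough sketch of what "lists of spec statements indexed by address" can look like. The `SpecStmt` dataclass and `build_addrmap` are simplified stand-ins invented for this example, not the actual classes in `asmspec.py`, and the expressions are kept as unparsed SMTLIB strings. The addresses mirror the `kd_assign` example later in the post, where an assign and an exit share one address.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpecStmt:
    kind: str   # "entry", "assign", "exit", ...
    label: str
    addr: int
    expr: str   # SMTLIB text, left unparsed in this sketch


def build_addrmap(stmts: list[SpecStmt]) -> dict[int, list[SpecStmt]]:
    """Group statements by address, keeping program text order within each address."""
    addrmap: defaultdict[int, list[SpecStmt]] = defaultdict(list)
    for stmt in stmts:  # stmts arrive in the order they appear in the .s file
        addrmap[stmt.addr].append(stmt)
    return dict(addrmap)


stmts = [
    SpecStmt("entry", "myfunc", 0x400000, "true"),
    SpecStmt("assign", "mylabel", 0x400007, "(bvadd RAX (_ bv1 64))"),
    SpecStmt("exit", "func_end", 0x400007, "(= mytemp (_ bv43 64))"),
]
addrmap = build_addrmap(stmts)
kinds_at_exit = [s.kind for s in addrmap[0x400007]]
print(kinds_at_exit)
```

With separate per-kind dictionaries there would be no way to tell that the assign has to run before the exit check at the same address.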

The other stuff I did was start to record all the information about what happened during execution for easier debugging later.

Another important change was to switch from eagerly discharging verification conditions to emitting a verification condition structure. This enables interactive verification.

The core routine now looks like this https://github.com/philzook58/knuckledragger/blob/0a039c4539e6dba2b03c79e1e995d14d9e081977/kdrag/contrib/pcode/asmspec.py#L301


```python
def run_all_paths(
    ctx: pcode.BinaryContext, spec: AsmSpec, mem=None, verbose=True
) -> list[VerificationCondition]:
    if mem is None:
        mem = pcode.MemState.Const("mem")
    todo = []
    vcs = []
    # Initialize executions out of entry points and cuts
    for addr, specstmts in spec.addrmap.items():
        for n, stmt in enumerate(specstmts):
            if isinstance(stmt, Cut) or isinstance(stmt, Entry):
                precond = ctx.substitute(mem, stmt.expr)
                assert isinstance(precond, smt.BoolRef)
                tracestate = TraceState(
                    start=stmt,
                    trace=[],
                    state=pcode.SimState(mem, (addr, 0), [precond]),
                )
                tracestate, new_vcs = execute_spec_stmts(
                    specstmts[n + 1 :], tracestate, ctx
                )
                vcs.extend(new_vcs)
                if tracestate is not None:
                    todo.extend(execute_insn(tracestate, ctx, verbose=verbose))
    # Execute pre specstmts and instructions
    while todo:
        tracestate = todo.pop()
        addr = tracestate.state.pc[0]
        specstmts = spec.addrmap.get(addr, [])
        tracestate, new_vcs = execute_spec_stmts(specstmts, tracestate, ctx)
        vcs.extend(new_vcs)
        if tracestate is not None:
            todo.extend(execute_insn(tracestate, ctx, verbose=verbose))
    return vcs
```



# Ghost State
A thing that verifiers often need is ghost state https://hal.science/hal-01396864v1 , stuff that isn't real code that needs to execute, but is necessary for verification. It may need to track extra information that is not being retained in the real state (such as function arguments in rdi/rsi before they get clobbered)

I did this by adding a `kd_assign` statement and a `ghost_env` to my symbolic executor. The ghost code is obviously not real code (having actual runtime meaning) because it's not assembly, it's in SMTLIB. I do not ever translate SMTLIB to assembly, although the idea is intriguing.

`kd_prelude` is a feature to be able to add declarations. Constants need to be declared before z3 can parse them.
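
Here is a toy model of why a separate ghost environment helps. Everything in it (the tuple-encoded program, the `run` function, the Python lambda standing in for an SMTLIB expression) is hypothetical and concrete rather than symbolic. The point is only that a `kd_assign`-style statement snapshots a value into `ghost_env`, and later real instructions cannot clobber it.

```python
MASK64 = (1 << 64) - 1


def run(program, regs):
    """Execute a toy program, returning (final registers, ghost_env)."""
    regs = dict(regs)
    ghost_env = {}
    for stmt in program:
        if stmt[0] == "mov":            # real instruction: changes machine state
            _, dst, value = stmt
            regs[dst] = value & MASK64
        elif stmt[0] == "kd_assign":    # ghost statement: changes ghost_env only
            _, name, expr = stmt
            ghost_env[name] = expr(regs) & MASK64
    return regs, ghost_env


program = [
    ("mov", "RAX", 42),
    ("kd_assign", "mytemp", lambda r: r["RAX"] + 1),  # snapshot RAX + 1
    ("mov", "RAX", 0),                                # clobber RAX afterwards
]
final_regs, ghost_env = run(program, {"RAX": 0})
print(final_regs["RAX"], ghost_env["mytemp"])
```

An exit condition can then talk about `mytemp` even though `RAX` no longer holds anything related to it.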


In [3]:
%%file /tmp/assign.s
.include "/tmp/knuckle.s"
.globl myfunc
kd_prelude "(declare-const mytemp (_ BitVec 64))"

.text
    kd_entry myfunc "true"
    movq $42, %rax
    kd_assign mylabel mytemp "(bvadd RAX (_ bv1 64))"
    kd_exit func_end "(= mytemp (_ bv43 64))"
    ret


Writing /tmp/assign.s


In [4]:
from kdrag.contrib.pcode.asmspec import assemble_and_check
assemble_and_check("/tmp/assign.s").successes

stmts []
Executing 0x400000/7: MOV RAX,0x2a at (4194304, 0) PCODE IMARK ram[400000:7]
Executing 0x400000/7: MOV RAX,0x2a at (4194304, 1) PCODE RAX = 0x2a
stmts [Assign(label='mylabel', addr=4194311, name='mytemp', expr=RAX + 1), Exit(label='func_end', addr=4194311, expr=mytemp == 43)]
assign addr 0x400007
{}
0x400007
RAX + 1
{'mytemp': 42 + 1}
finish Exit(label='func_end', addr=4194311, expr=mytemp == 43) 0x400007


["[✅] VC(Entry(label='myfunc', addr=4194304, expr=True), ['0x400000'], Exit(label='func_end', addr=4194311, expr=mytemp == 43), {'mytemp': 42 + 1})"]

# Cuts
Maybe the most interesting but simple idea that I don't recall seeing before is using "cuts" as a way of thinking about invariants. It is obvious, so it's probably out there.

Cuts are annotations that cut the control flow graph. They both start a symbolic execution with an assumption and stop one with an assertion of the same expression. If your CFG is cut to be acyclic, there will only be a finite number of trace fragments. The cut annotations in this way can be proven to be weakly self consistent. The weakness is that I do not currently prove anything about termination, which is necessary for true functional correctness.
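
The finiteness claim can be sketched with a tiny path enumerator over an explicit CFG. This is an illustration, not the executor's code: `fragments` and the dict-of-successors CFG are assumptions for the example, and real execution discovers successors by running pcode. Each returned path is one trace fragment, and so one verification condition.

```python
def fragments(cfg, starts, stops):
    """Enumerate paths that begin at a start point and end at the first stop reached."""
    out = []
    todo = [[s] for s in sorted(starts)]
    while todo:
        path = todo.pop()
        for succ in cfg.get(path[-1], []):
            if succ in stops:
                out.append(path + [succ])   # fragment ends: assert the annotation here
            elif succ in path:
                # A cycle with no cut on it would unroll forever.
                raise ValueError(f"loop through {succ:#x} is not covered by a cut")
            else:
                todo.append(path + [succ])
    return out


# CFG of a two-instruction loop: movq at 0x400000, then `jmp mycut` at 0x400007.
cfg = {0x400000: [0x400007], 0x400007: [0x400007]}
entry, cut = 0x400000, 0x400007

frags = fragments(cfg, starts={entry, cut}, stops={cut})
print([[hex(a) for a in f] for f in frags])
```

Entries and cuts are both start points, while cuts and exits are both stop points. Dropping the cut from both sets makes the enumerator raise, which is the toy version of a loop that has no invariant.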

In [5]:
%%file /tmp/cutloop.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start "true"
    movq     $42, %rdi
kd_cut mycut "(= RDI (_ bv42 64))"
    jmp mycut

Writing /tmp/cutloop.s


In [6]:
assemble_and_check("/tmp/cutloop.s").successes

stmts []
Executing 0x400000/7: MOV RDI,0x2a at (4194304, 0) PCODE IMARK ram[400000:7]
Executing 0x400000/7: MOV RDI,0x2a at (4194304, 1) PCODE RDI = 0x2a
stmts []
Executing 0x400007/2: JMP 0x400007 at (4194311, 0) PCODE IMARK ram[400007:2]
Executing 0x400007/2: JMP 0x400007 at (4194311, 1) PCODE goto ram[400007:8]
stmts [Cut(label='mycut', addr=4194311, expr=RDI == 42)]
finish Cut(label='mycut', addr=4194311, expr=RDI == 42) 0x400007
stmts [Cut(label='mycut', addr=4194311, expr=RDI == 42)]
finish Cut(label='mycut', addr=4194311, expr=RDI == 42) 0x400007


["[✅] VC(Cut(label='mycut', addr=4194311, expr=RDI == 42), ['0x400007'], Cut(label='mycut', addr=4194311, expr=RDI == 42), {})",
 "[✅] VC(Entry(label='_start', addr=4194304, expr=True), ['0x400000'], Cut(label='mycut', addr=4194311, expr=RDI == 42), {})"]

# Risc-V

A huge advantage of using pcode is that other architectures are easy. It's all lifted to a uniform representation. Currently I have to give it the Pcode langid and the appropriate assembler for the architecture, but maybe I could infer this in the future.
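
One possible way to infer the langid, sketched under assumptions: read the ELF header of the assembled object and map `(e_machine, word size, endianness)` to a pcode language id. The `guess_langid` and `fake_header` helpers and the `LANGIDS` table are hypothetical. Only the RISC-V id appears in this post; `x86:LE:64:default` is my assumption for the x86-64 default. This does not help with picking the assembler, which still has to be supplied.

```python
import struct

EM_X86_64 = 62   # e_machine values from the ELF specification
EM_RISCV = 243

# Hypothetical lookup table; extend as more architectures are supported.
LANGIDS = {
    (EM_X86_64, 64, "LE"): "x86:LE:64:default",
    (EM_RISCV, 64, "LE"): "RISCV:LE:64:default",
}


def guess_langid(header: bytes) -> str:
    """Guess a pcode langid from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]          # EI_CLASS
    endian = {1: "LE", 2: "BE"}[header[5]]    # EI_DATA
    fmt = "<H" if endian == "LE" else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)  # e_machine lives at offset 18
    try:
        return LANGIDS[(machine, bits, endian)]
    except KeyError:
        raise ValueError(f"no langid known for e_machine={machine}") from None


def fake_header(machine: int, ei_class: int = 2, ei_data: int = 1) -> bytes:
    """Build a minimal 20-byte ELF header prefix for demonstration purposes."""
    ident = b"\x7fELF" + bytes([ei_class, ei_data, 1]) + bytes(9)  # 16-byte e_ident
    order = "<" if ei_data == 1 else ">"
    return ident + struct.pack(order + "HH", 1, machine)  # e_type=ET_REL, e_machine


print(guess_langid(fake_header(EM_RISCV)))
```

On a real object file the call would look like `guess_langid(open(path, "rb").read(20))`.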

In [12]:
%%file /tmp/mov42.s
.include "/tmp/knuckle.s"
    .text
    .globl  myfunc
kd_entry myfunc "true"
    li    a0, 42
kd_exit myfunc_end "(= a0 (_ bv42 64))"
    ret

Writing /tmp/mov42.s


In [13]:
assemble_and_check("/tmp/mov42.s", langid="RISCV:LE:64:default", as_bin="riscv64-linux-gnu-as").successes

stmts []
Executing 0x400000/4: li a0,0x2a at (4194304, 0) PCODE IMARK ram[400000:4]
Executing 0x400000/4: li a0,0x2a at (4194304, 1) PCODE unique[780:8] = 0x2a
Executing 0x400000/4: li a0,0x2a at (4194304, 2) PCODE a0 = unique[780:8]
stmts [Exit(label='myfunc_end', addr=4194308, expr=a0 == 42)]
finish Exit(label='myfunc_end', addr=4194308, expr=a0 == 42) 0x400004


["[✅] VC(Entry(label='myfunc', addr=4194304, expr=True), ['0x400000'], Exit(label='myfunc_end', addr=4194308, expr=a0 == 42), {})"]

Pypcode ships with a bunch of architectures. avr8 for Arduino stuff, eBPF, 6502 for NES, etc. could make for fun posts. I think there are even more out there, like a wasm one https://github.com/nneonneo/ghidra-wasm-plugin . Not so sure how I'd load these into pypcode though.

In [14]:
! python3 -m pypcode --list

6502:LE:16:default                  - 6502 Microcontroller Family
65C02:LE:16:default                 - 65C02 Microcontroller Family
68000:BE:32:Coldfire                - Motorola 32-bit Coldfire
68000:BE:32:MC68020                 - Motorola 32-bit 68020
68000:BE:32:MC68030                 - Motorola 32-bit 68030
68000:BE:32:default                 - Motorola 32-bit 68040
6805:BE:16:default                  - 6805 Microcontroller Family
6809:BE:16:default                  - 6809 Microprocessor
80251:BE:24:default                 - 80251 Microcontroller Family
80390:BE:24:default                 - 80390 in flat mode
8048:LE:16:default                  - 8048 Microcontroller Family
8051:BE:16:default                  - 8051 Microcontroller Family
8051:BE:24:mx51                     - NXP/Phillips MX51
8085:LE:16:default                  - Intel 8085
AARCH64:BE:32:ilp32                 - Generic ARM64 v8.5-A LE instructions, BE data, ilp32
AARCH64:BE:64:v8A                   - Generic AR

# Bits and Bobbles

TODO:
- better countermodel presentation
- 32 bit arch
- kd_load for constants that need to be loaded into ram
- Make CLI. 
- objcopy into binary data
- kd_always
- Checking memory reads and writes aren't really state functions. Hmm.
- equivalence checking
- bigger more interesting examples. simd?


It would be interesting to make a rust version that uses cbmc stuff as solver backend. A lot of work for questionable gains.


- Ghost state is now available via `kd_assign`. This lets you store information that may be clobbered in the actual state.


getvalue RAX and simplify before substitution. Oh I already do that... Hmm.

Maybe if I set init_RAX
Take the assert out of the smtlib statements. It's weird to have it in an assume
Versions that don't need conditions?

exit and entry are kind of subsumed. No but they aren't if I want 

If I ever jump to something marked as entry, don't do it. Just use entry condition, chaos, and receive exit condition. Or just run, but also receive exit condition?

Faster checks for true / false. Don't bother doing a whole thing... right?


Mark out the pieces of memory that need to be loaded into memory for this to work.

kd_load foo: .ascii "hello world"
kd_load

should memstate have a pointer to a context? Maybe.

Also initialize all symbols that point to data to at least... eh.


mem ---> Lambda([x], mem(yada)[x])


- fixup to work on riscv and 32 bit.
- nicer error output. Negative examples
- Retain initial ghost variables. kd_ghost? kd_let ?
- Countermodels interpret to be more readable. Give path through program, state at beginning and end.
- Use abstractions so the intermediate symbolic states aren't quite so unreadable. Or initialize mem to RAX0. Too slow though. Hmm.
- Hoarize. I've had so many bugs this principled approach might be pretty useful.
- prearranged loaded data. Mem does not currently reflect the actual contents of the loading. kd_load ?
- pre and post conditions on calls? if jump into address already.
- objcopy assertions into section
- Pro / Con of WP?
- kd_always for constantly checked invariants
- A whitelist of jump points. Only allow labelled positions?
- Regular execution or GDB sessions. All the cbat features.
- failsafe. If I grep kd_ in the line, but don't recognize it, I should complain. Multiline smtlib would be nice too.
- The stdlib of helpers.
- speed
- Tracking seen addresses. We may want to track if we never hit a label that has an annotation on it, as this is suspicious.
- it could make sense to slot in AFL or angr or Symqemu or whatever instead of my system, checking the same properties. I have designed my semantics to be moderately sound
- ram8;    ram32 = Lambda([addr], selectconcat32(ram8, addr))   ram32 ram64
- comparative checking between two assembly programs


done:
- memory. Gotta inject a `mem` variable.
- Gotta test that stuff I put in the library
- cuts. Invariants are cuts of cfg into a dag. Distinguishing backedges from forward edges
- We should probably do asserts, assumes, assigns in program text order if multiple at single address. 
- store history just to make debugging easier.


cbmc / ebmc options to mimic? select entry points to test. select assertions to test
bounded mode


flag for eager VC or timeout param.

Grab just variables in highlevel assertion to present countermodel.

Hmm it would actually be easy to add a kd_label and let it persist through the parsing loop... Or just parse labels. There _has_ to have been a label in the contiguous previous lines


call tool `asmc`, the language "asmspec"


In [17]:
%%file /tmp/knuckle.s

#precondition
.macro kd_entry label smt_expr
\label :
.endm

.macro kd_assert label smt_expr
\label :
.endm

.macro kd_assume label smt_expr
\label :
.endm

#postcondition
.macro kd_exit label smt_expr 
\label :
.endm

#invariant
.macro kd_cut label smt_expr
\label :
.endm 

.macro kd_always smt_expr
.endm


Overwriting /tmp/knuckle.s


In [None]:
%%file /tmp/stack.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start " true"
    movq     $42, (%rsp)
kd_exit _start_end "(= (select ram RSP) (_ bv42 8))"
    ret

Overwriting /tmp/stack.s


In [1]:
from kdrag.all import *
import kdrag.contrib.pcode as pcode
from kdrag.contrib.pcode.asmspec import assemble_and_check
assemble_and_check("/tmp/stack.s")


Z3Exception: b'(error "line 2 column 12: unknown constant assert (Bool) ")\n'

# cut

cut, we maybe want post and pre program point
pre_insn1
exec_insn1
post_insn1
pre_insn2
...

examples:
VST book https://softwarefoundations.cis.upenn.edu/vc-current/toc.html
dafny book
frama-c examples / book https://link.springer.com/book/10.1007/978-3-031-55608-1 

sum-array
hmm. I'm worried that we'll need auxiliary predicates.
I could define them in assembly?

kd_prelude  "(define-fun foo ( )  )"
kd_declare  "(declare-const )"   # kdprelude "(declare-const mytemp (BitVec 64))"
kd_assign "mytemp" expr
kd_declare

"(define-const flub (RAX))"  Hmm. This might auto expand. I think so. So you can have a running summary in this way
But kd_assign gives you 

Maybe i should be worried more about the embedded python experience. Getting lemmas in there.
lemmas = dict[label, list[kd.Proof]]  Offer needed extra bits.

Or have the thing output a pile of verification conditions in Results.

This is getting verbose. I'm just delaying figuring out what I want to do. But it is also flexibility.

class Trace()
    events : list[Event]
    def VCs(self):

class TraceForest():

class Jump():
    TraceForest

class Event():
    pass
class Execute(Event): ...
class Assert(Event): ...
class Assume(Event):
class Entry():
class 

T





In [2]:
%%file /tmp/cut.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start "true"
    movq     $42, %rdi
kd_cut mycut "(= RDI (_ bv42 64))"
    movq     %rdi, %rax
kd_exit _start_end "(= RAX (_ bv42 64))"
    ret

Overwriting /tmp/cut.s


In [2]:
from kdrag.all import *
import kdrag.contrib.pcode as pcode
from kdrag.contrib.pcode.asmspec import assemble_and_check
assemble_and_check("/tmp/cut.s")

{
  "entries": {
    "4194304": [
      [
        "_start",
        "true"
      ]
    ]
  },
  "asserts": {},
  "assumes": {},
  "exits": {
    "4194314": [
      [
        "_start_end",
        "(= RAX #x000000000000002a)"
      ]
    ]
  },
  "cuts": {
    "4194311": [
      [
        "mycut",
        "(= RDI #x000000000000002a)"
      ]
    ]
  }
}
entry _start at 4194304 with precond True
cut mycut at 4194311 with precond RDI == 42
Executing 0x400007/3: MOV RAX,RDI at (4194311, 0) PCODE IMARK ram[400007:3]
Executing 0x400007/3: MOV RAX,RDI at (4194311, 1) PCODE RAX = RDI
[✅] exit _start_end: RAX == 42
Executing 0x400000/7: MOV RDI,0x2a at (4194304, 0) PCODE IMARK ram[400000:7]
Executing 0x400000/7: MOV RDI,0x2a at (4194304, 1) PCODE RDI = 0x2a
[✅] cut mycut: RDI == 42


Results(successes=['[✅] exit _start_end: RAX == 42', '[✅] cut mycut: RDI == 42'], failures=[], traces=[[(4194311, 0), '[✅] exit _start_end: RAX == 42'], [(4194304, 0), '[✅] cut mycut: RDI == 42']])

In [3]:
%%file /tmp/cutloop.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start "true"
    movq     $42, %rdi
kd_cut mycut "(= RDI (_ bv42 64))"
    jmp mycut

Writing /tmp/cutloop.s


In [1]:
from kdrag.all import *
import kdrag.contrib.pcode as pcode
from kdrag.contrib.pcode.asmspec import assemble_and_check
assemble_and_check("/tmp/cutloop.s")

{
  "entries": {
    "4194304": [
      [
        "_start",
        "true"
      ]
    ]
  },
  "asserts": {},
  "assumes": {},
  "exits": {},
  "cuts": {
    "4194311": [
      [
        "mycut",
        "(= RDI #x000000000000002a)"
      ]
    ]
  }
}
entry _start at 4194304 with precond True
cut mycut at 4194311 with invariant RDI == 42
Executing 0x400007/2: JMP 0x400007 at (4194311, 0) PCODE IMARK ram[400007:2]
Executing 0x400007/2: JMP 0x400007 at (4194311, 1) PCODE goto ram[400007:8]
[✅] cut mycut: RDI == 42
Executing 0x400000/7: MOV RDI,0x2a at (4194304, 0) PCODE IMARK ram[400000:7]
Executing 0x400000/7: MOV RDI,0x2a at (4194304, 1) PCODE RDI = 0x2a
[✅] cut mycut: RDI == 42


Results(successes=['[✅] cut mycut: RDI == 42', '[✅] cut mycut: RDI == 42'], failures=[], traces=[[(4194311, 0), '[✅] cut mycut: RDI == 42'], [(4194304, 0), '[✅] cut mycut: RDI == 42']])

In [None]:
%%file /tmp/cutloop.s
.include "/tmp/knuckle.s"
.global  _start
kd_entry _start "true"
    movq     $42, %rdi
    lea      1(%rdi), %rax
kd_cut mycut "(bvule RDI (_ bv42 64))"
    add     $1, %rax
    jmp mycut


# riscv


In [6]:
%%file /tmp/mop42.s
.include "/tmp/knuckle.s"
    .text
    .globl  myfunc
kd_entry myfunc "(assert true)"
    li    a0, 42
kd_exit myfunc_end "(assert (= a0 (_ bv42 64)))"
    ret

Overwriting /tmp/mop42.s


In [24]:
! riscv64-linux-gnu-gcc -c -o /tmp/mop42.o /tmp/mop42.s

In [25]:
%%file /tmp/test.c
#include <stdio.h>
#include <stdint.h>
uint64_t myfunc();
int main() {
    printf("Result is %lu\n", myfunc());
    return 0;
}

Overwriting /tmp/test.c


In [26]:
! riscv64-linux-gnu-gcc /tmp/test.c /tmp/mop42.o -o /tmp/test && qemu-riscv64 -L /usr/riscv64-linux-gnu /tmp/test

Result is 42


In [5]:
%load_ext jupyterflame

In [2]:
#%%prun -s tottime
import subprocess
import kdrag.contrib.pcode as pcode
from kdrag.contrib.pcode.asmspec import AsmSpec, run_all_paths
def assemble_and_check_riscv64(filename: str):
    subprocess.run(["riscv64-linux-gnu-as", filename, "-o", "/tmp/kdrag_temp.o"], check=True)
    ctx = pcode.BinaryContext("/tmp/kdrag_temp.o", langid="RISCV:LE:64:default")
    spec = AsmSpec.of_file(filename, ctx)
    print(spec)
    return run_all_paths(ctx, spec)
assemble_and_check_riscv64("/tmp/mop42.s")

{
  "entries": {
    "4194304": [
      [
        "myfunc",
        "true"
      ]
    ]
  },
  "asserts": {},
  "assumes": {},
  "exits": {
    "4194308": [
      [
        "myfunc_end",
        "(= a0 #x000000000000002a)"
      ]
    ]
  },
  "cuts": {}
}
entry myfunc at 4194304 with precond True
Executing 0x400000/4: li a0,0x2a at (4194304, 0) PCODE IMARK ram[400000:4]
Executing 0x400000/4: li a0,0x2a at (4194304, 1) PCODE unique[780:8] = 0x2a
Executing 0x400000/4: li a0,0x2a at (4194304, 2) PCODE a0 = unique[780:8]
[✅] exit myfunc_end: a0 == 42


Results(successes=['[✅] exit myfunc_end: a0 == 42'], failures=[], traces=[[(4194304, 0), '[✅] exit myfunc_end: a0 == 42']])

# 32 bit
little endian and big endian




# Bits and Bobbles
https://github.com/WestfW/structured_gas



# reorg

    checks: defaultdict[int, list[HasCheck]] = dataclasses.field(
        default_factory=lambda: defaultdict(list)
    )


In [1]:
# %%prun -s tottime
from kdrag.contrib.pcode.asmspec import AsmSpec, run_all_paths, assemble_and_check_str

ex = """
    .include "/tmp/knuckle.s"
    .globl  _start
    kd_entry _start "true"
            movq     %rdi, %rax
            cmp     %rdi, %rsi
            cmovb   %rsi, %rax
    kd_exit _start_end "(= RAX (ite (bvult RDI RSI) RDI RSI))"
    #kd_exit _start_end "(or (= RAX RDI) (= RAX RSI))"
            ret
"""
assemble_and_check_str(ex)



{
  "addrmap": {
    "4194304": [
      {
        "label": "_start",
        "addr": 4194304,
        "expr": "true"
      }
    ],
    "4194314": [
      {
        "label": "_start_end",
        "addr": 4194314,
        "expr": "(= RAX (ite (bvult RDI RSI) RDI RSI))"
      }
    ]
  },
  "entries": {
    "4194304": [
      [
        "_start",
        "true"
      ]
    ]
  },
  "asserts": {},
  "assumes": {},
  "exits": {
    "4194314": [
      [
        "_start_end",
        "(= RAX (ite (bvult RDI RSI) RDI RSI))"
      ]
    ]
  },
  "cuts": {}
}
Executing 0x400000/3: MOV RAX,RDI at (4194304, 0) PCODE IMARK ram[400000:3]
Executing 0x400000/3: MOV RAX,RDI at (4194304, 1) PCODE RAX = RDI
Executing 0x400003/3: CMP RSI,RDI at (4194307, 0) PCODE IMARK ram[400003:3]
Executing 0x400003/3: CMP RSI,RDI at (4194307, 1) PCODE unique[3af00:8] = RSI
Executing 0x400003/3: CMP RSI,RDI at (4194307, 2) PCODE CF = unique[3af00:8] < RDI
Executing 0x400003/3: CMP RSI,RDI at (4194307, 3) PCODE OF = sbor

Results(successes=["[✅] VC(Entry(label='_start', addr=4194304, expr=True), ['0x400000', '0x400003', '0x400006'], Exit(label='_start_end', addr=4194314, expr=RAX == If(ULT(RDI, RSI), RDI, RSI)))", "[✅] VC(Entry(label='_start', addr=4194304, expr=True), ['0x400000', '0x400003', '0x400006'], Exit(label='_start_end', addr=4194314, expr=RAX == If(ULT(RDI, RSI), RDI, RSI)))"], failures=[], traces=[])

In [None]:
import pypcode
ctx = pypcode.Context("RISCV:LE:64:default")
#ctx.registers
# Trim mem_init to just the good stuff somehow?
# Maybe in archinfo
{name: vnode for name, vnode in ctx.registers.items() if (vnode.size == 8 or vnode.size == 4) and len(name) <= 3}

{'pc': <pypcode.pypcode_native.Varnode at 0x76b63422a550>,
 'ra': <pypcode.pypcode_native.Varnode at 0x76b60ba68360>,
 'sp': <pypcode.pypcode_native.Varnode at 0x76b60ba68810>,
 'gp': <pypcode.pypcode_native.Varnode at 0x76b60ba685a0>,
 'tp': <pypcode.pypcode_native.Varnode at 0x76b60ba68480>,
 't0': <pypcode.pypcode_native.Varnode at 0x76b60ba68c30>,
 't1': <pypcode.pypcode_native.Varnode at 0x76b60ba68e40>,
 't2': <pypcode.pypcode_native.Varnode at 0x76b60ba68c00>,
 's0': <pypcode.pypcode_native.Varnode at 0x76b60ba68cc0>,
 's1': <pypcode.pypcode_native.Varnode at 0x76b60ba68d50>,
 'a0': <pypcode.pypcode_native.Varnode at 0x76b60ba68d20>,
 'a1': <pypcode.pypcode_native.Varnode at 0x76b60ba68d80>,
 'a2': <pypcode.pypcode_native.Varnode at 0x76b60ba68ed0>,
 'a3': <pypcode.pypcode_native.Varnode at 0x76b60ba69200>,
 'a4': <pypcode.pypcode_native.Varnode at 0x76b60ba68c90>,
 'a5': <pypcode.pypcode_native.Varnode at 0x76b60ba68f90>,
 'a6': <pypcode.pypcode_native.Varnode at 0x76b60ba68f00

In [3]:
from kdrag.all import *
import kdrag.contrib.pcode as pcode

ctx = pcode.BinaryContext()
memstate = ctx.init_mem()
ctx.get_reg(memstate, "RAX")



In [None]:
! sudo apt install gcc-riscv64-linux-gnu g++-riscv64-linux-gnu  binutils-riscv64-linux-gnu

In [16]:
import cle
loader = cle.loader.Loader("/tmp/myfunc.o")
dir(loader.all_elf_objects[0].arch)

['__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_configure_capstone',
 '_configure_keystone',
 '_cs',
 '_cs_x86_syntax',
 '_get_register_dict',
 '_ks',
 '_ks_x86_syntax',
 'address_types',
 'argument_register_positions',
 'argument_registers',
 'artificial_registers',
 'artificial_registers_offsets',
 'asm',
 'bits',
 'bp_offset',
 'branch_delay_slot',
 'byte_width',
 'bytes',
 'cache_irsb',
 'call_pushes_ret',
 'call_sp_fix',
 'capstone',
 'capstone_support',
 'capstone_x86_syntax',
 'concretize_unique_registers',
 'copy',
 'cpu_flag_register_offsets_and_bitmasks_map',
 'cs_arch',
 'cs_mode',
 'default_endness',
 