zisp is a proof of concept that asks how far Zig's compile-time machinery and the new labeled switch continue syntax can push parser generation. The project starts from high-level PEG (Parsing Expression Grammar) declarations and lowers them, at compile time, into tightly-specialized VM loops that read more like hand-written interpreters than generic parser combinators.
The repository doubles as a playground for a few ideas:
- comptime-driven codegen – Grammar rules are analysed and expanded during compilation, producing concrete bytecode tables and AST layouts before the program ever runs.
- Switch-label `continue` – The VM core relies on Zig 0.15's ability to `continue :vm next_ip` directly from inside nested control flow, giving a threaded-interpreter style loop without manual `goto`s (see the sketch after this list).
- Runtime that still feels ergonomic – Even with all the specialization, the public API stays close to "declare a grammar, parse a buffer, walk a typed AST".
- Transparency of the generated code – We want to be able to inspect the lowered form easily (LLVM IR, assembly, AST dumps) and reason about the cost model.
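For a feel of the feature in isolation, here is a tiny, self-contained sketch (not taken from the zisp sources) of the labeled switch `continue` idiom: each arm re-dispatches by continuing with a new value, which is what gives the VM its threaded-interpreter shape.

```zig
const std = @import("std");

pub fn main() void {
    var acc: u32 = 0;
    // A labeled switch: `continue :dispatch <value>` jumps back to the
    // switch with a new operand instead of falling out of it.
    dispatch: switch (@as(u8, 0)) {
        0 => {
            acc += 1;
            continue :dispatch 1; // "goto" the next state
        },
        1 => {
            acc += 2;
            continue :dispatch 2; // 2 has no arm of its own, so `else` runs
        },
        else => {}, // exit the state machine
    }
    std.debug.print("acc = {d}\n", .{acc});
}
```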
- `src/peg.zig` – Grammar DSL, compile-time compilation of PEG rules, and AST helpers.
- `src/vm.zig` – The bytecode interpreter/VM with loop-mode execution using labeled switch `continue`.
- `src/main.zig` – CLI harness that exercises the parser and prints traces/ASTs.
- `docs/vm-loop-llvm.md` – Walkthrough of how to force Zig/LLVM to emit the specialized loop for `demoGrammar`.
- `vm_loop_demo.zig` – Minimal driver used by the docs to instantiate the VM in isolation.
You need Zig 0.15.1 or newer (the build script uses the labeled-continue feature). The usual workflow:
zig build run # build the CLI and run it
zig build test # run the grammar + VM unit tests

The CLI parses a miniature Zig subset (src/zigmini). Today that grammar still rides on the older pegvm.zig backend simply because it hasn't been ported over yet, but its shape mirrors the new peg.zig + vm.zig pipeline. For a quick feel of the existing system, run `zig run src/pegvm.zig`; that is the main entry point that prints the bytecode, step trace, and AST using the original VM. Try passing --dump-pegcode for a readable dump of the generated bytecode.
Running the grammar module directly prints the compiled bytecode, a step-by-step trace for a demo input, and the resulting typed forest:
$ zig run src/peg.zig
&Value:
0 push ->3
1 call ->5
2 drop ->4
3 call ->15
4 done
&Integer:
5 open
6 read 1..9
7 next
8 open
9 read 0..9*
10 shut
...
Parsing: "[[1] [2]]"
[ | 0000 push ->3
| 0001 call ->5
|-| 0005 open
...
✓ (156 steps)
Array [0..16) "[[1] [2] [4096]]"
└─values: 3 items
├─[0] Value: .array -> Integer d='1'
├─[1] Value: .array -> Integer d='2'
└─[2] Value: .array -> Integer d='4', ds="096"
The VM builds a "typed forest": every grammar rule owns a dedicated growable array, and siblings for a rule end up stored contiguously. That layout makes it cheap to gather a rule’s results and to reinterpret slices as strongly-typed structs/unions when you walk the AST later. In the demo run the root rule is Array, whose values field is emitted as a Kleene list of Value nodes; each Value lowers to either an Integer or another Array, and you can see the nesting clearly in the forest dump:
Array:
└─values: 3 items
├─[0] Value: .array
│ └─Array:
│ └─values: 1 items
│ └─[0] Value: .integer
│ └─Integer:
│ ├─d: '1' [2]
│ └─ds: (empty)
├─[1] Value: .array
│ └─Array:
│ └─values: 1 items
│ └─[0] Value: .integer
│ └─Integer:
│ ├─d: '2' [6]
│ └─ds: (empty)
└─[2] Value: .array
└─Array:
└─values: 1 items
└─[0] Value: .integer
└─Integer:
├─d: '4' [10]
└─ds: "096" [11..14)
The full trace (with detailed stack annotations and AST layout) is available any time you want to sanity-check how a grammar runs.
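To make that layout concrete, here is a hedged sketch of the typed-forest idea in plain Zig; the names (`Forest`, `Range`) and the flat-slice representation are illustrative, not the actual zisp types. Each rule gets its own backing array, and a parent refers to its children as a contiguous index range into the child rule's array, so gathering a rule's results is just slicing.

```zig
const std = @import("std");

const Range = struct { start: u32, len: u32 };

const Integer = struct { d: u8, ds: []const u8 };

const Value = union(enum) {
    integer: u32, // index into Forest.integers
    array: u32, // index into Forest.arrays
};

const Array = struct { values: Range }; // contiguous run in Forest.values

const Forest = struct {
    integers: []const Integer,
    values: []const Value,
    arrays: []const Array,

    fn children(f: Forest, r: Range) []const Value {
        // Siblings are stored next to each other, so a rule's results
        // come back as a plain slice of the per-rule array.
        return f.values[r.start .. r.start + r.len];
    }
};

test "walking a hand-built typed forest" {
    // Roughly mirrors one branch of the demo dump: Array -> Value -> Integer '1'.
    const forest = Forest{
        .integers = &.{.{ .d = '1', .ds = "" }},
        .values = &.{.{ .integer = 0 }},
        .arrays = &.{.{ .values = .{ .start = 0, .len = 1 } }},
    };
    const root = forest.arrays[0];
    for (forest.children(root.values)) |v| {
        try std.testing.expect(v == .integer);
        try std.testing.expectEqual(@as(u8, '1'), forest.integers[v.integer].d);
    }
}
```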
To look directly at the loop-mode codegen for the included demoGrammar, follow the steps in docs/vm-loop-llvm.md. The short version:
zig build-exe vm_loop_demo.zig \
-O ReleaseFast -fllvm \
-femit-llvm-ir=zig-out/vm_loop_demo.ll \
    -femit-asm=zig-out/vm_loop_demo.s

The emitted .ll and .s highlight how the interpreter turns into a computed-goto state machine with literal bitsets for character classes.
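As a rough illustration of what “literal bitsets for character classes” means (a generic sketch, not the actual zisp codegen): the accepted bytes can be folded into a wide integer constant at comptime, so a class test lowers to a shift and a mask rather than a chain of range comparisons.

```zig
const std = @import("std");

/// Build a 256-bit set with one bit per accepted byte, at comptime.
fn classBitset(comptime chars: []const u8) u256 {
    var set: u256 = 0;
    for (chars) |c| set |= @as(u256, 1) << c;
    return set;
}

const digit_set: u256 = classBitset("0123456789");

fn isDigit(c: u8) bool {
    // One shift and one mask against a literal constant.
    return (digit_set >> c) & 1 != 0;
}

test "digit class via a literal bitset" {
    try std.testing.expect(isDigit('7'));
    try std.testing.expect(!isDigit('['));
}
```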
Because the VM bytecode is baked during comptime, the “interpreter” that ships in the binary already knows the exact instruction stream. `VM(G).next` gets monomorphized for the grammar, the opcode array becomes a constant, and the main loop lowers to one giant switch/jump-table keyed on the instruction pointer. In other words, we don’t even switch on an opcode enum at runtime; we switch on the literal IP and jump straight to the inlined code for that specific instruction. A toy sketch of the shape you get looks like this:
// Pseudocode, but this is the flavour LLVM ends up with.
vm: switch (ip) {
    0 => { // read '['
        if (self.text[self.sp] != '[') return error.ParseFailed;
        self.sp += 1;
        continue :vm 1;
    },
    1 => { // call Skip rule
        try self.calls.append(.{ .return_ip = 2, .target_ip = 31, ... });
        continue :vm 31;
    },
    2 => { // next field, etc.
        ...;
        continue :vm 3;
    },
    else => return,
}

Every case carries the rule metadata, call targets, character sets, and struct bookkeeping as compile-time constants. In release builds the control flow resembles a hand-written, assembly-level threaded interpreter for a program that was known when you built the binary. The deep dive in docs/vm-loop-llvm.md shows the LLVM view, but even at the Zig level you can reason about the VM as a tightly unrolled state machine specialized to the grammar you compiled.
This is intentionally exploratory code. Expect breakage, rapid refactors, and plenty of TODOs around:
- Enriching the grammar DSL with more PEG operators.
- Experimenting with alternative backends (direct threaded code vs VM bytecode).
- Measuring performance against other PEG implementations.
- Refining the AST representation to reduce allocations.
If you're curious about a specific angle—memoization strategies, labelled-switch ergonomics, or further comptime tricks—open an issue or hack on a branch. The more weird experiments, the better.
MIT. See LICENSE for details.