Formal verification of the Levenshtein (edit) distance in Rocq, including a proof that a C implementation refines a verified functional model using the Verified Software Toolchain (VST).
The development is structured in three layers, each proved equivalent to the next:
- an intrinsically-correct recursive model, whose dependently-typed definition carries its own optimality proof;
- a dynamic-programming model in the Wagner–Fischer style, proved to compute the same value as the recursive model; and
- a C implementation (
levenshtein.c), proved with VST to refine the dynamic-programming model.
theories/Levenshtein_recursive.v defines edit scripts as indexed Rocq types:
edit s tis one insertion, deletion, or non-equal-character update fromstot.chain s t nis an edit script fromstotwith exactlyncharged edits; equal heads are skipped at no cost.levenshtein_chain s tcomputes both a distance and a witness edit script.levenshtein_recursive s tis the numeric distance extracted from that witness.
The main theorem, levenshtein_recursive_is_minimal, proves that the computed
distance is minimal: for every edit script chain s t n,
levenshtein_recursive s t <= n. Since levenshtein_chain also returns a
witness script with exactly that distance, the recursive model computes the true
Levenshtein distance.
theories/Levenshtein_dp.v defines levenshtein_dp, a Wagner-Fischer-style
dynamic-programming implementation over Rocq strings. It also contains
index-based cache and loop-state lemmas used by the C proof.
The main theorem, levenshtein_dp_eq_levenshtein_recursive, proves that for all
strings s and t, levenshtein_dp s t = levenshtein_recursive s t. The DP
proof first shows that the left-to-right cache traversal computes the recursive
model on reversed inputs, then uses reversal invariance of the recursive
distance to remove the reversals.
theories/Verif_levenshtein.v proves the generated Clight body of levenshtein_n
against a VST function specification. The specification interprets the input
byte arrays as Rocq strings with bytes_to_string, requires the usual pointer
and size_t bounds, and states that the returned size_t is exactly:
Levenshtein.levenshtein_recursive
(bytes_to_string a)
(bytes_to_string b)The proof connects the C loops to the DP cache model, then uses
levenshtein_dp_eq_levenshtein_recursive to conclude that the C result is the
intrinsic Levenshtein distance.
| File | Description |
|---|---|
theories/Levenshtein_recursive.v |
Intrinsic edit-script model and proof that levenshtein_recursive is minimal among all edit scripts. |
theories/Levenshtein_dp.v |
Wagner-Fischer dynamic-programming model levenshtein_dp, proved equal to levenshtein_recursive. |
levenshtein.c |
The C implementation that is verified. |
theories/levenshtein.v |
CompCert Clight AST generated from levenshtein.c (via clightgen). |
theories/Verif_levenshtein.v |
VST proof that levenshtein_n returns the intrinsic recursive distance for the input byte arrays. |
All .v files live under theories/ and form the Coq theory EditDistance,
so modules are referenced as EditDistance.Levenshtein_dp, etc.
- Rocq /
coq-core≥ 9.0 - VST ≥ 2.16 (which bundles CompCert and Flocq)
- dune ≥ 3.21
These can be installed with opam:
opam install dune coq-vstThe project is built with dune; the Makefile is a thin
frontend to it.
make # dune build
make clean # dune clean + git clean
make install # dune installEquivalently, run dune build directly. Build artifacts go under _build/.
theories/levenshtein.v is generated from levenshtein.c and should not be edited by hand:
clightgen -normalize -o theories/levenshtein.v levenshtein.cThis proof development was carried out with assistance from Claude and Codex. The minimality proof and the dynamic programming implementation and proof was done by Charles C. Norton.