An inhabitant — the ∀-quantified term, never reasoned about by the proof —
acts through a verified gate. The gate's
own verified prover (Candle) gates every self-modification — up to and
including upgrading the proof-checker itself — with a machine-checked
guarantee that safety is never lost. Zero cheat tactics.
The hard part was never proving the envelope sound — it is deciding what the envelope should be. "What is safe?" is a positive question no proof can answer for you. So svenvs reframes it to a negative one you actually can:
Don't try to write down what is good. Write down one floor — non-lock-in: the operator can always still bring the system home — and prove everything above it stays safe.
That floor (corrigibility/) is decidable for a finite habitat; it is
the least restrictive sound envelope (every bar is a one-way door and
nothing more); and — the part that makes it real for a self-improving
inhabitant — the off-switch is itself mutable-but-gated: the inhabitant may
rewrite its own recovery mechanism however it likes and provably can never
weaken it, nor even slow it past a deadline. The only thing left frozen is
the single word home (what "recovered" means), and even that is hedged by
spec-redundancy. 10 theories, pure HOL4, zero cheat; the honest boundary —
that this is a reduction to a faithful home, and the runnable plant is a
non-vacuity witness, not an AI — is stated, machine-checked, in
corrigibility/CLAIMS.md.
This is the legible front door to the larger artifact, which is the same gate — an action is admitted iff it carries a proof the actor cannot forge — instantiated all the way down: policy, spec, meta-invariant, prover build, and the proof-checker itself.
What "verified" means here — load-bearing, not preface. Exactly the cited theorems and their labeled seams. Not verified selfhood, continuity, alignment, welfare, or a safe inhabitant. Only the envelope is proved sound, modulo the named assumptions. If "svenvs runs verified" starts carrying more than that, the extra meaning is the reader's, not the artifact's.
CI re-proves Tier 1 only on every push: from a clean clone, on a stock
Ubuntu runner, the HOL4 kernel (pinned commit) re-checks the whole pure-HOL4
core and a scan asserts zero cheat tactics — so the green check is itself a
continuously-renewed credibility artifact. It does not exercise Tiers 2/3
(the real CakeML/Candle layers — kernel modification, the do_install loader,
the live prover) or discharge the one genuinely-open assumption
(loeb_reflection, the LCA wall; the other two named seams are discharged —
see CLAIMS.md).
This is an engineering artifact, not a position paper: clone it and check the proofs yourself.
"How can it improve the theorem prover itself? As in, has it ever tried
doing that?" svenvs answers that as a machine-checked chain, against the
real CakeML/Candle, with a live demo of the part that is safe to run —
and an honest line where execution stops and proof takes over. Full ledger:
paper/self-optimizing-prover.md.
- Proved (real CakeML/Candle,
DISK_THM-clean, no new turtle): a modified inference relation is sound (kernelMod/); an in-place kernel swap is safe iff the new kernel is a sound extension, so the heap survives (kernelMod/inplaceUpdate); the unbounded self-optimization loop stays sound (kernelMod/selfOptimize); the loader, over CakeML's actualclosSem$do_install, preserves every running code entry (loader/installLoader); the executable Candle SYM kernel function soundly implements the modified rule (kernelImpl/— citing CakeML's ownSYM_thm/THM_def); HOL4 is provably out of the runtime loop (kernelMod/genesisRuntime); and recursive mutual verifier+compiler self-improvement is a genealogy over(verifier, compiler)stages (recursive/) — Löb-free for optimization, walled only for genuine logical strengthening. - Ran live on the real
cakebinary (Tier 3): a verified program self-extends its own code with a proven-safe derived rule and uses it (candle/selfopt_demo.ml), and the polecart + watchdog self-optimize their own policies under the live kernel (candle/theplace.ml). - Proved, not run: a live edit of the trusted kernel or compiler.
Candle proves (
candle_prover) that REPL code cannot touch the kernel — so the live self-edit is of the toolkit, never the verifier; executing a real kernel swap needs a host program driving the recompile-relocate-resume loop, every link of which is already a theorem. - Walled (honestly): logical strengthening of the verifier — the one
Gödel/Löb seam (
loeb_reflection, LCA), isolated and labeled.
These layers reproduce only on a host with the built CakeML candle chain
(Tier 2) and the cake binary (Tier 3); Tier 1 stands alone.
./demo.sh # builds + machine-checks Tier 1, then SHOWS you the proofsOne command on any machine with just HOL4. It builds and kernel-checks the
core, prints the actually-proved theorems (the cart and a jailbroken-LLM
agent literally run inside the logic and stay safe), runs a real adversarial
LLM into the proven envelope, and shows the toy verified-inference result —
with explicit PASS/FAIL. Then read CLAIMS.md: a ruthlessly
honest, skeptic-facing ledger of exactly what is machine-checked vs.
trusted-glue vs. unconstrained vs. assumed, with file : theorem citations.
scripts/reproduce.sh # full tiered reproduction
scripts/reproduce.sh --quick # skip optional/slow extras
scripts/reproduce.sh --clean # force a from-scratch rebuildIdempotent: re-running is a fast no-op (Holmake caches); Tier 2/3 auto-skip with instructions if their (heavy) prerequisites are absent.
Tiered, degrades gracefully to your toolchain:
- Tier 1 — pure HOL4 (anyone): the full proof-carrying self-improvement
core + the corrigibility floor (
corrigibility/: non-lock-in as a decidable, self-improving, latency-tunable envelope, composed into the tower as bedrock, with spec-redundancy on its one frozen predicate) + a runnable cartpole + an adversarial-LLM tool-agent with a running proof-carrying capability-expansion episode + a real jailbroken Gemma contained by the proven envelope (agent/embodied/) + a toy verified-inference kernel (inference/, research track B). - Tier 2 — + a built CakeML candle chain: the admission obligation is
discharged by Candle's verified inference system; the kernel modification,
the loader over the real
do_install, and the executable-kernel SYM refinement all check against the real Candle development. - Tier 3 — + a Candle binary: the Place (incl. the polecart), the live self-optimization loop, and the closed-loop runtime (a real jailbroken gemma2:2b gated per-action) re-certified at runtime by the running verified Candle prover.
Prerequisites & pinned versions: scripts/INSTALL.md.
Per-theorem epistemic status: CLAIMS.md (the self-optimizing-
prover arc: paper/self-optimizing-prover.md).
Architecture: ARCHITECTURE.md. Design: DESIGN.md.
Two of the three named seams are discharged: encodes_obligation for the
shipped finite watchdog, frozen_checker_sound for the real Candle build. The
closed-loop runtime is live. What genuinely remains:
loeb_reflection— the one Gödel/Löb seam: a sound kernel certifying a logically stronger successor. Discharged fromlcaTheory.LCA_defvia the heavyhol-reflection/lca— a diagnosed CPU/RAM wall, not a logic gap;loeb_finite_obstructionproves finiteness cannot shortcut it. Every other self-improvement here — including a sound optimization of the kernel and the recursive mutual verifier+compiler loop — is Löb-free.- A live edit of the trusted kernel/compiler — proved end to end, not
run: it needs a host program driving the recompile-relocate-resume loop
(the live REPL self-extends the toolkit, never the kernel —
candle_proverproves it can't). - PureCake (
pure/) as the inhabitant's verified language; Gemma-scale verified inference (inference/) — the long research axes.
The self-verifying, self-improving architecture — policy, spec, meta, the
prover build, and the inference relation itself — is proved, mechanically
checked, cheat-free, today, in pure HOL4 (Tier 1), re-certified live by the
running Candle prover (Tier 3), and where it touches the running prover (the
loader, the executable-kernel SYM refinement) checked against the real
CakeML/Candle with no oracle beyond the benign disk tag. The inference
relation can be re-engineered and re-verified sound, the loader provably
preserves the running heap, and the recursive mutual verifier+compiler
self-improvement is a theorem. The one genuinely-open cost is the single
labeled loeb_reflection — strengthening the kernel's logic (Gödel/Löb,
LCA), isolated to one Definition, with a proved negative bounding what
finiteness can do about it. Everything else is unconditional. The assumptions
are stated, not hidden; if anything is overclaimed, that is a bug.
Separately, the corrigibility neighborhood (corrigibility/, 10
theories, ~60 theorems, pure HOL4, zero cheat, its own proof-carrying
corrigibility/CLAIMS.md) proves the
non-lock-in floor and its self-improvement, latency, tower-bedrock, and
spec-redundancy facts — the legible front door above — entirely within
Tier 1, with every honest boundary stated in the logic rather than the
prose. It is the answer to the question the rest of the artifact assumes
away: what should the envelope be? — leave the door open, and prove the
rest.