Create STATE.scm project documentation file by hyperpolymath · Pull Request #2 · hyperpolymath/echidna

hyperpolymath · 2025-12-08T19:41:28Z

Create stateful context tracking file following the hyperpolymath/STATE.scm format to maintain continuity across Claude sessions.

Includes:

Current position: 9/12 provers (75%) complete
Route to MVP v1: GitLab deployment, prover installation
Issues: 5 tracked blockers with severity
Questions: 4 decisions for user input
12-month roadmap with milestones
History snapshots for velocity tracking

Create stateful context tracking file following the hyperpolymath/STATE.scm format to maintain continuity across Claude sessions. Includes: - Current position: 9/12 provers (75%) complete - Route to MVP v1: GitLab deployment, prover installation - Issues: 5 tracked blockers with severity - Questions: 4 decisions for user input - 12-month roadmap with milestones - History snapshots for velocity tracking

Coq 14 → 85 190, Mizar 66 → 59 030, Agda 9 312 → 19 829. Cumulative new records: 154 657. Coq now #2 in corpus size. See issues #12 (umbrella), #13/#14 closed, #17 partial. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ACL2 (277 → 170 455, 615×): - Sparse clone acl2/acl2 at external_corpora/acl2_full/ with `books/` checked out (~ 13 478 .lisp files, 1.5 GB). - Walk recursively alongside legacy curated cache. - Accept .lisp and .acl2 files. - Fix UTF-8 byte-index slice crash (switch to first(s, n)). - ACL2 vaults to #2 corpus after Lean — Community Books are dense with named defthm declarations across hundreds of books. MiniZinc (29 → 327, 11×): - Clone MiniZinc/minizinc-benchmarks at external_corpora/minizinc_full/ (~ 300 .mzn models + many .dzn data files). - Walk recursively. - Wrap three eachmatch calls in try/catch — large amalgamated data files caused PCRE catastrophic backtracking. Count growth limited by the extractor's gate (requires at least one variable or constraint); further uplift needs per-file lowering or data-file handling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tier C (6 provers) lifted from 0 records each — TLAPS 103, Alloy 136, Boogie 1 319, Frama-C 1 317, Viper 3 541, Tamarin 4 595 = +11 011 records total. Extractors already existed; just needed upstream corpora vendored at the expected paths. Tier A boost: ACL2 277 → 170 455 (#2 corpus after Lean) from acl2/acl2 books/; MiniZinc 29 → 327. Both committed in 5348c71. Note: training_data/*.jsonl outputs are large (some > 100 MB) and cannot be pushed to GitHub with the current on-repo storage policy. Extractors + cloned corpora are the reproducible source of truth — regenerate with `just extract-<prover>` when needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

End-of-session summary. Top-3 corpus ranking flipped: ACL2 (170 K) now #2 after Lean (131 K), ahead of the previous #2 (Isabelle 50 K). Tier breakdown of session work: Tier A (6 provers): +211 188 Tier B (4 provers): +41 096 ACL2 + MiniZinc (2): +170 476 Tier C wave 1 (6 — TLAPS/Alloy/Boogie/Frama-C/Viper/Tamarin): +11 011 Tier C wave 2 (4 — Spin/SeaHorn/KeY/Prism): +1 920 Tier C wave 3 (5 — dReal/Cameleer/Abella/Dedukti/Arend): +6 532 Still-missing: ProVerif (clone empty), NuSMV/UPPAAL (binary distros), CBMC (1.5 GB), ABC/CaDiCaL/Kissat/MiniSat (SAT), Isabelle_ZF/Nitpick/ Nunchaku/Mercury/Athena. Low-yield-needs-extractor-tuning: Naproche (19 files → 0), Matita (238 → 17), Arend (1 → 7). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Create stateful context tracking file following the hyperpolymath/STATE.scm format to maintain continuity across Claude sessions. Includes: - Current position: 9/12 provers (75%) complete - Route to MVP v1: GitLab deployment, prover installation - Issues: 5 tracked blockers with severity - Questions: 4 decisions for user input - 12-month roadmap with milestones - History snapshots for velocity tracking Co-authored-by: Claude <noreply@anthropic.com>

Coq 14 → 85 190, Mizar 66 → 59 030, Agda 9 312 → 19 829. Cumulative new records: 154 657. Coq now #2 in corpus size. See issues #12 (umbrella), #13/#14 closed, #17 partial. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ACL2 (277 → 170 455, 615×): - Sparse clone acl2/acl2 at external_corpora/acl2_full/ with `books/` checked out (~ 13 478 .lisp files, 1.5 GB). - Walk recursively alongside legacy curated cache. - Accept .lisp and .acl2 files. - Fix UTF-8 byte-index slice crash (switch to first(s, n)). - ACL2 vaults to #2 corpus after Lean — Community Books are dense with named defthm declarations across hundreds of books. MiniZinc (29 → 327, 11×): - Clone MiniZinc/minizinc-benchmarks at external_corpora/minizinc_full/ (~ 300 .mzn models + many .dzn data files). - Walk recursively. - Wrap three eachmatch calls in try/catch — large amalgamated data files caused PCRE catastrophic backtracking. Count growth limited by the extractor's gate (requires at least one variable or constraint); further uplift needs per-file lowering or data-file handling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tier C (6 provers) lifted from 0 records each — TLAPS 103, Alloy 136, Boogie 1 319, Frama-C 1 317, Viper 3 541, Tamarin 4 595 = +11 011 records total. Extractors already existed; just needed upstream corpora vendored at the expected paths. Tier A boost: ACL2 277 → 170 455 (#2 corpus after Lean) from acl2/acl2 books/; MiniZinc 29 → 327. Both committed in 618d6b9. Note: training_data/*.jsonl outputs are large (some > 100 MB) and cannot be pushed to GitHub with the current on-repo storage policy. Extractors + cloned corpora are the reproducible source of truth — regenerate with `just extract-<prover>` when needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

End-of-session summary. Top-3 corpus ranking flipped: ACL2 (170 K) now #2 after Lean (131 K), ahead of the previous #2 (Isabelle 50 K). Tier breakdown of session work: Tier A (6 provers): +211 188 Tier B (4 provers): +41 096 ACL2 + MiniZinc (2): +170 476 Tier C wave 1 (6 — TLAPS/Alloy/Boogie/Frama-C/Viper/Tamarin): +11 011 Tier C wave 2 (4 — Spin/SeaHorn/KeY/Prism): +1 920 Tier C wave 3 (5 — dReal/Cameleer/Abella/Dedukti/Arend): +6 532 Still-missing: ProVerif (clone empty), NuSMV/UPPAAL (binary distros), CBMC (1.5 GB), ABC/CaDiCaL/Kissat/MiniSat (SAT), Isabelle_ZF/Nitpick/ Nunchaku/Mercury/Athena. Low-yield-needs-extractor-tuning: Naproche (19 files → 0), Matita (238 → 17), Arend (1 → 7). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary SPARK 2014/2022 formal companion for `src/rust/integrity/solver_integrity.rs`, the SHAKE3-512 + BLAKE3 binary integrity layer. Advances the Rust→Rust/SPARK migration roadmap to run #2 (axiom_tracker was run #1, #47). - `spark/solver_integrity/solver_integrity.ads` — Ada spec mirroring `IntegrityStatus` + `IntegrityChecker::verify_solver`'s decision tree, with case-arm `Post`-conditions and named ghost lemmas - `spark/solver_integrity/solver_integrity.adb` — body as a 5-arm if-chain mirroring the Rust source line-for-line - `spark/solver_integrity/solver_integrity_proof.adb` — catalogue of 11 proof obligations with discharge notes - `spark/solver_integrity/solver_integrity.gpr` — GPRbuild project (gnat2022, --level=1) - `spark/README.adoc` — module index updated ## Proved properties | # | Property | Method | |---|---|---| | PO-1 | `Integrity_Status` enumeration totality | Definitional (Ada enum) | | PO-2 | `Classify_Verification` → Verified case-arm | CVC5/Z3 path substitution | | PO-3 | **Hash-mismatch ⇒ Tampered** (issue #40 load-bearing) | CVC5/Z3 path substitution | | PO-4 | `Classify_Verification` → Uninitialized case-arm | CVC5/Z3 path substitution | | PO-5 | `Classify_Verification` → Missing case-arm (disjunction) | CVC5/Z3 path substitution | | PO-6 | `Classify_Verification` never returns Unchecked | Unsat path; SMT immediate | | PO-7 | `Is_Safe` post-condition | Definitional (expression function) | | PO-8 | `Quick_Reverify` Unchanged ⇔ Cached_Match | Trivial | | PO-9 | `Aggregate_Safe` post-condition | Definitional (expression function) | | PO-10 | `Cap_At_Tampered` ghost lemma (Has_Tampered ⇒ ¬Aggregate_Safe) | Quantifier instantiation, --level=1 | | PO-11 | `Hash_Mismatch_Means_Tampered` named ghost lemma | Post-condition application, --level=1 | ## gnatprove summary ``` Total 11 checks, 0 unproved, 0 justified Run-time Checks 2 (CVC5 25%, altergo 75%) Functional Contracts 4 (CVC5 44%, Z3 6%, altergo 50%) Termination 5 (flow analysis) ``` Max steps used: **53**. Toolchain on the agent runner: gnatprove 14.1.1 (Why3 1.6.0, Alt-Ergo 2.4.0, CVC5 1.1.2, Z3), wired via #40. ## SPARK modeling note SPARK doesn't model file IO, async, or cryptographic primitives, so the companion is intentionally a *pure-decision mirror*. The Rust `find_solver_binary` / `compute_shake3_512` / `compute_blake3` / `RwLock` layer is abstracted into four boolean inputs of `Classify_Verification` (`Binary_Found`, `Expected_Is_Placeholder`, `Compute_Succeeded`, `Hashes_Match`). The crypto guarantees come from the `tiny_keccak` (SHAKE3) and `blake3` crates. SPARK's contribution is proving the **decision tree above the crypto layer** is correct and that hash mismatch unambiguously yields `Tampered`. For the multi-solver path (`verify_all`), the SPARK model is `Aggregate_Safe` / `Has_Tampered`: a fleet is safe iff every member is safe, and `Cap_At_Tampered` proves any single Tampered member forces the aggregate unsafe. ## Test plan - [x] `gnatprove --mode=prove --level=1 -P spark/solver_integrity/solver_integrity.gpr` — 11/0/0 (run locally on the wired toolchain at branch tip) - [x] Toolchain prereqs: `gnat_native ^14`, `gprbuild ^22`, `gnatprove 14.1.1` — satisfied via #40 ## CI note Repo-level CI failures on `main` (cargo fmt diffs in `tests/stage1_integration_test.rs`, rustdoc intra-doc-link breaks in `src/rust/provers/pvs.rs`, `Cargo.lock` v4 vs MSRV, missing `just` on MVP Smoke runner, missing `.clusterfuzzlite/Dockerfile`, ubuntu24 vs `erlef/setup-beam`) are pre-existing and unrelated to this PR — see the triage table on #47. This PR adds only `spark/solver_integrity/` and one line to `spark/README.adoc`; touches no Rust, Julia, Elixir, Dockerfile, or Cargo.lock. Expecting admin-merge with the same justification as #47. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

… roadmap) (#50) ## Summary SPARK 2014/2022 formal companion for `src/rust/verification/certificates.rs`, the Alethe / DRAT / TSTP / OpenTheory cert parsing layer. **Completes the initial 3-module Rust→Rust/SPARK roadmap from #40** (run #1 was axiom_tracker #47, run #2 was solver_integrity #49). - `spark/certificates/certificates.ads` — Ada spec for `Certificate_Format`, `Format_Extension`, `Alethe_Step`, `Verify_Alethe` + 3 ghost lemmas - `spark/certificates/certificates.adb` — body (`Get_Format_Extension` six-arm case; ghost lemmas have null bodies) - `spark/certificates/certificates_proof.adb` — 8 proof obligations - `spark/certificates/certificates.gpr`, `alire.toml`, `.gitignore` - `spark/README.adoc` — module index updated ## Proved properties | # | Property | Method | |---|---|---| | PO-1 | `Certificate_Format` / `Format_Extension` enumeration totality | Definitional | | PO-2 | `Get_Format_Extension` exhaustive case-arm Post | CVC5 path substitution | | PO-3 | `Check_Alethe_Step` exact characterisation (Assume ⇒ raw nonempty; Step ⇒ has `:rule`) | Expression function, definitional | | PO-4 | `Verify_Alethe` = ∀ step. valid | Expression function, definitional | | PO-5 | `Has_Malformed` ghost predicate | Expression function, definitional | | PO-6 | **`Cap_At_Malformed`** — any malformed step ⇒ `¬Verify_Alethe` | Quantifier duality, --level=1 | | PO-7 | `Empty_Verifies` — empty input vacuously valid | Empty-range axiom, --level=0 | | PO-8 | **`Format_Extension_Is_Injective`** — distinct formats ⇒ distinct extensions | Six-arm case-split (30 unsat subcases), --level=1 | ## gnatprove summary ``` Total 10 checks, 0 unproved, 0 justified Run-time Checks 2 (CVC5 75%, altergo 25%) Functional Contracts 4 (CVC5 56%, altergo 44%) Termination 4 (flow analysis) ``` Max steps: **230** (the case-split injectivity proof is the work-heaviest VC, as expected). Toolchain on the agent runner: gnatprove 14.1.1 (Why3 1.6.0, Alt-Ergo 2.4.0, CVC5 1.1.2, Z3) — wired via #40. ## Scope and out-of-scope **In scope (mirrored):** - `CertificateFormat` enum (6 variants, same positional order as Rust) - `format_extension` six-arm match (Rust line 253-262) - `check_alethe_step` two-arm match on `AletheStepKind` (Rust line 230-233) - `verify_alethe`'s scan-all-and-reject-on-any-failure decision (Rust line 103-109) **Out of scope (cannot be modeled in SPARK):** - `verify_drat` (Rust line 127-179): shells out to `drat-trim`, depends on filesystem + subprocess + stdout substring parse - Lean4 / Coq kernel checks: subprocess-based - `parse_alethe_steps`: string-level `line.starts_with` predicates; SPARK abstracts the parse result into `Alethe_Step` records with `Raw_Is_Nonempty` / `Has_Rule_Marker` set - BLAKE3 hashing in `ProofCertificate::new`: opaque crypto Equivalence to Rust `verify_alethe` is **strong over the structural validator**; the Rust-side parser-adapter must compute `Raw_Is_Nonempty` and `Has_Rule_Marker` faithfully at the boundary (a constructor obligation, not a SPARK obligation). ## Test plan - [x] `gnatprove --mode=prove --level=1 -P spark/certificates/certificates.gpr` — 10/0/0 locally - [x] Toolchain prereqs satisfied via #40 ## CI note Same pre-existing repo-level failures as #47 / #49 (cargo fmt diffs in untouched test files, rustdoc intra-doc-link breaks in `pvs.rs`, `Cargo.lock` v4 vs MSRV, missing `just` on MVP Smoke, missing `.clusterfuzzlite/Dockerfile`, ubuntu24 vs `erlef/setup-beam`). This PR adds only `spark/certificates/` and one line to `spark/README.adoc` — touches no Rust, Julia, Elixir, Dockerfile, or Cargo.lock. Expecting admin-merge with the same justification as the prior two SPARK runs. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

hyperpolymath merged commit 52164fe into main Dec 8, 2025
0 of 4 checks passed

hyperpolymath deleted the claude/create-state-scm-01S92fFtoSQetTjJtyUDLgx9 branch December 8, 2025 19:41

hyperpolymath mentioned this pull request May 11, 2026

feat(spark): solver_integrity SPARK companion — run #2 #49

Merged

2 tasks

hyperpolymath mentioned this pull request May 11, 2026

feat(spark): certificates SPARK companion — run #3 (completes initial roadmap) #50

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Create STATE.scm project documentation file#2

Create STATE.scm project documentation file#2
hyperpolymath merged 1 commit into
mainfrom
claude/create-state-scm-01S92fFtoSQetTjJtyUDLgx9

hyperpolymath commented Dec 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

hyperpolymath commented Dec 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants