Skip to content

Create STATE.scm project documentation file#2

Merged
hyperpolymath merged 1 commit into
mainfrom
claude/create-state-scm-01S92fFtoSQetTjJtyUDLgx9
Dec 8, 2025
Merged

Create STATE.scm project documentation file#2
hyperpolymath merged 1 commit into
mainfrom
claude/create-state-scm-01S92fFtoSQetTjJtyUDLgx9

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

Create stateful context tracking file following the hyperpolymath/STATE.scm format to maintain continuity across Claude sessions.

Includes:

  • Current position: 9/12 provers (75%) complete
  • Route to MVP v1: GitLab deployment, prover installation
  • Issues: 5 tracked blockers with severity
  • Questions: 4 decisions for user input
  • 12-month roadmap with milestones
  • History snapshots for velocity tracking

Create stateful context tracking file following the
hyperpolymath/STATE.scm format to maintain continuity
across Claude sessions.

Includes:
- Current position: 9/12 provers (75%) complete
- Route to MVP v1: GitLab deployment, prover installation
- Issues: 5 tracked blockers with severity
- Questions: 4 decisions for user input
- 12-month roadmap with milestones
- History snapshots for velocity tracking
@hyperpolymath hyperpolymath merged commit 52164fe into main Dec 8, 2025
0 of 4 checks passed
@hyperpolymath hyperpolymath deleted the claude/create-state-scm-01S92fFtoSQetTjJtyUDLgx9 branch December 8, 2025 19:41
hyperpolymath added a commit that referenced this pull request Apr 18, 2026
Coq 14 → 85 190, Mizar 66 → 59 030, Agda 9 312 → 19 829.
Cumulative new records: 154 657. Coq now #2 in corpus size.
See issues #12 (umbrella), #13/#14 closed, #17 partial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 18, 2026
ACL2 (277 → 170 455, 615×):
- Sparse clone acl2/acl2 at external_corpora/acl2_full/ with
  `books/` checked out (~ 13 478 .lisp files, 1.5 GB).
- Walk recursively alongside legacy curated cache.
- Accept .lisp and .acl2 files.
- Fix UTF-8 byte-index slice crash (switch to first(s, n)).
- ACL2 vaults to #2 corpus after Lean — Community Books are dense
  with named defthm declarations across hundreds of books.

MiniZinc (29 → 327, 11×):
- Clone MiniZinc/minizinc-benchmarks at external_corpora/minizinc_full/
  (~ 300 .mzn models + many .dzn data files).
- Walk recursively.
- Wrap three eachmatch calls in try/catch — large amalgamated data
  files caused PCRE catastrophic backtracking. Count growth limited
  by the extractor's gate (requires at least one variable or
  constraint); further uplift needs per-file lowering or data-file
  handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 18, 2026
Tier C (6 provers) lifted from 0 records each — TLAPS 103, Alloy 136,
Boogie 1 319, Frama-C 1 317, Viper 3 541, Tamarin 4 595 = +11 011
records total. Extractors already existed; just needed upstream
corpora vendored at the expected paths.

Tier A boost: ACL2 277 → 170 455 (#2 corpus after Lean) from
acl2/acl2 books/; MiniZinc 29 → 327. Both committed in 5348c71.

Note: training_data/*.jsonl outputs are large (some > 100 MB) and
cannot be pushed to GitHub with the current on-repo storage policy.
Extractors + cloned corpora are the reproducible source of truth —
regenerate with `just extract-<prover>` when needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 18, 2026
End-of-session summary. Top-3 corpus ranking flipped: ACL2 (170 K) now
#2 after Lean (131 K), ahead of the previous #2 (Isabelle 50 K).

Tier breakdown of session work:
  Tier A (6 provers):                                +211 188
  Tier B (4 provers):                                 +41 096
  ACL2 + MiniZinc (2):                               +170 476
  Tier C wave 1 (6 — TLAPS/Alloy/Boogie/Frama-C/Viper/Tamarin):    +11 011
  Tier C wave 2 (4 — Spin/SeaHorn/KeY/Prism):                        +1 920
  Tier C wave 3 (5 — dReal/Cameleer/Abella/Dedukti/Arend):          +6 532

Still-missing: ProVerif (clone empty), NuSMV/UPPAAL (binary distros),
CBMC (1.5 GB), ABC/CaDiCaL/Kissat/MiniSat (SAT), Isabelle_ZF/Nitpick/
Nunchaku/Mercury/Athena.

Low-yield-needs-extractor-tuning: Naproche (19 files → 0), Matita
(238 → 17), Arend (1 → 7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 20, 2026
Create stateful context tracking file following the
hyperpolymath/STATE.scm format to maintain continuity across Claude
sessions.

Includes:
- Current position: 9/12 provers (75%) complete
- Route to MVP v1: GitLab deployment, prover installation
- Issues: 5 tracked blockers with severity
- Questions: 4 decisions for user input
- 12-month roadmap with milestones
- History snapshots for velocity tracking

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 20, 2026
Coq 14 → 85 190, Mizar 66 → 59 030, Agda 9 312 → 19 829.
Cumulative new records: 154 657. Coq now #2 in corpus size.
See issues #12 (umbrella), #13/#14 closed, #17 partial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 20, 2026
ACL2 (277 → 170 455, 615×):
- Sparse clone acl2/acl2 at external_corpora/acl2_full/ with
  `books/` checked out (~ 13 478 .lisp files, 1.5 GB).
- Walk recursively alongside legacy curated cache.
- Accept .lisp and .acl2 files.
- Fix UTF-8 byte-index slice crash (switch to first(s, n)).
- ACL2 vaults to #2 corpus after Lean — Community Books are dense
  with named defthm declarations across hundreds of books.

MiniZinc (29 → 327, 11×):
- Clone MiniZinc/minizinc-benchmarks at external_corpora/minizinc_full/
  (~ 300 .mzn models + many .dzn data files).
- Walk recursively.
- Wrap three eachmatch calls in try/catch — large amalgamated data
  files caused PCRE catastrophic backtracking. Count growth limited
  by the extractor's gate (requires at least one variable or
  constraint); further uplift needs per-file lowering or data-file
  handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 20, 2026
Tier C (6 provers) lifted from 0 records each — TLAPS 103, Alloy 136,
Boogie 1 319, Frama-C 1 317, Viper 3 541, Tamarin 4 595 = +11 011
records total. Extractors already existed; just needed upstream
corpora vendored at the expected paths.

Tier A boost: ACL2 277 → 170 455 (#2 corpus after Lean) from
acl2/acl2 books/; MiniZinc 29 → 327. Both committed in 618d6b9.

Note: training_data/*.jsonl outputs are large (some > 100 MB) and
cannot be pushed to GitHub with the current on-repo storage policy.
Extractors + cloned corpora are the reproducible source of truth —
regenerate with `just extract-<prover>` when needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Apr 20, 2026
End-of-session summary. Top-3 corpus ranking flipped: ACL2 (170 K) now
#2 after Lean (131 K), ahead of the previous #2 (Isabelle 50 K).

Tier breakdown of session work:
  Tier A (6 provers):                                +211 188
  Tier B (4 provers):                                 +41 096
  ACL2 + MiniZinc (2):                               +170 476
  Tier C wave 1 (6 — TLAPS/Alloy/Boogie/Frama-C/Viper/Tamarin):    +11 011
  Tier C wave 2 (4 — Spin/SeaHorn/KeY/Prism):                        +1 920
  Tier C wave 3 (5 — dReal/Cameleer/Abella/Dedukti/Arend):          +6 532

Still-missing: ProVerif (clone empty), NuSMV/UPPAAL (binary distros),
CBMC (1.5 GB), ABC/CaDiCaL/Kissat/MiniSat (SAT), Isabelle_ZF/Nitpick/
Nunchaku/Mercury/Athena.

Low-yield-needs-extractor-tuning: Naproche (19 files → 0), Matita
(238 → 17), Arend (1 → 7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request May 11, 2026
## Summary

SPARK 2014/2022 formal companion for
`src/rust/integrity/solver_integrity.rs`, the SHAKE3-512 + BLAKE3 binary
integrity layer. Advances the Rust→Rust/SPARK migration roadmap to run
#2 (axiom_tracker was run #1, #47).

- `spark/solver_integrity/solver_integrity.ads` — Ada spec mirroring
`IntegrityStatus` + `IntegrityChecker::verify_solver`'s decision tree,
with case-arm `Post`-conditions and named ghost lemmas
- `spark/solver_integrity/solver_integrity.adb` — body as a 5-arm
if-chain mirroring the Rust source line-for-line
- `spark/solver_integrity/solver_integrity_proof.adb` — catalogue of 11
proof obligations with discharge notes
- `spark/solver_integrity/solver_integrity.gpr` — GPRbuild project
(gnat2022, --level=1)
- `spark/README.adoc` — module index updated

## Proved properties

| # | Property | Method |
|---|---|---|
| PO-1 | `Integrity_Status` enumeration totality | Definitional (Ada
enum) |
| PO-2 | `Classify_Verification` → Verified case-arm | CVC5/Z3 path
substitution |
| PO-3 | **Hash-mismatch ⇒ Tampered** (issue #40 load-bearing) | CVC5/Z3
path substitution |
| PO-4 | `Classify_Verification` → Uninitialized case-arm | CVC5/Z3 path
substitution |
| PO-5 | `Classify_Verification` → Missing case-arm (disjunction) |
CVC5/Z3 path substitution |
| PO-6 | `Classify_Verification` never returns Unchecked | Unsat path;
SMT immediate |
| PO-7 | `Is_Safe` post-condition | Definitional (expression function) |
| PO-8 | `Quick_Reverify` Unchanged ⇔ Cached_Match | Trivial |
| PO-9 | `Aggregate_Safe` post-condition | Definitional (expression
function) |
| PO-10 | `Cap_At_Tampered` ghost lemma (Has_Tampered ⇒ ¬Aggregate_Safe)
| Quantifier instantiation, --level=1 |
| PO-11 | `Hash_Mismatch_Means_Tampered` named ghost lemma |
Post-condition application, --level=1 |

## gnatprove summary

```
Total  11 checks, 0 unproved, 0 justified
  Run-time Checks      2  (CVC5 25%, altergo 75%)
  Functional Contracts 4  (CVC5 44%, Z3 6%, altergo 50%)
  Termination          5  (flow analysis)
```

Max steps used: **53**. Toolchain on the agent runner: gnatprove 14.1.1
(Why3 1.6.0, Alt-Ergo 2.4.0, CVC5 1.1.2, Z3), wired via #40.

## SPARK modeling note

SPARK doesn't model file IO, async, or cryptographic primitives, so the
companion is intentionally a *pure-decision mirror*. The Rust
`find_solver_binary` / `compute_shake3_512` / `compute_blake3` /
`RwLock` layer is abstracted into four boolean inputs of
`Classify_Verification` (`Binary_Found`, `Expected_Is_Placeholder`,
`Compute_Succeeded`, `Hashes_Match`). The crypto guarantees come from
the `tiny_keccak` (SHAKE3) and `blake3` crates. SPARK's contribution is
proving the **decision tree above the crypto layer** is correct and that
hash mismatch unambiguously yields `Tampered`.

For the multi-solver path (`verify_all`), the SPARK model is
`Aggregate_Safe` / `Has_Tampered`: a fleet is safe iff every member is
safe, and `Cap_At_Tampered` proves any single Tampered member forces the
aggregate unsafe.

## Test plan

- [x] `gnatprove --mode=prove --level=1 -P
spark/solver_integrity/solver_integrity.gpr` — 11/0/0 (run locally on
the wired toolchain at branch tip)
- [x] Toolchain prereqs: `gnat_native ^14`, `gprbuild ^22`, `gnatprove
14.1.1` — satisfied via #40

## CI note

Repo-level CI failures on `main` (cargo fmt diffs in
`tests/stage1_integration_test.rs`, rustdoc intra-doc-link breaks in
`src/rust/provers/pvs.rs`, `Cargo.lock` v4 vs MSRV, missing `just` on
MVP Smoke runner, missing `.clusterfuzzlite/Dockerfile`, ubuntu24 vs
`erlef/setup-beam`) are pre-existing and unrelated to this PR — see the
triage table on #47. This PR adds only `spark/solver_integrity/` and one
line to `spark/README.adoc`; touches no Rust, Julia, Elixir, Dockerfile,
or Cargo.lock. Expecting admin-merge with the same justification as #47.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
hyperpolymath added a commit that referenced this pull request May 11, 2026
… roadmap) (#50)

## Summary

SPARK 2014/2022 formal companion for
`src/rust/verification/certificates.rs`, the Alethe / DRAT / TSTP /
OpenTheory cert parsing layer. **Completes the initial 3-module
Rust→Rust/SPARK roadmap from #40** (run #1 was axiom_tracker #47, run #2
was solver_integrity #49).

- `spark/certificates/certificates.ads` — Ada spec for
`Certificate_Format`, `Format_Extension`, `Alethe_Step`, `Verify_Alethe`
+ 3 ghost lemmas
- `spark/certificates/certificates.adb` — body (`Get_Format_Extension`
six-arm case; ghost lemmas have null bodies)
- `spark/certificates/certificates_proof.adb` — 8 proof obligations
- `spark/certificates/certificates.gpr`, `alire.toml`, `.gitignore`
- `spark/README.adoc` — module index updated

## Proved properties

| # | Property | Method |
|---|---|---|
| PO-1 | `Certificate_Format` / `Format_Extension` enumeration totality
| Definitional |
| PO-2 | `Get_Format_Extension` exhaustive case-arm Post | CVC5 path
substitution |
| PO-3 | `Check_Alethe_Step` exact characterisation (Assume ⇒ raw
nonempty; Step ⇒ has `:rule`) | Expression function, definitional |
| PO-4 | `Verify_Alethe` = ∀ step. valid | Expression function,
definitional |
| PO-5 | `Has_Malformed` ghost predicate | Expression function,
definitional |
| PO-6 | **`Cap_At_Malformed`** — any malformed step ⇒ `¬Verify_Alethe`
| Quantifier duality, --level=1 |
| PO-7 | `Empty_Verifies` — empty input vacuously valid | Empty-range
axiom, --level=0 |
| PO-8 | **`Format_Extension_Is_Injective`** — distinct formats ⇒
distinct extensions | Six-arm case-split (30 unsat subcases), --level=1
|

## gnatprove summary

```
Total  10 checks, 0 unproved, 0 justified
  Run-time Checks      2  (CVC5 75%, altergo 25%)
  Functional Contracts 4  (CVC5 56%, altergo 44%)
  Termination          4  (flow analysis)
```

Max steps: **230** (the case-split injectivity proof is the
work-heaviest VC, as expected). Toolchain on the agent runner: gnatprove
14.1.1 (Why3 1.6.0, Alt-Ergo 2.4.0, CVC5 1.1.2, Z3) — wired via #40.

## Scope and out-of-scope

**In scope (mirrored):**
- `CertificateFormat` enum (6 variants, same positional order as Rust)
- `format_extension` six-arm match (Rust line 253-262)
- `check_alethe_step` two-arm match on `AletheStepKind` (Rust line
230-233)
- `verify_alethe`'s scan-all-and-reject-on-any-failure decision (Rust
line 103-109)

**Out of scope (cannot be modeled in SPARK):**
- `verify_drat` (Rust line 127-179): shells out to `drat-trim`, depends
on filesystem + subprocess + stdout substring parse
- Lean4 / Coq kernel checks: subprocess-based
- `parse_alethe_steps`: string-level `line.starts_with` predicates;
SPARK abstracts the parse result into `Alethe_Step` records with
`Raw_Is_Nonempty` / `Has_Rule_Marker` set
- BLAKE3 hashing in `ProofCertificate::new`: opaque crypto

Equivalence to Rust `verify_alethe` is **strong over the structural
validator**; the Rust-side parser-adapter must compute `Raw_Is_Nonempty`
and `Has_Rule_Marker` faithfully at the boundary (a constructor
obligation, not a SPARK obligation).

## Test plan

- [x] `gnatprove --mode=prove --level=1 -P
spark/certificates/certificates.gpr` — 10/0/0 locally
- [x] Toolchain prereqs satisfied via #40

## CI note

Same pre-existing repo-level failures as #47 / #49 (cargo fmt diffs in
untouched test files, rustdoc intra-doc-link breaks in `pvs.rs`,
`Cargo.lock` v4 vs MSRV, missing `just` on MVP Smoke, missing
`.clusterfuzzlite/Dockerfile`, ubuntu24 vs `erlef/setup-beam`). This PR
adds only `spark/certificates/` and one line to `spark/README.adoc` —
touches no Rust, Julia, Elixir, Dockerfile, or Cargo.lock. Expecting
admin-merge with the same justification as the prior two SPARK runs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants