Claude/echidna mcp#37
Merged
hyperpolymath merged 10 commits into main on Apr 25, 2026
Conversation
…tool
Exposes the `echidna` CLI through the Model Context Protocol so Claude
Code / Claude API / other MCP-aware agents can invoke ECHIDNA's 105
prover backends as a first-class tool-use action.
### What ships
- New workspace member `crates/echidna-mcp/` with a Rust binary built
on rmcp 1.5 (`ToolRouter` + `#[tool_router]` / `#[tool_handler]`
macros, stdio transport). Single tool: `prove`.
- Tool schema (auto-generated from `ProveParams` via schemars 1):
- `file: string` (required) — path to proof file (SMT-LIB, Lean,
Coq, Agda, Isabelle, Mizar, F*, SPASS, Metamath, …)
- `prover: string` (optional) — backend name; auto-detect if absent
- `timeout_secs: u32` (optional, default 300)
Returns `{verified, message, prover, raw_output, stderr}`.
- Stdio transport; logging goes to stderr (so MCP clients don't see it on
  stdout). Echidna binary path comes from the `ECHIDNA_BIN` env var, falling
  back to `echidna` on PATH.
- Claude Code wiring: `.claude/settings.json` with
`{"mcpServers": {"echidna": {"command": "echidna-mcp"}}}`.
### Smoke test (verified on this branch)
1. Install z3 (apt) — 4.8.12 in PATH.
2. Trivial SMT-LIB file `(set-logic QF_LIA) (declare-const x Int)
(assert (> x 0)) (assert (< x 2)) (check-sat)`.
3. Feed the MCP server: initialize → notifications/initialized →
tools/list → tools/call(prove, {file, prover:"Z3", timeout_secs:30}).
4. Response: `{"verified": true, "message": "✓ Proof verified
successfully...", "prover": "Z3", ...}`.
End-to-end path works: MCP JSON-RPC → echidna CLI → Z3 → verdict.
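The handshake above can be sketched as the newline-delimited JSON-RPC messages a client writes to the server's stdin. A minimal sketch with illustrative `id` values and file path; the `protocolVersion` string is one the MCP spec defines, not something this PR pins:

```rust
// Hypothetical sketch of the stdio smoke-test traffic: one JSON-RPC
// message per line, in the order the MCP handshake requires.
fn smoke_messages(file: &str) -> Vec<String> {
    vec![
        r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}"#.to_string(),
        r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string(),
        r#"{"jsonrpc":"2.0","id":2,"method":"tools/list"}"#.to_string(),
        format!(
            r#"{{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{{"name":"prove","arguments":{{"file":"{}","prover":"Z3","timeout_secs":30}}}}}}"#,
            file
        ),
    ]
}
```

Piping these four lines into `echidna-mcp` (in order, reading one response line per request) reproduces the smoke test by hand.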
### Roadmap update (docs/handover/TODO.md)
Added "Wave-5 (new backend targets, no adapter yet)" section tracking
prover backends that do not yet have an adapter in src/rust/provers/:
- **Andromeda** — scaffold `provers/andromeda.rs`, shells out to
`andromeda` CLI (OCaml source build required). Open-source (MIT).
- **Theorema** — deferred (Mathematica-based, commercial licence).
- **Globular** — not a CLI prover (web UI for higher category theory);
skipped unless scoped to graphical proof capture.
### Next sessions
- Install + wire the remaining provers from the agreed MVP subset:
  Coq/Rocq, Isabelle, Agda, Mizar, F*, Lean 4, SPASS (SPASS apt pkg
  unavailable in this env, needs a manual install), Metamath.
- Containerfile: ECHIDNA + curated provers + the MCP server, so
agents can `podman run` a self-contained proving service.
- Andromeda backend scaffold (Wave-5 entry #1).
Upgrades the capnp runtime and capnpc code-generator in echidna-wire
to clear RUSTSEC-2025-0143 ("Unsound APIs of public constant::Reader
and StructSchema", fixed in >=0.24). capnpc 0.25 regenerates the
wire module identically and the 3 echidna-wire smoke tests pass.
Only remaining cargo-audit warning is RUSTSEC-2025-0134 (rustls-pemfile
unmaintained) pulled transitively via tonic 0.12 in echidna-grpc; that
is tracked as a separate tonic 0.13 bump campaign.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
The previous pipeline parsed a .smt2/.v/.agda/.dfg/.mm file into the
universal Term IR and then reconstructed the solver's input from that
IR (via each backend's `export(state)` or `term_to_*` helper) before
invoking the solver. The parsers are incomplete, so non-trivial
inputs were silently mangled — the Coq parser literally emits
`"param": "(P"` with a stray paren — and every downstream solver
then rejected the regenerated file. A few backends hid this by
early-returning `Ok(true)` whenever `state.goals.is_empty()`, which
fires for real files whose parser simply fails to populate goals the
way the generic pipeline expects — fake success, solver never ran.
Fix follows the pattern already established by Mizar: `parse_file`
stashes the absolute path under `"source_path"` and `parse_string`
stashes the raw content under `"{prover}_source"`; `verify_proof`
prefers those over the lossy reconstruction and falls back to the
old IR-driven path only for synthetic ProofStates that were never
parsed from real source.
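That preference order can be sketched as follows; the names (`pick_input`, `SolverInput`) are invented stand-ins for the real trait code, not its actual API:

```rust
use std::collections::HashMap;

/// Illustrative sketch of the "prefer stashed source over lossy IR" gate.
enum SolverInput {
    /// Absolute path stashed by parse_file: hand the real file to the solver.
    Path(String),
    /// Raw content stashed by parse_string: pipe it to the solver.
    Raw(String),
    /// Synthetic ProofState: fall back to the old IR-driven reconstruction.
    Reconstructed(String),
}

fn pick_input(
    metadata: &HashMap<String, String>,
    prover: &str,
    reconstruct: impl Fn() -> String,
) -> SolverInput {
    if let Some(path) = metadata.get("source_path") {
        SolverInput::Path(path.clone())
    } else if let Some(src) = metadata.get(&format!("{prover}_source")) {
        SolverInput::Raw(src.clone())
    } else {
        // Only synthetic states that never came from real source land here.
        SolverInput::Reconstructed(reconstruct())
    }
}
```

The key property is that the lossy `reconstruct()` closure only runs when neither metadata key is present, which is exactly the "never parsed from real source" case.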
Also fixes two binary-invocation bugs surfaced during end-to-end
smoke testing:
- Agda: spawn the process in the file's parent directory so the
top-level module name resolves relative to CWD (otherwise every
non-trivial file trips "module does not match file name").
- SPASS: pass input as a positional file argument (the binary
rejects DFG on stdin, misreading it as additional flags); use
`-TimeLimit=N` flag syntax (space-separated form makes SPASS
try to open "N" as a filename); drop `-PGiven=0 -PProblem=0`
which suppress the "Proof found" banner parse_result matches on.
- Metamath: wire `verify_proof` to the real `metamath` CLI via a
scripted stdin session (`read "…"; verify proof *; exit`) instead
of returning `Ok(state.goals.is_empty())` — the previous impl
never ran the solver.
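The SPASS and Metamath invocation shapes can be sketched as small helpers; the flag syntax and script text come from the bullets above, while the function names are hypothetical:

```rust
/// SPASS wants the input as a positional file argument and the time
/// limit in `-TimeLimit=N` (equals-sign) form; the space-separated form
/// makes SPASS try to open "N" as a filename.
fn spass_args(file: &str, timeout_secs: u32) -> Vec<String> {
    vec![format!("-TimeLimit={timeout_secs}"), file.to_string()]
}

/// Metamath is driven by a scripted stdin session rather than flags.
fn metamath_script(file: &str) -> String {
    format!("read \"{file}\"\nverify proof *\nexit\n")
}
```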
End-to-end MCP smoke tests now pass for all six apt-installable
MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS): `tools/call
prove` → solver → `{"verified": true}`. The 625-test unit suite
remains green; clippy on the bin+lib remains clean.
The remaining ~30 LOSSY_ROUNDTRIP backends (Lean, Isabelle, F*,
Idris2, Vampire, etc.), the PVS/HOL4 fake-success stubs, and the
TRIVIAL_ONLY SAT solvers (CaDiCaL/Kissat/MiniSat) are tracked for
follow-up.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Both backends' `verify_proof` was `Ok(state.goals.is_empty())` — no solver invocation at any point, so every file with "no goals left after parsing" (which is the normal case for top-level declarations) reported VERIFIED without running PVS or HOL4.

`parse_file` now stashes the original path as `source_path` (consistent with the pattern introduced for the SMT/Coq/Agda/Metamath/SPASS backends). `verify_proof` prefers that path: for PVS it runs the executable in batch typecheck mode and rejects runs containing "error" or "typecheck failed"; for HOL4 it runs the configured driver against the file.

Neither PVS nor HOL4 is installed in this environment, so the real-solver path is code-only; the fallback preserves the legacy empty-goals shortcut for synthetic ProofStates that never came from a parsed file. The 625-test unit suite remains green.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
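The PVS batch-output check described above can be sketched as (the marker strings come from the text; the function name and case-insensitive matching are assumptions):

```rust
/// Hypothetical sketch of the PVS verdict gate: a batch-typecheck run
/// is rejected if its output mentions an error or a failed typecheck.
fn pvs_run_ok(output: &str) -> bool {
    let lower = output.to_lowercase();
    !lower.contains("error") && !lower.contains("typecheck failed")
}
```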
Multi-stage build that ships echidna + echidna-mcp alongside the 6
apt-installable MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS)
and three upstream-distributed ones (Lean 4 via elan, Isabelle 2025
via tum.de tarball with a pre-built Main heap, F* via the latest
GitHub-release Linux asset).
The runtime entry point is /app/bin/echidna-mcp so an MCP client
connects over stdio:
podman build -f .containerization/Containerfile.mcp -t echidna:mcp .
podman run --rm -i echidna:mcp
Consumers wire it into Claude Code / Claude Desktop / any MCP client
via:
{"mcpServers": {"echidna": {
  "command": "podman",
  "args": ["run", "--rm", "-i", "echidna:mcp"]
}}}
Mizar is omitted — it has no Debian package and the SourceForge
upstream distributes neither a stable URL nor a signed tarball, so
pulling it reproducibly inside a container build is a separate
exercise.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Batch-apply the same fix shape used for the MVP provers (z3, cvc5, coq,
agda, metamath, spass) to a first wave of Phase-3 backends: abc, acl2,
altergo, eprover, vampire. parse_file stashes the absolute path as
"source_path"; parse_string stashes the raw content as "{prover}_source";
verify_proof prefers those over reconstruction and falls back to the old
IR-driven path only for synthetic ProofStates.
No binary-invocation changes in this wave — we only gate the existing
solver-spawn logic behind a metadata lookup. The 625-test unit suite
remains green after the change; cargo build --release completes clean.
Further Phase-3 waves will follow for the remaining LOSSY backends.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Applies the same parse_file → stash "source_path", parse_string →
stash "{prover}_source", verify_proof → prefer-metadata-before-IR
fix to the rest of the LOSSY_ROUNDTRIP backends: alloy, cbmc, chuffed,
dafny, dreal, framac, fstar, glpk, hol_light, idris2, imandra,
isabelle, key, lean, minizinc, minlog, mizar, nuprl, nusmv, ortools,
prism, proverif, scip, seahorn, spin_checker, tamarin, tlaps, tlc,
twelf, uppaal, viper, why3.
Non-trivial adaptations required to preserve semantics:
- idris2: Idris 2 resolves the top-level module name from CWD like
Agda, so the source_path branch spawns with current_dir set to
the file's parent and passes the bare filename.
- seahorn: already had a seahorn_input_file escape hatch for pre-
compiled .bc/.ll bitcode; source_path becomes the primary
preference with the existing hatch and the IR reconstruction as
successive fallbacks.
- cbmc/framac: the --unwind / -wp-* metadata knobs were hoisted
above the source_path branch so every path honours the user's
bound overrides.
- prism/tlc: these write two files (model + properties/cfg). The
source_path branch passes the real model file verbatim; the
multi-file generation fallback stays intact behind the if-let.
- isabelle/mizar: both already had partial source stashing
(raw_thy_content, source_path). Normalised to the canonical
source_path + {prover}_source convention; Mizar's is_complete()
short-circuit was moved below the source-backed branches so real
files are not rejected for having open goals in our IR.
- acl2: ACL2 reads input on stdin — the source_path branch opens
the real file as Stdio::from(File) rather than piping string
content, matching the existing lossy-path invocation shape.
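The idris2/Agda spawn shape from the first bullet can be sketched with a std-only helper (the name `split_for_spawn` is invented): run the prover with `current_dir` set to the file's parent and pass only the bare filename, so module-name resolution matches the file.

```rust
use std::path::Path;

/// Split an absolute proof-file path into (spawn directory, bare
/// filename) so the child process sees a CWD-relative module name.
fn split_for_spawn(file: &Path) -> Option<(&Path, &str)> {
    let dir = file.parent()?;
    let name = file.file_name()?.to_str()?;
    Some((dir, name))
}
```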
No trait signatures changed; every fallback path remains byte-for-
byte identical behind an if-let guard. Full workspace build, the
625-test unit suite, and `cargo clippy --lib --bin echidna` all
remain green.
Phase 3 closes out the broken-backend audit that started with the
Z3/CVC5/Coq MVP fixes: 39 backends now prefer their original source
over the lossy Term-IR round-trip.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
…ve-prover roundtrip suite

Three coupled changes that leave the platform at 0 cargo-audit findings with a test suite that actually exercises the Phase-3 backend fix.

### echidna-grpc

Eight pre-existing compile errors (unrelated to the tonic bump but blocking it) are now fixed:

- `main.rs` was missing the `mod ffi_wrapper;` declaration even though `ffi_wrapper.rs` sat next to it.
- Three `echidna::core::ProofState::new(...)` callers were passing `String` / `&str`; the signature is now `new(goal: Term)`. Wrap the raw content in `Term::Hole(...)` at every call site.
- `FfiProverBackend` was missing three required trait methods (`search_theorems`, `config`, `set_config`); added along with a `ProverConfig` field to satisfy the getter/setter pair.
- `TacticResult::Success` takes `ProofState` directly, not `Box<ProofState>` — unboxed.
- The local `CoreTacticResult` alias was unused; replaced the remaining three call sites with the re-exported `TacticResult`.
- `parse_file` now stashes `source_path` + `ffi_source` in metadata, matching the pattern introduced in the rest of `src/rust/provers/`.
- `ffi_wrapper.rs` `CStr::from_ptr(*const u8)` and `CString::as_ptr() → extern *const u8` calls were platform-dependent type mismatches; added `.cast()` at each boundary. Dropped the `c_void` and `FfiStatus`/`FfiStringSlice`/`FfiOwnedString`/`FfiProverConfig` imports that weren't used.

### tonic 0.12 → 0.14 (clears RUSTSEC-2025-0134)

Once echidna-grpc compiles, the tonic bump is mechanical:

- Cargo.toml: `tonic` / `prost` pinned to `0.14`, `tonic-prost` added for the new split-out generated-code helper crate.
- Cargo.toml dev deps: `tonic` features `tls` → `tls-ring` (the default TLS provider flag rename in 0.14).
- build-dependencies: `tonic-build` → `tonic-prost-build`.
- build.rs: `tonic_build::compile_protos` → `tonic_prost_build::`.

`cargo audit` now reports "0 findings", down from:

- RUSTSEC-2025-0143 (capnp unsound API) — cleared earlier by commit c497661.
- RUSTSEC-2025-0134 (rustls-pemfile unmaintained, pulled transitively via tonic 0.12.3 → rustls-pemfile 2.2.0) — cleared by this bump.

### Live-prover verify roundtrip suite

`tests/live_prover_verify.rs` runs the real `ProverBackend::parse_file` → `verify_proof` pipeline against per-prover fixtures in `tests/live_goals/`. The 625-test unit suite stubs solver binaries, so a regression in the Phase-3 "prefer source_path over lossy IR" fix would not be caught there — this suite closes that gap by invoking the actual binaries and asserting both the positive verdict on a valid proof and the negative verdict on a deliberately broken one.

Covered: Z3, CVC5, Coq, Agda, Metamath, SPASS (the six apt-installable MVP provers). Auto-skip when a binary is absent on PATH (same convention as `live_prover_suite.rs`) so developers without every prover installed don't see spurious failures; CI enables the `live-provers` feature per-tier with the single binary provisioned.

Fixtures:

- `tautology.smt2` / `contradiction.smt2` (shared Z3/CVC5 pair).
- `identity.v` / `broken.v` (Coq).
- `Identity.agda` / `Broken.agda` (Agda).
- `trivial.mm`, `trivial.dfg` (Metamath, SPASS — positive-only).

`.gitignore` in `tests/live_goals/` drops Coq/Agda compilation artifacts (`*.vo`, `*.vok`, `*.vos`, `*.glob`, `.<base>.aux`, `*.agdai`).

The 625 unit tests remain green. The live-prover test binary compiles cleanly; full end-to-end run confirmed via manual CLI invocation earlier in the session (all six MVP provers return verified=true via MCP tool-call).

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
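The auto-skip convention can be sketched as a simple PATH probe; `find_on_path` is an invented name, not the helper `live_prover_suite.rs` actually uses:

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Resolve a prover binary on PATH; a live test skips itself (rather
/// than failing) when this returns None.
fn find_on_path(binary: &str) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    env::split_paths(&path)
        .map(|dir| dir.join(binary))
        .find(|candidate| fs::metadata(candidate).map(|m| m.is_file()).unwrap_or(false))
}
```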
Auto-added during the session when cleaning Coq/Agda compile artifacts out of tests/live_goals/ before committing the live-prover verify suite. The `rm -f` entry is oddly specific (one literal invocation) so it's narrowly scoped — widen if a future session wants broader live_goals cleanup. https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Signed-off-by: Jonathan D.A. Jewell <6759885+hyperpolymath@users.noreply.github.com>