Claude/echidna mcp#37
Merged
hyperpolymath merged 10 commits into main on Apr 25, 2026
Conversation
…tool
Exposes the `echidna` CLI through the Model Context Protocol so Claude
Code / Claude API / other MCP-aware agents can invoke ECHIDNA's 105
prover backends as a first-class tool-use action.
### What ships
- New workspace member `crates/echidna-mcp/` with a Rust binary built
on rmcp 1.5 (`ToolRouter` + `#[tool_router]` / `#[tool_handler]`
macros, stdio transport). Single tool: `prove`.
- Tool schema (auto-generated from `ProveParams` via schemars 1):
- `file: string` (required) — path to proof file (SMT-LIB, Lean,
Coq, Agda, Isabelle, Mizar, F*, SPASS, Metamath, …)
- `prover: string` (optional) — backend name; auto-detect if absent
- `timeout_secs: u32` (optional, default 300)
Returns `{verified, message, prover, raw_output, stderr}`.
- Stdio transport; logging goes to stderr (so MCP clients don't see it on
  stdout). Echidna binary path comes from the `ECHIDNA_BIN` env var, falling
  back to `echidna` on PATH.
- Claude Code wiring: `.claude/settings.json` with
`{"mcpServers": {"echidna": {"command": "echidna-mcp"}}}`.
### Smoke test (verified on this branch)
1. Install z3 (apt) — 4.8.12 in PATH.
2. Trivial SMT-LIB file `(set-logic QF_LIA) (declare-const x Int)
(assert (> x 0)) (assert (< x 2)) (check-sat)`.
3. Feed the MCP server: initialize → notifications/initialized →
tools/list → tools/call(prove, {file, prover:"Z3", timeout_secs:30}).
4. Response: `{"verified": true, "message": "✓ Proof verified
successfully...", "prover": "Z3", ...}`.
End-to-end path works: MCP JSON-RPC → echidna CLI → Z3 → verdict.
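The handshake above can be sketched as the newline-delimited JSON-RPC messages a client writes to the server's stdin. A minimal sketch with illustrative `id` values and file path; the `protocolVersion` string is one the MCP spec defines, not something this PR pins:

```rust
// Hypothetical sketch of the stdio smoke-test traffic: one JSON-RPC
// message per line, in the order the MCP handshake requires.
fn smoke_messages(file: &str) -> Vec<String> {
    vec![
        r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke","version":"0"}}}"#.to_string(),
        r#"{"jsonrpc":"2.0","method":"notifications/initialized"}"#.to_string(),
        r#"{"jsonrpc":"2.0","id":2,"method":"tools/list"}"#.to_string(),
        format!(
            r#"{{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{{"name":"prove","arguments":{{"file":"{}","prover":"Z3","timeout_secs":30}}}}}}"#,
            file
        ),
    ]
}
```

Piping these four lines into `echidna-mcp` (in order, reading one response line per request) reproduces the smoke test by hand.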
### Roadmap update (docs/handover/TODO.md)
Added "Wave-5 (new backend targets, no adapter yet)" section tracking
prover backends that do not yet have an adapter in src/rust/provers/:
- **Andromeda** — scaffold `provers/andromeda.rs`, shells out to
`andromeda` CLI (OCaml source build required). Open-source (MIT).
- **Theorema** — deferred (Mathematica-based, commercial licence).
- **Globular** — not a CLI prover (web UI for higher category theory);
skipped unless scoped to graphical proof capture.
### Next sessions
- Install + wire the remaining provers from the agreed MVP subset:
  Coq/Rocq, Isabelle, Agda, Mizar, F*, Lean 4, SPASS (SPASS apt pkg
  unavailable in this env, needs a manual install), Metamath.
- Containerfile: ECHIDNA + curated provers + the MCP server, so
agents can `podman run` a self-contained proving service.
- Andromeda backend scaffold (Wave-5 entry #1).
Upgrades the capnp runtime and capnpc code-generator in echidna-wire
to clear RUSTSEC-2025-0143 ("Unsound APIs of public constant::Reader
and StructSchema", fixed in >=0.24). capnpc 0.25 regenerates the
wire module identically and the 3 echidna-wire smoke tests pass.
Only remaining cargo-audit warning is RUSTSEC-2025-0134 (rustls-pemfile
unmaintained) pulled transitively via tonic 0.12 in echidna-grpc; that
is tracked as a separate tonic 0.13 bump campaign.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
The previous pipeline parsed a .smt2/.v/.agda/.dfg/.mm file into the
universal Term IR and then reconstructed the solver's input from that
IR (via each backend's `export(state)` or `term_to_*` helper) before
invoking the solver. The parsers are incomplete, so non-trivial
inputs were silently mangled — the Coq parser literally emits
`"param": "(P"` with a stray paren — and every downstream solver
then rejected the regenerated file. A few backends hid this by
early-returning `Ok(true)` whenever `state.goals.is_empty()`, which
fires for real files whose parser simply fails to populate goals the
way the generic pipeline expects — fake success, solver never ran.
Fix follows the pattern already established by Mizar: `parse_file`
stashes the absolute path under `"source_path"` and `parse_string`
stashes the raw content under `"{prover}_source"`; `verify_proof`
prefers those over the lossy reconstruction and falls back to the
old IR-driven path only for synthetic ProofStates that were never
parsed from real source.
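That preference order can be sketched as follows; the names (`pick_input`, `SolverInput`) are invented stand-ins for the real trait code, not its actual API:

```rust
use std::collections::HashMap;

/// Illustrative sketch of the "prefer stashed source over lossy IR" gate.
enum SolverInput {
    /// Absolute path stashed by parse_file: hand the real file to the solver.
    Path(String),
    /// Raw content stashed by parse_string: pipe it to the solver.
    Raw(String),
    /// Synthetic ProofState: fall back to the old IR-driven reconstruction.
    Reconstructed(String),
}

fn pick_input(
    metadata: &HashMap<String, String>,
    prover: &str,
    reconstruct: impl Fn() -> String,
) -> SolverInput {
    if let Some(path) = metadata.get("source_path") {
        SolverInput::Path(path.clone())
    } else if let Some(src) = metadata.get(&format!("{prover}_source")) {
        SolverInput::Raw(src.clone())
    } else {
        // Only synthetic states that never came from real source land here.
        SolverInput::Reconstructed(reconstruct())
    }
}
```

The key property is that the lossy `reconstruct()` closure only runs when neither metadata key is present, which is exactly the "never parsed from real source" case.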
Also fixes two binary-invocation bugs surfaced during end-to-end
smoke testing:
- Agda: spawn the process in the file's parent directory so the
top-level module name resolves relative to CWD (otherwise every
non-trivial file trips "module does not match file name").
- SPASS: pass input as a positional file argument (the binary
rejects DFG on stdin, misreading it as additional flags); use
`-TimeLimit=N` flag syntax (space-separated form makes SPASS
try to open "N" as a filename); drop `-PGiven=0 -PProblem=0`
which suppress the "Proof found" banner parse_result matches on.
- Metamath: wire `verify_proof` to the real `metamath` CLI via a
scripted stdin session (`read "…"; verify proof *; exit`) instead
of returning `Ok(state.goals.is_empty())` — the previous impl
never ran the solver.
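The SPASS and Metamath invocation shapes can be sketched as small helpers; the flag syntax and script text come from the bullets above, while the function names are hypothetical:

```rust
/// SPASS wants the input as a positional file argument and the time
/// limit in `-TimeLimit=N` (equals-sign) form; the space-separated form
/// makes SPASS try to open "N" as a filename.
fn spass_args(file: &str, timeout_secs: u32) -> Vec<String> {
    vec![format!("-TimeLimit={timeout_secs}"), file.to_string()]
}

/// Metamath is driven by a scripted stdin session rather than flags.
fn metamath_script(file: &str) -> String {
    format!("read \"{file}\"\nverify proof *\nexit\n")
}
```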
End-to-end MCP smoke tests now pass for all six apt-installable
MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS): `tools/call
prove` → solver → `{"verified": true}`. The 625-test unit suite
remains green; clippy on the bin+lib remains clean.
The remaining ~30 LOSSY_ROUNDTRIP backends (Lean, Isabelle, F*,
Idris2, Vampire, etc.), the PVS/HOL4 fake-success stubs, and the
TRIVIAL_ONLY SAT solvers (CaDiCaL/Kissat/MiniSat) are tracked for
follow-up.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Both backends' `verify_proof` was `Ok(state.goals.is_empty())` — no solver invocation at any point, so every file with "no goals left after parsing" (which is the normal case for top-level declarations) reported VERIFIED without running PVS or HOL4.

`parse_file` now stashes the original path as `source_path` (consistent with the pattern introduced for the SMT/Coq/Agda/Metamath/SPASS backends). `verify_proof` prefers that path: for PVS it runs the executable in batch typecheck mode and rejects runs containing "error" or "typecheck failed"; for HOL4 it runs the configured driver against the file.

Neither PVS nor HOL4 is installed in this environment, so the real-solver path is code-only; the fallback preserves the legacy empty-goals shortcut for synthetic ProofStates that never came from a parsed file. The 625-test unit suite remains green.

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
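The PVS batch-output check described above can be sketched as (the marker strings come from the text; the function name and case-insensitive matching are assumptions):

```rust
/// Hypothetical sketch of the PVS verdict gate: a batch-typecheck run
/// is rejected if its output mentions an error or a failed typecheck.
fn pvs_run_ok(output: &str) -> bool {
    let lower = output.to_lowercase();
    !lower.contains("error") && !lower.contains("typecheck failed")
}
```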
Multi-stage build that ships echidna + echidna-mcp alongside the 6
apt-installable MVP provers (Z3, CVC5, Coq, Agda, Metamath, SPASS)
and three upstream-distributed ones (Lean 4 via elan, Isabelle 2025
via tum.de tarball with a pre-built Main heap, F* via the latest
GitHub-release Linux asset).
The runtime entry point is /app/bin/echidna-mcp so an MCP client
connects over stdio:
podman build -f .containerization/Containerfile.mcp -t echidna:mcp .
podman run --rm -i echidna:mcp
Consumers wire it into Claude Code / Claude Desktop / any MCP client
via:
{"mcpServers": {"echidna": {
  "command": "podman",
  "args": ["run", "--rm", "-i", "echidna:mcp"]
}}}
Mizar is omitted — it has no Debian package and the SourceForge
upstream distributes neither a stable URL nor a signed tarball, so
pulling it reproducibly inside a container build is a separate
exercise.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Batch-apply the same fix shape used for the MVP provers (z3, cvc5, coq,
agda, metamath, spass) to a first wave of Phase-3 backends: abc, acl2,
altergo, eprover, vampire. parse_file stashes the absolute path as
"source_path"; parse_string stashes the raw content as "{prover}_source";
verify_proof prefers those over reconstruction and falls back to the old
IR-driven path only for synthetic ProofStates.
No binary-invocation changes in this wave — we only gate the existing
solver-spawn logic behind a metadata lookup. The 625-test unit suite
remains green after the change; cargo build --release completes clean.
Further Phase-3 waves will follow for the remaining LOSSY backends.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Applies the same parse_file → stash "source_path", parse_string →
stash "{prover}_source", verify_proof → prefer-metadata-before-IR
fix to the rest of the LOSSY_ROUNDTRIP backends: alloy, cbmc, chuffed,
dafny, dreal, framac, fstar, glpk, hol_light, idris2, imandra,
isabelle, key, lean, minizinc, minlog, mizar, nuprl, nusmv, ortools,
prism, proverif, scip, seahorn, spin_checker, tamarin, tlaps, tlc,
twelf, uppaal, viper, why3.
Non-trivial adaptations required to preserve semantics:
- idris2: Idris 2 resolves the top-level module name from CWD like
Agda, so the source_path branch spawns with current_dir set to
the file's parent and passes the bare filename.
- seahorn: already had a seahorn_input_file escape hatch for pre-
compiled .bc/.ll bitcode; source_path becomes the primary
preference with the existing hatch and the IR reconstruction as
successive fallbacks.
- cbmc/framac: the --unwind / -wp-* metadata knobs were hoisted
above the source_path branch so every path honours the user's
bound overrides.
- prism/tlc: these write two files (model + properties/cfg). The
source_path branch passes the real model file verbatim; the
multi-file generation fallback stays intact behind the if-let.
- isabelle/mizar: both already had partial source stashing
(raw_thy_content, source_path). Normalised to the canonical
source_path + {prover}_source convention; Mizar's is_complete()
short-circuit was moved below the source-backed branches so real
files are not rejected for having open goals in our IR.
- acl2: ACL2 reads input on stdin — the source_path branch opens
the real file as Stdio::from(File) rather than piping string
content, matching the existing lossy-path invocation shape.
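The idris2/Agda spawn shape from the first bullet can be sketched with a std-only helper (the name `split_for_spawn` is invented): run the prover with `current_dir` set to the file's parent and pass only the bare filename, so module-name resolution matches the file.

```rust
use std::path::Path;

/// Split an absolute proof-file path into (spawn directory, bare
/// filename) so the child process sees a CWD-relative module name.
fn split_for_spawn(file: &Path) -> Option<(&Path, &str)> {
    let dir = file.parent()?;
    let name = file.file_name()?.to_str()?;
    Some((dir, name))
}
```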
No trait signatures changed; every fallback path remains byte-for-
byte identical behind an if-let guard. Full workspace build, the
625-test unit suite, and `cargo clippy --lib --bin echidna` all
remain green.
Phase 3 closes out the broken-backend audit that started with the
Z3/CVC5/Coq MVP fixes: 39 backends now prefer their original source
over the lossy Term-IR round-trip.
https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
…ve-prover roundtrip suite

Three coupled changes that leave the platform at 0 cargo-audit findings with a test suite that actually exercises the Phase-3 backend fix.

### echidna-grpc

Eight pre-existing compile errors (unrelated to the tonic bump but blocking it) are now fixed:

- `main.rs` was missing the `mod ffi_wrapper;` declaration even though `ffi_wrapper.rs` sat next to it.
- Three `echidna::core::ProofState::new(...)` callers were passing `String` / `&str`; the signature is now `new(goal: Term)`. Wrap the raw content in `Term::Hole(...)` at every call site.
- `FfiProverBackend` was missing three required trait methods (`search_theorems`, `config`, `set_config`); added along with a `ProverConfig` field to satisfy the getter/setter pair.
- `TacticResult::Success` takes `ProofState` directly, not `Box<ProofState>` — unboxed.
- The local `CoreTacticResult` alias was unused; replaced the remaining three call sites with the re-exported `TacticResult`.
- `parse_file` now stashes `source_path` + `ffi_source` in metadata, matching the pattern introduced in the rest of `src/rust/provers/`.
- `ffi_wrapper.rs` `CStr::from_ptr(*const u8)` and `CString::as_ptr() → extern *const u8` calls were platform-dependent type mismatches; added `.cast()` at each boundary. Dropped the `c_void` and `FfiStatus`/`FfiStringSlice`/`FfiOwnedString`/`FfiProverConfig` imports that weren't used.

### tonic 0.12 → 0.14 (clears RUSTSEC-2025-0134)

Once echidna-grpc compiles, the tonic bump is mechanical:

- Cargo.toml: `tonic` / `prost` pinned to `0.14`, `tonic-prost` added for the new split-out generated-code helper crate.
- Cargo.toml dev deps: `tonic` features `tls` → `tls-ring` (the default TLS provider flag rename in 0.14).
- build-dependencies: `tonic-build` → `tonic-prost-build`.
- build.rs: `tonic_build::compile_protos` → `tonic_prost_build::`.

`cargo audit` now reports "0 findings", down from:

- RUSTSEC-2025-0143 (capnp unsound API) — cleared earlier by commit c497661.
- RUSTSEC-2025-0134 (rustls-pemfile unmaintained, pulled transitively via tonic 0.12.3 → rustls-pemfile 2.2.0) — cleared by this bump.

### Live-prover verify roundtrip suite

`tests/live_prover_verify.rs` runs the real `ProverBackend::parse_file` → `verify_proof` pipeline against per-prover fixtures in `tests/live_goals/`. The 625-test unit suite stubs solver binaries, so a regression in the Phase-3 "prefer source_path over lossy IR" fix would not be caught there — this suite closes that gap by invoking the actual binaries and asserting both the positive verdict on a valid proof and the negative verdict on a deliberately broken one.

Covered: Z3, CVC5, Coq, Agda, Metamath, SPASS (the six apt-installable MVP provers). Auto-skip when a binary is absent on PATH (same convention as `live_prover_suite.rs`) so developers without every prover installed don't see spurious failures; CI enables the `live-provers` feature per-tier with the single binary provisioned.

Fixtures:

- `tautology.smt2` / `contradiction.smt2` (shared Z3/CVC5 pair).
- `identity.v` / `broken.v` (Coq).
- `Identity.agda` / `Broken.agda` (Agda).
- `trivial.mm`, `trivial.dfg` (Metamath, SPASS — positive-only).

`.gitignore` in `tests/live_goals/` drops Coq/Agda compilation artifacts (`*.vo`, `*.vok`, `*.vos`, `*.glob`, `.<base>.aux`, `*.agdai`).

The 625 unit tests remain green. The live-prover test binary compiles cleanly; full end-to-end run confirmed via manual CLI invocation earlier in the session (all six MVP provers return verified=true via MCP tool-call).

https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
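The auto-skip convention can be sketched as a simple PATH probe; `find_on_path` is an invented name, not the helper `live_prover_suite.rs` actually uses:

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Resolve a prover binary on PATH; a live test skips itself (rather
/// than failing) when this returns None.
fn find_on_path(binary: &str) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    env::split_paths(&path)
        .map(|dir| dir.join(binary))
        .find(|candidate| fs::metadata(candidate).map(|m| m.is_file()).unwrap_or(false))
}
```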
Auto-added during the session when cleaning Coq/Agda compile artifacts out of tests/live_goals/ before committing the live-prover verify suite. The `rm -f` entry is oddly specific (one literal invocation) so it's narrowly scoped — widen if a future session wants broader live_goals cleanup. https://claude.ai/code/session_01NaC5RXjTyr5xg9XXUa2FCC
Signed-off-by: Jonathan D.A. Jewell <6759885+hyperpolymath@users.noreply.github.com>