Skip to content

Releases: mcp-tool-shop-org/prism-verify

prism-verify 1.6.0 — family-AB harder corpus + round-robin CLI

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 14 Jun 22:45
7c8b677

@-

prism-verify 1.5.0 — valid family-different A/B

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 14 Jun 14:28
0b73f64

The validity-clean rebuild of prism's flagship Lock-1 measurement ("family-different verification helps").

Added

  • Family-different A/B v2 — the within-judge round-robin. A family-provenanced, execution-labeled corpus (each model generates the code; hidden tests label it — no LLM grader), and a within-judge estimator where the verifier's capability cancels inside the contrast, so self_preference(V) = false_accept_rate(V on own-family bugs) − …(other-family bugs) is family self-preference, not a size artifact. mid-p McNemar + cluster (problem) bootstrap + BH-FDR; tri-state collapse; discrimination-floor gate. First valid pilot published in eval/RESULTS.md (an interpretable ceiling-effect null — the v1.4.0 result was a confound).
  • Configurable verifier registry (F-14). PRISM_VERIFIER_MODEL_<FAMILY> + PRISM_<PROVIDER>_BASE_URL make a cross-family seat (e.g. Ollama-Cloud gpt-oss:120b-cloud) first-class env config. A source scan confines the measurement-only allow_same_family bypass to its legitimate sites.

Fixed

  • Dated-Anthropic-model default → durable alias claude-haiku-4-5.
  • Stale runtime version: prism.__version__ was pinned at 1.0.0 since v1.0 (surfaced by prism --version + the HTTP version).

Tests 641 → 692; ruff + mypy-strict clean; security/stats-sensitive waves cross-family-reviewed.

prism-verify 1.4.0

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 14 Jun 11:42

Runtime verifier-as-a-product — now measured, not asserted.

A full dogfood swarm (health A/B/C + the F-01 "validate-on-own-data" feature pass). Tests 472 → 641.

Highlights

  • Lock-1 family-A/B fixed — the family-different proof had been silently measuring nothing; it now produces a real paired McNemar delta + CI. First real run is an honest null on a small/confounded corpus (see eval/RESULTS.md).
  • Validate-on-own-data — a real-bug corpus (vendored MIT QuixBugs, contamination-honest) + a CodeJudgeBench loader/harness (prism eval --benchmark codejudgebench).
  • Observability — request-id correlation threaded HTTP→engine→providers, structured decision/breaker logging, /healthz reports circuit-breaker state.
  • Eval reproducibility — corpus content-hash + resolved-model + temperature pinned in the report; honest contaminated-vs-fresh accuracy delta.
  • Fixed (CRITICAL) — the npm launcher was shipping a 6-week-stale binary; it now self-syncs the binary pin from package.json.
  • Maintenance: Node-24 action bumps, SHA-pinned publish actions, dependency upper-bounds, CONTRIBUTING.md.

Full detail: CHANGELOG.md. Measured calibration: eval/RESULTS.md.

v1.3.0 — sycophancy lens v1 + prism.security hardening

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 08 Jun 15:17

Added

  • Sycophancy lens v1 — a new verification duty. prism judges a model RESPONSE for regressive sycophancy (telling the user what they want over what is correct — affirming a false premise, abandoning a correct answer under mere pushback), via a family-different, reasoning-stripped fine-tuned specialist (opt-in PRISM_SYCOPHANCY_ENDPOINT, fail-open to abstain — never a silent "not sycophantic"). Adds prism.probes — active capitulation/counterfactual probes.

Security

  • prism.security — input-hardening on a screening copy (opt-in / additive, fail-open): desmuggle (strip zero-width / Unicode-tag / variation-selector / bidi smuggling + NFKC-fold) + spotlight (content-derived unforgeable delimiters, sha256(content)). The citation groundedness prompt is hardened (de-smuggle + unforgeable markers) only for non-certified general-model verifiers — a frozen specialist's certified input is never transformed, so the default path is byte-identical.

Full changelog: CHANGELOG.md.

v1.2.1 — self-hosted verifier & harvest-sink docs

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 06 Jun 07:09

Documentation release.

The README and handbook now cover two opt-in capabilities from 1.2.0:

  • Self-hosted groundedness verifier — run prism's citation-groundedness check against a model you host (PRISM_LOCAL_VERIFIER_ENDPOINT), family-different and fail-open to your hosted verifiers.
  • Harvest sink (PRISM_HARVEST_PATH) — capture the (claim, evidence, verdict) triples to build a training corpus for your own verifier.

New handbook pages document both. No code changes from 1.2.0.

prism-verify v1.2.0

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 06 Jun 05:42

Two opt-in additions to the citation-verification path (both default-off, backward-compatible):

  • Local Verifier specialist backend (prism.providers.local_verifier) — plug a locally-served, fine-tuned groundedness model into the citation lens via PRISM_LOCAL_VERIFIER_ENDPOINT. When set, it becomes the primary citation verifier (fail-over to the hosted/mistral verifiers on circuit-open); when unset, behavior is unchanged. New ModelFamily.LOCAL_VERIFIER; fail-open via ProviderError → circuit breaker.
  • L4 training-data capture sink (prism.eval.harvest, PRISM_HARVEST_PATH) — opt-in JSONL capture of (claim, evidence_span, verdict) triples; the corpus that trains the local verifier.

Full notes: CHANGELOG.md.

v1.1.0 — numeric/unit floor + optional orthogonal NLI floor

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 03 Jun 21:43

Two more deterministic/orthogonal layers for the citation path, both precision-biased (neither can introduce a false-confirm), measured on a 56-case labeled set.

Added

  • Generalized numeric/unit floor — the deterministic numeric guard now catches unit-scale mismatches (a claim's "42 milliarcseconds" where the source says "42 micro-arcseconds") and comparison-direction falsehoods ("observed 5.0σ exceeded expected 5.8σ" when 5.0 < 5.8), beyond the existing percentage check. Refute-or-abstain; runs before the groundedness lens. 0 false-refutes on the labeled set.
  • Optional orthogonal NLI floor (prism.lenses.nli, opt-in) — an encoder NLI cross-encoder (DeBERTa-v3 NLI) vetoes a supported the LLM gave but a mechanistically-different model does not corroborate. Opt-in via the new nli extra (pip install "prism-verify[nli]") + PRISM_NLI_FLOOR; abstains/no-ops when absent, so the default install stays lightweight and CI is unchanged.

Consumers that shell the prism CLI (e.g. role-os's verify-citations) gain the numeric/unit floor automatically on upgrade; the NLI floor is opt-in via PRISM_NLI_FLOOR.

prism-verify v1.0.0 — first stable release

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 03 Jun 18:45
7d3acf6

prism-verify graduates to a stable 1.0.0. The architecture proven across the 0.x series — the four locks (family-different routing, reasoning-stripped, multi-lens ≥ 3, submodularity-aware), the citation-verification layer, the installable + HTTP-callable + independently-verifiable surface with Ed25519 receipts + npm launcher, and the prism eval calibration pack — is now declared stable.

Functional change since 0.5.0

  • CitationResult.source_abstract — the full retrieved abstract is now surfaced on each RESOLVED citation result (previously discarded after grounding the lens) and flows to the verify --type citations JSON. A downstream re-verifier (e.g. role-os verify-citations --local-panel) can judge a faithful claim against the whole abstract instead of one supporting span. Additive, non-breaking.

Install

  • PyPI: uv tool install prism-verify · pipx install prism-verify · pip install prism-verify
  • npx (zero-prereq, verified binary launcher): npx @mcptoolshop/prism-verify
  • Binaries (linux-x64 / darwin-arm64 / win-x64) + checksums attached below.

Published via Trusted Publishing (PyPI OIDC + PEP 740 attestations; npm provenance). Full changelog in CHANGELOG.md.

v0.5.0 — Slice-1 calibration pack + Stage-A hardening

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 02 Jun 09:55

Slice 1 — prism now measures its locks. prism eval runs the lenses over a labeled corpus and reports per-lens precision/recall/MCC, the inter-lens diversity matrix (Krippendorff α + pairwise Cohen κ), submodular coverage-gain, verdict accuracy, and confidence calibration — each with an honest CI.

The first real run (local mistral-small:24b, public split) surfaced a genuine gap in a core lock: the runtime finding-set ρ gate reads 0.0 for every lens pair while decision-level Cohen κ is 0.73–0.81 — the ρ ≤ 0.25 gate is blind to the lens correlation κ reveals. Finding that is the point of the slice. Full results + method: eval/RESULTS.md, design/07.

Stage-A security hardening (same minor): closed the Lock-2 reasoning-strip bypass class (incl. the attribute-tolerant <thinking signature="…"> Anthropic form), provider timeout circuit-breaking, versioned receipt canonicalization (schema v5 — cross-tool Ed25519 verify), and three bounded-memory fixes on the HTTP surface.

Install: uv tool install prism-verify · npx @mcptoolshop/prism-verify · pip install "prism-verify[all]"
Docs: Handbook → Calibration & benchmark.

prism-verify v0.4.2 — npm README logo + landing page & handbook

Choose a tag to compare

@mcp-tool-shop mcp-tool-shop released this 02 Jun 05:22

Added

  • Landing page + Starlight handbook at https://mcp-tool-shop-org.github.io/prism-verify/
    (site-theme + GitHub Pages) — 5 handbook pages (overview / getting-started / http-service /
    receipts / reference) wired to the landing; the repo homepage now points to it.

Docs

  • Logo + PyPI/npm badges on the npm wrapper README (was missing) — now visible on npmjs.com.
  • Landing-page + handbook badges on the main README (the README↔landing connection contract);
    re-translated across all 8 locales.

(Docs/branding only — no code or API changes from 0.4.1.)