
Expose V4 audit committed fields to Python via deserialize_v4_audit #5

Open
ClementWalter wants to merge 1 commit into lambdaclass:main from ClementWalter:feat/verify-output-binding


@ClementWalter ClementWalter commented Apr 21, 2026

Why

verify_v4_binary cryptographically binds the audit binary to a specific output_text (via the detokenization check that decodes committed token IDs and compares them to response.output_text under the committed detokenization policy). That guarantees the binary is self-consistent, but it does not surface that text to the caller.

Relay-layer callers — e.g. a server that receives an audit binary alongside a client-declared output and wants to post/persist the attested inference — currently have two options, both bad:

  1. Re-implement the binary wire format in their own language to read the committed fields. This is brittle, hard to keep in sync, and gains nothing from Rust-side schema evolution.
  2. Trust a parallel "claimed output" value the client supplies on the wire. This is what's deployed today in several downstream projects, and it opens a re-binding gap: a valid binary can be re-attached to arbitrary new outputs because the server never consults the binary's own attested output.
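
The re-binding gap in option 2 can be shown with a toy model. The sketch below is not the V4 wire format: the "binary" is a JSON blob, and `make_toy_binary`, `toy_verify`, `relay_trusting_claim` and `relay_reading_binary` are hypothetical names introduced only for this illustration.

```python
import json


def make_toy_binary(output_text: str) -> bytes:
    """Stand-in for an audit binary: a JSON blob carrying the attested output."""
    return json.dumps({"output_text": output_text}).encode("utf-8")


def toy_verify(binary: bytes) -> bool:
    """Stand-in for verification: only checks the blob parses and has the field."""
    try:
        decoded = json.loads(binary)
    except ValueError:
        return False
    return isinstance(decoded, dict) and "output_text" in decoded


def relay_trusting_claim(binary: bytes, claimed_output: str) -> str:
    """Option 2: verify the binary, then post whatever the client claimed."""
    if not toy_verify(binary):
        raise ValueError("binary rejected")
    return claimed_output  # never compared against the binary


def relay_reading_binary(binary: bytes) -> str:
    """Proposed flow: verify the binary, then post the text it carries."""
    if not toy_verify(binary):
        raise ValueError("binary rejected")
    return json.loads(binary)["output_text"]


valid = make_toy_binary("the model said A")
# A valid binary re-attached to a different output is accepted as-is.
assert relay_trusting_claim(valid, "the model said B") == "the model said B"
# Reading from the binary leaves no room for a substituted output.
assert relay_reading_binary(valid) == "the model said A"
```

The first relay accepts any claimed text as long as some valid binary accompanies it, which is the gap described above. The second relay has no claimed-output parameter at all.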

An earlier draft of this PR added a comparison-style verify_output_binding(response, expected_output_text) function. Reviewing the design exposed the real issue: since the binary already contains the canonical output, the correct answer is not "compare client claim against binary" but "read the binary directly and use that." The client's "claimed output" field should not exist on the wire in the first place.

What

Adds verilm_rs.deserialize_v4_audit(audit_binary: bytes) -> dict in crates/verilm-py. It deserializes a V4 audit binary via the already-in-tree verilm_core::serialize::deserialize_v4_audit and returns the publicly-committed fields as a plain Python dict:

{
    "token_index":      int,
    "token_id":         int,
    "prev_io_hash":     bytes,        # 32 bytes
    "revealed_seed":    bytes,        # 32 bytes
    "output_text":      str | None,
    "prompt":           bytes | None,
    "n_prompt_tokens":  int | None,
    "prefix_token_ids": list[int],
    "commitment": {
        "version":          str,         # "v4"
        "n_tokens":         int,
        "merkle_root":      bytes,       # 32 bytes
        "io_root":          bytes,       # 32 bytes
        "prompt_hash":      bytes | None,
        "seed_commitment":  bytes | None,
        "manifest_hash":    bytes | None,
        "input_spec_hash":  bytes | None,
        "model_spec_hash":  bytes | None,
        "decode_spec_hash": bytes | None,
        "output_spec_hash": bytes | None,
        "n_prompt_tokens":  int | None,
    }
}

Verification internals — Merkle proofs, retained state, shell openings, KV entries/roots/proofs, prefix embeddings, witnessed scores — are intentionally not exposed. They are consumed by the verifier and have no meaning to a downstream caller that is not reimplementing the verifier.
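
A caller that wants to sanity-check the returned dict before using it could do something like the following. This is a minimal sketch based only on the field listing above; `check_audit_shape` is a hypothetical helper, not part of verilm_rs, and it assumes the optional commitment hashes are 32 bytes when present (as the `Option<[u8; 32]>` projection suggests).

```python
from typing import List

HASH_LEN = 32
TOP_LEVEL_HASHES = ("prev_io_hash", "revealed_seed")
COMMITMENT_HASHES = ("merkle_root", "io_root")
OPTIONAL_COMMITMENT_HASHES = (
    "prompt_hash",
    "seed_commitment",
    "manifest_hash",
    "input_spec_hash",
    "model_spec_hash",
    "decode_spec_hash",
    "output_spec_hash",
)


def _is_hash(value: object) -> bool:
    """True if value is a bytes object of exactly HASH_LEN bytes."""
    return isinstance(value, bytes) and len(value) == HASH_LEN


def check_audit_shape(audit: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for name in TOP_LEVEL_HASHES:
        if not _is_hash(audit.get(name)):
            problems.append(f"{name}: expected {HASH_LEN} bytes")
    text = audit.get("output_text")
    if text is not None and not isinstance(text, str):
        problems.append("output_text: expected str or None")
    commitment = audit.get("commitment")
    if not isinstance(commitment, dict):
        problems.append("commitment: expected a dict")
        return problems
    for name in COMMITMENT_HASHES:
        if not _is_hash(commitment.get(name)):
            problems.append(f"commitment.{name}: expected {HASH_LEN} bytes")
    for name in OPTIONAL_COMMITMENT_HASHES:
        value = commitment.get(name)
        if value is not None and not _is_hash(value):
            problems.append(f"commitment.{name}: expected {HASH_LEN} bytes or None")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is off about a given binary in one pass.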

Companion changes:

  • Registered in the PyO3 module next to the other public pyfunctions.
  • Python tests under crates/verilm-py/tests/test_deserialize_v4_audit.py covering symbol presence + malformed-input error shape.
  • CHANGELOG.md entry under today's date.
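
The two test concerns named above (symbol presence and malformed-input error shape) might look roughly like this. This is a sketch, not the contents of the actual test file: the helper names are hypothetical, and the deserializer is passed in as an argument so the checks run without the built wheel. The `ValueError` type and the message prefix come from the Rust projection shown under "How".

```python
def check_symbol_present(module) -> None:
    """The module should expose a callable named deserialize_v4_audit."""
    assert callable(getattr(module, "deserialize_v4_audit", None))


def check_malformed_input_raises(deserialize) -> None:
    """Malformed bytes should surface as ValueError with the known prefix."""
    try:
        deserialize(b"\x00not-a-v4-binary")
    except ValueError as exc:
        assert "failed to deserialize V4 binary" in str(exc)
    else:
        raise AssertionError("expected ValueError for malformed input")
```

Against the real extension module these would be called as `check_symbol_present(verilm_rs)` and `check_malformed_input_raises(verilm_rs.deserialize_v4_audit)`.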

No existing API or behaviour changes. The PR only adds one new function.

How

The function is a thin Rust → Python projection:

#[pyfunction]
fn deserialize_v4_audit<'py>(py: Python<'py>, audit_binary: &[u8]) -> PyResult<Bound<'py, PyDict>> {
    let r = verilm_core::serialize::deserialize_v4_audit(audit_binary)
        .map_err(|e| PyValueError::new_err(format!("failed to deserialize V4 binary: {}", e)))?;
    // ...fill a PyDict with the public fields...
}

A small opt_bytes_32 helper lifts Option<[u8; 32]> into Option<PyBytes> so commitment hashes are either Python bytes or None.

Intended usage pattern

With this function, a relay server's flow becomes:

# 1. verify the binary is well-formed and self-consistent
report = verilm_rs.verify_v4_binary(audit_binary, key_json)
if not report["passed"]:
    reject()

# 2. read the canonical output straight from the binary
audit = verilm_rs.deserialize_v4_audit(audit_binary)
content = audit["output_text"]  # authoritative, not caller-supplied
post(content)

No modelOutput field on the wire. No re-binding gap. The binary is the source of truth.
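
The flow above could be wrapped into a single fail-closed helper, sketched below. `canonical_output` and `AuditRejected` are hypothetical names; `verify` and `deserialize` are injected so the sketch runs without the wheel, and in a real relay they would be `verilm_rs.verify_v4_binary` and `verilm_rs.deserialize_v4_audit`. The optional manifest cross-check is an assumption about how a relay might use `commitment["manifest_hash"]`, not something this PR prescribes.

```python
import hmac
from typing import Any, Callable, Optional


class AuditRejected(Exception):
    """Raised when the audit binary must not be relayed."""


def canonical_output(
    audit_binary: bytes,
    key_json: Any,
    verify: Callable[[bytes, Any], dict],
    deserialize: Callable[[bytes], dict],
    expected_manifest_hash: Optional[bytes] = None,
) -> str:
    """Verify first, then return the output text carried by the binary."""
    report = verify(audit_binary, key_json)
    if not report["passed"]:
        raise AuditRejected("verification failed")

    audit = deserialize(audit_binary)
    text = audit["output_text"]
    if text is None:
        # output_text is optional in the dict, so treat absence as a rejection.
        raise AuditRejected("binary carries no output_text")

    if expected_manifest_hash is not None:
        actual = audit["commitment"]["manifest_hash"]
        if actual is None or not hmac.compare_digest(actual, expected_manifest_hash):
            raise AuditRejected("manifest hash mismatch")

    return text
```

Note that the helper takes no client-supplied output at all, so there is nothing to re-bind.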

Alternatives considered

  1. Comparison-style verify_output_binding(response, expected_output_text) (previous revision of this PR). Matches the verify_input_tokenization pattern mechanically but asks the wrong question: the input-side comparison exists because the tokenizer is external to the binary, so the caller has to apply a tokenizer and compare. On the output side the committed text is inside the binary — comparison is redundant with extraction, and forces callers to keep carrying a redundant "claimed output" on the wire.
  2. Full #[pyclass] V4AuditResponse with typed getters. Best ergonomics, but ~24 fields × nested structs = a lot of wrapping. A dict keeps the surface small, is serde-friendly, and matches the existing convention (e.g. verify_v4_binary already returns a PyDict report).
  3. Narrow single getter extract_output_text(audit_binary) -> str | None. Smaller but loses the commitment hashes that relay servers also want (e.g. to cross-check the deployment manifest the binary was audited against).

Checklist

  • New function is additive; no existing API changed.
  • Python test file added (symbol presence, malformed-input error shape).
  • CHANGELOG updated under 2026-04-21.
  • cargo check -p verilm-py clean.
  • Marked draft — happy to iterate on function naming (deserialize_v4_audit vs inspect_v4_audit vs read_v4_audit_metadata), dict shape (flatten commitment vs keep nested), and whether to strip-or-keep fields I categorized as "internal" (e.g. prev_io_hash, revealed_seed).

verify_v4_binary cryptographically binds the audit binary to a specific
output_text (via the detokenization check) but does not surface that
text to the caller. Relay-layer callers — servers that receive an
audit binary alongside a client-declared output — currently have to
either (a) re-implement the binary wire format to read the committed
fields, or (b) trust a parallel "claimed output" value the client
supplies on the wire. Option (b) is what opens a re-binding gap: a
valid binary can be re-attached to arbitrary new outputs because the
server never consults the binary's own attested output.

This change adds verilm_rs.deserialize_v4_audit(audit_binary) which
returns the publicly-committed fields as a Python dict: output_text,
prompt, n_prompt_tokens, prefix_token_ids, token_index, token_id,
plus a nested commitment dict with n_tokens, version, and the spec /
manifest / prompt / seed hashes. Internal verification state (Merkle
proofs, retained state, shell openings, KV transcripts, prefix
embeddings, witnessed scores) is intentionally omitted.

With this, a relay server can drop the client-supplied "claimed output"
from its wire protocol entirely and treat the binary as the single
source of truth: verify_v4_binary(bin, key) first, then
deserialize_v4_audit(bin)["output_text"] as the canonical content.
@ClementWalter ClementWalter force-pushed the feat/verify-output-binding branch from 86401f1 to f76c679 April 21, 2026 13:28
@ClementWalter ClementWalter changed the title from "Add verify_output_binding for relay-layer output binding checks" to "Expose V4 audit committed fields to Python via deserialize_v4_audit" Apr 21, 2026
@ClementWalter ClementWalter marked this pull request as ready for review April 21, 2026 13:54
ClementWalter added a commit to ClementWalter/agent-captcha that referenced this pull request Apr 21, 2026
The sidecar's output-hash binding check referenced
`verilm_rs.deserialize_v4_audit`, which upstream CommitLLM never
exported to Python. Every real verify hit the fail-closed
'output_hash_binding_unverifiable' branch because the function did
not exist in the deployed wheel.

Pin the CommitLLM git clone to ClementWalter/CommitLLM@f76c679 which
ships the pyfunction (upstream PR lambdaclass/CommitLLM#5). The
function returns the publicly-committed fields of a V4 audit binary
as a dict; we extract `output_text`, sha256-hash it, and compare to
the `expected_output_hash` the Node server sends. Same binding
contract as before, but with a real function on the other end of
the call.

Revert the pin to upstream once PR #5 merges.
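
The binding check described in this commit message could be sketched as follows. The commit only says the sidecar sha256-hashes `output_text` and compares it to `expected_output_hash`; the function name `output_hash_matches`, the UTF-8 encoding of the text, and the hex encoding of the expected hash are assumptions made for this illustration.

```python
import hashlib
import hmac


def output_hash_matches(audit: dict, expected_output_hash: str) -> bool:
    """Compare sha256(output_text) against a hex digest, failing closed."""
    text = audit.get("output_text")
    if text is None:
        # No attested output in the binary: nothing to bind to.
        return False
    actual = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Compare as bytes in constant time; lower() tolerates upper-case hex.
    return hmac.compare_digest(
        actual.encode("utf-8"),
        expected_output_hash.lower().encode("utf-8"),
    )
```

Here `audit` is the dict returned by `deserialize_v4_audit`, and a `False` result would correspond to the sidecar's fail-closed branch.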