Harden the DYADT claim verifier against adversarial-review bypasses #458
Merged
…sses

An adversarial review of the Wave-4 verifier (3 red-team lenses + independent verification) found real ways to make a FALSE claim pass, or to make the verifier drop/mis-judge a claim. A claim-checker that can be fooled is worse than none, so every confirmed hole is closed fail-safe (unverifiable, never a confident wrong verdict) and locked with a regression test + conformance vector.

Closed holes (scripts/verify-claims.sh):
- Unresolvable/empty base ref no longer false-confirms `created`/`modified`/`deleted`; it returns `unverifiable no-base-ref`.
- `created` now requires a git-TRACKED file new to the change; stray untracked build output no longer confirms "I created X".
- Missing required field (claim_class/target/expect/verifier) -> unverifiable; a `[[claim]]` with no id is no longer silently dropped (appears as a block, never confirmed).
- Empty / always-matching `expect` (`contains:` / `stdout-contains:` with empty arg) -> unverifiable; malformed regex -> `bad-regex` (not a false refute).
- Unsafe `target` (absolute, `..` traversal, or symlink) -> `unsafe-path` (evidence can't be redirected to a known-good file).
- `stdout-contains:` matches STDOUT only; a marker on stderr no longer false-confirms.
- `contains:`/`sha256:` on a directory or unreadable file -> unverifiable.
- Licence/SPDX detected in ANY of class/target/expect/STATEMENT -> manual-only (previously the statement field was not scanned).
- `claims-compose` recursion depth-capped at 8 (cycle / fork-bomb guard).
- not_before present -> unverifiable (reference impl collects no timestamps).
- Parser is now whitespace-tolerant (`key="v"` and `key = "v"`).

Spec: VERIFICATION-PROTOCOL.adoc gains normative "Fail-safe requirements" and "Command execution & sandboxing" sections (command-transcript executes target; untrusted claims MUST be sandboxed; the reference impl is trusted-input only).

Tests: wave4-dyadt-test.sh +7 hardening assertions (14 total); 3 new conformance vectors (missing-field, unsafe-path, licence-in-statement; 9 total).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4Sz
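The `unsafe-path` rule in the commit message above (absolute, `..` traversal, or symlink targets are rejected) could be sketched roughly as follows. This is an illustration only, not the code in `scripts/verify-claims.sh`: the function name `check_target` and its `ok` output are hypothetical, and the sketch inspects only the final path component for symlinks, so symlinked parent directories would need an additional check.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from scripts/verify-claims.sh).
# Prints "unsafe-path" and returns 1 when a claim target could redirect
# evidence collection; prints "ok" and returns 0 otherwise.
check_target() {
  local target=$1

  # Absolute paths escape the repository root.
  case $target in
    /*) echo "unsafe-path"; return 1 ;;
  esac

  # Any ".." path component (../x, a/../b, x/..) is treated as traversal.
  case "/$target/" in
    */../*) echo "unsafe-path"; return 1 ;;
  esac

  # A target that is itself a symlink could point at a known-good file.
  if [ -L "$target" ]; then
    echo "unsafe-path"; return 1
  fi

  echo "ok"
}
```

A caller would map the non-zero return to an `unverifiable` verdict rather than attempting to collect evidence from the path.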
This was referenced Jul 3, 2026
hyperpolymath added a commit that referenced this pull request Jul 3, 2026
…names guard, DYADT residual fix (#459)

Two cohesive commits closing out the estate audit-and-optimization program (umbrella #460). Both fully tested; generated artifacts in sync.

## Commit 1 (Wave 5): per-language testing depth

You flagged this directly: the estate's only per-language testing *depth* was a single Julia guide from **2024** (no MUST/SHOULD, Rust+Julia only) next to a **byte-identical duplicate**.

- `language-testing-standards.md` → **v2.0.0**: RFC-2119 requirements **R1–R9** mapped to the CRG test taxonomy; an anti-theatre rule (no `continue-on-error` on a MUST check; coverage reported-with-artifact, not asserted).
- `templates/language-testing-guide-TEMPLATE.md`: the skeleton every guide follows, with a requirement-mapping table (tool or **visible** `none`), tools, SHA-pinned CI, and a **mandatory honest "Known gaps"** section.
- `affinescript-testing-guide.md`: your primary language, previously with **zero** testing standard, authored honestly (most SHOULD rows are tracked gaps; R3 notes `affinescript-verify.yml` is advisory). SSOT migrates to `hyperpolymath/affinescript` prospectively.
- `scripts/check-language-guide.sh` (wired into `just validate`) + `wave5-language-guides-test.sh` (7/7). Deleted the duplicate snapshot.

## Commit 2 (Wave 6): guard, DYADT residual fix, licence record

- **DYADT residual (#461):** an adversarial review confirmed 16 bypasses in the Wave-4 verifier; 15 were fixed in #458, and this closes the last: an always-matching `contains:` regex (`.*`, `^`, `$`, …) no longer confirms vacuously (`unverifiable trivial-pattern`). Spec pins the `contains:` dialect to POSIX ERE; conformance vector + assertion added (10 vectors, 15 assertions).
- **Canonical-names guard** (`check-canonical-names.sh`): blocks *reintroduction* of the deprecated names (`6a2`→descriptiles, `agent_instructions`→bot_directives) in **added** diff lines only (chartered bulk migration untouched). Wired into `just validate` + the pre-commit hook; `wave6-canonical-names-test.sh` (4/4).
- **`audits/licence-flags-2026-07.adoc`**: flag-only record; the whole program made no SPDX edits and no auto licence PRs; DYADT treats licence claims as `manual-only` end to end.

## Verification

All six wave suites pass; DYADT conformance 10/10 + dogfood all-confirmed; registry + scorecard dashboard in sync.

## Program status (umbrella #460)

Waves 0/1/3/4 + hardening **merged** (#453, #454, #457, #458). This lands Waves 5 + 6. Remaining estate-wide work is chartered: #461 (verifier residual, **fixed here**), #462 (DYADT production verifier), #463 (per-language guides completion). Licence rows `manual-only` throughout (flag-only policy).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
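The `trivial-pattern` residual described in Commit 2 (an always-matching `contains:` regex confirming vacuously) can be illustrated with a conservative probe: a POSIX ERE that matches a single empty line is treated as trivial. This is a sketch under assumptions, not the verifier's actual logic. The function name `classify_pattern` and the `ok` label are hypothetical; the probe also flags patterns such as `^$` that are not strictly always-matching (erring toward `unverifiable`), and it does not catch near-universal patterns such as `.`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the reference verifier).
# Classifies a contains: pattern, interpreted as POSIX ERE, before it is
# used as evidence. Prints one of:
#   empty-pattern | trivial-pattern | bad-regex | ok
classify_pattern() {
  local pat=$1
  local rc=0

  if [ -z "$pat" ]; then
    echo "empty-pattern"
    return
  fi

  # The here-string feeds grep exactly one empty line.
  # grep exit status: 0 = matched, 1 = no match, >1 = error (e.g. bad regex).
  grep -Eq -- "$pat" <<<'' 2>/dev/null || rc=$?

  case $rc in
    0) echo "trivial-pattern" ;;  # matches even an empty line
    1) echo "ok" ;;
    *) echo "bad-regex" ;;
  esac
}
```

Under this sketch, `classify_pattern '.*'` and `classify_pattern '^'` report `trivial-pattern`, while a pattern that needs at least one literal character, such as `TODO`, reports `ok`.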
Context
Follow-up to #457 (Wave 4, merged), which introduced DYADT — the post-action verifier that checks an agent's claimed outcomes actually happened. Before considering it done, I ran an adversarial review of the reference verifier (3 red-team lenses — bypass, verdict-logic, spec-gaps — each finding independently re-verified). It found real ways to make a false claim pass, or to make the verifier drop or mis-judge a claim.
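A claim-checker that can be fooled is worse than none. This PR closes every confirmed hole fail-safe (return `unverifiable`, never a confident wrong verdict) and locks each with a regression assertion and a conformance vector.

Holes closed (`scripts/verify-claims.sh`)

| Hole | Now |
| --- | --- |
| Unresolvable/empty base ref false-confirmed `created`/`modified`/`deleted` | `unverifiable no-base-ref` |
| `created` confirmed any existing file (untracked build output, etc.) | requires a git-tracked file new to the change |
| Missing required field (`claim_class`/`target`/`expect`/`verifier`) | `unverifiable missing-field`; a block with no `id` still appears (no silent drop) |
| Empty / always-matching (`.*`) `expect` → unconditional confirm | `unverifiable empty-pattern`; malformed regex → `bad-regex` |
| `target` = absolute / `..` / symlink → evidence redirection | `unverifiable unsafe-path` |
| `stdout-contains:` matched stderr too | matches stdout only |
| `contains:`/`sha256:` on a directory/unreadable file → `refuted` | `unverifiable` |
| Licence/SPDX in `statement` → auto-confirmed | `manual-only` |
| `claims-compose` infinite recursion (cycle / fork bomb) | recursion depth-capped at 8 |
| `not_before` (stale-evidence) unimplemented | `unverifiable` (reference collects no timestamps) |
| Parser required `key = "v"` | whitespace-tolerant (`key="v"` too) |

Spec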
`VERIFICATION-PROTOCOL.adoc` gains two normative sections: Fail-safe requirements (the exhaustive list of "cannot collect trustworthy evidence → unverifiable") and Command execution & sandboxing (`command-transcript` executes `target`; untrusted claims MUST be sandboxed; the reference impl is trusted-input-only and says so).

Verification
- `scripts/tests/wave4-dyadt-test.sh`: 14/14 (7 new hardening assertions; each proves a specific bypass is now closed).
- `spec/conformance/`: 9/9 vectors (added `missing-field`, `unsafe-path`, `licence-in-statement` so the production verifier must handle them too).
- `CLAIMS.a2ml` still all-confirmed; Waves 0/1/3 tests unaffected; registry + dashboard in sync.

Method: 3-lens red-team fan-out → independent re-verification of each finding → fix only the confirmed ones fail-safe. Licence handling stays `manual-only` end-to-end.

🤖 Generated with Claude Code
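As an illustration of the `stdout-contains:` fix described in this PR (a marker on stderr must not confirm a claim), here is a minimal sketch. It is not the code in `scripts/verify-claims.sh`: the function name `stdout_contains` is hypothetical, the marker is assumed to be a fixed string rather than a regex, and the command's exit status is deliberately ignored to keep the example short.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from scripts/verify-claims.sh).
# Usage: stdout_contains MARKER COMMAND [ARGS...]
# Prints "confirmed", "refuted", or "unverifiable empty-pattern".
stdout_contains() {
  local marker=$1
  shift
  local out

  # An empty marker would match anything, so refuse to judge.
  if [ -z "$marker" ]; then
    echo "unverifiable empty-pattern"
    return
  fi

  # Capture stdout only; stderr is discarded so a marker printed there
  # cannot confirm the claim. Exit status is ignored in this sketch.
  out=$("$@" 2>/dev/null) || true

  # -F treats the marker as a fixed string (an assumption of this sketch).
  if printf '%s\n' "$out" | grep -Fq -- "$marker"; then
    echo "confirmed"
  else
    echo "refuted"
  fi
}
```

For example, `stdout_contains OK sh -c 'echo OK >&2'` reports `refuted`, because the marker only ever reaches stderr.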