
Harden the DYADT claim verifier against adversarial-review bypasses #458

Merged: hyperpolymath merged 1 commit into main from claude/estate-audit-optimization-h19z12 on Jul 3, 2026

@hyperpolymath

Owner

Context

Follow-up to #457 (Wave 4, merged), which introduced DYADT — the post-action verifier that checks an agent's claimed outcomes actually happened. Before considering it done, I ran an adversarial review of the reference verifier (3 red-team lenses — bypass, verdict-logic, spec-gaps — each finding independently re-verified). It found real ways to make a false claim pass, or to make the verifier drop or mis-judge a claim.

A claim-checker that can be fooled is worse than none. This PR closes every confirmed hole fail-safe (return unverifiable, never a confident wrong verdict) and locks each with a regression assertion and a conformance vector.
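To make the fail-safe rule concrete, here is a minimal bash sketch of evidence collection for `contains:` and `sha256:` claims. The function names (`readable_regular_file`, `check_contains`, `check_sha256`) and the reason strings `not-a-readable-file` / `hash-failed` are hypothetical, not taken from `scripts/verify-claims.sh`. What the sketch shows is the shape of the rule: every case where trustworthy evidence cannot be read (a directory, an unreadable file, a malformed pattern) yields `unverifiable`, never `refuted`. As a conservative assumption of this sketch, a missing target is treated the same way.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the real scripts/verify-claims.sh.
# Rule: if trustworthy evidence cannot be read, answer "unverifiable",
# never "refuted".

# True only for an existing regular file that we can read.
# (A missing target also fails here; treating that as unverifiable is an
# assumption of this sketch.)
readable_regular_file() {
  [[ -f $1 && -r $1 ]]
}

# contains:<ERE> evaluated against a file.
check_contains() {
  local target=$1 re=$2
  if ! readable_regular_file "$target"; then
    echo "unverifiable not-a-readable-file"
    return
  fi
  grep -Eq -- "$re" "$target" 2>/dev/null
  case $? in
    0) echo "confirmed" ;;
    1) echo "refuted" ;;
    *) echo "unverifiable bad-regex" ;;  # grep exits 2 on a malformed pattern (or other error)
  esac
}

# sha256:<hex> evaluated against a file (needs coreutils sha256sum).
check_sha256() {
  local target=$1 want=$2 got
  if ! readable_regular_file "$target"; then
    echo "unverifiable not-a-readable-file"
    return
  fi
  if ! got=$(sha256sum -- "$target" 2>/dev/null); then
    echo "unverifiable hash-failed"
    return
  fi
  got=${got%% *}  # keep the digest, drop the file name
  if [[ $got == "$want" ]]; then echo "confirmed"; else echo "refuted"; fi
}
```

The key design choice is that "could not look" and "looked and it was wrong" never share an outcome: only a successful read of a regular file can produce `refuted`.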

Holes closed (scripts/verify-claims.sh)

| Bypass found | Now |
|---|---|
| Unresolvable/empty base ref → confident-wrong `created`/`modified`/`deleted` | `unverifiable no-base-ref` |
| `created` confirmed any existing file (untracked build output, etc.) | requires a git-tracked file new to the change |
| Missing required field → false-confirm / silent drop | `unverifiable missing-field`; a block with no `id` still appears (no silent drop) |
| Empty / `.*` `expect` → unconditional confirm | `unverifiable empty-pattern`; malformed regex → `bad-regex` |
| `target` = absolute / `..` / symlink → evidence redirection | `unverifiable unsafe-path` |
| `stdout-contains:` matched stderr too | matches stdout only (stderr captured separately) |
| `contains:`/`sha256:` on a directory/unreadable file → refuted | unverifiable |
| Licence claim phrased only in `statement` → auto-confirmed | licence detected in `class`/`target`/`expect`/`statement` → `manual-only` |
| `claims-compose` infinite recursion (cycle / fork bomb) | depth-capped at 8 |
| `not_before` (stale-evidence) unimplemented | present → unverifiable (reference collects no timestamps) |
| Parser only accepted `key = "v"` | whitespace-tolerant (`key="v"` too) |
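Two of the rows above, the whitespace-tolerant parser and the `expect` pattern checks, can be sketched in bash as follows. This assumes the `contains:` argument is a POSIX ERE (the dialect the follow-up #459 pins) and uses GNU `grep -E` to detect a malformed regex (exit status 2). Flagging a pattern as always-matching when it matches both an empty line and an arbitrary probe line is a heuristic of this sketch only, not necessarily how the verifier does it. `parse_kv` and `validate_expect` are hypothetical names; the reason codes mirror the ones used in this PR and in #459.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical parser/validator, not the real verify-claims.sh.

# Parse `key = "value"` or `key="value"`; prints "key<TAB>value".
# Escaped quotes inside the value are not handled in this sketch.
parse_kv() {
  local re='^[[:space:]]*([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*=[[:space:]]*"(.*)"[[:space:]]*$'
  if [[ $1 =~ $re ]]; then
    printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    return 0
  fi
  return 1
}

# Decide whether an `expect` pattern can yield trustworthy evidence.
validate_expect() {
  local expect=$1 arg
  case $expect in
    contains:*)        arg=${expect#contains:} ;;
    stdout-contains:*) arg=${expect#stdout-contains:} ;;
    *) echo "ok"; return ;;  # other expect kinds are out of scope here
  esac
  if [[ -z $arg ]]; then
    echo "unverifiable empty-pattern"; return
  fi
  # A malformed ERE makes grep exit 2 before it reads any input.
  printf '' | grep -Eq -- "$arg" 2>/dev/null
  if [[ $? -eq 2 ]]; then
    echo "unverifiable bad-regex"; return
  fi
  # Heuristic: matching both an empty line and an arbitrary probe line
  # means the pattern would match any evidence (e.g. .*, ^, $).
  if [[ $(printf '\n%s\n' 'zq#~0' | grep -Ec -- "$arg") -eq 2 ]]; then
    echo "unverifiable trivial-pattern"; return
  fi
  echo "ok"
}
```

A pattern such as `^$` still passes, because it only matches an empty line; the heuristic targets patterns that would confirm against any file at all.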

Spec

VERIFICATION-PROTOCOL.adoc gains two normative sections: Fail-safe requirements (the exhaustive list of "cannot collect trustworthy evidence → unverifiable") and Command execution & sandboxing (command-transcript executes target; untrusted claims MUST be sandboxed; the reference impl is trusted-input-only and says so).
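The `stdout-contains:` fix and the sandboxing caveat meet in how a command is run. A minimal sketch, assuming the claim's command is a shell string: stdout is captured into a variable while stderr goes to a separate temporary file, so a marker printed on stderr cannot confirm the claim. The `timeout` wrapper (GNU coreutils, exit status 124 on expiry), the `TIMEOUT_SECS` variable, `LAST_STDERR`, and the function name are assumptions. `bash -c` executes the target unsandboxed, which is exactly why the spec limits the reference implementation to trusted input.

```bash
#!/usr/bin/env bash
# Sketch only: stdout-contains:<ERE> judged on STDOUT alone.
# WARNING: `bash -c` runs the command unsandboxed; trusted input only.

# Sets LAST_STDERR (for a transcript) when not called in a subshell.
run_stdout_contains() {
  local cmd=$1 re=$2 errfile out rc
  errfile=$(mktemp) || { echo "unverifiable no-tempfile"; return; }
  # stdout -> variable, stderr -> separate file; the two are never merged.
  out=$(timeout "${TIMEOUT_SECS:-30}" bash -c "$cmd" 2>"$errfile")
  rc=$?
  LAST_STDERR=$(<"$errfile")
  rm -f -- "$errfile"
  if (( rc == 124 )); then
    echo "unverifiable timeout"; return
  fi
  printf '%s\n' "$out" | grep -Eq -- "$re" 2>/dev/null
  case $? in
    0) echo "confirmed" ;;
    1) echo "refuted" ;;
    *) echo "unverifiable bad-regex" ;;
  esac
}
```

For untrusted claims, the same capture logic would have to run inside whatever sandbox the spec's "Command execution & sandboxing" section requires; the sketch deliberately does not attempt that.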

Verification

  • scripts/tests/wave4-dyadt-test.sh: 14/14 (7 new hardening assertions — each proves a specific bypass is now closed).
  • spec/conformance/: 9/9 vectors (added missing-field, unsafe-path, licence-in-statement so the production verifier must handle them too).
  • Dogfood CLAIMS.a2ml still all-confirmed; Waves 0/1/3 tests unaffected; registry + dashboard in sync.

Method: 3-lens red-team fan-out → independent re-verification of each finding → fix only the confirmed ones fail-safe. Licence handling stays manual-only end-to-end.
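A sketch of manual-only routing for licence claims that scans all four fields, so a licence change phrased only in `statement` (the licence-in-statement vector) cannot slip through to automatic checking. The case-insensitive pattern `licen[cs]e|spdx` and the function names are assumptions, not the reference verifier's actual detection rule.

```bash
#!/usr/bin/env bash
# Sketch only: route any licence-related claim to manual review.

# True if any field mentions a licence or an SPDX identifier (case-insensitive).
mentions_licence() {
  local field
  for field in "$@"; do
    if printf '%s\n' "$field" | grep -Eiq -- 'licen[cs]e|spdx'; then
      return 0
    fi
  done
  return 1
}

# Arguments: class, target, expect, statement -> prints "manual-only" or "auto".
route_claim() {
  if mentions_licence "$1" "$2" "$3" "$4"; then
    echo "manual-only"
  else
    echo "auto"
  fi
}
```

Erring towards `manual-only` on any mention is intentional: a false positive costs a human review, while a false negative would let the verifier auto-judge a licence claim.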

🤖 Generated with Claude Code



…sses

An adversarial review of the Wave-4 verifier (3 red-team lenses + independent
verification) found real ways to make a FALSE claim pass, or to make the
verifier drop/mis-judge a claim. A claim-checker that can be fooled is worse
than none, so every confirmed hole is closed fail-safe (unverifiable, never a
confident wrong verdict) and locked with a regression test + conformance vector.

Closed holes (scripts/verify-claims.sh):
- Unresolvable/empty base ref no longer false-confirms `created`/`modified`/
  `deleted` — returns `unverifiable no-base-ref`.
- `created` now requires a git-TRACKED file new to the change; stray untracked
  build output no longer confirms "I created X".
- Missing required field (claim_class/target/expect/verifier) -> unverifiable;
  a `[[claim]]` with no id is no longer silently dropped (appears as a block,
  never confirmed).
- Empty / always-matching `expect` (`contains:` / `stdout-contains:` with empty
  arg) -> unverifiable; malformed regex -> `bad-regex` (not a false refute).
- Unsafe `target` (absolute, `..` traversal, or symlink) -> `unsafe-path`
  (evidence can't be redirected to a known-good file).
- `stdout-contains:` matches STDOUT only — a marker on stderr no longer
  false-confirms.
- `contains:`/`sha256:` on a directory or unreadable file -> unverifiable.
- Licence/SPDX detected in ANY of class/target/expect/STATEMENT -> manual-only
  (previously the statement field was not scanned).
- `claims-compose` recursion depth-capped at 8 (cycle / fork-bomb guard).
- not_before present -> unverifiable (reference impl collects no timestamps).
- Parser is now whitespace-tolerant (`key="v"` and `key = "v"`).
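The path-safety and git-backed `created` checks in this list can be sketched in bash as below, assuming the claim `target` is relative to the repository root and the script runs from there. `git rev-parse --verify --quiet "<ref>^{commit}"` fails for an unresolvable base, `git ls-files --error-unmatch` requires the file to be tracked, and `git cat-file -e "<base>:<path>"` reports whether it already existed at the base. The function names, and the verdicts chosen for the untracked (`not-tracked`) and existed-at-base cases, are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers; run from the repository root.

# Reject targets that could redirect evidence: empty, absolute, containing a
# newline, any `..` component, or any existing component that is a symlink.
is_safe_target() {
  local t=$1 prefix="" part
  local -a parts
  [[ -n $t && $t != /* && $t != *$'\n'* ]] || return 1
  IFS=/ read -r -a parts <<< "$t"
  for part in "${parts[@]}"; do
    [[ $part == ".." ]] && return 1
    [[ -z $part || $part == "." ]] && continue
    prefix=${prefix:+$prefix/}$part
    [[ -L $prefix ]] && return 1
  done
  return 0
}

# Print the base commit SHA; fail for an empty or unresolvable ref.
resolve_base() {
  [[ -n $1 ]] || return 1
  git rev-parse --verify --quiet "$1^{commit}"
}

# created: the target must be git-tracked now and absent at the base.
check_created() {
  local base=$1 target=$2 base_sha
  is_safe_target "$target" || { echo "unverifiable unsafe-path"; return; }
  base_sha=$(resolve_base "$base") || { echo "unverifiable no-base-ref"; return; }
  # Untracked files (build output etc.) never count as "created".
  if ! git --literal-pathspecs ls-files --error-unmatch -- "$target" >/dev/null 2>&1; then
    echo "unverifiable not-tracked"; return
  fi
  if git cat-file -e "$base_sha:$target" 2>/dev/null; then
    echo "refuted existed-at-base"; return
  fi
  echo "confirmed"
}
```

Checking the path before touching git means a redirecting target is rejected even when the base ref is also bad, so the more dangerous condition is always the one reported.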

Spec: VERIFICATION-PROTOCOL.adoc gains normative "Fail-safe requirements" and
"Command execution & sandboxing" sections (command-transcript executes target;
untrusted claims MUST be sandboxed; the reference impl is trusted-input only).
Tests: wave4-dyadt-test.sh +7 hardening assertions (14 total); 3 new
conformance vectors (missing-field, unsafe-path, licence-in-statement; 9 total).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4Sz
@hyperpolymath hyperpolymath marked this pull request as ready for review July 3, 2026 02:16
@hyperpolymath hyperpolymath enabled auto-merge (squash) July 3, 2026 02:16
@sonarqubecloud

sonarqubecloud Bot commented Jul 3, 2026

@hyperpolymath hyperpolymath disabled auto-merge July 3, 2026 02:18
@hyperpolymath hyperpolymath enabled auto-merge (squash) July 3, 2026 02:18
@hyperpolymath hyperpolymath disabled auto-merge July 3, 2026 02:19
@hyperpolymath hyperpolymath merged commit 832b157 into main Jul 3, 2026
20 checks passed
@hyperpolymath hyperpolymath deleted the claude/estate-audit-optimization-h19z12 branch July 3, 2026 02:19
hyperpolymath added a commit that referenced this pull request Jul 3, 2026
…names guard, DYADT residual fix (#459)

Two cohesive commits closing out the estate audit-and-optimization
program (umbrella #460). Both fully tested; generated artifacts in sync.

## Commit 1 — Wave 5: per-language testing depth

You flagged this directly: the estate's only per-language testing
*depth* was a single Julia guide from **2024** (no MUST/SHOULD,
Rust+Julia only) next to a **byte-identical duplicate**.

- `language-testing-standards.md` → **v2.0.0**: RFC-2119 requirements
**R1–R9** mapped to the CRG test taxonomy; an anti-theatre rule (no
`continue-on-error` on a MUST check; coverage reported-with-artifact,
not asserted).
- `templates/language-testing-guide-TEMPLATE.md`: the skeleton every
guide follows — requirement-mapping table (tool or **visible** `none`),
tools, SHA-pinned CI, and a **mandatory honest "Known gaps"** section.
- `affinescript-testing-guide.md`: your primary language, previously
with **zero** testing standard — authored honestly (most SHOULD rows are
tracked gaps; R3 notes `affinescript-verify.yml` is advisory). SSOT
migrates to `hyperpolymath/affinescript` prospectively.
- `scripts/check-language-guide.sh` (wired into `just validate`) +
`wave5-language-guides-test.sh` (7/7). Deleted the duplicate snapshot.
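A minimal sketch of what a guide check in the spirit of `check-language-guide.sh` might verify: every requirement R1–R9 has a mapping row with a visible tool cell (an explicit `none` counts), and a "Known gaps" section exists. It assumes rows of the form `| R1 | <tool> | ...` and a Markdown heading containing "Known gaps"; the real script's rules, layout expectations, and output are not shown in this PR, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical checks in the spirit of check-language-guide.sh.
# Assumes requirement rows look like "| R1 | <tool or none> | ...".

check_language_guide() {
  local guide=$1 status=0 r row cell
  [[ -f $guide ]] || { echo "$guide: not found" >&2; return 1; }
  # The template makes an honest "Known gaps" section mandatory.
  if ! grep -Eq '^#+[[:space:]]+Known gaps' "$guide"; then
    echo "$guide: missing 'Known gaps' section" >&2; status=1
  fi
  for r in R1 R2 R3 R4 R5 R6 R7 R8 R9; do
    row=$(grep -E "^[|][[:space:]]*$r[[:space:]]*[|]" "$guide" | head -n 1)
    if [[ -z $row ]]; then
      echo "$guide: $r is not mapped" >&2; status=1; continue
    fi
    # Second cell is the tool; it must be visible (an explicit `none` is fine).
    cell=$(printf '%s\n' "$row" | cut -d'|' -f3)
    if [[ -z ${cell//[[:space:]]/} ]]; then
      echo "$guide: $r has a blank tool cell" >&2; status=1
    fi
  done
  return "$status"
}
```

Reporting every problem before returning, rather than stopping at the first, keeps one `just validate` run sufficient to see all of a guide's gaps.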

## Commit 2 — Wave 6: guard, DYADT residual fix, licence record

- **DYADT residual (#461):** an adversarial review confirmed 16 bypasses
in the Wave-4 verifier; 15 were fixed in #458, and this closes the last
— an always-matching `contains:` regex (`.*`, `^`, `$`, …) no longer
confirms vacuously (`unverifiable trivial-pattern`). Spec pins the
`contains:` dialect to POSIX ERE; conformance vector + assertion added
(10 vectors, 15 assertions).
- **Canonical-names guard** (`check-canonical-names.sh`): blocks
*reintroduction* of the deprecated names (`6a2`→descriptiles,
`agent_instructions`→bot_directives) in **added** diff lines only
(chartered bulk migration untouched). Wired into `just validate` + the
pre-commit hook; `wave6-canonical-names-test.sh` (4/4).
- **`audits/licence-flags-2026-07.adoc`**: flag-only record — the whole
program made no SPDX edits and no auto licence PRs; DYADT treats licence
claims as `manual-only` end to end.
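The added-lines-only idea behind the canonical-names guard can be sketched as a filter over a unified diff. Only lines starting with `+` (excluding the `+++ ` file header) are scanned, so removed lines and untouched content from the chartered bulk migration never trip it. The whole-token regex and the function name are assumptions; usage would be something like `git diff --cached -U0 | check_added_lines`.

```bash
#!/usr/bin/env bash
# Sketch only: flag deprecated names on ADDED diff lines (stdin = unified diff).

check_added_lines() {
  local line status=0
  # Whole-token match so e.g. a hash like "a6a2b" is not flagged.
  local re='(^|[^[:alnum:]_])(6a2|agent_instructions)([^[:alnum:]_]|$)'
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line == '+++ '* ]] && continue  # file header, not content
    [[ $line == '+'* ]] || continue     # skip context and removed lines
    if [[ ${line#+} =~ $re ]]; then
      echo "deprecated name reintroduced: ${line#+}" >&2
      status=1
    fi
  done
  return "$status"
}
```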

## Verification

All six wave suites pass; DYADT conformance 10/10 + dogfood
all-confirmed; registry + scorecard dashboard in sync.

## Program status (umbrella #460)

Waves 0/1/3/4 + hardening **merged** (#453, #454, #457, #458). This
lands Waves 5 + 6. Remaining estate-wide work is chartered: #461
(verifier residual — **fixed here**), #462 (DYADT production verifier),
#463 (per-language guides completion).

Licence rows `manual-only` throughout (flag-only policy).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>