Estate audit — Wave 3: MUST/SHOULD/COULD compliance scorecards + generated dashboard by hyperpolymath · Pull Request #454 · hyperpolymath/standards

hyperpolymath · 2026-07-03T01:23:48Z

Context

Follow-up to #453 (Waves 0–1, merged). This wave delivers the must / should / could / systems / compliance / effects audit itself: a single, honest, committed view of where every registered standard actually stands — grounded in what exists on disk, not aspiration.

It mirrors the existing build-registry.sh pattern exactly: hand-authored source → deterministic generated artifact → --check in CI.
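The source → generated artifact → `--check` pattern can be illustrated with a minimal sketch. This is not the repository's `build-scorecards.sh`; the function names `render_dashboard` and `is_current`, the `verdict` key, and the table layout are assumptions made only for illustration. The point is that a render with sorted keys and no timestamps is reproducible, so drift detection reduces to a string comparison.

```python
def render_dashboard(scorecards: dict[str, dict[str, str]]) -> str:
    """Render a dashboard deterministically (hypothetical layout).

    Specs are emitted in sorted order and nothing time-dependent is
    written, so the same input always yields byte-identical output.
    """
    lines = ["| Spec | MUST verdict |", "|---|---|"]
    for spec_id in sorted(scorecards):
        lines.append(f"| {spec_id} | {scorecards[spec_id]['verdict']} |")
    return "\n".join(lines) + "\n"


def is_current(scorecards: dict[str, dict[str, str]], committed: str) -> bool:
    """Drift check: the committed artifact must equal a fresh render."""
    return render_dashboard(scorecards) == committed
```

A CI job built this way would regenerate the artifact in memory and fail when `is_current` returns `False`, which is what a remediation hint ("re-run the generator and commit") would respond to.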

What's here

The scorecard format — .machine_readable/scorecards/<spec-id>.scorecard.a2ml, one per LOCAL registered spec (28), keyed 1:1 to REGISTRY.a2ml. Each requirement carries:

- system — the mechanical check that discharges it, or the literal none (visible, not hidden)
- status — pass (evidence required) · fail (real gap) · manual-only (governance/licence) · aspirational (reach target, never counted as pass)
- evidence and effects (downstream impact)

The generator — scripts/build-scorecards.sh produces COMPLIANCE-DASHBOARD.md: per-spec MUST verdict + a systems-coverage % (share of requirements with a real mechanical check — the honest measure of enforcement vs. assertion). Deterministic; --check (drift) and --strict (every spec must have a scorecard) modes.
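As a rough sketch of the systems-coverage figure described above: the share of requirements whose `system` is something other than the literal `none`. The dictionary shape, the `level` key, and the function names are assumptions for illustration; the real generator is a shell script and may compute this differently.

```python
def systems_coverage(requirements: list[dict]) -> float:
    """Percentage of requirements backed by a real mechanical check.

    A requirement counts as covered when its `system` field is anything
    other than the literal "none" (field name taken from the PR text).
    """
    if not requirements:
        return 0.0
    covered = sum(1 for r in requirements if r["system"] != "none")
    return 100.0 * covered / len(requirements)


def must_passing(requirements: list[dict]) -> tuple[int, int]:
    """Return (passing, total) over MUST requirements only.

    The `level` key is a hypothetical name for the MUST/SHOULD/COULD tag.
    """
    musts = [r for r in requirements if r["level"] == "MUST"]
    return sum(1 for r in musts if r["status"] == "pass"), len(musts)
```

Keeping coverage separate from the pass count is the design point: a requirement can be asserted as passing without any check behind it, and the coverage figure is what exposes that.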

The schema — scorecard.schema.json. Pass-without-evidence and aspirational-as-pass are rejected by construction.
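The two rejection rules can be sketched as a small validator. This is an illustrative approximation under stated assumptions, not the contents of `scorecard.schema.json`: the function names and error strings are hypothetical, and only the field names and the four status values come from the description above.

```python
VALID_STATUSES = {"pass", "fail", "manual-only", "aspirational"}


def validate_requirement(req: dict) -> list[str]:
    """Return a list of rule violations for one requirement (empty if valid)."""
    errors = []
    status = req.get("status")
    if status not in VALID_STATUSES:
        errors.append(f"unknown status: {status!r}")
    # Rule 1: a pass must carry evidence.
    if status == "pass" and not str(req.get("evidence", "")).strip():
        errors.append("pass requires evidence")
    # `system` must be present: a check name, or the visible literal "none".
    if not str(req.get("system", "")).strip():
        errors.append("system must name a check or be the literal 'none'")
    return errors


def counts_as_pass(req: dict) -> bool:
    """Rule 2: only a valid `pass` counts; aspirational never does."""
    return req.get("status") == "pass" and not validate_requirement(req)
```

Because `counts_as_pass` looks only at the literal `pass` status, an `aspirational` row cannot contribute to a score regardless of what its other fields say.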

The honest baseline this reveals

| Metric | Value |
|---|---|
| Specs scored | 28 / 28 |
| MUST requirements passing | 41 / 138 (74 failing) |
| Estate systems coverage | 66% of 328 graded requirements have a real mechanical check |
| Specs at `✅ MUST-met` | `a2ml` (the rest carry honest gaps) |

This is deliberately unflattering — it's the truthful starting line for the uplift, and it operationalises the honest-badge ask (#446): no intuition-plucked Grade-A gate can inflate a score.

Enforcement

- registry-verify.yml now also checks the dashboard is current (--strict) with a remediation hint.
- Justfile: scorecards, scorecards-check, scorecards-check-strict.
- scripts/tests/wave3-scorecards-test.sh (5 assertions): pass-without-evidence rejected, orphan rejected, deterministic, --check detects drift.

Method note

Scorecards were populated by a fan-out of per-spec readers (one per spec home, each grounding claims in on-disk files), then serialised deterministically so the .a2ml format is guaranteed regardless of generator output. Licence rows are manual-only throughout (flag-only policy — no SPDX edits).

Coming next (same track)

Wave 4 — the did-you-actually-do-that post-action claim-verifier spec (the missing LLM-regulation tier); Wave 5 — AffineScript testing standard; Wave 6 — campaign issues (cross-linking #426/#451/#437/#446) + release hygiene.

🤖 Generated with Claude Code

Generated by Claude Code

…board

The estate declared standards but had no single, honest view of where each one actually stands. Wave 3 adds a per-spec audit keyed 1:1 to the registry, mirroring the build-registry.sh pattern (hand-authored source → deterministic generated dashboard, --check in CI).

- .machine_readable/scorecards/<spec-id>.scorecard.a2ml: one scorecard per LOCAL registered spec (28), each requirement tagged MUST/SHOULD/COULD with a `system` (the mechanical check, or "none" — visible), a `status` (pass|fail|aspirational|manual-only), `evidence` (REQUIRED for pass), and `effects` (downstream impact). Grounded in what exists on disk.
- .machine_readable/scorecards/scorecard.schema.json: the logical shape; pass-without-evidence and aspirational-as-pass are rejected by construction.
- scripts/build-scorecards.sh: deterministic generator → COMPLIANCE-DASHBOARD.md with per-spec MUST verdict + a "systems coverage %" (share of requirements with a real mechanical check — the honest enforcement-vs-assertion measure). --check (drift) and --strict (every spec must have a scorecard) modes. Aspirational Grade-A gates never count as pass (operationalises standards#446).
- COMPLIANCE-DASHBOARD.md (generated): 28/28 specs scored; 41/138 MUSTs passing; 66% estate systems coverage. The honest baseline to drive uplift.
- registry-verify.yml: also checks the dashboard is current (strict) with a remediation hint. Justfile: scorecards / scorecards-check[-strict].
- scripts/tests/wave3-scorecards-test.sh (5 assertions): pass-without-evidence rejected, orphan rejected, deterministic, --check detects drift.

Scorecard population fanned out across per-spec readers; serialised deterministically so format is guaranteed. Licence rows are manual-only (flag-only policy).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc

sonarqubecloud · 2026-07-03T01:24:38Z

Quality Gate passed

Issues
29 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

…ein in LLMs) (#457)

## Context

Follow-up to #453 (Waves 0–1) and #454 (Wave 3), both merged. This wave builds the piece you asked for directly — a system to **rein in LLMs**: a standard that checks, mechanically, that an agent's *claimed* outcomes actually happened.

The estate already gates what an agent may do **before** it acts (gatekeeper → AGENTIC → contractiles). It had **nothing** that takes an agent's asserted outcome and confirms it after. Every false-green hole this program has fixed is a special case of one disease: *a claim was trusted instead of verified.* DYADT is the missing **Tier 4**.

## The four-tier accountability pipeline

| Tier | Governs | Home |
|---|---|---|
| 1. Admission | must read the manifest before acting | `0-ai-gatekeeper-protocol/` |
| 2. Pre-action | entropy budgets, intent, confirmation | `agentic-a2ml/` |
| 3. In-session gates | contractile MUST/TRUST/… at close/push/merge | `contractiles/` |
| **4. Post-action (this PR)** | **claimed X → mechanically confirm/refute X** | **`did-you-actually-do-that/`** |

## What's here

**Spec set** (`did-you-actually-do-that/`): `README` (pipeline binding) · `spec/CLAIM-FORMAT.adoc` (typed claims) · `spec/VERIFICATION-PROTOCOL.adoc` (the `confirmed`/`refuted`/`unverifiable` contract — **unverifiable is loud, never green**; a verifier must *re-derive* evidence, never read back the agent's own `evidence` field) · `spec/CONSEQUENCE-LEDGER.adoc` (append-only, dual-signed, per-actor confirmation rate that Tier-3 MAY gate on) · `spec/conformance/` (6 executable vectors + runner) · `docs/NAMING-RESOLUTION.adoc` (resolves the PLASMA collision).

**Executable + dogfooded:**

- `scripts/verify-claims.sh` — reference verifier (local verifiers real; network/manual return `unverifiable`).
- Root `CLAIMS.a2ml` — **7 claims about this very change**, re-derived from primary evidence; `dyadt-verify.yml` runs the verifier + conformance suite in CI. If a claim here were false, CI **refutes** it and fails. The spec's first conformance run is on itself.
- `scripts/tests/wave4-dyadt-test.sh` (7/7) — proves a false claim is REFUTED despite an honest-sounding statement, and the incompatible-verifier + manual-only guards fire.

**Registered + graded:** added to `build-registry.sh` (32 specs); honest scorecard (5/5 MUST met, 90% systems coverage — the network verifier is an honest `fail`, since only the production impl does forge/CI APIs).

## Boundary

This repo is the **declaration layer**: it ships the normative spec + a reference verifier + the dogfood. The **production actuator** (continuous, in-session, wired to hypatia/gitbot-fleet with real ledger enforcement) is chartered for `hyperpolymath/did-you-actually-do-that`, built against these conformance vectors — it MUST NOT diverge from this contract. That's the parallel session you flagged.

Licence/SPDX is `manual-only` end-to-end (flag-only policy — a licence claim is always `unverifiable: manual-only`).

## Coming next (same track)

Wave 5 — AffineScript testing standard + template; Wave 6 — campaign issues (cross-linking #426/#451/#437/#446) + release hygiene.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

_Generated by [Claude Code](https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc)_

Co-authored-by: Claude <noreply@anthropic.com>
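The Tier 4 contract in the #457 message (three outcomes, evidence re-derived rather than read back) can be sketched as follows. This is not `scripts/verify-claims.sh` and does not follow `spec/CLAIM-FORMAT.adoc`, which is not shown here: the claim shape, the `file-exists` verifier kind, and the function names are hypothetical. Treating every outcome other than `confirmed` as not green is one reading of "unverifiable is loud, never green", stated here as an assumption.

```python
import os


def verify(claim: dict) -> str:
    """Return 'confirmed', 'refuted', or 'unverifiable' for one claim.

    The claim's own `evidence` field is deliberately never read: the
    verifier re-derives the answer from primary evidence (the filesystem).
    """
    kind = claim.get("verifier")
    if kind == "file-exists":  # hypothetical local verifier kind
        return "confirmed" if os.path.isfile(claim["path"]) else "refuted"
    # Network, manual, and unknown verifier kinds cannot be checked locally.
    return "unverifiable"


def is_green(outcomes: list[str]) -> bool:
    """Assumption: only an all-confirmed run is green."""
    return all(o == "confirmed" for o in outcomes)
```

With this shape, a claim such as `{"verifier": "file-exists", "path": "missing.txt", "evidence": "I created it"}` is refuted no matter how confident the `evidence` text sounds, because that text never enters the decision.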

…names guard, DYADT residual fix (#459)

Two cohesive commits closing out the estate audit-and-optimization program (umbrella #460). Both fully tested; generated artifacts in sync.

## Commit 1 — Wave 5: per-language testing depth

You flagged this directly: the estate's only per-language testing *depth* was a single Julia guide from **2024** (no MUST/SHOULD, Rust+Julia only) next to a **byte-identical duplicate**.

- `language-testing-standards.md` → **v2.0.0**: RFC-2119 requirements **R1–R9** mapped to the CRG test taxonomy; an anti-theatre rule (no `continue-on-error` on a MUST check; coverage reported-with-artifact, not asserted).
- `templates/language-testing-guide-TEMPLATE.md`: the skeleton every guide follows — requirement-mapping table (tool or **visible** `none`), tools, SHA-pinned CI, and a **mandatory honest "Known gaps"** section.
- `affinescript-testing-guide.md`: your primary language, previously with **zero** testing standard — authored honestly (most SHOULD rows are tracked gaps; R3 notes `affinescript-verify.yml` is advisory). SSOT migrates to `hyperpolymath/affinescript` prospectively.
- `scripts/check-language-guide.sh` (wired into `just validate`) + `wave5-language-guides-test.sh` (7/7). Deleted the duplicate snapshot.

## Commit 2 — Wave 6: guard, DYADT residual fix, licence record

- **DYADT residual (#461):** an adversarial review confirmed 16 bypasses in the Wave-4 verifier; 15 were fixed in #458, and this closes the last — an always-matching `contains:` regex (`.*`, `^`, `$`, …) no longer confirms vacuously (`unverifiable trivial-pattern`). Spec pins the `contains:` dialect to POSIX ERE; conformance vector + assertion added (10 vectors, 15 assertions).
- **Canonical-names guard** (`check-canonical-names.sh`): blocks *reintroduction* of the deprecated names (`6a2`→descriptiles, `agent_instructions`→bot_directives) in **added** diff lines only (chartered bulk migration untouched). Wired into `just validate` + the pre-commit hook; `wave6-canonical-names-test.sh` (4/4).
- **`audits/licence-flags-2026-07.adoc`**: flag-only record — the whole program made no SPDX edits and no auto licence PRs; DYADT treats licence claims as `manual-only` end to end.

## Verification

All six wave suites pass; DYADT conformance 10/10 + dogfood all-confirmed; registry + scorecard dashboard in sync.

## Program status (umbrella #460)

Waves 0/1/3/4 + hardening **merged** (#453, #454, #457, #458). This lands Waves 5 + 6. Remaining estate-wide work is chartered: #461 (verifier residual — **fixed here**), #462 (DYADT production verifier), #463 (per-language guides completion). Licence rows `manual-only` throughout (flag-only policy).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
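The canonical-names guard from the #459 message checks only added diff lines, which is what leaves the existing tree untouched while blocking reintroduction. A minimal sketch of that idea follows; it is not `check-canonical-names.sh`. The function name and the plain substring match are assumptions, and only the two deprecated-name mappings come from the text above.

```python
# Deprecated name -> canonical replacement (pairs taken from the PR text).
DEPRECATED = {"6a2": "descriptiles", "agent_instructions": "bot_directives"}


def find_reintroductions(diff_text: str) -> list[tuple[int, str, str]]:
    """Scan a unified diff and report deprecated names on added lines.

    Returns (diff line number, deprecated name, canonical name) tuples.
    Context lines, removed lines, and "+++" file headers are skipped, so
    pre-existing uses do not trip the guard.
    """
    hits = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for old, new in DEPRECATED.items():
            if old in line[1:]:  # naive substring match (assumption)
                hits.append((number, old, new))
    return hits
```

One known limitation of this sketch: an added line whose own content starts with `++` looks like a file header and is skipped, and a substring match on `6a2` may also fire on unrelated text such as hex strings. A real guard would likely need word boundaries or hunk-aware parsing.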

hyperpolymath marked this pull request as ready for review July 3, 2026 01:38

hyperpolymath enabled auto-merge (squash) July 3, 2026 01:38

hyperpolymath disabled auto-merge July 3, 2026 01:41

hyperpolymath merged commit 2bad986 into main Jul 3, 2026
19 checks passed

hyperpolymath deleted the claude/estate-audit-optimization-h19z12 branch July 3, 2026 01:41

hyperpolymath mentioned this pull request Jul 3, 2026

Estate audit — Wave 4: DYADT, post-action agent-claim verification (rein in LLMs) #457

Merged

This was referenced Jul 3, 2026

[umbrella] Estate audit & optimization — make the enforcement real (Waves 0–6) #460

Open

Estate audit — Waves 5 + 6: per-language testing standard, canonical-names guard, DYADT residual fix #459

Merged

hyperpolymath merged 1 commit into main from claude/estate-audit-optimization-h19z12
