Estate audit & optimization — Wave 0: kill the false green (holes before goals) by hyperpolymath · Pull Request #453 · hyperpolymath/standards

hyperpolymath · 2026-07-03T00:39:39Z

Context

This is the first wave of a meticulous, estate-wide must / should / could / systems / compliance / effects audit-and-uplift program anchored in standards (the ~290-repo control plane). A 7-agent read-only recon found the core problem: the declarations are rich but the enforcement is largely theatre — validators pass vacuously, mandatory-check runners aren't wired, and there's no post-action check on agent claims. Per estate doctrine (RSR-PHILOSOPHY.adoc) the program is ordered holes before goals, solutions at source, always fail loudly.

This PR is the branch trunk; subsequent waves land as further commits.

Wave 0 — Kill the false green (this commit)

Every validator that reported success while checking nothing, or masked real errors behind || true, is now state-aware and provably able to fail, each backed by a red-team regression fixture.

| Fix | File | Hole closed |
|---|---|---|
| State-aware 6scm check | `a2ml/scripts/check-6scm.sh` | Exited 0 vacuously once `.scm` sources migrated to 6a2; now fails on orphan-mirror drift / out-of-sync, and declares itself retired (not "in sync") when obsolete |
| Real Mustfile validation | `.github/workflows/boj-build.yml` + `scripts/check-mustfile-structure.sh` | Replaced `echo "K9 validation would run here"` with a real structural check: every Mustfile check must carry a severity + a means of discharge (`- run:` or `- verification:`); hollow checks fail loudly |
| Arg parsing (#387) | `rhodium-standard-repositories/rsr-audit.sh` | Documented `--format json` silently didn't work; invalid format now exits 4 instead of defaulting to text; bare-positional form kept for back-compat |
| De-mask `\|\| true` | `Justfile` `validate` + `scripts/rsr-selfaudit.sh` | A broken audit (exit 4) was swallowed as green; now a low grade is informational/non-blocking but an audit error fails loudly. `registry-check` stays the hard gate |
| De-hardcode paths | `audit-contractiles.sh` | Removed `/var/mnt/eclipse/...` owner-machine paths (couldn't run in CI); takes args/`$CONTRACTILE_AUDIT_REPOS`, never audits zero repos |
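
The de-masking row can be illustrated with a minimal sketch. It relies on the one fact stated above (exit 4 means the audit itself broke); treating every other non-zero exit as a low grade is an assumption of this sketch, and `gate_on_audit` is a hypothetical name, not the contents of `scripts/rsr-selfaudit.sh`.

```shell
#!/bin/sh
# Sketch: run an audit command and separate "low grade" from "audit broke".
# Assumption: exit 4 = audit error (per the table above); any other non-zero
# exit is treated as a low grade. gate_on_audit is a hypothetical helper.
gate_on_audit() {
    status=0
    "$@" || status=$?          # capture the exit code instead of masking it
    case "$status" in
        0) echo "audit: ok" ;;
        4) echo "audit: ERROR (exit 4), failing loudly" >&2
           return 1 ;;
        *) echo "audit: low grade (exit $status), informational only" ;;
    esac
    return 0
}
```

By contrast, the masked form `some-audit || true` maps exit 4 to success along with everything else, which is the hole this row closes.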

Verification

- `scripts/tests/wave0-false-green-test.sh`: 13/13, exercising the pass and fail path of every fixed validator (`just false-green-test`).
- Existing `check-workflow-staleness-test.sh` still passes; `boj-build.yml` validates as YAML; all changed scripts pass `bash -n`.
- The real Mustfile passes structural validation (17 checks); a hollow check fails; `rsr-audit --format json` emits JSON; an errored audit fails `validate`.
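
The "pass and fail path" idea can be sketched as a two-sided test harness. Everything here is hypothetical (helper names, the toy validator, the fixtures); it is not the contents of `wave0-false-green-test.sh`, only an illustration of asserting that a validator can actually fail.

```shell
#!/bin/sh
# Sketch of a two-sided regression test: each validator must pass on a good
# fixture AND fail on a bad one. All names here are hypothetical.
passed=0
failed=0

expect_pass() {  # usage: expect_pass <label> <command...>
    label=$1; shift
    if "$@" >/dev/null 2>&1; then passed=$((passed + 1))
    else failed=$((failed + 1)); echo "FAIL (should pass): $label"; fi
}

expect_fail() {  # usage: expect_fail <label> <command...>
    label=$1; shift
    if "$@" >/dev/null 2>&1; then failed=$((failed + 1)); echo "FAIL (should fail): $label"
    else passed=$((passed + 1)); fi
}

# Toy validator standing in for a real one: the file must exist and be non-empty.
check_nonempty() { [ -s "$1" ]; }

run_demo() {
    tmp=$(mktemp -d)
    echo "content" > "$tmp/good"
    : > "$tmp/empty"
    expect_pass "non-empty file accepted" check_nonempty "$tmp/good"
    expect_fail "empty file rejected"     check_nonempty "$tmp/empty"
    expect_fail "missing file rejected"   check_nonempty "$tmp/missing"
    rm -rf "$tmp"
    echo "$passed/$((passed + failed))"
    [ "$failed" -eq 0 ]
}
```

A validator that returned 0 unconditionally would trip both `expect_fail` lines, which is how a vacuous check gets caught.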

Not touched (by policy / scope)

- Licence/SPDX: flag-only throughout; no header edits.
- `continue-on-error` soft-gates: left in place (documented, with real blocking equivalents elsewhere); honest-labelling / promotion is a Wave 1 item, not a blanket removal that would manufacture red noise.

Coming in later waves (same branch)

Wave 1 (wire the automation that never runs), Wave 3 (MUST/SHOULD/COULD scorecards + compliance dashboard), Wave 4 (the did-you-actually-do-that post-action claim-verifier spec — the missing LLM-regulation tier), Wave 5 (AffineScript testing standard + template), Wave 6 (campaign issues cross-linking #426/#451/#437/#381/#387/#446, release hygiene).

🤖 Generated with Claude Code

Generated by Claude Code

Holes-before-goals pass on the enforcement surface: several validators reported success while checking nothing, or masked real errors behind `|| true`. Each is now state-aware and provably able to fail, with a permanent regression test that exercises both the pass and fail paths (a check that cannot fail is not a check).

- a2ml/scripts/check-6scm.sh: no longer exits 0 vacuously when the .scm sources have migrated away. Distinguishes obsolete-no-op (retired, superseded by 6a2/descriptiles), orphan-mirror DRIFT (fails), and out-of-sync mirror (fails).
- .github/workflows/boj-build.yml: replace the "K9 validation would run here" placeholder with real structural validation via new scripts/check-mustfile-structure.sh (every Mustfile check must carry a severity and a means of discharge, `- run:` or `- verification:`; hollow checks fail loudly).
- rhodium-standard-repositories/rsr-audit.sh: fix argument parsing (standards#387) so the documented `--format json` works and an invalid format errors (exit 4) instead of silently defaulting to text; keep the bare-positional form for backward compatibility.
- Justfile `validate`: drop the blanket `|| true` on the RSR self-audit. Via scripts/rsr-selfaudit.sh a low grade stays informational/non-blocking (a monorepo is not expected to score Gold) but a broken audit (exit 4) now fails validate loudly. registry-check remains the hard gate.
- audit-contractiles.sh: remove hardcoded /var/mnt/eclipse owner-machine paths; take repos as args or $CONTRACTILE_AUDIT_REPOS, default to self, and never silently audit zero repos. Fix a corrupted box-drawing char.
- scripts/tests/wave0-false-green-test.sh: 13 assertions covering the pass + fail path of each fixed validator; Justfile recipes `false-green-test` and `mustfile-check`.

Licence/SPDX untouched (flag-only policy). continue-on-error soft-gates left in place (documented, with real blocking equivalents); honest labelling / promotion tracked for Wave 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc
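
The argument-parsing fix described in this commit can be sketched roughly as follows. This is an illustration, not the real `rsr-audit.sh`: the function name `parse_format` is hypothetical, the sketch assumes only `text` and `json` are valid formats, and it assumes the bare positional argument is the format itself.

```shell
#!/bin/sh
# Sketch: resolve an output format from "--format FMT", "--format=FMT", or a
# bare positional FMT, and return 4 on an unknown format instead of silently
# falling back to text. Assumptions: only "text" and "json" are accepted and
# the bare positional is the format; the real interface may differ.
parse_format() {
    format=text                       # default when nothing is given
    while [ $# -gt 0 ]; do
        case "$1" in
            --format)
                [ $# -ge 2 ] || { echo "missing value for --format" >&2; return 4; }
                format=$2; shift ;;
            --format=*) format=${1#--format=} ;;
            -*)         echo "unknown option: $1" >&2; return 4 ;;
            *)          format=$1 ;;  # bare positional, kept for back-compat
        esac
        shift
    done
    case "$format" in
        text|json) echo "$format" ;;
        *) echo "invalid format: $format" >&2; return 4 ;;
    esac
}
```

The point of the final `case` is that an unrecognised value is an error path of its own, never a quiet default.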

Wave-0 edits to files inside the a2ml and rhodium-standard-repositories spec trees changed their content source_hash, so the generated registry drifted (the recurring standards#381 failure). Regenerated deterministically via scripts/build-registry.sh; TOPOLOGY.md unchanged. A pre-commit regen hook + CI remediation hint lands in Wave 1 to stop this recurring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc

…ptimization-h19z12

… drift)

origin/main (6df21b1, PR #452) edited files under spec homes without regenerating the derived registry, leaving main itself failing `build-registry.sh --check` (standards#381/#399: drift blocks every PR). The PR merge-check inherited that drift. Merging main into this branch and regenerating brings the registry back in sync with the combined tree (a2ml + k9-svc source hashes). Wave 1 adds hook auto-install + a CI remediation hint so this stops recurring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc
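
The drift failure described in the two commits above follows a general pattern: regenerate the derived artifact and compare it with the committed copy. The sketch below shows that pattern only. `generate_registry` is a hypothetical stand-in (a sorted checksum listing), not what `scripts/build-registry.sh` actually emits.

```shell
#!/bin/sh
# Sketch of a "--check" drift gate: regenerate into a temp file and compare
# with the committed artifact. generate_registry is a hypothetical stand-in
# for a deterministic generator (here: a sorted checksum listing).
generate_registry() {  # usage: generate_registry <source-dir>
    ( cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
          printf '%s %s\n' "$(cksum < "$f" | cut -d' ' -f1)" "$f"
      done )
}

check_registry() {  # usage: check_registry <source-dir> <committed-file>
    fresh=$(mktemp)
    generate_registry "$1" > "$fresh"
    if cmp -s "$fresh" "$2"; then
        rm -f "$fresh"; echo "registry in sync"; return 0
    fi
    rm -f "$fresh"
    echo "registry DRIFT: regenerate and commit the result" >&2
    return 1
}
```

Sorting under `LC_ALL=C` is what keeps the output byte-identical across machines; without a fixed ordering the check would report drift that is only locale noise.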

The estate declared mandatory checks and install-only hooks that nothing executed. Wave 1 wires them so the enforcement is real, not aspirational.

- scripts/run-mustfile.sh: EXECUTE the Mustfile's `- run:` invariants (16 checks that were declared but never run by any CI job). Critical/high failures block; warning-severity failures are advisory; `- verification:` checks are reported as MANUAL (counted, never silently green). Wired into boj-build.yml's contractile job as a real "Mustfile Enforcement" step (the repo passes all 16 today, so it gates honestly). Justfile: must-check.
- hooks/install.sh + Justfile hooks-install: install the pre-commit guard (language policy + registry-drift) into .git/hooks. Previously the hooks only ran if a contributor manually copied them; now `just hooks-install` wires a thin shim that execs the tracked hook (single source of truth).
- registry-verify.yml: on drift, write the exact remediation (`just registry` + `git add` + `just hooks-install`) to the CI job summary instead of a bare exit 1; this is the recurring standards#381 pain made self-explaining.
- hypatia-scan-reusable.yml: the "critical issues" step is honestly labelled ADVISORY / does-not-gate (echo + job summary) so a green check carrying critical findings is not mistaken for zero findings. Blocking promotion tracked at standards#399/#437. (no-js-scan.yml was already honest.)
- just doctor: report whether the optional contractile.just import resolved and whether the pre-commit hook is installed (import? fails silent otherwise).
- scripts/tests/wave1-automation-test.sh (9 assertions) + Justfile automation-test: prove the Mustfile runner blocks on critical/high, stays advisory on warning, and the hook installer is idempotent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc
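
The severity rule in the Wave 1 commit above (critical/high block, warning is advisory, `- verification:` is reported as MANUAL) can be sketched as below. The input format is invented for the example and is not the real Mustfile syntax; `run_checks` is a hypothetical name, and treating every severity other than critical/high as advisory is an assumption.

```shell
#!/bin/sh
# Sketch of severity-aware gating. Input format is hypothetical (NOT the real
# Mustfile syntax): one check per line on stdin as "severity|kind|payload",
# where kind is "run" (payload is a shell command) or "verification" (manual).
run_checks() {
    blocking=0; advisory=0; manual=0
    while IFS='|' read -r severity kind payload; do
        [ -n "$severity" ] || continue
        if [ "$kind" = "verification" ]; then
            # Manual checks are counted and shown, never reported as passing.
            manual=$((manual + 1)); echo "MANUAL   [$severity] $payload"; continue
        fi
        if sh -c "$payload" >/dev/null 2>&1 </dev/null; then
            echo "PASS     [$severity] $payload"
        else
            case "$severity" in
                critical|high) blocking=$((blocking + 1)); echo "BLOCK    [$severity] $payload" ;;
                *)             advisory=$((advisory + 1)); echo "ADVISORY [$severity] $payload" ;;
            esac
        fi
    done
    echo "blocking=$blocking advisory=$advisory manual=$manual"
    [ "$blocking" -eq 0 ]   # only critical/high failures fail the run
}
```

For example, `printf '%s\n' 'high|run|false' | run_checks` ends with `blocking=1 advisory=0 manual=0` and a non-zero exit status.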

sonarqubecloud · 2026-07-03T00:48:47Z

Quality Gate passed

Issues
51 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

…rated dashboard (#454)

## Context

Follow-up to #453 (Waves 0–1, merged). This wave delivers the **must / should / could / systems / compliance / effects** audit itself: a single, honest, committed view of where every registered standard actually stands, grounded in what exists on disk, not aspiration. It mirrors the existing `build-registry.sh` pattern exactly: hand-authored source → deterministic generated artifact → `--check` in CI.

## What's here

**The scorecard format**: `.machine_readable/scorecards/<spec-id>.scorecard.a2ml`, one per LOCAL registered spec (28), keyed 1:1 to `REGISTRY.a2ml`. Each requirement carries:

- `system`: the mechanical check that discharges it, or the literal `none` (**visible**, not hidden)
- `status`: `pass` (evidence **required**) · `fail` (real gap) · `manual-only` (governance/licence) · `aspirational` (reach target, **never** counted as pass)
- `evidence` and `effects` (downstream impact)

**The generator**: `scripts/build-scorecards.sh` produces `COMPLIANCE-DASHBOARD.md`: per-spec MUST verdict + a **systems-coverage %** (share of requirements with a real mechanical check, the honest measure of *enforcement vs. assertion*). Deterministic; `--check` (drift) and `--strict` (every spec must have a scorecard) modes.

**The schema**: `scorecard.schema.json`. Pass-without-evidence and aspirational-as-pass are rejected by construction.

## The honest baseline this reveals

| Metric | Value |
|---|---|
| Specs scored | **28 / 28** |
| MUST requirements passing | **41 / 138** (74 failing) |
| Estate systems coverage | **66%** of 328 graded requirements have a real mechanical check |
| Specs at `✅ MUST-met` | `a2ml` (the rest carry honest gaps) |

This is deliberately unflattering: it's the truthful starting line for the uplift, and it operationalises the honest-badge ask (#446): no intuition-plucked Grade-A gate can inflate a score.

## Enforcement

- `registry-verify.yml` now also checks the dashboard is current (`--strict`) with a remediation hint.
- `Justfile`: `scorecards`, `scorecards-check`, `scorecards-check-strict`.
- `scripts/tests/wave3-scorecards-test.sh` (5 assertions): pass-without-evidence rejected, orphan rejected, deterministic, `--check` detects drift.

## Method note

Scorecards were populated by a fan-out of per-spec readers (one per spec home, each grounding claims in on-disk files), then **serialised deterministically** so the `.a2ml` format is guaranteed regardless of generator output. Licence rows are `manual-only` throughout (flag-only policy, no SPDX edits).

## Coming next (same track)

Wave 4: the `did-you-actually-do-that` post-action claim-verifier spec (the missing LLM-regulation tier); Wave 5: AffineScript testing standard; Wave 6: campaign issues (cross-linking #426/#451/#437/#446) + release hygiene.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

_Generated by [Claude Code](https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc)_

Co-authored-by: Claude <noreply@anthropic.com>
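
The systems-coverage figure in the Wave 3 description (#454) is defined there as the share of requirements with a real mechanical check. A minimal sketch of that arithmetic follows; the input (one `system` value per line) is a simplification rather than the `.scorecard.a2ml` syntax, `systems_coverage` is a hypothetical name, and truncating to a whole percent is an assumption about rounding.

```shell
#!/bin/sh
# Sketch: systems coverage = requirements whose `system` is not the literal
# "none", as a whole-number percentage. Input is hypothetical: one `system`
# value per line on stdin. Rounding (truncation) is an assumption.
systems_coverage() {
    awk 'NF { total++; if ($0 != "none") covered++ }
         END {
             if (total == 0) exit 1   # refuse to report on zero requirements
             printf "%d%% (%d/%d)\n", int(100 * covered / total), covered, total
         }' || { echo "no requirements graded" >&2; return 1; }
}
```

With three requirements of which one has `system` set to `none`, the function reports `66% (2/3)`. An empty input is an error rather than a vacuous 100%, in keeping with the fail-loudly rule.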

…ein in LLMs) (#457)

## Context

Follow-up to #453 (Waves 0–1) and #454 (Wave 3), both merged. This wave builds the piece you asked for directly, a system to **rein in LLMs**: a standard that checks, mechanically, that an agent's *claimed* outcomes actually happened.

The estate already gates what an agent may do **before** it acts (gatekeeper → AGENTIC → contractiles). It had **nothing** that takes an agent's asserted outcome and confirms it after. Every false-green hole this program has fixed is a special case of one disease: *a claim was trusted instead of verified.* DYADT is the missing **Tier 4**.

## The four-tier accountability pipeline

| Tier | Governs | Home |
|---|---|---|
| 1. Admission | must read the manifest before acting | `0-ai-gatekeeper-protocol/` |
| 2. Pre-action | entropy budgets, intent, confirmation | `agentic-a2ml/` |
| 3. In-session gates | contractile MUST/TRUST/… at close/push/merge | `contractiles/` |
| **4. Post-action (this PR)** | **claimed X → mechanically confirm/refute X** | **`did-you-actually-do-that/`** |

## What's here

**Spec set** (`did-you-actually-do-that/`):

- `README` (pipeline binding)
- `spec/CLAIM-FORMAT.adoc` (typed claims)
- `spec/VERIFICATION-PROTOCOL.adoc` (the `confirmed`/`refuted`/`unverifiable` contract: **unverifiable is loud, never green**; a verifier must *re-derive* evidence, never read back the agent's own `evidence` field)
- `spec/CONSEQUENCE-LEDGER.adoc` (append-only, dual-signed, per-actor confirmation rate that Tier-3 MAY gate on)
- `spec/conformance/` (6 executable vectors + runner)
- `docs/NAMING-RESOLUTION.adoc` (resolves the PLASMA collision)

**Executable + dogfooded:**

- `scripts/verify-claims.sh`: reference verifier (local verifiers real; network/manual return `unverifiable`).
- Root `CLAIMS.a2ml`: **7 claims about this very change**, re-derived from primary evidence; `dyadt-verify.yml` runs the verifier + conformance suite in CI. If a claim here were false, CI **refutes** it and fails. The spec's first conformance run is on itself.
- `scripts/tests/wave4-dyadt-test.sh` (7/7): proves a false claim is REFUTED despite an honest-sounding statement, and the incompatible-verifier + manual-only guards fire.

**Registered + graded:** added to `build-registry.sh` (32 specs); honest scorecard (5/5 MUST met, 90% systems coverage; the network verifier is an honest `fail`, since only the production impl does forge/CI APIs).

## Boundary

This repo is the **declaration layer**: it ships the normative spec + a reference verifier + the dogfood. The **production actuator** (continuous, in-session, wired to hypatia/gitbot-fleet with real ledger enforcement) is chartered for `hyperpolymath/did-you-actually-do-that`, built against these conformance vectors; it MUST NOT diverge from this contract. That's the parallel session you flagged. Licence/SPDX is `manual-only` end-to-end (flag-only policy: a licence claim is always `unverifiable: manual-only`).

## Coming next (same track)

Wave 5: AffineScript testing standard + template; Wave 6: campaign issues (cross-linking #426/#451/#437/#446) + release hygiene.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

_Generated by [Claude Code](https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc)_

Co-authored-by: Claude <noreply@anthropic.com>
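
The three-verdict contract in the Wave 4 description (#457) can be sketched as follows. This is not `scripts/verify-claims.sh`: the claim kinds (`file-exists`, `contains`), the function name `verify_claim`, and the exit codes 0/1/2 are all assumptions made for illustration. What the sketch keeps from the spec is that evidence is re-derived from disk and that anything it cannot check is reported as `unverifiable` with a non-zero status.

```shell
#!/bin/sh
# Sketch of a confirmed / refuted / unverifiable verifier. Claim kinds, the
# function name, and exit codes (0/1/2) are hypothetical. Evidence is always
# re-derived from the filesystem, never read from the claim itself.
verify_claim() {  # usage: verify_claim <kind> <args...>
    kind=$1; shift
    case "$kind" in
        file-exists)
            if [ -e "$1" ]; then echo confirmed; return 0; fi
            echo refuted; return 1 ;;
        contains)   # <file> <POSIX ERE>
            rc=0
            grep -Eq -- "$2" "$1" 2>/dev/null || rc=$?
            case "$rc" in
                0) echo confirmed; return 0 ;;
                1) echo refuted;   return 1 ;;
                *) echo "unverifiable: grep error"; return 2 ;;
            esac ;;
        *)  # network, manual, licence, anything unknown: loud, never green
            echo "unverifiable: $kind"; return 2 ;;
    esac
}
```

Because `unverifiable` returns a non-zero status, a CI step that simply runs the verifier cannot go green on a claim nobody checked.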

…names guard, DYADT residual fix (#459)

Two cohesive commits closing out the estate audit-and-optimization program (umbrella #460). Both fully tested; generated artifacts in sync.

## Commit 1 (Wave 5): per-language testing depth

You flagged this directly: the estate's only per-language testing *depth* was a single Julia guide from **2024** (no MUST/SHOULD, Rust+Julia only) next to a **byte-identical duplicate**.

- `language-testing-standards.md` → **v2.0.0**: RFC-2119 requirements **R1–R9** mapped to the CRG test taxonomy; an anti-theatre rule (no `continue-on-error` on a MUST check; coverage reported-with-artifact, not asserted).
- `templates/language-testing-guide-TEMPLATE.md`: the skeleton every guide follows: requirement-mapping table (tool or **visible** `none`), tools, SHA-pinned CI, and a **mandatory honest "Known gaps"** section.
- `affinescript-testing-guide.md`: your primary language, previously with **zero** testing standard, authored honestly (most SHOULD rows are tracked gaps; R3 notes `affinescript-verify.yml` is advisory). SSOT migrates to `hyperpolymath/affinescript` prospectively.
- `scripts/check-language-guide.sh` (wired into `just validate`) + `wave5-language-guides-test.sh` (7/7). Deleted the duplicate snapshot.

## Commit 2 (Wave 6): guard, DYADT residual fix, licence record

- **DYADT residual (#461):** an adversarial review confirmed 16 bypasses in the Wave-4 verifier; 15 were fixed in #458, and this closes the last: an always-matching `contains:` regex (`.*`, `^`, `$`, …) no longer confirms vacuously (`unverifiable trivial-pattern`). Spec pins the `contains:` dialect to POSIX ERE; conformance vector + assertion added (10 vectors, 15 assertions).
- **Canonical-names guard** (`check-canonical-names.sh`): blocks *reintroduction* of the deprecated names (`6a2`→descriptiles, `agent_instructions`→bot_directives) in **added** diff lines only (chartered bulk migration untouched). Wired into `just validate` + the pre-commit hook; `wave6-canonical-names-test.sh` (4/4).
- **`audits/licence-flags-2026-07.adoc`**: flag-only record; the whole program made no SPDX edits and no auto licence PRs; DYADT treats licence claims as `manual-only` end to end.

## Verification

All six wave suites pass; DYADT conformance 10/10 + dogfood all-confirmed; registry + scorecard dashboard in sync.

## Program status (umbrella #460)

Waves 0/1/3/4 + hardening **merged** (#453, #454, #457, #458). This lands Waves 5 + 6. Remaining estate-wide work is chartered: #461 (verifier residual, **fixed here**), #462 (DYADT production verifier), #463 (per-language guides completion). Licence rows `manual-only` throughout (flag-only policy).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
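
The trivial-pattern residual described for #461 can be illustrated with one possible heuristic: a POSIX ERE that matches the empty string (`.*`, `^`, `$`, `a*`) matches every line, so confirming on it proves nothing. Whether the real verifier detects triviality this way is not stated in the description; the function names below are hypothetical, and only the `unverifiable trivial-pattern` verdict text is taken from it.

```shell
#!/bin/sh
# Sketch: refuse to "confirm" on a pattern that can never fail.
# Heuristic (an assumption, not necessarily the real verifier's rule): an ERE
# that matches an empty line is always-matching.
is_trivial_pattern() {  # returns 0 when the ERE matches the empty string
    printf '\n' | grep -Eq -- "$1" 2>/dev/null
}

check_contains() {  # usage: check_contains <file> <POSIX ERE>
    if is_trivial_pattern "$2"; then
        echo "unverifiable trivial-pattern"; return 2
    fi
    if grep -Eq -- "$2" "$1" 2>/dev/null; then echo confirmed; return 0; fi
    echo refuted; return 1
}
```

The guard runs before the file is read, so a vacuous pattern is rejected even when the target file would have matched a meaningful pattern too.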

claude added 5 commits July 3, 2026 00:38

Merge remote-tracking branch 'origin/main' into claude/estate-audit-o…

23d741e

…ptimization-h19z12

hyperpolymath marked this pull request as ready for review July 3, 2026 01:05

hyperpolymath merged commit 5fa7a83 into main Jul 3, 2026
19 checks passed

hyperpolymath deleted the claude/estate-audit-optimization-h19z12 branch July 3, 2026 01:06

hyperpolymath mentioned this pull request Jul 3, 2026

Estate audit — Wave 3: MUST/SHOULD/COULD compliance scorecards + generated dashboard #454

Merged

hyperpolymath mentioned this pull request Jul 3, 2026

Estate audit — Wave 4: DYADT, post-action agent-claim verification (rein in LLMs) #457

Merged

This was referenced Jul 3, 2026

[umbrella] Estate audit & optimization — make the enforcement real (Waves 0–6) #460

Open

Estate audit — Waves 5 + 6: per-language testing standard, canonical-names guard, DYADT residual fix #459

Merged

hyperpolymath merged 5 commits into main from claude/estate-audit-optimization-h19z12
