Estate audit & optimization — Wave 0: kill the false green (holes before goals)#453

Merged
hyperpolymath merged 5 commits into main from claude/estate-audit-optimization-h19z12 on Jul 3, 2026

@hyperpolymath

Owner

Context

This is the first wave of an estate-wide must / should / could / systems / compliance / effects audit-and-uplift program anchored in standards (the ~290-repo control plane). A 7-agent read-only recon found the core problem: the declarations are rich but the enforcement is largely theatre. Validators pass vacuously, mandatory-check runners aren't wired, and there's no post-action check on agent claims. Per estate doctrine (RSR-PHILOSOPHY.adoc), the program is ordered: holes before goals, solutions at source, always fail loudly.

This PR is the branch trunk; subsequent waves land as further commits.

Wave 0 — Kill the false green (this commit)

Every validator that reported success while checking nothing, or masked real errors behind || true, is now state-aware and provably able to fail, each backed by a red-team regression fixture.

| Fix | File | Hole closed |
|---|---|---|
| State-aware 6scm check | `a2ml/scripts/check-6scm.sh` | Exited 0 vacuously once .scm sources migrated to 6a2; now fails on orphan-mirror drift / out-of-sync, and declares itself retired (not "in sync") when obsolete |
| Real Mustfile validation | `.github/workflows/boj-build.yml` + `scripts/check-mustfile-structure.sh` | Replaced `echo "K9 validation would run here"` with a real structural check: every Mustfile check must carry a severity + a means of discharge (`- run:` or `- verification:`); hollow checks fail loudly |
| Arg parsing (#387) | `rhodium-standard-repositories/rsr-audit.sh` | Documented `--format json` silently didn't work; invalid format now exits 4 instead of defaulting to text; bare-positional form kept for back-compat |
| De-mask `\|\| true` | `Justfile` `validate` + `scripts/rsr-selfaudit.sh` | A broken audit (exit 4) was swallowed as green; now a low grade is informational/non-blocking but an audit error fails loudly. `registry-check` stays the hard gate |
| De-hardcode paths | `audit-contractiles.sh` | Removed `/var/mnt/eclipse/...` owner-machine paths (couldn't run in CI); takes args/`$CONTRACTILE_AUDIT_REPOS`, never audits zero repos |
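
The repo-selection rule in the last row (arguments, then `$CONTRACTILE_AUDIT_REPOS`, then self, and never zero repos) could look roughly like the sketch below. It is written in Python rather than the repo's bash, and the function names, the precedence of arguments over the environment variable, and the whitespace-separated variable format are all assumptions, not taken from `audit-contractiles.sh`.

```python
import os


def resolve_repos(argv, env=None, self_path="."):
    """Choose which repos to audit: CLI args, then the env var, then this repo."""
    env = os.environ if env is None else env
    # Assumption: explicit arguments take precedence over the environment.
    repos = [arg for arg in argv if arg.strip()]
    if not repos:
        # Assumption: the variable holds a whitespace-separated list of paths.
        repos = env.get("CONTRACTILE_AUDIT_REPOS", "").split()
    if not repos:
        repos = [self_path]  # default to auditing the current repo
    return repos


def audit_all(repos, audit_one):
    """Run audit_one on each repo; refuse to report success on an empty set."""
    if not repos:
        raise ValueError("refusing to audit zero repos")
    return {repo: audit_one(repo) for repo in repos}
```

The empty-set guard in `audit_all` is the part that matters: a loop over nothing would otherwise finish cleanly and look green.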

Verification

  • scripts/tests/wave0-false-green-test.sh: 13/13, exercising the pass and fail paths of every fixed validator (just false-green-test).
  • Existing check-workflow-staleness-test.sh still passes; boj-build.yml validates as YAML; all changed scripts pass bash -n.
  • The real Mustfile passes structural validation (17 checks); a hollow check fails; rsr-audit --format json emits JSON; an errored audit fails validate.
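
The structural rule behind the Mustfile check (a severity plus a means of discharge on every check) can be illustrated as follows. This is a minimal sketch that assumes checks have already been parsed into dictionaries with hypothetical keys `name`, `severity`, `run` and `verification`; the real Mustfile syntax and the parsing done by `scripts/check-mustfile-structure.sh` are not shown in this PR.

```python
def hollow_checks(checks):
    """Return the names of checks lacking a severity or a means of discharge."""
    hollow = []
    for check in checks:
        has_severity = bool(check.get("severity"))
        # A check is discharged either mechanically (run) or manually (verification).
        has_discharge = bool(check.get("run") or check.get("verification"))
        if not (has_severity and has_discharge):
            hollow.append(check.get("name", "<unnamed>"))
    return hollow


def structure_exit_code(checks):
    """Exit 1 when any check is hollow or when there are no checks at all."""
    return 1 if (not checks or hollow_checks(checks)) else 0
```

Treating an empty check list as a failure is an added assumption here, in line with the doctrine that a validator which checks nothing should not report success.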

Not touched (by policy / scope)

  • Licence/SPDX — flag-only throughout; no header edits.
  • continue-on-error soft-gates — left in place (documented, with real blocking equivalents elsewhere); honest-labelling / promotion is a Wave 1 item, not a blanket removal that would manufacture red noise.

Coming in later waves (same branch)

Wave 1 (wire the automation that never runs), Wave 3 (MUST/SHOULD/COULD scorecards + compliance dashboard), Wave 4 (the did-you-actually-do-that post-action claim-verifier spec — the missing LLM-regulation tier), Wave 5 (AffineScript testing standard + template), Wave 6 (campaign issues cross-linking #426/#451/#437/#381/#387/#446, release hygiene).

🤖 Generated with Claude Code


Generated by Claude Code

claude added 5 commits July 3, 2026 00:38
Holes-before-goals pass on the enforcement surface: several validators
reported success while checking nothing, or masked real errors behind
`|| true`. Each is now state-aware and provably able to fail, with a
permanent regression test that exercises both the pass and fail paths
(a check that cannot fail is not a check).

- a2ml/scripts/check-6scm.sh: no longer exits 0 vacuously when the .scm
  sources have migrated away. Distinguishes obsolete-no-op (retired,
  superseded by 6a2/descriptiles), orphan-mirror DRIFT (fails), and
  out-of-sync mirror (fails).
- .github/workflows/boj-build.yml: replace the "K9 validation would run
  here" placeholder with real structural validation via new
  scripts/check-mustfile-structure.sh (every Mustfile check must carry a
  severity and a means of discharge — `- run:` or `- verification:`;
  hollow checks fail loudly).
- rhodium-standard-repositories/rsr-audit.sh: fix argument parsing
  (standards#387) so the documented `--format json` works and an invalid
  format errors (exit 4) instead of silently defaulting to text; keep the
  bare-positional form for backward compatibility.
- Justfile `validate`: drop the blanket `|| true` on the RSR self-audit.
  Via scripts/rsr-selfaudit.sh a low grade stays informational/non-blocking
  (a monorepo is not expected to score Gold) but a broken audit (exit 4)
  now fails validate loudly. registry-check remains the hard gate.
- audit-contractiles.sh: remove hardcoded /var/mnt/eclipse owner-machine
  paths; take repos as args or $CONTRACTILE_AUDIT_REPOS, default to self,
  and never silently audit zero repos. Fix a corrupted box-drawing char.
- scripts/tests/wave0-false-green-test.sh: 13 assertions covering the
  pass + fail path of each fixed validator; Justfile recipes
  `false-green-test` and `mustfile-check`.
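
The difference between a blanket `|| true` and the state-aware handling described for `validate` comes down to triaging the audit's exit code. The sketch below assumes a hypothetical set of "low grade" exit codes (`1`, `2`, `3`); only exit 4 as the audit-error code comes from this change, and anything unrecognised is treated as a failure rather than ignored.

```python
AUDIT_ERROR = 4  # from this change: exit 4 means the audit itself broke


def triage(exit_code, grade_codes=frozenset({1, 2, 3})):
    """Map an audit exit code to 'pass', 'informational' or 'fail'.

    grade_codes is a hypothetical set of codes meaning "ran fine, low grade".
    """
    if exit_code == 0:
        return "pass"
    if exit_code in grade_codes:
        return "informational"  # low grade: reported, but non-blocking
    # Exit 4 and any unknown code fail loudly instead of being swallowed.
    return "fail"


def validate_exit_code(audit_exit_code):
    """Exit code for the surrounding recipe: only a 'fail' triage blocks."""
    return 1 if triage(audit_exit_code) == "fail" else 0
```

With `|| true`, every branch above would collapse to 0, which is exactly the false green this wave removes.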

Licence/SPDX untouched (flag-only policy). continue-on-error soft-gates
left in place (documented, with real blocking equivalents) — honest
labelling / promotion tracked for Wave 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc

Wave-0 edits to files inside the a2ml and rhodium-standard-repositories
spec trees changed their content source_hash, so the generated registry
drifted (the recurring standards#381 failure). Regenerated deterministically
via scripts/build-registry.sh; TOPOLOGY.md unchanged. A pre-commit regen
hook + CI remediation hint lands in Wave 1 to stop this recurring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc

… drift)

origin/main (6df21b1, PR #452) edited files under spec homes without
regenerating the derived registry, leaving main itself failing
`build-registry.sh --check` (standards#381/#399 — drift blocks every PR).
The PR merge-check inherited that drift. Merging main into this branch and
regenerating brings the registry back in sync with the combined tree
(a2ml + k9-svc source hashes). Wave 1 adds hook auto-install + a CI
remediation hint so this stops recurring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc

The estate declared mandatory checks and install-only hooks that nothing
executed. Wave 1 wires them so the enforcement is real, not aspirational.

- scripts/run-mustfile.sh: EXECUTE the Mustfile's `- run:` invariants (16
  checks that were declared but never run by any CI job). Critical/high
  failures block; warning-severity failures are advisory; `- verification:`
  checks are reported as MANUAL (counted, never silently green). Wired into
  boj-build.yml's contractile job as a real "Mustfile Enforcement" step
  (the repo passes all 16 today, so it gates honestly). Justfile: must-check.
- hooks/install.sh + Justfile hooks-install: install the pre-commit guard
  (language policy + registry-drift) into .git/hooks. Previously the hooks
  only ran if a contributor manually copied them; now `just hooks-install`
  wires a thin shim that execs the tracked hook (single source of truth).
- registry-verify.yml: on drift, write the exact remediation (`just registry`
  + `git add` + `just hooks-install`) to the CI job summary instead of a bare
  exit 1 — this is the recurring standards#381 pain made self-explaining.
- hypatia-scan-reusable.yml: the "critical issues" step is honestly labelled
  ADVISORY / does-not-gate (echo + job summary) so a green check carrying
  critical findings is not mistaken for zero findings. Blocking promotion
  tracked at standards#399/#437. (no-js-scan.yml was already honest.)
- just doctor: report whether the optional contractile.just import resolved
  and whether the pre-commit hook is installed (`import?` fails silently otherwise).
- scripts/tests/wave1-automation-test.sh (9 assertions) + Justfile
  automation-test: prove the Mustfile runner blocks on critical/high, stays
  advisory on warning, and the hook installer is idempotent.
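
The runner's gating rule (critical/high block, warning is advisory, `- verification:` is counted as MANUAL) might be sketched like this. It is not the contents of `scripts/run-mustfile.sh`: the dictionary shape of a check and the injectable `runner` are assumptions made so the logic can be exercised without a real Mustfile.

```python
import subprocess

BLOCKING = {"critical", "high"}  # from the source: these severities block


def shell_runner(command):
    """Run a command through the shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def run_checks(checks, runner=shell_runner):
    """Execute `run` checks and classify results; returns (report, exit_code)."""
    report = {"passed": [], "blocking": [], "advisory": [], "manual": []}
    for check in checks:
        if "run" not in check:
            # `- verification:` checks are counted as MANUAL, never as passes.
            report["manual"].append(check["name"])
        elif runner(check["run"]) == 0:
            report["passed"].append(check["name"])
        elif check["severity"] in BLOCKING:
            report["blocking"].append(check["name"])
        else:
            report["advisory"].append(check["name"])
    return report, (1 if report["blocking"] else 0)
```

Keeping `manual` as its own bucket is what stops a check that no machine ran from being folded into the pass count.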

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc
@sonarqubecloud

sonarqubecloud Bot commented Jul 3, 2026


@hyperpolymath hyperpolymath marked this pull request as ready for review July 3, 2026 01:05
@hyperpolymath hyperpolymath merged commit 5fa7a83 into main Jul 3, 2026
19 checks passed
@hyperpolymath hyperpolymath deleted the claude/estate-audit-optimization-h19z12 branch July 3, 2026 01:06
hyperpolymath added a commit that referenced this pull request Jul 3, 2026
…rated dashboard (#454)

## Context

Follow-up to #453 (Waves 0–1, merged). This wave delivers the **must /
should / could / systems / compliance / effects** audit itself: a
single, honest, committed view of where every registered standard
actually stands — grounded in what exists on disk, not aspiration.

It mirrors the existing `build-registry.sh` pattern exactly:
hand-authored source → deterministic generated artifact → `--check` in
CI.
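
As an illustration of that source → generated artifact → `--check` pattern, here is a minimal sketch. The one-line-per-spec format with a SHA-256 digest is hypothetical and is not the actual output of `build-registry.sh` or `build-scorecards.sh`; the point is only that rendering is order-independent and that the check compares a fresh render against the committed text.

```python
import hashlib


def render(sources):
    """Render a deterministic artifact: one line per source, sorted by id."""
    lines = []
    for spec_id in sorted(sources):
        digest = hashlib.sha256(sources[spec_id].encode("utf-8")).hexdigest()
        lines.append(f"{spec_id} {digest}")
    return "\n".join(lines) + "\n"


def check(sources, committed):
    """Return 0 when the committed artifact is current, 1 on drift."""
    return 0 if render(sources) == committed else 1
```

Because `render` has no timestamps or ordering dependence, any difference reported by `check` reflects a real change in the sources.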

## What's here

**The scorecard format** —
`.machine_readable/scorecards/<spec-id>.scorecard.a2ml`, one per LOCAL
registered spec (28), keyed 1:1 to `REGISTRY.a2ml`. Each requirement
carries:
- `system` — the mechanical check that discharges it, or the literal
`none` (**visible**, not hidden)
- `status` — `pass` (evidence **required**) · `fail` (real gap) ·
`manual-only` (governance/licence) · `aspirational` (reach target,
**never** counted as pass)
- `evidence` and `effects` (downstream impact)

**The generator** — `scripts/build-scorecards.sh` produces
`COMPLIANCE-DASHBOARD.md`: per-spec MUST verdict + a **systems-coverage
%** (share of requirements with a real mechanical check — the honest
measure of *enforcement vs. assertion*). Deterministic; `--check`
(drift) and `--strict` (every spec must have a scorecard) modes.
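
The systems-coverage figure is a simple share. A sketch of the arithmetic, assuming each requirement is a dictionary whose `system` field is either a check name or the literal `none` (the field name comes from the scorecard format above; the dictionary representation is an assumption):

```python
def systems_coverage(requirements):
    """Percentage of requirements whose `system` names a real mechanical check."""
    if not requirements:
        return 0.0  # nothing graded: report zero rather than a vacuous 100%
    covered = sum(1 for req in requirements if req.get("system", "none") != "none")
    return 100.0 * covered / len(requirements)
```

A requirement with a missing `system` field is counted as uncovered, so an omission cannot raise the figure.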

**The schema** — `scorecard.schema.json`. Pass-without-evidence and
aspirational-as-pass are rejected by construction.
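
The two rejections can be pictured in code. This is a sketch in plain Python, not the contents of `scorecard.schema.json`; the error strings and function names are hypothetical, while the status values and the `system`/`evidence` fields come from the format described above.

```python
VALID_STATUSES = {"pass", "fail", "manual-only", "aspirational"}


def validate_requirement(req):
    """Return a list of problems with one scorecard requirement (empty if valid)."""
    errors = []
    status = req.get("status")
    if status not in VALID_STATUSES:
        errors.append("unknown status")
    if status == "pass" and not req.get("evidence"):
        errors.append("pass without evidence")
    if not req.get("system"):
        errors.append("system must name a check or be the literal 'none'")
    return errors


def count_passes(requirements):
    """Count passes strictly: 'aspirational' and 'manual-only' never count."""
    return sum(1 for req in requirements if req.get("status") == "pass")
```

Since `count_passes` matches only the exact status `pass`, an aspirational row cannot inflate the total however it is worded.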

## The honest baseline this reveals

| Metric | Value |
|---|---|
| Specs scored | **28 / 28** |
| MUST requirements passing | **41 / 138** (74 failing) |
| Estate systems coverage | **66%** of 328 graded requirements have a real mechanical check |
| Specs at `✅ MUST-met` | `a2ml` (the rest carry honest gaps) |

This is deliberately unflattering — it's the truthful starting line for
the uplift, and it operationalises the honest-badge ask (#446): no
intuition-plucked Grade-A gate can inflate a score.

## Enforcement

- `registry-verify.yml` now also checks the dashboard is current
(`--strict`) with a remediation hint.
- `Justfile`: `scorecards`, `scorecards-check`,
`scorecards-check-strict`.
- `scripts/tests/wave3-scorecards-test.sh` (5 assertions):
pass-without-evidence rejected, orphan rejected, deterministic,
`--check` detects drift.

## Method note

Scorecards were populated by a fan-out of per-spec readers (one per spec
home, each grounding claims in on-disk files), then **serialised
deterministically** so the `.a2ml` format is guaranteed regardless of
generator output. Licence rows are `manual-only` throughout (flag-only
policy — no SPDX edits).

## Coming next (same track)

Wave 4 — the `did-you-actually-do-that` post-action claim-verifier spec
(the missing LLM-regulation tier); Wave 5 — AffineScript testing
standard; Wave 6 — campaign issues (cross-linking #426/#451/#437/#446) +
release hygiene.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---
_Generated by [Claude
Code](https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc)_

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jul 3, 2026
…ein in LLMs) (#457)

## Context

Follow-up to #453 (Waves 0–1) and #454 (Wave 3), both merged. This wave
builds the piece you asked for directly — a system to **rein in LLMs**:
a standard that checks, mechanically, that an agent's *claimed* outcomes
actually happened.

The estate already gates what an agent may do **before** it acts
(gatekeeper → AGENTIC → contractiles). It had **nothing** that takes an
agent's asserted outcome and confirms it after. Every false-green hole
this program has fixed is a special case of one disease: *a claim was
trusted instead of verified.* DYADT is the missing **Tier 4**.

## The four-tier accountability pipeline

| Tier | Governs | Home |
|---|---|---|
| 1. Admission | must read the manifest before acting | `0-ai-gatekeeper-protocol/` |
| 2. Pre-action | entropy budgets, intent, confirmation | `agentic-a2ml/` |
| 3. In-session gates | contractile MUST/TRUST/… at close/push/merge | `contractiles/` |
| **4. Post-action (this PR)** | **claimed X → mechanically confirm/refute X** | **`did-you-actually-do-that/`** |

## What's here

**Spec set** (`did-you-actually-do-that/`): `README` (pipeline binding)
· `spec/CLAIM-FORMAT.adoc` (typed claims) ·
`spec/VERIFICATION-PROTOCOL.adoc` (the
`confirmed`/`refuted`/`unverifiable` contract — **unverifiable is loud,
never green**; a verifier must *re-derive* evidence, never read back the
agent's own `evidence` field) · `spec/CONSEQUENCE-LEDGER.adoc`
(append-only, dual-signed, per-actor confirmation rate that Tier-3 MAY
gate on) · `spec/conformance/` (6 executable vectors + runner) ·
`docs/NAMING-RESOLUTION.adoc` (resolves the PLASMA collision).

**Executable + dogfooded:**
- `scripts/verify-claims.sh` — reference verifier (local verifiers real;
network/manual return `unverifiable`).
- Root `CLAIMS.a2ml` — **7 claims about this very change**, re-derived
from primary evidence; `dyadt-verify.yml` runs the verifier +
conformance suite in CI. If a claim here were false, CI **refutes** it
and fails. The spec's first conformance run is on itself.
- `scripts/tests/wave4-dyadt-test.sh` (7/7) — proves a false claim is
REFUTED despite an honest-sounding statement, and the
incompatible-verifier + manual-only guards fire.
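
The three-verdict contract can be sketched as below. The claim kinds (`file-exists`, `network`, `manual`), the dictionary shape, and the reason strings are hypothetical and are not the typed claims of `spec/CLAIM-FORMAT.adoc`. The `gate` function is one strict reading of "unverifiable is loud, never green" and is likewise an assumption about CI behaviour.

```python
import os


def verify(claim, exists=os.path.isfile):
    """Return (verdict, reason) for one claim, re-deriving evidence from disk."""
    kind = claim.get("kind")
    if kind == "file-exists":
        # Deliberately ignores claim["evidence"]: the agent's word is not proof.
        found = exists(claim["path"])
        return ("confirmed" if found else "refuted", claim["path"])
    if kind in ("network", "manual"):
        return ("unverifiable", f"{kind}-only")
    return ("unverifiable", "unknown-kind")


def gate(verdicts):
    """Exit code for CI: green only when there are claims and all are confirmed."""
    ok = bool(verdicts) and all(verdict == "confirmed" for verdict, _ in verdicts)
    return 0 if ok else 1
```

Passing `exists` as a parameter keeps the sketch testable; a false claim with a persuasive `evidence` string is still refuted because that field is never consulted.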

**Registered + graded:** added to `build-registry.sh` (32 specs); honest
scorecard (5/5 MUST met, 90% systems coverage — the network verifier is
an honest `fail`, since only the production impl does forge/CI APIs).

## Boundary

This repo is the **declaration layer**: it ships the normative spec + a
reference verifier + the dogfood. The **production actuator**
(continuous, in-session, wired to hypatia/gitbot-fleet with real ledger
enforcement) is chartered for `hyperpolymath/did-you-actually-do-that`,
built against these conformance vectors — it MUST NOT diverge from this
contract. That's the parallel session you flagged.

Licence/SPDX is `manual-only` end-to-end (flag-only policy — a licence
claim is always `unverifiable: manual-only`).

## Coming next (same track)

Wave 5 — AffineScript testing standard + template; Wave 6 — campaign
issues (cross-linking #426/#451/#437/#446) + release hygiene.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---
_Generated by [Claude
Code](https://claude.ai/code/session_0114ps6mY5jAH4SzbGxeuYjc)_

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jul 3, 2026
…names guard, DYADT residual fix (#459)

Two cohesive commits closing out the estate audit-and-optimization
program (umbrella #460). Both fully tested; generated artifacts in sync.

## Commit 1 — Wave 5: per-language testing depth

You flagged this directly: the estate's only per-language testing
*depth* was a single Julia guide from **2024** (no MUST/SHOULD,
Rust+Julia only) next to a **byte-identical duplicate**.

- `language-testing-standards.md` → **v2.0.0**: RFC-2119 requirements
**R1–R9** mapped to the CRG test taxonomy; an anti-theatre rule (no
`continue-on-error` on a MUST check; coverage reported-with-artifact,
not asserted).
- `templates/language-testing-guide-TEMPLATE.md`: the skeleton every
guide follows — requirement-mapping table (tool or **visible** `none`),
tools, SHA-pinned CI, and a **mandatory honest "Known gaps"** section.
- `affinescript-testing-guide.md`: your primary language, previously
with **zero** testing standard — authored honestly (most SHOULD rows are
tracked gaps; R3 notes `affinescript-verify.yml` is advisory). SSOT
migrates to `hyperpolymath/affinescript` prospectively.
- `scripts/check-language-guide.sh` (wired into `just validate`) +
`wave5-language-guides-test.sh` (7/7). Deleted the duplicate snapshot.

## Commit 2 — Wave 6: guard, DYADT residual fix, licence record

- **DYADT residual (#461):** an adversarial review confirmed 16 bypasses
in the Wave-4 verifier; 15 were fixed in #458, and this closes the last
— an always-matching `contains:` regex (`.*`, `^`, `$`, …) no longer
confirms vacuously (`unverifiable trivial-pattern`). Spec pins the
`contains:` dialect to POSIX ERE; conformance vector + assertion added
(10 vectors, 15 assertions).
- **Canonical-names guard** (`check-canonical-names.sh`): blocks
*reintroduction* of the deprecated names (`6a2`→descriptiles,
`agent_instructions`→bot_directives) in **added** diff lines only
(chartered bulk migration untouched). Wired into `just validate` + the
pre-commit hook; `wave6-canonical-names-test.sh` (4/4).
- **`audits/licence-flags-2026-07.adoc`**: flag-only record — the whole
program made no SPDX edits and no auto licence PRs; DYADT treats licence
claims as `manual-only` end to end.
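
The trivial-pattern residual can be illustrated with a conservative heuristic: any pattern that matches the empty string proves nothing about the text, so it is reported as unverifiable instead of confirmed. This is a sketch of the idea only, not the fix that landed. It uses Python's `re` dialect rather than the POSIX ERE the spec pins, the `invalid-pattern` reason is hypothetical, and the heuristic does not catch every near-universal pattern (for example a lone `.`).

```python
import re


def verify_contains(pattern, text):
    """Return (verdict, reason) for a `contains:` claim against re-derived text."""
    try:
        regex = re.compile(pattern)
    except re.error:
        return ("unverifiable", "invalid-pattern")
    # Patterns such as `.*`, `^` and `$` match the empty string, so a match
    # against real text carries no information about its content.
    if regex.search("") is not None:
        return ("unverifiable", "trivial-pattern")
    if regex.search(text) is not None:
        return ("confirmed", "pattern-found")
    return ("refuted", "pattern-absent")
```

Checking the pattern against empty input before looking at the text means the guard fires regardless of what the file contains.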

## Verification

All six wave suites pass; DYADT conformance 10/10 + dogfood
all-confirmed; registry + scorecard dashboard in sync.

## Program status (umbrella #460)

Waves 0/1/3/4 + hardening **merged** (#453, #454, #457, #458). This
lands Waves 5 + 6. Remaining estate-wide work is chartered: #461
(verifier residual — **fixed here**), #462 (DYADT production verifier),
#463 (per-language guides completion).

Licence rows `manual-only` throughout (flag-only policy).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>