Skip to content

chore: claude-config tier-1 cleanup + agent lint + CI#2

Merged
screenleon merged 4 commits into
mainfrom
chore/claude-config-tier-1-cleanup
May 2, 2026
Merged

chore: claude-config tier-1 cleanup + agent lint + CI#2
screenleon merged 4 commits into
mainfrom
chore/claude-config-tier-1-cleanup

Conversation

@screenleon
Copy link
Copy Markdown
Owner

@screenleon screenleon commented May 2, 2026

Summary

Closes the deferred items from PR #1 review plus a deeper Tier-A pass on token usage and the codex pipeline.

Tier 1 cleanup (initial commits)

  • commands/pr-gate.md — detect integration branch via git symbolic-ref refs/remotes/origin/HEAD; mark Agent(...) block as illustrative pseudocode.
  • commands/pm.md — drop "spawn reviewers" half (that's /pr-gate territory). Codex fallback points at scripts/codex-dispatch.sh, not raw codex exec.
  • agents/project-pm.md — drop Bash(codex exec) fallback line; main-thread implementation details don't belong in subagent prompts.
  • README.md — refresh project-pm and codex-executor descriptions; new lead Design note "Subagents cannot spawn subagents".
  • scripts/lint-agents.sh (new) — fail if any subagent declares Agent in tools:. Handles inline scalar, YAML flow [a,b,c], and YAML block-list. Verified against fixtures.
  • install.sh — pre-flight call to lint-agents.sh.
  • .github/workflows/lint.yml (new) — run lint on push/PR. Defense-in-depth.

Tier A — token + codex pipeline (3 added commits)

docs(codex): canonical brief schema + executor enforcement

  • docs/codex-brief.md (new) — required fields (working_dir, goal, files, acceptance) + 5 reusable self-verify macros extracted from real briefs run this session: cross-source, sample-N OK re-check, git-status no-collateral-damage, dedup-across-N, schema-match. Good and bad example briefs.
  • agents/codex-executor.md — enforce schema. Reject briefs missing required fields rather than improvising.
  • agents/project-pm.md — brief-writing section points at the schema doc instead of reciting an inline example.
  • commands/pm.md + README.md — reference the schema.

fix(pr-gate): single PM hop — main thread classifies via heuristic

  • commands/pr-gate.md — main thread classifies via deterministic file-pattern heuristic (grep -vE '\.(md|jsonl|txt)$|^audits/|...') instead of asking PM. PM is invoked once at Step 3 (synthesis only). Bias toward implementation when ambiguous.

Pre-merge review

  • critic (round 1): advise → medium finding (lint missed YAML block-list form) fixed; codex flag duplication fixed; codex-executor README description fixed.
  • architecture-reviewer (round 1): advise → install-time-only enforcement fixed via CI workflow; double PM hop concern fixed via single-hop pr-gate rewrite (Tier A).

Test plan

  • CI workflow runs green on this PR
  • Next codex dispatch: brief written against docs/codex-brief.md schema; codex-executor accepts (or correctly rejects malformed briefs)
  • Next /pr-gate run: confirm main thread classifies via heuristic (no PM hop at Step 1) and PM is invoked once at synthesis

screenleon and others added 4 commits May 2, 2026 10:02
Address deferred items from PR #1 critic/architecture review and
add a regression guard:

- commands/pr-gate.md: detect integration branch via
  `git symbolic-ref refs/remotes/origin/HEAD` instead of hardcoding
  `main`; mark Agent(...) block as illustrative pseudocode.
- commands/pm.md: drop "or spawn reviewers X/Y/Z" half — that's
  /pr-gate territory. Move full Bash(codex exec) fallback recipe
  here so PM prompt doesn't carry main-thread implementation
  details.
- agents/project-pm.md: drop the Bash(codex exec) fallback line
  from PM prompt — main-thread fallback paths don't belong here.
- README.md: refresh project-pm description; add "Subagents
  cannot spawn subagents" as the lead Design note with link to
  Agent SDK docs.
- scripts/lint-agents.sh: regression guard — fail if any subagent
  declares Agent in frontmatter tools. Wired into install.sh
  pre-flight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lication

Address review findings on prior tier-1 cleanup commit:

- scripts/lint-agents.sh: previously only matched the inline scalar
  form `tools: Read, Bash, Agent`. The YAML block-list form
      tools:
        - Read
        - Agent
  was silently passed — exact escape hatch the script is meant
  to prevent. Now extracts the full tools: block (handles inline,
  flow `[a, b, c]`, and block-list) and matches Agent as a whole
  token. Also warn on files lacking YAML frontmatter rather than
  treating them as compliant. Verified against fixtures covering
  inline / flow / block-list / solo Agent / "MyAgent" suffix /
  "AgentX" prefix / no-frontmatter cases.

- .github/workflows/lint.yml: run scripts/lint-agents.sh on push
  and PRs. install.sh pre-flight only catches drift on machines
  that re-run install; CI catches it on every contribution
  regardless of whether the contributor symlinks. Defense in depth.

- commands/pm.md: codex fallback now points at scripts/codex-dispatch.sh
  (single source of truth for sandbox / approval / trace flags)
  instead of duplicating the raw codex exec invocation. Avoids
  drift the next time codex-dispatch.sh changes.

- README.md: codex-executor description now explicitly says
  "dispatched by the main thread" — matches the new project-pm
  description and the Design notes lead bullet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Briefs dispatched to codex-executor have been written ad-hoc each
time — required fields varied, self-verify patterns were copied
from memory rather than referenced, and a vague brief would still
get dispatched (executor's only validation was "working dir is
missing"). This drives composition cost up and quality variance.

Establish a canonical schema and codify the reusable patterns:

- docs/codex-brief.md (new): four required fields (working_dir,
  goal, files, acceptance) and five reusable self-verify macros
  extracted from real briefs run this session: cross-source,
  sample-N OK re-check, git-status no-collateral-damage,
  dedup-across-N, schema-match. Includes one good example brief
  and one bad example for contrast.

- agents/codex-executor.md: enforce the schema. Reject briefs
  missing any required field rather than improvising. Link to
  the schema doc.

- agents/project-pm.md: PM's brief-writing section now points at
  the schema doc and macros, instead of reciting an inline
  example. -3 lines.

- commands/pm.md: note that briefs must follow the schema.

- README.md: new "Codex briefs" section linking to the schema;
  "Adding new pieces" reminds contributors not to put Agent in
  subagent tools (lint-agents.sh enforces).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Architecture-reviewer noted that the prior pr-gate flow could call
PM twice on ambiguous classification (once to classify, once to
synthesize). Each PM invocation reloads the full PM system prompt
(~3KB).

Move classification to a deterministic heuristic in the skill text:

  non_docs=$(git diff $BASE...HEAD --name-only \
    | grep -vE '\.(md|jsonl|txt)$|^\.gitignore$|^audits/|^docs/|^\.github/')
  [ -z "$non_docs" ] && CLASS=docs-only || CLASS=implementation

Bias toward `implementation` when ambiguous: an unnecessary
security/risk spawn returns `pass-not-applicable` cheaply, but a
missed implementation review ships a real bug.

PM's role narrows to synthesis only — invoked once at Step 3 with
classification + reviewer outputs. Net: one PM round-trip per gate
instead of one or two.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@screenleon screenleon merged commit 0c2498a into main May 2, 2026
1 check passed
screenleon added a commit that referenced this pull request May 23, 2026
Gate findings on PR (CC-243 stacked on cc-060):

- [medium, 3-reviewer overlap] Default snapshot output path used only a
  second-resolution timestamp (`/tmp/pm-snapshot-<ts>.md`); concurrent
  /pm spawns could collide and clobber each other's snapshots, feeding
  the wrong ground truth to PM. Switched the default to include PID +
  $RANDOM suffix (`/tmp/pm-snapshot-<ts>-$$.$RANDOM.md`), which is
  collision-resistant enough for solo + concurrent dispatches without
  introducing a mktemp portability concern.

- [medium] Snapshot script hard-failed when `origin/main` could not be
  resolved (detached HEAD, missing remote, fresh local clone). Added a
  graceful fallback: try `origin/main` -> `main` (local) -> emit
  `branch_base: unresolved` with a `# warn:` line. The same fallback
  also degrades `ahead_by` to 0 + warns instead of hard-failing.

- [low] BACKLOG.md CC-243 description claimed `project_tooling` was
  derived from a shell probe `ls Makefile project/backlog.yml 2>/dev/null`,
  but the script actually emits three fixed boolean keys
  (`makefile`, `backlog_render_target`, `has_validate_sh`). Updated the
  row text to match the impl contract and noted the map is extensible.
  Also updated the default OUT_PATH shape in the description and added
  pr:TBD ref (PR number unknown until open).

- [low] CLI `--help` / unknown-flag paths had no test coverage. Added
  cli-help (exits 0, usage on stderr), cli-unknown-flag (non-zero,
  "unknown argument" message), and branch-base-warn-on-missing-origin-main
  (the medium #2 fix's positive case).

Local verify: 31 tests passed (was 29), lint-scripts OK, validator
emits 31 E-* lines (= CC-228 baseline unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
screenleon added a commit that referenced this pull request May 23, 2026
Gate findings on PR (CC-243 stacked on cc-060):

- [medium, 3-reviewer overlap] Default snapshot output path used only a
  second-resolution timestamp (`/tmp/pm-snapshot-<ts>.md`); concurrent
  /pm spawns could collide and clobber each other's snapshots, feeding
  the wrong ground truth to PM. Switched the default to include PID +
  $RANDOM suffix (`/tmp/pm-snapshot-<ts>-$$.$RANDOM.md`), which is
  collision-resistant enough for solo + concurrent dispatches without
  introducing a mktemp portability concern.

- [medium] Snapshot script hard-failed when `origin/main` could not be
  resolved (detached HEAD, missing remote, fresh local clone). Added a
  graceful fallback: try `origin/main` -> `main` (local) -> emit
  `branch_base: unresolved` with a `# warn:` line. The same fallback
  also degrades `ahead_by` to 0 + warns instead of hard-failing.

- [low] BACKLOG.md CC-243 description claimed `project_tooling` was
  derived from a shell probe `ls Makefile project/backlog.yml 2>/dev/null`,
  but the script actually emits three fixed boolean keys
  (`makefile`, `backlog_render_target`, `has_validate_sh`). Updated the
  row text to match the impl contract and noted the map is extensible.
  Also updated the default OUT_PATH shape in the description and added
  pr:TBD ref (PR number unknown until open).

- [low] CLI `--help` / unknown-flag paths had no test coverage. Added
  cli-help (exits 0, usage on stderr), cli-unknown-flag (non-zero,
  "unknown argument" message), and branch-base-warn-on-missing-origin-main
  (the medium #2 fix's positive case).

Local verify: 31 tests passed (was 29), lint-scripts OK, validator
emits 31 E-* lines (= CC-228 baseline unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
screenleon added a commit that referenced this pull request May 23, 2026
* backlog: add CC-243 (pm-prep-snapshot) + CC-244 (typed pipeline, someday)

Captures the agent-routing improvements proposed in the CC-060 retro:

- CC-243 (🔵 active, P2): scripts/pm-prep-snapshot.sh writes a typed
  state snapshot to /tmp/pm-snapshot-<ts>.md before any PM agent spawn.
  Solves the two HALT classes hit during CC-242: caller brief carrying
  a stale main commit (#129+#130 had merged after the brief was
  written) and a ticket-ID collision (CC-241 was already consumed).
  Schema-like keys (branch_base/ticket_ids_consumed/project_tooling)
  chosen now to align with the future spike_v1 schema in CC-244.

- CC-244 (🟢 someday): typed spike → brief → handover pipeline. Define
  spike_v1 schema + validator + gen-brief-from-spike helper. Deferred
  because only one spike doc (CC-060) exists today; schema value scales
  with N. Promote when 3+ spikes accumulate or an automation use case
  demands structured spike state.

Validator: BACKLOG.md emits 31 E-* lines (unchanged from pre-existing
CC-228 baseline; the 4 area/epic enum errors I initially hit on
ops/agent-prep + agent-routing were resolved by picking valid tokens
ops/arch + design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cc-243): add pm pre-spawn snapshot capture

* fix(cc-243): three pm-prep-snapshot bugs caught by local test run

Codex's initial commit (d964e56) had three issues that the test suite
caught on first local run:

1. printf '---\n' triggers `printf: --: invalid option` on bash 5.1.16
   builtin printf — leading `---` is parsed as a flag prefix. Switched
   both occurrences to `printf '%s\n' '---'` (lines 246, 271).

2. `${#RECENTLY_MERGED[@]:-0}` is invalid bash syntax — the :- default
   operator cannot combine with the # length operator on an array.
   RECENTLY_MERGED is declared as an empty array at line 160, so
   `${#RECENTLY_MERGED[@]}` returns 0 safely. Dropped the :-0.

3. `collect_backlog_row()` used `awk -F'|'` and ran `gsub` on `$2` to
   trim whitespace for matching. Modifying `$2` causes awk to rebuild
   `$0` with the default OFS (space), which stripped all `|` from the
   emitted row — making the test for `| CC-244 |` literal fail. Fixed
   by copying `$2` into a local `candidate` variable, so `$0` stays
   untouched. The CC-243 test passing earlier was a false-positive:
   the literal `| CC-243 |` happens to appear inside the CC-243 body
   section's example YAML, so it matched there even though the actual
   row was malformed.

Local verify: bash scripts/test-pm-prep-snapshot.sh -> 29 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cc-243): address pr-gate findings (2 medium + 2 low)

Gate findings on PR (CC-243 stacked on cc-060):

- [medium, 3-reviewer overlap] Default snapshot output path used only a
  second-resolution timestamp (`/tmp/pm-snapshot-<ts>.md`); concurrent
  /pm spawns could collide and clobber each other's snapshots, feeding
  the wrong ground truth to PM. Switched the default to include PID +
  $RANDOM suffix (`/tmp/pm-snapshot-<ts>-$$.$RANDOM.md`), which is
  collision-resistant enough for solo + concurrent dispatches without
  introducing a mktemp portability concern.

- [medium] Snapshot script hard-failed when `origin/main` could not be
  resolved (detached HEAD, missing remote, fresh local clone). Added a
  graceful fallback: try `origin/main` -> `main` (local) -> emit
  `branch_base: unresolved` with a `# warn:` line. The same fallback
  also degrades `ahead_by` to 0 + warns instead of hard-failing.

- [low] BACKLOG.md CC-243 description claimed `project_tooling` was
  derived from a shell probe `ls Makefile project/backlog.yml 2>/dev/null`,
  but the script actually emits three fixed boolean keys
  (`makefile`, `backlog_render_target`, `has_validate_sh`). Updated the
  row text to match the impl contract and noted the map is extensible.
  Also updated the default OUT_PATH shape in the description and added
  pr:TBD ref (PR number unknown until open).

- [low] CLI `--help` / unknown-flag paths had no test coverage. Added
  cli-help (exits 0, usage on stderr), cli-unknown-flag (non-zero,
  "unknown argument" message), and branch-base-warn-on-missing-origin-main
  (the medium #2 fix's positive case).

Local verify: 31 tests passed (was 29), lint-scripts OK, validator
emits 31 E-* lines (= CC-228 baseline unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: add test-pm-prep-snapshot.sh to skill-refine shellcheck exclude

Same pattern as the other test-*.sh harness consumers in this exclude
list (test-codex-dispatch.sh, test-dispatch-handover.sh, ...): they
source scripts/lib/test-harness.sh which exposes shared vars like
tmp_root, and that pattern triggers a slew of style-tier shellcheck
warnings (SC1090 non-constant source, SC2154 referenced-but-unassigned,
SC2120 function arg passthrough). Excluding the file matches existing
project policy rather than scattering inline disable directives.

CI test-skill-refine job was red on PR #132 because the new file
wasn't on the list. test-pm-prep-snapshot.sh itself was clean per the
local `bash scripts/lint-scripts.sh` run (which uses a different
shellcheck severity profile).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant