Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
name: lint

on:
push:
branches: [main]
pull_request:

jobs:
lint-agents:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Validate agent frontmatter
run: ./scripts/lint-agents.sh
11 changes: 8 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,8 @@ Idempotent — re-run safely after adding files. Per-file symlinks so other tool
### Agents

**Orchestration**
- **project-pm** — PM across `~/github/` repos. Triages requests, decomposes work, dispatches to `codex-executor`, runs the PR gate, maintains per-project memory at `~/.claude/projects/-home-screenleon-github/memory/project_<repo>.md`.
- **codex-executor** — Thin wrapper subagent. Accepts a complete brief, dispatches to the Codex CLI via `scripts/codex-dispatch.sh`, verifies via `git diff`, reports back.
- **project-pm** — PM across `~/github/` repos. Triages requests, decomposes work, writes briefs (main thread dispatches), synthesizes PR-gate reviews, maintains per-project memory at `~/.claude/projects/-home-screenleon-github/memory/project_<repo>.md`.
- **codex-executor** — Thin wrapper subagent, dispatched by the main thread. Accepts a complete brief, calls Codex via `scripts/codex-dispatch.sh`, verifies via `git diff`, reports back.

**Reviewers (advisors — PM may override with reasoning)**
- **critic** — Adversarial review of plan / diff. Scope creep, incompleteness, convention drift.
Expand All @@ -53,12 +53,17 @@ Idempotent — re-run safely after adding files. Per-file symlinks so other tool

## Design notes

- **Subagents cannot spawn subagents.** Claude Code intentionally restricts nested `Agent` tool calls regardless of frontmatter declaration ([Agent SDK docs](https://code.claude.com/docs/en/agent-sdk/subagents.md)). The **main thread** orchestrates: it spawns subagents (PM, reviewers, codex-executor) and relays outputs between them. PM produces briefs and synthesizes verdicts; it does not dispatch. Reviewers run in parallel from the main thread, not from PM. Never include `Agent` in any subagent's `tools:` frontmatter — `scripts/lint-agents.sh` enforces this.
- **PM thinks, Codex implements.** `project-pm` writes the brief; `codex-executor` is a dispatcher, not a designer. Architecture, scope, and acceptance criteria stay with the PM.
- **Definitions in repo, state on disk.** Agent and command definitions are version-controlled here. Per-project state (memory, traces) lives in `~/.claude/` and stays out of this repo.
- **Decoupled from agent-playbook-template.** The playbook is a methodology framework; this repo is a personal config. They evolve independently.

## Adding new pieces

- New agent: drop a `name.md` (with frontmatter) into `agents/`, re-run `install.sh`.
- New agent: drop a `name.md` (with frontmatter) into `agents/`, re-run `install.sh`. **Don't include `Agent` in `tools:`** — `scripts/lint-agents.sh` will reject the install.
- New command: drop a `name.md` into `commands/`, re-run `install.sh`.
- Settings allowlist additions: edit `~/.claude/settings.json` directly (or use the `update-config` skill); don't try to symlink settings.

## Codex briefs

Schema and reusable self-verify macros: [`docs/codex-brief.md`](docs/codex-brief.md). All briefs dispatched to `codex-executor` must include `working_dir`, `goal`, `files`, and `acceptance`; the executor rejects briefs missing those fields.
7 changes: 6 additions & 1 deletion agents/codex-executor.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,12 @@ Thin dispatcher. You write nothing yourself; you invoke Codex.

# Job

1. Receive brief. If working dir is missing or the change is ambiguous, stop and ask — do not guess.
1. **Validate brief against schema** at `~/github/claude-config/docs/codex-brief.md`. REJECT (stop and ask the caller) if missing any of:
- `working_dir` (absolute path that exists)
- `goal` (one sentence — what changes after this runs)
- `files` (concrete paths or search hint; create-new and edit-existing both enumerated)
- `acceptance` (testable post-conditions Codex can verify before declaring done)
Do not improvise missing fields.
2. Dispatch via `~/github/claude-config/scripts/codex-dispatch.sh`. Never call `codex exec` directly.
3. Verify the result against `git diff` — Codex's self-report may not match reality.
4. Report back in the shape below.
Expand Down
7 changes: 2 additions & 5 deletions agents/project-pm.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,12 +71,9 @@ The main thread runs reviewers in parallel (single message, multiple Agent calls

# Writing a brief for codex-executor

Required: **working dir** (abs path), **goal** (one sentence), **files to touch** (paths or search hint), **constraints** (don't-change, conventions, tests that must still pass), **acceptance criteria** (test, build, concrete check).
The canonical schema lives in `~/github/claude-config/docs/codex-brief.md`. Briefs must declare `working_dir`, `goal`, `files`, and `acceptance`. Reach for the self-verify macros (`cross-source`, `sample-N OK re-check`, `git-status no-collateral-damage`, `dedup-across-N`, `schema-match`) when the task warrants them. `codex-executor` rejects briefs missing the four required fields — write the full set up front rather than getting bounced.

Example:
> In `~/github/foo/`, `src/auth/Login.tsx` drops the redirect param after OAuth callback — `/auth/callback?next=/dashboard` lands on `/`. Fix redirect handling. Existing tests in `src/auth/__tests__/` must still pass; add a test for the redirect case. Sandbox: workspace-write.

Return the brief to the main thread; main thread dispatches via `Agent(subagent_type: "codex-executor", ...)` (or directly via `Bash(codex exec ...)` if `codex-executor` is unavailable). Verify the resulting report against `git diff` before claiming success.
Return the brief to the main thread; main thread dispatches it. Verify the resulting report against `git diff` before claiming success.

# Per-project memory shape

Expand Down
6 changes: 5 additions & 1 deletion commands/pm.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,8 @@ Invoke `project-pm` via Agent. Brief with: request ($ARGUMENTS), current working

Relay the PM's user-facing summary. Do not do the PM's job yourself.

**Note**: Subagents cannot spawn subagents. If the PM's reply is "dispatch this brief to codex-executor" or "spawn reviewers X / Y / Z", the **main thread** must do the dispatching. Treat the PM's brief as the input to your own `Agent(subagent_type: "codex-executor", ...)` call (or `Bash(codex exec ...)` if `codex-executor` itself is unavailable in your environment).
**Note**: Subagents cannot spawn subagents. If the PM's reply is "dispatch this brief to codex-executor", the **main thread** must do the dispatching — treat the PM's brief as input to your own `Agent(subagent_type: "codex-executor", ...)` call (illustrative — emit it as a real Agent tool call). If `codex-executor` is unavailable, fallback to invoking `scripts/codex-dispatch.sh` directly via Bash — the script encodes the canonical sandbox / approval / trace-capture flags so this command stays in sync with the rest of the dispatch path.

Briefs must follow the schema at `docs/codex-brief.md` (working_dir / goal / files / acceptance, plus optional self-verify macros). codex-executor rejects briefs missing the required fields.

For PR-gate flows, use `/pr-gate` instead — that skill handles reviewer orchestration; do not re-implement it inline here.
42 changes: 28 additions & 14 deletions commands/pr-gate.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,34 +3,48 @@ description: Run the pre-PR review pipeline (critic + architecture + security +
argument-hint: [optional context, e.g. "skip qa, already audited"]
---

Run the PR gate. Subagents cannot spawn subagents in Claude Code, so the **main thread** orchestrates reviewers; `project-pm` synthesizes.
Run the PR gate. Subagents cannot spawn subagents in Claude Code, so the **main thread** orchestrates reviewers; `project-pm` is invoked once at the end to synthesize.

## Step 1 — classify the diff
## Step 1 — classify the diff (main thread, no PM hop)

In the main thread, run `git diff main...HEAD --stat` (substitute the actual integration branch). Decide:
Detect the integration branch and check the diff:

- **docs / content-only** (no runtime code change) → spawn `critic` + `architecture-reviewer` only. Security / risk / qa are `pass-not-applicable`.
- **implementation change** (any runtime code diff) → spawn all four advisors + qa-tester.
```bash
BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || echo main)
git diff "$BASE"...HEAD --stat
```

Apply the heuristic — bias toward `implementation` when ambiguous (an unnecessary security/risk spawn returns `pass-not-applicable` cheaply; a missed implementation review can ship a real bug):

```bash
non_docs=$(git diff "$BASE"...HEAD --name-only | grep -vE '\.(md|jsonl|txt)$|^\.gitignore$|^audits/|^docs/|^\.github/' || true)
[ -z "$non_docs" ] && CLASS=docs-only || CLASS=implementation
```

- `docs-only` → spawn `critic` + `architecture-reviewer`. Security / risk / qa are implicitly `pass-not-applicable`; do not spawn.
- `implementation` → spawn all four advisors + `qa-tester`.

If unsure, invoke `project-pm` first with the diff stat to get the classification, then proceed to step 2.
Do not invoke PM at this step. PM's role is synthesis only.

## Step 2 — spawn reviewers in parallel from main thread

In a single message, make multiple Agent calls (this is the parallel pattern):
In a single message, make N parallel Agent tool calls — one per applicable reviewer. Pseudocode (illustrative, not literal call syntax):

```
Agent(subagent_type: "critic", prompt: <branch + diff context + $ARGUMENTS>)
Agent(subagent_type: "architecture-reviewer", prompt: <same>)
Agent(subagent_type: "security-reviewer", prompt: <same>) # implementation only
Agent(subagent_type: "risk-reviewer", prompt: <same>) # implementation only
Agent(subagent_type: "qa-tester", prompt: <same>) # implementation only
# pseudocode — emit each as a real Agent tool call in one message
Agent(subagent_type: "critic", ...)
Agent(subagent_type: "architecture-reviewer", ...)
Agent(subagent_type: "security-reviewer", ...) # implementation only
Agent(subagent_type: "risk-reviewer", ...) # implementation only
Agent(subagent_type: "qa-tester", ...) # implementation only
```

Each reviewer brief should include: working dir, branch name vs integration branch, diff summary, scope hints from $ARGUMENTS.

## Step 3 — synthesize via project-pm
## Step 3 — synthesize via project-pm (single hop)

After all reviewers return, invoke `project-pm` once with the classification + their verbatim outputs and ask it to:

After all reviewers return, invoke `project-pm` with their verbatim outputs and ask it to:
- Compose the final gate summary (each reviewer's verdict, any blocks with override paths, final go/no-go).
- Record `block-soft` overrides or trade-off advisories into project memory.

Expand Down
97 changes: 97 additions & 0 deletions docs/codex-brief.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# Codex brief schema

The canonical structure for any brief dispatched to `codex-executor` (directly via Agent, or indirectly via `scripts/codex-dispatch.sh`).

`codex-executor` rejects briefs missing the four required fields. PMs and main-thread dispatchers should always write briefs against this schema; reach for the optional macros below when the task warrants them.

## Required fields

| Field | What | Example |
|---|---|---|
| `working_dir` | Absolute path. Must exist. | `/home/screenleon/github/japanese-site/` |
| `goal` | One sentence. What changes after this runs. | "Backfill 40 N4 / 40 N3 / 40 N2 kanji entries to fill the empty middle-tier overlay." |
| `files` | Concrete paths or a search hint. Both create-new and edit-existing must be enumerated. | `server/data/corpus/kanji/{N4,N3,N2}.jsonl` (new); read `N1.jsonl` and `N5.jsonl` for schema |
| `acceptance` | Testable post-conditions Codex itself can verify before declaring done. | `wc -l shows 40/40/40`; every line parses as JSON; no character collisions across N1-N5 |

A brief missing any of these is a request for guesswork. Reject and ask the caller.

## Optional sections

Use as needed; not all briefs require all of them.

- **`constraints`** — what NOT to do. File paths off-limits, conventions to preserve, tests that must still pass after the change.
- **`self_verify`** — see macros below. Use whenever the work has external authority (sources, schema, level tag) Codex must not invent.
- **`output_format`** — when the deliverable is a report (audit, plan), specify the file path and required sections.
- **`sandbox`** / **`approval`** — only set when overriding the defaults (`workspace-write` / `never`). Caller must authorize.

## Self-verify macros

Reusable phrases. Drop into `self_verify` block of any brief.

### `cross-source` — N independent sources per item

> For each `<item type>`, cross-check against ≥`<N>` independent authoritative sources from `<allowed source list>`. Cite the source in the report column. Do not rely on internal knowledge alone.

Used for: JLPT level audits, fact-checked content batches.

### `sample-N OK re-check` — false-positive guard

> After producing the report, sample `<N>` random items you marked `OK` and re-verify them against the same source list. Note the re-check at the report footer (which items, sources re-checked, conclusion held).

Used when most items will be `OK`; protects against rubber-stamping.

### `git-status no-collateral-damage` — scope discipline

> When done, run `git status --short` and confirm only the brief's allowlisted files changed. If anything else is modified or created, flag it in the report — do not claim success.

Used for any brief with a strict allowlist (audit reports that must NOT touch source data, etc.).

### `dedup-across-N` — collision check across multiple files

> After writing, concatenate `<files>` and confirm <key> is unique across all of them (`sort -u | wc -l` matches input line count).

Used when the same key (e.g. kanji character) must not appear in multiple level files.

### `schema-match` — preserve existing JSONL/JSON shape

> Read `<reference file>` first to learn the schema. Every new entry must include exactly the same keys, same types, same canonical values for non-content fields (e.g. `source`, `license`).

Used when extending an existing data file family.

## Example brief (good)

```
working_dir: /home/screenleon/github/japanese-site/
goal: Audit PR #8 N1 corpus additions for JLPT level appropriateness — flag mis-classification.
files:
- read: server/data/corpus/grammar/N1/{ga-hayai-ka,...}.{json,examples.jsonl}
- read: server/data/corpus/vocab/N1.jsonl (60 net-new rows)
- read: server/data/corpus/kanji/N1.jsonl (40 net-new rows)
- write: audits/pr-8-jlpt-level-audit-2026-05-02.md
constraints:
- READ-ONLY on all files under server/data/corpus/
- Only write the audit report
self_verify:
- cross-source: vocab+kanji ≥1 source from Jisho/Tanos; grammar ≥2 sources from Bunpro/JLPT-Sensei/Maggie/Tanos
- sample-N OK re-check: 3 random OK items at the footer
- git-status no-collateral-damage: only the audit file as new
acceptance:
- report file exists at the specified path
- every flagged entry has citations
- footer contains the 3-item OK re-check
output_format: markdown with three grouped sections (## OK / ## needs review / ## re-classify); table columns per row
```

## Example brief (bad — would be rejected)

```
"Audit the N1 stuff and flag anything wrong."
```

No working_dir, no files, no acceptance criteria. Codex would have to guess what corpus, what sources, how to verify, where to write the report. Reject and ask.

## Style notes

- Prose is fine; YAML-like keys above are conventions, not strict syntax. Use a heredoc when piping the brief on stdin.
- Keep briefs in the active voice ("Audit X", not "X should be audited"). Codex parses imperatives more reliably than declaratives.
- Don't write implementation steps. The brief tells Codex *what* and *what counts as done*; *how* is Codex's job.
7 changes: 7 additions & 0 deletions install.sh
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,13 @@ echo " claude home: $CLAUDE_HOME"
if [[ "$DRY_RUN" -eq 1 ]]; then echo " mode: DRY RUN"; fi
echo

# Pre-flight: agent frontmatter must not declare Agent (subagents can't spawn subagents)
if [[ -x "$REPO_ROOT/scripts/lint-agents.sh" ]]; then
echo "==> lint agents"
"$REPO_ROOT/scripts/lint-agents.sh"
echo
fi

install_dir agents
install_dir skills
install_dir commands
Expand Down
59 changes: 59 additions & 0 deletions scripts/lint-agents.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
#!/usr/bin/env bash
# Validate that no subagent declares the `Agent` tool in its frontmatter.
# Claude Code strips Agent from subagent runtime schemas regardless of
# declaration, so leaving it in frontmatter is misleading drift.
# See README.md "Design notes" for the rule.

set -euo pipefail

repo_root="$(cd "$(dirname "$0")/.." && pwd)"
agents_dir="$repo_root/agents"

if [ ! -d "$agents_dir" ]; then
echo "lint-agents: $agents_dir not found" >&2
exit 2
fi

violations=0
for f in "$agents_dir"/*.md; do
[ -e "$f" ] || continue

# Require well-formed frontmatter (>=2 `---` markers)
fence_count=$(grep -c '^---$' "$f" || true)
if [ "$fence_count" -lt 2 ]; then
echo "WARN: $(basename "$f") has no YAML frontmatter; skipping" >&2
continue
fi

# Extract content between the first two `---` lines
fm=$(awk '/^---$/{c++; next} c==1' "$f")

# Capture the tools: block — the inline scalar form OR a block-list
# spanning indented `- ` lines until the next non-indented key.
tools_block=$(printf '%s\n' "$fm" | awk '
/^tools:/ { print; in_block=1; next }
in_block {
if ($0 ~ /^[[:space:]]/) { print; next }
else { in_block=0 }
}
')

if [ -z "$tools_block" ]; then
continue
fi

# Match Agent as a whole token. Separators in YAML scalar/flow:
# `,` ` ` `[` `]` `:` and start/end of line. In block-list, `-` precedes.
if printf '%s' "$tools_block" | grep -qE '(^|[][, :-])Agent([][, :]|$)'; then
echo "FAIL: $(basename "$f") declares Agent in tools" >&2
printf '%s\n' "$tools_block" | sed 's/^/ /' >&2
violations=$((violations + 1))
fi
done

if [ "$violations" -gt 0 ]; then
echo "lint-agents: $violations violation(s). Subagents cannot spawn subagents — remove Agent from tools." >&2
exit 1
fi

echo "lint-agents: OK ($(ls "$agents_dir"/*.md 2>/dev/null | wc -l) agent files checked)"
Loading