Merged
23 changes: 23 additions & 0 deletions BACKLOG.md
@@ -153,6 +153,7 @@ CC-001/CC-002 were consumed by PR #24 fix bundle inline, with no standalone entr
| CC-247 | 🔵 active | **[Reuse debt]** `th_init --format=<preset>` — extract the 6 surviving per-file `pass`/`fail` print-format overrides into 6 named harness presets (CC-203 GROUP-B residue). Mechanical; no behavior change. | ops/test | 2026-05-23 | pr:#142 | P2 | reuse-debt |
| CC-248 | 🔵 active | **[Reuse debt]** `th_init --fail-fast` — promote the 3 fail-fast test scripts (test-usage-weekly, test-usage-tracker, test-skill-refine) from per-script `exit 1` overrides to a first-class harness option. | ops/test | 2026-05-23 | pr:#142 | P3 | reuse-debt |
| CC-249 | ⏸ deferred | **[Reuse debt]** Consolidate divergent `assert_*` helpers in `scripts/lib/test-harness.sh` (3-way `assert_contains` divergence + `assert_exit` arg-order conflict). Gated by a `/pre-impl` spike — do NOT start before the spike resolves the unified API shape. | ops/test | 2026-05-23 | pr:TBD | P3 | reuse-debt |
| CC-250 | 🔵 active | **[/pr-gate v2: machine-readable result + escalation]** Bundle: (A) YAML frontmatter on every gate result file (`gate_result_version: pr_gate_result_v1` + final/tier/mode/most_severe/reviewers/escalation), (B) `## Escalation` body section emitted by both sequential + parallel synthesis briefs (recommended=true requires sensitive-path AND non-fatal-uncertain reviewer verdict), (C) `--base` fallback prepends `gh pr view --json baseRefName` when available, (D) `## Override policy` section in each of the 5 reviewer agent .md files consolidating override discipline already prose-scattered. Preserves `^Final: GO\|NO-GO$` line for validate.sh + downstream parser back-compat. Out-of-scope: structured `verdict:` enum (CC-231), backlog_candidates output (CC-215), auto-escalation execution. | gate/ops | 2026-05-23 | pr:TBD | P2 | oss |

---

@@ -1566,3 +1567,25 @@ Add `scripts/spike-validate.sh` (mirror `handover-validate.sh`) + `scripts/gen-b
**Priority**: P3 — reuse-debt; correctness ceiling, not a daily friction. Spike first, implement second.

**Cross-link**: CC-203 (origin epic), CC-247 / CC-248 (sibling harness work — separate PR A), `commands/pre-impl.md` (spike gate).

## CC-250 — `/pr-gate v2`: machine-readable result + escalation hint (active)

**Problem**: `/pr-gate` result files today are prose-only Markdown with a `Final: GO|NO-GO` grep target. Consumers (validate.sh, downstream automation) can read the binary verdict, but without parsing the prose they cannot see per-reviewer verdicts, mode, tier, or whether the gate recommends a follow-up targeted re-gate. Reviewer override discipline is scattered across `agents/project-pm.md` + each reviewer's verdict scale, so a person reading one reviewer agent cannot find its override policy without cross-referencing other files.

**Why**: As `/pr-gate` matures and feeds into the v0.3.0 M1 runtime layer (CC-231 reviewer-policy extraction, CC-215 pmctl), the result file becomes the contract between the gate and downstream tooling. A typed frontmatter + an explicit escalation hint section lets future consumers act on the gate output without re-parsing prose. Override-policy consolidation eliminates the per-reviewer documentation gap noticed during recurring gate cycles.

**Requirement**:
- **A**. Every gate result file (sequential + parallel) starts with a YAML frontmatter block (`gate_result_version: pr_gate_result_v1`, `final`, `tier`, `mode`, `most_severe`, `reviewers:` map with every reviewer in `$ALL_REVIEWERS` keyed to verdict-or-`skipped`, `escalation:` block). Existing `Final: GO|NO-GO` line in `## Gate Conclusion` is preserved verbatim.
- **B**. New `## Escalation` body section mirrors the frontmatter `escalation:` block. Both empty-list (recommended=false) and populated cases are valid emissions. Trigger: sensitive-path keyword in diff AND at least one reviewer returned advise|block-soft.
- **C**. `--base` detection prepends `gh pr view --json baseRefName` when no `--base` flag is given and `gh` is on PATH; gracefully degrades to current `origin/HEAD → main` chain.
- **D**. Each of the 5 reviewer agent .md files gains a `## Override policy` section consolidating discipline already documented in `agents/project-pm.md` §"User override discipline".
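
The trigger in requirement B can be sketched in shell. This is a minimal sketch, not the gate's implementation: in the PR the decision is delegated to the synthesis brief, and `escalation_recommended` is a hypothetical helper name. The pattern is copied from the brief text added to `scripts/pr-gate.sh`; the argument shapes (newline-separated paths, space-separated verdicts) are assumptions made for illustration.

```bash
# Sensitive-path pattern, copied from the synthesis brief in scripts/pr-gate.sh.
SENSITIVE_RE='(^|[/_.-])(auth|oauth|jwt|session|secret|password|token|credential|cors|csrf|webhook|sudo|ssh|payment|billing)([/_.-]|$)|(^|/)migrations?/|^\.github/'

# escalation_recommended (hypothetical helper, not part of pr-gate.sh)
#   $1: newline-separated list of changed file paths
#   $2: space-separated list of reviewer verdicts
# Prints "true" only when BOTH trigger conditions hold, otherwise "false".
escalation_recommended() {
  local paths=$1 verdicts=$2 v
  # Condition (a): at least one changed path matches the sensitive pattern.
  if ! printf '%s\n' "$paths" | grep -Eq "$SENSITIVE_RE"; then
    echo false
    return 0
  fi
  # Condition (b): at least one reviewer returned advise or block-soft.
  for v in $verdicts; do
    case $v in
      advise|block-soft) echo true; return 0 ;;
    esac
  done
  echo false
}

# Sensitive path plus an advise verdict: both conditions hold.
escalation_recommended 'src/auth/login.ts' 'approve advise pass'
# Sensitive path but no advise/block-soft verdict: condition (b) fails.
escalation_recommended 'db/migrations/0042.sql' 'approve pass'
```

Requiring both conditions keeps the hint quiet on clean passes over sensitive paths and on `advise` findings in unrelated files.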

**Acceptance**:
- `test-pr-gate.sh` adds at least 3 new cases: (a) sequential result file starts with valid YAML frontmatter containing `gate_result_version: pr_gate_result_v1` + `final:` + `reviewers:` map; (b) `## Escalation` section is present with `**Recommended**:` line; (c) `^Final: (GO|NO-GO)$` line still present and unique (back-compat).
- `bash pm/scripts/validate.sh BACKLOG.md DECISIONS.md CHANGELOG.md 2>&1 | grep -c '^E-'` ≤ 30 (baseline).
- `bash scripts/run-tests.sh` exit 0.
- `shellcheck --severity=style scripts/pr-gate.sh` exits 0.
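
Acceptance cases (a) to (c) could be checked roughly as follows. This is a sketch under stated assumptions, not the actual contents of `test-pr-gate.sh`: `check_gate_result` and `frontmatter_final` are hypothetical helper names, the sample file is illustrative rather than real gate output, and the key checks grep the whole file instead of only the frontmatter block. The `awk` program is the one this PR adds to `scripts/pr-gate.sh` for the parity check.

```bash
# frontmatter_final (hypothetical helper): print the `final:` value found
# inside the leading YAML frontmatter block of a gate result file.
frontmatter_final() {
  awk 'BEGIN{s=0} /^---$/ { if (s == 0) { s=1; next } else if (s == 1) { exit } } s && $1 == "final:" { print $2; exit }' "$1"
}

# check_gate_result (hypothetical helper): return 0 when the file satisfies
# acceptance cases (a), (b) and (c); return 1 on the first failed check.
check_gate_result() {
  local file=$1
  # (a) file opens with frontmatter and carries the expected keys.
  [ "$(head -n 1 "$file")" = "---" ] || return 1
  grep -q '^gate_result_version: pr_gate_result_v1$' "$file" || return 1
  grep -q '^final:' "$file" || return 1
  grep -q '^reviewers:' "$file" || return 1
  # (b) Escalation section exists and has a Recommended line.
  grep -q '^## Escalation$' "$file" || return 1
  grep -q '^\*\*Recommended\*\*:' "$file" || return 1
  # (c) back-compat Final line is present exactly once.
  [ "$(grep -Ec '^Final: (GO|NO-GO)$' "$file")" -eq 1 ] || return 1
}

# Illustrative sample, not real gate output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
gate_result_version: pr_gate_result_v1
final: GO
reviewers:
  critic: approve
---

## Gate Conclusion
Final: GO

## Escalation
**Recommended**: false
EOF

if check_gate_result "$sample"; then
  echo "result file ok, frontmatter final: $(frontmatter_final "$sample")"
fi
rm -f "$sample"
```

The lowercase `final:` key in the frontmatter does not collide with the case-sensitive `^Final:` pattern, which is why the uniqueness check in (c) still holds after the frontmatter is added.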

**Priority**: P2 — gate-infra prerequisite for v0.3.0 M1 (CC-231 reviewer-policy extraction depends on a typed gate result surface to extract policy from).

**Cross-link**: CC-231 (M1 reviewer-policy extraction consumer of this typed surface), CC-215 (pmctl downstream consumer), CC-208 (gate reviewer hallucination — related gate hardening), `MILESTONES.md` §v0.3.0 M1 prerequisite sub-table.
6 changes: 6 additions & 0 deletions MILESTONES.md
@@ -27,6 +27,12 @@
| CC-217 | claude-executor background dispatch | ✅ (#124) |
| CC-060 | Codex model/config externalization | ✅ (#131) |

### M1 prerequisite — gate-infra typed surface

| Ticket | Description | Status |
|---|---|---|
| CC-250 | `/pr-gate v2` typed result + escalation hint (provides a typed gate output surface for CC-231 reviewer-policy extraction) | ⏳ |

### M1 — state / schema substrate (core deliverable)

| Ticket | Description | Status |
4 changes: 4 additions & 0 deletions agents/architecture-reviewer.md
@@ -59,3 +59,7 @@ verdict: <2-3 sentences>
- Be specific: "Module A imports three internals of module B (lines X/Y/Z); should go through B's public interface" — not "coupling too high".
- If the diff is too small for architecture review, say so and pass.
- **Scope rule**: Only block on structural issues *introduced or worsened by this PR's diff*. Pre-existing design gaps the diff does not touch must be `advise` at most. If unsure whether an issue pre-existed, check `git log`/`git blame` before blocking.

## Override policy

`block-soft` is overridable. PM may override with written reasoning recorded in project memory (`Decisions / constraints` section) and shown in the gate summary. Architecture verdicts are advisory; structural concerns surfaced as `advise` do not block. See `agents/project-pm.md` §"User override discipline".
4 changes: 4 additions & 0 deletions agents/critic.md
@@ -57,3 +57,7 @@ verdict: <2-3 sentences>
- Never use `block-soft` for taste-level disagreements — those go in `advise`.
- Be specific. "Line 42 swallows the DB error and returns 200" — not "error handling could be better".
- **Scope rule**: Only block on issues *introduced or worsened by this PR's diff*. Pre-existing issues the diff does not touch must not be blocking verdicts — list them as `advise` at most. If unsure whether an issue pre-existed, use `git log`/`git blame` to verify before blocking.

## Override policy

`block-soft` is overridable. PM may override with written reasoning recorded in project memory (`Decisions / constraints` section) and shown in the gate summary. `advise` is non-blocking by definition. See `agents/project-pm.md` §"User override discipline" for the override protocol.
4 changes: 4 additions & 0 deletions agents/qa-tester.md
@@ -96,3 +96,7 @@ verdict: <2-3 sentences — does the test phase clear the gate?>
- Never silently skip a category. `N/A` requires a reason.
- The three red lines are absolute. No `sleep(N)` async waits; no mocking SUT's own logic; no test you don't expect to fail when the impl breaks.
- **Scope rule**: Only block on test coverage gaps for behavior *introduced or changed by this PR's diff*. Missing coverage for pre-existing untested code is a non-blocking audit finding at most — file it as a separate issue, not a blocker on this PR.

## Override policy

`block` from a red-line violation (test wouldn't fail under plausible mutation; mocked SUT logic; `sleep`-based async sync; missing coverage for new behavioral unit) is **not PM-overridable**. The user must accept the reviewer's stated `override_path` verbatim; PM cannot self-override. `needs-tests` is the normal pre-write state, not a block. See `agents/project-pm.md` §"User override discipline".
5 changes: 5 additions & 0 deletions agents/risk-reviewer.md
@@ -71,3 +71,8 @@ override_path: <exact statement user must make, or "none — must be fixed">
- Risk ≠ security: "what breaks if wrong" not "what an attacker does" (that's security-reviewer).
- Be specific: "Adding NOT NULL to `users.role` (migration 0042 line 12) on a 50M-row table without backfill locks writes during ALTER and fails under concurrent INSERTs" — not "migration is risky".
- **Scope rule**: Only block on risks *introduced or worsened by this PR's diff*. Pre-existing risks the diff does not touch must be `advise` at most. If unsure whether a risk pre-existed, verify with `git log`/`git blame` before issuing a block.

## Override policy

`block` is **not PM-overridable** — same hard-gate discipline as security-reviewer. Only the user overrides, by quoting the reviewer's `override_path:` verbatim (or by stating the same scope in their own words). PM may not paraphrase or imply consent.
See `agents/project-pm.md` §"User override discipline".
4 changes: 4 additions & 0 deletions agents/security-reviewer.md
@@ -69,3 +69,7 @@ override_path: <exact statement user must make to override, or "none — must be
- Be reproducible: cite file:line and the exact problematic construct.
- If the diff is too large to review thoroughly, decline — do not rubber-stamp.
- **Scope rule**: Only block on vulnerabilities *introduced or worsened by this PR's diff*. Pre-existing security gaps the diff does not touch must be `advise` at most — they warrant a separate issue, not a PR block. Verify pre-existence with `git log`/`git blame` before blocking.

## Override policy

`block` is **not PM-overridable**. Only the user overrides, and only by quoting the reviewer's `override_path:` verbatim (or by stating the same scope in their own words). PM may not paraphrase, summarise, or imply consent on the user's behalf. See `agents/project-pm.md` §"User override discipline".
90 changes: 85 additions & 5 deletions scripts/pr-gate.sh
@@ -84,8 +84,23 @@ cd "$WORK_DIR"
 if [[ -n "$BASE_OVERRIDE" ]]; then
   BASE="$BASE_OVERRIDE"
 else
-  BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || true)
-  : "${BASE:=main}"
+  if command -v gh >/dev/null 2>&1; then
+    if GH_BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null); then
+      if [[ -n "$GH_BASE" ]]; then
+        BASE="$GH_BASE"
+        printf 'pr-gate: base detected from gh pr view: %s\n' "$BASE"
+      else
+        BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || true)
+        : "${BASE:=main}"
+      fi
+    else
+      BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || true)
+      : "${BASE:=main}"
+    fi
+  else
+    BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || true)
+    : "${BASE:=main}"
+  fi
 fi
 if ! git rev-parse --verify "$BASE" > /dev/null 2>&1; then
   printf 'Error: base ref not found: %s\n' "$BASE" >&2
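
The hunk above repeats the same `git symbolic-ref` fallback in three branches. One way to express the same chain once is sketched below. It is a possible refactor, not what the PR merged: `detect_base` is a hypothetical function name, and the sketch drops the informational `pr-gate: base detected from gh pr view` message for brevity.

```bash
# detect_base (hypothetical helper): print the base branch name using the
# chain gh pr view -> origin/HEAD -> main.
detect_base() {
  local base="" gh_base=""
  # 1. Prefer the PR's base branch when gh is on PATH and the call succeeds.
  if command -v gh >/dev/null 2>&1; then
    if gh_base=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null); then
      base=$gh_base
    fi
  fi
  # 2. Fall back to the branch that origin/HEAD points at.
  if [ -z "$base" ]; then
    base=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@' || true)
  fi
  # 3. Last resort: assume main.
  printf '%s\n' "${base:-main}"
}

BASE=$(detect_base)
printf 'pr-gate: base resolved to %s\n' "$BASE"
```

Each step only runs when the previous one produced nothing, so a missing `gh`, a failed `gh pr view`, and an empty `baseRefName` all fall through to the same fallback, as they do in the merged code.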
@@ -388,6 +403,24 @@ task:
Write the complete result to ${OUTPUT_FILE}.

output_format: |
---
gate_result_version: pr_gate_result_v1
final: GO|NO-GO
tier: express|standard|full|targeted
mode: sequential
most_severe: approve|advise|block-soft|block
reviewers:
  critic: approve|advise|block-soft|skipped
  qa-tester: pass|needs-tests|block|skipped
  architecture-reviewer: approve|advise|block-soft|skipped
  security-reviewer: pass|block|pass-not-applicable|skipped
  risk-reviewer: pass|block|pass-not-applicable|skipped
escalation:
  recommended: true|false
  reviewers: []
  reason: []
---

# PR-Gate Result — ${TIER} tier (codex mode)
**Date**: $(date '+%Y-%m-%d')
**Reviewers**: ${REVIEWER_DISPLAY}
@@ -407,9 +440,19 @@ output_format: |
 ## Gate Conclusion
 **Overall verdict**: {most severe}
 **Most severe individual verdict**: {most severe}
-Final: GO | NO-GO
+Final: GO|NO-GO
 {required fixes if NO-GO; override path if any block-soft}

+## Escalation
+**Recommended**: true|false
+**Reviewers**: <comma-list or "none">
+**Reason**:
+- <bullet> (or "none" when recommended=false)
+
+Escalation is recommended when:
+(a) any diff file matches (^|[/_.-])(auth|oauth|jwt|session|secret|password|token|credential|cors|csrf|webhook|sudo|ssh|payment|billing)([/_.-]|$)|(^|/)migrations?/|^\.github/
+(b) at least one reviewer returned advise|block-soft.

 self_verify:
 - file-exists: ${OUTPUT_FILE}
 - has-conclusion: grep -c 'Final' ${OUTPUT_FILE} should be >= 1
@@ -737,6 +780,24 @@ task:
5. Write the complete consolidated result to ${OUTPUT_FILE}.

output_format: |
---
gate_result_version: pr_gate_result_v1
final: GO|NO-GO
tier: ${TIER}
mode: parallel
most_severe: approve|advise|block-soft|block
reviewers:
  critic: approve|advise|block-soft|skipped
  qa-tester: pass|needs-tests|block|skipped
  architecture-reviewer: approve|advise|block-soft|skipped
  security-reviewer: pass|block|pass-not-applicable|skipped
  risk-reviewer: pass|block|pass-not-applicable|skipped
escalation:
  recommended: true|false
  reviewers: []
  reason: []
---

# PR-Gate Result — ${TIER} tier (parallel codex mode)
**Date**: $(date '+%Y-%m-%d')
**Reviewers**: ${REVIEWER_DISPLAY}
@@ -758,9 +819,18 @@ output_format: |
 ## Gate Conclusion
 **Overall verdict**: {most severe across all reviewers}
 **Most severe individual verdict**: {most severe}
-Final: GO | NO-GO
+Final: GO|NO-GO
 {required fixes if NO-GO; override path if any block or block-soft}

-Required fixes before GO: {bulleted list if NO-GO; "none" if GO}
+## Escalation
+**Recommended**: true|false
+**Reviewers**: <comma-list or "none">
+**Reason**:
+- <bullet> (or "none" when recommended=false)
+
+Escalation is recommended when:
+(a) any diff file matches (^|[/_.-])(auth|oauth|jwt|session|secret|password|token|credential|cors|csrf|webhook|sudo|ssh|payment|billing)([/_.-]|$)|(^|/)migrations?/|^\.github/
+(b) at least one reviewer returned advise|block-soft.
+
 Recommended follow-ups:
 {non-blocking improvements from advise-level findings, if any}
@@ -803,6 +873,16 @@ SBRIEF_P2
"$SYNTHESIS_FINAL" "$SHELL_FINAL" >&2
exit 1
fi
FRONTMATTER_FINAL=$(awk 'BEGIN{s=0} /^---$/ { if (s == 0) { s=1; next } else if (s == 1) { exit } } s && $1 == "final:" { print $2; exit }' "$OUTPUT_FILE")
if [[ -z "$FRONTMATTER_FINAL" ]]; then
  printf 'Error: gate result frontmatter final missing: cannot verify shell/Synthesis parity.\n' >&2
  exit 1
fi
if [[ "$FRONTMATTER_FINAL" != "$SHELL_FINAL" ]]; then
  printf 'Error: frontmatter final (%s) does not match shell-computed verdict (%s).\n' \
    "$FRONTMATTER_FINAL" "$SHELL_FINAL" >&2
  exit 1
fi

# Verify reviewer artifact files were not modified by synthesis.
# These are gitignored and not covered by the tracked-file hash above.