Skip to content

plan: Phase 0 design-lock for quality-gate structured-marker contract#226

Merged
cbeaulieu-gt merged 1 commit intomainfrom
issue-185-quality-gate-structured-marker
May 7, 2026
Merged

plan: Phase 0 design-lock for quality-gate structured-marker contract#226
cbeaulieu-gt merged 1 commit intomainfrom
issue-185-quality-gate-structured-marker

Conversation

@cbeaulieu-gt
Copy link
Copy Markdown
Member

Summary

Phase 0 design-lock for issue #185 (Adopt structured marker contract for quality-gate). Ships the implementation plan only; no code changes. Phases 1-4 are separate PRs.

Refs #185, #183, #223. Closures land in Phase 4 (#185, #183, #223).

Background

Issue #223 is the concrete reproducer: PR #222's review used ### Critical Issues Found 🚨 heading + **two critical inconsistencies** lowercase phrasing; the gate's prose-regex didn't match either pattern; gate posted success despite blocker findings. The strategic fix has been long-tracked as #185 (Option A: HTML-comment JSON), but no schema or rollout had been locked. This plan locks 8 design decisions and phases the implementation.

Locked design decisions (full table in plan file)

# Decision Lock
1 Schema JSON block, sentinel claude-pr-review-summary-v1, required schemaVersion + findings.{critical,high,medium,low}
2 Marker placement Last in review body; persona reserves ~2 final turns (honor system)
3 Verdict source Derived by gate (blocking if critical>0 OR high>0)
4 Shadow mechanism Second commit status claude-pr-review/quality-gate-shadow, advisory
5 Cutover ≥20 reviews / ≥5 PRs / zero disagreement / zero marker-missing
6 Phase 4 marker-missing Fail-closed; recovery via gh run rerun
7 #183 disposition Open through Phase 3, close at Phase 4
8 Plan location docs/superpowers/plans/2026-05-06-quality-gate-structured-marker.md

Phasing

Phase PR shape Closes
0 (this PR) Plan file only
1 Persona emits marker; rebuild review overlay; repoint digest pin
2 Gate runs both regex (auth) + marker (advisory shadow status); fixture corpus seeded with PR #222 review body
3 Cutover — marker authoritative; regex retained as fallback for missing markers
4 Cleanup — remove regex; marker-missing fails closed #185, #183, #223

Why Option A (HTML-comment JSON), not B or C

#185 originally proposed three options:

  • A (HTML-comment JSON) — the locked choice. Fully implementable in this repo with no upstream cooperation; jq-parseable; survives Markdown rendering invisibly.
  • B (action step outputs) — would require upstream anthropics/claude-code-action cooperation; out of our control.
  • C (separate artifact) — decouples comment from signal; overkill for current needs.

Files

docs/superpowers/plans/2026-05-06-quality-gate-structured-marker.md (+406 lines, new file)

Test plan

  • Plan file committed to issue-185-quality-gate-structured-marker (not main)
  • All 8 locked decisions preserved verbatim in the plan
  • Phase 4 acceptance criteria includes closes #185, closes #183, closes #223
  • Out-of-scope section explicitly names Quality-gate fails open on transient gh api failure #182 and multi-bot collision
  • Reviewer eye on whether the JSON schema matches what the persona will reasonably emit (no surprise required-fields gotchas)
  • Reviewer eye on whether the cutover criterion (≥20/≥5/zero) is achievable given current PR cadence
  • Reviewer eye on Phase 1's acceptance — manual test PR is the validation, not CI

Out of scope (carried forward in plan)

🤖 Generated by Claude Code on behalf of @cbeaulieu-gt

Phase 0 design-lock for issue #185 (Adopt structured marker contract for
quality-gate). Implements Option A (HTML-comment JSON block). Locks 8
design decisions made during planning conversation: schema format, marker
placement, verdict source, shadow-mode mechanism, cutover criterion,
Phase 4 marker-missing handling, #183 disposition, plan-file location.

Phases 1-4 are separate PRs; this one ships the plan only. References
#222/#223 reproducer; PR #222's review body becomes Phase 2's regression
fixture.

Refs #185, #183, #223.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Claude finished @cbeaulieu-gt's task in 2m 44s —— View job


PR Review Complete

I've reviewed the Phase 0 design-lock plan for the quality-gate structured-marker contract. This is a well-structured design document with clear decisions and phasing. Below are my findings organized by severity.


✅ Strengths

  1. Excellent phasing strategy: Shadow-mode (Phase 2) before cutover (Phase 3) is the right approach for de-risking the transition
  2. Clear locked decisions: All 8 decisions are explicitly documented and authoritative
  3. Strong rationale for Option A: HTML-comment JSON is the right choice given the constraints
  4. Fail-closed posture: Phase 4's strict marker-missing handling is appropriate after the shadow period
  5. Test fixtures: Using PR docs: refresh consumer-facing README for Phase 5+ container-pinned workflows #222 as the regression case directly addresses issue quality-gate reports success on review with explicit "two critical" language and "### Critical Issues Found" heading #223
  6. Explicit scope boundaries: "Out of scope" section prevents feature creep

🟡 Medium-Priority Concerns

1. Schema validation specification missing (Phase 2)

Location: Lines 226-234 (Phase 2 deliverable)

The plan specifies what the marker parser should do but not how it validates the schema. Should explicitly specify:

  • Field presence validation (all 4 required fields present?)
  • Type validation (all values are integers?)
  • Range validation (reject negative numbers? maximum value?)
  • Malformed JSON handling (syntax errors vs missing fields)

Recommendation: Add a "Schema validation rules" subsection under "Marker schema (v1)" that specifies:

- All fields (critical, high, medium, low) MUST be present
- All values MUST be non-negative integers
- Values outside [0, 2^31-1] are rejected as invalid
- Missing fields → marker_invalid
- Syntax errors → marker_invalid

2. Cutover criterion tracking mechanism unclear

Location: Line 291-295 (Phase 3 cutover criterion)

Phase 3 requires "≥20 reviews across ≥5 PRs" to be documented, but there's no aggregator script mentioned (unlike Phase 2-3 which has marker-emission-aggregate.yml). The PR body will need to link to workflow runs that prove this criterion is met.

Recommendation: Add to Phase 2's deliverables: extend marker-emission-aggregate.yml to also track successful reviews (not just marker_missing events) so the Phase 3 cutover data is automatically available.

3. Phase 1 manual validation is fragile

Location: Lines 208-209 (Phase 1 acceptance)

"After merge, the next 5 PR reviews on this repo each contain the marker block (manual visual check)" assumes the persona will work correctly. If there's a turn-budget regression, reviews might silently omit the marker.

Recommendation: Consider adding a Phase 1.5 validation script that greps recent bot comments for the marker sentinel and alerts if missing. This could run weekly and open an issue, similar to overlay-smoke.yml.

4. Backward compatibility for future schema versions unspecified

Location: Lines 136-138 (Backward compatibility)

The plan says "The gate MUST accept schemaVersion: 1 indefinitely" but doesn't specify what happens when the persona emits schemaVersion: 2. Will the gate:

  • Reject it as invalid?
  • Fall back to prose-regex?
  • Attempt to parse known fields and ignore unknown ones?

Recommendation: Add a forward-compatibility rule to the schema section:

The gate MUST reject unknown schemaVersion values (>1) as marker_invalid until
the gate is updated to handle that version. This prevents silent misparsing.

5. Phase 4 K=10 tracking has no automation

Location: Line 353 (Phase 4 acceptance)

"K=10 consecutive marker-authoritative runs across ≥3 distinct PRs since Phase 3 cutover documented in PR body" has no tracking script mentioned.

Recommendation: Reuse the marker-emission-aggregate.yml from Phase 2 or add a step to count successful marker-derived verdicts. Otherwise this is purely manual and error-prone.


🔵 Low-Priority / Clarifications

6. Turn reservation discipline is vague

Location: Line 58 (locked decision #2)

"Reserve final 2 turns for marker emission" — the tilde () makes this ambiguous. Is it 2 turns or 3?

Recommendation: Change to "Reserve final 2-3 turns" or "Reserve at least 2 final turns" for clarity.

7. Verdict rule doesn't explicitly state the zero-findings case

Location: Lines 125-130 (Verdict rule)

The rule says if critical > 0 OR high > 0: blocking, else: clean, but doesn't explicitly say "all zeros → clean". This is implied but could be clearer.

Recommendation: Add an explicit example:

Example: findings: {critical: 0, high: 0, medium: 0, low: 0} → verdict "clean" → state=success

8. Multi-bot collision acknowledged but not mitigated

Location: Lines 371-374 (Out of scope)

The plan explicitly defers multi-bot scenarios. This is fine for Phase 0, but the gate currently fetches "most recent comment authored by github-actions[bot]" (line 238-249 of pr-review/action.yml). If multiple bots start posting, this could break.

Recommendation: Add a note in "Out of scope" that if multi-bot support is needed in the future, the marker sentinel should include a bot-specific prefix (e.g., claude-pr-review-summary-v1-glitchwerks-bot).

9. PR #222 comment ID should be verified

Location: Line 142-143 (Regression fixture)

The plan references "issue-comment id 4392784955" as PR #222's review body. Should verify this is the correct comment before using it as the regression fixture in Phase 2.

Recommendation: In Phase 2 PR, include a verification step that fetches this comment ID and confirms it contains the expected text ("Critical Issues Found 🚨").


📋 Questions for Author (from PR body)

The PR body asks for reviewer input on two questions:

Q1: "Reviewer eye on whether the JSON schema matches what the persona will reasonably emit"

Answer: The schema is minimal and reasonable. The persona should be able to emit this reliably. However:

Q2: "Reviewer eye on whether the cutover criterion (≥20/≥5/zero) is achievable given current PR cadence"

⚠️ Cannot assess without data: I don't have access to historical PR metrics for this repo. The cutover criterion requires:

  • 20 reviews across 5 PRs
  • Zero disagreements between prose-regex and marker
  • Zero marker-missing events

If the repo averages 5+ PRs per month with automated reviews, this is achievable in 1-2 months. If cadence is lower, it could take 3-6 months. Recommendation: Run a historical query to check how many PRs with automated reviews occurred in the last 3 months.


✅ Phase 0 Acceptance Criteria Status

Checking against the plan's own Phase 0 acceptance criteria (lines 161-164):


🎯 Recommendation

APPROVE with requested changes. The plan is sound and well-structured. Address the medium-priority concerns (especially #1 schema validation and #2 cutover tracking) before Phase 1 implementation. The low-priority items can be refined during Phase 1-2 implementation.


📝 Summary

This is a high-quality design document that locks a concrete solution to the prose-regex fragility problem. The phasing is well-thought-out, and the fail-closed approach in Phase 4 is the right call. The main gaps are around schema validation specification and automated tracking for the cutover criteria. Once those are addressed in the plan or in Phase 1/2 implementation, this should be a smooth rollout.


@cbeaulieu-gt cbeaulieu-gt merged commit f396c27 into main May 7, 2026
4 checks passed
@cbeaulieu-gt cbeaulieu-gt deleted the issue-185-quality-gate-structured-marker branch May 7, 2026 01:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant