RFC: Deterministic pre-agentic review gate to cut token burn (draft-until-green) #743

don-petry · 2026-06-11T19:47:00Z

don-petry
Jun 11, 2026
Maintainer

Idea: A deterministic pre-agentic review gate to cut token burn

Problem. Our paid agentic reviewers (Dev-Lead, CodeRabbit, Copilot, the .github-private review agent) spend reasoning — and tokens — on issues a --fix flag or a type checker would have erased for free. Agent-authored code fails in characteristic, cheaply-detectable ways (hallucinated imports, dead code, format drift, plausible-but-wrong security patterns). We're paying LLM rates to catch lint debt.

Goal. Run the deterministic, $0 checks first, auto-fix what's auto-fixable, and only let token-billed agents engage once the cheap tier is green — so their reasoning is spent on real logic, not cleanup.

The frame: ordering, not just tools

Tier 0  AUTO-FIX (commits silently; agents never see it)
        ruff check --fix · biome check --write · prettier --write · gofmt -w · eslint --fix
                 │  trivial issues gone at $0
                 ▼
Tier 1  FAIL-LOUD DETERMINISTIC GATE (blocks PR, no tokens)
        type check (tsc/mypy) · semgrep · ruff/golangci-lint · knip · SonarCloud · gitleaks
                 │  PR must be green here…
                 ▼
Tier 2  AGENTIC REVIEW (only now do we spend tokens)
        Dev-Lead · CodeRabbit · Copilot · pr-review agent

The leverage is the Tier 1 → Tier 2 gate.
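
A minimal sketch of that gate as a predicate, assuming Tier 1 results arrive as check conclusion strings such as success or failure. The function name tier2_allowed is hypothetical and does not exist in any current workflow.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide whether Tier 2 (paid agents) may run.
# Usage: tier2_allowed <conclusion>...   e.g. tier2_allowed success failure
tier2_allowed() {
  # Assumption: no Tier 1 results at all counts as "not green".
  [ "$#" -gt 0 ] || return 1
  local c
  for c in "$@"; do
    # Anything other than "success" (failure, cancelled, skipped) blocks Tier 2.
    [ "$c" = "success" ] || return 1
  done
  return 0
}
```

Treating an empty result set as red is a deliberate choice here: a PR whose deterministic checks never ran should not reach a token-billed reviewer by default.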

What we already have (and it's good)

pr-auto-review-reusable.yml is a model of this pattern: it dispatches the heavy .github-private review agent only when all CI checks pass, the PR is non-draft, there's no CHANGES_REQUESTED, and no unresolved threads. That reviewer is already correctly gated. ✅

The leak is the other reviewers, which engage on PR-open before CI goes green:

| Reviewer | Trigger today | Gated on green CI? |
| --- | --- | --- |
| `.github-private` review agent (via `pr-auto-review`) | dispatched on CI success | ✅ yes |
| CodeRabbit (GitHub App) | PR open + every push (incremental) | ❌ no |
| Copilot native review | PR open | ❌ no |
| Dev-Lead | `pull_request: [opened, synchronize]` | ❌ no (proactive path) |

Proposal: "draft until deterministic-green" — one gate for every reviewer

Every reviewer above already respects (or can be configured to respect) draft state. So instead of bolting a CI-precondition onto each one separately, use a single convention:

Agents open PRs as draft. A workflow flips the PR to ready_for_review the moment Tier-1 CI is green.

Result: on a red PR, zero paid reviewers engage. When CI turns green, the promote step marks the PR ready, which fires pull_request: ready_for_review → pr-auto-review dispatches the heavy reviewer, and CodeRabbit/Copilot pick it up — all at once, all post-green.

Diff A — new pr-promote-when-green.yml reusable (promotes only agent/bot PRs; humans keep control of their own readiness):

name: Promote Draft PR When Green (Reusable)
on:
  workflow_call:
    secrets:
      GH_PAT_WORKFLOWS: { required: true }
jobs:
  promote:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      checks: read
      actions: read
    steps:
      - name: Mark agent draft PR ready when deterministic CI is green
        env:
          # A PAT rather than GITHUB_TOKEN: events created with GITHUB_TOKEN
          # do not trigger further workflow runs, so ready_for_review would
          # never reach pr-auto-review.
          GH_TOKEN: ${{ secrets.GH_PAT_WORKFLOWS }}
          # Pass event data through env instead of inlining it in the script.
          CONCLUSION: ${{ github.event.workflow_run.conclusion }}
          SHA: ${{ github.event.workflow_run.head_sha }}
          REPO: ${{ github.repository }}
        run: |
          set -euo pipefail
          [ "$CONCLUSION" = "success" ] \
            || { echo "CI not green; leaving as draft"; exit 0; }
          # First open draft PR associated with the commit CI just passed on.
          PR=$(gh api "/repos/${REPO}/commits/${SHA}/pulls" \
            --jq '[.[] | select(.state=="open" and .draft==true)][0]')
          [ -n "$PR" ] && [ "$PR" != "null" ] || { echo "No open draft PR for $SHA"; exit 0; }
          NUM=$(echo "$PR" | jq -r '.number')
          AUTHOR=$(echo "$PR" | jq -r '.user.login')
          # Only bot/agent authors are auto-promoted; humans mark their own PRs ready.
          case "$AUTHOR" in
            *"[bot]"|donpetry-bot)
              gh pr ready "$NUM" --repo "$REPO"
              echo "::notice::Promoted #$NUM to ready_for_review (Tier-1 CI green)";;
            *) echo "Human-authored #$NUM: not auto-promoting";;
          esac

Caller stub triggers on workflow_run: { workflows: ["CI"], types: [completed] }.
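
The author filter in Diff A is the part most likely to change as new bots are added, so it may be worth pulling into a function that can be exercised without a live PR. A sketch, assuming the same two patterns as the case statement in Diff A; is_agent_author is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the case statement in Diff A.
# Returns 0 for bot/agent logins, 1 for everything else.
is_agent_author() {
  case "$1" in
    # GitHub App logins end in the literal suffix "[bot]";
    # donpetry-bot is the one named machine account from Diff A.
    *"[bot]"|donpetry-bot) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: decide locally what the workflow would do for a given login.
for login in "dependabot[bot]" "donpetry-bot" "don-petry"; do
  if is_agent_author "$login"; then
    echo "$login: auto-promote"
  else
    echo "$login: leave for the human author"
  fi
done
```

Keeping the match in one place means extending the bot list is a one-line change that a BATS test could cover.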

Diff B — .coderabbit.yaml org standard (skip drafts so CodeRabbit waits for promotion):

reviews:
  auto_review:
    drafts: false        # do not review until promoted to ready_for_review

Diff C — Copilot: set repo/org review setting to not auto-review drafts (native Copilot review honors draft state).

Diff D — AGENTS.md directive: "When opening a PR programmatically, open it as draft. Promotion to ready_for_review is automated by pr-promote-when-green once the deterministic CI tier passes." Dev-Lead's proactive pull_request path early-exits on draft.

Open question for maintainers: Dev-Lead's intent classifier (dev-lead-intent.sh, in .github-private) — does its pull_request: opened/synchronize path do anything billable before CI, or is it a cheap no-op until a real intent (CI-failure / review / mention) arrives? If the latter, Diff D is belt-and-suspenders; if the former, it's the main saving.

Type-check enforcement audit (Task 3)

Checked every non-archived repo for whether typed code is matched by a CI type-check step:

| Repo | Typed code? | Type-check in CI | Verdict |
| --- | --- | --- | --- |
| `markets` | TS frontend | ✅ `npx tsc --noEmit` | Compliant |
| `broodly` | TS (pnpm) | ✅ `pnpm run typecheck` | Compliant |
| `google-app-scripts` | TS | ✅ `npm run typecheck` | Compliant |
| `ContentTwin` | none (shell) | n/a | Not applicable |
| `bmad-bgreat-suite` | none (shell/skills; 0 `.py`) | n/a | Not applicable |
| `.github-private` | none (shell) | n/a | Not applicable |
| `TalkTerm` | not yet (BMAD docs; TS/Electron planned) | ❌ CI runs only gitleaks | Watch |

Headline: enforcement is solid where it matters — all three repos with real TS already run a type-check. This is not a widespread gap.

Two real findings:

TalkTerm has a stub CI (gitleaks only). When its Electron/TS code lands, it needs the full ci.yml (lint + type-check + test) onboarded before the first feature PR.
The systemic gap is enforcement of the rule, not the rule itself. compliance-audit.sh doesn't verify that a TS repo actually has a typecheck step; compliance is by convention today. A repo onboarded without one (the risk TalkTerm faces once its TS code lands) would regress silently. Add a type-check presence check to the compliance audit.
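
One possible shape for that presence check, as a sketch only: the internals of compliance-audit.sh are not shown in this RFC, so the function name check_typecheck_presence, the root-level tsconfig.json heuristic, and the grep pattern are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not taken from compliance-audit.sh.
# Exit codes: 0 = compliant or not applicable, 1 = typed code without a type-check step.
check_typecheck_presence() {
  local repo_dir="$1"
  local workflows="$repo_dir/.github/workflows"

  # Assumption: a tsconfig.json at the repo root marks the repo as TypeScript.
  if [ ! -f "$repo_dir/tsconfig.json" ]; then
    echo "n/a: no tsconfig.json"
    return 0
  fi

  # Assumption: a type-check step mentions "tsc --noEmit" or a "typecheck" script.
  if [ -d "$workflows" ] && grep -rEq 'tsc[[:space:]]+--noEmit|typecheck' "$workflows"; then
    echo "ok: type-check step found"
    return 0
  fi

  echo "FAIL: TypeScript repo without a type-check step in CI"
  return 1
}
```

A text match like this is deliberately crude: it confirms that a step exists, not that it is required or that it covers every package. Repos that keep tsconfig.json in a subdirectory would need a broader detection rule.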

Tooling gaps worth adding to Tier 1

Semgrep CE in ci.yml — free, community rules, SARIF → Security tab (same pipeline as CodeQL), runs in seconds vs CodeQL's minutes (the fast PR-feedback layer). Killer feature: custom rules that encode our own AGENTS.md directives ("never swallow exceptions", "no undisclosed synthetic-data fallback") — turning prose rules into enforced, deterministic checks.
knip (JS/TS) — dead code, unused exports, phantom deps. Agentic code leaves these constantly; nothing in our stack catches them.
Auto-fix-and-commit step (Tier 0) — run formatters/--fix on agent-authored branches and commit, so trivial issues never reach any reviewer.

Consolidated alternatives (DeepSource — deterministic pass before the AI agent; SonarQube AI Code Assurance — AI-snippet taint analysis) exist but overlap what we already own; noted for completeness, not recommended as a switch.

Proposed next steps

Land Diff A–D (the draft-until-green gate) as a standards PR. Biggest token-burn lever.
Add Semgrep CE to the ci.yml template + 2–3 custom rules from AGENTS.md directives.
Add a type-check presence check to compliance-audit.sh; queue TalkTerm full-CI onboarding before its first code PR.
Pilot knip on broodly/markets.

Feedback welcome — especially on the Dev-Lead open question above and whether "draft until green" should apply to human PRs too (proposal keeps it bot-only).

Drafted with Claude Code from a repo-grounded analysis of our current CI/agent architecture.

2026-06-17T12:01:55Z

github-actions[bot]
Bot Jun 17, 2026

Enhancement: Pre-Agentic Gate — Integration Point, Tool Scope & ROI Measurement

Sharpened problem & goal
The proposal is to run deterministic $0 checks before token-billed agents engage, auto-fix what's auto-fixable, and only invoke agents on clean code. Three implementation ambiguities to resolve before building: (1) where does the gate run? — as a blocking preflight step in the existing agent job, or as a separate required CI job; (2) which tools are in scope? — the org runs shell, YAML, and (likely) other language stacks with different tooling; (3) what does "auto-fix" mean for CI? — auto-committing back to the PR branch requires write access and its own token cost; a "fail + show diff" approach is a safer first step.

Context
Directly relevant existing infrastructure:

scripts/dev-lead-preflight.sh — the preflight hook already exists for dev-lead; this is the natural integration point, not a new workflow
scripts/lib/advisory-review-gate.sh — the advisory gate pattern; the pre-agentic gate should follow this script's conventions
lint.yml — already runs shellcheck and BATS as required CI; the pre-agentic gate conceptually surfaces these same checks to the agent as a preflight rather than a parallel CI job, avoiding extra fan-out
scripts/lib/token-metrics.sh + token-report.yml — ET savings from the gate are directly measurable via the existing Token Cost Observatory; this should be the primary success metric

High-value, zero-false-positive auto-fixable categories for agent-authored code:

Import sorting / unused imports (isort, goimports, rustfmt)
Format drift (prettier, black, gofmt, shfmt)
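shellcheck -f diff for shell scripts (shellcheck has no --fix flag; its diff output format emits auto-fixable findings as a patch that can be applied with git apply)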

Start with "fail + diff" rather than auto-commit; auto-commit can be an auto_fix: true opt-in input added in a follow-on once the detection is validated.
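
A rough sketch of what "fail + diff" could look like for a single file, assuming the formatter reads from stdin and writes the formatted text to stdout. The function name fail_with_diff is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: never modifies the file, only reports what would change.
# Usage: fail_with_diff <file> <formatter> [args...]
# Exit codes: 0 = already formatted, 1 = formatter would change the file,
#             2 = the formatter itself failed.
fail_with_diff() {
  local file="$1"; shift
  local formatted rc=0
  formatted=$(mktemp)

  # Run the formatter on a copy of the content, leaving the original untouched.
  if ! "$@" < "$file" > "$formatted"; then
    rm -f "$formatted"
    return 2
  fi

  # diff -u prints the would-be fix as a unified diff and exits non-zero on change.
  if ! diff -u "$file" "$formatted"; then
    rc=1
  fi

  rm -f "$formatted"
  return "$rc"
}
```

For example, fail_with_diff scripts/foo.sh shfmt or fail_with_diff main.go gofmt would print the pending change and fail without touching the branch, since both tools format stdin when given no path. That removes the need for write access or a commit identity in the first iteration.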

Impact / Effort

Impact: HIGH — paying Opus rates to catch a missing import is a concrete, measurable waste; ROI is trackable via the existing Token Cost Observatory
Effort: S for adding lint as a blocking preflight in scripts/dev-lead-preflight.sh; M for auto-fix + push-back with proper identity and guards; L for full org-wide coverage across all language stacks

Suggested acceptance criteria

scripts/dev-lead-preflight.sh is extended to run the deterministic lint suite (shellcheck at minimum, plus any formatter in scope) before the agent engages; preflight failure blocks the agent with a structured error listing violations
The gate runs as a preflight step inside the existing dev-lead.yml job, not as a new CI job — to avoid increasing the required-check count and CI fan-out
The tool list is defined in a config file (not hardcoded) so repos can extend or override it via workflow inputs without forking the reusable
Token savings are measured for 2 weeks post-launch via token-report.yml — report must show ET per PR broken down by "blocked by preflight" vs. "passed to agent"; target: ≥10% ET reduction on PRs that fail preflight
Auto-fix (if implemented in a follow-on) is gated behind an explicit auto_fix: true workflow input (off by default), uses the existing scripts/lib/git-identity.sh commit-author pattern, and is restricted to zero-false-positive formatters only
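
The config-driven tool list and structured error from the criteria above might look roughly like this. It is a sketch under stated assumptions: the one-command-per-line config format, the run_preflight name, and the output wording are all hypothetical and not taken from scripts/dev-lead-preflight.sh.

```shell
#!/usr/bin/env bash
# Hypothetical preflight runner.
# Usage: run_preflight <config-file>
# Config format (assumed): one shell command per line; blank lines and
# lines starting with '#' are ignored.
run_preflight() {
  local config="$1" line rc=0

  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in ''|'#'*) continue ;; esac
    # stdin is redirected so a tool cannot consume the rest of the config file.
    if ! sh -c "$line" </dev/null >/dev/null 2>&1; then
      echo "violation: $line"
      rc=1
    fi
  done < "$config"

  if [ "$rc" -eq 0 ]; then
    echo "preflight: passed"
  else
    echo "preflight: blocked"
  fi
  return "$rc"
}
```

Every configured tool runs even after the first failure, so the agent (or the human) sees the full violation list in one pass rather than one failure per push.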
