Replies: 1 comment
Enhancement: Pre-Agentic Gate - Integration Point, Tool Scope & ROI Measurement
Sharpened problem & goal
Context
High-value, zero-false-positive auto-fixable categories for agent-authored code:
Start with "fail + diff" rather than auto-commit; auto-commit can be a
Impact / Effort
Suggested acceptance criteria
Idea: A deterministic pre-agentic review gate to cut token burn
Problem. Our paid agentic reviewers (Dev-Lead, CodeRabbit, Copilot, the .github-private review agent) spend reasoning, and tokens, on issues a --fix flag or a type checker would have erased for free. Agent-authored code fails in characteristic, cheaply detectable ways (hallucinated imports, dead code, format drift, plausible-but-wrong security patterns). We're paying LLM rates to catch lint debt.
Goal. Run the deterministic, $0 checks first, auto-fix what's auto-fixable, and only let token-billed agents engage once the cheap tier is green, so their reasoning is spent on real logic, not cleanup.
The frame: ordering, not just tools
The leverage is the Tier 1 → Tier 2 gate.
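The ordering can be sketched as a small gate function. This is a minimal illustration, not the proposal's implementation: it assumes Tier 1 means the deterministic, $0 checks and Tier 2 means the token-billed reviewers, and the names CheckResult and run_gate are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str      # e.g. "lint" or "typecheck" (illustrative labels)
    passed: bool


def run_gate(
    tier1_checks: List[Callable[[], CheckResult]],
    tier2_reviewers: List[Callable[[], str]],
) -> Dict[str, list]:
    """Run every Tier 1 check; call Tier 2 reviewers only if all passed."""
    results = [check() for check in tier1_checks]
    failed = [r.name for r in results if not r.passed]
    if failed:
        # Red cheap tier: no paid reviewer is invoked at all.
        return {"dispatched": [], "blocked_by": failed}
    return {
        "dispatched": [reviewer() for reviewer in tier2_reviewers],
        "blocked_by": [],
    }
```

The point of the sketch is that the reviewer callables are never invoked while any cheap check is red, which is where the token saving would come from.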
What we already have (and it's good)
pr-auto-review-reusable.yml is a model of this pattern: it dispatches the heavy .github-private review agent only when all CI checks pass, the PR is non-draft, there's no CHANGES_REQUESTED, and no unresolved threads. That reviewer is already correctly gated. ✅
The leak is the other reviewers, which engage on PR-open before CI goes green:
.github-private review agent (via pr-auto-review)
pull_request: [opened, synchronize]
Proposal: "draft until deterministic-green" (one gate for every reviewer)
Every reviewer above already respects (or can be configured to respect) draft state. So instead of bolting a CI-precondition onto each one separately, use a single convention:
Result: on a red PR, zero paid reviewers engage. When CI turns green, the promote step marks the PR ready, which fires pull_request: ready_for_review. pr-auto-review then dispatches the heavy reviewer, and CodeRabbit and Copilot pick the PR up: all at once, all post-green.
Diff A: new pr-promote-when-green.yml reusable (promotes only agent/bot PRs; humans keep control of their own readiness):
Caller stub triggers on workflow_run: { workflows: ["CI"], types: [completed] }.
Diff B: .coderabbit.yaml org standard (skip drafts so CodeRabbit waits for promotion):
Diff C (Copilot): set the repo/org review setting to not auto-review drafts (native Copilot review honors draft state).
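The promotion rule and the heavy-reviewer preconditions described above can be sketched as two predicates. This is an illustration under stated assumptions, not the contents of pr-promote-when-green.yml or pr-auto-review-reusable.yml: the PullRequest fields and both function names are hypothetical, and "success" is taken to be the conclusion reported for a passing CI run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # All field names are hypothetical; map them to whatever the API returns.
    author_is_bot: bool
    is_draft: bool
    changes_requested: bool = False
    unresolved_threads: int = 0


def should_promote(pr: PullRequest, ci_conclusion: str) -> bool:
    """Promote only agent/bot drafts, and only when the CI run succeeded."""
    return pr.author_is_bot and pr.is_draft and ci_conclusion == "success"


def heavy_review_allowed(pr: PullRequest, ci_conclusion: str) -> bool:
    """The four preconditions listed for the already-gated heavy reviewer."""
    return (
        ci_conclusion == "success"
        and not pr.is_draft
        and not pr.changes_requested
        and pr.unresolved_threads == 0
    )
```

Under this reading, a bot-authored draft with green CI satisfies should_promote but not heavy_review_allowed; only after promotion clears the draft flag does the second predicate hold, which matches the "all post-green" sequencing.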
Diff D (AGENTS.md directive): "When opening a PR programmatically, open it as draft. Promotion to ready_for_review is automated by pr-promote-when-green once the deterministic CI tier passes." Dev-Lead's proactive pull_request path early-exits on draft.
Type-check enforcement audit (Task 3)
Checked every non-archived repo for whether typed code is matched by a CI type-check step:
markets: npx tsc --noEmit
broodly: pnpm run typecheck
google-app-scripts: npm run typecheck
ContentTwin
bmad-bgreat-suite.py)
.github-private
TalkTerm
Headline: enforcement is solid where it matters. All three repos with real TS already run a type-check. This is not a widespread gap.
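The audit question (is typed code matched by a CI type-check step?) could be automated along these lines. This is a rough sketch, not what compliance-audit.sh does today: it assumes "typed code" means .ts/.tsx files and that a type-check step looks like one of the commands listed above (tsc --noEmit or a run typecheck script). The function names and the returned labels are hypothetical.

```python
import re
from typing import Iterable

# Matches "tsc ... --noEmit" or "<pm> run typecheck" on a single line.
TYPECHECK_PATTERN = re.compile(r"\btsc\b.*--noEmit|\brun\s+typecheck\b")


def has_typed_code(paths: Iterable[str]) -> bool:
    """True if any path looks like TypeScript source (assumption: .ts/.tsx)."""
    return any(p.endswith((".ts", ".tsx")) for p in paths)


def audit_repo(paths: Iterable[str], workflow_text: str) -> str:
    """Classify a repo from its file list and the text of its CI workflow."""
    if not has_typed_code(paths):
        return "no-typed-code"
    if TYPECHECK_PATTERN.search(workflow_text):
        return "enforced"
    return "missing-typecheck"
```

A text match on the workflow file is a heuristic: it would miss a type-check hidden behind a differently named script, so a real check may need to resolve package.json scripts as well.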
Two real findings:
TalkTerm has a stub CI (gitleaks only). When its Electron/TS code lands, it needs the full ci.yml (lint + type-check + test) onboarded before the first feature PR.
compliance-audit.sh doesn't verify that a TS repo actually has a typecheck step; compliance is by convention today. One repo onboarded without it (as TalkTerm will be) would regress silently. Add a type-check presence check to the compliance audit.
Tooling gaps worth adding to Tier 1
ci.yml: free, community rules, SARIF → Security tab (same pipeline as CodeQL), runs in seconds vs CodeQL's minutes (the fast PR-feedback layer). Killer feature: custom rules that encode our own AGENTS.md directives ("never swallow exceptions", "no undisclosed synthetic-data fallback"), turning prose rules into enforced, deterministic checks.
--fix on agent-authored branches and commit, so trivial issues never reach any reviewer.
Consolidated alternatives (DeepSource: deterministic pass before the AI agent; SonarQube AI Code Assurance: AI-snippet taint analysis) exist but overlap what we already own; noted for completeness, not recommended as a switch.
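To make "turning prose rules into enforced, deterministic checks" concrete, here is a minimal stand-in for the "never swallow exceptions" directive. It is an assumption-laden illustration: it uses Python's standard ast module on Python source rather than the rule syntax of whichever scanner ends up in ci.yml, and the function names are hypothetical.

```python
import ast
from typing import List


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` or a bare `...` statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def find_swallowed_exceptions(source: str) -> List[int]:
    """Return line numbers of except handlers whose body does nothing."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(
            _is_noop(stmt) for stmt in node.body
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

A check like this gives the same answer on every run and costs no tokens, which is the property the proposal wants from Tier 1. A production rule would also need to decide how to treat handlers that only log or only re-raise.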
Proposed next steps
ci.yml template + 2–3 custom rules from AGENTS.md directives.
compliance-audit.sh; queue TalkTerm full-CI onboarding before its first code PR.
broodly/markets.
Feedback welcome, especially on the Dev-Lead open question above and whether "draft until green" should apply to human PRs too (proposal keeps it bot-only).
Drafted with Claude Code from a repo-grounded analysis of our current CI/agent architecture.