Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits#36075
Conversation
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #36075 does not have the 'implementation' label (has_implementation_label=false) and has only 4 new lines of code in business logic directories (default_business_additions=4, well under the 100-line threshold). Neither Condition A nor Condition B is met. |
There was a problem hiding this comment.
Pull request overview
This PR reduces PR Sous Chef run cost by moving PR eligibility detection ahead of formatter/tool setup and tightening the agent prompt’s processing limits and failure handling.
Changes:
- Adds
eligible_countoutput from the PR prefilter and gates Go/Node formatter setup on eligible PRs. - Reduces PR Sous Chef processing from 10 to 5 PRs per run and adds sub-agent error/JSON guardrails.
- Updates workflow contract assertions and generated workflow artifacts, including unrelated Ruflo routing changes.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/pr-sous-chef.md |
Moves PR queue fetch before formatter setup and tightens PR Sous Chef prompt limits. |
.github/workflows/pr-sous-chef.lock.yml |
Compiled PR Sous Chef workflow reflecting gated setup and prompt changes. |
pkg/cli/pr_sous_chef_workflow_contract_test.go |
Adds contract assertions for the new processing cap, sub-agent guardrail, and eligible_count output. |
.github/workflows/ruflo-backed-task.lock.yml |
Regenerated Ruflo workflow lock with tool/runtime updates and command metadata. |
.github/workflows/agentic_commands.yml |
Updates centralized slash-command routing, including /ruflo. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 1
🧪 Test Quality Sentinel Report✅ Test Quality Score: 85/100 — Excellent
📊 Metrics & Test Classification (4 assertions analyzed)
Test Classification Details
Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /zoom-out, /tdd, and /grill-with-docs — COMMENT with 4 suggestions, none blocking.
📋 Key Themes & Highlights
Key Themes
eligible_countempty-string edge case: Ifjqfails (missing file, malformed JSON),eligible_countis set to""rather than"0". Since"" != '0'istrue, setup steps will run on a failed fetch — the opposite of the intended optimization. Easy to fix with a|| echo 0fallback.- 4-tool-call sub-agent budget has no documented graceful-degradation path: A minimal non-trivial pr-processor flow (read context → check review threads → decide on branch → emit output) already saturates the limit. The spec should document what to do when 4 calls are insufficient.
- Brittle contract test assertion: Line 32 asserts the exact verbatim bash command, tightly coupling the test to implementation rather than the observable behavior (the
eligible_countoutput exists and reflectsprslength). skip_reason: "sub_agent_error"destination unspecified: The spec says to record the skip reason but doesn't say where — noop payload? skipped array? — leaving room for inconsistent implementations.
Positive Highlights
- ✅ Excellent structural refactor — moving fetch-before-setup is the right order-of-operations fix
- ✅ Scoped
npm ci --prefixis much cleaner than the broadmake deps-dev - ✅ Contract tests covering the new behavioral invariants are exactly the right safety net for workflow changes
- ✅ Clear PR description with concrete before/after examples
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.2M
There was a problem hiding this comment.
REQUEST_CHANGES — Two issues block merge: a silent jq failure bypass that invalidates the eligibility gate, and a 4-tool-call cap on pr-processor that will systematically exhaust the budget on real PRs and produce cascading sub_agent_error skips.
### Findings summary
High — eligible_count jq-failure bypass (line 118)
If the final jq pipeline fails, command substitution returns "", echo exits 0 (step succeeds), and "" != '0' evaluates to true — setup runs, agent executes against broken input. Fix: || echo 0 inside the substitution.
Medium — 4 tool call cap on pr-processor (line 273)
A minimal useful assessment (fetch PR, check CI, check reviews, emit JSON) consumes all 4 calls. Any PR needing one extra lookup hits the cap mid-assessment, triggers sub_agent_error, and gets skipped until the next run. Raise to 6–8 or pre-batch context retrieval.
Approach is sound: early-exit eligibility gate, gated setup, reduced PR cap, sub-agent error guard — all correct. The two fixes are straightforward.
🔎 Code quality review by PR Code Quality Reviewer · sonnet46 2.3M
|
@copilot Address all review comments by github-actions and Copilot. |
…s.yml Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
…s.yml Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
|
|
|
|
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
…d by compile) Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
All review comments addressed:
Commits: 85e4089, f34e21a (latest push) |
Updated PR body below. All four recommendations from #35742 were addressed — summary follows. Proposed PR body:
|
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
PR Sous Chef was consuming ~3.35M tokens/run with high turn counts, including wasted setup on runs with no eligible PRs and runaway sub-agent behavior in failures. This change shifts expensive work behind an eligibility gate and reduces per-run processing scope.
Workflow execution order + early exit
Fetch open non-draft PR queueahead of heavy setup.eligible_countfrom prefilter step with a|| echo 0fallback sojqfailures default to0and safely block setup.eligible_count != 0.Lighter formatter setup
make deps-dev.npm ci --prefix actions/setup/js(only when PRs are eligible).Prompt-level token/turn controls
{pr_number: <N>, skip_reason: "sub_agent_error"}in theskippedarray of the run-summary noop payload and continue without retry.pr-processorto max 8 tool calls with a documentedinsufficient_contextescape hatch when the budget is exhausted before a confident decision can be reached.Workflow contract coverage
eligible_countoutput emission (intent-based, not exact bash-line)