Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits by Copilot · Pull Request #36075 · github/gh-aw

Copilot · 2026-05-31T06:35:24Z

PR Sous Chef was consuming ~3.35M tokens/run with high turn counts, including wasted setup on runs with no eligible PRs and runaway sub-agent behavior in failures. This change shifts expensive work behind an eligibility gate and reduces per-run processing scope.

Workflow execution order + early exit
- Move `Fetch open non-draft PR queue` ahead of heavy setup.
- Export `eligible_count` from the prefilter step with a `|| echo 0` fallback so jq failures default to 0 and safely block setup.
- Gate setup/tooling steps behind `eligible_count != 0`.

Lighter formatter setup
- Remove unconditional `make deps-dev`.
- Replace with formatter-scoped install: `npm ci --prefix actions/setup/js` (only when PRs are eligible).

Prompt-level token/turn controls
- Lower processing cap from 10 PRs to 5 PRs per run.
- Add sub-agent failure guard: on non-JSON/error, record `{pr_number: <N>, skip_reason: "sub_agent_error"}` in the `skipped` array of the run-summary noop payload and continue without retry.
- Constrain `pr-processor` to max 8 tool calls with a documented `insufficient_context` escape hatch when the budget is exhausted before a confident decision can be reached.

Workflow contract coverage
- Extend PR Sous Chef contract test assertions for:
  - 5-PR cap
  - sub-agent guardrails
  - `eligible_count` output emission (intent-based, not exact bash-line)

- name: Fetch open non-draft PR queue
  id: fetch-prs
  run: |
    # ...build /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json
    echo "eligible_count=$(jq '.prs | length' /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json || echo 0)" >> "$GITHUB_OUTPUT"

- name: Setup Go
  if: steps.fetch-prs.outputs.eligible_count != '0'
  uses: actions/setup-go@v6.4.0

- name: Install formatter dependencies
  if: steps.fetch-prs.outputs.eligible_count != '0'
  run: npm ci --prefix actions/setup/js

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

github-actions · 2026-05-31T06:55:32Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-05-31T06:55:32Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-05-31T06:55:36Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-05-31T06:55:47Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #36075 does not have the 'implementation' label (has_implementation_label=false) and has only 4 new lines of code in business logic directories (default_business_additions=4, well under the 100-line threshold). Neither Condition A nor Condition B is met.

Copilot

Pull request overview

This PR reduces PR Sous Chef run cost by moving PR eligibility detection ahead of formatter/tool setup and tightening the agent prompt’s processing limits and failure handling.

Changes:

Adds eligible_count output from the PR prefilter and gates Go/Node formatter setup on eligible PRs.
Reduces PR Sous Chef processing from 10 to 5 PRs per run and adds sub-agent error/JSON guardrails.
Updates workflow contract assertions and generated workflow artifacts, including unrelated Ruflo routing changes.

Summary per file

File	Description
`.github/workflows/pr-sous-chef.md`	Moves PR queue fetch before formatter setup and tightens PR Sous Chef prompt limits.
`.github/workflows/pr-sous-chef.lock.yml`	Compiled PR Sous Chef workflow reflecting gated setup and prompt changes.
`pkg/cli/pr_sous_chef_workflow_contract_test.go`	Adds contract assertions for the new processing cap, sub-agent guardrail, and `eligible_count` output.
`.github/workflows/ruflo-backed-task.lock.yml`	Regenerated Ruflo workflow lock with tool/runtime updates and command metadata.
`.github/workflows/agentic_commands.yml`	Updates centralized slash-command routing, including `/ruflo`.

Copilot's findings

Files reviewed: 5/5 changed files
Comments generated: 1

github-actions · 2026-05-31T07:00:52Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 85/100 — Excellent

Analyzed 4 new assertion(s) in 1 modified test function: all 4 classified as design tests (behavioral contracts), 0 guideline violations.

📊 Metrics & Test Classification (4 assertions analyzed)

Metric	Value
New/modified assertions analyzed	4
✅ Design tests (behavioral contracts)	4 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Assertions with error/edge cases	1 (25%) — `skip_reason` checks error-recovery behavior
Duplicate test clusters	0
Test inflation detected	No (4 lines test / 25 lines production = 0.16 ratio)
🚨 Coding-guideline violations	0

Test Classification Details

Assertion	File	Classification	Notes
`"Process at most 5 PRs per run."`	`pkg/cli/pr_sous_chef_workflow_contract_test.go:29`	✅ Design	Enforces per-run PR processing cap
`"Make at most 4 tool calls total."`	`pkg/cli/pr_sous_chef_workflow_contract_test.go:30`	✅ Design	Enforces hard tool-call budget
`"skip_reason: \"sub_agent_error\""`	`pkg/cli/pr_sous_chef_workflow_contract_test.go:31`	✅ Design	Enforces error-recovery contract
`"echo \"eligible_count=..."`	`pkg/cli/pr_sous_chef_workflow_contract_test.go:32`	✅ Design	Enforces early-exit gating output

Language Support

Tests analyzed:

🐹 Go (*_test.go): 4 assertions in 1 function — unit (//go:build !integration) ✅ build tag present

Verdict

✅ Check passed. 0% of new assertions are implementation tests (threshold: 30%). All 4 new assertions are behavioral contract checks that act as regression guards for the token-optimization constraints introduced by this PR.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.5M · ◷

github-actions

✅ Test Quality Sentinel: 85/100. Test quality is excellent — 0% of new assertions are implementation tests (threshold: 30%). All 4 new assertions enforce behavioral contracts for the token-optimization constraints introduced by this PR.

github-actions

Skills-Based Review 🧠

Applied /zoom-out, /tdd, and /grill-with-docs — COMMENT with 4 suggestions, none blocking.

📋 Key Themes & Highlights

Key Themes

eligible_count empty-string edge case: If jq fails (missing file, malformed JSON), eligible_count is set to "" rather than "0". Since "" != '0' is true, setup steps will run on a failed fetch — the opposite of the intended optimization. Easy to fix with a || echo 0 fallback.
4-tool-call sub-agent budget has no documented graceful-degradation path: A minimal non-trivial pr-processor flow (read context → check review threads → decide on branch → emit output) already saturates the limit. The spec should document what to do when 4 calls are insufficient.
Brittle contract test assertion: Line 32 asserts the exact verbatim bash command, tightly coupling the test to implementation rather than the observable behavior (the eligible_count output exists and reflects prs length).
skip_reason: "sub_agent_error" destination unspecified: The spec says to record the skip reason but doesn't say where — noop payload? skipped array? — leaving room for inconsistent implementations.

Positive Highlights

✅ Excellent structural refactor — moving fetch-before-setup is the right order-of-operations fix
✅ Scoped npm ci --prefix is much cleaner than the broad make deps-dev
✅ Contract tests covering the new behavioral invariants are exactly the right safety net for workflow changes
✅ Clear PR description with concrete before/after examples

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.2M

github-actions

REQUEST_CHANGES — Two issues block merge: a silent jq failure bypass that invalidates the eligibility gate, and a 4-tool-call cap on pr-processor that will systematically exhaust the budget on real PRs and produce cascading sub_agent_error skips.

### Findings summary

High — eligible_count jq-failure bypass (line 118)
If the final jq pipeline fails, command substitution returns "", echo exits 0 (step succeeds), and "" != '0' evaluates to true — setup runs, agent executes against broken input. Fix: || echo 0 inside the substitution.
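
The failure mode described in this finding can be reproduced in isolation. The following is a minimal sketch rather than the workflow's own code: `count_without_fallback` and `count_with_fallback` are hypothetical helper names, and the path is a placeholder that is assumed not to exist.

```shell
#!/bin/sh
# Sketch of the gate's failure mode. Helper names and the path are
# illustrative assumptions, not taken from the workflow.

# Without a fallback: a failing jq prints nothing on stdout, so the
# command substitution expands to the empty string.
count_without_fallback() {
  echo "eligible_count=$(jq '.prs | length' "$1" 2>/dev/null)"
}

# With the fallback inside the substitution: a failing jq is replaced by "0".
count_with_fallback() {
  echo "eligible_count=$(jq '.prs | length' "$1" 2>/dev/null || echo 0)"
}

missing=/tmp/gh-aw-example/does-not-exist.json

count_without_fallback "$missing"   # prints "eligible_count="
count_with_fallback "$missing"      # prints "eligible_count=0"
```

In the first case the step still exits 0 because `echo` succeeds, and an empty output compared against `'0'` with `!=` is true, so the gated steps would run.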

Medium — 4 tool call cap on pr-processor (line 273)
A minimal useful assessment (fetch PR, check CI, check reviews, emit JSON) consumes all 4 calls. Any PR needing one extra lookup hits the cap mid-assessment, triggers sub_agent_error, and gets skipped until the next run. Raise to 6–8 or pre-batch context retrieval.

Approach is sound: early-exit eligibility gate, gated setup, reduced PR cap, sub-agent error guard — all correct. The two fixes are straightforward.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 2.3M

mnkiefer · 2026-05-31T07:04:49Z

@copilot Address all review comments by github-actions and Copilot.

github-actions · 2026-05-31T07:13:18Z

@copilot review all comments and address unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · gpt54 8.1M · ◷

github-actions · 2026-05-31T07:13:18Z

@copilot please summarize the remaining blockers and the next step to finish review.

Generated by 👨‍🍳 PR Sous Chef · gpt54 8.1M · ◷

Copilot · 2026-05-31T07:19:30Z

@copilot Address all review comments by github-actions and Copilot.

All review comments addressed:

eligible_count jq failure — added || echo 0 fallback so any jq error defaults to 0 and blocks the eligibility gate; lock file regenerated to match.
skip_reason: "sub_agent_error" destination — updated spec to explicitly state it goes in the skipped array of the run-summary noop payload.
4 tool call cap too tight — raised to 8 tool calls, with a documented escape hatch: if 8 calls are insufficient, set all fields to null and skip_reason: "insufficient_context".
Brittle test assertion — replaced the exact bash-line assertion with two intent-based checks (eligible_count= and .prs | length).
/ruflo routing — removed from agentic_commands.yml (re-removed after a compile step re-added it from the existing ruflo-backed-task.md).
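
The raised tool-call budget and its escape hatch amount to a simple decision rule. In the workflow this is a prompt instruction to the `pr-processor` sub-agent, not executable code, so the sketch below only illustrates the rule. `assess_pr`, the `needed` parameter, and the `decision` and `calls` fields are hypothetical names; only the limit of 8 and the `insufficient_context` skip reason come from the comment above.

```shell
#!/bin/sh
# Sketch of a hard tool-call budget with an escape hatch.
# MAX_CALLS matches the prompt's limit; everything else is illustrative.
MAX_CALLS=8

# $1 = number of tool calls this PR would need before a confident decision.
assess_pr() {
  needed=$1
  calls=0
  while [ "$calls" -lt "$needed" ]; do
    if [ "$calls" -ge "$MAX_CALLS" ]; then
      # Budget exhausted first: null result plus an explicit skip reason.
      printf '%s\n' '{"decision": null, "skip_reason": "insufficient_context"}'
      return 0
    fi
    calls=$((calls + 1))   # stand-in for one real tool call
  done
  printf '{"decision": "ok", "skip_reason": null, "calls": %s}\n' "$calls"
}

assess_pr 4    # within budget: reports a decision after 4 calls
assess_pr 12   # over budget: reports insufficient_context
```

A PR that needs exactly 8 calls still gets a decision; only the ninth call is refused.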

Commits: 85e4089, f34e21a (latest push)

mnkiefer · 2026-05-31T07:22:43Z

@copilot Update PR body of what was done and whether all concerns from #35742 were addressed adequately.

Copilot · 2026-05-31T07:28:40Z

@copilot Update PR body of what was done and whether all concerns from #35742 were addressed adequately.

Updated PR body below. All four recommendations from #35742 were addressed — summary follows.

Proposed PR body:

PR Sous Chef was consuming ~3.35M tokens/run (6.71M over 2 runs, 187 avg turns/run) with wasted setup on zero-eligible-PR runs and a runaway sub-agent in the failed 220-turn run. This change shifts all expensive work behind an eligibility gate and tightens the agent prompt's processing scope.

Closes #35742.

What changed

1. Early-exit eligibility gate (issue rec #1)

Moved Fetch open non-draft PR queue to run before Setup Go, Setup Node.js, and formatter installs.

Added id: fetch-prs to that step, which now emits eligible_count as a step output:

echo "eligible_count=$(jq '.prs | length' /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json || echo 0)" >> "$GITHUB_OUTPUT"

The || echo 0 fallback means any jq failure safely defaults to 0 and blocks the gate.

All subsequent setup steps are gated on steps.fetch-prs.outputs.eligible_count != '0'.

Estimated savings: ~400–600k tokens/run when no eligible PRs exist.

2. Lighter formatter setup (issue rec #2)

Removed unconditional make deps-dev (full dev toolchain, ~5–8 min).

Removed unconditional actions/setup-node + npm ci.

Replaced with a single targeted step (gated):

- name: Install formatter dependencies
  if: steps.fetch-prs.outputs.eligible_count != '0'
  run: npm ci --prefix actions/setup/js

go fmt is already available via actions/setup-go; no extra Go tool install needed.

Estimated savings: ~200–400k tokens/run + 4–6 action-minutes on eligible-PR runs.

3. 5-PR per-run processing cap (issue rec #3)

Lowered the main agent instruction from 10 PRs to 5 PRs per run.

Added explicit note: "Remaining eligible PRs will be handled in the next scheduled run."

Estimated savings: ~600–900k tokens/run (~30–40% fewer agent turns).
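
The per-run cap is a prompt instruction to the agent, but its effect on a queue can be sketched in a few lines. This is an illustration only: `split_queue` is a hypothetical helper and the PR numbers are made up.

```shell
#!/bin/sh
# Sketch of a per-run cap: handle the first MAX_PRS entries of a queue and
# defer the rest to the next run. PR numbers below are invented.
MAX_PRS=5

# Reads one PR number per line on stdin; prints "process N" or "defer N".
split_queue() {
  n=0
  while IFS= read -r pr; do
    n=$((n + 1))
    if [ "$n" -le "$MAX_PRS" ]; then
      echo "process $pr"
    else
      echo "defer $pr"
    fi
  done
}

printf '%s\n' 101 102 103 104 105 106 107 | split_queue
```

With seven queued PRs, the first five are processed and the last two are left for the next scheduled run.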

4. Sub-agent turn guard (issue rec #4)

Added to main agent prompt: if a pr-processor call returns non-JSON or an error, record {pr_number: <N>, skip_reason: "sub_agent_error"} in the skipped array of the run-summary noop payload and continue without retrying.

Added to pr-processor sub-agent prompt: max 8 tool calls total; if budget is exhausted before a confident decision, set all output fields to null and skip_reason: "insufficient_context".

Estimated savings: prevents ~1.6M token losses from runaway failed runs.
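
The record-and-continue rule for failed sub-agent calls can be sketched as follows. In the workflow the rule lives in the main agent's prompt, so this is an analogy and not the implementation. `handle_sub_agent_output` and the sample input are assumptions, and the sketch assumes `jq` is on the PATH for the valid-JSON branch.

```shell
#!/bin/sh
# Sketch of the "record and continue" rule. Function name and sample
# input are illustrative assumptions.

# Prints a skip record when the sub-agent output is not a JSON object,
# otherwise passes the output through unchanged.
handle_sub_agent_output() {
  pr_number=$1
  output=$2
  if printf '%s' "$output" | jq -e 'type == "object"' >/dev/null 2>&1; then
    printf '%s\n' "$output"
  else
    # No retry: emit the skip entry and let the caller move on.
    printf '{"pr_number": %s, "skip_reason": "sub_agent_error"}\n' "$pr_number"
  fi
}

handle_sub_agent_output 123 'Error: tool call timed out'
# prints {"pr_number": 123, "skip_reason": "sub_agent_error"}
```

The caller would append the printed record to the skipped array of the run summary and proceed to the next PR.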

5. Contract test coverage

Extended TestPRSousChefWorkflowAddCommentTargetContract with five intent-based assertions:

5-PR per-run cap is in the prompt

8-tool-call sub-agent budget is in the prompt

sub_agent_error skip guard is in the prompt

eligible_count= is emitted by the fetch-prs step

.prs | length is used to compute the count
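
The real assertions live in the Go contract test. As a shell analogue of the same idea, the sketch below checks for key fragments instead of one exact bash line. The sample workflow text, the `$CANDIDATES` variable, and the `contains` helper are illustrative assumptions.

```shell
#!/bin/sh
# Sketch of intent-based checks: assert that key fragments appear somewhere
# in the workflow text rather than matching one exact command line.

# Sample text that differs from the real step in quoting and path.
workflow_text='run: |
  echo "eligible_count=$(jq ".prs | length" "$CANDIDATES" || echo 0)" >> "$GITHUB_OUTPUT"'

# Succeeds when the fixed string $2 occurs in the text $1.
contains() {
  printf '%s\n' "$1" | grep -F -q -- "$2"
}

contains "$workflow_text" 'eligible_count=' && echo "ok: output is emitted"
contains "$workflow_text" '.prs | length'   && echo "ok: count comes from prs length"
```

Both checks pass even though the sample line uses a different path and quoting than the workflow, which is the property an exact-line assertion lacks.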

6. Out-of-scope cleanup

Removed an unrelated /ruflo slash-command routing entry that was added to agentic_commands.yml by a compile step re-processing ruflo-backed-task.md. That routing change is out of scope and should be introduced in a dedicated PR.

Issue #35742 coverage

Recommendation	Status	Notes
Rec #1: Move prefilter before setup + early-exit gate	✅ Done	fetch-prs is now the first step; all setup gated on eligible_count != '0'
Rec #2: Replace make deps-dev with targeted formatter install	✅ Done	make deps-dev removed; npm ci --prefix actions/setup/js added (gated)
Rec #3: Cap PRs per run at 5	✅ Done	Prompt lowered from 10 to 5 with next-run note
Rec #4: Hard turn/tool-call limit + sub-agent error guard	✅ Done	8-call cap + insufficient_context escape hatch; sub_agent_error skip without retry

Conservative estimated token savings: ~1.2–1.9M tokens/run (matching the issue's own projection), plus the additional safeguard of preventing the ~1.6M-token runaway failure pattern.
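
The headline range appears to be the sum of the three per-change estimates quoted earlier in the PR body. A quick arithmetic check, with variable names chosen only for illustration:

```shell
#!/bin/sh
# Adds up the per-change estimates from the PR body (thousands of tokens per run).
gate_lo=400;  gate_hi=600    # early-exit gate
setup_lo=200; setup_hi=400   # lighter formatter setup
cap_lo=600;   cap_hi=900     # 5-PR cap

total_lo=$((gate_lo + setup_lo + cap_lo))
total_hi=$((gate_hi + setup_hi + cap_hi))

echo "combined estimate: ${total_lo}k-${total_hi}k tokens/run"
# prints "combined estimate: 1200k-1900k tokens/run"
```

The sum treats the three ranges as additive. The gate estimate is stated for runs with no eligible PRs and the setup estimate for runs with eligible PRs, so a single run may not realize all three at once.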

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Initial plan

8c00e66

Copilot AI assigned Copilot and mnkiefer May 31, 2026

Copilot started work on behalf of mnkiefer May 31, 2026 06:35

Copilot AI linked an issue May 31, 2026 that may be closed by this pull request

[agentic-token-optimizer] Token Optimization: PR Sous Chef — reduce 3.35M avg tokens/run via early-exit guard and lighter setup #35742

Closed

Copilot AI and others added 2 commits May 31, 2026 06:43

Plan PR Sous Chef token optimizations

db6b945

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Optimize PR Sous Chef workflow token usage

887d197

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Optimize token usage for PR Sous Chef~~ Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits May 31, 2026

Copilot finished work on behalf of mnkiefer May 31, 2026 06:49

Copilot AI requested a review from mnkiefer May 31, 2026 06:49

mnkiefer marked this pull request as ready for review May 31, 2026 06:53

Copilot AI review requested due to automatic review settings May 31, 2026 06:53

Copilot started reviewing on behalf of mnkiefer May 31, 2026 06:53

Copilot AI reviewed May 31, 2026

Comment thread .github/workflows/agentic_commands.yml

Copilot started work on behalf of mnkiefer May 31, 2026 06:58

github-actions Bot mentioned this pull request May 31, 2026

[aw] No-Op Runs #35753

Closed

github-actions Bot approved these changes May 31, 2026

github-actions Bot reviewed May 31, 2026

Comment thread .github/workflows/pr-sous-chef.md Outdated

Comment thread .github/workflows/pr-sous-chef.md Outdated

Comment thread pkg/cli/pr_sous_chef_workflow_contract_test.go Outdated

Comment thread .github/workflows/pr-sous-chef.md Outdated

github-actions Bot reviewed May 31, 2026

Comment thread .github/workflows/pr-sous-chef.md Outdated

Comment thread .github/workflows/pr-sous-chef.md Outdated

Copilot AI and others added 2 commits May 31, 2026 07:05

Remove out-of-scope /ruflo slash-command routing from agentic_command…

134d192

…s.yml Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Remove out-of-scope /ruflo slash-command routing from agentic_command…

21f671b

…s.yml Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot finished work on behalf of mnkiefer May 31, 2026 07:06

Copilot started work on behalf of mnkiefer May 31, 2026 07:07

Copilot AI and others added 3 commits May 31, 2026 07:14

Plan: address all github-actions and Copilot review comments

8f684f4

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Address all github-actions and Copilot review comments

85e4089

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Remove out-of-scope /ruflo routing from agentic_commands.yml (re-adde…

d0d6a27

…d by compile) Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot finished work on behalf of mnkiefer May 31, 2026 07:20

Copilot started work on behalf of mnkiefer May 31, 2026 07:22

Update PR body with full issue #35742 coverage summary

749b1b4

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot finished work on behalf of mnkiefer May 31, 2026 07:29

mnkiefer approved these changes May 31, 2026

mnkiefer merged commit 7d1c1b2 into main May 31, 2026

mnkiefer deleted the copilot/agentic-token-optimizer branch May 31, 2026 07:31
