Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits #36075

Merged: mnkiefer merged 9 commits into main from copilot/agentic-token-optimizer on May 31, 2026

Copilot AI commented May 31, 2026

PR Sous Chef was consuming ~3.35M tokens/run with high turn counts, including wasted setup on runs with no eligible PRs and runaway sub-agent behavior in failed runs. This change moves expensive work behind an eligibility gate and reduces per-run processing scope.

  • Workflow execution order + early exit

    • Move Fetch open non-draft PR queue ahead of heavy setup.
    • Export eligible_count from prefilter step with a || echo 0 fallback so jq failures default to 0 and safely block setup.
    • Gate setup/tooling steps behind eligible_count != 0.
  • Lighter formatter setup

    • Remove unconditional make deps-dev.
    • Replace with formatter-scoped install: npm ci --prefix actions/setup/js (only when PRs are eligible).
  • Prompt-level token/turn controls

    • Lower processing cap from 10 PRs to 5 PRs per run.
    • Add sub-agent failure guard: on non-JSON/error, record {pr_number: <N>, skip_reason: "sub_agent_error"} in the skipped array of the run-summary noop payload and continue without retry.
    • Constrain pr-processor to max 8 tool calls with a documented insufficient_context escape hatch when the budget is exhausted before a confident decision can be reached.
  • Workflow contract coverage

    • Extend PR Sous Chef contract test assertions for:
      • 5-PR cap
      • sub-agent guardrails
      • eligible_count output emission (intent-based, not exact bash-line)

Workflow excerpt:

- name: Fetch open non-draft PR queue
  id: fetch-prs
  run: |
    # ...build /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json
    echo "eligible_count=$(jq '.prs | length' /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json || echo 0)" >> "$GITHUB_OUTPUT"

- name: Setup Go
  if: steps.fetch-prs.outputs.eligible_count != '0'
  uses: actions/setup-go@v6.4.0

- name: Install formatter dependencies
  if: steps.fetch-prs.outputs.eligible_count != '0'
  run: npm ci --prefix actions/setup/js

Copilot AI and others added 2 commits May 31, 2026 06:43
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize token usage for PR Sous Chef" to "Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits" May 31, 2026
Copilot AI requested a review from mnkiefer May 31, 2026 06:49
@mnkiefer mnkiefer marked this pull request as ready for review May 31, 2026 06:53
Copilot AI review requested due to automatic review settings May 31, 2026 06:53

github-actions Bot commented May 31, 2026

PR Code Quality Reviewer completed the code quality review.


github-actions Bot commented May 31, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅


github-actions Bot commented May 31, 2026

🧪 Test Quality Sentinel completed test quality analysis.


github-actions Bot commented May 31, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #36075 does not have the 'implementation' label (has_implementation_label=false) and has only 4 new lines of code in business logic directories (default_business_additions=4, well under the 100-line threshold). Neither Condition A nor Condition B is met.

Copilot AI left a comment

Pull request overview

This PR reduces PR Sous Chef run cost by moving PR eligibility detection ahead of formatter/tool setup and tightening the agent prompt’s processing limits and failure handling.

Changes:

  • Adds eligible_count output from the PR prefilter and gates Go/Node formatter setup on eligible PRs.
  • Reduces PR Sous Chef processing from 10 to 5 PRs per run and adds sub-agent error/JSON guardrails.
  • Updates workflow contract assertions and generated workflow artifacts, including unrelated Ruflo routing changes.
Summary per file

| File | Description |
| --- | --- |
| .github/workflows/pr-sous-chef.md | Moves PR queue fetch before formatter setup and tightens PR Sous Chef prompt limits. |
| .github/workflows/pr-sous-chef.lock.yml | Compiled PR Sous Chef workflow reflecting gated setup and prompt changes. |
| pkg/cli/pr_sous_chef_workflow_contract_test.go | Adds contract assertions for the new processing cap, sub-agent guardrail, and eligible_count output. |
| .github/workflows/ruflo-backed-task.lock.yml | Regenerated Ruflo workflow lock with tool/runtime updates and command metadata. |
| .github/workflows/agentic_commands.yml | Updates centralized slash-command routing, including /ruflo. |

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 1

Comment thread .github/workflows/agentic_commands.yml
@github-actions
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 85/100 — Excellent

Analyzed 4 new assertion(s) in 1 modified test function: all 4 classified as design tests (behavioral contracts), 0 guideline violations.

📊 Metrics & Test Classification (4 assertions analyzed)
| Metric | Value |
| --- | --- |
| New/modified assertions analyzed | 4 |
| ✅ Design tests (behavioral contracts) | 4 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Assertions with error/edge cases | 1 (25%); skip_reason checks error-recovery behavior |
| Duplicate test clusters | 0 |
| Test inflation detected | No (4 lines test / 25 lines production = 0.16 ratio) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Assertion | File | Classification | Notes |
| --- | --- | --- | --- |
| "Process at most **5 PRs** per run." | pkg/cli/pr_sous_chef_workflow_contract_test.go:29 | ✅ Design | Enforces per-run PR processing cap |
| "Make at most 4 tool calls total." | pkg/cli/pr_sous_chef_workflow_contract_test.go:30 | ✅ Design | Enforces hard tool-call budget |
| "skip_reason: \"sub_agent_error\"" | pkg/cli/pr_sous_chef_workflow_contract_test.go:31 | ✅ Design | Enforces error-recovery contract |
| "echo \"eligible_count=..." | pkg/cli/pr_sous_chef_workflow_contract_test.go:32 | ✅ Design | Enforces early-exit gating output |

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 4 assertions in 1 function — unit (//go:build !integration) ✅ build tag present

Verdict

Check passed. 0% of new assertions are implementation tests (threshold: 30%). All 4 new assertions are behavioral contract checks that act as regression guards for the token-optimization constraints introduced by this PR.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1.5M

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 85/100. Test quality is excellent — 0% of new assertions are implementation tests (threshold: 30%). All 4 new assertions enforce behavioral contracts for the token-optimization constraints introduced by this PR.

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /zoom-out, /tdd, and /grill-with-docs — COMMENT with 4 suggestions, none blocking.

📋 Key Themes & Highlights

Key Themes

  • eligible_count empty-string edge case: If jq fails (missing file, malformed JSON), eligible_count is set to "" rather than "0". Since "" != '0' is true, setup steps will run on a failed fetch — the opposite of the intended optimization. Easy to fix with a || echo 0 fallback.
  • 4-tool-call sub-agent budget has no documented graceful-degradation path: A minimal non-trivial pr-processor flow (read context → check review threads → decide on branch → emit output) already saturates the limit. The spec should document what to do when 4 calls are insufficient.
  • Brittle contract test assertion: Line 32 asserts the exact verbatim bash command, tightly coupling the test to implementation rather than the observable behavior (the eligible_count output exists and reflects prs length).
  • skip_reason: "sub_agent_error" destination unspecified: The spec says to record the skip reason but doesn't say where — noop payload? skipped array? — leaving room for inconsistent implementations.

Positive Highlights

  • ✅ Excellent structural refactor — moving fetch-before-setup is the right order-of-operations fix
  • ✅ Scoped npm ci --prefix is much cleaner than the broad make deps-dev
  • ✅ Contract tests covering the new behavioral invariants are exactly the right safety net for workflow changes
  • ✅ Clear PR description with concrete before/after examples

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 2.2M

Comment thread .github/workflows/pr-sous-chef.md Outdated
Comment thread .github/workflows/pr-sous-chef.md Outdated
Comment thread pkg/cli/pr_sous_chef_workflow_contract_test.go Outdated
Comment thread .github/workflows/pr-sous-chef.md Outdated

@github-actions github-actions Bot left a comment

REQUEST_CHANGES — Two issues block merge: a silent jq failure bypass that invalidates the eligibility gate, and a 4-tool-call cap on pr-processor that will systematically exhaust the budget on real PRs and produce cascading sub_agent_error skips.

Findings summary

High — eligible_count jq-failure bypass (line 118)
If the final jq pipeline fails, command substitution returns "", echo exits 0 (step succeeds), and "" != '0' evaluates to true — setup runs, agent executes against broken input. Fix: || echo 0 inside the substitution.
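
The failure mode described in this finding can be reproduced in a few lines of shell. This is a minimal sketch, not the workflow's actual step: count_eligible and gate_open are hypothetical helpers, the file path is a deliberately missing placeholder, and gate_open only imitates the string comparison `eligible_count != '0'` used in the workflow's if: conditions.

```shell
# Hypothetical helper: count PRs in a candidates file, printing 0 if jq fails
# for any reason (missing file, malformed JSON, jq not installed).
count_eligible() {
  jq '.prs | length' "$1" 2>/dev/null || echo 0
}

# Hypothetical stand-in for the workflow condition `eligible_count != '0'`.
gate_open() {
  [ "$1" != "0" ]
}

missing=/nonexistent/pr-candidates.json  # placeholder path that does not exist

# No fallback: the failed jq prints nothing, so the value is the empty string.
# (`|| true` only keeps this demo alive under `set -e`; it prints nothing.)
unguarded="$(jq '.prs | length' "$missing" 2>/dev/null || true)"

# With the fallback: the same failure yields the string "0".
guarded="$(count_eligible "$missing")"

if gate_open "$unguarded"; then
  echo "unguarded='$unguarded': setup would run"
fi
if gate_open "$guarded"; then
  echo "guarded='$guarded': setup would run"
else
  echo "guarded='$guarded': setup skipped"
fi
```

Run as written, the unguarded value is empty and opens the gate, while the guarded value is 0 and keeps it closed, which is the behavior the `|| echo 0` fix is meant to guarantee.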

Medium — 4 tool call cap on pr-processor (line 273)
A minimal useful assessment (fetch PR, check CI, check reviews, emit JSON) consumes all 4 calls. Any PR needing one extra lookup hits the cap mid-assessment, triggers sub_agent_error, and gets skipped until the next run. Raise to 6–8 or pre-batch context retrieval.

Approach is sound: early-exit eligibility gate, gated setup, reduced PR cap, sub-agent error guard — all correct. The two fixes are straightforward.

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 2.3M

Comment thread .github/workflows/pr-sous-chef.md Outdated
Comment thread .github/workflows/pr-sous-chef.md Outdated
@mnkiefer
Collaborator

@copilot Address all review comments by github-actions and Copilot.

Copilot AI and others added 2 commits May 31, 2026 07:05
…s.yml

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
…s.yml

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
@github-actions
Contributor

@copilot review all comments and address unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · gpt54 8.1M

@github-actions
Contributor

@copilot please summarize the remaining blockers and the next step to finish review.

Generated by 👨‍🍳 PR Sous Chef · gpt54 8.1M

Copilot AI and others added 3 commits May 31, 2026 07:14
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
…d by compile)

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot AI commented May 31, 2026

@copilot Address all review comments by github-actions and Copilot.

All review comments addressed:

  • eligible_count jq failure — added || echo 0 fallback so any jq error defaults to 0 and blocks the eligibility gate; lock file regenerated to match.
  • skip_reason: "sub_agent_error" destination — updated spec to explicitly state it goes in the skipped array of the run-summary noop payload.
  • 4 tool call cap too tight — raised to 8 tool calls, with a documented escape hatch: if 8 calls are insufficient, set all fields to null and skip_reason: "insufficient_context".
  • Brittle test assertion — replaced the exact bash-line assertion with two intent-based checks (eligible_count= and .prs | length).
  • /ruflo routing — removed from agentic_commands.yml (re-removed after a compile step re-added it from the existing ruflo-backed-task.md).

Commits: 85e4089, f34e21a (latest push)

@mnkiefer
Collaborator

@copilot Update PR body of what was done and whether all concerns from #35742 were addressed adequately.

Copilot AI commented May 31, 2026

@copilot Update PR body of what was done and whether all concerns from #35742 were addressed adequately.

Updated PR body below. All four recommendations from #35742 were addressed — summary follows.


Proposed PR body:

PR Sous Chef was consuming ~3.35M tokens/run (6.71M over 2 runs, 187 avg turns/run) with wasted setup on zero-eligible-PR runs and a runaway sub-agent in the failed 220-turn run. This change shifts all expensive work behind an eligibility gate and tightens the agent prompt's processing scope.

Closes #35742.


What changed

1. Early-exit eligibility gate (issue rec #1)

  • Moved Fetch open non-draft PR queue to run before Setup Go, Setup Node.js, and formatter installs.
  • Added id: fetch-prs and emits eligible_count as a step output:
    echo "eligible_count=$(jq '.prs | length' /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json || echo 0)" >> "$GITHUB_OUTPUT"
    The || echo 0 fallback means any jq failure safely defaults to 0 and blocks the gate.
  • All subsequent setup steps are gated on steps.fetch-prs.outputs.eligible_count != '0'.
  • Estimated savings: ~400–600k tokens/run when no eligible PRs exist.

2. Lighter formatter setup (issue rec #2)

  • Removed unconditional make deps-dev (full dev toolchain, ~5–8 min).
  • Removed unconditional actions/setup-node + npm ci.
  • Replaced with a single targeted step (gated):
    - name: Install formatter dependencies
      if: steps.fetch-prs.outputs.eligible_count != '0'
      run: npm ci --prefix actions/setup/js
    go fmt is already available via actions/setup-go; no extra Go tool install needed.
  • Estimated savings: ~200–400k tokens/run + 4–6 action-minutes on eligible-PR runs.

3. 5-PR per-run processing cap (issue rec #3)

  • Lowered the main agent instruction from 10 PRs to 5 PRs per run.
  • Added explicit note: "Remaining eligible PRs will be handled in the next scheduled run."
  • Estimated savings: ~600–900k tokens/run (~30–40% fewer agent turns).

4. Sub-agent turn guard (issue rec #4)

  • Added to main agent prompt: if a pr-processor call returns non-JSON or an error, record {pr_number: <N>, skip_reason: "sub_agent_error"} in the skipped array of the run-summary noop payload and continue without retrying.
  • Added to pr-processor sub-agent prompt: max 8 tool calls total; if budget is exhausted before a confident decision, set all output fields to null and skip_reason: "insufficient_context".
  • Estimated savings: prevents ~1.6M token losses from runaway failed runs.
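
The guard in item 4 is a prompt instruction rather than code, but its bookkeeping can be illustrated in shell. The following is a sketch under stated assumptions: record_result, the PR numbers 100 to 102, the "action" field, and the sample sub-agent outputs are all hypothetical, and jq is used here only to decide whether an output parses as a JSON object.

```shell
# Hypothetical accumulator for the `skipped` array of the run-summary payload.
skipped=""

# Hypothetical helper: accept a sub-agent output only if it is a JSON object;
# otherwise record a sub_agent_error skip and move on without retrying.
record_result() {
  pr_number="$1"
  output="$2"
  if printf '%s' "$output" | jq -e 'type == "object"' >/dev/null 2>&1; then
    echo "PR #$pr_number: processed"
  else
    entry="{\"pr_number\": $pr_number, \"skip_reason\": \"sub_agent_error\"}"
    # Append with a comma separator only when earlier entries exist.
    skipped="${skipped:+$skipped, }$entry"
    echo "PR #$pr_number: skipped without retry"
  fi
}

# Illustrative inputs only; real outputs come from the pr-processor sub-agent.
record_result 100 '{"pr_number": 100, "action": "comment"}'
record_result 101 'Error: tool budget exceeded'
record_result 102 'not json at all'

echo "{\"skipped\": [$skipped]}"
```

Each PR is handled exactly once per run here: a bad output costs one skip entry instead of a retry loop, which is the property the prompt change relies on to avoid runaway turns.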

5. Contract test coverage

  • Extended TestPRSousChefWorkflowAddCommentTargetContract with five intent-based assertions:
    • 5-PR per-run cap is in the prompt
    • 8-tool-call sub-agent budget is in the prompt
    • sub_agent_error skip guard is in the prompt
    • eligible_count= is emitted by the fetch-prs step
    • .prs | length is used to compute the count
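
The difference between an exact-line assertion and the two intent-based checks listed above can be shown with a shell analogue. The real assertions live in the Go test pkg/cli/pr_sous_chef_workflow_contract_test.go; this sketch only mimics the idea, and contract_ok and the temporary file standing in for the workflow are hypothetical.

```shell
# Hypothetical stand-in for the workflow file; the real test reads the
# repository's PR Sous Chef workflow instead.
workflow="$(mktemp)"
cat > "$workflow" <<'EOF'
echo "eligible_count=$(jq '.prs | length' /tmp/gh-aw/agent/pr-sous-chef-candidates-compact.json || echo 0)" >> "$GITHUB_OUTPUT"
EOF

# Intent-based contract: check the two fragments that carry the behavior
# (the output name and how the count is computed) as fixed strings, so that
# changes to quoting, paths, or fallbacks do not break the check.
contract_ok() {
  grep -Fq 'eligible_count=' "$1" && grep -Fq '.prs | length' "$1"
}

if contract_ok "$workflow"; then
  echo "contract holds"
else
  echo "contract broken"
fi
```

Because neither fragment mentions the `|| echo 0` fallback or the file path, the same check would have passed both before and after the jq-failure fix, which is what makes it less brittle than matching the full bash line.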

6. Out-of-scope cleanup

  • Removed an unrelated /ruflo slash-command routing entry that was added to agentic_commands.yml by a compile step re-processing ruflo-backed-task.md. That routing change is out of scope and should be introduced in a dedicated PR.

Issue #35742 coverage

| Recommendation | Status | Notes |
| --- | --- | --- |
| Rec #1: Move prefilter before setup + early-exit gate | ✅ Done | fetch-prs is now the first step; all setup gated on eligible_count != '0' |
| Rec #2: Replace make deps-dev with targeted formatter install | ✅ Done | make deps-dev removed; npm ci --prefix actions/setup/js added (gated) |
| Rec #3: Cap PRs per run at 5 | ✅ Done | Prompt lowered from 10 to 5 with next-run note |
| Rec #4: Hard turn/tool-call limit + sub-agent error guard | ✅ Done | 8-call cap + insufficient_context escape hatch; sub_agent_error skip without retry |

Conservative estimated token savings: ~1.2–1.9M tokens/run (matching the issue's own projection), plus the additional safeguard of preventing the ~1.6M-token runaway failure pattern.

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
@mnkiefer mnkiefer merged commit 7d1c1b2 into main May 31, 2026
@mnkiefer mnkiefer deleted the copilot/agentic-token-optimizer branch May 31, 2026 07:31

Development

Successfully merging this pull request may close these issues.

[agentic-token-optimizer] Token Optimization: PR Sous Chef — reduce 3.35M avg tokens/run via early-exit guard and lighter setup

3 participants