You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Restructure agent workflows to move all deterministic data fetching (git diff, CI status, label checks, file listing, PR metadata) into shell-based preflight steps before invoking the LLM. Feed pre-gathered context as structured input rather than making the agent discover it via tool calls, eliminating tool-invocation token overhead and reasoning-loop waste.
Market Signal
GitHub's own engineering team achieved 62% Effective Token (ET) reduction in their agentic workflows by moving pre-agentic data gathering into workflow setup steps (documented in their May 2026 blog post "Improving token efficiency in GitHub Agentic Workflows"). Their Auto-Triage workflow showed sustained 62% ET savings across 109 post-fix runs. Security Guard achieved 43%, Smoke Claude 59%. The pattern is now considered a production best practice for LLM-based CI/CD. Key insight: every MCP tool schema adds 8-12KB of context per call, and unused tools waste context budget without providing value.
User Signal
The Token Cost Observatory discussion (#332) already identified pre-agentic optimization as a goal. Weekly token reports (issue #464) show ongoing cost visibility and awareness. The project already has dev-lead-preflight.sh proving the pattern works at small scale, but it has not been systematically applied to the PR review pipeline — the highest-volume agent workflow.
Technical Opportunity
The review pipeline currently has the LLM fetch diffs, CI status, and PR metadata via tool calls inside the reasoning loop. Each tool call adds:
Schema overhead (8-12KB per MCP tool definition, sent with every API request)
Reasoning tokens for the agent to decide which tool to use
Round-trip latency for tool execution
Moving these to shell steps in review-one-pr.sh and passing as structured context would eliminate this overhead. The existing dev-lead-preflight.sh pattern provides a proven template. GitHub's approach:
Run gh CLI commands in a setup step to gather all deterministic data
Format as structured markdown/JSON context
Pass to the LLM as pre-populated context, not as discoverable tools
Remove unused MCP tool definitions from agent configurations
Assessment
Dimension
Score
Rationale
Feasibility
high
dev-lead-preflight.sh already proves the pattern; review pipeline refactoring is bounded engineering work
Impact
high
62% ET savings demonstrated by GitHub on comparable workloads; directly reduces the largest cost driver
Urgency
high
Every day without this optimization burns 2-3x more tokens than necessary on deterministic reads
Adversarial Review
Strongest objection: Pre-gathered data could become stale if a force push occurs between the preflight step and the agent's analysis. Refactoring existing review scripts risks introducing bugs in a production pipeline.
Rebuttal: Staleness is mitigable — gather data at the start of each agent turn, not minutes before. The triage tier (cheap, fast) can validate freshness by checking the HEAD SHA matches what was pre-fetched before the expensive deep review runs. Engineering risk is bounded because dev-lead-preflight.sh already proves the pattern, and the review pipeline's multi-tier architecture means changes can be validated at the triage tier before rolling to deeper tiers. Start with the highest-volume, most deterministic reads (PR metadata, labels, file list) and expand incrementally.
Suggested Next Step
Audit review-one-pr.sh and review-batch.sh for all LLM-initiated data fetches. Categorize each as deterministic (can preflight) vs. dynamic (must stay in-loop). Implement preflight gathering for deterministic reads and measure ET savings via the Token Cost Observatory.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Summary
Restructure agent workflows to move all deterministic data fetching (git diff, CI status, label checks, file listing, PR metadata) into shell-based preflight steps before invoking the LLM. Feed pre-gathered context as structured input rather than making the agent discover it via tool calls, eliminating tool-invocation token overhead and reasoning-loop waste.
Market Signal
GitHub's own engineering team achieved 62% Effective Token (ET) reduction in their agentic workflows by moving pre-agentic data gathering into workflow setup steps (documented in their May 2026 blog post "Improving token efficiency in GitHub Agentic Workflows"). Their Auto-Triage workflow showed sustained 62% ET savings across 109 post-fix runs. Security Guard achieved 43%, Smoke Claude 59%. The pattern is now considered a production best practice for LLM-based CI/CD. Key insight: every MCP tool schema adds 8-12KB of context per call, and unused tools waste context budget without providing value.
User Signal
The Token Cost Observatory discussion (#332) already identified pre-agentic optimization as a goal. Weekly token reports (issue #464) show ongoing cost visibility and awareness. The project already has
dev-lead-preflight.shproving the pattern works at small scale, but it has not been systematically applied to the PR review pipeline — the highest-volume agent workflow.Technical Opportunity
The review pipeline currently has the LLM fetch diffs, CI status, and PR metadata via tool calls inside the reasoning loop. Each tool call adds:
Moving these to shell steps in
review-one-pr.shand passing as structured context would eliminate this overhead. The existingdev-lead-preflight.shpattern provides a proven template. GitHub's approach:ghCLI commands in a setup step to gather all deterministic dataAssessment
Adversarial Review
Strongest objection: Pre-gathered data could become stale if a force push occurs between the preflight step and the agent's analysis. Refactoring existing review scripts risks introducing bugs in a production pipeline.
Rebuttal: Staleness is mitigable — gather data at the start of each agent turn, not minutes before. The triage tier (cheap, fast) can validate freshness by checking the HEAD SHA matches what was pre-fetched before the expensive deep review runs. Engineering risk is bounded because
dev-lead-preflight.shalready proves the pattern, and the review pipeline's multi-tier architecture means changes can be validated at the triage tier before rolling to deeper tiers. Start with the highest-volume, most deterministic reads (PR metadata, labels, file list) and expand incrementally.Suggested Next Step
Audit
review-one-pr.shandreview-batch.shfor all LLM-initiated data fetches. Categorize each as deterministic (can preflight) vs. dynamic (must stay in-loop). Implement preflight gathering for deterministic reads and measure ET savings via the Token Cost Observatory.Beta Was this translation helpful? Give feedback.
All reactions