💡 Batch API Cost Arbitrage for Scheduled Agent Workloads #639
Replies: 1 comment
Let's also exclude health checks, fleet monitoring, and dependency audits from batch. Feature ideation is batchable, and it can and should use batch with the highest-tier model at the highest effort setting without incurring higher cost.
Summary
Integrate Anthropic Message Batches API and Gemini Batch API into scheduled, non-real-time agent workflows (daily-pr-review-health, actions-fleet-monitor, dependency-audit, feature-ideation) to capture the universal 50% batch discount. These workflows already run on cron schedules and produce async reports — they don't need real-time responses.
Market Signal
Both Anthropic and Google offer 50% discounts on batch API endpoints (24-hour SLA). Combined with prompt caching, batch processing achieves up to 95% cost reduction on cached inputs. The Anthropic June 15, 2026 billing split makes this more urgent: Agent SDK credits (claude -p, GitHub Actions invocations) are now a finite monthly pool ($20 Pro / $100 Max 5x / $200 Max 20x), and batch processing stretches that budget 2x further. Gemini 2.5 Pro drops to $0.625/$5 per MTok on batch. Industry research shows well-implemented gateways achieve 40% lower operating costs via routing optimization.
User Signal
Discussion #631 explicitly requests evaluating "alternative consumption approach for some workloads: Batch Endpoints." The Token Cost Observatory (#332) and weekly token reports show active cost monitoring. Multiple scheduled workflows run on overnight cron schedules where 24-hour batch latency is acceptable: daily-pr-review-health.yml, actions-fleet-monitor.yml, dependency-audit.yml, feature-ideation.yml. Discussion #635 (Agent SDK Credit Budget Circuit Breaker) highlights the finite credit pool risk; batch processing is a demand-side complement to that supply-side guard.
Technical Opportunity
engine.sh's run_writer/run_agentic functions currently make synchronous API calls. A batch variant would: (1) submit prompts via Anthropic's /v1/messages/batches endpoint, (2) poll for completion within the workflow's timeout, (3) extract results. The model-pricing.tsv already has per-model rates; adding a BATCH_DISCOUNT=0.5 multiplier keeps cost reporting accurate. Workflow YAML files declare their schedule via cron triggers; this metadata can drive automatic batch routing. The token-metrics.sh library already supports per-workflow token logging, enabling precise savings measurement.
Assessment
Adversarial Review
Strongest objection: Batch API has a 24-hour SLA, making it unsuitable for PR review which needs real-time response. Adding batch polling complexity to workflow scripts increases maintenance burden and introduces new failure modes (batch timeouts, partial results).
Rebuttal: Scope is explicitly limited to scheduled/async workloads that already tolerate multi-hour latency. PR review stays real-time; only health checks, fleet monitoring, dependency audits, and feature ideation qualify. The polling complexity is bounded: Anthropic's batch API exposes a simple status endpoint, and the 24-hour SLA is generous for overnight cron jobs. Batch failures fall back to real-time API calls, preserving the existing behavior as a safety net.
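The bounded-polling-plus-fallback behavior described in the rebuttal could look roughly like the sketch below. It is written in Python purely to illustrate the control flow; the real helper would presumably live alongside engine.sh. The function name run_with_batch_fallback and all five callables (submit, get_status, fetch_results, run_sync, cancel) are hypothetical stand-ins for whatever actually wraps the HTTP calls. The only API detail assumed is that a finished batch reports the processing status "ended", which is how Anthropic's Message Batches API marks completion.

```python
import sys
import time
from typing import Callable, List, Optional, Tuple


def run_with_batch_fallback(
    submit: Callable[[], str],                  # hypothetical: creates a batch, returns its id
    get_status: Callable[[str], str],           # hypothetical: returns the batch processing status
    fetch_results: Callable[[str], List],       # hypothetical: downloads and parses batch results
    run_sync: Callable[[], List],               # hypothetical: the existing synchronous path
    cancel: Callable[[str], None],              # hypothetical: cancels a batch by id
    timeout_s: float,
    poll_interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, List]:
    """Try the batch path; fall back to the synchronous path on error or timeout.

    Returns ("batch", results) or ("sync", results) so callers can log which
    path was billed.
    """
    batch_id: Optional[str] = None
    try:
        batch_id = submit()
        deadline = clock() + timeout_s
        # Poll until the batch ends or the workflow's time budget runs out.
        while clock() < deadline:
            if get_status(batch_id) == "ended":
                return "batch", fetch_results(batch_id)
            sleep(poll_interval_s)
    except Exception as exc:  # broad on purpose: any batch failure triggers fallback
        print(f"batch path failed, falling back to sync: {exc}", file=sys.stderr)

    # Best-effort cancel so an abandoned batch is not billed on top of the sync run.
    if batch_id is not None:
        try:
            cancel(batch_id)
        except Exception as exc:
            print(f"could not cancel batch {batch_id}: {exc}", file=sys.stderr)
    return "sync", run_sync()
```

Injecting sleep and clock keeps the loop testable without real waiting. The cancel step is a design choice worth debating: without it, a batch that completes after the fallback has already run would be paid for twice.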
Suggested Next Step
Audit all cron-scheduled workflows to identify candidates for batch processing. Prototype a batch_submit/batch_poll helper in scripts/lib/ that wraps the Anthropic Message Batches API, with fallback to synchronous API on batch failure. Estimate monthly savings by cross-referencing Token Cost Observatory data with batch-eligible workflow runs.
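The savings estimate can be prototyped before any API integration exists. The sketch below applies the proposed BATCH_DISCOUNT=0.5 multiplier to per-run token counts. Everything else is an assumption made for illustration: the WorkflowRun record, the rates mapping (standing in for model-pricing.tsv), the model name, the per-MTok rates, and the token counts are placeholders rather than real Token Cost Observatory data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

BATCH_DISCOUNT = 0.5  # multiplier from the proposal: batch is billed at 50%


@dataclass(frozen=True)
class WorkflowRun:
    """One logged run (hypothetical shape, not the token-metrics.sh format)."""
    workflow: str
    model: str
    input_tokens: int
    output_tokens: int


# model -> (input USD per MTok, output USD per MTok)
Rates = Dict[str, Tuple[float, float]]


def run_cost(run: WorkflowRun, rates: Rates, batch: bool) -> float:
    """Cost of a single run in USD, optionally at the batch rate."""
    input_rate, output_rate = rates[run.model]
    cost = (run.input_tokens * input_rate + run.output_tokens * output_rate) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


def estimate_savings(
    runs: Iterable[WorkflowRun], rates: Rates, batch_eligible: Set[str]
) -> Dict[str, float]:
    """Per-workflow USD saved if every eligible run had gone through batch."""
    savings: Dict[str, float] = {}
    for run in runs:
        if run.workflow not in batch_eligible:
            continue  # real-time workflows keep their current cost
        delta = run_cost(run, rates, batch=False) - run_cost(run, rates, batch=True)
        savings[run.workflow] = savings.get(run.workflow, 0.0) + delta
    return savings


# Placeholder inputs: the rates and token counts are invented for the example.
EXAMPLE_RATES: Rates = {"example-model": (3.0, 15.0)}
EXAMPLE_RUNS = [
    WorkflowRun("feature-ideation", "example-model", 2_000_000, 400_000),
    WorkflowRun("dependency-audit", "example-model", 1_000_000, 100_000),
]

if __name__ == "__main__":
    print(estimate_savings(EXAMPLE_RUNS, EXAMPLE_RATES, {"feature-ideation"}))
```

Passing the eligible set as a parameter keeps the scoping decision outside the arithmetic. Here it contains only feature-ideation, matching the narrower scope suggested in the reply. A flat multiplier ignores prompt-cache pricing, so figures from this sketch would only be a first approximation.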