Safe, budget-aware multi-model agent experiments via OpenRouter.
When you're running a field of LLMs on an autonomous task — each in its own Docker container, each with its own API key and spend cap — you need: preflight validation, per-model budget enforcement, eject policies, event logging, and a status dashboard. That's what this library provides.
```
============================================================
PREFLIGHT CHECKS
============================================================
[v] Account balance                   $12.34 available
[v] Provisioning budget               2 models x $0.25 = $0.50 (OK)
[v] Model anthropic/claude-sonnet-4   $3.00/$15.00 per M tokens
[v] Key provisioning (1¢ smoke)       key=sk-or-v1-abc... reply='OK'
[!] Prompt caching                    cache created but no read hit
[x] Docker image                      'my-agent' not found — build it first

PREFLIGHT FAILED (1 issue(s)):
  x Docker image 'my-agent' not found
```
```shell
pip install ai-preflight
```

Requires Python 3.11+. Docker is used via the CLI (no Docker SDK dependency). The package ships with `py.typed` for mypy/pyright users.
```python
from agent_preflight import preflight, provision_key, BudgetTracker

# 1. Validate before spending anything
preflight(
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    models=["meta-llama/llama-3.3-70b-instruct"],
    budget_per_model=0.25,
)

# 2. Provision a capped sub-key
key = provision_key(admin_key, label="my-run", limit_usd=0.25)

# Optional: prefer {} on model-list fetch errors
from agent_preflight import get_available_models_or_empty
models = get_available_models_or_empty("sk-or-...")

# 3. Track spend inside your agent loop
tracker = BudgetTracker(api_key=key, limit_usd=0.25, mode="detailed")
status = tracker.poll()
print(status)  # Budget: $0.25 remaining of $0.25 (100% left)
```

Prompt caching is built into the runner, cache report, and PromptBuilder.
- Anthropic defaults to explicit per-block `cache_control` because that is the most reliable OpenRouter behavior we have observed.
- OpenAI-style providers benefit from a stable, append-only prompt prefix.
- The detailed guide, behavior notes, and practical best practices live in CACHE.md.
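As a wire-level sketch of the first point (this shows the raw request shape following the Anthropic/OpenRouter prompt-caching convention, not the PromptBuilder API; `STABLE_INSTRUCTIONS` is a placeholder):

```python
# A large, byte-identical system block marked cacheable via per-block
# cache_control; subsequent calls that reuse this exact prefix can hit
# the cache instead of re-billing the full prompt.
STABLE_INSTRUCTIONS = "You are an autonomous agent. <large, stable task spec here>"

request = {
    "model": "anthropic/claude-sonnet-4",
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": STABLE_INSTRUCTIONS,  # keep byte-identical across calls
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        # Append new turns after the cached prefix; never edit earlier blocks.
        {"role": "user", "content": "Run the next step."},
    ],
}
```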
```python
from agent_preflight import run_multi, BudgetFractionEject, IdleTimeoutEject, CompositeEject

results = run_multi(
    models=[
        ("meta-llama/llama-3.3-70b-instruct", "llama"),
        ("google/gemini-flash-1.5", "gemini"),
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
    mounts=[
        ("/path/to/TASK.md", "/workspace/TASK.md", "ro"),
        "/path/to/cache:/workspace/cache:delegated",
    ],
    eject_policy=CompositeEject([
        BudgetFractionEject(
            check_after_pct=0.25,
            progress_fn=lambda ctx: ctx["tool_calls"] > 0,
        ),
        IdleTimeoutEject(timeout_s=180),
    ]),
    budget_mode="detailed",
)
```

Each container:
- Gets a provisioned sub-key with a hard spend cap
- Receives `budget.txt` in its workspace every 30s (so the agent can see its balance)
- Is killed if an eject policy triggers
```shell
# Watch live run progress
watch -n 10 python -m agent_preflight status

# Or a specific run
python -m agent_preflight status run-20260312-120000
```

Output:
```
Run: run-20260312-120000
============================================================
alias    status    events  tools  snaps  cost    time    note
──────────────────────────────────────────────────────────────────
llama    DONE      87      42     5      $0.18   4m22s
gemini   running   34      18     2      $0.07   1m45s*  $0.18 remaining...
```
`preflight()` runs up to 7 checks before your experiment starts:

| Check | What it validates |
|---|---|
| `balance` | Account balance > $0.10 |
| `math` | Balance covers n_models × budget_per_model |
| `models` | Each model exists on OpenRouter (with pricing) |
| `tool_support` | Models in `required_params` / `common_required_params` support those request params |
| `key_smoke` | Provisions a $0.01 test key and makes a real API call |
| `cache_smoke` | Sends two Anthropic cacheable requests to verify prompt caching |
| `docker_image` | Docker image exists locally (if `docker_image=` set) |
```python
preflight(
    admin_key=..., or_key=...,
    models=[...], budget_per_model=0.25,
    checks=["balance", "math", "models", "tool_support"],
    common_required_params=["tools"],
    required_params={"anthropic/claude-sonnet-4": ["tools"]},
    custom_checks=[my_health_check],  # zero-arg or context-aware
)
```

```python
from agent_preflight import run_single

result = run_single(
    model_id="z-ai/glm-5",
    extra_models=[
        "google/gemini-2.5-flash",
        "anthropic/claude-sonnet-4",
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
)
```

Context-aware custom checks receive a dict with already-fetched data:
```python
def check_tool_support(ctx):
    model = "anthropic/claude-sonnet-4"
    supported = set((ctx["available_models"] or {}).get(model, {}).get("supported_parameters", []))
    missing = {"tools"} - supported
    return (
        "Custom tool support",
        not missing,
        "ok" if not missing else f"missing: {', '.join(sorted(missing))}",
    )
```

Built-in policies:
| Class | Triggers when |
|---|---|
| `BudgetExceededEject` | Measured spend ≥ limit |
| `IdleTimeoutEject(300)` | No events for N seconds |
| `BudgetFractionEject(0.20, progress_fn)` | `progress_fn` returns False at N% budget spent |
| `CompositeEject([...])` | Any sub-policy triggers |
| `default_eject_policies()` | `BudgetExceededEject` + `IdleTimeoutEject(300s)` |
Custom policy:

```python
from agent_preflight import EjectPolicy, EjectDecision

class MyEject(EjectPolicy):
    def check(self, *, budget_status, events, context, elapsed_s, **_):
        if context.get("output_written"):
            return EjectDecision(should_eject=False)
        if budget_status.pct_spent > 0.5:
            return EjectDecision(should_eject=True, reason="no output after 50% budget")
        return EjectDecision(should_eject=False)
```

Agents that read `/workspace/output/budget.txt` get real-time cost visibility:
Simple mode:

```
Budget: $0.086 remaining of $0.15 (57% left)
```

Detailed mode:

```
Budget: $0.086 remaining of $0.15 (57% left)
Recent costs:
  - $0.0012 (bash)
  - $0.0031 (read_file)
  - $0.0008 (bash)
Total actions: 14 | Avg cost/action: $0.0010
Estimated actions remaining: ~86
```
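An agent can parse those lines to throttle itself. A minimal sketch (the `parse_budget` helper is illustrative, not part of the library — the harness only writes the file):

```python
import re

def parse_budget(text: str) -> dict:
    """Parse the first line of budget.txt into numbers the agent can act on."""
    m = re.match(r"Budget: \$([\d.]+) remaining of \$([\d.]+)", text)
    if not m:
        raise ValueError(f"unrecognized budget line: {text!r}")
    remaining, total = float(m.group(1)), float(m.group(2))
    return {"remaining": remaining, "total": total, "pct_left": remaining / total}

status = parse_budget("Budget: $0.086 remaining of $0.15 (57% left)")
if status["pct_left"] < 0.10:
    ...  # e.g. stop exploring and write final output now
```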
Your Docker image needs to:

- Accept `OPENROUTER_API_KEY` as an env var
- Optionally read `/workspace/output/budget.txt` for cost awareness
- Write output files to `/workspace/output/`
- (For pi-style RPC mode) Accept a JSON prompt on stdin
Beyond that, the harness is framework-agnostic. Works with pi-coding-agent, Claude Code, custom agents, or any container that calls OpenRouter.
```shell
OR_ADMIN_KEY=sk-or-admin-...
OPENROUTER_API_KEY=sk-or-...
PREFLIGHT_MODELS=meta-llama/llama-3.3-70b-instruct,google/gemini-flash-1.5
PREFLIGHT_BUDGET=0.25

python -m agent_preflight preflight
```

This pattern emerged from two independent experiments:
- **Loanville2**: an LLM lending benchmark where models autonomously underwrite loan applications through a full Loan Origination System REST API. Each model gets a provisioned sub-key and budget cap; a preflight check validates balance, model availability, and provisioning before the run starts.
- **pi-docker**: a game strategy experiment where models compete to improve a Kurve AI in Docker containers. Budget awareness was injected as a file every 30 seconds; eject policies killed models that spent 25%+ of their budget without running a single test.
Both converged on the same infrastructure. This library extracts it.
MIT