agent-preflight

Safe, budget-aware multi-model agent experiments via OpenRouter.

When you're running a field of LLMs on an autonomous task — each in its own Docker container, each with its own API key and spend cap — you need: preflight validation, per-model budget enforcement, eject policies, event logging, and a status dashboard. That's what this library provides.

    ============================================================
    PREFLIGHT CHECKS
    ============================================================
    [v] Account balance          $12.34 available
    [v] Provisioning budget      2 models x $0.25 = $0.50 (OK)
    [v] Model anthropic/claude-sonnet-4   $3.00/$15.00 per M tokens
    [v] Key provisioning (1¢ smoke)       key=sk-or-v1-abc... reply='OK'
    [!] Prompt caching           cache created but no read hit
    [x] Docker image 'my-agent'  not found — build it first

    PREFLIGHT FAILED (1 issue(s)):
      x Docker image 'my-agent' not found

Install

pip install agent-preflight

Requires Python 3.11+. Docker is used via the CLI (no Docker SDK dependency). The package ships with py.typed for mypy/pyright users.

Quickstart

from agent_preflight import preflight, provision_key, BudgetTracker

# 1. Validate before spending anything
preflight(
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    models=["meta-llama/llama-3.3-70b-instruct"],
    budget_per_model=0.25,
)

# 2. Provision a capped sub-key
key = provision_key(admin_key="sk-or-admin-...", label="my-run", limit_usd=0.25)

# Optional: prefer {} on model-list fetch errors
from agent_preflight import get_available_models_or_empty
models = get_available_models_or_empty("sk-or-...")

# 3. Track spend inside your agent loop
tracker = BudgetTracker(api_key=key, limit_usd=0.25, mode="detailed")
status = tracker.poll()
print(status)  # Budget: $0.25 remaining of $0.25 (100% left)

Prompt caching

Prompt caching is built into the runner, cache report, and PromptBuilder.

  • Anthropic defaults to explicit per-block cache_control because that is the most reliable OpenRouter behavior we have observed.
  • OpenAI-style providers benefit from a stable, append-only prompt prefix.
  • The detailed guide, behavior notes, and practical best practices live in CACHE.md.
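The explicit per-block cache breakpoint looks roughly like this as a raw request payload. This is a sketch of the Anthropic prompt-caching format as sent through OpenRouter's chat endpoint, not necessarily what PromptBuilder emits; the prompt text and model choice are placeholders:

```python
# Sketch: an OpenRouter chat request with an explicit Anthropic-style
# cache breakpoint on a large, stable system prompt (assumed payload shape).
system_prompt = "You are an autonomous agent. " * 200  # stable prefix worth caching

messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": system_prompt,
                # Everything up to and including this block becomes cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    # Per-turn content goes after the breakpoint so the prefix stays stable.
    {"role": "user", "content": "Run the task in /workspace/TASK.md"},
]

payload = {"model": "anthropic/claude-sonnet-4", "messages": messages}
```

Keeping the breakpoint at the end of the stable prefix, with all per-turn content appended after it, is what lets both Anthropic and OpenAI-style providers reuse the cached prefix.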

Full Docker harness

from agent_preflight import run_multi, BudgetFractionEject, IdleTimeoutEject, CompositeEject

results = run_multi(
    models=[
        ("meta-llama/llama-3.3-70b-instruct", "llama"),
        ("google/gemini-flash-1.5", "gemini"),
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
    mounts=[
        ("/path/to/TASK.md", "/workspace/TASK.md", "ro"),
        "/path/to/cache:/workspace/cache:delegated",
    ],
    eject_policy=CompositeEject([
        BudgetFractionEject(check_after_pct=0.25,
                            progress_fn=lambda ctx: ctx["tool_calls"] > 0),
        IdleTimeoutEject(timeout_s=180),
    ]),
    budget_mode="detailed",
)

Each container:

  • Gets a provisioned sub-key with a hard spend cap
  • Receives budget.txt in its workspace every 30s (so the agent can see its balance)
  • Is killed if an eject policy triggers

Status dashboard

# Watch live run progress
watch -n 10 python -m agent_preflight status

# Or a specific run
python -m agent_preflight status run-20260312-120000

Output:

  Run: run-20260312-120000
  ============================================================
  alias           status      events  tools  snaps  cost  time  note
  ──────────────────────────────────────────────────────────────────
  llama           DONE            87     42      5  $0.18  4m22s
  gemini          running         34     18      2  $0.07  1m45s*  $0.18 remaining...

Preflight checks

preflight() runs up to 7 checks before your experiment starts:

  Check         What it validates
  ────────────  ─────────────────────────────────────────────────────────────
  balance       Account balance > $0.10
  math          Balance covers n_models × budget_per_model
  models        Each model exists on OpenRouter (with pricing)
  tool_support  Models listed in required_params / common_required_params support those request parameters
  key_smoke     Provisions a $0.01 test key and makes a real API call
  cache_smoke   Sends two Anthropic cacheable requests to verify prompt caching
  docker_image  Docker image exists locally (if docker_image= is set)

Select a subset of checks, declare required request parameters, and add your own checks like this:
preflight(
    admin_key=..., or_key=...,
    models=[...], budget_per_model=0.25,
    checks=["balance", "math", "models", "tool_support"],
    common_required_params=["tools"],
    required_params={"anthropic/claude-sonnet-4": ["tools"]},
    custom_checks=[my_health_check],  # zero-arg or context-aware
)
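A zero-arg custom check can be as simple as the sketch below. It returns the same (name, passed, detail) 3-tuple shape as the context-aware variant shown further down; the specific check (looking for the docker CLI) is just an illustration:

```python
import shutil

def my_health_check():
    # Zero-arg custom check: returns (name, passed, detail),
    # mirroring the 3-tuple contract of context-aware checks.
    ok = shutil.which("docker") is not None
    return ("docker CLI on PATH", ok, "ok" if ok else "docker not found")
```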
The single-container variant, run_single, takes the same harness options:

from agent_preflight import run_single

result = run_single(
    model_id="z-ai/glm-5",
    extra_models=[
        "google/gemini-2.5-flash",
        "anthropic/claude-sonnet-4",
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
)

Context-aware custom checks receive a dict with already-fetched data:

def check_tool_support(ctx):
    model = "anthropic/claude-sonnet-4"
    supported = set((ctx["available_models"] or {}).get(model, {}).get("supported_parameters", []))
    missing = {"tools"} - supported
    return (
        "Custom tool support",
        not missing,
        "ok" if not missing else f"missing: {', '.join(sorted(missing))}",
    )

Eject policies

Built-in policies:

  Class                                   Triggers when
  ──────────────────────────────────────  ──────────────────────────────────────────────
  BudgetExceededEject                     Measured spend ≥ limit
  IdleTimeoutEject(300)                   No events for N seconds
  BudgetFractionEject(0.20, progress_fn)  progress_fn returns False once N% of budget is spent
  CompositeEject([...])                   Any sub-policy triggers
  default_eject_policies()                BudgetExceededEject + IdleTimeoutEject(300)

Custom policy:

from agent_preflight import EjectPolicy, EjectDecision

class MyEject(EjectPolicy):
    def check(self, *, budget_status, events, context, elapsed_s, **_):
        if context.get("output_written"):
            return EjectDecision(should_eject=False)
        if budget_status.pct_spent > 0.5:
            return EjectDecision(should_eject=True, reason="no output after 50% budget")
        return EjectDecision(should_eject=False)

Budget injection

Agents that read /workspace/output/budget.txt get real-time cost visibility:

Simple mode:

Budget: $0.086 remaining of $0.15 (57% left)

Detailed mode:

Budget: $0.086 remaining of $0.15 (57% left)

Recent costs:
  - $0.0012  (bash)
  - $0.0031  (read_file)
  - $0.0008  (bash)

Total actions: 14  |  Avg cost/action: $0.0010
Estimated actions remaining: ~86
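An agent can recover the numbers from the first line of budget.txt with a small parser. This is a hypothetical helper, not part of the library; it assumes the first line follows the format shown above:

```python
import re

def parse_budget(text: str):
    """Parse 'Budget: $X remaining of $Y (N% left)' into (remaining, limit, pct_left).

    Returns None if the text does not match the expected format.
    """
    m = re.match(
        r"Budget: \$([\d.]+) remaining of \$([\d.]+) \((\d+)% left\)", text
    )
    if not m:
        return None
    return float(m.group(1)), float(m.group(2)), int(m.group(3))
```

An agent might call this each loop iteration and switch to a cheaper strategy, or wrap up early, once pct_left drops below a threshold.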

Agent contract

Your Docker image needs to:

  1. Accept OPENROUTER_API_KEY as an env var
  2. Optionally read /workspace/output/budget.txt for cost awareness
  3. Write output files to /workspace/output/
  4. (For pi-style RPC mode) Accept a JSON prompt on stdin

Beyond that, the harness is framework-agnostic: it works with pi-coding-agent, Claude Code, custom agents, or any container that calls OpenRouter.
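A minimal entrypoint satisfying this contract might look like the following sketch. The file names and JSON shape beyond the contract above (result.json and its fields) are illustrative assumptions:

```python
import json
import os
import pathlib
import sys

def run_agent(workspace: str, prompt: dict, api_key: str) -> pathlib.Path:
    """Hypothetical minimal agent: checks the injected budget file,
    then writes a single output file under <workspace>/output/."""
    out_dir = pathlib.Path(workspace) / "output"
    out_dir.mkdir(parents=True, exist_ok=True)

    # 2. Optional cost awareness from the harness-injected budget file.
    budget_file = out_dir / "budget.txt"
    if budget_file.exists():
        print(budget_file.read_text().splitlines()[0])

    # 3. Write output where the harness expects it.
    result = out_dir / "result.json"
    result.write_text(json.dumps({"task": prompt, "had_key": bool(api_key)}))
    return result

if __name__ == "__main__":
    # 1 & 4. Key from the environment; RPC mode takes a JSON prompt on stdin.
    run_agent("/workspace", json.load(sys.stdin), os.environ["OPENROUTER_API_KEY"])
```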

Environment variables for the CLI

OR_ADMIN_KEY=sk-or-admin-...
OPENROUTER_API_KEY=sk-or-...
PREFLIGHT_MODELS=meta-llama/llama-3.3-70b-instruct,google/gemini-flash-1.5
PREFLIGHT_BUDGET=0.25

python -m agent_preflight preflight

Background

This pattern emerged from two independent experiments:

  1. Loanville2 — an LLM lending benchmark where models autonomously underwrite loan applications through a full Loan Origination System REST API. Each model gets a provisioned sub-key and budget cap; a preflight check validates balance, model availability, and provisioning before the run starts.

  2. pi-docker — a game strategy experiment where models compete to improve a Kurve AI in Docker containers. Budget awareness was injected as a file every 30 seconds; eject policies killed models that spent 25%+ of their budget without running a single test.

Both converged on the same infrastructure. This library extracts it.

License

MIT
