agent-preflight

Safe, budget-aware multi-model agent experiments via OpenRouter.

When you're running a field of LLMs on an autonomous task — each in its own Docker container, each with its own API key and spend cap — you need: preflight validation, per-model budget enforcement, eject policies, event logging, and a status dashboard. That's what this library provides.

    ============================================================
    PREFLIGHT CHECKS
    ============================================================
    [v] Account balance          $12.34 available
    [v] Provisioning budget      2 models x $0.25 = $0.50 (OK)
    [v] Model anthropic/claude-sonnet-4   $3.00/$15.00 per M tokens
    [v] Key provisioning (1¢ smoke)       key=sk-or-v1-abc... reply='OK'
    [!] Prompt caching           cache created but no read hit
    [x] Docker image 'my-agent'  not found — build it first

    PREFLIGHT FAILED (1 issue(s)):
      x Docker image 'my-agent' not found

Install

pip install agent-preflight

Requires Python 3.11+. Docker is used via the CLI (no Docker SDK dependency). The package ships with py.typed for mypy/pyright users.

Quickstart

from agent_preflight import preflight, provision_key, BudgetTracker

# 1. Validate before spending anything
preflight(
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    models=["meta-llama/llama-3.3-70b-instruct"],
    budget_per_model=0.25,
)

# 2. Provision a capped sub-key
key = provision_key(admin_key="sk-or-admin-...", label="my-run", limit_usd=0.25)

# Optional: prefer {} on model-list fetch errors
from agent_preflight import get_available_models_or_empty
models = get_available_models_or_empty("sk-or-...")

# 3. Track spend inside your agent loop
tracker = BudgetTracker(api_key=key, limit_usd=0.25, mode="detailed")
status = tracker.poll()
print(status)  # Budget: $0.25 remaining of $0.25 (100% left)

Prompt caching

Prompt caching is built into the runner, cache report, and PromptBuilder.

  • Anthropic defaults to explicit per-block cache_control because that is the most reliable OpenRouter behavior we have observed.
  • OpenAI-style providers benefit from a stable, append-only prompt prefix.
  • The detailed guide, behavior notes, and practical best practices live in CACHE.md.
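The explicit per-block cache breakpoint looks roughly like this as a raw request payload. This is a sketch of the Anthropic prompt-caching format as sent through OpenRouter's chat endpoint, not necessarily what PromptBuilder emits; the prompt text and model choice are placeholders:

```python
# Sketch: an OpenRouter chat request with an explicit Anthropic-style
# cache breakpoint on a large, stable system prompt (assumed payload shape).
system_prompt = "You are an autonomous agent. " * 200  # stable prefix worth caching

messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": system_prompt,
                # Everything up to and including this block becomes cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    # Per-turn content goes after the breakpoint so the prefix stays stable.
    {"role": "user", "content": "Run the task in /workspace/TASK.md"},
]

payload = {"model": "anthropic/claude-sonnet-4", "messages": messages}
```

Keeping the breakpoint at the end of the stable prefix, with all per-turn content appended after it, is what lets both Anthropic and OpenAI-style providers reuse the cached prefix.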

Full Docker harness

from agent_preflight import run_multi, BudgetFractionEject, IdleTimeoutEject, CompositeEject

results = run_multi(
    models=[
        ("meta-llama/llama-3.3-70b-instruct", "llama"),
        ("google/gemini-flash-1.5", "gemini"),
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
    mounts=[
        ("/path/to/TASK.md", "/workspace/TASK.md", "ro"),
        "/path/to/cache:/workspace/cache:delegated",
    ],
    eject_policy=CompositeEject([
        BudgetFractionEject(check_after_pct=0.25,
                            progress_fn=lambda ctx: ctx["tool_calls"] > 0),
        IdleTimeoutEject(timeout_s=180),
    ]),
    budget_mode="detailed",
)

Each container:

  • Gets a provisioned sub-key with a hard spend cap
  • Receives budget.txt in its workspace every 30s (so the agent can see its balance)
  • Is killed if an eject policy triggers

Status dashboard

# Watch live run progress
watch -n 10 python -m agent_preflight status

# Or a specific run
python -m agent_preflight status run-20260312-120000

Output:

  Run: run-20260312-120000
  ============================================================
  alias           status      events  tools  snaps  cost  time  note
  ──────────────────────────────────────────────────────────────────
  llama           DONE            87     42      5  $0.18  4m22s
  gemini          running         34     18      2  $0.07  1m45s*  $0.18 remaining...

Preflight checks

preflight() runs up to 7 checks before your experiment starts:

  Check         What it validates
  ────────────  ─────────────────────────────────────────────────────────────
  balance       Account balance > $0.10
  math          Balance covers n_models × budget_per_model
  models        Each model exists on OpenRouter (with pricing)
  tool_support  Models listed in required_params / common_required_params support those request parameters
  key_smoke     Provisions a $0.01 test key and makes a real API call
  cache_smoke   Sends two Anthropic cacheable requests to verify prompt caching
  docker_image  Docker image exists locally (if docker_image= is set)

Select a subset of checks, declare required request parameters, and add your own checks like this:
preflight(
    admin_key=..., or_key=...,
    models=[...], budget_per_model=0.25,
    checks=["balance", "math", "models", "tool_support"],
    common_required_params=["tools"],
    required_params={"anthropic/claude-sonnet-4": ["tools"]},
    custom_checks=[my_health_check],  # zero-arg or context-aware
)
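A zero-arg custom check can be as simple as the sketch below. It returns the same (name, passed, detail) 3-tuple shape as the context-aware variant shown further down; the specific check (looking for the docker CLI) is just an illustration:

```python
import shutil

def my_health_check():
    # Zero-arg custom check: returns (name, passed, detail),
    # mirroring the 3-tuple contract of context-aware checks.
    ok = shutil.which("docker") is not None
    return ("docker CLI on PATH", ok, "ok" if ok else "docker not found")
```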
The single-container variant, run_single, takes the same harness options:

from agent_preflight import run_single

result = run_single(
    model_id="z-ai/glm-5",
    extra_models=[
        "google/gemini-2.5-flash",
        "anthropic/claude-sonnet-4",
    ],
    image_name="my-agent:latest",
    admin_key="sk-or-admin-...",
    or_key="sk-or-...",
    budget_usd=0.25,
)

Context-aware custom checks receive a dict with already-fetched data:

def check_tool_support(ctx):
    model = "anthropic/claude-sonnet-4"
    supported = set((ctx["available_models"] or {}).get(model, {}).get("supported_parameters", []))
    missing = {"tools"} - supported
    return (
        "Custom tool support",
        not missing,
        "ok" if not missing else f"missing: {', '.join(sorted(missing))}",
    )

Eject policies

Built-in policies:

  Class                                   Triggers when
  ──────────────────────────────────────  ──────────────────────────────────────────────
  BudgetExceededEject                     Measured spend ≥ limit
  IdleTimeoutEject(300)                   No events for N seconds
  BudgetFractionEject(0.20, progress_fn)  progress_fn returns False once N% of budget is spent
  CompositeEject([...])                   Any sub-policy triggers
  default_eject_policies()                BudgetExceededEject + IdleTimeoutEject(300)

Custom policy:

from agent_preflight import EjectPolicy, EjectDecision

class MyEject(EjectPolicy):
    def check(self, *, budget_status, events, context, elapsed_s, **_):
        if context.get("output_written"):
            return EjectDecision(should_eject=False)
        if budget_status.pct_spent > 0.5:
            return EjectDecision(should_eject=True, reason="no output after 50% budget")
        return EjectDecision(should_eject=False)

Budget injection

Agents that read /workspace/output/budget.txt get real-time cost visibility:

Simple mode:

Budget: $0.086 remaining of $0.15 (57% left)

Detailed mode:

Budget: $0.086 remaining of $0.15 (57% left)

Recent costs:
  - $0.0012  (bash)
  - $0.0031  (read_file)
  - $0.0008  (bash)

Total actions: 14  |  Avg cost/action: $0.0010
Estimated actions remaining: ~86
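An agent can recover the numbers from the first line of budget.txt with a small parser. This is a hypothetical helper, not part of the library; it assumes the first line follows the format shown above:

```python
import re

def parse_budget(text: str):
    """Parse 'Budget: $X remaining of $Y (N% left)' into (remaining, limit, pct_left).

    Returns None if the text does not match the expected format.
    """
    m = re.match(
        r"Budget: \$([\d.]+) remaining of \$([\d.]+) \((\d+)% left\)", text
    )
    if not m:
        return None
    return float(m.group(1)), float(m.group(2)), int(m.group(3))
```

An agent might call this each loop iteration and switch to a cheaper strategy, or wrap up early, once pct_left drops below a threshold.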

Agent contract

Your Docker image needs to:

  1. Accept OPENROUTER_API_KEY as an env var
  2. Optionally read /workspace/output/budget.txt for cost awareness
  3. Write output files to /workspace/output/
  4. (For pi-style RPC mode) Accept a JSON prompt on stdin

Beyond that, the harness is framework-agnostic: it works with pi-coding-agent, Claude Code, custom agents, or any container that calls OpenRouter.
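A minimal entrypoint satisfying this contract might look like the following sketch. The file names and JSON shape beyond the contract above (result.json and its fields) are illustrative assumptions:

```python
import json
import os
import pathlib
import sys

def run_agent(workspace: str, prompt: dict, api_key: str) -> pathlib.Path:
    """Hypothetical minimal agent: checks the injected budget file,
    then writes a single output file under <workspace>/output/."""
    out_dir = pathlib.Path(workspace) / "output"
    out_dir.mkdir(parents=True, exist_ok=True)

    # 2. Optional cost awareness from the harness-injected budget file.
    budget_file = out_dir / "budget.txt"
    if budget_file.exists():
        print(budget_file.read_text().splitlines()[0])

    # 3. Write output where the harness expects it.
    result = out_dir / "result.json"
    result.write_text(json.dumps({"task": prompt, "had_key": bool(api_key)}))
    return result

if __name__ == "__main__":
    # 1 & 4. Key from the environment; RPC mode takes a JSON prompt on stdin.
    run_agent("/workspace", json.load(sys.stdin), os.environ["OPENROUTER_API_KEY"])
```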

Environment variables for the CLI

OR_ADMIN_KEY=sk-or-admin-...
OPENROUTER_API_KEY=sk-or-...
PREFLIGHT_MODELS=meta-llama/llama-3.3-70b-instruct,google/gemini-flash-1.5
PREFLIGHT_BUDGET=0.25

python -m agent_preflight preflight

Background

This pattern emerged from two independent experiments:

  1. Loanville2 — an LLM lending benchmark where models autonomously underwrite loan applications through a full Loan Origination System REST API. Each model gets a provisioned sub-key and budget cap; a preflight check validates balance, model availability, and provisioning before the run starts.

  2. pi-docker — a game strategy experiment where models compete to improve a Kurve AI in Docker containers. Budget awareness was injected as a file every 30 seconds; eject policies killed models that spent 25%+ of their budget without running a single test.

Both converged on the same infrastructure. This library extracts it.

License

MIT
