v0.8.7 — Per-model-tier cost budgets
Added — Per-model-tier cost budgets with deny-by-default fallback (v0.8.7)
ModelTierBudget — a new policy primitive that caps per-call cost AND/OR
output tokens per model tier label (e.g. "frontier" / "mid" /
"small"), evaluated before the tool executes. Distinct from the
existing flat BudgetConfig (no tier dimension), ModelCapabilityTier
(capability gating, not cost), and AgentSDKCreditBudget (monthly
subscription credits, not per-call). Closes the gap routers had when
fanning calls across tiers: a runaway agent can no longer burn frontier
tokens against a small-tier cap.
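
To make the idea concrete, here is a minimal sketch of a per-tier cap check that runs before a call. Every name in it (`TierCap`, `CAPS`, `check_before_call`, `BudgetExceeded`) is a hypothetical stand-in, not the agent_airlock API, and the token caps are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(Exception):
    """Hypothetical stand-in for a pre-execute budget block."""


@dataclass(frozen=True)
class TierCap:
    """Hypothetical per-tier cap; either limit may be left unset."""
    max_cost_cents: Optional[float] = None
    max_output_tokens: Optional[int] = None


# Illustrative caps only; real values would come from the caller's policy.
CAPS = {
    "frontier": TierCap(max_cost_cents=50.0, max_output_tokens=8000),
    "small": TierCap(max_cost_cents=2.0, max_output_tokens=1000),
}


def check_before_call(tier: str, est_cost_cents: float, est_output_tokens: int) -> None:
    """Raise if the estimate exceeds either cap configured for the tier."""
    cap = CAPS[tier]
    if cap.max_cost_cents is not None and est_cost_cents > cap.max_cost_cents:
        raise BudgetExceeded(f"{tier}: cost {est_cost_cents} > {cap.max_cost_cents}")
    if cap.max_output_tokens is not None and est_output_tokens > cap.max_output_tokens:
        raise BudgetExceeded(f"{tier}: tokens {est_output_tokens} > {cap.max_output_tokens}")
```

In this sketch a 30¢ estimate passes under the "frontier" cap but is rejected under "small", which is the behaviour the tier dimension is meant to provide.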
- ModelTierBudget (agent_airlock.cost_tracking): mapping
  {tier_label → TierBudget} plus a mandatory strict_tier that serves
  as the deny-by-default fallback for untagged calls. Optional
  tier_resolver: Callable[[str], str] callback maps model IDs to tier
  labels; the router stays in the caller's code and airlock just invokes
  the callback. Includes resolve_tier(), check_pre_execute() (raises
  AirlockBudgetExceeded on cap breach), and reconcile_post_execute()
  (observability-only: logs the actual-vs-estimated delta but never
  raises).
- TierBudget (frozen dataclass): per-tier max_cost_cents and/or
  max_output_tokens caps. Worst-case cost estimate is
  input_tokens × input_price + max_output_tokens × output_price;
  reuses CostTracker.calculate_cost() for pricing (no duplicate
  pricing table).
- AirlockBudgetExceeded (subclasses AirlockError): pre-execute
  block carrying tier, cap, estimated_cost_cents,
  estimated_output_tokens, budget_type, model_id. Surfaced as a
  structured AirlockResponse with block_reason="budget_exceeded".
- SecurityPolicy.model_tier_budget (new field) +
  SecurityPolicy.check_model_tier_budget(): optional wiring point.
  The @Airlock seam invokes it as Step 6 of _pre_execution() (after
  RBAC / capability / filesystem / endpoint checks, before execute), and
  threads the resulting BudgetEstimate into _post_execution() for
  actual-vs-estimated reconciliation when the tool result carries a
  token_usage attribute / dict key. Frozen-policy digest covers the
  budget so frozen policies don't drift on mutation.
- Tier extraction at the call site, in priority order:
  _airlock_tier kwarg (stripped before ghost-arg validation) →
  arg-extracted context.metadata["airlock_tier"] →
  contextvar-stored context's metadata → tier_resolver(model_id) →
  strict_tier. The strict-tier fallback is the deny-by-default
  guarantee: untagged calls hit the tightest cap.
- STRICT_MODEL_TIER_BUDGET preset (policy_presets): three-tier
  configuration with caps 50¢ / 10¢ / 2¢ (frontier / mid / small) and
  strict_tier="small".
- strict_tier_budget_policy(tier_resolver=None): factory returning
  a SecurityPolicy seeded with the strict preset.
- New example at examples/model_tier_budget.py demonstrating four
  routing patterns: explicit _airlock_tier kwarg, context.metadata
  tagging via contextvar, model_id → tier_resolver callback, and
  composition with allow/deny lists.
- BlockReason.BUDGET_EXCEEDED and handle_budget_exceeded() in
  self_heal.py for structured response building.
- 39 new tests in tests/test_model_tier_budget.py covering
  construction, tier resolution priority, worst-case cost estimation,
  cap-exceeded blocks, reconciliation observability, full @Airlock
  integration (sync + async), and digest stability.
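
The tier-extraction order and the worst-case estimate listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `resolve_tier_sketch` and `worst_case_cost_cents` are hypothetical names, prices are assumed to be in cents per token, the contextvar-stored metadata is assumed to use the same "airlock_tier" key, and the handling of an unknown label returned by the resolver is left out because the notes above do not specify it.

```python
from typing import Callable, Mapping, Optional


def resolve_tier_sketch(
    kwargs: dict,
    arg_metadata: Optional[Mapping[str, str]],
    ctx_metadata: Optional[Mapping[str, str]],
    model_id: Optional[str],
    tier_resolver: Optional[Callable[[str], str]],
    strict_tier: str,
) -> str:
    """Return the first tier label found, walking the documented priority order."""
    # 1. Explicit kwarg; pop() removes it so the wrapped tool never sees it.
    tier = kwargs.pop("_airlock_tier", None)
    if tier:
        return tier
    # 2. Metadata on a context object passed as an argument.
    if arg_metadata and arg_metadata.get("airlock_tier"):
        return arg_metadata["airlock_tier"]
    # 3. Metadata on the contextvar-stored context (same key is an assumption).
    if ctx_metadata and ctx_metadata.get("airlock_tier"):
        return ctx_metadata["airlock_tier"]
    # 4. Caller-supplied resolver mapping a model ID to a tier label.
    if tier_resolver is not None and model_id is not None:
        return tier_resolver(model_id)
    # 5. Deny-by-default: untagged calls land on the strict tier.
    return strict_tier


def worst_case_cost_cents(
    input_tokens: int,
    max_output_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """Assume the model emits every allowed output token (prices in cents per token)."""
    return input_tokens * input_price + max_output_tokens * output_price
```

Because the estimate uses max_output_tokens rather than a predicted length, it is an upper bound on the call's cost at the given prices, so a call that passes the pre-execute check cannot exceed the cap unless the prices or token counts differ from what was assumed.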
Reconciliation never raises — a call that estimates 5¢ and actually costs
50¢ logs delta_cents=+45 but doesn't retroactively block. Users who
want a hard session cap should layer BudgetConfig.max_cost_per_session
on top of the global CostTracker.
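
A minimal sketch of that observability-only behaviour, assuming a plain `logging` logger; `reconcile_sketch` is a hypothetical helper and not the signature of reconcile_post_execute():

```python
import logging
from typing import Optional

logger = logging.getLogger("tier_budget_sketch")


def reconcile_sketch(estimated_cents: float, actual_cents: Optional[float]) -> Optional[float]:
    """Log actual-minus-estimated cost and return the delta; never raise."""
    if actual_cents is None:
        # No usage data on the tool result, so there is nothing to reconcile.
        return None
    delta = actual_cents - estimated_cents
    # Overruns are logged more loudly, but the call is never blocked after the fact.
    level = logging.WARNING if delta > 0 else logging.DEBUG
    logger.log(level, "budget reconcile delta_cents=%+g", delta)
    return delta
```

With the figures from the paragraph above, `reconcile_sketch(5, 50)` returns a delta of 45 and the function completes normally; any hard stop has to come from a separate session-level cap.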