Proxy-enforced compute budgets and rate limits #310

Description

@mostlydev

Part of #302 (Phase 3). Closes the manifesto promise tracked in the Phase 0 overclaim issue.

Problem

The manifesto (Principle 6, §VII) promises hard compute budgets and rate limits (429s) at the proxy. Today the cost accumulator meters spend per agent, provider, and model but takes no enforcement action. fleet.budget.set exists as a claw-api write verb, but what it currently affects still needs to be verified; it must take effect at the proxy for the loop to close.

Scope

  • Per-agent budget caps declared in pod YAML (x-claw budget block or extension of the model-policy surface, ADR-019) and compiled into the per-agent context (metadata.json or sibling).
  • On breach: reject with 429 + structured error body, emit intervention: budget_exceeded telemetry.
  • Rate limits (requests/window per agent) with the same enforcement + telemetry shape.
  • Runtime adjustment path: fleet.budget.set updates the effective cap without pod recompile (mechanism TBD — context-mount rewrite + reload, or proxy admin endpoint).
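To make the scope above concrete, here is a minimal sketch of the enforcement check, written in Python purely for illustration. Everything named here is an assumption rather than existing cllama or claw-api code: `AgentLimits`, `Enforcer`, the field names, the shape of the error body, the fixed-window rate limit algorithm, and the `rate_limited` intervention name. Only the 429 status and the `budget_exceeded` intervention name come from this issue.

```python
import time
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AgentLimits:
    """Hypothetical compiled per-agent limits; field names are assumptions."""
    budget_usd: float       # hard spend cap
    max_requests: int       # requests allowed per window
    window_seconds: float   # fixed-window length


@dataclass
class AgentState:
    spent_usd: float = 0.0
    window_start: Optional[float] = None
    window_count: int = 0


class Enforcer:
    """Illustrative hard-cap and rate-limit check run before forwarding a request."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = dict(limits)
        self.state = {agent: AgentState() for agent in self.limits}
        self.clock = clock
        self.telemetry = []  # stand-in for the real telemetry sink

    def set_budget(self, agent, budget_usd):
        # Stand-in for fleet.budget.set: swap the effective cap at runtime.
        self.limits[agent] = replace(self.limits[agent], budget_usd=budget_usd)

    def record_spend(self, agent, cost_usd):
        # Would be fed by the existing cost accumulator.
        self.state[agent].spent_usd += cost_usd

    def check(self, agent):
        """Return (status, body); body is None when the request may proceed."""
        lim, st = self.limits[agent], self.state[agent]
        now = self.clock()
        # Start a new fixed window on first use or once the old one has elapsed.
        if st.window_start is None or now - st.window_start >= lim.window_seconds:
            st.window_start, st.window_count = now, 0
        if st.spent_usd >= lim.budget_usd:
            return self._reject(agent, "budget_exceeded",
                                {"cap_usd": lim.budget_usd, "spent_usd": st.spent_usd})
        if st.window_count >= lim.max_requests:
            retry_after = lim.window_seconds - (now - st.window_start)
            return self._reject(agent, "rate_limited", {"retry_after_s": retry_after})
        st.window_count += 1
        return 200, None

    def _reject(self, agent, reason, detail):
        # Same shape for both breach kinds: one telemetry event, one 429 body.
        self.telemetry.append({"intervention": reason, "agent": agent, **detail})
        return 429, {"error": {"type": reason, "agent": agent, **detail}}
```

The sketch checks the budget before the rate limit so that an exhausted agent always sees `budget_exceeded`, and it keeps limits separate from counters so that a runtime cap change does not reset accumulated spend. Whether the real proxy should use a fixed window, a sliding window, or a token bucket is left open here.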

Design split (recommended)

Simple hard caps and counters belong in cllama core — an infrastructure guarantee that works with no policy service attached. Adaptive/conditional budgeting (time-of-day, task-class, burst credit) belongs to the policy plane (#302) once it exists. This issue covers only the core guarantee.

Acceptance

  • A spike test demonstrates: agent exceeds its cap mid-session → 429 → claw audit shows the intervention → fleet.budget.set raises the cap → traffic resumes.
  • Manifesto/site claims (softened in Phase 0) get re-promoted to present tense.
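The spike test in the first acceptance bullet could follow roughly the sequence below. This is a sketch against a hypothetical in-memory stand-in, not the real proxy: `FakeProxy`, its methods, the cent-based accounting, and the `audit_log` list (standing in for whatever claw audit surfaces) are all assumptions made for illustration.

```python
class FakeProxy:
    """Hypothetical in-memory stand-in for the proxy; not the cllama API."""

    def __init__(self, cap_cents):
        self.cap_cents = cap_cents
        self.spent_cents = 0
        self.audit_log = []  # stand-in for what `claw audit` would show

    def request(self, cost_cents):
        # Reject before forwarding once recorded spend has reached the cap.
        if self.spent_cents >= self.cap_cents:
            self.audit_log.append({"intervention": "budget_exceeded"})
            return 429
        self.spent_cents += cost_cents
        return 200

    def budget_set(self, cap_cents):
        # Stand-in for fleet.budget.set taking effect without a pod recompile.
        self.cap_cents = cap_cents


def spike_scenario():
    proxy = FakeProxy(cap_cents=100)

    # 1. The agent exceeds its cap mid-session and the next request gets a 429.
    statuses = [proxy.request(40) for _ in range(4)]
    assert statuses == [200, 200, 200, 429]

    # 2. The intervention is visible in the audit trail.
    assert proxy.audit_log == [{"intervention": "budget_exceeded"}]

    # 3. Raising the cap at runtime lets traffic resume.
    proxy.budget_set(200)
    assert proxy.request(40) == 200
    return statuses


spike_scenario()
```

In this sketch a request is admitted as long as recorded spend is still below the cap, so the third request overshoots to 120 cents before the fourth is rejected. Whether the real proxy should instead reject a request whose estimated cost would cross the cap is a design choice this issue leaves open.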
