feat: add Limits and support it during invoke/stream#2360

Merged
notowen333 merged 2 commits into
strands-agents:mainfrom
notowen333:feat/per-invocation-limits
May 28, 2026
Conversation

@notowen333
Contributor

Description

Motivation

An agent loop today runs until the model stops requesting tools, a hook halts it, or it's cancelled. There's no caller-side guardrail on cost or runaway behavior — anyone wanting to bound a run has to wire a hook themselves or wrap the call in a timeout that races against the model rather than terminating cleanly at a turn boundary. This adds first-class, per-invocation budget caps so callers can put a ceiling on a single invoke_async / stream_async without fighting the loop.

Ports the equivalent TypeScript feature merged in strands-agents/sdk-typescript#1106.

Public API Changes

Agent.__call__, Agent.invoke_async, and Agent.stream_async accept an optional limits argument with three caps:

# Cap loop iterations
result = await agent.invoke_async("Plan a trip", limits={"turns": 5})
if result.stop_reason == "limit_turns":
    ...

# Cap cumulative model-generated tokens across the loop
result = await agent.invoke_async("Summarize", limits={"output_tokens": 10_000})
if result.stop_reason == "limit_output_tokens":
    ...

# Cap cumulative input + output tokens (compounds across turns; approximates billed spend)
result = await agent.invoke_async("Research X", limits={"total_tokens": 100_000})

# Combine
result = await agent.invoke_async("Plan a trip", limits={
    "turns": 10, "output_tokens": 50_000, "total_tokens": 200_000,
})

When a cap is reached the loop terminates gracefully at the next turn boundary and returns an AgentResult with one of three new stop_reason values: "limit_turns", "limit_output_tokens", "limit_total_tokens". No exception is raised — this mirrors how "cancelled" already works, and leaves result.message and result.metrics accessible.

The existing "max_tokens" reason (and MaxTokensReachedException) is unchanged. That one signals the model provider's per-call output cap and still raises; the new caller-set caps are graceful.
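A minimal sketch of how a caller might separate the two kinds of stop. The helper name `classify_stop` and the bucket labels it returns are hypothetical and not part of the SDK; only the three `limit_*` strings and `"max_tokens"` come from the description above.

```python
# Stop reasons introduced by this PR, copied from the description.
LIMIT_STOP_REASONS = frozenset(
    {"limit_turns", "limit_output_tokens", "limit_total_tokens"}
)


def classify_stop(stop_reason: str) -> str:
    """Hypothetical helper: bucket a stop_reason for caller-side handling."""
    if stop_reason in LIMIT_STOP_REASONS:
        # Caller-set cap: the result is returned normally, so its message
        # and metrics can still be inspected.
        return "caller_limit"
    if stop_reason == "max_tokens":
        # Provider per-call output cap. Per the description this path still
        # raises MaxTokensReachedException, so callers would usually handle
        # it in an except block rather than by inspecting a result.
        return "provider_cap"
    return "other"
```

With a real agent the argument would be `result.stop_reason`; the function takes a plain string here so the sketch runs without the SDK installed.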

Caps are scoped to a single invocation: counters reset on each invoke_async / stream_async call against the same agent. Tools requested by the previous turn always run to completion before a cap fires, so agent.messages stays in a state the agent can be reinvoked from. When several caps trip at once, the priority is turns, then total_tokens, then output_tokens, giving the most informative reason in the result. Caps are soft: a single oversized model response can overshoot the budget by one turn, since checks happen at turn boundaries, not within an individual model call.
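The turn-boundary behavior can be illustrated with a small stand-alone simulation. This is a sketch under stated assumptions, not the code in event_loop.py: the names `tripped_limit` and `simulate` are hypothetical, a cap is assumed to trip once the counter is greater than or equal to it, and every entry in the scripted usage list is assumed to be a model call after which the loop would otherwise continue.

```python
from typing import Dict, List, Optional, Tuple

# Priority order on a simultaneous trip, as stated in the description.
PRIORITY = ("turns", "total_tokens", "output_tokens")


def tripped_limit(limits: Dict[str, int], usage: Dict[str, int]) -> Optional[str]:
    """Return the stop reason for the highest-priority tripped cap, if any."""
    for key in PRIORITY:
        cap = limits.get(key)
        # Assumption: a cap trips when the counter reaches or exceeds it.
        if cap is not None and usage[key] >= cap:
            return f"limit_{key}"
    return None


def simulate(
    limits: Dict[str, int], per_turn: List[Tuple[int, int]]
) -> Tuple[Optional[str], Dict[str, int]]:
    """Replay scripted (input_tokens, output_tokens) pairs, one per model call.

    Counters start from zero on every call, mirroring per-invocation scope.
    Caps are only checked after a whole turn has been counted, so one large
    response can push usage past the cap before the check runs.
    """
    usage = {"turns": 0, "output_tokens": 0, "total_tokens": 0}
    for input_tokens, output_tokens in per_turn:
        usage["turns"] += 1
        usage["output_tokens"] += output_tokens
        usage["total_tokens"] += input_tokens + output_tokens
        reason = tripped_limit(limits, usage)
        if reason is not None:
            return reason, usage
    # Script exhausted without any cap tripping.
    return None, usage


script = [(50, 60), (120, 90), (10, 10)]
soft_reason, soft_usage = simulate({"output_tokens": 100}, script)
tie_reason, _ = simulate({"turns": 2, "output_tokens": 100}, script)
```

In the first run the second response takes cumulative output to 150 tokens against a cap of 100, which is the one-turn overshoot described above. In the second run both caps are reached on the same turn and the turns cap is reported.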

Each cap, when set, must be a positive int; invalid values raise TypeError at the start of the invocation. Backward compatible — omitting limits preserves prior behavior.
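The validation rule can be sketched as follows. The function name `validate_limits` and the error message are hypothetical. Treating a missing key or a `None` value as "not set" and rejecting `bool` (which is a subclass of `int` in Python) are assumptions of this sketch, not behavior confirmed by the description.

```python
from typing import Any, Mapping, Optional

CAP_KEYS = ("turns", "output_tokens", "total_tokens")


def validate_limits(limits: Optional[Mapping[str, Any]]) -> None:
    """Raise TypeError unless every cap that is set is a positive int."""
    if limits is None:
        # Omitting limits keeps the prior, unbounded behavior.
        return
    for key in CAP_KEYS:
        value = limits.get(key)
        if value is None:
            # Assumption: an absent or None cap means "not set".
            continue
        # Assumption: bool is rejected even though it subclasses int.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise TypeError(
                f"limits[{key!r}] must be a positive int, got {value!r}"
            )
```

Running the check once before the first turn means a bad value fails fast instead of surfacing partway through a run.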

Why a limits dict instead of top-level kwargs

Grouping under limits keeps the top-level kwarg surface stable as future caps (wall-clock, cost, etc.) are added. The stop reasons keep a limit_ prefix rather than collapsing to a single "limit_exceeded" so granular telemetry / log buckets are preserved — callers shouldn't have to derive which cap tripped from result.metrics plus the limits they passed in.
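As an illustration of the telemetry point, distinct reasons can be counted directly without consulting metrics. The function name `bucket_stop_reasons` and the sample data are hypothetical; `"end_turn"` stands in for an ordinary completion.

```python
from collections import Counter
from typing import Iterable


def bucket_stop_reasons(stop_reasons: Iterable[str]) -> Counter:
    """Count runs per stop reason so each cap gets its own bucket."""
    return Counter(stop_reasons)


# Made-up stop reasons from four runs, for illustration only.
observed = ["end_turn", "limit_turns", "limit_total_tokens", "limit_turns"]
buckets = bucket_stop_reasons(observed)

# The shared limit_ prefix still allows a single "any cap tripped" view.
limit_hits = {r: n for r, n in buckets.items() if r.startswith("limit_")}
```

A single `"limit_exceeded"` value would collapse the two `limit_*` buckets above into one, and the per-cap breakdown would have to be reconstructed from metrics.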

Use Cases

  • Cost ceilings on user-facing agents — bound an interactive turn so a misbehaving model can't burn an unbounded number of tokens before the user sees a response.
  • Eval / batch harnesses — give every run a fixed turns budget so a single stuck agent doesn't pin a worker.
  • Cheap timeouts at turn boundaries — terminate cleanly with a real stop_reason instead of cancelling mid-stream.

Related Issues

#2124

Documentation PR

N/A — will follow up.

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 92.00000% with 4 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
strands-py/src/strands/event_loop/event_loop.py | 84.61% | 2 Missing and 2 partials ⚠️

zastrowm
zastrowm previously approved these changes May 28, 2026
mypy narrows the loop key to a literal union, so the TypedDict access
no longer needs the literal-required ignore.
@github-actions github-actions Bot added size/m and removed size/m labels May 28, 2026
@notowen333 notowen333 enabled auto-merge (squash) May 28, 2026 19:58
@notowen333 notowen333 merged commit 9c17f2e into strands-agents:main May 28, 2026
36 of 40 checks passed