feat: add programmatic tool caller #387
This tool enables programmatic/code-based tool invocation for Strands Agents,
inspired by Anthropic's Programmatic Tool Calling feature. It allows an agent
to write Python code that calls other tools as functions.
Key features:
- Tools exposed as callable methods via 'tools.<tool_name>(**kwargs)'
- Supports complex orchestration with loops, conditionals, data processing
- Captures stdout/stderr from executed code
- Records all tool calls for transparency
- Validates code for potentially dangerous patterns
- User confirmation required unless BYPASS_TOOL_CONSENT is set
Example usage:
    result = agent.tool.programmatic_tool_caller(
        code='''
    result = tools.calculator(expression="2 + 2")
    print(f"Result: {result}")
    '''
    )
The tool integrates with Strands' DecoratedFunctionTool pattern, calling
tools directly with keyword arguments and handling both string and dict
return values.
Includes comprehensive unit tests covering:
- ToolProxy functionality
- Code validation
- Tool execution
- Integration with real tools
- Edge cases and error handling
Changes:
- Use tool_context via @tool(context=True) instead of agent parameter
- Handle multiple content blocks in tool results (combine all text)
- Remove allowed_tools parameter (let agent decide which tools to use)
- Add comprehensive integration tests with real tools
- Fix test assertions and add more edge case coverage
Test coverage:
- 43 unit tests
- 10 integration tests
- All tests passing
- Add tool entry to the tools table
- Add usage example section with code sample
- Note that tool does not work on Windows (uses exec)
Major changes:
- Remove ToolProxy class, inject tools directly as functions
- Tools exposed as both async (tool_name) and sync (tool_name_sync)
- Only return print() output, not tool call summary or execution time
- Support async tool calls via asyncio
This aligns with Anthropic's design where:
- Tools are callable as async functions: await tool_name(...)
- Only print() output is captured and returned to agent
- Tool results stay in code execution context, don't enter agent messages
- Remove sync functions, only expose async (await tool_name(...))
- Auto-wrap user code in async function, no boilerplate needed
- Support asyncio.gather() for parallel execution
- Simplified implementation and tests
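The auto-wrapping described in this commit can be sketched roughly as follows. This is a minimal illustration and not the tool's actual code: `fake_tool` and `run_user_code` are hypothetical names, and the real tool injects wrappers for registered Strands tools rather than a stub.

```python
import asyncio
import contextlib
import io
import textwrap


async def fake_tool(value: int) -> int:
    """Hypothetical stand-in for an injected async tool wrapper."""
    await asyncio.sleep(0)
    return value * 2


def run_user_code(code: str) -> str:
    """Wrap user code in an async function, run it, and return captured print() output."""
    # Indent the user code so it becomes the body of an async function.
    wrapped = "async def __user_code__():\n" + textwrap.indent(code, "    ") + "\n"
    namespace = {"__name__": "__main__", "asyncio": asyncio, "fake_tool": fake_tool}
    # Defining the function does not run the user code yet.
    exec(compile(wrapped, "<user_code>", "exec"), namespace)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        asyncio.run(namespace["__user_code__"]())
    return buffer.getvalue()


# Top-level await works because the code ends up inside an async function.
output = run_user_code(
    "results = await asyncio.gather(fake_tool(1), fake_tool(2))\n"
    "print(results)"
)
```

Only the text written by print() comes back to the caller; the `results` list stays inside the execution namespace.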
- Add Executor abstract base class for custom execution environments
- LocalAsyncExecutor as default (local exec with asyncio)
- Custom executors can be set via: programmatic_tool_caller.executor = MyExecutor()
- Add PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS env var to control exposed tools
- Tests for executor swapping and env var filtering
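As an illustration of the env var filtering, a comma-separated allow-list could be parsed along these lines. The function name `filter_allowed_tools` is hypothetical, and treating an unset or empty variable as "expose every tool" is an assumption, not something this PR states.

```python
import os

ENV_VAR = "PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS"
SELF_NAME = "programmatic_tool_caller"


def filter_allowed_tools(all_tools: set) -> set:
    """Return the tool names to expose, based on a comma-separated env var."""
    raw = os.environ.get(ENV_VAR, "")
    # Tolerate extra commas, surrounding whitespace, and empty entries.
    requested = {name.strip() for name in raw.split(",") if name.strip()}
    # Assumption: an unset or empty variable exposes every registered tool.
    candidates = set(all_tools) if not requested else set(all_tools) & requested
    # The caller itself is never exposed inside its own namespace.
    candidates.discard(SELF_NAME)
    return candidates
```

Names listed in the variable but absent from the registry are ignored rather than treated as errors in this sketch.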
This reverts commit ca41d0f.
be93226 to 717566e
Use agent.tool.<name>() instead of directly calling tool_impl() from registry. This properly handles all tool types including MCP tools which are not directly callable but work through the ToolExecutor._stream() mechanism.
- Changed _execute_tool to use getattr(agent.tool, tool_name)()
- Added record_direct_tool_call=False to prevent polluting message history
- Handle AttributeError for tool not found case
717566e to b273bd6
Interrupt limitation with programmatic tool calls
Tools that use interrupts (human-in-the-loop) will raise a RuntimeError when invoked through the programmatic tool caller. The SDK's _ToolCaller has two guards:
- If an interrupt is already active when a direct tool call is made → RuntimeError("cannot directly call tool during interrupt")
- If a tool raises an interrupt during a direct call → RuntimeError("cannot raise interrupt in direct tool call")
This is a known SDK limitation — direct/programmatic tool calls have no mechanism to pause execution, collect human input, and resume. The error surfaces as a failed tool result back to the agent, so the agent can reason about it and try an alternative approach.
Add limitations note regarding interrupt-capable tools.
Aligns programmatic_tool_caller with the sandboxes design doc (strands-agents/docs#681) Phase 1 requirements:
- Remove Executor ABC and LocalAsyncExecutor classes
  The design doc separates Sandbox (SDK-level, where code runs) from the programmatic tool caller (tools-level, runs in host process). The Executor abstraction competed with the Sandbox design.
- Inline async execution logic directly in the tool function
  Phase 1 always runs orchestration code in-process. The ~15 lines of execution logic are now directly in programmatic_tool_caller().
- Use compile() for better error tracebacks
  Per the design doc: compile(code, '<programmatic_tool_caller>', 'exec') gives clearer tracebacks than raw exec().
- Remove custom executor documentation and examples
  The Custom Executors section in the module docstring is removed. The Sandbox + Tool Proxy design (Phase 2) replaces this concept.
- Remove executor-related tests
  TestExecutor class and test_custom_executor removed. Added test_stderr_captured and test_syntax_error_handled for coverage.
The core tool logic (tool wrappers, _execute_tool, _create_async_tool_function, _validate_code, _get_allowed_tools) is unchanged. The tool gets simpler, not more complex.
Refs: strands-agents/docs#681, strands-agents#387
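The traceback benefit of compile() can be seen in a small standalone sketch. The sample source string is made up for illustration; only the filename argument comes from the commit message.

```python
import traceback

# Made-up user code that fails on its second line.
source = "x = 1\nraise ValueError('boom')\n"

# Compiling with an explicit filename labels traceback frames with that name
# instead of the generic "<string>" that exec() on a raw string produces.
code_obj = compile(source, "<programmatic_tool_caller>", "exec")

try:
    exec(code_obj, {"__name__": "__main__"})
except ValueError:
    report = traceback.format_exc()
```

The formatted report then points at `<programmatic_tool_caller>` and the failing line number, which makes it easier to tell user-code frames from host frames.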
…configurable extras
Changes:
- Remove _validate_code() — python_repl has no code validation, neither should we
- Match base namespace to python_repl: {"__name__": "__main__"} instead of
custom {__builtins__, asyncio, json, re, math}
- asyncio always injected (required for async tool wrappers)
- Add PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES env var for configurable namespace
(comma-separated module names, e.g. "json,re,math,collections")
- Extract _build_namespace() helper for clarity and testability
- Add TestBuildNamespace test class (7 tests)
- Add test_extra_modules_available_in_code and test_code_can_import_modules
- Remove TestValidateCode test class and all _validate_code references
- Remove unused imports (json, math, re from top-level)
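A namespace builder along the lines described above might look roughly like this. It is a sketch under assumptions: `build_namespace_sketch` is a hypothetical name (the real helper is _build_namespace() and also injects tool wrappers), and silently skipping modules that fail to import is an assumed policy.

```python
import asyncio
import importlib
import os

EXTRA_MODULES_VAR = "PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES"


def build_namespace_sketch() -> dict:
    """Build a python_repl-style base namespace plus configurable extra modules."""
    # Base namespace mirrors python_repl; asyncio is always present.
    namespace = {"__name__": "__main__", "asyncio": asyncio}
    raw = os.environ.get(EXTRA_MODULES_VAR, "")
    for name in (part.strip() for part in raw.split(",")):
        if not name:
            continue  # ignore empty entries from stray commas
        try:
            namespace[name] = importlib.import_module(name)
        except ImportError:
            # Assumption: unknown modules are skipped instead of failing the call.
            continue
    return namespace
```

Because the namespace keeps normal builtins, user code can still run its own import statements; the env var only pre-populates names.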
agent-of-mkmeral
left a comment
🔴 Adversarial Testing Result: FAIL — 3 issues found
Scope: Full adversarial testing of programmatic_tool_caller — input boundaries, stdout/stderr restoration, async execution, namespace security, tool execution error paths, env var handling, contract verification, BaseException handling, indentation/wrapping, and concurrency.
Tests written: 47
Tests passing: 47 (all findings confirmed with reproducible artifacts)
Tests failing (findings): 3 bugs found and proven
Findings Summary
| # | Category | Severity | Description |
|---|---|---|---|
| 1 | Bug | Critical | SystemExit and KeyboardInterrupt escape the tool — sys.exit() in user code kills the host process |
| 2 | Bug | Medium | Empty code and comment-only code cause SyntaxError due to async wrapping producing empty function body |
| 3 | Bug | Medium | Tool named asyncio shadows the required asyncio module in namespace, breaking all async functionality |
Finding 1 — BaseException Escape (Critical)
Category: Bug
Severity: Critical
Reproduction:
# This kills the host process:
result = programmatic_tool_caller(code="import sys; sys.exit(42)", tool_context=ctx)
# ^^^ SystemExit propagates past except Exception, so the host dies
# This also escapes:
result = programmatic_tool_caller(code="raise KeyboardInterrupt()", tool_context=ctx)
# ^^^ KeyboardInterrupt propagates to caller
Root cause: The exception handlers are:
except SyntaxError:
    ...
except Exception:
    ...
SystemExit and KeyboardInterrupt inherit from BaseException, NOT Exception. They bypass both catch blocks.
Observed behavior: SystemExit propagates up the call stack, killing the host process. KeyboardInterrupt similarly escapes.
Expected behavior: User code should never be able to kill the host. The tool should catch these and return an error dict.
Fix:
except SyntaxError:
    ...
except (SystemExit, KeyboardInterrupt) as e:
    error_msg = f"Execution error: {type(e).__name__}: {e}"
    console.print(Panel(error_msg, title="[bold red]Error[/]", border_style="red"))
    return {"status": "error", "content": [{"text": error_msg}]}
except Exception:
    ...
Artifact: TestFinding2_BaseExceptionEscape::test_system_exit_escapes_tool, test_keyboard_interrupt_escapes_tool
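The effect of the extra handler can be checked in isolation with a small sketch. `run_guarded` is a hypothetical stand-in for the tool's execution path; it omits the async wrapping, output capture, and console panel.

```python
def run_guarded(code: str) -> dict:
    """Execute code and convert every failure, including SystemExit, into an error dict."""
    try:
        exec(compile(code, "<user_code>", "exec"), {"__name__": "__main__"})
    except SyntaxError as e:
        return {"status": "error", "content": [{"text": f"Syntax error: {e}"}]}
    except (SystemExit, KeyboardInterrupt) as e:
        # These derive from BaseException, so "except Exception" alone misses them.
        msg = f"Execution error: {type(e).__name__}: {e}"
        return {"status": "error", "content": [{"text": msg}]}
    except Exception as e:
        msg = f"Execution error: {type(e).__name__}: {e}"
        return {"status": "error", "content": [{"text": msg}]}
    return {"status": "success", "content": [{"text": "(no output)"}]}


exit_result = run_guarded("import sys; sys.exit(42)")
interrupt_result = run_guarded("raise KeyboardInterrupt()")
```

With the handler in place, both calls return an error dict and the host process keeps running.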
Finding 2 — Empty/Comment-Only Code SyntaxError (Medium)
Category: Unhandled Edge Case
Severity: Medium
Reproduction:
# Empty code → SyntaxError
result = programmatic_tool_caller(code="", tool_context=ctx)
# Returns: {"status": "error", "content": [{"text": "Syntax error: expected an indented block..."}]}
# Comment-only code → SyntaxError
result = programmatic_tool_caller(code="# just a comment", tool_context=ctx)
# Same SyntaxError
Root cause: The async wrapping does:
indented_code = textwrap.indent(code, " ")
wrapped_code = f"async def __user_code__():\n{indented_code}\n"
Empty code → async def __user_code__():\n\n → empty function body = SyntaxError.
Comment-only → async def __user_code__():\n # comment\n → comments aren't statements = SyntaxError.
Expected behavior: Empty/comment-only code should succeed with (no output), same as passing x = 42.
Fix: Use pass as the function body when the stripped code is empty or comment-only:
if not code.strip() or all(line.strip().startswith('#') for line in code.strip().splitlines()):
    indented_code = " pass"
else:
    indented_code = textwrap.indent(code, " ")
Artifact: TestFinding1_EmptyCodeWrapping::test_empty_code_string_fails, test_comment_only_code_fails
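A standalone sketch of this wrapping fix is shown below. `wrap_user_code` is a hypothetical helper, and it is a slight variant of the suggested condition: it also treats blank lines between comments as non-statements, which the all(...startswith('#')) check would not.

```python
import textwrap


def wrap_user_code(code: str) -> str:
    """Wrap code in an async function, substituting pass when there is no statement."""
    lines = code.splitlines()
    # A line counts as a statement if it is neither blank nor a comment.
    has_statement = any(
        line.strip() and not line.strip().startswith("#") for line in lines
    )
    body = textwrap.indent(code, "    ") if has_statement else "    pass"
    return f"async def __user_code__():\n{body}\n"


samples = ["", "   ", "# just a comment", "# note\n\n# another", "x = 42"]
# compile() raises SyntaxError if any wrapped sample has an empty function body.
compiled = [compile(wrap_user_code(s), "<user_code>", "exec") for s in samples]
```

This is still a line-based heuristic; it only decides whether a placeholder body is needed and leaves real syntax errors in user code to compile().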
Finding 3 — asyncio Module Shadowing (Medium)
Category: Bug
Severity: Medium
Reproduction:
# If a tool named 'asyncio' exists in the registry:
ns = _build_namespace({"asyncio"}, agent)
assert ns["asyncio"] is not asyncio_module # True! It's now a tool wrapper
# asyncio.gather(), asyncio.sleep() etc. are now broken
Root cause: In _build_namespace(), asyncio module is injected first, then tools overwrite by name:
namespace["asyncio"] = asyncio # ← injected
for tool_name in available_tools:
    namespace[tool_name] = _create_async_tool_function(...) # ← overwrites 'asyncio'
Expected behavior: Reserved names (asyncio, __name__, extra modules) should not be overwritable by tool names.
Fix: Either exclude reserved names from tool injection, or inject tools first and modules after:
# Option A: Exclude reserved names
RESERVED = {"asyncio", "__name__"}
for tool_name in available_tools - RESERVED:
    namespace[tool_name] = ...
# Option B: Inject tools first, then required modules (modules win)
for tool_name in available_tools:
    namespace[tool_name] = ...
namespace["asyncio"] = asyncio # Overwrite any tool shadowing
Artifact: TestFinding3_AsyncioShadowing::test_asyncio_tool_overwrites_module
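Both options can be combined in one small sketch. `make_tool_stub` and `build_namespace_safe` are hypothetical names; the stub stands in for _create_async_tool_function, whose real signature is not shown in this review.

```python
import asyncio

RESERVED_NAMES = {"asyncio", "__name__"}


def make_tool_stub(tool_name: str):
    """Hypothetical stand-in for the real async tool wrapper factory."""
    async def call_tool(**kwargs):
        return f"{tool_name} called with {kwargs}"
    return call_tool


def build_namespace_safe(available_tools: set) -> dict:
    """Inject tools without letting a tool name shadow a reserved name."""
    namespace = {"__name__": "__main__"}
    # Option A: skip tool names that collide with reserved names.
    for tool_name in sorted(available_tools - RESERVED_NAMES):
        namespace[tool_name] = make_tool_stub(tool_name)
    # Option B: inject required modules last so they always win.
    namespace["asyncio"] = asyncio
    return namespace


ns = build_namespace_safe({"asyncio", "calculator"})
```

A tool named asyncio becomes unreachable from user code under this scheme, so a real implementation might also want to log or report the skipped name.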
What Survived
41 adversarial tests passed without finding issues:
- ✅ stdout/stderr properly restored after RuntimeError, SyntaxError, and user-replaced stdout
- ✅ Unicode, null bytes, large output all handled correctly
- ✅ Async code with user-defined coroutines, asyncio.sleep, asyncio.gather all work
- ✅ Tool errors properly propagate to user code via try/except
- ✅ asyncio.gather with 20 concurrent calls completes correctly
- ✅ Sequential tool calls in loops work correctly
- ✅ Namespace isolation: agent object not leaked, __name__ properly set
- ✅ Self-exclusion: programmatic_tool_caller never available in namespace
- ✅ Return format consistent (always {"status": ..., "content": [{"text": ...}]})
- ✅ Env var edge cases (extra commas, whitespace, empty strings) handled correctly
- ✅ Complex code structures (decorators, classes, nested indentation) work through async wrapping
🤖 AI agent response. Strands Agents. Feedback welcome!
Description
Adds programmatic_tool_caller tool that enables code-based tool invocation. Agents can write Python code that calls other tools as async functions, reducing API round-trips and enabling complex orchestration patterns like loops, parallel execution, and chaining.
Note: Does not work on Windows.
Key Features
- await tool_name(...) - code runs in async context automatically
- print() output returned: Tool results stay in code execution context, don't enter agent's context window unless explicitly printed
- Executor implementations for sandboxed environments (Docker, Lambda, etc.)
- BYPASS_TOOL_CONSENT), configurable allowed tools
Example Usage
Environment Variables
BYPASS_TOOL_CONSENT "true"
PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS
Custom Executors
Related Issues
Type of Change
New Tool
Testing
asyncio.gather, custom executors, user cancellation
Checklist