
feat: add programmatic tool caller#387

Open
mkmeral wants to merge 19 commits into strands-agents:main from mkmeral:feat/programmatic-tool-caller


@mkmeral
Contributor

@mkmeral mkmeral commented Feb 5, 2026

Description

Adds a programmatic_tool_caller tool that enables code-based tool invocation. Agents can write Python code that calls other tools as async functions, which reduces API round-trips and enables complex orchestration patterns such as loops, parallel execution, and chaining.

Note: Does not work on Windows.

Key Features

  • Async-first design: Tools exposed as await tool_name(...) - code runs in async context automatically
  • Only print() output returned: Tool results stay in the code execution context and do not enter the agent's context window unless explicitly printed
  • Pluggable executor: Custom Executor implementations for sandboxed environments (Docker, Lambda, etc.)
  • Security controls: Code validation for dangerous patterns, user confirmation (bypassable via BYPASS_TOOL_CONSENT), configurable allowed tools

Example Usage

from strands import Agent
from strands_tools import programmatic_tool_caller, calculator

agent = Agent(tools=[programmatic_tool_caller, calculator])

result = agent.tool.programmatic_tool_caller(
    code="""
# Simple tool call
result = await calculator(expression="2 + 2")
print(f"Result: {result}")

# Loop with tool calls
total = 0
for i in range(1, 6):
    square = await calculator(expression=f"{i} ** 2")
    total += int(square)
print(f"Sum of squares: {total}")

# Parallel execution
results = await asyncio.gather(
    calculator(expression="10 * 1"),
    calculator(expression="10 * 2"),
    calculator(expression="10 * 3"),
)
print(f"Parallel results: {results}")
"""
)

Environment Variables

| Variable | Description |
| --- | --- |
| BYPASS_TOOL_CONSENT | Skip user confirmation if "true" |
| PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS | Comma-separated list of tools to expose (default: all except self) |
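
As a rough illustration of the allow-list variable, the sketch below filters a set of registered tool names. `parse_allowed_tools` is a hypothetical helper written for this example, not the PR's implementation, and treating an empty value the same as an unset one is an assumption.

```python
from typing import Mapping, Optional, Set
import os

ENV_VAR = "PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS"
SELF_NAME = "programmatic_tool_caller"


def parse_allowed_tools(
    registered: Set[str], env: Optional[Mapping[str, str]] = None
) -> Set[str]:
    """Hypothetical helper: pick which registered tools to expose to user code."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "")
    # Tolerate stray commas and surrounding whitespace in the list.
    requested = {name.strip() for name in raw.split(",") if name.strip()}
    # Assumption: an unset or empty variable means "expose everything".
    candidates = registered & requested if requested else set(registered)
    # The caller itself is never exposed, matching "all except self".
    return candidates - {SELF_NAME}


registered_tools = {"calculator", "file_read", "programmatic_tool_caller"}
default_tools = parse_allowed_tools(registered_tools, {})
filtered_tools = parse_allowed_tools(registered_tools, {ENV_VAR: " calculator, ,,"})
```

Passing the environment as a mapping keeps the helper easy to test without touching `os.environ`.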

Custom Executors

from strands_tools.programmatic_tool_caller import programmatic_tool_caller, Executor

class DockerExecutor(Executor):
    def execute(self, code: str, namespace: dict) -> str:
        # Run in sandboxed container
        ...

programmatic_tool_caller.executor = DockerExecutor()

Related Issues

Type of Change

New Tool

Testing

  • Unit tests for executor, tool execution, validation, and allowed tools filtering
  • Integration tests with real Agent and tools
  • Tests cover async execution, loops, asyncio.gather, custom executors, user cancellation

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly (README.md)
  • I have added an appropriate example to the documentation to outline the feature
  • My changes generate no new warnings

Containerized Agent added 7 commits February 5, 2026 04:37
This tool enables programmatic/code-based tool invocation for Strands Agents,
inspired by Anthropic's Programmatic Tool Calling feature. It allows an agent
to write Python code that calls other tools as functions.

Key features:
- Tools exposed as callable methods via 'tools.<tool_name>(**kwargs)'
- Supports complex orchestration with loops, conditionals, data processing
- Captures stdout/stderr from executed code
- Records all tool calls for transparency
- Validates code for potentially dangerous patterns
- User confirmation required unless BYPASS_TOOL_CONSENT is set

Example usage:
  result = agent.tool.programmatic_tool_caller(
      code='''
      result = tools.calculator(expression="2 + 2")
      print(f"Result: {result}")
      '''
  )

The tool integrates with Strands' DecoratedFunctionTool pattern, calling
tools directly with keyword arguments and handling both string and dict
return values.

Includes comprehensive unit tests covering:
- ToolProxy functionality
- Code validation
- Tool execution
- Integration with real tools
- Edge cases and error handling

Changes:
- Use tool_context via @tool(context=True) instead of agent parameter
- Handle multiple content blocks in tool results (combine all text)
- Remove allowed_tools parameter (let agent decide which tools to use)
- Add comprehensive integration tests with real tools
- Fix test assertions and add more edge case coverage

Test coverage:
- 43 unit tests
- 10 integration tests
- All tests passing

- Add tool entry to the tools table
- Add usage example section with code sample
- Note that tool does not work on Windows (uses exec)

Major changes:
- Remove ToolProxy class, inject tools directly as functions
- Tools exposed as both async (tool_name) and sync (tool_name_sync)
- Only return print() output, not tool call summary or execution time
- Support async tool calls via asyncio

This aligns with Anthropic's design where:
- Tools are callable as async functions: await tool_name(...)
- Only print() output is captured and returned to agent
- Tool results stay in code execution context, don't enter agent messages
@mkmeral mkmeral changed the title Feat/programmatic tool caller feat: Add programmatic tool caller Feb 5, 2026
Containerized Agent added 2 commits February 5, 2026 06:10
- Remove sync functions, only expose async (await tool_name(...))
- Auto-wrap user code in async function - no boilerplate needed
- Support asyncio.gather() for parallel execution
- Simplified implementation and tests
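
The auto-wrapping described in this commit can be pictured with the minimal sketch below. It is not the PR's code: `run_user_code` and the `double` stand-in tool are hypothetical names, and the real tool also handles errors and stderr.

```python
import asyncio
import contextlib
import io
import textwrap


def run_user_code(code: str, namespace: dict) -> str:
    """Wrap `code` in an async function, run it, and return what it printed."""
    wrapped = "async def __user_code__():\n" + textwrap.indent(code, "    ") + "\n"
    # Defining the wrapper inside `namespace` lets user code see the injected names.
    exec(compile(wrapped, "<user_code>", "exec"), namespace)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        asyncio.run(namespace["__user_code__"]())
    return buffer.getvalue()


async def double(x: int) -> int:
    """Hypothetical stand-in for an async tool wrapper."""
    await asyncio.sleep(0)
    return x * 2


output = run_user_code(
    "results = await asyncio.gather(double(1), double(2))\nprint(results)",
    {"asyncio": asyncio, "double": double},
)
```

Because the user code becomes the body of a coroutine, top-level `await` and `asyncio.gather` work without any boilerplate from the caller.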
Containerized Agent added 3 commits February 5, 2026 15:08
- Add Executor abstract base class for custom execution environments
- LocalAsyncExecutor as default (local exec with asyncio)
- Custom executors can be set via: programmatic_tool_caller.executor = MyExecutor()
- Add PROGRAMMATIC_TOOL_CALLER_ALLOWED_TOOLS env var to control exposed tools
- Tests for executor swapping and env var filtering
@mkmeral mkmeral marked this pull request as ready for review February 5, 2026 22:09
@mkmeral mkmeral force-pushed the feat/programmatic-tool-caller branch from be93226 to 717566e Compare February 6, 2026 17:43
Use agent.tool.<name>() instead of directly calling tool_impl() from registry.
This properly handles all tool types including MCP tools which are not directly
callable but work through the ToolExecutor._stream() mechanism.

- Changed _execute_tool to use getattr(agent.tool, tool_name)()
- Added record_direct_tool_call=False to prevent polluting message history
- Handle AttributeError for tool not found case
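
A hedged sketch of what such a wrapper might look like follows. The `fake_agent` object is a stand-in built for this example so the block runs without the SDK, and running the call through `asyncio.to_thread` (Python 3.9+) is an assumption about how a synchronous direct call could be awaited, not a description of the PR's code.

```python
import asyncio
from types import SimpleNamespace


def make_async_tool(agent, tool_name: str):
    """Hypothetical factory: expose agent.tool.<tool_name> as an awaitable function."""

    async def call(**kwargs):
        try:
            tool = getattr(agent.tool, tool_name)
        except AttributeError:
            raise RuntimeError(f"Tool '{tool_name}' not found") from None
        # Assumption: the direct call is synchronous, so run it in a worker thread.
        # record_direct_tool_call=False keeps the call out of the message history.
        return await asyncio.to_thread(tool, record_direct_tool_call=False, **kwargs)

    call.__name__ = tool_name
    return call


def fake_add(a: int, b: int, record_direct_tool_call: bool = True) -> dict:
    """Stand-in tool that mimics the result shape used elsewhere in this PR."""
    return {"status": "success", "content": [{"text": str(a + b)}]}


fake_agent = SimpleNamespace(tool=SimpleNamespace(add=fake_add))
result = asyncio.run(make_async_tool(fake_agent, "add")(a=2, b=3))
```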
@mkmeral mkmeral force-pushed the feat/programmatic-tool-caller branch from 717566e to b273bd6 Compare February 6, 2026 18:43
Contributor Author

@mkmeral mkmeral left a comment


Interrupt limitation with programmatic tool calls

Tools that use interrupts (human-in-the-loop) will raise a RuntimeError when invoked through the programmatic tool caller. The SDK's _ToolCaller has two guards:

  • If an interrupt is already active when a direct tool call is made → RuntimeError("cannot directly call tool during interrupt")
  • If a tool raises an interrupt during a direct call → RuntimeError("cannot raise interrupt in direct tool call")

This is a known SDK limitation — direct/programmatic tool calls have no mechanism to pause execution, collect human input, and resume. The error surfaces as a failed tool result back to the agent, so the agent can reason about it and try an alternative approach.

Add limitations note regarding interrupt-capable tools.
@mkmeral mkmeral changed the title feat: Add programmatic tool caller feat: add programmatic tool caller Feb 23, 2026
Aligns programmatic_tool_caller with the sandboxes design doc
(strands-agents/docs#681) Phase 1 requirements:

- Remove Executor ABC and LocalAsyncExecutor classes
  The design doc separates Sandbox (SDK-level, where code runs) from
  the programmatic tool caller (tools-level, runs in host process).
  The Executor abstraction competed with the Sandbox design.

- Inline async execution logic directly in the tool function
  Phase 1 always runs orchestration code in-process. The ~15 lines
  of execution logic are now directly in programmatic_tool_caller().

- Use compile() for better error tracebacks
  Per the design doc: compile(code, '<programmatic_tool_caller>', 'exec')
  gives clearer tracebacks than raw exec().

- Remove custom executor documentation and examples
  The Custom Executors section in the module docstring is removed.
  The Sandbox + Tool Proxy design (Phase 2) replaces this concept.

- Remove executor-related tests
  TestExecutor class and test_custom_executor removed. Added
  test_stderr_captured and test_syntax_error_handled for coverage.

The core tool logic (tool wrappers, _execute_tool, _create_async_tool_function,
_validate_code, _get_allowed_tools) is unchanged. The tool gets simpler, not
more complex.

Refs: strands-agents/docs#681, strands-agents#387
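
To illustrate the compile() point from the commit message above, the small sketch below shows how the filename passed to compile() appears in the traceback. `run_and_format` is a hypothetical helper for this example only.

```python
import traceback


def run_and_format(code: str) -> str:
    """Run `code` and return the formatted traceback, or "" if it succeeds."""
    try:
        # The second argument becomes the "File ..." label in tracebacks.
        compiled = compile(code, "<programmatic_tool_caller>", "exec")
        exec(compiled, {"__name__": "__main__"})
    except Exception:
        return traceback.format_exc()
    return ""


tb_text = run_and_format("x = 1\ny = x / 0\n")
```

With a plain `exec(code_string)` the same frame would be labelled `<string>`, which makes it harder to tell user code apart from other dynamically executed code.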
…configurable extras

Changes:
- Remove _validate_code() — python_repl has no code validation, neither should we
- Match base namespace to python_repl: {"__name__": "__main__"} instead of
  custom {__builtins__, asyncio, json, re, math}
- asyncio always injected (required for async tool wrappers)
- Add PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES env var for configurable namespace
  (comma-separated module names, e.g. "json,re,math,collections")
- Extract _build_namespace() helper for clarity and testability
- Add TestBuildNamespace test class (7 tests)
- Add test_extra_modules_available_in_code and test_code_can_import_modules
- Remove TestValidateCode test class and all _validate_code references
- Remove unused imports (json, math, re from top-level)
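
The namespace construction described in this commit might look roughly like the sketch below. `build_base_namespace` is a hypothetical name, and silently skipping modules that fail to import is an assumption; the PR's `_build_namespace()` may behave differently.

```python
import asyncio
import importlib
from typing import Mapping

EXTRA_MODULES_VAR = "PROGRAMMATIC_TOOL_CALLER_EXTRA_MODULES"


def build_base_namespace(env: Mapping[str, str]) -> dict:
    """Hypothetical sketch: base namespace plus optional extra modules."""
    # Same base as python_repl, with asyncio always available.
    namespace = {"__name__": "__main__", "asyncio": asyncio}
    for name in env.get(EXTRA_MODULES_VAR, "").split(","):
        name = name.strip()
        if not name:
            continue  # tolerate empty entries such as "json,,math"
        try:
            namespace[name] = importlib.import_module(name)
        except ImportError:
            # Assumption: unknown modules are skipped rather than raising.
            continue
    return namespace


ns = build_base_namespace({EXTRA_MODULES_VAR: "json, math,,not_a_real_module_xyz"})
```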

@agent-of-mkmeral agent-of-mkmeral left a comment


🔴 Adversarial Testing Result: FAIL — 3 issues found

Scope: Full adversarial testing of programmatic_tool_caller — input boundaries, stdout/stderr restoration, async execution, namespace security, tool execution error paths, env var handling, contract verification, BaseException handling, indentation/wrapping, and concurrency.

Tests written: 47
Tests passing: 47 (all findings confirmed with reproducible artifacts)
Tests failing (findings): 3 bugs found and proven

Findings Summary
| # | Category | Severity | Description |
| --- | --- | --- | --- |
| 1 | Bug | Critical | SystemExit and KeyboardInterrupt escape the tool — sys.exit() in user code kills the host process |
| 2 | Bug | Medium | Empty code and comment-only code cause SyntaxError due to async wrapping producing empty function body |
| 3 | Bug | Medium | Tool named asyncio shadows the required asyncio module in namespace, breaking all async functionality |

Finding 1 — BaseException Escape (Critical)

Category: Bug
Severity: Critical

Reproduction:

# This kills the host process:
result = programmatic_tool_caller(code="import sys; sys.exit(42)", tool_context=ctx)
# ^^^ SystemExit propagates past except Exception — host dies

# This also escapes:
result = programmatic_tool_caller(code="raise KeyboardInterrupt()", tool_context=ctx)
# ^^^ KeyboardInterrupt propagates to caller

Root cause: The exception handlers are:

except SyntaxError:
    ...
except Exception:
    ...

SystemExit and KeyboardInterrupt inherit from BaseException, NOT Exception. They bypass both catch blocks.

Observed behavior: SystemExit propagates up the call stack, killing the host process. KeyboardInterrupt similarly escapes.

Expected behavior: User code should never be able to kill the host. The tool should catch these and return an error dict.

Fix:

except SyntaxError:
    ...
except (SystemExit, KeyboardInterrupt) as e:
    error_msg = f"Execution error: {type(e).__name__}: {e}"
    console.print(Panel(error_msg, title="[bold red]Error[/]", border_style="red"))
    return {"status": "error", "content": [{"text": error_msg}]}
except Exception:
    ...

Artifact: TestFinding2_BaseExceptionEscape::test_system_exit_escapes_tool, test_keyboard_interrupt_escapes_tool
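
A self-contained way to see the behavior and the proposed guard is sketched below. `run_guarded` is a hypothetical reduction of the tool's try/except structure, not the PR's function, and it omits the console output.

```python
def run_guarded(code: str) -> dict:
    """Hypothetical reduction of the tool's error handling, with the proposed fix."""
    try:
        exec(compile(code, "<user_code>", "exec"), {"__name__": "__main__"})
    except SyntaxError as e:
        return {"status": "error", "content": [{"text": f"Syntax error: {e}"}]}
    except (SystemExit, KeyboardInterrupt) as e:
        # Without this clause both exceptions would skip `except Exception`.
        text = f"Execution error: {type(e).__name__}: {e}"
        return {"status": "error", "content": [{"text": text}]}
    except Exception as e:
        text = f"Execution error: {type(e).__name__}: {e}"
        return {"status": "error", "content": [{"text": text}]}
    return {"status": "success", "content": [{"text": "(no output)"}]}


exit_result = run_guarded("import sys; sys.exit(42)")
interrupt_result = run_guarded("raise KeyboardInterrupt()")
```

The host keeps running after both calls, and each returns an error dict in the same shape as other failures.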


Finding 2 — Empty/Comment-Only Code SyntaxError (Medium)

Category: Unhandled Edge Case
Severity: Medium

Reproduction:

# Empty code → SyntaxError
result = programmatic_tool_caller(code="", tool_context=ctx)
# Returns: {"status": "error", "content": [{"text": "Syntax error: expected an indented block..."}]}

# Comment-only code → SyntaxError  
result = programmatic_tool_caller(code="# just a comment", tool_context=ctx)
# Same SyntaxError

Root cause: The async wrapping does:

indented_code = textwrap.indent(code, "    ")
wrapped_code = f"async def __user_code__():\n{indented_code}\n"

Empty code → async def __user_code__():\n\n → empty function body = SyntaxError.
Comment-only → async def __user_code__():\n # comment\n → comments aren't statements = SyntaxError.

Expected behavior: Empty/comment-only code should succeed with (no output), same as passing x = 42.

Fix: Use pass as the function body when the code is empty or contains only comments and blank lines:

if all(not line.strip() or line.strip().startswith('#') for line in code.splitlines()):
    indented_code = "    pass"
else:
    indented_code = textwrap.indent(code, "    ")

Artifact: TestFinding1_EmptyCodeWrapping::test_empty_code_string_fails, test_comment_only_code_fails
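
An alternative to detecting comment-only input is to always append a trailing `pass` to the wrapper body, so the function is never empty. This is a different technique from the fix proposed above, offered only as a sketch; `wrap_user_code` is a hypothetical name.

```python
import asyncio
import textwrap


def wrap_user_code(code: str) -> str:
    """Wrap user code in an async function whose body is never empty."""
    indented = textwrap.indent(code, "    ")
    # The trailing `pass` is a no-op for real code and a valid body otherwise.
    return f"async def __user_code__():\n{indented}\n    pass\n"


def run_wrapped(code: str):
    """Compile and run the wrapped code, returning the coroutine's result."""
    namespace = {"__name__": "__main__", "asyncio": asyncio}
    exec(compile(wrap_user_code(code), "<user_code>", "exec"), namespace)
    return asyncio.run(namespace["__user_code__"]())


samples = ["", "# just a comment", "# a\n\n# b", "x = 42"]
results = [run_wrapped(sample) for sample in samples]
```

This avoids inspecting the source text at all, at the cost of one extra statement in every wrapped function.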


Finding 3 — asyncio Module Shadowing (Medium)

Category: Bug
Severity: Medium

Reproduction:

# If a tool named 'asyncio' exists in the registry:
ns = _build_namespace({"asyncio"}, agent)
assert ns["asyncio"] is not asyncio_module  # True! It's now a tool wrapper
# asyncio.gather(), asyncio.sleep() etc. are now broken

Root cause: In _build_namespace(), asyncio module is injected first, then tools overwrite by name:

namespace["asyncio"] = asyncio  # ← injected
for tool_name in available_tools:
    namespace[tool_name] = _create_async_tool_function(...)  # ← overwrites 'asyncio'

Expected behavior: Reserved names (asyncio, __name__, extra modules) should not be overwritable by tool names.

Fix: Either exclude reserved names from tool injection, or inject tools first and modules after:

# Option A: Exclude reserved names
RESERVED = {"asyncio", "__name__"}
for tool_name in available_tools - RESERVED:
    namespace[tool_name] = ...

# Option B: Inject tools first, then required modules (modules win)
for tool_name in available_tools:
    namespace[tool_name] = ...
namespace["asyncio"] = asyncio  # Overwrite any tool shadowing

Artifact: TestFinding3_AsyncioShadowing::test_asyncio_tool_overwrites_module
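
Option B can be checked in isolation with the sketch below. `build_namespace_modules_win` and the tuple-returning `make_wrapper` are hypothetical stand-ins; the PR's helper takes different arguments.

```python
import asyncio
from typing import Callable, Iterable


def build_namespace_modules_win(
    available_tools: Iterable[str], make_wrapper: Callable[[str], object]
) -> dict:
    """Inject tools first, then reserved names, so reserved names always win."""
    namespace = {}
    for tool_name in available_tools:
        namespace[tool_name] = make_wrapper(tool_name)
    # Written last on purpose: a tool called "asyncio" cannot shadow the module.
    namespace["__name__"] = "__main__"
    namespace["asyncio"] = asyncio
    return namespace


shadow_ns = build_namespace_modules_win(
    {"asyncio", "calculator"}, lambda name: ("tool", name)
)
```

A tool named `asyncio` becomes unreachable from user code under this ordering, so logging a warning when a collision is detected may be worth considering.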


What Survived

41 adversarial tests passed without finding issues:

  • ✅ stdout/stderr properly restored after RuntimeError, SyntaxError, and user-replaced stdout
  • ✅ Unicode, null bytes, large output all handled correctly
  • ✅ Async code with user-defined coroutines, asyncio.sleep, asyncio.gather all work
  • ✅ Tool errors properly propagate to user code via try/except
  • ✅ asyncio.gather with 20 concurrent calls completes correctly
  • ✅ Sequential tool calls in loops work correctly
  • ✅ Namespace isolation: agent object not leaked, __name__ properly set
  • ✅ Self-exclusion: programmatic_tool_caller never available in namespace
  • ✅ Return format consistent (always {"status": ..., "content": [{"text": ...}]})
  • ✅ Env var edge cases (extra commas, whitespace, empty strings) handled correctly
  • ✅ Complex code structures (decorators, classes, nested indentation) work through async wrapping

🤖 AI agent response. Strands Agents. Feedback welcome!
