Code Execution Plugin
Note: This feature is currently implemented but not yet enabled in Witsy.
Anthropic recently published an article on code execution with MCP, demonstrating how to enable Claude to orchestrate multiple tool calls through a single "execute code" function. This approach can dramatically reduce token usage and improve reliability when chaining multiple operations together.
Implementing code execution in Witsy presents unique challenges compared to a typical MCP server setup:
- **Dynamic MCP servers**: MCP servers in Witsy are user-configured and can be added or removed at runtime. This means we cannot write static "documentation" or examples for real code - the available tools and their schemas are completely dynamic.
- **Desktop application constraints**: Witsy is a desktop application that needs to remain lightweight and portable. Implementing a sandboxed execution environment (as Anthropic does) would require heavy dependencies like Docker, which conflicts with Witsy's goal of being an accessible, easy-to-install desktop app.
Given these constraints, Witsy's implementation takes a different approach focused on workflow orchestration rather than arbitrary code execution.
💡 Results: Despite these limitations, the initial implementation shows dramatic improvements - reducing token usage from 50,000+ tokens to ~5,000 tokens (a 10x reduction) for simple multi-step tasks. See the Example below for details.
Source Code: `src/plugins/code_exec.ts`
The `CodeExecutionPlugin` is implemented as a `MultiToolPlugin` that exposes only two tools to the model. All other available tools (MCP servers, plugins, etc.) are not directly accessible to the model - instead, they can only be invoked through these two tools:
- `code_exec_get_tools_info`: Retrieves detailed information about specific tools, including their parameters and descriptions. The tool's description includes a complete list of all available tools, allowing the model to discover what tools exist before requesting details.
- `code_exec_run_program`: Executes a workflow described as a JSON sequence of tool calls.
The plugin provides streaming status updates during workflow execution, yielding status messages as each step begins and completes (e.g., "Executing step 1: get_workspaces", "Completed step 1"). This allows the UI to show real-time progress to the user.
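The streaming pattern can be sketched as an async generator that yields a status message before and after each step. The names and signatures below are illustrative, not Witsy's actual internal API:

```typescript
// Illustrative sketch: yield progress messages around each tool call.
type Step = { id: string; tool: string; args: Record<string, unknown> };

async function* runWorkflow(
  steps: Step[],
  callTool: (tool: string, args: Record<string, unknown>) => Promise<unknown>,
): AsyncGenerator<string, Record<string, unknown>> {
  const results: Record<string, unknown> = {};
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    yield `Executing step ${i + 1}: ${step.tool}`;
    // Store the result under the step id so later steps can reference it
    results[step.id] = await callTool(step.tool, step.args);
    yield `Completed step ${i + 1}`;
  }
  return results;
}
```

A UI consuming this generator with `for await` receives each status string as soon as the corresponding step starts or finishes.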
Getting models to reliably use code execution requires detailed system instructions. The following instructions are provided to guide the model:
View complete system instructions
To accomplish tasks, you can use the run_program tool. This tool provides a list of other tools you can use. You can get details about these tools using get_tool_info.
When to Use run_program:
- ALWAYS prefer run_program for workflows involving 2+ steps or when steps depend on each other
- Use run_program proactively rather than making individual tool calls and then switching to it
- **Before writing a program**: Use `code_exec_get_tool_info` to understand the exact tool names, required parameters, and argument structure for unfamiliar tools
- Once you understand the required tools, write a complete program and call `run_program` directly
- Do not write single-step programs to retrieve intermediate data: the goal is to streamline execution of complete workflows.
Workflow for Using run_program:
- Identify which tools you need for the task
- Use `code_exec_get_tool_info` with the `tool_name` parameter to get details about each tool you're unfamiliar with
- Review the tool's parameters, required fields, and structure
- Write a complete multi-step program with proper tool names and arguments
- Execute with `code_exec_run_program`
Program Structure
The program must contain a `steps` array:
```json
{
  "steps": [...]
}
```
Step Definition
Each step object must include:
- `id`: A unique string identifier for this step (used for output references in later steps).
- `tool`: The exact name of the tool to call (e.g., `"search_internet"`, `"asana_get_tasks___c9c8"`). Use the complete tool name including any suffixes.
- `args`: An object with key-value pairs representing the inputs for this tool as described in its specification.
- Optionally, `on_error`: Controls workflow behavior if this step fails. Acceptable values are `"continue"` (ignore the error and proceed) or `"abort"` (stop execution). Default should be `"abort"` if not specified.
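For illustration, a single step combining these fields might look like this (the arguments are hypothetical):

```json
{
  "id": "fetch_tasks",
  "tool": "asana_get_tasks___c9c8",
  "args": { "assignee": "me", "limit": 50 },
  "on_error": "continue"
}
```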
References Between Steps
- When a step's argument value should be populated with the output of a previous step, use the reference syntax: `{{step_id.result}}`
- You CAN access nested fields and array indices: `{{step_id.result.field_name}}` or `{{step_id.result.0.field_name}}`
- Example: `{{get_workspaces.result.0.gid}}` to access the `gid` field of the first workspace
- Example: `{{get_user.result.gid}}` to access the `gid` field directly
- You can reference fields at any depth: `{{step_id.result.data.items.0.id}}`
- When uncertain about structure, reference the whole result: `{{step_id.result}}`
General Style & Validity
- The JSON must be valid (no comments, trailing commas, or extra fields).
- Remember the root object contains only a `steps` array
- Use clear and direct names for step IDs.
- Use only tools and parameters exactly as specified.
- For each tool, provide only the arguments that are required or documented.
- Programs should be minimal, containing only as many steps as needed to accomplish the described objective.
Error Handling
- If not specified, treat `on_error` as `"abort"`.
- When instructed, apply `"on_error": "continue"` to relevant steps to skip on failure.
Key Points:
- Use `code_exec_get_tool_info` first to understand unfamiliar tools
- Start with `{"steps": [...]}` (no `program` wrapper)
- You can access nested fields in step results (e.g., `.gid`, `.0.gid`)
- Use complete tool names including suffixes
- Plan the full workflow before executing
These instructions emphasize:
- **Proactive usage**: Encourage the model to use `run_program` for multi-step workflows rather than making individual tool calls
- **Discovery workflow**: Get tool info first, then write the program
- **Clear syntax**: Explicit examples of variable substitution patterns
- **Error guidance**: How to handle failures and structure references
Unlike Anthropic's implementation, which allows arbitrary TypeScript code execution and intermediate data processing (filtering, mapping, etc.), Witsy's implementation focuses on workflow orchestration. A workflow is defined as:
```json
{
  "steps": [
    { "id": "step1", "tool": "tool_name", "args": { ... } },
    { "id": "step2", "tool": "tool_name", "args": { "param": "{{step1.result}}" } },
    // ...
  ]
}
```
Each step executes a tool and stores its result. Subsequent steps can reference previous results using template variables with the syntax `{{step_id.path.to.value}}`.
The plugin supports sophisticated variable substitution:
- **Dot notation**: `{{step1.user.name}}`
- **Array access**: `{{step1.items.0}}` or `{{step1.items[0]}}`
- **Bracket notation with paths**: `{{step1.result[0].gid}}`
- **Nested objects and arrays**: Variables can be used in complex argument structures
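The substitution patterns above can be sketched as a small resolver. This is a simplified illustration; the actual implementation in `code_exec.ts` is more involved:

```typescript
// Resolve a "{{step.path.to.value}}" reference against stored step results.
function resolveRef(ref: string, results: Record<string, unknown>): unknown {
  // Strip the surrounding {{ }} delimiters
  const inner = ref.replace(/^\{\{|\}\}$/g, '');
  // Normalize bracket notation ("items[0]") to dot notation ("items.0")
  const path = inner.replace(/\[(\d+)\]/g, '.$1').split('.');
  let value: unknown = results;
  for (const key of path) {
    if (value === null || typeof value !== 'object') return undefined;
    value = (value as Record<string, unknown>)[key];
  }
  return value;
}
```

Walking the normalized path one key at a time handles dot notation, numeric array indices, and bracket notation uniformly.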
Since it's impossible to predict the structure of objects returned by arbitrary MCP calls, the plugin provides JSON schemas in error messages to guide the model toward self-correction.
When a variable resolution error occurs (e.g., accessing a non-existent property or an out-of-bounds array index), the plugin returns the complete JSON schema of the actual result structure. This schema uses Witsy's lightweight "simple JSON schema" format (see Agents JSON format):
```json
{
  "status": "string",
  "count": "number",
  "items": ["string"]
}
```
For example, if the model tries to access `{{get_data.nonexistent}}`, the error message includes:
```
Failed to resolve variables in "{"value":"{{get_data.nonexistent}}"}" for step "use_data".
Expected schema for "tool1":
{
  "status": "string",
  "count": "number",
  "items": []
}
```
This approach provides the model with the complete, accurate structure of the data, allowing it to correct its variable references on the next attempt.
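A minimal sketch of how such a "simple JSON schema" could be derived from an actual result value, matching the format shown above (the plugin's real generator may differ in details):

```typescript
// Derive a simple schema: objects keep their keys, arrays are described by
// their first element, and leaves become their typeof name.
function simpleSchema(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.length > 0 ? [simpleSchema(value[0])] : [];
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = simpleSchema(v);
    return out;
  }
  return typeof value; // "string", "number", "boolean", ...
}
```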
The plugin implements schema learning to reduce trial-and-error:
- **Schema Generation**: Every time a tool executes successfully, the plugin automatically generates a simple JSON schema from the actual result
- **Persistent Storage**: Schemas are saved to disk (`code_exec.json` in the app's userData directory) and persist across sessions
- **Proactive Delivery**: When the model calls `code_exec_get_tools_info`, the response includes `result_schema` fields for any tools that have been executed before
Example `get_tools_info` response:
```json
{
  "tools_info": [
    {
      "name": "asana_get_tasks",
      "description": "Retrieves tasks from Asana...",
      "parameters": { ... },
      "result_schema": "{\"data\":[{\"gid\":\"string\",\"name\":\"string\"}]}"
    }
  ]
}
```
This means that after the first successful execution of a tool, the model can see the expected output structure before writing programs that reference it, significantly reducing variable resolution errors.
To simplify variable references, the plugin automatically unwraps common result wrappers:
- If a tool returns `{ result: { ... } }`, the `result` property is auto-unwrapped
- If a tool returns `{ data: { ... } }`, the `data` property is auto-unwrapped
- JSON strings are automatically parsed
This means `{{step1.items}}` works whether the tool returns `{ result: { items: [...] } }` or `{ items: [...] }` directly.
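The unwrapping behavior might be sketched as follows. Note that the single-property guard here is an assumption for illustration, not necessarily how the plugin decides when to unwrap:

```typescript
// Parse JSON strings and peel off a single "result" or "data" wrapper so
// references like {{step1.items}} work either way.
function unwrapResult(raw: unknown): unknown {
  let value = raw;
  if (typeof value === 'string') {
    try { value = JSON.parse(value); } catch { return value; } // not JSON: keep as-is
  }
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj);
    // Assumption: only unwrap when the wrapper is the sole property,
    // to avoid silently dropping sibling fields
    if (keys.length === 1 && (keys[0] === 'result' || keys[0] === 'data')) {
      return obj[keys[0]];
    }
  }
  return value;
}
```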
Example task: Retrieve all Asana tasks assigned to the current user and store them in long-term memory.
- Without code execution: >50,000 tokens (multiple back-and-forth exchanges, each tool call requires full context)
- With code execution: ~5,000 tokens (two tool calls total)
First call - Get tool information:
```json
{
  "tool": "code_exec_get_tools_info",
  "parameters": {
    "tools_names": [
      "asana_list_workspaces___c9c8",
      "asana_get_user___c9c8",
      "asana_get_tasks___c9c8",
      "long_term_memory"
    ]
  }
}
```
Second call - Execute workflow:
```json
{
  "tool": "code_exec_run_program",
  "parameters": {
    "program": {
      "steps": [
        {
          "id": "get_workspaces",
          "tool": "asana_list_workspaces___c9c8",
          "args": {}
        },
        {
          "id": "get_user",
          "tool": "asana_get_user___c9c8",
          "args": { "user_id": "me" }
        },
        {
          "id": "get_tasks",
          "tool": "asana_get_tasks___c9c8",
          "args": {
            "assignee": "{{get_user.result.gid}}",
            "workspace": "{{get_workspaces.result.0.gid}}",
            "limit": 100
          }
        },
        {
          "id": "save_to_memory",
          "tool": "long_term_memory",
          "args": {
            "action": "store",
            "content": [
              "User's Asana tasks retrieved on November 8, 2025: {{get_tasks.result}}"
            ]
          }
        }
      ]
    }
  }
}
```
The entire operation completes in two tool calls with full variable substitution between steps.
Witsy's implementation is intentionally narrower than Anthropic's MCP code execution:
- No arbitrary code execution or data processing between steps
- No filtering, mapping, or transformation of intermediate results
- No conditional logic or branching within workflows
- Focus is purely on orchestrating sequential tool calls with variable passing
This trade-off was made to:
- Avoid the complexity and security concerns of sandboxed code execution
- Keep the implementation lightweight and maintainable
- Align with Witsy's architecture as a desktop application
Getting models to reliably use code execution still requires prompt engineering:
- Models need to be encouraged to use `get_tools_info` first to understand tool schemas
- Some models struggle with the JSON syntax for complex workflows
- Variable substitution patterns need to be clearly explained
The feature works best with frontier models (Claude Sonnet 4.5, etc.) that have strong instruction-following capabilities.
The current error detection has some weaknesses:
```typescript
if (result.error || (typeof result === 'string' && result.toLowerCase().startsWith('error'))) {
```
This approach:
- Only catches errors that start with "error" (case-insensitive)
- Misses errors formatted as `"Failed to..."` or `"Cannot..."`
- Relies on string matching rather than structured error formats
A more robust approach would require:
- Standardized error formats from MCP servers (which we don't control)
- Or more sophisticated heuristics for error detection
- Or explicit error signaling from the plugin execution layer
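As one possible direction (a sketch of the heuristic idea, not the implemented fix), a broader check could combine a structured error field with a wider set of failure prefixes:

```typescript
// Hypothetical failure prefixes beyond just "error"
const FAILURE_PREFIXES = ['error', 'failed to', 'cannot', 'unable to'];

function looksLikeError(result: unknown): boolean {
  // Structured error field signaled by the tool takes precedence
  if (result !== null && typeof result === 'object' && 'error' in (result as Record<string, unknown>)) {
    return true;
  }
  // Fall back to prefix matching on string results
  if (typeof result === 'string') {
    const text = result.trim().toLowerCase();
    return FAILURE_PREFIXES.some((p) => text.startsWith(p));
  }
  return false;
}
```

This remains a heuristic: without standardized error formats from MCP servers, false negatives are still possible.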
The automatic unwrapping of `result` and `data` properties, while convenient, adds cognitive overhead:
- It's not always clear whether `{{step1.items}}` refers to `step1.items` or `step1.result.items`
- The logic for determining when to unwrap has edge cases
- Makes the variable resolution code more complex (~130 lines with multiple branching paths)
This is a trade-off between convenience for simple cases and explicitness for complex ones.
Planned improvements include:
- Hardening variable substitution: Improve the robustness and reliability of the variable resolution logic to handle more edge cases gracefully
- Better result visibility: Provide the model with summary information about what was accomplished at each step, rather than just a boolean success/failure indicator
- Error handling control: Implement the `on_error` parameter described in system instructions to allow workflows to continue on failure or abort as needed
- Workflow constraints: Define and enforce practical limits on workflow size (max steps), execution time (timeouts), and resource usage
Completed Enhancements:
- ✅ Schema learning (implemented): The plugin now automatically captures JSON schemas from tool results and provides them in `get_tools_info` responses, dramatically reducing variable resolution errors
Despite these limitations, the code execution plugin provides significant value for multi-step operations, dramatically reducing token usage while maintaining the reliability of tool chaining.