Introduce script engine package for vMCP code mode#4748
Introduce script engine package for vMCP code mode#4748
Conversation
Introduce the pkg/script/ package with the public interface for vMCP code mode. The Executor interface provides Execute() for running Starlark scripts and ToolDescription() for tools/list injection. Tool binds MCP tool metadata with a dispatch callback. Config holds execution parameters (step limit, parallel concurrency cap, tool call timeout). Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Core execution engine that wraps user scripts in a function body for top-level return support, enforces step limits to prevent runaway computation, and captures print() output as logs. This is the lowest layer of the script package — it only knows about Starlark, not MCP. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert tests in description_test.go and execute_test.go to use testify require and table-driven structure per project testing rules. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce internal/conversions with three concerns: - Bidirectional Go ↔ Starlark type conversion (preserving JSON number fidelity by promoting whole float64s to Int) - MCP result parsing that handles structured content with/without the mcp-go SDK wrapper, text content as JSON or plain string, and error results - Tool name sanitization for converting MCP names to valid Starlark identifiers Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce internal/builtins with:
- Tool callable factory supporting positional args (arg0, arg1, ...),
keyword args, and mixed calling conventions
- call_tool("name", ...) for dispatching to tools by original name
when the name is not a valid Starlark identifier
- parallel() builtin for concurrent fan-out with optional concurrency
limit via semaphore
- BuildToolMap for collision-safe name → callback mapping
Part of #4742
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Connect the public Executor interface to internal/core, internal/builtins, and internal/conversions. The executor builds a Starlark environment from bound tools (each as a callable with positional+kwargs support), injects data arguments, executes the script, and wraps the result as an mcp.CallToolResult with optional log content. Integration tests verify multi-tool scripts, loops/conditionals, parallel fan-out, data arguments, call_tool dispatch, step limit enforcement, and log capture. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4748 +/- ##
==========================================
+ Coverage 68.78% 68.94% +0.16%
==========================================
Files 516 527 +11
Lines 54307 54663 +356
==========================================
+ Hits 37353 37688 +335
+ Misses 14100 14096 -4
- Partials 2854 2879 +25 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Fix correctness and safety issues found during code review: - Fix data race: parallel() child threads now get their own log buffers instead of sharing the parent Print callback. Logs are merged after wg.Wait() completes. - Propagate step limit to parallel() child threads so scripts cannot bypass CPU limits via parallel([lambda: infinite_loop()]). - Thread context.Context into parallel() for cancellation support. Semaphore acquire and task launch check ctx.Done(). - Set thread.Load to return an error instead of nil (prevents panic if a script uses a load() statement). - Reject data argument keys that shadow builtins or tool names instead of silently overwriting them. - Return errors for invalid data argument types instead of silently dropping them. - Simplify builtins package to expose a single Build() function that returns globals and reserved names. All other functions are unexported. - Consolidate builtins tests into a single test file using table-driven tests. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implementation Plan (approved before coding)PR 1: Introduce script engine packageContextStory #4742 — Ship opt-in code mode for vMCP. Greenfield implementation (prototype discarded). PR 1 introduces the core Public API (
|
| AC | Test |
|---|---|
| unit: script can call multiple tools, loops/conditionals, return aggregated result | script_test.go |
unit: parallel() fans out concurrently, returns in order |
builtins/parallel_test.go |
| unit: step limit exceeded returns clear error | core/execute_test.go |
| unit: result unwrapping handles multiple formats | conversions/result_test.go |
Additional: positional/kwargs arg handling (builtins/tools_test.go), type conversion round-trips (conversions/starlark_test.go), name sanitization (conversions/toolname_test.go).
Verification
task test # unit tests pass
task lint-fix # linting passes
task license-fix # SPDX headers|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Large PR justification has been provided. Thank you!
- Remove reserved set from builtins.Build() — caller checks globals keys directly to prevent data argument shadowing - Consolidate integration tests into a single table-driven TestExecutor - Add test for automatic JSON string → structured data conversion Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Make per-tool-call timeout a caller responsibility: remove ToolCallTimeout from Config and document the contract on Tool.Call. Add test proving caller-set context deadlines are respected. - Fix UTF-8 truncation in GenerateToolDescription: slice by rune count instead of byte index to avoid splitting multi-byte characters. - Run go mod tidy to mark go.starlark.net as a direct dependency. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4b8089a to
b32094b
Compare
Summary
pkg/script/package — the Starlark execution engine, tool bridge, and builtins — with a clean public interface. The middleware (PR 2), optimizer integration (PR 3), and config/CRD wiring (PR 4) build on this foundation.Part of #4742
Type of change
Test plan
task test)task lint-fix)Changes
pkg/script/script.goExecutorinterface,Tool,Config,New()pkg/script/executor.gopkg/script/description.goGenerateToolDescription()fortools/listinjectionpkg/script/internal/core/execute.gopkg/script/internal/conversions/starlark.gopkg/script/internal/conversions/result.goParseToolResultformcp.CallToolResult→ Go valuespkg/script/internal/conversions/toolname.gopkg/script/internal/builtins/builtins.goBuild()entry point returning globals + reserved namespkg/script/internal/builtins/tools.gopkg/script/internal/builtins/calltool.gocall_tool("name", ...)generic dispatch builtinpkg/script/internal/builtins/parallel.goparallel()concurrent fan-out with semaphore, context cancellation, child step limitsDoes this introduce a user-facing change?
No. This PR introduces a library package with no wiring to the server yet. Code mode becomes user-facing when the middleware is added in PR 2.
Special notes for reviewers
This is a new, self-contained package with no integration into the existing server. The XL size label is expected — the diff is ~60% tests and the rest is a new internal package tree. No existing files are modified.
Review by commit — each commit introduces one layer, building bottom-up:
Executor,Tool,Config)internal/core— Starlark execution engineinternal/conversions— type conversion + MCP result parsinginternal/builtins—parallel(),call_tool(), tool callable factoryload()panic, data shadowing, builtins simplification)ToolCallTimeoutfrom Config — caller responsibility viaTool.Callclosure, UTF-8 safe truncation,go mod tidy)Key design decisions:
Tool.Callis a closure — the middleware (PR 2) constructs these with callbacks that route through the middleware chainctxthrough toTool.Callwithout adding deadlines. Callers enforce timeouts by wrappingctxwithcontext.WithTimeoutin their closure.*mcp.CallToolResultdirectly — no custom result types, the middleware serializes it as-is{"arg0": val, "arg1": val, ...}— same convention for both direct tool calls andcall_tool()parallel()supports optional concurrency cap via semaphore, propagates step limit to child threads, respects context cancellationLarge PR Justification
pkg/script/) with no modifications to existing filescore/,builtins/,conversions/) must ship together for the public API to compileGenerated with Claude Code