Skip to content

Introduce script engine package for vMCP code mode#4748

Open
jerm-dro wants to merge 9 commits into main from jerm-dro/script-engine-package

Conversation

@jerm-dro
Contributor

@jerm-dro jerm-dro commented Apr 10, 2026

Summary

  • Agents today run multi-tool workflows as sequential tool calls, with a model inference step between each one, which wastes round-trips and tokens. vMCP code mode (Ship opt-in code mode for vMCP #4742) lets agents submit a Starlark script that calls tools server-side and returns an aggregated result in a single call.
  • This PR introduces the core pkg/script/ package — the Starlark execution engine, tool bridge, and builtins — with a clean public interface. The middleware (PR 2), optimizer integration (PR 3), and config/CRD wiring (PR 4) build on this foundation.

Part of #4742

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Changes

| File | Change |
|---|---|
| pkg/script/script.go | Public API: Executor interface, Tool, Config, New() |
| pkg/script/executor.go | Unexported executor that wires the internals together |
| pkg/script/description.go | GenerateToolDescription() for tools/list injection |
| pkg/script/internal/core/execute.go | Starlark thread creation, script wrapping, step limit |
| pkg/script/internal/conversions/starlark.go | Bidirectional Go ↔ Starlark type conversion |
| pkg/script/internal/conversions/result.go | ParseToolResult for mcp.CallToolResult → Go values |
| pkg/script/internal/conversions/toolname.go | MCP tool name → valid Starlark identifier |
| pkg/script/internal/builtins/builtins.go | Single Build() entry point returning globals + reserved names |
| pkg/script/internal/builtins/tools.go | Tool callable factory with positional + kwargs support |
| pkg/script/internal/builtins/calltool.go | call_tool("name", ...) generic dispatch builtin |
| pkg/script/internal/builtins/parallel.go | parallel() concurrent fan-out with semaphore, context cancellation, child step limits |

Does this introduce a user-facing change?

No. This PR introduces a library package with no wiring to the server yet. Code mode becomes user-facing when the middleware is added in PR 2.

Special notes for reviewers

This is a new, self-contained package with no integration into the existing server. The XL size label is expected — the diff is ~60% tests and the rest is a new internal package tree. No existing files are modified.

Review by commit — each commit introduces one layer, building bottom-up:

  1. Public API types (Executor, Tool, Config)
  2. internal/core — Starlark execution engine
  3. internal/conversions — type conversion + MCP result parsing
  4. internal/builtins: parallel(), call_tool(), tool callable factory
  5. Executor wiring + integration tests
  6. Test style fix (require + table-driven)
  7. Self-review fixes (data race, step limit on child threads, context cancellation, load() panic, data shadowing, builtins simplification)
  8. Address review comments (remove reserved set, table-driven tests, JSON→structured data test)
  9. Address automated review (remove ToolCallTimeout from Config — caller responsibility via Tool.Call closure, UTF-8 safe truncation, go mod tidy)

Key design decisions:

  • Tool.Call is a closure — the middleware (PR 2) constructs these with callbacks that route through the middleware chain
  • Per-tool timeouts are a caller responsibility — the engine passes ctx through to Tool.Call without adding deadlines. Callers enforce timeouts by wrapping ctx with context.WithTimeout in their closure.
  • Returns *mcp.CallToolResult directly, with no custom result types; the middleware serializes it as-is
  • Positional args become {"arg0": val, "arg1": val, ...} — same convention for both direct tool calls and call_tool()
  • parallel() supports optional concurrency cap via semaphore, propagates step limit to child threads, respects context cancellation
  • Data argument keys validated against globals to prevent shadowing builtins

Large PR Justification

  • New self-contained package (pkg/script/) with no modifications to existing files
  • ~60% of the diff is unit tests — splitting tests from implementation would make review harder, not easier
  • The internal package tree (core/, builtins/, conversions/) must ship together for the public API to compile

Generated with Claude Code

jerm-dro and others added 6 commits April 10, 2026 14:05
Introduce the pkg/script/ package with the public interface for vMCP
code mode. The Executor interface provides Execute() for running
Starlark scripts and ToolDescription() for tools/list injection. Tool
binds MCP tool metadata with a dispatch callback. Config holds execution
parameters (step limit, parallel concurrency cap, tool call timeout).

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Core execution engine that wraps user scripts in a function body for
top-level return support, enforces step limits to prevent runaway
computation, and captures print() output as logs. This is the lowest
layer of the script package — it only knows about Starlark, not MCP.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Convert tests in description_test.go and execute_test.go to use
testify require and table-driven structure per project testing rules.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce internal/conversions with three concerns:
- Bidirectional Go ↔ Starlark type conversion (preserving JSON number
  fidelity by promoting whole float64s to Int)
- MCP result parsing that handles structured content with/without the
  mcp-go SDK wrapper, text content as JSON or plain string, and error
  results
- Tool name sanitization for converting MCP names to valid Starlark
  identifiers

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
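Why the promotion matters: encoding/json decodes every number into float64 when the target is interface{}, so a tool that returns `3` would otherwise reach the script as `3.0`. The sketch below, with hypothetical helpers promoteNumber and decodeAndPromote, shows one way to apply the rule in plain Go; the package presumably produces Starlark Int values rather than Go int64.

```go
package main

import (
	"encoding/json"
	"math"
)

// promoteNumber (hypothetical) returns a whole float64 that fits in int64
// as int64, and leaves fractional values, NaN, and infinities as float64.
func promoteNumber(f float64) interface{} {
	if f == math.Trunc(f) && f >= math.MinInt64 && f < math.MaxInt64 {
		return int64(f)
	}
	return f
}

// decodeAndPromote (hypothetical) unmarshals a JSON value and promotes a
// top-level number; nested values are left as encoding/json produced them.
func decodeAndPromote(raw string) (interface{}, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return nil, err
	}
	if f, ok := v.(float64); ok {
		return promoteNumber(f), nil
	}
	return v, nil
}
```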
Introduce internal/builtins with:
- Tool callable factory supporting positional args (arg0, arg1, ...),
  keyword args, and mixed calling conventions
- call_tool("name", ...) for dispatching to tools by original name
  when the name is not a valid Starlark identifier
- parallel() builtin for concurrent fan-out with optional concurrency
  limit via semaphore
- BuildToolMap for collision-safe name → callback mapping

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Connect the public Executor interface to internal/core, internal/builtins,
and internal/conversions. The executor builds a Starlark environment from
bound tools (each as a callable with positional+kwargs support), injects
data arguments, executes the script, and wraps the result as an
mcp.CallToolResult with optional log content.

Integration tests verify multi-tool scripts, loops/conditionals, parallel
fan-out, data arguments, call_tool dispatch, step limit enforcement, and
log capture.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
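A rough sketch of the final wrapping step, assuming simplified local stand-ins for the mcp-go content types and a hypothetical buildResult helper. The layout of the logs item (one text item, lines joined with newlines) is an assumption for illustration.

```go
package main

import (
	"encoding/json"
	"strings"
)

// TextContent and CallToolResult are simplified stand-ins for the mcp-go
// types so this sketch compiles on its own.
type TextContent struct {
	Text string
}

type CallToolResult struct {
	Content []TextContent
	IsError bool
}

// buildResult (hypothetical) JSON-encodes the script's return value as the
// first content item and, only when print() produced output, appends the
// captured log lines as a second item.
func buildResult(value interface{}, logs []string) (*CallToolResult, error) {
	payload, err := json.Marshal(value)
	if err != nil {
		return nil, err
	}
	res := &CallToolResult{Content: []TextContent{{Text: string(payload)}}}
	if len(logs) > 0 {
		res.Content = append(res.Content, TextContent{Text: strings.Join(logs, "\n")})
	}
	return res, nil
}
```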

@github-actions github-actions bot left a comment


Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 10, 2026
@codecov

codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 82.57143% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.94%. Comparing base (443d2fa) to head (b32094b).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/script/internal/builtins/parallel.go | 70.83% | 15 missing and 6 partials |
| pkg/script/internal/conversions/starlark.go | 79.76% | 13 missing and 4 partials |
| pkg/script/internal/builtins/tools.go | 76.92% | 3 missing and 3 partials |
| pkg/script/internal/conversions/result.go | 77.77% | 4 missing and 2 partials |
| pkg/script/internal/core/execute.go | 87.80% | 4 missing and 1 partial |
| pkg/script/executor.go | 94.11% | 1 missing and 1 partial |
| pkg/script/internal/builtins/calltool.go | 86.66% | 1 missing and 1 partial |
| pkg/script/script.go | 83.33% | 1 missing and 1 partial |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4748      +/-   ##
==========================================
+ Coverage   68.78%   68.94%   +0.16%     
==========================================
  Files         516      527      +11     
  Lines       54307    54663     +356     
==========================================
+ Hits        37353    37688     +335     
+ Misses      14100    14096       -4     
- Partials     2854     2879      +25     


Fix correctness and safety issues found during code review:

- Fix data race: parallel() child threads now get their own log buffers
  instead of sharing the parent Print callback. Logs are merged after
  wg.Wait() completes.
- Propagate step limit to parallel() child threads so scripts cannot
  bypass CPU limits via parallel([lambda: infinite_loop()]).
- Thread context.Context into parallel() for cancellation support.
  Semaphore acquire and task launch check ctx.Done().
- Set thread.Load to return an error instead of nil (prevents panic if
  a script uses a load() statement).
- Reject data argument keys that shadow builtins or tool names instead
  of silently overwriting them.
- Return errors for invalid data argument types instead of silently
  dropping them.
- Simplify builtins package to expose a single Build() function that
  returns globals and reserved names. All other functions are unexported.
- Consolidate builtins tests into a single test file using table-driven
  tests.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
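The log-buffer fix follows a common Go pattern: give each goroutine exclusive ownership of one slot, and read the slots only after the WaitGroup completes. The sketch below uses plain Go functions in place of Starlark threads; runWithLogs and its logf callback are hypothetical.

```go
package main

import "sync"

// runWithLogs (hypothetical) gives each task its own log slice, so no two
// goroutines append to shared state. Slices are merged in task order only
// after wg.Wait() returns, mirroring the described data-race fix.
func runWithLogs(tasks []func(logf func(string))) []string {
	buffers := make([][]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func(logf func(string))) {
			defer wg.Done()
			// Each goroutine writes only to buffers[i].
			task(func(msg string) { buffers[i] = append(buffers[i], msg) })
		}(i, task)
	}
	wg.Wait()
	var merged []string
	for _, buf := range buffers {
		merged = append(merged, buf...)
	}
	return merged
}
```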
@jerm-dro
Contributor Author

Implementation Plan (approved before coding)

PR 1: Introduce script engine package

Context

Story #4742 — Ship opt-in code mode for vMCP. Greenfield implementation (prototype discarded). PR 1 introduces the core pkg/script/ package with a clean public interface and internal Starlark implementation. Middleware (PR 2), optimizer integration (PR 3), and CRD wiring (PR 4) build on this.

Public API (pkg/script/)

```go
// Executor runs Starlark scripts and describes the virtual tool.
type Executor interface {
    Execute(ctx context.Context, script string, data map[string]interface{}) (*mcp.CallToolResult, error)
    ToolDescription() string
}

// Tool bundles an MCP tool's metadata with a callback for invoking it.
type Tool struct {
    Name        string
    Description string
    Call        func(ctx context.Context, arguments map[string]interface{}) (*mcp.CallToolResult, error)
}

// Config holds script execution parameters.
type Config struct {
    StepLimit       uint64        // Max Starlark execution steps (default: 100,000)
    ParallelMax     int           // Max concurrent goroutines for parallel() (0 = unlimited)
    ToolCallTimeout time.Duration // Per-tool-call timeout (0 = no timeout)
}

// New creates an Executor bound to the given tools and config.
// A nil config uses defaults.
func New(tools []Tool, cfg *Config) Executor
```

Tool calling conventions

Both direct tool invocation (my_tool(...)) and call_tool("name", ...) use identical argument handling:

  • my_tool(key=val) → {"key": val} (kwargs)
  • my_tool(val1, val2) → {"arg0": val1, "arg1": val2} (positional)
  • my_tool(val1, key=val2) → {"arg0": val1, "key": val2} (mixed)

call_tool is an alternative entry point for tools whose names aren't valid Starlark identifiers. It uses the same argument handling and the same Tool.Call dispatch.
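The shared convention can be expressed as one function that both entry points call. This sketch uses a hypothetical kwarg type in place of Starlark's keyword tuples; rejecting a keyword that collides with a generated argN key is this sketch's own choice, not confirmed behavior of the package.

```go
package main

import "fmt"

// kwarg is a hypothetical stand-in for a Starlark keyword argument.
type kwarg struct {
	Name  string
	Value interface{}
}

// buildArguments (hypothetical) maps positional values to "arg0", "arg1",
// ... and keeps keyword names as-is, producing the map passed to Tool.Call.
func buildArguments(positional []interface{}, kwargs []kwarg) (map[string]interface{}, error) {
	args := make(map[string]interface{}, len(positional)+len(kwargs))
	for i, v := range positional {
		args[fmt.Sprintf("arg%d", i)] = v
	}
	for _, kw := range kwargs {
		// Assumption of this sketch: refuse to overwrite a positional slot.
		if _, dup := args[kw.Name]; dup {
			return nil, fmt.Errorf("duplicate argument %q", kw.Name)
		}
		args[kw.Name] = kw.Value
	}
	return args, nil
}
```

With this shape, my_tool(val1, key=val2) and call_tool("my-tool", val1, key=val2) both reduce to the same buildArguments call; call_tool only differs in peeling off its first positional argument as the tool name.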

Internal packages

pkg/script/internal/
├── core/           # Starlark thread, script wrapping, execution
├── builtins/       # parallel(), call_tool() Starlark builtins
└── conversions/    # Go ↔ Starlark type conversion, MCP result parsing

internal/core/

  • execute.go: Execute(script, globals, stepLimit) (*ExecuteResult, error). Thread creation, def __main__() wrapping, print capture.
  • execute_test.go: step limit enforcement, basic execution, error propagation

internal/builtins/

  • parallel.go: NewParallel(maxConcurrency int) *starlark.Builtin. WaitGroup + optional semaphore; 0 = unlimited.
  • calltool.go: NewCallTool(ctx, toolMap) *starlark.Builtin. Dispatches by name from the first positional arg.
  • tools.go: MakeToolCallable(ctx, name, callFn) *starlark.Builtin. Shared arg handling (positional → argN, kwargs, mixed), used by both direct tool callables and call_tool.
  • Tests for each

internal/conversions/

  • starlark.go: GoToStarlark / StarlarkToGo bidirectional conversion
  • result.go: ParseToolResult(*mcp.CallToolResult) (interface{}, error) using mcp.AsTextContent()
  • toolname.go: SanitizeName(name) string
  • Tests for each
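One plausible sanitization rule, shown as a hypothetical sanitizeName: replace any rune outside [A-Za-z0-9_] with an underscore and prefix a leading digit with one. The actual SanitizeName may use different rules. Note that this scheme can map two distinct MCP names to the same identifier and does not special-case Starlark keywords, which is where collision-safe mapping and call_tool("name", ...) dispatch by original name come in.

```go
package main

import "strings"

// sanitizeName (hypothetical) turns an MCP tool name into an ASCII
// Starlark identifier: disallowed runes become '_', and a leading digit
// (or an empty result) gets an '_' prefix.
func sanitizeName(name string) string {
	var b strings.Builder
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteRune('_')
		}
	}
	out := b.String()
	if out == "" || (out[0] >= '0' && out[0] <= '9') {
		out = "_" + out
	}
	return out
}
```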

How Execute() works

  1. Builds starlark.StringDict globals from bound tools:
    • Each Tool → Starlark callable via builtins.MakeToolCallable (shared arg handling)
    • call_tool() via builtins.NewCallTool (same arg handling, name-based dispatch)
    • parallel() via builtins.NewParallel(cfg.ParallelMax)
    • data entries → top-level Starlark variables
  2. Calls core.Execute(script, globals, cfg.StepLimit)
  3. Converts result value → Go via conversions.StarlarkToGo()
  4. Serializes to JSON, wraps in *mcp.CallToolResult (text content + optional logs item)
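Turning data entries into top-level variables (step 1) is where the shadowing rule from the key design decisions applies. A minimal sketch, assuming a plain map[string]interface{} in place of starlark.StringDict and a hypothetical injectData helper; it validates every key first so a rejected call leaves globals untouched.

```go
package main

import "fmt"

// injectData (hypothetical) copies data arguments into globals, refusing
// any key that already names a builtin or tool so that, for example,
// parallel cannot be replaced by a plain value.
func injectData(globals map[string]interface{}, data map[string]interface{}) error {
	// Validate all keys before mutating anything.
	for key := range data {
		if _, taken := globals[key]; taken {
			return fmt.Errorf("data argument %q shadows an existing global", key)
		}
	}
	for key, val := range data {
		globals[key] = val
	}
	return nil
}
```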

Commits

  1. Add public API types: script.go with Executor, Tool, Config, New() stub; description.go with ToolDescription().
  2. Add internal/core: Starlark execution engine with step limit + tests.
  3. Add internal/conversions: Go ↔ Starlark type conversion, MCP result parsing, tool name sanitization + tests.
  4. Add internal/builtins: parallel(), call_tool(), tool callable factory with positional+kwargs support + tests.
  5. Wire executor: executor.go implementation connecting public API → internals. Integration tests in script_test.go.

Tests

| AC | Test |
|---|---|
| unit: script can call multiple tools, loops/conditionals, return aggregated result | script_test.go |
| unit: parallel() fans out concurrently, returns in order | builtins/parallel_test.go |
| unit: step limit exceeded returns clear error | core/execute_test.go |
| unit: result unwrapping handles multiple formats | conversions/result_test.go |

Additional: positional/kwargs arg handling (builtins/tools_test.go), type conversion round-trips (conversions/starlark_test.go), name sanitization (conversions/toolname_test.go).

Verification

task test          # unit tests pass
task lint-fix      # linting passes
task license-fix   # SPDX headers

@github-actions
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot dismissed their stale review April 10, 2026 22:57

Large PR justification has been provided. Thank you!

- Remove reserved set from builtins.Build() — caller checks globals
  keys directly to prevent data argument shadowing
- Consolidate integration tests into a single table-driven TestExecutor
- Add test for automatic JSON string → structured data conversion

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
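The JSON string → structured data behavior amounts to "try to decode, else pass the string through". parseText is a hypothetical helper; note that a decoded JSON number arrives as float64 here unless a separate promotion step converts it.

```go
package main

import "encoding/json"

// parseText (hypothetical) returns the decoded value when a tool's text
// content is valid JSON, and the raw string otherwise.
func parseText(text string) interface{} {
	var v interface{}
	if err := json.Unmarshal([]byte(text), &v); err != nil {
		return text
	}
	return v
}
```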
@jerm-dro jerm-dro marked this pull request as ready for review April 10, 2026 23:26
@jerm-dro jerm-dro requested a review from JAORMX as a code owner April 10, 2026 23:26
- Make per-tool-call timeout a caller responsibility: remove
  ToolCallTimeout from Config and document the contract on Tool.Call.
  Add test proving caller-set context deadlines are respected.
- Fix UTF-8 truncation in GenerateToolDescription: slice by rune count
  instead of byte index to avoid splitting multi-byte characters.
- Run go mod tidy to mark go.starlark.net as a direct dependency.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
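The truncation fix in miniature: slicing a string by byte index can cut a multi-byte UTF-8 character in half, while converting to []rune first cannot. truncateRunes is a hypothetical name, and any suffix the real GenerateToolDescription appends is not shown.

```go
package main

// truncateRunes (hypothetical) cuts s to at most limit runes, so a
// multi-byte character is never split the way s[:limit] can split it.
func truncateRunes(s string, limit int) string {
	runes := []rune(s)
	if len(runes) <= limit {
		return s
	}
	return string(runes[:limit])
}
```

For example, "héllo"[:2] keeps only the first byte of é and yields invalid UTF-8, whereas truncateRunes("héllo", 2) returns "hé".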
@jerm-dro jerm-dro force-pushed the jerm-dro/script-engine-package branch from 4b8089a to b32094b Compare April 10, 2026 23:43