Introduce script engine package for vMCP code mode by jerm-dro · Pull Request #4748 · stacklok/toolhive

jerm-dro · 2026-04-10T21:16:33Z

Summary

Today, agents run multi-tool workflows as a sequence of individual tool calls with a model inference step between each one, which wastes round-trips and tokens. vMCP code mode (Ship opt-in code mode for vMCP #4742) lets agents submit a Starlark script that calls tools server-side and returns an aggregated result in a single call.
This PR introduces the core pkg/script/ package — the Starlark execution engine, tool bridge, and builtins — with a clean public interface. The middleware (PR 2), optimizer integration (PR 3), and config/CRD wiring (PR 4) build on this foundation.

Part of #4742

Type of change

New feature

Test plan

Unit tests (task test)
Linting (task lint-fix)

Changes

| File | Change |
|---|---|
| `pkg/script/script.go` | Public API: `Executor` interface, `Tool`, `Config`, `New()` |
| `pkg/script/executor.go` | Unexported executor that wires the internals together |
| `pkg/script/description.go` | `GenerateToolDescription()` for `tools/list` injection |
| `pkg/script/internal/core/execute.go` | Starlark thread creation, script wrapping, step limit |
| `pkg/script/internal/conversions/starlark.go` | Bidirectional Go ↔ Starlark type conversion |
| `pkg/script/internal/conversions/result.go` | `ParseToolResult` for `mcp.CallToolResult` → Go values |
| `pkg/script/internal/conversions/toolname.go` | MCP tool name → valid Starlark identifier |
| `pkg/script/internal/builtins/builtins.go` | Single `Build()` entry point returning the Starlark globals |
| `pkg/script/internal/builtins/tools.go` | Tool callable factory with positional + kwargs support |
| `pkg/script/internal/builtins/calltool.go` | `call_tool("name", ...)` generic dispatch builtin |
| `pkg/script/internal/builtins/parallel.go` | `parallel()` concurrent fan-out with semaphore, context cancellation, child step limits |
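The `toolname.go` row maps MCP tool names to Starlark identifiers. The sketch below is a minimal, dependency-free illustration under assumed rules (any character outside `[A-Za-z0-9_]` becomes `_`, and a leading digit gets a `_` prefix). `sanitizeToolName` is a hypothetical name, and the PR's actual rules, such as how Starlark keywords are handled, may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeToolName is a hypothetical sketch: characters outside [A-Za-z0-9_]
// become '_', and a result that is empty or starts with a digit gets a '_'
// prefix. Keyword handling and collision resolution are left out.
func sanitizeToolName(name string) string {
	var b strings.Builder
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_') // one underscore per rune, including multi-byte runes
		}
	}
	out := b.String()
	if out == "" || (out[0] >= '0' && out[0] <= '9') {
		out = "_" + out
	}
	return out
}

func main() {
	for _, n := range []string{"github.create-issue", "fetch/url", "2fa_check"} {
		fmt.Println(n, "->", sanitizeToolName(n))
	}
}
```

Distinct names such as `a-b` and `a.b` collapse to the same identifier under rules like these, so any sanitizer needs a collision policy. That fits the builtins commit's mention of collision-safe name mapping and the availability of `call_tool("name", ...)`, which dispatches by the original name.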

Does this introduce a user-facing change?

No. This PR introduces a library package with no wiring to the server yet. Code mode becomes user-facing when the middleware is added in PR 2.

Special notes for reviewers

This is a new, self-contained package with no integration into the existing server. The XL size label is expected — the diff is ~60% tests and the rest is a new internal package tree. No existing files are modified.

Review by commit — each commit introduces one layer, building bottom-up:

1. Public API types (Executor, Tool, Config)
2. internal/core: Starlark execution engine
3. internal/conversions: type conversion + MCP result parsing
4. internal/builtins: parallel(), call_tool(), tool callable factory
5. Executor wiring + integration tests
6. Test style fix (require + table-driven)
7. Self-review fixes (data race, step limit on child threads, context cancellation, load() panic, data shadowing, builtins simplification)
8. Address review comments (remove reserved set, table-driven tests, JSON→structured data test)
9. Address automated review (remove ToolCallTimeout from Config, making timeouts a caller responsibility via the Tool.Call closure; UTF-8 safe truncation; go mod tidy)

Key design decisions:

- Tool.Call is a closure: the middleware (PR 2) constructs these with callbacks that route through the middleware chain.
- Per-tool timeouts are a caller responsibility: the engine passes ctx through to Tool.Call without adding deadlines. Callers enforce timeouts by wrapping ctx with context.WithTimeout in their closure.
- Execute returns *mcp.CallToolResult directly. There are no custom result types; the middleware serializes it as-is.
- Positional args become {"arg0": val, "arg1": val, ...}, the same convention for both direct tool calls and call_tool().
- parallel() supports an optional concurrency cap via semaphore, propagates the step limit to child threads, and respects context cancellation.
- Data argument keys are validated against globals to prevent shadowing builtins.

Large PR Justification

- New self-contained package (pkg/script/) with no modifications to existing files
- ~60% of the diff is unit tests; splitting tests from implementation would make review harder, not easier
- The internal package tree (core/, builtins/, conversions/) must ship together for the public API to compile

Generated with Claude Code

Introduce the pkg/script/ package with the public interface for vMCP code mode. The Executor interface provides Execute() for running Starlark scripts and ToolDescription() for tools/list injection. Tool binds MCP tool metadata with a dispatch callback. Config holds execution parameters (step limit, parallel concurrency cap, tool call timeout). Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Core execution engine that wraps user scripts in a function body for top-level return support, enforces step limits to prevent runaway computation, and captures print() output as logs. This is the lowest layer of the script package — it only knows about Starlark, not MCP. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Convert tests in description_test.go and execute_test.go to use testify require and table-driven structure per project testing rules. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce internal/conversions with three concerns:
- Bidirectional Go ↔ Starlark type conversion (preserving JSON number fidelity by promoting whole float64s to Int)
- MCP result parsing that handles structured content with/without the mcp-go SDK wrapper, text content as JSON or plain string, and error results
- Tool name sanitization for converting MCP names to valid Starlark identifiers

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce internal/builtins with:
- Tool callable factory supporting positional args (arg0, arg1, ...), keyword args, and mixed calling conventions
- call_tool("name", ...) for dispatching to tools by original name when the name is not a valid Starlark identifier
- parallel() builtin for concurrent fan-out with optional concurrency limit via semaphore
- BuildToolMap for collision-safe name → callback mapping

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Connect the public Executor interface to internal/core, internal/builtins, and internal/conversions. The executor builds a Starlark environment from bound tools (each as a callable with positional+kwargs support), injects data arguments, executes the script, and wraps the result as an mcp.CallToolResult with optional log content. Integration tests verify multi-tool scripts, loops/conditionals, parallel fan-out, data arguments, call_tool dispatch, step limit enforcement, and log capture. Part of #4742 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.

This review will be automatically dismissed once you add the justification section.

codecov · 2026-04-10T21:31:39Z

Codecov Report

❌ Patch coverage is 82.57143% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.94%. Comparing base (443d2fa) to head (b32094b).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/script/internal/builtins/parallel.go | 70.83% | 15 Missing and 6 partials ⚠️ |
| pkg/script/internal/conversions/starlark.go | 79.76% | 13 Missing and 4 partials ⚠️ |
| pkg/script/internal/builtins/tools.go | 76.92% | 3 Missing and 3 partials ⚠️ |
| pkg/script/internal/conversions/result.go | 77.77% | 4 Missing and 2 partials ⚠️ |
| pkg/script/internal/core/execute.go | 87.80% | 4 Missing and 1 partial ⚠️ |
| pkg/script/executor.go | 94.11% | 1 Missing and 1 partial ⚠️ |
| pkg/script/internal/builtins/calltool.go | 86.66% | 1 Missing and 1 partial ⚠️ |
| pkg/script/script.go | 83.33% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #4748      +/-   ##
==========================================
+ Coverage   68.78%   68.94%   +0.16%     
==========================================
  Files         516      527      +11     
  Lines       54307    54663     +356     
==========================================
+ Hits        37353    37688     +335     
+ Misses      14100    14096       -4     
- Partials     2854     2879      +25
```


Fix correctness and safety issues found during code review:
- Fix data race: parallel() child threads now get their own log buffers instead of sharing the parent Print callback. Logs are merged after wg.Wait() completes.
- Propagate step limit to parallel() child threads so scripts cannot bypass CPU limits via parallel([lambda: infinite_loop()]).
- Thread context.Context into parallel() for cancellation support. Semaphore acquire and task launch check ctx.Done().
- Set thread.Load to return an error instead of nil (prevents panic if a script uses a load() statement).
- Reject data argument keys that shadow builtins or tool names instead of silently overwriting them.
- Return errors for invalid data argument types instead of silently dropping them.
- Simplify builtins package to expose a single Build() function that returns globals and reserved names. All other functions are unexported.
- Consolidate builtins tests into a single test file using table-driven tests.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
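A minimal sketch of the per-child log buffer fix in this commit, assuming merged logs are ordered by task index (the commit message only says they are merged after `wg.Wait()`). `runWithLogs` is hypothetical and uses plain Go callbacks in place of Starlark threads and their Print hooks.

```go
package main

import (
	"fmt"
	"sync"
)

// runWithLogs gives each child task its own log slice instead of a shared
// print callback, then concatenates the slices in task order only after
// wg.Wait() has returned, so no two goroutines ever touch the same buffer.
func runWithLogs(tasks []func(logf func(string))) []string {
	buffers := make([][]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func(logf func(string))) {
			defer wg.Done()
			// Only goroutine i writes buffers[i], so no lock is needed.
			task(func(msg string) { buffers[i] = append(buffers[i], msg) })
		}(i, task)
	}
	wg.Wait()

	var merged []string
	for _, b := range buffers {
		merged = append(merged, b...)
	}
	return merged
}

func main() {
	tasks := []func(func(string)){
		func(logf func(string)) { logf("a1"); logf("a2") },
		func(logf func(string)) { logf("b1") },
	}
	fmt.Println(runWithLogs(tasks))
}
```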

jerm-dro · 2026-04-10T21:52:16Z

Implementation Plan (approved before coding)

PR 1: Introduce script engine package

Context

Story #4742 — Ship opt-in code mode for vMCP. Greenfield implementation (prototype discarded). PR 1 introduces the core pkg/script/ package with a clean public interface and internal Starlark implementation. Middleware (PR 2), optimizer integration (PR 3), and CRD wiring (PR 4) build on this.

Public API (`pkg/script/`)

```go
package script

import (
	"context"
	"time"

	"github.com/mark3labs/mcp-go/mcp"
)

// Executor runs Starlark scripts and describes the virtual tool.
type Executor interface {
	Execute(ctx context.Context, script string, data map[string]interface{}) (*mcp.CallToolResult, error)
	ToolDescription() string
}

// Tool bundles an MCP tool's metadata with a callback for invoking it.
type Tool struct {
	Name        string
	Description string
	Call        func(ctx context.Context, arguments map[string]interface{}) (*mcp.CallToolResult, error)
}

// Config holds script execution parameters.
type Config struct {
	StepLimit       uint64        // Max Starlark execution steps (default: 100,000)
	ParallelMax     int           // Max concurrent goroutines for parallel() (0 = unlimited)
	ToolCallTimeout time.Duration // Per-tool-call timeout (0 = no timeout); later removed in favor of caller-set deadlines in Tool.Call
}

// New creates an Executor bound to the given tools and config.
// A nil config uses defaults. (Signature only; the body is omitted in this plan.)
func New(tools []Tool, cfg *Config) Executor
```

Tool calling conventions

Both direct tool invocation (my_tool(...)) and call_tool("name", ...) use identical argument handling:

- my_tool(key=val) → {"key": val} (kwargs)
- my_tool(val1, val2) → {"arg0": val1, "arg1": val2} (positional)
- my_tool(val1, key=val2) → {"arg0": val1, "key": val2} (mixed)

call_tool is simply an alternative for tools whose names aren't valid Starlark identifiers: same argument handling, same Tool.Call dispatch.
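These conventions can be expressed as a small argument-building function. This is a hypothetical sketch, not the code in `tools.go`: `buildArguments` and `KV` are illustrative names, and returning an error when a keyword collides with a generated `argN` key is an assumption the plan does not specify.

```go
package main

import "fmt"

// KV is one keyword argument, kept in call order.
type KV struct {
	Key   string
	Value any
}

// buildArguments maps positional values to "arg0", "arg1", ... and keeps
// keyword names as-is, producing the map handed to Tool.Call.
func buildArguments(positional []any, kwargs []KV) (map[string]any, error) {
	args := make(map[string]any, len(positional)+len(kwargs))
	for i, v := range positional {
		args[fmt.Sprintf("arg%d", i)] = v
	}
	for _, kv := range kwargs {
		// Assumption: a keyword that collides with a positional key is an error.
		if _, dup := args[kv.Key]; dup {
			return nil, fmt.Errorf("duplicate argument %q", kv.Key)
		}
		args[kv.Key] = kv.Value
	}
	return args, nil
}

func main() {
	// my_tool("repo", limit=5) -> {"arg0": "repo", "limit": 5}
	args, err := buildArguments([]any{"repo"}, []KV{{"limit", 5}})
	fmt.Println(args, err) // map[arg0:repo limit:5] <nil>
}
```

Because direct callables and `call_tool` share one helper like this, the two entry points cannot drift apart in how they shape arguments.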

Internal packages

```
pkg/script/internal/
├── core/           # Starlark thread, script wrapping, execution
├── builtins/       # parallel(), call_tool() Starlark builtins
└── conversions/    # Go ↔ Starlark type conversion, MCP result parsing
```

`internal/core/`

- execute.go: Execute(script, globals, stepLimit) (*ExecuteResult, error). Thread creation, def __main__() wrapping, print capture.
- execute_test.go: step limit enforcement, basic execution, error propagation
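To show why wrapping enables a top-level `return`, here is a hypothetical `wrapScript` that indents the user script into a `def __main__():` body. How `core.Execute` then invokes `__main__` and captures its return value is not shown, and the plan does not specify it.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapScript indents every non-blank line of script into the body of
// def __main__(), so a top-level `return` becomes legal Starlark.
// Caveat: lines inside multi-line string literals are indented too,
// which changes their contents.
func wrapScript(script string) string {
	var b strings.Builder
	b.WriteString("def __main__():\n")
	hasBody := false
	for _, line := range strings.Split(script, "\n") {
		if strings.TrimSpace(line) == "" {
			b.WriteString("\n") // keep blank lines so user line N maps to wrapped line N+1
			continue
		}
		b.WriteString("    " + line + "\n")
		hasBody = true
	}
	if !hasBody {
		b.WriteString("    pass\n") // an empty function body is a syntax error
	}
	return b.String()
}

func main() {
	fmt.Print(wrapScript("x = 1\nreturn x * 2"))
}
```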

`internal/builtins/`

- parallel.go: NewParallel(maxConcurrency int) *starlark.Builtin. WaitGroup + optional semaphore; 0 = unlimited.
- calltool.go: NewCallTool(ctx, toolMap) *starlark.Builtin. Dispatches by name from the first positional arg.
- tools.go: MakeToolCallable(ctx, name, callFn) *starlark.Builtin. Shared arg handling (positional → argN, kwargs, mixed), used by both direct tool callables and call_tool.
- Tests for each

`internal/conversions/`

- starlark.go: GoToStarlark / StarlarkToGo bidirectional conversion
- result.go: ParseToolResult(*mcp.CallToolResult) (interface{}, error) using mcp.AsTextContent()
- toolname.go: SanitizeName(name) string
- Tests for each
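A sketch of the text-content branch of result parsing, assuming the rule from the conversions commit: text that parses as JSON becomes structured data with whole-number floats promoted to integers, and anything else stays a plain string. `parseTextContent` and `promoteWholeFloats` are hypothetical names, and the `1<<53` bound (beyond which float64 can no longer represent every integer exactly) is an added safeguard, not something the PR states.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// parseTextContent returns structured data when text is valid JSON and the
// original string otherwise.
func parseTextContent(text string) any {
	var v any
	if err := json.Unmarshal([]byte(text), &v); err != nil {
		return text
	}
	return promoteWholeFloats(v)
}

// promoteWholeFloats walks decoded JSON and turns whole-number float64 values
// into int64, so 3 stays an integer instead of becoming 3.0 in Starlark.
func promoteWholeFloats(v any) any {
	switch t := v.(type) {
	case float64:
		if t == math.Trunc(t) && math.Abs(t) < 1<<53 {
			return int64(t)
		}
		return t
	case []any:
		for i := range t {
			t[i] = promoteWholeFloats(t[i])
		}
		return t
	case map[string]any:
		for k := range t {
			t[k] = promoteWholeFloats(t[k])
		}
		return t
	default:
		return v
	}
}

func main() {
	fmt.Println(parseTextContent(`{"count": 3, "ratio": 0.5}`)) // map[count:3 ratio:0.5]
	fmt.Println(parseTextContent("not json"))                    // not json
}
```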

How Execute() works

1. Builds starlark.StringDict globals from bound tools:
   - Each Tool → Starlark callable via builtins.MakeToolCallable (shared arg handling)
   - call_tool() via builtins.NewCallTool (same arg handling, name-based dispatch)
   - parallel() via builtins.NewParallel(cfg.ParallelMax)
   - data entries → top-level Starlark variables
2. Calls core.Execute(script, globals, cfg.StepLimit)
3. Converts result value → Go via conversions.StarlarkToGo()
4. Serializes to JSON, wraps in *mcp.CallToolResult (text content + optional logs item)
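The data-entries step, combined with the shadowing rule from the self-review commit, might look like the following. Globals are modeled as a `map[string]any` rather than `starlark.StringDict`, value conversion is omitted, and `injectData` is a hypothetical name. Checking every key before writing any is a design choice of this sketch, so a rejected call leaves the globals untouched.

```go
package main

import (
	"fmt"
	"sort"
)

// injectData adds data entries as top-level globals, rejecting any key that
// already names a builtin or tool instead of silently overwriting it.
func injectData(globals map[string]any, data map[string]any) error {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic error reporting

	// Validate everything first so a failure leaves globals unchanged.
	for _, k := range keys {
		if _, exists := globals[k]; exists {
			return fmt.Errorf("data argument %q shadows an existing global", k)
		}
	}
	for _, k := range keys {
		globals[k] = data[k]
	}
	return nil
}

func main() {
	globals := map[string]any{"parallel": "builtin", "call_tool": "builtin", "search": "tool"}
	fmt.Println(injectData(globals, map[string]any{"repos": []string{"a", "b"}})) // <nil>
	fmt.Println(injectData(globals, map[string]any{"parallel": 1}))
}
```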

Commits

1. Add public API types: script.go with Executor, Tool, Config, New() stub. description.go with ToolDescription().
2. Add internal/core: Starlark execution engine with step limit + tests.
3. Add internal/conversions: Go ↔ Starlark type conversion, MCP result parsing, tool name sanitization + tests.
4. Add internal/builtins: parallel(), call_tool(), tool callable factory with positional+kwargs support + tests.
5. Wire executor: executor.go implementation connecting public API → internals. Integration tests in script_test.go.

Tests

| AC | Test |
|---|---|
| unit: script can call multiple tools, loops/conditionals, return aggregated result | `script_test.go` |
| unit: `parallel()` fans out concurrently, returns in order | `builtins/parallel_test.go` |
| unit: step limit exceeded returns clear error | `core/execute_test.go` |
| unit: result unwrapping handles multiple formats | `conversions/result_test.go` |

Additional: positional/kwargs arg handling (builtins/tools_test.go), type conversion round-trips (conversions/starlark_test.go), name sanitization (conversions/toolname_test.go).

Verification

```shell
task test          # unit tests pass
task lint-fix      # linting passes
task license-fix   # SPDX headers
```

github-actions · 2026-04-10T22:57:55Z

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

Large PR justification has been provided. Thank you!

pkg/script/internal/builtins/builtins.go

pkg/script/script_test.go

- Remove reserved set from builtins.Build(); the caller checks globals keys directly to prevent data argument shadowing
- Consolidate integration tests into a single table-driven TestExecutor
- Add test for automatic JSON string → structured data conversion

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pkg/script/executor.go

pkg/script/description.go

go.mod

- Make per-tool-call timeout a caller responsibility: remove ToolCallTimeout from Config and document the contract on Tool.Call. Add test proving caller-set context deadlines are respected.
- Fix UTF-8 truncation in GenerateToolDescription: slice by rune count instead of byte index to avoid splitting multi-byte characters.
- Run go mod tidy to mark go.starlark.net as a direct dependency.

Part of #4742

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jerm-dro and others added 6 commits April 10, 2026 14:05

Use require/table-driven tests per testing guidelines

2edb285


github-actions bot previously requested changes Apr 10, 2026


github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 10, 2026

github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 10, 2026

jerm-dro commented Apr 10, 2026


- pkg/script/internal/builtins/builtins.go (outdated, resolved)
- pkg/script/script_test.go (outdated, resolved)
- pkg/script/script_test.go (resolved)


jerm-dro marked this pull request as ready for review April 10, 2026 23:26

jerm-dro requested a review from JAORMX as a code owner April 10, 2026 23:26

claude bot reviewed Apr 10, 2026


- pkg/script/executor.go (resolved)
- pkg/script/description.go (outdated, resolved)
- go.mod (outdated, resolved)


jerm-dro force-pushed the jerm-dro/script-engine-package branch from 4b8089a to b32094b Compare April 10, 2026 23:43

