Python: feat: add agent-framework-monty (Monty-backed CodeAct provider) #5915
eavanvalkenburg wants to merge 7 commits into
New alpha package that wraps pydantic-monty (a Rust-based Python
interpreter) behind the same CodeAct API surface as
agent-framework-hyperlight, so users can swap providers with minimal
code change.
Public API (agent_framework_monty):
- MontyCodeActProvider — ContextProvider that injects a run-scoped
execute_code tool plus dynamic CodeAct instructions.
- MontyExecuteCodeTool — standalone FunctionTool for mixed-tool agents
or manual static wiring.
- FileMount / FileMountInput / MountMode — public types mirroring the
Hyperlight names, with Monty's mode (read-only/read-write/overlay)
and write_bytes_limit on FileMount.
Constructor kwargs (both classes) mirror Hyperlight where possible:
tools, approval_mode, workspace_root, file_mounts; plus a Monty-only
resource_limits forwarding ResourceLimits to Monty.start().
Filesystem flow:
- workspace_root auto-mounts at /input (read-write), matching Hyperlight.
- file_mounts accepts string shorthand, (host, mount) tuple, or
FileMount with mode + write cap.
- Files written under read-write mounts are scanned post-execution and
returned as Content.from_data items (mirrors Hyperlight /output).
- overlay mounts buffer writes in-memory; read-only mounts reject writes.
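The post-execution scan described above can be pictured as a before/after snapshot diff. The following is a minimal sketch under stated assumptions, not the package's implementation: the names `snapshot` and `written_files` are hypothetical, and using (size, mtime) as the change signature is an assumption for illustration.

```python
import os
from pathlib import Path

# Hypothetical change signature: relative path -> (size in bytes, mtime in ns).
Snapshot = dict[str, tuple[int, int]]


def snapshot(root: Path) -> Snapshot:
    """Record every regular file under root without following symlinks."""
    state: Snapshot = {}
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # never report a symlink's target as sandbox output
            st = path.lstat()
            state[path.relative_to(root).as_posix()] = (st.st_size, st.st_mtime_ns)
    return state


def written_files(before: Snapshot, after: Snapshot) -> list[str]:
    """Return paths that are new or whose signature changed during the run."""
    return sorted(path for path, sig in after.items() if before.get(path) != sig)
```

A caller would take one snapshot per read-write mount before running the code, take another afterwards, and turn each path from `written_files` into a returned content item.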
Internals:
- _monty_bridge.InlineCodeBridge ports the inline (non-durable) bridge
from anthonychu/maf-codeact-monty-python; handles FunctionSnapshot /
FutureSnapshot pause/resume, dispatches direct typed calls + the
call_tool fallback, forwards mount/limits to Monty.start(...).
- generate_type_stubs emits per-tool stubs so Monty's `ty` type-checker
rejects bad calls before any host tool runs.
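To illustrate the per-tool stub idea, here is a minimal sketch that derives a stub line from a Python callable's signature. It is an assumption-laden illustration, not the package's `generate_type_stubs`: the helper names, the `async def` stub shape, and the use of `inspect.signature` are all hypothetical choices. The guard against non-identifier names reflects a later commit in this PR.

```python
import inspect
import keyword
from typing import Callable, Optional


def stub_for(name: str, fn: Callable[..., object]) -> Optional[str]:
    """Return a one-line stub for fn, or None if name cannot be a def name."""
    # Names that are not plain identifiers would break (or inject into) the stub source.
    if not name.isidentifier() or keyword.iskeyword(name):
        return None
    return f"async def {name}{inspect.signature(fn)}: ..."


def build_stubs(tools: dict[str, Callable[..., object]]) -> str:
    """Join the stubs of all stubbable tools into one source string."""
    lines = [stub_for(name, fn) for name, fn in tools.items()]
    return "\n".join(line for line in lines if line is not None)


def get_weather(city: str, days: int = 3) -> str:
    """Example host tool used only for this illustration."""
    return f"{city}: sunny for {days} days"
```

A type checker given these stubs alongside the generated code can flag a call such as `get_weather(42)` before any host tool runs.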
Alpha-policy compliance (per python-package-management skill):
- Added agent-framework-monty = { workspace = true } to root
pyproject.toml.
- Added row to python/PACKAGE_STATUS.md.
- Added monty entry under Experimental in python/AGENTS.md.
- NOT added to core[all]; NO agent_framework.monty lazy shim (deferred
to beta promotion).
Samples (three sets, import from agent_framework_monty directly):
- samples/02-agents/context_providers/code_act/monty_code_act.py
(provider pattern) + updated local README.
- samples/02-agents/tools/monty_code_interpreter/ (standalone +
manual-wiring + README).
- samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/
(full hosted-agent layout with uv-based pyproject.toml + Dockerfile,
Azure Monitor wiring via APPLICATIONINSIGHTS_CONNECTION_STRING +
enable_instrumentation, ENABLE_INSTRUMENTATION and
ENABLE_SENSITIVE_DATA env vars). The alpha wheel is vendored into
./wheels/ (gitignored) via vendor-wheel.sh; new row added to the
parent Responses-API README.
Tests:
- 28 hermetic unit tests (stubbed pydantic_monty).
- 18 integration tests marked @pytest.mark.integration, auto-skipped
when pydantic_monty is unimportable; exercise the real Monty
runtime: print round-trip, last-expression value, direct typed
tool dispatch, call_tool fallback, async tool, asyncio.gather
parallelism, ty type-check rejection, OS blocked by default,
workspace_root read+write capture, read-only / overlay mount
semantics, resource_limits.max_duration_secs abort, approval
gating end-to-end, full Agent run with a scripted chat client.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new Python alpha package, agent-framework-monty, providing a Monty-backed CodeAct implementation (provider + standalone execute_code tool) alongside samples and test coverage, enabling a cross-platform CodeAct option beyond Hyperlight.
Changes:
- Introduces the agent_framework_monty package (provider/tool/types, instruction generation, Monty bridge, file-mount + output capture support).
- Adds unit + integration tests for the Monty CodeAct surface, plus multiple samples (context provider, standalone tool, Foundry-hosted Responses agent).
- Registers the new workspace package in Python packaging metadata and lockfiles, and updates package status/docs.
Reviewed changes
Copilot reviewed 28 out of 30 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| python/uv.lock | Adds workspace member + locks pydantic-monty and the new monty package. |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/vendor-wheel.sh | Script to build/vendor the alpha wheel for offline uv sync in Docker. |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md | Explains the hosted Responses sample and how Monty CodeAct works. |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/pyproject.toml | Sample-local uv project config (including vendored wheel source). |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/main.py | Hosted agent entrypoint wiring Foundry client + Monty provider + telemetry. |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/Dockerfile | Docker build using uv sync and a vendored wheel. |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/agent.yaml | Hosted-agent config for local/Foundry runs (env vars + resources). |
| python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/agent.manifest.yaml | Foundry manifest describing the hosted Monty CodeAct sample. |
| python/samples/04-hosting/foundry-hosted-agents/README.md | Adds a row documenting the new Monty CodeAct hosted sample. |
| python/samples/02-agents/tools/monty_code_interpreter/README.md | Documents local standalone/manual-wiring Monty tool samples. |
| python/samples/02-agents/tools/monty_code_interpreter/monty_code_interpreter.py | Standalone MontyExecuteCodeTool sample. |
| python/samples/02-agents/tools/monty_code_interpreter/monty_code_interpreter_manual_wiring.py | Manual static wiring sample (instructions + sandbox tool). |
| python/samples/02-agents/context_providers/code_act/README.md | Expands CodeAct docs to cover both Hyperlight and Monty providers. |
| python/samples/02-agents/context_providers/code_act/monty_code_act.py | Provider-driven Monty CodeAct sample with middleware logging. |
| python/pyproject.toml | Adds agent-framework-monty to the Python workspace dependencies. |
| python/packages/monty/tests/monty/test_monty_codeact.py | Hermetic unit tests with a fake pydantic_monty runtime. |
| python/packages/monty/tests/monty/test_monty_codeact_integration.py | Integration tests exercising the real Monty runtime (skipped if unavailable). |
| python/packages/monty/README.md | Package readme describing the Monty CodeAct API and usage patterns. |
| python/packages/monty/pyproject.toml | Defines the new alpha distribution, deps, tooling config, and tasks. |
| python/packages/monty/LICENSE | MIT license for the new package. |
| python/packages/monty/AGENTS.md | Package-level agent/dev guide and architecture notes. |
| python/packages/monty/agent_framework_monty/py.typed | Marks the package as typed for type checkers. |
| python/packages/monty/agent_framework_monty/_types.py | Public file-mount types (mode, mount input shapes). |
| python/packages/monty/agent_framework_monty/_provider.py | MontyCodeActProvider implementation (run-scoped tool + instructions). |
| python/packages/monty/agent_framework_monty/_monty_bridge.py | Inline Monty execution bridge + stub generation for ty. |
| python/packages/monty/agent_framework_monty/_instructions.py | Dynamic instructions + execute_code description builders. |
| python/packages/monty/agent_framework_monty/_execute_code_tool.py | MontyExecuteCodeTool implementation (mounts, approval gating, output capture). |
| python/packages/monty/agent_framework_monty/__init__.py | Public exports and version wiring. |
| python/PACKAGE_STATUS.md | Registers agent-framework-monty as alpha. |
| python/AGENTS.md | Adds monty under Experimental packages list. |
…IX path
The shorthand string mount goes through _normalize_mount_path, which rewrites Windows drive letters like 'C:\\Users\\...' into '/C:/Users/...' (POSIX-style). The Windows CI runners surfaced this because tmp_path resolves to a backslashed Windows path; the test was comparing against the raw str(host_a) instead of the normalized form. Compare against _normalize_mount_path(str(host_a)) so the assertion is platform-independent.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
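The drive-letter rewrite mentioned in this commit can be sketched as follows. This is an illustration under assumptions, not the body of the package's `_normalize_mount_path`: the function name `normalize_mount_path` and the regular expression are hypothetical, and only the input/output shape ('C:\Users\...' becoming '/C:/Users/...') comes from the commit message.

```python
import re

# Matches a leading Windows drive letter such as "C:".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def normalize_mount_path(path: str) -> str:
    """Convert a host path string into a POSIX-style mount path."""
    posix = path.replace("\\", "/")
    if _DRIVE_PREFIX.match(posix):
        # Anchor the drive letter under the root so the path is absolute.
        posix = "/" + posix
    return posix
```

Comparing test expectations against the normalized form, rather than the raw `str(tmp_path)`, keeps the assertion identical on Windows and POSIX runners.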
Automated Code Review
Reviewers: 3 | Confidence: 90%
✓ Correctness
No actionable issues found in this dimension.
✓ Security Reliability
No actionable issues found in this dimension.
✗ Design Approach
I found one design issue: the Monty-specific instructions injected into provider/tool runs document a print-only result contract that contradicts the runtime behavior asserted by the new integration tests. That means the recommended provider path teaches the model to avoid a supported output path and can steer generations away from the API this PR actually introduces.
The new Monty hosted-agent sample has a documentation/design mismatch that makes the advertised local-run path fail: its README defers to the shared hosted-agent setup flow, but this sample is packaged around pyproject.toml plus a vendored wheel and does not fit the parent requirements.txt install step.
Flagged Issues
- The sample README tells readers to follow the parent local-run instructions (python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md:60-62), but that flow installs dependencies with `uv pip install -r requirements.txt` (python/samples/04-hosting/foundry-hosted-agents/README.md:160-163). This sample instead declares dependencies in pyproject.toml and resolves agent-framework-monty from a vendored wheel (python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/pyproject.toml:1-20), so following the documented path fails before the host can start.
Automated review by eavanvalkenburg's agents
- _execute_code_tool docstring: clarify that the Monty backend supports scoped filesystem access via workspace_root / file_mounts (blocked by default).
- _to_monty_mount: import pydantic_monty lazily through load_monty so missing-dependency errors surface as the same actionable RuntimeError the rest of the package raises (not a bare ImportError at module load). Renamed _load_monty -> load_monty for the same reason.
- _python_type_repr: emit None for type(None) instead of Any, and normalize both typing.Union[...] and PEP-604 X | Y to PEP-604 syntax so Optional[X] / Union[..., None] / -> None signatures round-trip correctly through ty validation. Added a regression test.
- _PrintCollector: track a running character count instead of recomputing sum(len(c) for c in self.chunks) per callback. Eliminates the O(n^2) cost on print-heavy code.
- Instructions: mention that the value of the final expression is also returned alongside captured stdout (matches actual behavior).
- 11_monty_codeact Dockerfile: pin ghcr.io/astral-sh/uv to 0.11.6 instead of :latest for reproducible builds.
- 11_monty_codeact README: replace the bare "see parent README" pointer with sample-specific steps (./vendor-wheel.sh + uv sync + uv run), since the sample uses pyproject.toml + a vendored wheel rather than requirements.txt.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
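The type-repr normalization in this commit can be sketched as below. This is a hedged illustration, not the package's `_python_type_repr`: the function name `type_repr` and the fallback to `Any` for unrecognized annotations are assumptions. It relies only on `typing.get_origin` / `typing.get_args` from the standard library.

```python
import types
import typing
from typing import Any, Union

# typing.Union covers Optional[X] / Union[...]; types.UnionType covers X | Y (3.10+).
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def type_repr(tp: Any) -> str:
    """Render an annotation using PEP 604 syntax, with None for NoneType."""
    if tp is None or tp is type(None):
        return "None"
    origin = typing.get_origin(tp)
    if origin in _UNION_ORIGINS:
        # Both union spellings collapse to "A | B | None".
        return " | ".join(type_repr(arg) for arg in typing.get_args(tp))
    if origin is not None:
        name = getattr(origin, "__name__", None)
        if name is None:
            return "Any"
        args = ", ".join(type_repr(arg) for arg in typing.get_args(tp))
        return f"{name}[{args}]"
    if isinstance(tp, type):
        return tp.__name__
    return "Any"  # assumption: anything unrecognized degrades to Any
```

With this shape, `Optional[int]` and `int | None` produce the same string, so a stub built from either spelling reads identically to the type checker.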
…PyPI
Drop the vendored-wheel scaffolding now that agent-framework-monty is on PyPI as an alpha (1.0.0a*) release:
- pyproject.toml: remove [tool.uv.sources] override; keep [tool.uv] prerelease = "allow" so uv pulls the alpha automatically.
- Dockerfile: drop the COPY wheels/ step.
- README: drop the ./vendor-wheel.sh setup step and the not-yet-on-PyPI warning.
- Delete vendor-wheel.sh and the gitignored wheels/ directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…k escape
Same class of issue as the MSRC-reported Hyperlight finding: the
post-execution capture walked workspace_root with Path.rglob() +
is_file() + read_bytes() - all of which follow symlinks. An attacker
who controls the workspace (cloned repo, extracted archive, shared
workspace) could pre-place `workspace/leak.txt -> /etc/passwd` or
`workspace/outside_dir -> /etc/` and have host files surface as
captured Content items.
Monty's mount layer already rejects symlink reads from inside the
sandbox across all three modes (verified empirically), so the runtime
path was safe. This commit closes the post-execution scan path.
Changes:
- New `_iter_real_files(root)` walker that uses iterdir() +
is_symlink() to skip symlinks at every directory level and yields
only real files. Replaces the previous `host_root.rglob("*")` calls
in both `_snapshot_writable_mounts` and `_capture_written_files`.
- Use `Path.lstat()` instead of `Path.stat()` so size/mtime can never
be taken from a symlink target.
- Three new integration tests reproducing the MSRC attack shape
against the workspace_root flow: symlink-to-file outside workspace,
symlink-to-directory outside workspace, and a guard ensuring
legitimate sandbox writes are still captured when symlinks are
present.
Per user request, hyperlight is untouched in this commit (separate fix).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
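A symlink-skipping walker of the kind this commit describes might look like the sketch below. It is an illustration only, assuming the iterdir() + is_symlink() approach named in the message; the function name `iter_real_files` (without the leading underscore) and the sorted traversal order are hypothetical choices, not taken from the package.

```python
from pathlib import Path
from typing import Iterator


def iter_real_files(root: Path) -> Iterator[Path]:
    """Yield regular files under root, skipping symlinks at every level."""
    for entry in sorted(root.iterdir()):
        if entry.is_symlink():
            # Covers both symlink-to-file and symlink-to-directory cases.
            continue
        if entry.is_dir():
            yield from iter_real_files(entry)
        elif entry.is_file():
            yield entry
```

Because the symlink check happens before `is_dir()` and `is_file()`, which both follow links, a pre-placed `workspace/leak.txt -> /etc/passwd` is never opened or reported.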
Apply the same Windows-CI safety guard as the hyperlight fix in PR microsoft#5919: the three symlink integration tests create symlinks via Path.symlink_to(), which fails with OSError / NotImplementedError on unprivileged Windows runners. Add a local _symlinks_supported helper (mirroring the one in packages/core/tests/core/test_skills.py) and pytest.skip when symlinks aren't available, so the tests no longer fail for environment reasons.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
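A probe along the lines of the `_symlinks_supported` helper could be sketched as follows. This is an assumption about its shape, not a copy of the helper in the repository: it simply tries to create a symlink in a temporary directory and reports whether that worked.

```python
import tempfile
from pathlib import Path


def symlinks_supported() -> bool:
    """Return True if the current process can create symlinks."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "target.txt"
        target.write_text("probe")
        try:
            (Path(tmp) / "link.txt").symlink_to(target)
        except (OSError, NotImplementedError):
            # Typical on unprivileged Windows runners.
            return False
        return True
```

A test would then start with `if not symlinks_supported(): pytest.skip("symlinks not available")` so that the environment limitation shows up as a skip rather than a failure.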
- _invoke_tool: drop the inspect.iscoroutinefunction(...) branch and
always `await self.tool_map[name](**kwargs)`. Every entry in
tool_map is `partial(FunctionTool.invoke, skip_parsing=True)` and
FunctionTool.invoke is `async def`, so the branching was dead code -
and on Python versions affected by cpython#98590,
iscoroutinefunction(partial(bound_async_method, ...)) returns False,
causing the bridge to take the asyncio.to_thread path, return an
unawaited coroutine, and surface it as a JSON-serialization failure
for every tool call. Added a regression test
test_invoke_tool_awaits_partial_wrapped_async_method.
- generate_type_stubs: skip tools whose name is not a valid Python
identifier or is a Python keyword. FunctionTool.name has no upstream
validation, so a name like "weird-name" produced a syntax error in
the stubs and a name like "broken\n pass\nasync def injected"
would inject arbitrary stub source. Non-identifier names stay
reachable via `call_tool("weird-name", ...)` at runtime; they just
don't get type-checked stubs. Added regression test
test_generate_type_stubs_skips_non_identifier_tool_names.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
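The always-await dispatch from the first fix above can be illustrated with a small stand-in. `FakeTool` and `invoke_tool` are hypothetical names for this sketch; only the pattern (every `tool_map` entry is a `partial` over an `async def` method, so the result is awaited unconditionally) comes from the commit message.

```python
import asyncio
from functools import partial
from typing import Any, Callable


class FakeTool:
    """Hypothetical stand-in for a tool whose invoke method is async."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def invoke(self, *, skip_parsing: bool = False, **kwargs: Any) -> dict[str, Any]:
        return {"tool": self.name, "args": kwargs, "skip_parsing": skip_parsing}


async def invoke_tool(tool_map: dict[str, Callable[..., Any]], name: str, kwargs: dict[str, Any]) -> Any:
    # No iscoroutinefunction() check: every entry is known to return a coroutine,
    # so awaiting directly avoids misclassifying partial-wrapped bound methods.
    return await tool_map[name](**kwargs)


tool_map = {"add": partial(FakeTool("add").invoke, skip_parsing=True)}
```

Dropping the introspection branch removes the failure mode where a misclassified callable is run in a thread and hands back an unawaited coroutine.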
Motivation and Context
Inspired by anthonychu/maf-codeact-monty-python.
CodeAct currently has one backend in the Python repo: `agent-framework-hyperlight`. Hyperlight depends on a WASM micro-VM that is only published for `linux/x86_64` and `win/AMD64` with Python `<3.14`. macOS / arm64 / 3.14 users get no CodeAct story.
This PR adds a second backend, `agent-framework-monty`, that wraps pydantic-monty, a Rust-based Python interpreter, behind the same `*CodeActProvider` / `*ExecuteCodeTool` shape as Hyperlight, so users can swap providers with minimal churn. Monty runs cross-platform (no hypervisor or WASM backend), validates LLM-generated code against tool signatures with `ty` before any host tool fires, and supports Monty-native `ResourceLimits` for CPU / memory / output caps.
Description
New alpha package `agent-framework-monty` (`python/packages/monty/`).
Public API (mirrors Hyperlight names where they apply):
- `MontyCodeActProvider`: `ContextProvider` that injects a run-scoped `execute_code` tool plus dynamic CodeAct instructions.
- `MontyExecuteCodeTool`: standalone `FunctionTool` for mixed-tool agents or manual static wiring.
- `FileMount` / `FileMountInput` / `MountMode`: public types; same first two `FileMount` fields as the Hyperlight version, with Monty-native `mode` (`"read-only"` / `"read-write"` / `"overlay"`) and `write_bytes_limit`.
Constructor kwargs: `tools`, `approval_mode`, `workspace_root` (auto-mounted at `/input`, matching Hyperlight), `file_mounts`, plus a Monty-only `resource_limits` forwarding to `Monty.start(limits=...)`.
Filesystem flow mirrors Hyperlight's `/output` capture: files written under any `read-write` mount during execution are scanned post-run and returned as `Content.from_data(...)` items with a `path` annotation. `overlay` mounts buffer writes in memory (nothing escapes the sandbox); `read-only` mounts reject writes.
Internals:
- `_monty_bridge.InlineCodeBridge` ports the inline (non-durable) pause/resume bridge from the reference repo; dispatches direct typed tool calls + the `call_tool` fallback; forwards `mount` / `limits` to `Monty.start(...)`.
- `generate_type_stubs` builds per-tool stubs so `ty` rejects bad calls before any host tool fires.
- … `always_require`, the whole `execute_code` is gated.
Alpha-policy compliance (per `python-package-management` skill):
- Added `agent-framework-monty = { workspace = true }` to root `python/pyproject.toml`.
- Added row to `python/PACKAGE_STATUS.md`.
- Added `monty` entry under Experimental in `python/AGENTS.md`.
- Not added to `core[all]`; no `agent_framework.monty` lazy-loading shim. Both are deferred until beta promotion. Samples import `from agent_framework_monty import ...` directly.
Samples (3 sets):
- `samples/02-agents/context_providers/code_act/monty_code_act.py` (provider pattern) + updated local README pointing at both providers.
- `samples/02-agents/tools/monty_code_interpreter/`: standalone + manual-wiring + README.
- `samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/`: full hosted-agent layout with a uv-based `pyproject.toml` + Dockerfile, Azure Monitor wiring (`APPLICATIONINSIGHTS_CONNECTION_STRING` + `enable_instrumentation()` in `main.py`), `ENABLE_INSTRUMENTATION` and `ENABLE_SENSITIVE_DATA` env vars. The alpha wheel is vendored into `./wheels/` (gitignored) via `vendor-wheel.sh`. New row added to the parent Responses-API README.
Tests:
- Hermetic unit tests with a stubbed `pydantic_monty` for speed and to keep CI working without the dep.
- Integration tests marked `@pytest.mark.integration`, auto-skipped when `pydantic_monty` is unimportable. They exercise the real Monty runtime: print round-trip, last-expression value, direct typed dispatch, `call_tool` fallback, async host tool, `asyncio.gather` parallelism, `ty` type-check rejection, OS-blocked-by-default, `workspace_root` read + write capture, `read-only` / `overlay` mount semantics, `resource_limits.max_duration_secs` aborting a busy loop, approval gating end-to-end, full `Agent` run with a scripted chat client.
Out of scope (deliberately, for the alpha)
- … `DurableCodeBridge`, `register_durable_codeact`, `wait_for_external_event`, and per-tool external-event approval. Tracked as a follow-up.
- … `OSAccess` (fully synthetic VFS): flagged as a future escape hatch in `AGENTS.md`.
- … `fetch_url` host tool with your own allow-list check".
Contribution Checklist