Python: feat: add agent-framework-monty (Monty-backed CodeAct provider) by eavanvalkenburg · Pull Request #5915 · microsoft/agent-framework

eavanvalkenburg · 2026-05-18T10:04:03Z

Motivation and Context

Inspired by anthonychu/maf-codeact-monty-python.

CodeAct currently has one backend in the Python repo: agent-framework-hyperlight. Hyperlight depends on a WASM micro-VM that is published only for linux/x86_64 and win/AMD64 with Python <3.14, so users on macOS, arm64, or Python 3.14 have no CodeAct backend available.

This PR adds a second backend — agent-framework-monty — that wraps pydantic-monty, a Rust-based Python interpreter, behind the same *CodeActProvider / *ExecuteCodeTool shape as Hyperlight, so users can swap providers with minimal churn. Monty runs cross-platform (no hypervisor or WASM backend), validates LLM-generated code against tool signatures with ty before any host tool fires, and supports Monty-native ResourceLimits for CPU / memory / output caps.

Description

New alpha package agent-framework-monty (python/packages/monty/).

Public API (mirrors Hyperlight names where they apply):

MontyCodeActProvider — ContextProvider that injects a run-scoped execute_code tool plus dynamic CodeAct instructions.
MontyExecuteCodeTool — standalone FunctionTool for mixed-tool agents or manual static wiring.
FileMount / FileMountInput / MountMode — public types; same first two FileMount fields as the Hyperlight version, with Monty-native mode ("read-only"/"read-write"/"overlay") and write_bytes_limit.

Constructor kwargs: tools, approval_mode, workspace_root (auto-mounted at /input, matching Hyperlight), file_mounts, plus a Monty-only resource_limits that is forwarded to Monty.start(limits=...).

Filesystem flow mirrors Hyperlight's /output capture: files written under any read-write mount during execution are scanned post-run and returned as Content.from_data(...) items with a path annotation. overlay mounts buffer writes in memory (nothing escapes the sandbox); read-only mounts reject writes.
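The post-run capture can be pictured as a before/after diff over each writable mount. The following is a minimal sketch under that assumption, not the package's code: `snapshot_files`, `changed_files` and `demo` are hypothetical names, and detecting changes by size plus modification time is an illustrative choice.

```python
import os
import tempfile
from pathlib import Path

FileMeta = tuple[int, int]  # (size in bytes, mtime in nanoseconds)


def snapshot_files(root: Path) -> dict[str, FileMeta]:
    """Record size and mtime for every regular, non-symlink file under root."""
    snapshot: dict[str, FileMeta] = {}
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # never report data that lives outside the mount
            info = path.lstat()
            snapshot[path.relative_to(root).as_posix()] = (info.st_size, info.st_mtime_ns)
    return snapshot


def changed_files(before: dict[str, FileMeta], after: dict[str, FileMeta]) -> list[str]:
    """Return relative paths that are new or whose metadata changed."""
    return sorted(rel for rel, meta in after.items() if before.get(rel) != meta)


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "input.csv").write_text("a,b\n")
        before = snapshot_files(root)
        # Stand-in for sandboxed code writing into a read-write mount.
        (root / "report.txt").write_text("done")
        (root / "input.csv").write_text("a,b\n1,2\n")
        after = snapshot_files(root)
        return changed_files(before, after)
```

In this sketch an overlay or read-only mount would simply never be snapshotted, since nothing written there is meant to reach the host.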

Internals:

_monty_bridge.InlineCodeBridge ports the inline (non-durable) pause/resume bridge from the reference repo; dispatches direct typed tool calls + the call_tool fallback; forwards mount / limits to Monty.start(...).
generate_type_stubs builds per-tool stubs so ty rejects bad calls before any host tool fires.
Approval-mode propagation: if any registered host tool is always_require, the whole execute_code is gated.
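The approval propagation rule above reduces to a single `any(...)` check. This is a hedged sketch: `HostTool` is a hypothetical stand-in for a registered tool, and `"never_require"` is assumed here as the non-gated value; only `always_require` is named in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostTool:
    """Hypothetical stand-in for a registered host tool."""

    name: str
    approval_mode: str = "never_require"  # assumed default, for illustration only


def execute_code_approval_mode(tools: list[HostTool], default: str = "never_require") -> str:
    """Gate the whole execute_code call if any registered tool needs approval."""
    if any(tool.approval_mode == "always_require" for tool in tools):
        return "always_require"
    return default


tools = [HostTool("get_weather"), HostTool("delete_file", approval_mode="always_require")]
mode = execute_code_approval_mode(tools)
```

The coarse gate is the conservative choice: generated code may call any registered tool, so a single approval-required tool is enough to require approval for the whole run.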

Alpha-policy compliance (per python-package-management skill):

Added agent-framework-monty = { workspace = true } to root python/pyproject.toml.
Added row to python/PACKAGE_STATUS.md.
Added monty entry under Experimental in python/AGENTS.md.
Not added to core[all]; no agent_framework.monty lazy-loading shim. Both are deferred until beta promotion. Samples use `from agent_framework_monty import ...` directly.

Samples (3 sets):

samples/02-agents/context_providers/code_act/monty_code_act.py (provider pattern) + updated local README pointing at both providers.
samples/02-agents/tools/monty_code_interpreter/ — standalone + manual-wiring + README.
samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/ — full hosted-agent layout with a uv-based pyproject.toml + Dockerfile, Azure Monitor wiring (APPLICATIONINSIGHTS_CONNECTION_STRING + enable_instrumentation() in main.py), ENABLE_INSTRUMENTATION and ENABLE_SENSITIVE_DATA env vars. The alpha wheel is vendored into ./wheels/ (gitignored) via vendor-wheel.sh. New row added to the parent Responses-API README.

Tests:

28 hermetic unit tests stubbing pydantic_monty for speed and to keep CI working without the dep.
18 integration tests marked @pytest.mark.integration, auto-skipped when pydantic_monty is unimportable. They exercise the real Monty runtime: print round-trip, last-expression value, direct typed dispatch, call_tool fallback, async host tool, asyncio.gather parallelism, ty type-check rejection, OS-blocked-by-default, workspace_root read + write capture, read-only / overlay mount semantics, resource_limits.max_duration_secs aborting a busy loop, approval gating end-to-end, full Agent run with a scripted chat client.

Out of scope (deliberately, for the alpha)

Durable execution — the reference repo's DurableCodeBridge, register_durable_codeact, wait_for_external_event, and per-tool external-event approval. Tracked as a follow-up.
Custom OSAccess (fully synthetic VFS) — flagged as a future escape hatch in AGENTS.md.
URL allow-list — Monty has no networking primitive; documented pattern is "expose a fetch_url host tool with your own allow-list check".
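The documented `fetch_url` pattern could look roughly like the sketch below. Everything here is illustrative: `ALLOWED_HOSTS`, `is_allowed_url` and the exact checks are assumptions, and the function would still need to be registered as a host tool through the framework's own tool mechanism.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Hypothetical allow-list; replace with the hosts your agent may reach.
ALLOWED_HOSTS = frozenset({"example.com", "api.example.com"})


def is_allowed_url(url: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> bool:
    """Accept only https URLs whose host is explicitly allow-listed."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Host-side tool body: check the allow-list, then fetch the page text."""
    if not is_allowed_url(url):
        raise PermissionError(f"URL not on the allow-list: {url}")
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Note that `urlopen` follows redirects, so a stricter version would also validate the final URL after redirection.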

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? No — additive only (new package + new sample folders).

New alpha package that wraps pydantic-monty (a Rust-based Python interpreter) behind the same CodeAct API surface as agent-framework-hyperlight, so users can swap providers with minimal code change.

Public API (agent_framework_monty):
- MontyCodeActProvider: ContextProvider that injects a run-scoped execute_code tool plus dynamic CodeAct instructions.
- MontyExecuteCodeTool: standalone FunctionTool for mixed-tool agents or manual static wiring.
- FileMount / FileMountInput / MountMode: public types mirroring the Hyperlight names, with Monty's mode (read-only/read-write/overlay) and write_bytes_limit on FileMount.

Constructor kwargs (both classes) mirror Hyperlight where possible: tools, approval_mode, workspace_root, file_mounts; plus a Monty-only resource_limits forwarding ResourceLimits to Monty.start().

Filesystem flow:
- workspace_root auto-mounts at /input (read-write), matching Hyperlight.
- file_mounts accepts string shorthand, (host, mount) tuple, or FileMount with mode + write cap.
- Files written under read-write mounts are scanned post-execution and returned as Content.from_data items (mirrors Hyperlight /output).
- overlay mounts buffer writes in-memory; read-only mounts reject writes.

Internals:
- _monty_bridge.InlineCodeBridge ports the inline (non-durable) bridge from anthonychu/maf-codeact-monty-python; handles FunctionSnapshot / FutureSnapshot pause/resume, dispatches direct typed calls + the call_tool fallback, forwards mount/limits to Monty.start(...).
- generate_type_stubs emits per-tool stubs so Monty's `ty` type-checker rejects bad calls before any host tool runs.

Alpha-policy compliance (per python-package-management skill):
- Added agent-framework-monty = { workspace = true } to root pyproject.toml.
- Added row to python/PACKAGE_STATUS.md.
- Added monty entry under Experimental in python/AGENTS.md.
- NOT added to core[all]; NO agent_framework.monty lazy shim (deferred to beta promotion).

Samples (three sets, import from agent_framework_monty directly):
- samples/02-agents/context_providers/code_act/monty_code_act.py (provider pattern) + updated local README.
- samples/02-agents/tools/monty_code_interpreter/ (standalone + manual-wiring + README).
- samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/ (full hosted-agent layout with uv-based pyproject.toml + Dockerfile, Azure Monitor wiring via APPLICATIONINSIGHTS_CONNECTION_STRING + enable_instrumentation, ENABLE_INSTRUMENTATION and ENABLE_SENSITIVE_DATA env vars). The alpha wheel is vendored into ./wheels/ (gitignored) via vendor-wheel.sh; new row added to the parent Responses-API README.

Tests:
- 28 hermetic unit tests (stubbed pydantic_monty).
- 18 integration tests marked @pytest.mark.integration, auto-skipped when pydantic_monty is unimportable; exercise the real Monty runtime: print round-trip, last-expression value, direct typed tool dispatch, call_tool fallback, async tool, asyncio.gather parallelism, ty type-check rejection, OS blocked by default, workspace_root read+write capture, read-only / overlay mount semantics, resource_limits.max_duration_secs abort, approval gating end-to-end, full Agent run with a scripted chat client.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3 · 2026-05-18T10:07:07Z

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| `packages/monty/agent_framework_monty` | | | | |
| `_execute_code_tool.py` | 262 | 76 | 70% | 63, 88, 92, 94, 101, 108–112, 146, 148–149, 230, 273, 289, 293, 298, 349–350, 434, 437, 442–443, 461–477, 492–503, 522–537, 543–544, 547–551 |
| `_instructions.py` | 39 | 1 | 97% | 40 |
| `_monty_bridge.py` | 195 | 49 | 74% | 29–39, 48, 51, 53, 66, 82–90, 93, 127–131, 162, 165–166, 169–172, 190–191, 222, 240, 242, 258–260, 303, 308, 326–327 |
| `_provider.py` | 34 | 4 | 88% | 69, 73, 77, 81 |
| `_types.py` | 11 | 0 | 100% | |
| TOTAL | 34986 | 4070 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 6948 | 30 💤 | 0 ❌ | 0 🔥 | 1m 53s ⏱️ |

Copilot

Pull request overview

Adds a new Python alpha package, agent-framework-monty, providing a Monty-backed CodeAct implementation (provider + standalone execute_code tool) alongside samples and test coverage, enabling a cross-platform CodeAct option beyond Hyperlight.

Changes:

Introduces agent_framework_monty package (provider/tool/types, instruction generation, Monty bridge, file-mount + output capture support).
Adds unit + integration tests for the Monty CodeAct surface, plus multiple samples (context provider, standalone tool, Foundry-hosted Responses agent).
Registers the new workspace package in Python packaging metadata and lockfiles, and updates package status/docs.

Reviewed changes

Copilot reviewed 28 out of 30 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `python/uv.lock` | Adds workspace member + locks `pydantic-monty` and the new monty package. |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/vendor-wheel.sh` | Script to build/vendor the alpha wheel for offline `uv sync` in Docker. |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md` | Explains the hosted Responses sample and how Monty CodeAct works. |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/pyproject.toml` | Sample-local `uv` project config (including vendored wheel source). |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/main.py` | Hosted agent entrypoint wiring Foundry client + Monty provider + telemetry. |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/Dockerfile` | Docker build using `uv sync` and a vendored wheel. |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/agent.yaml` | Hosted-agent config for local/Foundry runs (env vars + resources). |
| `python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/agent.manifest.yaml` | Foundry manifest describing the hosted Monty CodeAct sample. |
| `python/samples/04-hosting/foundry-hosted-agents/README.md` | Adds a row documenting the new Monty CodeAct hosted sample. |
| `python/samples/02-agents/tools/monty_code_interpreter/README.md` | Documents local standalone/manual-wiring Monty tool samples. |
| `python/samples/02-agents/tools/monty_code_interpreter/monty_code_interpreter.py` | Standalone `MontyExecuteCodeTool` sample. |
| `python/samples/02-agents/tools/monty_code_interpreter/monty_code_interpreter_manual_wiring.py` | Manual static wiring sample (instructions + sandbox tool). |
| `python/samples/02-agents/context_providers/code_act/README.md` | Expands CodeAct docs to cover both Hyperlight and Monty providers. |
| `python/samples/02-agents/context_providers/code_act/monty_code_act.py` | Provider-driven Monty CodeAct sample with middleware logging. |
| `python/pyproject.toml` | Adds `agent-framework-monty` to the Python workspace dependencies. |
| `python/packages/monty/tests/monty/test_monty_codeact.py` | Hermetic unit tests with a fake `pydantic_monty` runtime. |
| `python/packages/monty/tests/monty/test_monty_codeact_integration.py` | Integration tests exercising the real Monty runtime (skipped if unavailable). |
| `python/packages/monty/README.md` | Package readme describing the Monty CodeAct API and usage patterns. |
| `python/packages/monty/pyproject.toml` | Defines the new alpha distribution, deps, tooling config, and tasks. |
| `python/packages/monty/LICENSE` | MIT license for the new package. |
| `python/packages/monty/AGENTS.md` | Package-level agent/dev guide and architecture notes. |
| `python/packages/monty/agent_framework_monty/py.typed` | Marks the package as typed for type checkers. |
| `python/packages/monty/agent_framework_monty/_types.py` | Public file-mount types (mode, mount input shapes). |
| `python/packages/monty/agent_framework_monty/_provider.py` | `MontyCodeActProvider` implementation (run-scoped tool + instructions). |
| `python/packages/monty/agent_framework_monty/_monty_bridge.py` | Inline Monty execution bridge + stub generation for `ty`. |
| `python/packages/monty/agent_framework_monty/_instructions.py` | Dynamic instructions + `execute_code` description builders. |
| `python/packages/monty/agent_framework_monty/_execute_code_tool.py` | `MontyExecuteCodeTool` implementation (mounts, approval gating, output capture). |
| `python/packages/monty/agent_framework_monty/__init__.py` | Public exports and version wiring. |
| `python/PACKAGE_STATUS.md` | Registers `agent-framework-monty` as alpha. |
| `python/AGENTS.md` | Adds `monty` under Experimental packages list. |

…IX path

The shorthand string mount goes through _normalize_mount_path, which rewrites Windows drive letters like 'C:\\Users\\...' into '/C:/Users/...' (POSIX-style). The Windows CI runners surfaced this because tmp_path resolves to a backslashed Windows path; the test was comparing against the raw str(host_a) instead of the normalized form. Compare against _normalize_mount_path(str(host_a)) so the assertion is platform-independent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
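The rewrite described in this commit can be illustrated with a small stand-alone function. This is a hypothetical re-implementation written only from the behavior stated above (`'C:\\Users\\...'` becomes `'/C:/Users/...'`); the package's `_normalize_mount_path` may handle more cases.

```python
import re

# A Windows drive prefix such as "C:" at the start of the path.
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def normalize_mount_path(path: str) -> str:
    """Turn a host path into a POSIX-style mount path (illustrative only)."""
    posix = path.replace("\\", "/")
    if _DRIVE_PREFIX.match(posix):
        posix = "/" + posix  # "C:/Users/me" -> "/C:/Users/me"
    return posix


windows_example = normalize_mount_path("C:\\Users\\me\\data")
posix_example = normalize_mount_path("/tmp/data")
```

A test that compares against the normalized form, rather than the raw `str(path)`, gives the same assertion on Windows and POSIX runners.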

github-actions

Automated Code Review

Reviewers: 3 | Confidence: 90%

✓ Correctness

No actionable issues found in this dimension.

✓ Security Reliability

No actionable issues found in this dimension.

✗ Design Approach

I found two design issues. First, the Monty-specific instructions injected into provider/tool runs document a print-only result contract that contradicts the runtime behavior asserted by the new integration tests. As a result, the recommended provider path teaches the model to avoid a supported output path and can steer generations away from the API this PR actually introduces.

Second, the new Monty hosted-agent sample has a documentation/design mismatch that makes the advertised local-run path fail: its README defers to the shared hosted-agent setup flow, but this sample is packaged around pyproject.toml plus a vendored wheel and does not fit the parent requirements.txt install step.

Flagged Issues

The sample README tells readers to follow the parent local-run instructions (python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md:60-62), but that flow installs dependencies with uv pip install -r requirements.txt (python/samples/04-hosting/foundry-hosted-agents/README.md:160-163). This sample instead declares dependencies in pyproject.toml and resolves agent-framework-monty from a vendored wheel (python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/pyproject.toml:1-20), so following the documented path fails before the host can start.

Automated review by eavanvalkenburg's agents

- _execute_code_tool docstring: clarify that the Monty backend supports scoped filesystem access via workspace_root / file_mounts (blocked by default).
- _to_monty_mount: import pydantic_monty lazily through load_monty so missing-dependency errors surface as the same actionable RuntimeError the rest of the package raises (not a bare ImportError at module load). Renamed _load_monty -> load_monty for the same reason.
- _python_type_repr: emit None for type(None) instead of Any, and normalize both typing.Union[...] and PEP-604 X | Y to PEP-604 syntax so Optional[X] / Union[..., None] / -> None signatures round-trip correctly through ty validation. Added a regression test.
- _PrintCollector: track a running character count instead of recomputing sum(len(c) for c in self.chunks) per callback. Eliminates the O(n^2) cost on print-heavy code.
- Instructions: mention that the value of the final expression is also returned alongside captured stdout (matches actual behavior).
- 11_monty_codeact Dockerfile: pin ghcr.io/astral-sh/uv to 0.11.6 instead of :latest for reproducible builds.
- 11_monty_codeact README: replace the bare "see parent README" pointer with sample-specific steps (./vendor-wheel.sh + uv sync + uv run), since the sample uses pyproject.toml + a vendored wheel rather than requirements.txt.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…PyPI

Drop the vendored-wheel scaffolding now that agent-framework-monty is on PyPI as an alpha (1.0.0a*) release:
- pyproject.toml: remove [tool.uv.sources] override; keep [tool.uv] prerelease = "allow" so uv pulls the alpha automatically.
- Dockerfile: drop the COPY wheels/ step.
- README: drop the ./vendor-wheel.sh setup step and the not-yet-on-PyPI warning.
- Delete vendor-wheel.sh and the gitignored wheels/ directory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…k escape

Same class of issue as the MSRC-reported Hyperlight finding: the post-execution capture walked workspace_root with Path.rglob() + is_file() + read_bytes() - all of which follow symlinks. An attacker who controls the workspace (cloned repo, extracted archive, shared workspace) could pre-place `workspace/leak.txt -> /etc/passwd` or `workspace/outside_dir -> /etc/` and have host files surface as captured Content items. Monty's mount layer already rejects symlink reads from inside the sandbox across all three modes (verified empirically), so the runtime path was safe. This commit closes the post-execution scan path.

Changes:
- New `_iter_real_files(root)` walker that uses iterdir() + is_symlink() to skip symlinks at every directory level and yields only real files. Replaces the previous `host_root.rglob("*")` calls in both `_snapshot_writable_mounts` and `_capture_written_files`.
- Use `Path.lstat()` instead of `Path.stat()` so size/mtime can never be taken from a symlink target.
- Three new integration tests reproducing the MSRC attack shape against the workspace_root flow: symlink-to-file outside workspace, symlink-to-directory outside workspace, and a guard ensuring legitimate sandbox writes are still captured when symlinks are present.

Per user request, hyperlight is untouched in this commit (separate fix).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
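A walker of the kind this commit describes might look like the sketch below. It follows the stated technique (iterdir() plus is_symlink() at every level) but is written independently for illustration; `iter_real_files` and `demo` are hypothetical names, and the demo tolerates platforms where symlinks cannot be created.

```python
import tempfile
from collections.abc import Iterator
from pathlib import Path


def iter_real_files(root: Path) -> Iterator[Path]:
    """Yield regular files under root, never following or yielding symlinks."""
    stack = [root]
    while stack:
        current = stack.pop()
        for entry in current.iterdir():
            if entry.is_symlink():
                continue  # skip links to files and to directories alike
            if entry.is_dir():
                stack.append(entry)
            elif entry.is_file():
                yield entry


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        outside = base / "outside"
        outside.mkdir()
        (outside / "secret.txt").write_text("host secret")
        workspace = base / "workspace"
        (workspace / "sub").mkdir(parents=True)
        (workspace / "real.txt").write_text("ok")
        (workspace / "sub" / "nested.txt").write_text("ok")
        try:
            # The attack shape: links that point out of the workspace.
            (workspace / "leak.txt").symlink_to(outside / "secret.txt")
            (workspace / "outside_dir").symlink_to(outside, target_is_directory=True)
        except (OSError, NotImplementedError):
            pass  # symlinks unavailable here; the walker still works
        return sorted(p.relative_to(workspace).as_posix() for p in iter_real_files(workspace))
```

Reading metadata for the yielded paths with `Path.lstat()` keeps the same guarantee if a file is swapped for a symlink between the walk and the stat call.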

Apply the same Windows-CI safety guard as the hyperlight fix in PR microsoft#5919: the three symlink integration tests create symlinks via Path.symlink_to(), which fails with OSError / NotImplementedError on unprivileged Windows runners. Add a local _symlinks_supported helper (mirroring the one in packages/core/tests/core/test_skills.py) and pytest.skip when symlinks aren't available, so the tests no longer fail for environment reasons.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
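A probe like the one this commit mentions can be written with the standard library alone. The sketch below is an assumption about its shape, not a copy of `_symlinks_supported`; a test would call it and skip (for example with `pytest.skip`) when it returns False.

```python
import tempfile
from pathlib import Path


def symlinks_supported() -> bool:
    """Return True if the current process can create symlinks in a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "target.txt"
        target.write_text("probe")
        link = Path(tmp) / "link.txt"
        try:
            link.symlink_to(target)
        except (OSError, NotImplementedError):
            # Typical on unprivileged Windows runners.
            return False
        return link.is_symlink()


supported = symlinks_supported()
```

Probing at runtime, instead of checking the platform name, also covers Windows machines where developer mode makes symlinks available.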

- _invoke_tool: drop the inspect.iscoroutinefunction(...) branch and always `await self.tool_map[name](**kwargs)`. Every entry in tool_map is `partial(FunctionTool.invoke, skip_parsing=True)` and FunctionTool.invoke is `async def`, so the branching was dead code - and on Python versions affected by cpython#98590, iscoroutinefunction(partial(bound_async_method, ...)) returns False, causing the bridge to take the asyncio.to_thread path, return an unawaited coroutine, and surface it as a JSON-serialization failure for every tool call. Added a regression test test_invoke_tool_awaits_partial_wrapped_async_method.
- generate_type_stubs: skip tools whose name is not a valid Python identifier or is a Python keyword. FunctionTool.name has no upstream validation, so a name like "weird-name" produced a syntax error in the stubs and a name like "broken\n pass\nasync def injected" would inject arbitrary stub source. Non-identifier names stay reachable via `call_tool("weird-name", ...)` at runtime; they just don't get type-checked stubs. Added regression test test_generate_type_stubs_skips_non_identifier_tool_names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
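The name filter described for generate_type_stubs comes down to two standard-library checks. The helper names below are hypothetical and the sketch only shows the filtering step, not stub generation itself.

```python
import keyword


def is_stubbable_tool_name(name: str) -> bool:
    """True if the name can safely appear as `def <name>(...)` in stub source."""
    return name.isidentifier() and not keyword.iskeyword(name)


def partition_tool_names(names: list[str]) -> tuple[list[str], list[str]]:
    """Split names into (typed stubs, reachable only via the call_tool fallback)."""
    stubbable = [n for n in names if is_stubbable_tool_name(n)]
    fallback_only = [n for n in names if not is_stubbable_tool_name(n)]
    return stubbable, fallback_only


names = ["get_weather", "weird-name", "class", "broken\n    pass\nasync def injected"]
stubbable, fallback_only = partition_tool_names(names)
```

Because `str.isidentifier()` rejects whitespace and newlines, the injection-shaped name in the example never reaches the generated source.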

Copilot AI review requested due to automatic review settings May 18, 2026 10:04

moonbox3 added the documentation and python labels May 18, 2026

Copilot started reviewing on behalf of eavanvalkenburg May 18, 2026 10:05

Copilot AI reviewed May 18, 2026

github-actions Bot reviewed May 18, 2026

Comment thread python/samples/04-hosting/foundry-hosted-agents/responses/11_monty_codeact/README.md Outdated

eavanvalkenburg and others added 4 commits May 18, 2026 12:18

eavanvalkenburg self-assigned this May 18, 2026

eavanvalkenburg added this to Agent Framework May 18, 2026

eavanvalkenburg moved this to In Progress in Agent Framework May 18, 2026

moonbox3 reviewed May 19, 2026

Comment thread python/packages/monty/agent_framework_monty/_monty_bridge.py

Comment thread python/packages/monty/agent_framework_monty/_monty_bridge.py

giles17 approved these changes May 19, 2026

eavanvalkenburg wants to merge 7 commits into microsoft:main from eavanvalkenburg:monty_codeact_provider

4 participants
