feat(sandbox): grade pypi/uvx MCP servers under the Docker sandbox #79
Merged
Only npm refs could be containerized; a pypi ref under
LITMUS_STDIO_ISOLATION=docker failed immediately ("unsupported for pypi"),
so Python MCP servers were ungradeable in the safe execution mode.
Stage a pypi package the same way as npm, with no target code running during
staging: install WHEELS ONLY (`uv pip install --only-binary=:all:`) into a
venv under /stage — the pypi analog of npm's `--ignore-scripts`. There is no
build-hook equivalent, so a package with no wheel fails closed rather than
building from an sdist. Resolve the console-script entry, version, and
declared egress offline with the venv python (`--network none`), then launch
the entry with that interpreter. Both container paths — the connect path that
grades C-01/C-03/C-04 and the C-02 egress-capture path — get the pypi branch;
the sinkhole/capture machinery is interpreter-agnostic and unchanged.
pypi authors declare egress via a wheel-shippable `polygraph.egress`
entry-point group (pyproject's `[tool.polygraph]` isn't shipped in wheels).
The image gains uv + python3. Same rubric, so METHODOLOGY_VERSION is
unchanged (litmus-v10) — a pypi server is graded exactly as an npm one.
gVisor runtime parity is preserved across install, resolve, and run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k
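A minimal Python sketch of the wheels-only staging command described in the commit message above. The PR's real argument builder lives in the TypeScript code base and is not shown on this page, so the function name `wheels_only_install_argv` and the use of uv's `--python` flag to target the staged venv are assumptions made for illustration only.

```python
def wheels_only_install_argv(
    spec: str, venv_python: str = "/stage/venv/bin/python"
) -> list[str]:
    """Build an argv that installs `spec` from wheels only (hypothetical helper)."""
    if not spec:
        # Fail closed on an empty spec instead of letting the installer guess.
        raise ValueError("empty package spec")
    return [
        "uv", "pip", "install",
        # Wheels only: refuse sdists, so no build backend runs during staging.
        "--only-binary=:all:",
        # Assumption: install into the staged venv by naming its interpreter.
        "--python", venv_python,
        # "--" ends option parsing, so a spec starting with "-" stays a package name.
        "--", spec,
    ]


if __name__ == "__main__":
    print(" ".join(wheels_only_install_argv("mcp-server-time")))
```

Passing the argv as a list rather than a shell string keeps the package spec from being re-parsed by a shell, which fits the goal of running no target-controlled code during staging.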
RubenSousaDinis added a commit that referenced this pull request Jul 1, 2026
Ships #78 (C-02 egress D rationale is now actionable — names the undeclared host(s) and points at polygraph.egress; messaging only, grades unchanged) and #79 (pypi/uvx MCP servers are now gradeable under the Docker sandbox: wheels-only venv staging, offline resolve, venv-python launch). Methodology version is unchanged (litmus-v10) — a pypi server is graded by the same rubric. Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Only npm refs could be containerized. A `pypi/` ref under `LITMUS_STDIO_ISOLATION=docker` failed immediately (`docker isolation is unsupported for pypi refs`), so Python MCP servers were ungradeable in the safe (sandboxed) execution mode; grading them otherwise means running untrusted Python on the host.

How
Stage a pypi package the same way as npm, with no target code running during staging:
1. Install wheels only: `uv pip install --only-binary=:all:` into a venv under `/stage`. This is the pypi analog of npm's `--ignore-scripts`; there is no build-hook equivalent, so a package with no wheel fails closed rather than building from an sdist (which would execute a PEP 517 backend).
2. Resolve offline: the venv python (`--network none`, non-root) reads the console-script entry, version, and declared egress via `importlib.metadata`, printing the same `{bins, version, declaredEgress}` JSON the npm resolver does (parser reused).
3. Launch the entry with that interpreter (`--entrypoint /stage/venv/bin/python`). Both container paths (the connect path that grades C-01/C-03/C-04 and the C-02 egress-capture path) get the pypi branch. The sinkhole/host-DNAT capture is interpreter-agnostic and unchanged.

Egress declaration for pypi uses a wheel-shippable `[project.entry-points."polygraph.egress"]` group (pyproject's `[tool.polygraph]` isn't shipped inside wheels, so it can't be read offline).

Invariants preserved
- `--runtime` parity across install + resolve + run.
- `METHODOLOGY_VERSION` unchanged (litmus-v10); a pypi server is graded exactly as an npm one.
- `github`/explicit-command refs still fail closed under isolation.

Verification
- `pnpm -r typecheck` clean; probes 369 passed / 7 skipped, cli 104 passed.
- Unit tests cover `stagePypiInstallArgs` (wheels-only, `--`-guarded spec, runtime parity), `pypiResolverRunArgs` (`--network none`, non-root), the Python resolver's JSON shape, `egressTargetArgs`/`containerLaunch` interpreter, and the updated isolation gate (github throws; npm/pypi proceed).
- Live test (`pypi-live.test.ts`, opt-in via `LITMUS_DOCKER_TESTS=1`): `pypi/mcp-server-time` now grades A (C-02 actually ran in the sandbox, not skipped), where before it errored; `npm/@modelcontextprotocol/server-filesystem` still grades A (no regression).

🤖 Generated with Claude Code
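The offline resolve step under How could look roughly like the sketch below. It is not the PR's resolver: the function names `summarize` and `resolve`, the choice to map each console-script name to its `module:attr` target in `bins`, and the convention that each `polygraph.egress` entry-point name is a declared host are all assumptions. Only the use of `importlib.metadata` and the `{bins, version, declaredEgress}` key names come from the description above.

```python
import json
from importlib.metadata import Distribution, distribution


def summarize(dist: Distribution) -> dict:
    """Read console scripts, version, and declared egress from installed metadata.

    Only the wheel's dist-info files are read; the package itself is never imported.
    """
    bins = {}
    declared_egress = []
    for ep in dist.entry_points:
        if ep.group == "console_scripts":
            # Assumption: expose each script name with its "module:attr" target.
            bins[ep.name] = ep.value
        elif ep.group == "polygraph.egress":
            # Assumption: the entry-point name carries the declared host.
            declared_egress.append(ep.name)
    return {
        "bins": bins,
        "version": dist.version,
        "declaredEgress": sorted(declared_egress),
    }


def resolve(package: str) -> str:
    """Return the JSON summary for a package installed in the current venv."""
    return json.dumps(summarize(distribution(package)))
```

Because entry points live in the wheel's `entry_points.txt`, this kind of lookup works with networking disabled, which is the reason the description gives for preferring an entry-point group over `[tool.polygraph]`.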