feat(sandbox): grade pypi/uvx MCP servers under the Docker sandbox #79

Merged: RubenSousaDinis merged 1 commit into main from feat/pypi-docker-sandbox on Jul 1, 2026

@RubenSousaDinis

Member

What

Only npm refs could be containerized. A pypi/ ref under LITMUS_STDIO_ISOLATION=docker failed immediately with "docker isolation is unsupported for pypi refs", so Python MCP servers were ungradeable in the safe (sandboxed) execution mode. The only alternative was to grade them by running untrusted Python on the host.

How

Stage a pypi package the same way as npm, with no target code running during staging:

  • Install wheels-only: `uv pip install --only-binary=:all:` into a venv under /stage. This is the pypi analog of npm's --ignore-scripts; there is no build-hook equivalent, so a package with no wheel fails closed rather than building from an sdist (which would execute a PEP 517 backend).
  • Resolve offline — the venv python (--network none, non-root) reads the console-script entry, version, and declared egress via importlib.metadata, printing the same {bins, version, declaredEgress} JSON the npm resolver does (parser reused).
  • Launch — the entry runs with the venv python interpreter (--entrypoint /stage/venv/bin/python). Both container paths — the connect path that grades C-01/C-03/C-04 and the C-02 egress-capture path — get the pypi branch. The sinkhole/host-DNAT capture is interpreter-agnostic and unchanged.
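
As a rough illustration of the wheels-only staging step, the sketch below builds a `docker run` argument list in Python. It is not the project's `stagePypiInstallArgs` (which lives in the TypeScript codebase). The function name, the `stage` volume, and the image name are hypothetical, and it assumes the venv at /stage/venv was already created (for example with `uv venv`).

```python
from typing import List, Optional

STAGE_PYTHON = "/stage/venv/bin/python"  # venv interpreter path named in the PR


def stage_pypi_install_args(
    spec: str, image: str, runtime: Optional[str] = None
) -> List[str]:
    """Build hypothetical `docker` arguments for a wheels-only install."""
    if not spec or spec.startswith("-"):
        # Defense in depth on top of the `--` separator used below.
        raise ValueError(f"refusing suspicious package spec: {spec!r}")

    args = ["run", "--rm"]
    if runtime:
        # Pass the same runtime (e.g. gVisor's runsc) to install, resolve, and run.
        args += ["--runtime", runtime]
    args += [
        "-v", "stage:/stage",      # hypothetical named volume holding the venv
        image,                     # hypothetical sandbox image with uv + python3
        "uv", "pip", "install",
        "--python", STAGE_PYTHON,  # install into the staged venv
        "--only-binary=:all:",     # wheels only: never build from an sdist
        "--", spec,                # `--` keeps the spec from being read as a flag
    ]
    return args
```

A package that publishes no wheel makes this install fail, which is the fail-closed behavior described above.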

Egress declaration for pypi uses a wheel-shippable [project.entry-points."polygraph.egress"] group (pyproject's [tool.polygraph] isn't shipped inside wheels, so it can't be read offline).
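
A minimal sketch of what the offline resolve step could look like, assuming each declared egress host is stored as the value of an entry point in the `polygraph.egress` group (the PR names the group but not the value format). The function name `resolve` and the sorting of hosts are illustrative choices, not the project's resolver.

```python
from importlib import metadata

EGRESS_GROUP = "polygraph.egress"  # group name from the PR; value format is assumed


def resolve(dist: metadata.Distribution) -> dict:
    """Collect console scripts, version, and declared egress from wheel metadata."""
    bins = {}
    declared_egress = []
    for ep in dist.entry_points:
        if ep.group == "console_scripts":
            bins[ep.name] = ep.value          # e.g. "pkg.cli:main"
        elif ep.group == EGRESS_GROUP:
            declared_egress.append(ep.value)  # assumed to hold a host name
    return {
        "bins": bins,
        "version": dist.version,
        "declaredEgress": sorted(declared_egress),
    }
```

Iterating `dist.entry_points` only parses `entry_points.txt` from the installed dist-info directory; no entry point is loaded, so nothing from the target package is imported. For an installed package, `json.dumps(resolve(metadata.distribution(name)))` would print the `{bins, version, declaredEgress}` object.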

Invariants preserved

  • No target code runs during staging (wheels-only, fail-closed on sdist).
  • gVisor --runtime parity across install + resolve + run.
  • Same grading rubric → METHODOLOGY_VERSION unchanged (litmus-v10); a pypi server is graded exactly as an npm one. github/explicit-command refs still fail closed under isolation.
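
The fail-closed gate in the last bullet could be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes refs take the form `<kind>/<name>`, as in pypi/mcp-server-time.

```python
SANDBOXABLE_KINDS = {"npm", "pypi"}  # ref kinds the PR says proceed under isolation


def assert_sandboxable(ref: str) -> str:
    """Return the ref kind, or raise if it cannot run under docker isolation."""
    kind, sep, name = ref.partition("/")
    if not sep or not name:
        raise ValueError(f"malformed ref: {ref!r}")
    if kind not in SANDBOXABLE_KINDS:
        # Fail closed: anything not explicitly supported is rejected.
        raise RuntimeError(f"docker isolation is unsupported for {kind} refs")
    return kind
```

Using an allowlist rather than a denylist means a new ref kind stays blocked until it is explicitly supported.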

Verification

  • pnpm -r typecheck clean; probes 369 passed / 7 skipped, cli 104 passed.
  • New unit tests: stagePypiInstallArgs (wheels-only, `--`-guarded spec, runtime parity), pypiResolverRunArgs (--network none, non-root), the Python resolver's JSON shape, egressTargetArgs/containerLaunch interpreter, and the updated isolation gate (github throws; npm/pypi proceed).
  • New Docker-gated live test (pypi-live.test.ts, opt-in via LITMUS_DOCKER_TESTS=1).
  • End-to-end (Docker up): pypi/mcp-server-time now grades A (C-02 actually ran in the sandbox, not skipped), where before it errored; npm/@modelcontextprotocol/server-filesystem still grades A (no regression).

🤖 Generated with Claude Code

Only npm refs could be containerized; a pypi ref under
LITMUS_STDIO_ISOLATION=docker failed immediately ("unsupported for pypi"),
so Python MCP servers were ungradeable in the safe execution mode.

Stage a pypi package the same way as npm, with no target code running during
staging: install WHEELS ONLY (`uv pip install --only-binary=:all:`) into a
venv under /stage — the pypi analog of npm's `--ignore-scripts`. There is no
build-hook equivalent, so a package with no wheel fails closed rather than
building from an sdist. Resolve the console-script entry, version, and
declared egress offline with the venv python (`--network none`), then launch
the entry with that interpreter. Both container paths — the connect path that
grades C-01/C-03/C-04 and the C-02 egress-capture path — get the pypi branch;
the sinkhole/capture machinery is interpreter-agnostic and unchanged.

pypi authors declare egress via a wheel-shippable `polygraph.egress`
entry-point group (pyproject's `[tool.polygraph]` isn't shipped in wheels).

The image gains uv + python3. Same rubric, so METHODOLOGY_VERSION is
unchanged (litmus-v10) — a pypi server is graded exactly as an npm one.
gVisor runtime parity is preserved across install, resolve, and run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k
@RubenSousaDinis RubenSousaDinis merged commit 280fbea into main Jul 1, 2026
11 checks passed
@RubenSousaDinis RubenSousaDinis deleted the feat/pypi-docker-sandbox branch July 1, 2026 07:50
RubenSousaDinis added a commit that referenced this pull request Jul 1, 2026
Ships #78 (C-02 egress D rationale is now actionable — names the undeclared
host(s) and points at polygraph.egress; messaging only, grades unchanged) and
#79 (pypi/uvx MCP servers are now gradeable under the Docker sandbox:
wheels-only venv staging, offline resolve, venv-python launch). Methodology
version is unchanged (litmus-v10) — a pypi server is graded by the same rubric.


Claude-Session: https://claude.ai/code/session_011FW3vDMau8UYnNWCanfT4k

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>