john221wick/deepagents-sandbox


deepagents_sandbox

A small Linux sandbox that gives agent workflows native sandboxing, in the same spirit as the OpenAI Agents SDK. On Linux it uses bubblewrap for isolation and cgroups v2 for resource limits, similar to the approach of Codex's open-source Linux sandbox. No containers, no VMs, just Linux primitives.

Status: Alpha. Works on Linux with cgroup v2. Tested on Ubuntu 22.04 and 24.04.

Goal of this project

The goal is simple:

  • let an LLM or agent run code without giving it your whole machine
  • keep its work inside /workspace
  • block obvious bad behavior like reading host files, using the network, or spawning too many processes
  • stay small enough that you can use it directly as a Python package

How this works

When you create a sandbox, Sandy:

  1. makes a fresh temporary workspace
  2. starts bubblewrap with isolated namespaces
  3. mounts only a small filesystem view inside the sandbox
  4. makes /workspace and /tmp writable
  5. keeps the network off by default
  6. applies memory and PID limits with cgroups when the environment allows it
  7. deletes the temporary workspace when the sandbox closes
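Steps 2 through 5 correspond fairly directly to bubblewrap command-line flags. The sketch below only builds the argument list and does not run it; `BwrapOptions` and `build_bwrap_argv` are hypothetical names, and the exact mounts and flags deepagents_sandbox passes may differ. Step 6 (cgroups) happens outside bwrap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BwrapOptions:
    """Hypothetical options; field names are illustrative, not the package's API."""
    workspace_host_dir: str
    network_access: bool = False


def build_bwrap_argv(opts: BwrapOptions, command: str) -> list[str]:
    """Build a bubblewrap argv that mirrors steps 2-5 (nothing is executed here)."""
    argv = [
        "bwrap",
        # Step 2: fresh user, PID and IPC namespaces.
        "--unshare-user",
        "--unshare-pid",
        "--unshare-ipc",
        # Kill the sandboxed process if the Python parent dies.
        "--die-with-parent",
        # Step 3: a small, read-only view of the host system.
        "--ro-bind", "/usr", "/usr",
        "--ro-bind-try", "/bin", "/bin",
        "--ro-bind-try", "/lib", "/lib",
        "--ro-bind-try", "/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        # Step 4: writable /workspace (a host temp dir) and a private tmpfs at /tmp.
        "--bind", opts.workspace_host_dir, "/workspace",
        "--tmpfs", "/tmp",
        "--chdir", "/workspace",
    ]
    # Step 5: no network namespace sharing unless explicitly requested.
    if not opts.network_access:
        argv.append("--unshare-net")
    argv += ["/bin/sh", "-c", command]
    return argv
```

Passing the returned list to `subprocess.run` on a Linux host with bwrap installed would run the command under those namespaces. Building an argv list rather than a shell string keeps the workspace path from being re-parsed by a host shell.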

This is broadly similar to the sandboxing approach OpenAI has described for Codex. OpenAI’s public Codex materials say tasks run in isolated cloud sandboxes/containers, and the open-source Codex Linux sandbox docs say bubblewrap is the default filesystem sandbox on Linux. Sandy is not the same implementation, but it follows the same general idea: isolate execution, keep the filesystem tight, and only expose the paths the tool actually needs. Sources: OpenAI, Introducing Codex and openai/codex Linux sandbox README.

One important detail: if you keep using the same sandbox instance, the same /workspace stays there between commands. If you create a new sandbox instance, you get a fresh workspace.
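On the host side, "fresh workspace per instance, persistent within an instance" amounts to a temporary directory whose lifetime is tied to the sandbox object. A minimal sketch, assuming a plain `tempfile.TemporaryDirectory` stands behind /workspace; `WorkspaceSketch` is a hypothetical class, not part of deepagents_sandbox:

```python
import os
import tempfile


class WorkspaceSketch:
    """Hypothetical host-side workspace: one fresh temp dir per instance."""

    def __init__(self) -> None:
        # Step 1: a new, empty directory for every instance.
        self._tmp = tempfile.TemporaryDirectory(prefix="sandbox-ws-")
        self.path = self._tmp.name

    def write(self, name: str, data: str) -> None:
        with open(os.path.join(self.path, name), "w") as f:
            f.write(data)

    def read(self, name: str) -> str:
        with open(os.path.join(self.path, name)) as f:
            return f.read()

    def close(self) -> None:
        # Step 7: delete the directory and everything in it.
        self._tmp.cleanup()

    def __enter__(self) -> "WorkspaceSketch":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()


with WorkspaceSketch() as ws:
    ws.write("main.py", "print(1 + 1)\n")
    # Same instance: the file is still there for the next command.
    print(ws.read("main.py"), end="")  # prints: print(1 + 1)
    old_path = ws.path

print(os.path.exists(old_path))  # False: removed on close

with WorkspaceSketch() as fresh:
    print(os.listdir(fresh.path))  # []: a new instance starts empty
```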

Simplest example

This is the smallest useful example:

import asyncio
from deepagents_sandbox import NativeSandbox

async def main():
    async with NativeSandbox() as sandbox:
        await sandbox.execute("printf 'print(1 / 0)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stderr.strip())

        await sandbox.execute("printf 'print(1 + 1)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stdout.strip())

asyncio.run(main())

The same sandbox instance keeps /workspace/main.py between those commands.

More complete example

import asyncio
from deepagents_sandbox import NativeSandbox, SandboxConfig

async def main():
    config = SandboxConfig(
        memory_limit_mb=512,
        max_pids=256,
        timeout_seconds=30,
    )

    async with NativeSandbox(config) as sandbox:
        # run a command
        result = await sandbox.execute("echo hello from the sandbox")
        print(result.stdout)   # hello from the sandbox
        print(result.exit_code)  # 0

        # upload files into the sandbox workspace
        await sandbox.upload_files([
            ("script.sh", b"#!/bin/sh\nwhoami && ls -la"),
        ])

        # run the uploaded script
        result = await sandbox.execute("sh /workspace/script.sh")
        print(result.stdout)

        # download files from the workspace
        downloads = await sandbox.download_files(["output.txt"])
        print(downloads[0].content.decode())

asyncio.run(main())

What you get

  • Process isolation via bubblewrap's PID, user, mount, network, and IPC namespaces
  • Resource limits via cgroups v2: memory cap, PID ceiling, CPU weight
  • Network isolation by default (opt-in with network_access=True)
  • Filesystem sandbox: only /workspace and /tmp are writable; /usr, /bin, /lib, and /lib64 are read-only bind mounts
  • Timeout enforcement: commands that exceed timeout_seconds are killed
  • Output size limits: stdout/stderr are truncated at 256 KB (max_output_bytes) to prevent log exhaustion
  • Graceful degradation: if cgroups aren't available, deepagents_sandbox warns and runs without resource limits
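Timeout enforcement and output capping can be sketched with `asyncio` subprocesses. This is an illustration under stated assumptions, not the package's executor: `run_limited` and `RunResult` are hypothetical names (the fields echo the `stdout`, `stderr`, and `exit_code` attributes used in the examples above), and the sketch reads the full output before cutting it, so it caps what is returned rather than how much memory is used.

```python
import asyncio
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical result type for this sketch."""
    stdout: str
    stderr: str
    exit_code: int
    timed_out: bool


def _truncate(data: bytes, max_bytes: int) -> str:
    # Cut at the byte cap, then decode; a split multi-byte character becomes U+FFFD.
    return data[:max_bytes].decode("utf-8", errors="replace")


async def run_limited(argv: list[str], timeout: float, max_output_bytes: int) -> RunResult:
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Timeout enforcement: kill the child and reap it so no zombie is left.
        proc.kill()
        await proc.wait()
        return RunResult("", "command timed out", proc.returncode, True)
    return RunResult(
        _truncate(out, max_output_bytes),
        _truncate(err, max_output_bytes),
        proc.returncode,
        False,
    )


if __name__ == "__main__":
    r = asyncio.run(run_limited([sys.executable, "-c", "print('x' * 1000)"], 5, 10))
    print(r.stdout)  # xxxxxxxxxx
```

Killing only the direct child is enough here; a real sandbox would typically also rely on PID namespaces or cgroups to take down any grandchildren.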

Requirements

  • Linux (x86-64 or ARM64)
  • cgroup v2 (standard on modern Linux distros)
  • bubblewrap (apt install bubblewrap on Debian/Ubuntu)
  • Python 3.10+

If you're developing on macOS, run deepagents_sandbox inside Docker or a Linux VM. bubblewrap and cgroup v2 are Linux-only. Some Docker environments expose cgroup v2 but do not delegate writable controllers; in that case deepagents_sandbox warns and runs without memory/PID/CPU limits.
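A minimal prerequisite probe might look like the following. It assumes the usual cgroup v2 mount point /sys/fs/cgroup, where a unified hierarchy exposes a `cgroup.controllers` file, and it simplifies the delegation check to "the memory and pids controllers are listed and the directory is writable". `check_prerequisites` is a hypothetical helper, not the implementation in detect.py.

```python
import os
import shutil
import sys


def check_prerequisites(cgroup_root: str = "/sys/fs/cgroup") -> dict[str, bool]:
    """Hypothetical probe reporting which sandbox features look usable."""
    is_linux = sys.platform.startswith("linux")
    has_bwrap = shutil.which("bwrap") is not None
    # A cgroup v2 (unified) hierarchy exposes cgroup.controllers at its root.
    controllers_file = os.path.join(cgroup_root, "cgroup.controllers")
    has_cgroup_v2 = os.path.isfile(controllers_file)
    controllers: set[str] = set()
    if has_cgroup_v2:
        with open(controllers_file) as f:
            controllers = set(f.read().split())
    # Simplified delegation check: needed controllers listed and directory writable.
    limits_usable = (
        has_cgroup_v2
        and {"memory", "pids"} <= controllers
        and os.access(cgroup_root, os.W_OK)
    )
    return {
        "linux": is_linux,
        "bwrap": has_bwrap,
        "cgroup_v2": has_cgroup_v2,
        "resource_limits": limits_usable,
    }
```

When `resource_limits` is False, a caller would warn (for example with `warnings.warn`) and skip the limits, matching the graceful-degradation behavior described above. A production check would also look at the process's own cgroup (from /proc/self/cgroup) and the parent's `cgroup.subtree_control`.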

Installation

pip install deepagents_sandbox

For Deep Agents / LangChain usage:

pip install "deepagents_sandbox[langchain]"

For development:

git clone https://github.com/john221wick/sandy.git
cd sandy
pip install -e ".[dev]"

Configuration

SandboxConfig is a frozen dataclass, so set every option at construction and pass the config to NativeSandbox:

from deepagents_sandbox import SandboxConfig

config = SandboxConfig(
    memory_limit_mb=512,      # max RAM (default: 512MB)
    max_pids=256,             # max processes (default: 256)
    cpu_shares=100,           # CPU weight (default: 100)
    timeout_seconds=60.0,     # hard timeout (default: 60s)
    max_output_bytes=262144,  # stdout/stderr cap (default: 256KB)
    network_access=False,     # allow outbound network (default: False)
    gpu=False,                # expose GPU (default: False, reserved for v2)
    extra_bind_mounts=[],     # list of (host_path, sandbox_path) tuples
    extra_env={},             # extra environment variables
)
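The memory, PID, and CPU fields map onto cgroup v2 interface files. The sketch below assumes `cpu_shares` is written to `cpu.weight` unchanged (cgroup v2 weights range from 1 to 10000, default 100) and that a writable, delegated cgroup directory is available; `apply_cgroup_limits` and `attach_pid` are hypothetical helpers, not the code in cgroup.py.

```python
import os


def apply_cgroup_limits(
    cgroup_dir: str,
    memory_limit_mb: int = 512,
    max_pids: int = 256,
    cpu_weight: int = 100,
) -> dict[str, str]:
    """Hypothetical mapping from SandboxConfig-style values to cgroup v2 files."""
    if not 1 <= cpu_weight <= 10000:
        raise ValueError(f"cpu.weight must be in 1..10000, got {cpu_weight}")
    if memory_limit_mb <= 0 or max_pids <= 0:
        raise ValueError("memory_limit_mb and max_pids must be positive")
    values = {
        # memory.max takes a byte count.
        "memory.max": str(memory_limit_mb * 1024 * 1024),
        # pids.max caps the number of tasks (processes and threads) in the group.
        "pids.max": str(max_pids),
        # cpu.weight is a relative share, not a hard cap.
        "cpu.weight": str(cpu_weight),
    }
    # On cgroupfs, mkdir creates the control files; we then write the limits.
    os.makedirs(cgroup_dir, exist_ok=True)
    for name, value in values.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value)
    return values


def attach_pid(cgroup_dir: str, pid: int) -> None:
    # Writing a PID to cgroup.procs moves that process into the group.
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

With the defaults, `memory.max` becomes 536870912 (512 × 1024 × 1024 bytes) and `pids.max` becomes 256, which is what stops a fork bomb from growing past the PID ceiling.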

Deep Agents adapter

Sandy exposes a Deep Agents backend directly from the package root:

from deepagents import create_deep_agent
from deepagents_sandbox import Sandbox

backend = Sandbox()
try:
    agent = create_deep_agent(
        model="openai:gpt-4.1-mini",
        backend=backend,
    )

    result = agent.invoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": "Write /workspace/hello.py, then run it.",
                }
            ]
        }
    )
    print(result["messages"][-1].content)
finally:
    # always release the sandbox and its temporary workspace
    backend.close()

Notes:

  • This targets Deep Agents specifically, not bare ChatModel.invoke(...).
  • Use absolute paths under /workspace.
  • /tmp/... is supported for backend temp-file flows used by Deep Agents.
  • The adapter assumes it is running on Linux or inside Docker where bubblewrap works.
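Because the adapter accepts absolute paths under /workspace and /tmp, each incoming path has to be mapped to a host directory and everything else rejected. A minimal sketch, assuming two host directories back those mounts; `to_host_path` is a hypothetical helper, not the adapter's actual code. The check is purely lexical and does not follow symlinks.

```python
import posixpath


def to_host_path(virtual_path: str, workspace_dir: str, tmp_dir: str) -> str:
    """Hypothetical mapping of an adapter path to its host-side location."""
    if not virtual_path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {virtual_path!r}")
    # Collapse "." and ".." lexically before checking the prefix.
    norm = posixpath.normpath(virtual_path)
    for prefix, host_root in (("/workspace", workspace_dir), ("/tmp", tmp_dir)):
        # Match the prefix exactly or as a directory, so "/workspacex" is rejected.
        if norm == prefix or norm.startswith(prefix + "/"):
            rel = norm[len(prefix):].lstrip("/")
            return posixpath.join(host_root, rel) if rel else host_root
    raise ValueError(f"path must be under /workspace or /tmp: {virtual_path!r}")
```

For example, `/workspace/../etc/passwd` normalizes to `/etc/passwd` and is rejected, while `/workspace/hello.py` maps into the workspace directory.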

Tests included

The test suite is split into three parts:

  • unit
    • workspace creation, read, write, list, snapshot, restore, and cleanup
    • path traversal checks like ../../etc/passwd
    • symlink escape checks
    • executor command validation, mount flags, env flags, and network flags
    • timeout handling
    • cgroup slice creation, config writing, and PID attachment
    • Linux and bwrap prerequisite detection
    • NativeSandbox lifecycle and timeout forwarding
    • Deep Agents adapter behavior for /workspace, /tmp, invalid paths, and error mapping
  • integration
    • real command execution through bubblewrap
    • current working directory is /workspace
    • network is blocked by default
    • system paths are read-only
    • /workspace is writable
    • file upload, execute, and download flows work
    • timeout handling on real commands
    • Deep Agents adapter can run commands and move files through /workspace and /tmp
  • adversarial
    • fork bomb containment
    • memory bomb containment when cgroup memory limits are available
    • blocked network access with curl and DNS lookups
    • host file access checks like /etc/shadow
    • path traversal attempts from inside the workspace
    • symlink escape attempts
    • blocked privilege escalation with sudo and su

Run the unit tests on any OS, including macOS and Windows (they are mocked and don't need bwrap):

make setup
make unit

To run the full test suite including integration and adversarial tests on macOS, use Docker:

docker build -t deepagents_sandbox-test .
docker run --rm --privileged --cgroupns=private deepagents_sandbox-test

To run just the Deep Agents adapter tests in Docker:

docker build -t deepagents_sandbox-test .
docker run --rm --privileged --cgroupns=private deepagents_sandbox-test \
  pytest -v --tb=short tests/unit/test_langchain_adapter.py tests/integration/test_langchain_adapter.py

The container must run with --privileged --cgroupns=private so bubblewrap and cgroups work inside the container.

Or use the Makefile targets directly:

make lint          # ruff
make typecheck     # mypy
make test          # pytest (all tests)
make unit          # pytest -m "not integration and not adversarial"
make integration   # pytest -m integration
make adversarial   # pytest -m adversarial

Security properties

The sandbox limits what a compromised or malicious command can do:

  • Fork bomb (:(){ :|:& };:): PID limit via cgroups pids.max
  • Memory exhaustion: memory limit via cgroups memory.max
  • Network exfiltration: --unshare-net by default
  • Read host files like /etc/shadow: minimal filesystem view; only selected system paths are bind-mounted (read-only), and only /workspace and /tmp are writable
  • Path traversal like ../../etc/passwd: workspace-relative path enforcement
  • Privilege escalation with sudo or su: dropped capabilities, user namespace isolation, and synthetic passwd/group files
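Workspace-relative path enforcement has to resolve symlinks as well as `..`; otherwise a link created inside the workspace could point back at the host. A minimal sketch using `os.path.realpath` and `os.path.commonpath`; `resolve_in_workspace` is a hypothetical helper, not the code in workspace.py. A check-then-open sequence like this is still racy if untrusted code can modify the workspace between the check and the open.

```python
import os


def resolve_in_workspace(workspace_dir: str, user_path: str) -> str:
    """Hypothetical containment check that follows symlinks before comparing."""
    root = os.path.realpath(workspace_dir)
    # Join relative to the workspace (an absolute user_path replaces root here),
    # then resolve "..", "." and any symlinks along the way.
    candidate = os.path.realpath(os.path.join(root, user_path))
    if os.path.commonpath([root, candidate]) != root:
        raise PermissionError(f"path escapes the workspace: {user_path!r}")
    return candidate
```

Both `../../etc/passwd` and a path through a symlink such as `escape -> /etc` resolve outside the workspace root and raise `PermissionError`.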

Caveats: This is not a hard security boundary like a VM or a rootless container. It's designed to catch accidental mistakes and naive adversarial prompts. A sufficiently motivated attacker, for example one with a working kernel exploit or elevated privileges on the host, can escape it. Use it accordingly.

Project layout

deepagents_sandbox/
  detect.py    # prerequisite checks (bwrap, cgroup v2, user namespaces)
  workspace.py # temp directory with snapshot/restore
  cgroup.py    # cgroup v2 slice creation and cleanup
  executor.py  # bubblewrap subprocess management
  config.py    # SandboxConfig dataclass
  sandbox.py   # NativeSandbox (async context manager)
  __init__.py  # public API exports

tests/
  unit/        # mocked tests, run on any OS
  integration/ # real bwrap execution tests
  adversarial/ # escape attempt tests

License

MIT

About

Native sandbox for deepagents