Sandy is a small native Linux sandbox for agent workflows, in the same spirit as the OpenAI Agents SDK. It uses bubblewrap for isolation and cgroups v2 for limits, similar to Codex's open-source Linux sandbox approach. No containers, no VMs, just Linux primitives.
Status: Alpha. Works on Linux with cgroup v2. Tested on Ubuntu 22.04 and 24.04.
The goal is simple:
- let an LLM or agent run code without giving it your whole machine
- keep its work inside `/workspace`
- block obvious bad behavior like reading host files, using the network, or spawning too many processes
- stay small enough that you can use it directly as a Python package
When you create a sandbox, Sandy:
- makes a fresh temporary workspace
- starts `bubblewrap` with isolated namespaces
- mounts only a small filesystem view inside the sandbox
- makes `/workspace` and `/tmp` writable
- keeps the network off by default
- applies memory and PID limits with cgroups when the environment allows it
- deletes the temporary workspace when the sandbox closes
This is broadly similar to the sandboxing approach OpenAI has described for Codex. OpenAI’s public Codex materials say tasks run in isolated cloud sandboxes/containers, and the open-source Codex Linux sandbox docs say bubblewrap is the default filesystem sandbox on Linux. Sandy is not the same implementation, but it follows the same general idea: isolate execution, keep the filesystem tight, and only expose the paths the tool actually needs. Sources: OpenAI, Introducing Codex and openai/codex Linux sandbox README.
One important detail: if you keep using the same sandbox instance, the same /workspace stays there between commands. If you create a new sandbox instance, you get a fresh workspace.
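To make the setup steps above more concrete, here is a rough sketch of the kind of `bwrap` command line they imply. It is not Sandy's actual executor: the helper name `build_bwrap_argv` and the exact choice of mounts are assumptions made for illustration. Only the bubblewrap flags themselves (`--unshare-*`, `--ro-bind-try`, `--bind`, `--tmpfs`, `--proc`, `--dev`, `--chdir`, `--die-with-parent`) are real.

```python
from pathlib import Path


def build_bwrap_argv(
    workspace: Path, command: str, network_access: bool = False
) -> list[str]:
    """Assemble an illustrative bwrap command line (hypothetical helper)."""
    argv = [
        "bwrap",
        "--die-with-parent",  # tear the sandbox down if the parent exits
        "--unshare-user",
        "--unshare-pid",
        "--unshare-ipc",
        "--unshare-uts",
    ]
    if not network_access:
        # Network stays off unless the caller opts in.
        argv.append("--unshare-net")

    # Small read-only view of the host system directories.
    for system_dir in ("/usr", "/bin", "/lib", "/lib64"):
        argv += ["--ro-bind-try", system_dir, system_dir]

    argv += [
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",                       # writable scratch space
        "--bind", str(workspace), "/workspace",  # the only writable host path
        "--chdir", "/workspace",
        "/bin/sh", "-c", command,
    ]
    return argv
```

Passing the result to `subprocess.run(...)` on a Linux host with bubblewrap installed would run `command` inside that restricted view. Building the argument list as a pure function keeps it easy to unit test without `bwrap` present.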
This is the smallest useful example:
```python
import asyncio

from deepagents_sandbox import NativeSandbox


async def main():
    async with NativeSandbox() as sandbox:
        await sandbox.execute("printf 'print(1 / 0)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stderr.strip())
        await sandbox.execute("printf 'print(1 + 1)\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stdout.strip())


asyncio.run(main())
```

The same sandbox instance keeps `/workspace/main.py` between those commands.
```python
import asyncio

from deepagents_sandbox import NativeSandbox, SandboxConfig


async def main():
    config = SandboxConfig(
        memory_limit_mb=512,
        max_pids=256,
        timeout_seconds=30,
    )
    async with NativeSandbox(config) as sandbox:
        # run a command
        result = await sandbox.execute("echo hello from the sandbox")
        print(result.stdout)     # hello from the sandbox
        print(result.exit_code)  # 0

        # upload files into the sandbox workspace
        await sandbox.upload_files([
            ("script.sh", b"#!/bin/sh\nwhoami && ls -la"),
        ])

        # run the uploaded script
        result = await sandbox.execute("sh /workspace/script.sh")
        print(result.stdout)

        # create a file so there is something to download
        await sandbox.execute("echo done > /workspace/output.txt")

        # download files from the workspace
        downloads = await sandbox.download_files(["output.txt"])
        print(downloads[0].content.decode())


asyncio.run(main())
```

- Process isolation via bubblewrap's PID, user, mount, network, and IPC namespaces
- Resource limits via cgroups v2: memory cap, PID ceiling, CPU weight
- Network isolation by default (opt-in with `network_access=True`)
- Filesystem sandbox: only the workspace directory is writable; `/usr`, `/bin`, `/lib`, `/lib64` are read-only bind mounts
- Timeout enforcement: commands that run too long are killed
- Output size limits: stdout/stderr truncated at 256KB to prevent log exhaustion
- Graceful degradation: if cgroups aren't available, deepagents_sandbox warns and runs without resource limits
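The timeout and output-size behavior in the list above can be illustrated with plain `asyncio`. This is a minimal sketch under stated assumptions, not the package's executor: `run_limited` and `RunResult` are hypothetical names, and the defaults simply mirror the 60 second and 256KB values documented for `SandboxConfig`.

```python
import asyncio
from typing import NamedTuple, Optional


class RunResult(NamedTuple):
    exit_code: Optional[int]  # None when the command was killed on timeout
    stdout: bytes
    stderr: bytes
    timed_out: bool


async def run_limited(
    argv: list[str],
    timeout_seconds: float = 60.0,
    max_output_bytes: int = 262144,
) -> RunResult:
    """Run argv, kill it on timeout, and cap the captured output."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(), timeout=timeout_seconds
        )
    except asyncio.TimeoutError:
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # the process exited on its own just before the kill
        await proc.wait()  # reap the child so it does not linger as a zombie
        return RunResult(None, b"", b"", True)

    # Truncate after capture; a stricter version would stop reading early.
    return RunResult(
        proc.returncode,
        stdout[:max_output_bytes],
        stderr[:max_output_bytes],
        False,
    )
```

Note that this sketch buffers the full output before truncating, so it caps what the caller sees rather than what is held in memory. Reading the pipes incrementally would be needed to bound memory as well.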
- Linux (x86-64 or ARM64)
- cgroup v2 (standard on modern Linux distros)
- bubblewrap (`apt install bubblewrap` on Debian/Ubuntu)
- Python 3.10+
If you're developing on macOS, run deepagents_sandbox inside Docker or a Linux VM. bubblewrap and cgroup v2 are Linux-only. Some Docker environments expose cgroup v2 but do not delegate writable controllers; in that case deepagents_sandbox warns and runs without memory/PID/CPU limits.
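The "cgroup v2 is visible but not writable" situation described above can be probed with a few filesystem checks. The sketch below is a simplified assumption of how such a check might look, not the package's `detect.py`: the function names are hypothetical, and a real check would likely inspect the process's own cgroup (from `/proc/self/cgroup`) rather than the hierarchy root.

```python
import os
import warnings
from pathlib import Path

WANTED_CONTROLLERS = {"memory", "pids", "cpu"}


def usable_cgroup_controllers(cgroup_root: Path = Path("/sys/fs/cgroup")) -> set[str]:
    """Return the cgroup v2 controllers we could plausibly configure."""
    controllers_file = cgroup_root / "cgroup.controllers"
    if not controllers_file.is_file():
        return set()  # no unified (v2) hierarchy mounted here
    if not os.access(cgroup_root, os.W_OK):
        return set()  # visible but not delegated: cannot create child cgroups
    return set(controllers_file.read_text().split())


def limits_available(cgroup_root: Path = Path("/sys/fs/cgroup")) -> bool:
    """Warn and return False when resource limits cannot be applied."""
    missing = WANTED_CONTROLLERS - usable_cgroup_controllers(cgroup_root)
    if missing:
        warnings.warn(
            f"cgroup controllers unavailable: {sorted(missing)}; "
            "running without those resource limits"
        )
        return False
    return True
```

Taking the root directory as a parameter keeps the check testable against a temporary directory on any OS.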
```bash
pip install deepagents_sandbox
```

For Deep Agents / LangChain usage:

```bash
pip install "deepagents_sandbox[langchain]"
```

For development:

```bash
git clone https://github.com/john221wick/sandy.git
cd sandy
pip install -e ".[dev]"
```

`SandboxConfig` is a frozen dataclass; pass it to `NativeSandbox` at construction:
```python
config = SandboxConfig(
    memory_limit_mb=512,      # max RAM (default: 512MB)
    max_pids=256,             # max processes (default: 256)
    cpu_shares=100,           # CPU weight (default: 100)
    timeout_seconds=60.0,     # hard timeout (default: 60s)
    max_output_bytes=262144,  # stdout/stderr cap (default: 256KB)
    network_access=False,     # allow outbound network (default: False)
    gpu=False,                # expose GPU (default: False, reserved for v2)
    extra_bind_mounts=[],     # list of (host_path, sandbox_path) tuples
    extra_env={},             # extra environment variables
)
```

Sandy exposes a Deep Agents backend directly from the package root:
```python
from deepagents import create_deep_agent
from deepagents_sandbox import Sandbox

backend = Sandbox()
agent = create_deep_agent(
    model="openai:gpt-4.1-mini",
    backend=backend,
)
result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "Write /workspace/hello.py, then run it.",
            }
        ]
    }
)
backend.close()
```

Notes:
- This targets Deep Agents specifically, not bare `ChatModel.invoke(...)`.
- Use absolute paths under `/workspace`.
- `/tmp/...` is supported for backend temp-file flows used by Deep Agents.
- The adapter assumes it is running on Linux or inside Docker where `bubblewrap` works.
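The "absolute paths under `/workspace`" rule in the notes above is the kind of check that also guards against `..` traversal and symlink escapes. The following is a minimal sketch of one way to do it, assuming the sandbox's `/workspace` maps onto a host directory; `resolve_in_workspace` is a hypothetical helper, not the adapter's real function, and it does not cover the `/tmp` case.

```python
from pathlib import Path, PurePosixPath


def resolve_in_workspace(host_workspace: Path, sandbox_path: str) -> Path:
    """Map a /workspace/... path to the host, rejecting anything that escapes."""
    pure = PurePosixPath(sandbox_path)
    if not pure.is_absolute():
        raise ValueError(f"path must be absolute: {sandbox_path!r}")
    try:
        relative = pure.relative_to("/workspace")
    except ValueError:
        raise ValueError(f"path must be under /workspace: {sandbox_path!r}") from None

    root = host_workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks, so both
    # traversal and symlink escapes end up outside root and are rejected.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes the workspace: {sandbox_path!r}")
    return candidate
```

Resolving before comparing is the important design choice here: comparing unresolved strings would accept `/workspace/../etc/passwd` or a symlink inside the workspace that points at `/etc`.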
The test suite is split into three parts:
- `unit`
  - workspace creation, read, write, list, snapshot, restore, and cleanup
  - path traversal checks like `../../etc/passwd`
  - symlink escape checks
  - executor command validation, mount flags, env flags, and network flags
  - timeout handling
  - cgroup slice creation, config writing, and PID attachment
  - Linux and `bwrap` prerequisite detection
  - `NativeSandbox` lifecycle and timeout forwarding
  - Deep Agents adapter behavior for `/workspace`, `/tmp`, invalid paths, and error mapping
- `integration`
  - real command execution through `bubblewrap`
  - current working directory is `/workspace`
  - network is blocked by default
  - system paths are read-only
  - `/workspace` is writable
  - file upload, execute, and download flows work
  - timeout handling on real commands
  - Deep Agents adapter can run commands and move files through `/workspace` and `/tmp`
- `adversarial`
  - fork bomb containment
  - memory bomb containment when cgroup memory limits are available
  - blocked network access with `curl` and DNS lookups
  - host file access checks like `/etc/shadow`
  - path traversal attempts from inside the workspace
  - symlink escape attempts
  - blocked privilege escalation with `sudo` and `su`
Run them like this:
On macOS or Windows, you can run the unit tests (no bwrap required):
```bash
make setup
make unit
```

To run the full test suite including integration and adversarial tests on macOS, use Docker:

```bash
docker build -t deepagents_sandbox-test .
docker run --rm --privileged --cgroupns=private deepagents_sandbox-test
```

To run just the Deep Agents adapter tests in Docker:

```bash
docker build -t deepagents_sandbox-test .
docker run --rm --privileged --cgroupns=private deepagents_sandbox-test \
  pytest -v --tb=short tests/unit/test_langchain_adapter.py tests/integration/test_langchain_adapter.py
```

The container must run with `--privileged --cgroupns=private` so bubblewrap and cgroups work inside the container.
Or use the Makefile targets directly:
```bash
make lint         # ruff
make typecheck    # mypy
make test         # pytest (all tests)
make unit         # pytest -m "not integration and not adversarial"
make integration  # pytest -m integration
make adversarial  # pytest -m adversarial
```

The sandbox limits what a compromised or malicious command can do:
- Fork bomb (`:(){ :|:& };:`): PID limit via cgroups `pids.max`
- Memory exhaustion: memory limit via cgroups `memory.max`
- Network exfiltration: `--unshare-net` by default
- Read host files like `/etc/shadow`: read-only filesystem, only `/workspace` writable
- Path traversal like `../../etc/passwd`: workspace-relative path enforcement
- Privilege escalation with `sudo` or `su`: dropped capabilities, user namespace isolation, and synthetic passwd/group files
Caveats: This is not a hard security boundary like a VM or a rootless container. It is designed to catch accidental mistakes and naive adversarial prompts. A sufficiently motivated attacker, for example one with a kernel exploit or elevated privileges, can escape it. Use it accordingly.
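For the fork-bomb and memory-exhaustion cases, the cgroup v2 side comes down to writing a few values into interface files (`memory.max`, `pids.max`, `cpu.weight`) and then adding the sandboxed process to `cgroup.procs`. The sketch below shows that mechanism in isolation. The helper names are hypothetical, the mapping of `cpu_shares` onto `cpu.weight` is an assumption, and this is not the code in `cgroup.py`.

```python
from pathlib import Path


def write_limits(
    cgroup_dir: Path, memory_limit_mb: int, max_pids: int, cpu_weight: int
) -> None:
    """Write resource limits into an existing, writable cgroup v2 directory."""
    # memory.max is expressed in bytes.
    (cgroup_dir / "memory.max").write_text(str(memory_limit_mb * 1024 * 1024))
    # pids.max caps the number of tasks, which is what contains a fork bomb.
    (cgroup_dir / "pids.max").write_text(str(max_pids))
    # cpu.weight is a relative share (kernel default 100), not a hard cap.
    (cgroup_dir / "cpu.weight").write_text(str(cpu_weight))


def attach_pid(cgroup_dir: Path, pid: int) -> None:
    """Move a process into the cgroup so the limits apply to it and its children."""
    (cgroup_dir / "cgroup.procs").write_text(str(pid))
```

On a real system `cgroup_dir` would be a child directory created under a delegated part of `/sys/fs/cgroup`. Pointing it at a temporary directory is enough to check the values being written.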
```
deepagents_sandbox/
  detect.py      # prerequisite checks (bwrap, cgroup v2, user namespaces)
  workspace.py   # temp directory with snapshot/restore
  cgroup.py      # cgroup v2 slice creation and cleanup
  executor.py    # bubblewrap subprocess management
  config.py      # SandboxConfig dataclass
  sandbox.py     # NativeSandbox (async context manager)
  __init__.py    # public API exports
tests/
  unit/          # mocked tests, run on any OS
  integration/   # real bwrap execution tests
  adversarial/   # escape attempt tests
```
MIT