Goal
Make long-running sandboxes inspectable from outside. Every sandbox (CLI, Python SDK, embedded) exposes a per-process Unix control socket; a new sandlock ps lists live sandboxes and sandlock config <name> returns the effective policy. The socket is the substrate; later verbs (stats, logs, port, diff, shutdown) plug in without further protocol work.
Motivation
Today sandlock list exists but only sees sandboxes that registered themselves in /dev/shm/sandlock-$UID/network.json, and registration only happens in the port_remap branch of run_command (crates/sandlock-cli/src/main.rs:504). Two consequences:
- Sandboxes launched without --port-remap, and every sandbox created via the Python SDK (which never touches network_registry), are invisible to sandlock list / sandlock kill.
- Even when a sandbox is listed, the registry only stores pid, ports, allowed_hosts, and the virtual /etc/hosts (crates/sandlock-cli/src/network_registry.rs:13). None of the policy passed to Sandbox() (Landlock rules, seccomp adjustments, http rules, limits, determinism flags) is recoverable. There is no sandlock inspect and no introspection API on a running sandbox.
On-disk persistence is not a complete answer on its own: the effective policy lives in supervisor memory, and a policy_fn calling ctx.deny_path() at runtime mutates it, so a snapshot file would be stale. The Docker precedent (/var/lib/docker/containers/<id>/config.v2.json) persists for daemon-restart recovery, which Sandlock's one-supervisor-per-sandbox model does not need. The only source of truth is the supervisor process itself, so the introspection path needs to talk to it.
Current state
- Command::List and Command::Kill (crates/sandlock-cli/src/main.rs:141, :168): read from network.json only.
- network_registry::register (crates/sandlock-cli/src/network_registry.rs:68): single shared JSON file under flock, called from one CLI codepath.
- sandlock-core and sandlock-ffi have no equivalent. Python SDK sandboxes do not register.
- The TOML profile serializer in crates/sandlock-core/src/profile.rs already flattens the Sandbox dataclass to a profile shape; reusable for the JSON response body.
Proposed design
Per-sandbox runtime dir
Layout under /dev/shm/sandlock-$UID/<name>/:
- pid: single-line pid file; lets ps list and prune dead sandboxes without opening the socket.
- control.sock: Unix stream socket, supervisor-owned, bound before the child is forked.
No other files. meta.json is redundant (start time, argv, exe are all in /proc/<pid>). policy.json is dropped in favor of the socket so that dynamic-policy mutations are reflected.
The supervisor unlinks its dir on normal exit and on signal-handled exit paths. Readers (ps, kill, config) prune dirs whose pid is no longer alive (same liveness check list uses today, crates/sandlock-cli/src/network_registry.rs:141).
The existing /dev/shm/sandlock-$UID/network.json is deleted; per the project's pre-1.0 no-backcompat stance, no shim.
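The reader-side pruning described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names (pid_alive, live_sandboxes) are hypothetical, and probing with signal 0 is an assumption about the liveness check, which the proposal only references by file and line.

```python
import os
import shutil
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def live_sandboxes(root: Path) -> dict[str, int]:
    """Map sandbox name -> pid for live sandboxes, pruning stale dirs on the way."""
    live: dict[str, int] = {}
    if not root.is_dir():
        return live
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            pid = int((entry / "pid").read_text().strip())
        except (OSError, ValueError):
            pid = None  # missing or malformed pid file: treated as stale here
        if pid is not None and pid > 0 and pid_alive(pid):
            live[entry.name] = pid
        else:
            shutil.rmtree(entry, ignore_errors=True)
    return live
```

Note that this sketch treats a missing pid file as stale, so it would race with a supervisor that has created its dir but not yet written pid; a real implementation would need to close that window, and would also have to consider pid reuse.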
Wire protocol
4-byte big-endian length prefix, then UTF-8 JSON. One client at a time per socket; no concurrency to manage.
Request:
{"v": 1, "verb": "config", "args": {}}
Response:
{"v": 1, "ok": true, "data": { ...effective Sandbox policy... }}
or
{"v": 1, "ok": false, "err": "..."}
The v field reserves room to rev the wire pre-1.0. config is the only verb defined in v1.
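To make the framing concrete, here is a minimal sketch of the length-prefixed JSON exchange using only the Python standard library. The function names and the max_len cap are assumptions for illustration; the proposal does not specify a frame size limit.

```python
import json
import socket
import struct


def send_msg(sock: socket.socket, obj: dict) -> None:
    """Send one frame: 4-byte big-endian length, then the UTF-8 JSON body."""
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket, max_len: int = 1 << 20) -> dict:
    """Receive one frame and decode its JSON body."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    if length > max_len:  # hypothetical cap, not part of the proposal
        raise ValueError(f"frame too large: {length} bytes")
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

A client would connect an AF_UNIX stream socket to <name>/control.sock, call send_msg with the request shown above, and read the reply with recv_msg.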
Core changes (sandlock-core, ~180 LOC)
- Supervisor setup: mkdir <name>/, write pid, bind control.sock. Hook into the existing supervisor lifecycle where seccomp/Landlock setup already runs.
- Cleanup: unlink the dir on normal exit, on supervisor panic, and on the signal-handled paths.
- Control loop: serve control.sock from the supervisor's event loop or a dedicated thread (decision deferred to implementation; depends on whether the seccomp-notify loop can absorb a periodic accept without notification latency cost).
- config handler: reuse the profile serializer from crates/sandlock-core/src/profile.rs and emit JSON instead of TOML. Runtime kwargs (policy_fn, init_fn, work_fn) render as the literal string "<callback>".
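The callback-placeholder rule can be illustrated with a small sketch. It is written in Python for brevity and does not reflect the actual serializer in profile.rs; the to_wire helper and the "paths" key in the usage example are hypothetical.

```python
import json
from typing import Any

CALLBACK_PLACEHOLDER = "<callback>"


def to_wire(value: Any) -> Any:
    """Recursively convert a policy structure into JSON-safe data.

    Callables (policy_fn, init_fn, work_fn) cannot be serialized, so each
    is replaced by the flat placeholder string.
    """
    if callable(value):
        return CALLBACK_PLACEHOLDER
    if isinstance(value, dict):
        return {str(k): to_wire(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_wire(v) for v in value]
    return value


# Usage: the keys other than policy_fn / init_fn are made up for the example.
policy = {"policy_fn": lambda ctx: None, "init_fn": None, "paths": ["/usr", "/lib"]}
print(json.dumps(to_wire(policy)))
```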
CLI changes (sandlock-cli, ~80 LOC)
- Rename sandlock list to sandlock ps. Columns: NAME, PID, UPTIME, CMD. UPTIME and CMD come from /proc/<pid>/stat and /proc/<pid>/cmdline.
- New sandlock config <name>: opens <name>/control.sock, sends {"v":1,"verb":"config"}, prints the data field. --json is the default; --toml round-trips into a profile via the existing TOML serializer.
- Rewire sandlock kill <name> to read pid from <name>/pid.
- Delete crates/sandlock-cli/src/network_registry.rs and the registration call at crates/sandlock-cli/src/main.rs:522.
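As a rough sketch of how the UPTIME and CMD columns could be derived, the following parses the two /proc files named above. Function names are hypothetical, and reading /proc/uptime as the boot-relative reference is an assumption; the proposal only names /proc/<pid>/stat and /proc/<pid>/cmdline.

```python
import os


def parse_starttime_ticks(stat_line: str) -> int:
    """Extract field 22 (starttime, in clock ticks since boot) from /proc/<pid>/stat."""
    # comm (field 2) is parenthesised and may itself contain spaces or ')',
    # so only the text after the last ')' is split on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return int(rest[19])  # rest[0] is field 3 (state), so field 22 is rest[19]


def parse_cmdline(raw: bytes) -> str:
    """Join the NUL-separated argv from /proc/<pid>/cmdline for display."""
    args = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
    return " ".join(args)


def uptime_seconds(starttime_ticks: int, system_uptime: float, ticks_per_sec: int) -> float:
    """Process uptime: system uptime minus the process start offset since boot."""
    return system_uptime - starttime_ticks / ticks_per_sec


def ps_row(name: str, pid: int) -> tuple[str, int, float, str]:
    """Build one (NAME, PID, UPTIME, CMD) row by reading /proc (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_starttime_ticks(f.read())
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        cmd = parse_cmdline(f.read())
    with open("/proc/uptime") as f:
        system_uptime = float(f.read().split()[0])
    uptime = uptime_seconds(ticks, system_uptime, os.sysconf("SC_CLK_TCK"))
    return name, pid, uptime, cmd
```

Splitting after the last ')' matters because a sandboxed command can have an arbitrary process name; a naive whitespace split would misplace every later field.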
Deferred (each is now an independent verb addition)
- stats: RSS / CPU% / threads / FDs from /proc/<pid>; branchfs delta and forks-used from supervisor counters. Backs sandlock stats <name>, streaming by default per the Docker analogue.
- logs: ring buffer of recent seccomp denials (syscall name + count, no argv per TOCTOU) and MITM proxy decisions. Backs sandlock logs <name>.
- port: current port-remap table. Backs sandlock port <name>; replaces today's network_registry::update_ports callback wiring.
- diff: branchfs A/M/D changes, reusing the existing Change type that dry_run returns. Backs sandlock diff <name>.
- shutdown: graceful stop request; optional fast path for sandlock kill.
Each is "register a verb handler"; the protocol, dir, and lifecycle are already in place.
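One way to picture "register a verb handler" is a dispatch table keyed by verb name. The sketch below is illustrative only: the decorator, the handler signature, the placeholder config payload, and the error strings are all assumptions, not the planned Rust design.

```python
from typing import Any, Callable

PROTOCOL_VERSION = 1

# Registry of verb name -> handler. A handler takes the request's args dict
# and returns the JSON-safe payload for the response's "data" field.
HANDLERS: dict[str, Callable[[dict], Any]] = {}


def verb(name: str):
    """Decorator that registers a handler under a verb name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@verb("config")
def handle_config(args: dict) -> Any:
    # Placeholder payload; a real supervisor would serialize its in-memory policy.
    return {"policy_fn": "<callback>"}


def _err(message: str) -> dict:
    return {"v": PROTOCOL_VERSION, "ok": False, "err": message}


def dispatch(request: dict) -> dict:
    """Turn one decoded request into a response envelope."""
    if request.get("v") != PROTOCOL_VERSION:
        return _err(f"unsupported version: {request.get('v')!r}")
    name = request.get("verb")
    handler = HANDLERS.get(name) if isinstance(name, str) else None
    if handler is None:
        return _err(f"unknown verb: {name!r}")
    try:
        data = handler(request.get("args") or {})
    except Exception as exc:  # report handler failures instead of crashing the loop
        return _err(str(exc))
    return {"v": PROTOCOL_VERSION, "ok": True, "data": data}
```

Under this shape, adding stats or logs later means adding one decorated function; the envelope, versioning, and error path stay untouched.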
Open questions
- Socket auth. /dev/shm/sandlock-$UID/ is mode-0700 per-user, so any same-user process can connect to any of that user's sandboxes. Same trust boundary as docker.sock for the docker group, but in a multi-sandbox-per-user setup it does let sandbox A's processes introspect sandbox B's policy if they can reach the host fs (Landlock normally prevents this, but worth naming the assumption).
- Where the control loop runs. If the supervisor's main loop is busy with seccomp-notify, a dedicated thread may be needed to avoid notification latency. Defer to whoever knows that code best.
- Callback placeholder shape. "<callback>" is a flat marker. Worth signaling more (function name, module) for debuggability? repr(fn) covers Python; less obvious for Rust closures.
- kill vs graceful stop. Cleanest separation: leave kill as SIGKILL via pid (no socket round-trip needed when the supervisor is hung), and add a future sandlock stop for graceful shutdown that goes through the shutdown verb.
Acceptance
- sandlock ps lists every live sandbox started by the same UID, regardless of whether it was launched via sandlock run, the Python SDK, or the FFI.
- A Python Sandbox(...) instance whose process is still running shows up in sandlock ps and answers sandlock config <name> with its effective policy.
- sandlock config <name> --toml produces a profile that sandlock run -p <that profile> re-runs identically.
- A sandbox using policy_fn to call ctx.deny_path("/etc") at runtime reflects the addition in sandlock config.
- Supervisor crash (SIGKILL) leaves a stale dir that the next sandlock ps prunes.
- The old /dev/shm/sandlock-$UID/network.json is gone; no code references it.