RFC: Per-sandbox introspection via control socket (sandlock ps / config) #68

## Goal

Make long-running sandboxes inspectable from outside. Every sandbox (CLI, Python SDK, embedded) exposes a per-process Unix control socket; a new `sandlock ps` lists live sandboxes and `sandlock config <name>` returns the effective policy. The socket is the substrate; later verbs (`stats`, `logs`, `port`, `diff`, `shutdown`) plug in without further protocol work.

## Motivation

Today `sandlock list` exists but only sees sandboxes that registered themselves in `/dev/shm/sandlock-$UID/network.json`, and registration only happens in the `port_remap` branch of `run_command` (`crates/sandlock-cli/src/main.rs:504`). Two consequences:

1. Sandboxes launched without `--port-remap`, and every sandbox created via the Python SDK (which never touches `network_registry`), are invisible to `sandlock list` / `sandlock kill`.
2. Even when a sandbox is listed, the registry only stores `pid`, `ports`, `allowed_hosts`, and the virtual `/etc/hosts` (`crates/sandlock-cli/src/network_registry.rs:13`). None of the policy passed to `Sandbox()` (Landlock rules, seccomp adjustments, http rules, limits, determinism flags) is recoverable. There is no `sandlock inspect` and no introspection API on a running sandbox.

On-disk persistence is not a complete answer on its own: the effective policy lives in supervisor memory, and a `policy_fn` calling `ctx.deny_path()` at runtime mutates it, so a snapshot file would be stale. The Docker precedent (`/var/lib/docker/containers/<id>/config.v2.json`) persists for daemon-restart recovery, which Sandlock's one-supervisor-per-sandbox model does not need. The only source of truth is the supervisor process itself, so the introspection path needs to talk to it.

## Current state

- `Command::List` and `Command::Kill` (`crates/sandlock-cli/src/main.rs:141`, `:168`): read from `network.json` only.
- `network_registry::register` (`crates/sandlock-cli/src/network_registry.rs:68`): single shared JSON file under `flock`, called from one CLI codepath.
- `sandlock-core` and `sandlock-ffi` have no equivalent. Python SDK sandboxes do not register.
- The TOML profile serializer in `crates/sandlock-core/src/profile.rs` already flattens the `Sandbox` dataclass to a profile shape; reusable for the JSON response body.

## Proposed design

### Per-sandbox runtime dir

Layout under `/dev/shm/sandlock-$UID/<name>/`:

- `pid`: single-line pid file; lets `ps` list and prune dead sandboxes without opening the socket.
- `control.sock`: Unix stream socket, supervisor-owned, bound before the child is forked.

No other files. `meta.json` is redundant (start time, argv, exe are all in `/proc/<pid>`). `policy.json` is dropped in favor of the socket so that dynamic-policy mutations are reflected.

The supervisor unlinks its dir on normal exit and on signal-handled exit paths. Readers (`ps`, `kill`, `config`) prune dirs whose pid is no longer alive (same liveness check `list` uses today, `crates/sandlock-cli/src/network_registry.rs:141`).
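Below is a minimal sketch of the reader side (`ps`, `kill`, `config`) in Rust. It assumes the liveness check is a test for a `/proc/<pid>` entry; the existing check at `network_registry.rs:141` may use something else, such as `kill(pid, 0)`. `runtime_root`, `parse_pid`, `is_alive`, and `list_and_prune` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Per-user runtime root, e.g. `/dev/shm/sandlock-1000`. The caller supplies
/// the UID (real code would presumably take it from `getuid()`).
fn runtime_root(uid: u32) -> PathBuf {
    PathBuf::from(format!("/dev/shm/sandlock-{uid}"))
}

/// Parse the single-line `pid` file, tolerating surrounding whitespace.
fn parse_pid(contents: &str) -> Option<u32> {
    contents.trim().parse().ok()
}

/// Assumed liveness check: the process still has an entry under `/proc`.
fn is_alive(pid: u32) -> bool {
    Path::new(&format!("/proc/{pid}")).exists()
}

/// List `(name, pid)` for live sandboxes under `root`, deleting the dir of any
/// sandbox whose recorded pid is dead (e.g. its supervisor was SIGKILLed).
fn list_and_prune(root: &Path) -> io::Result<Vec<(String, u32)>> {
    let mut live = Vec::new();
    let entries = match fs::read_dir(root) {
        Ok(entries) => entries,
        // No runtime root yet means no sandboxes, not an error.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(live),
        Err(e) => return Err(e),
    };
    for entry in entries {
        let entry = entry?;
        let dir = entry.path();
        if !dir.is_dir() {
            continue;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        match fs::read_to_string(dir.join("pid")).ok().and_then(|s| parse_pid(&s)) {
            Some(pid) if is_alive(pid) => live.push((name, pid)),
            // Stale dir: best-effort removal; another reader may race us to it.
            Some(_) => {
                let _ = fs::remove_dir_all(&dir);
            }
            // No readable pid yet: the supervisor may still be starting up.
            None => {}
        }
    }
    live.sort();
    Ok(live)
}
```

The sketch leaves two cases unsolved. A dir with no readable `pid` is skipped rather than pruned, because a supervisor may have created the dir but not yet written the file. A recycled pid also makes a dead sandbox look alive.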

The existing `/dev/shm/sandlock-$UID/network.json` is deleted; per the project's pre-1.0 no-backcompat stance, no shim.

### Wire protocol

4-byte big-endian length prefix, then UTF-8 JSON. One client at a time per socket; no concurrency to manage.

Request:

```json
{"v": 1, "verb": "config", "args": {}}
```

Response:

```json
{"v": 1, "ok": true, "data": { ...effective Sandbox policy... }}
```

or

```json
{"v": 1, "ok": false, "err": "..."}
```

The `v` field reserves room to rev the wire pre-1.0. `config` is the only verb defined in v1.
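The framing can be sketched with only the standard library. The 1 MiB `MAX_FRAME` cap is an assumption of this sketch, not something the RFC specifies; it stops a corrupt or hostile length prefix from triggering a huge allocation. `write_frame` and `read_frame` are hypothetical names.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on one frame's payload (not part of the RFC).
const MAX_FRAME: u32 = 1 << 20;

/// Write one frame: 4-byte big-endian length, then the UTF-8 JSON payload.
fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    let len = u32::try_from(json.len())
        .ok()
        .filter(|&n| n <= MAX_FRAME)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(json.as_bytes())?;
    w.flush()
}

/// Read one frame and return its payload, rejecting oversized or non-UTF-8 frames.
fn read_frame<R: Read>(r: &mut R) -> io::Result<String> {
    let mut prefix = [0u8; 4];
    r.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

The helpers are generic over `Read` and `Write`, so the same code can serve a `UnixStream` in both the supervisor and the CLI, and an in-memory buffer in tests.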

### Core changes (sandlock-core, ~180 LOC)

- Supervisor setup: `mkdir <name>/`, write `pid`, bind `control.sock`. Hook into the existing supervisor lifecycle where seccomp/Landlock setup already runs.
- Cleanup: unlink the dir on normal exit, on supervisor panic, and on the signal-handled paths.
- Control loop: serve `control.sock` from the supervisor's event loop or a dedicated thread (decision deferred to implementation; depends on whether the seccomp-notify loop can absorb a periodic accept without notification latency cost).
- `config` handler: reuse the profile serializer from `crates/sandlock-core/src/profile.rs` and emit JSON instead of TOML. Runtime kwargs (`policy_fn`, `init_fn`, `work_fn`) render as the literal string `"<callback>"`.
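The sketch below shows one way to implement the dedicated-thread option. It makes a few assumptions. The handler receives raw request bytes and returns raw response bytes, so JSON decoding is left to it (for example via serde_json). The 2-second read timeout and the 1 MiB cap are illustrative values. `spawn_control_thread` is a hypothetical name.

Binding happens on the calling thread, so the socket exists before the child is forked. `incoming()` handles each connection to completion before accepting the next one, which gives the one-client-at-a-time behavior described in the wire protocol.

```rust
use std::io::{self, Read, Write};
use std::os::unix::net::UnixListener;
use std::path::Path;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Assumed cap on a request payload (illustrative value).
const MAX_FRAME: usize = 1 << 20;

fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    r.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

/// Bind `sock_path` now, then serve it from a background thread.
fn spawn_control_thread<F>(sock_path: &Path, handler: F) -> io::Result<JoinHandle<()>>
where
    F: Fn(&[u8]) -> Vec<u8> + Send + 'static,
{
    // Bind on the caller's thread so the socket exists before the child is forked.
    let listener = UnixListener::bind(sock_path)?;
    Ok(thread::spawn(move || {
        for conn in listener.incoming() {
            let Ok(mut stream) = conn else { continue };
            // Bound how long a stalled client can hold the single serving thread.
            let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
            if let Ok(req) = read_frame(&mut stream) {
                let _ = write_frame(&mut stream, &handler(&req));
            }
            // `stream` drops here, closing the connection before the next accept.
        }
    }))
}
```

The read timeout matters because there is only one serving thread. Without it, a client that connects and never sends a full frame would block every later `sandlock config` call.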

### CLI changes (sandlock-cli, ~80 LOC)

- Rename `sandlock list` to `sandlock ps`. Columns: NAME, PID, UPTIME, CMD. UPTIME and CMD come from `/proc/<pid>/stat` and `/proc/<pid>/cmdline`.
- New `sandlock config <name>`: opens `<name>/control.sock`, sends `{"v":1,"verb":"config"}`, and prints the `data` field. Output is JSON by default (`--json`); `--toml` instead renders it as a profile via the existing TOML serializer, so it can be fed back to `sandlock run -p`.
- Rewire `sandlock kill <name>` to read pid from `<name>/pid`.
- Delete `crates/sandlock-cli/src/network_registry.rs` and the registration call at `crates/sandlock-cli/src/main.rs:522`.
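The `ps` columns need two small `/proc` parsers. The sketch below assumes the standard `proc(5)` layout:

- `cmdline` is argv joined by NUL bytes.
- `starttime` is field 22 of `stat`, measured in clock ticks since boot.
- Field 2 (`comm`) is parenthesized and may itself contain spaces or `)`, so parsing starts after the last `)`.

The tick rate is passed in. Real code would read it with `sysconf(_SC_CLK_TCK)` (commonly 100) rather than hard-coding it. The function names are hypothetical.

```rust
/// Render `/proc/<pid>/cmdline` (argv joined by NUL bytes) for the CMD column.
/// Empty segments, including the one after the trailing NUL, are dropped.
fn format_cmdline(raw: &[u8]) -> String {
    raw.split(|&b| b == 0)
        .filter(|arg| !arg.is_empty())
        .map(|arg| String::from_utf8_lossy(arg).into_owned())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Extract `starttime` (field 22 of `/proc/<pid>/stat`, in clock ticks since boot).
fn parse_starttime(stat: &str) -> Option<u64> {
    // Fields after the last ')' start at field 3 (`state`), so field 22 is index 19.
    let rest = &stat[stat.rfind(')')? + 1..];
    rest.split_whitespace().nth(22 - 3)?.parse().ok()
}

/// UPTIME column in seconds: system uptime (first value in `/proc/uptime`)
/// minus the process start time converted from ticks to seconds.
fn process_uptime_secs(system_uptime: f64, starttime_ticks: u64, clk_tck: u64) -> f64 {
    system_uptime - starttime_ticks as f64 / clk_tck as f64
}
```

For example, with the system up 3000 s, a `starttime` of 250000 ticks, and 100 ticks per second, the process started 2500 s after boot and UPTIME is 500 s.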

### Deferred (each is now an independent verb addition)

- `stats`: RSS / CPU% / threads / FDs from `/proc/<pid>`; branchfs delta and forks-used from supervisor counters. Backs `sandlock stats <name>`, streaming by default per the Docker analogue.
- `logs`: ring buffer of recent seccomp denials (syscall name + count only; no syscall arguments, since those are subject to TOCTOU) and MITM proxy decisions. Backs `sandlock logs <name>`.
- `port`: current port-remap table. Backs `sandlock port <name>`; replaces today's `network_registry::update_ports` callback wiring.
- `diff`: branchfs A/M/D changes, reusing the existing `Change` type that `dry_run` returns. Backs `sandlock diff <name>`.
- `shutdown`: graceful stop request; optional fast path for `sandlock kill`.

Each is "register a verb handler"; the protocol, dir, and lifecycle are already in place.
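Here is a sketch of what registering a verb handler might look like. It uses a table keyed by verb name that wraps each handler's result in the v1 `ok`/`err` envelope, so an unknown verb gets a well-formed error instead of a dropped connection. `Verbs`, `Handler`, and `escape_json` are hypothetical. The JSON is assembled by hand only to keep the example dependency-free; a real implementation would more likely use serde_json.

```rust
use std::collections::HashMap;

/// Hypothetical handler shape: takes the request's `args` as raw JSON text and
/// returns the `data` JSON on success or an error message on failure.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

struct Verbs {
    handlers: HashMap<&'static str, Handler>,
}

impl Verbs {
    fn new() -> Self {
        Verbs { handlers: HashMap::new() }
    }

    /// Adding a verb is one call; framing and the socket are untouched.
    fn register(
        &mut self,
        verb: &'static str,
        h: impl Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    ) {
        self.handlers.insert(verb, Box::new(h));
    }

    /// Run the handler for `verb` and wrap its result in the v1 envelope.
    fn dispatch(&self, verb: &str, args: &str) -> String {
        let result = match self.handlers.get(verb) {
            Some(h) => h(args),
            None => Err(format!("unknown verb: {verb}")),
        };
        match result {
            Ok(data) => format!(r#"{{"v": 1, "ok": true, "data": {data}}}"#),
            Err(msg) => format!(r#"{{"v": 1, "ok": false, "err": "{}"}}"#, escape_json(&msg)),
        }
    }
}

/// Minimal JSON string escaping for the error message.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

For example, suppose `config` is registered with a handler that returns `{"policy_fn": "<callback>"}`. Then `dispatch("config", "{}")` returns `{"v": 1, "ok": true, "data": {"policy_fn": "<callback>"}}`, and `dispatch("stats", "{}")` returns an `"ok": false` envelope carrying `unknown verb: stats`.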

## Open questions

1. **Socket auth.** `/dev/shm/sandlock-$UID/` is mode-0700 per-user, so any same-user process can connect to any of that user's sandboxes. Same trust boundary as `docker.sock` for the docker group, but in a multi-sandbox-per-user setup it does let sandbox A's processes introspect sandbox B's policy if they can reach the host fs (Landlock normally prevents this, but worth naming the assumption).
2. **Where the control loop runs.** If the supervisor's main loop is busy with seccomp-notify, a dedicated thread may be needed to avoid notification latency. Defer to whoever knows that code best.
3. **Callback placeholder shape.** `"<callback>"` is a flat marker. Worth signaling more (function name, module) for debuggability? `repr(fn)` covers Python; less obvious for Rust closures.
4. **`kill` vs graceful stop.** Cleanest separation: leave `kill` as SIGKILL via pid (no socket round-trip needed when the supervisor is hung), add a future `sandlock stop` for graceful that goes through the `shutdown` verb.

## Acceptance

- `sandlock ps` lists every live sandbox started by the same UID, regardless of whether it was launched via `sandlock run`, the Python SDK, or the FFI.
- A Python `Sandbox(...)` instance whose process is still running shows up in `sandlock ps` and answers `sandlock config <name>` with its effective policy.
- `sandlock config <name> --toml` produces a profile that `sandlock run -p <that profile>` re-runs identically.
- A sandbox using `policy_fn` to call `ctx.deny_path("/etc")` at runtime reflects the addition in `sandlock config`.
- Supervisor crash (SIGKILL) leaves a stale dir that the next `sandlock ps` prunes.
- The old `/dev/shm/sandlock-$UID/network.json` is gone; no code references it.
