feat(ffi): C ABI for the Handler trait (RFC #43)#44
Conversation
Add empty modules that the upcoming Handler ABI tasks will fill in. The placeholder integration test wires the module path so later tasks can extend it without churning the test file's discovery.
Stable repr(C) projection of SeccompNotif/SeccompData so external handler implementations can read the kernel notification without binding to the unstable internal struct layout. Also re-export SeccompNotif/SeccompData from sandlock_core's crate root. They are already part of the public Handler-trait contract through HandlerCtx.notif; re-exporting them makes the path stable without widening the `sys` module visibility.
Opaque child-memory accessor that wraps the existing TOCTOU-safe helpers in sandlock-core. The handle's lifetime is bounded by a single callback invocation; the C side must not retain it past the return.
Output-parameter approach for the NotifAction surface — the C handler writes its decision via setter calls instead of returning a tagged union by value. Makes future variant additions backwards-compatible without breaking C ABI.
Wraps a C handler function pointer, opaque user-data slot with an optional destructor, and per-handler exception policy (defaulting to Kill — fail-closed).
Translates a sandlock_handler_t into a Handler impl by snapshotting the notification, allocating a mem handle and action out struct, then offloading the synchronous C callback to spawn_blocking. Failures (rc, unset action, or panic across FFI) trigger the configured exception policy.
New C entry points that pair the existing policy/argv surface with the handler ABI. Each handler registration's ownership transfers into the run; the supervisor frees them when the run completes.
Canonical example for downstream bindings — compiles handler_smoke.c against sandlock.h, links the cdylib, runs it from a cargo test.
Adds a section to extension-handlers.md describing the new C ABI, ownership rules, callback contract, and a pointer to the canonical smoke test.
Adds tests for previously uncovered paths in the handler ABI: all NotifAction translation variants, exception policy fallbacks, panic recovery via catch_unwind, Unset action handling, handler construction edge cases, run-path failure modes, and a live-fd mem_read_cstr smoke against an intercepted openat.
The exception-policy doc-comment and extension-handlers.md both advertise that a panicking handler triggers the configured exception policy. With `extern "C" fn` that promise was a lie — per Rust ABI rules panics across `extern "C"` abort the process before `catch_unwind` ever fires. Switch the two C-callable function-pointer type aliases to `extern "C-unwind"`. C callers see no change (C cannot panic); Rust handlers plugged into the C ABI now actually exercise the panic-recovery path. Un-ignore the regression test.
The previous round-trip test relied on leak-sanitizer to flag a missed ud_drop call; default cargo test does not enable LSan, so the assertion was a no-op. Replace with an AtomicUsize counter incremented inside the dropper and asserted post-free.
- Strip internal task/PR references from production code comments. - Drop sandlock_action_set_inject_fd_send_tracked setter; reserve enum discriminant 5 for the future tracker-aware variant. - Add `name` parameter to sandlock_run_with_handlers and sandlock_run_interactive_with_handlers for parity with the existing sandlock_run entry point. - Tighten Sync safety comment on sandlock_handler_t to reflect the serial dispatch contract from sandlock-core. - Document that InjectFdSend ownership transfer is conditional on the action surviving until supervisor dispatch. - Reject argc == 0 in sandlock_run_with_handlers* before reaching the sandbox. - Remove SANDLOCK_HANDLER_MODULE_BUILT scaffolding and its placeholder integration test.
The previous implementation substituted the child's pid for pgid==0 in sandlock_action_set_kill — that targets the single process, not its process group, contradicting the documented "Kill the child process group" semantics on NotifAction::Kill. Query getpgid(notif.pid) inside FfiHandler::handle to recover the real pgid. Fall back to the pid only on ESRCH (child already exited). Update the C-header doc to reflect the actual resolution mechanism.
Four findings from adversarial review:
A1: A C handler that calls sandlock_action_set_inject_fd_send and
then panics/returns rc != 0 left srcfd open in the supervisor
forever. Drain pending inject-fd payloads on every fallback
path.
A2: A C handler can write the InjectFdSendTracked discriminant
directly without a setter (the value is public in the C header).
Drain srcfd in that translate_action branch.
A3: run_with_handlers_inner's early-return paths (null policy,
invalid argv, invalid name) abandoned registered handler
pointers. Restructure to always consume the registration array
regardless of return value; update the C header contract.
A5: sandlock_handler_free was extern "C" but its ud_drop docstring
claimed unwinding panics. Switch to extern "C-unwind" to match
the documented behavior.
B1: handler_new_rejects_invalid_exception_policy probed only one
out-of-range value (99u32). A mutation that rejects only that
single value would have slipped through. Iterate over the
boundary (3u32, one past Continue=2), a mid-range value, and
u32::MAX.
B2: action_out_layout_is_stable verified only size and alignment.
A field reorder that preserves the total size — e.g. swapping
kind with the implicit padding — would have passed. Pin down
each field's byte offset via addr_of! through MaybeUninit. Add
the same offset-based test for sandlock_notif_data_t alongside
the existing size assertion.
In nested PID namespaces (Kubernetes pod-in-pod, KubeVirt, DinD)
the kernel can deliver a seccomp notification with notif.pid == 0
when the trapped task is invisible from the supervisor's PID
namespace. `getpgid(0)` then returns the supervisor's own pgid;
substituting that into a Kill { pgid } action sent via killpg
kills the supervisor — a sandbox-escape vector reachable by any
malicious child.
Add three defensive guards before substitution:
- reject notif.pid <= 0,
- reject getpgid() failure (ESRCH and friends),
- reject resolved pgid that matches the supervisor's own.
In all three cases, fall back to the bare pid; the kernel will
reject the malformed killpg with ESRCH, but the supervisor lives.
Regression tests cover each guard with destructive verification:
removing the corresponding arm produces test failures asserting
the unfixed action would target the supervisor.
|
Thanks for the PR. 3 high-level comments:
|
Maintainer review caught a gap in the always-consume contract: when the registration array contains a null handler, collect_registrations returns None and run_with_handlers_inner returned null without calling release_registrations. The non-null entries leaked — their ud_drop never fired — violating the public C-ABI contract documented in sandlock.h. Add release_registrations on that branch. The accompanying test was internally consistent with the old contract (it manually called sandlock_handler_free on the registered handler to verify the dropper fires), which masked the bug. Update the test to assert the supervisor performs the release without any manual free — matching the public contract and serving as a regression hook for both the implementation and the test pattern. Resolves comment 1 from PR multikernel#44 review.
Maintainer review flagged that the Sync safety comment claimed "supervisor serializes per handler" while the public C-ABI docs separately required `ud` to be thread-safe — a self-contradiction. Adopt the conservative position: the supervisor MAY dispatch handler callbacks concurrently across different notifications (the current loop is largely serial but the public ABI makes no concurrency guarantee). The C caller is responsible for ensuring their opaque `ud` is thread-safe. This survives a future dispatcher that parallelises without an ABI break. Resolves comment 2 from PR multikernel#44 review.
Maintainer review noted that the monolithic handler.rs (~940 lines) mixed ABI types, the FfiHandler adapter, helpers, and run entry points, blurring trust boundaries. Split into: * handler/abi.rs — public ABI types, setters, accessors * handler/adapter.rs — FfiHandler + translate_action + drain helpers * handler/run.rs — sandlock_run_with_handlers entry points handler/mod.rs re-exports the full public surface so external consumers and integration tests continue to import from `sandlock_ffi::handler::*` unchanged. Resolves comment 3 from PR multikernel#44 review.
The K1 fix (5deb8de) substituted `pid` for `child_pgid` when `notif.pid <= 0`. The substituted `pid` then flowed into the Kill action as `pgid = 0`, which `killpg(0, sig)` interprets per POSIX as "signal the caller's process group" — i.e., the supervisor. Same suicide vector K1 was supposed to close, just one syscall later. Introduce an UNSAFE_PGID sentinel (i32::MIN). When pid resolution cannot produce a safe value (pid<=0, getpgid failure, or resolved pgid matches the supervisor's), set child_pgid = UNSAFE_PGID. Both translate_action's Kill arm and exception_action's Kill arm check the sentinel and refuse to produce a Kill action — translate_action returns None (routing to exception policy), exception_action's Kill arm degrades to Errno(EPERM). The k1 regression test was a "for show" test that asserted `Kill { pgid: 0 }` as the expected outcome, locking in the suicide-vector behavior. Update it to assert exception-policy fallback. Add a second test that registers a SIGURG handler in the test process and verifies it is not delivered after a lethal-pgid kill action runs through dispatch. k2 and k3 tests also updated: their previous "fall back to bare pid" assertions baked in a behaviour the new resolution rejects via UNSAFE_PGID. They now assert the exception-policy fallback. Four other tests (a1/a2 fd-drain hooks, kill-policy-on-rc-nonzero, recovers-from-panic) used `fake_ctx()` whose pid is the test process's own — getpgid then matches supervisor_pgid and the Kill exception policy correctly degrades to EPERM. The two non-pipe tests switch to an isolated-child helper so the Kill assertion stays observable; the two pipe-bearing tests assert EPERM (their load-bearing check is the drain, not the action kind).
A5 (d1807c3) made sandlock_handler_free use extern "C-unwind" so a panicking user-supplied ud_drop unwinds cleanly. But the much-more- trafficked sandlock_run_with_handlers / _interactive entry points were still extern "C", and release_registrations dropped multiple boxes in a loop without catch_unwind. A mid-loop ud_drop panic: (1) aborted the process at the extern "C" boundary, and (2) skipped cleanup of the remaining handlers in the array, which violated the "array consumed as a whole" contract. Switch the run entry points to extern "C-unwind". Wrap each per-element drop in release_registrations with catch_unwind, then re-raise the first captured panic after the loop completes so the outer caller's panic-recovery flow still observes one unwinding panic — but only after all handlers have been freed. Add release_registrations_continues_after_mid_loop_panic as a regression hook.
The C header documents sandlock_handler_ud_drop_t as "invoked exactly once when the container is freed" — but the Rust Drop impl gated the call on a non-null ud check, contradicting the contract. A C caller using ud_drop for lifecycle logging (or any side effect not keyed on ud) silently lost the cleanup whenever the container had a null ud. The previous regression test (handler_new_with_null_ud_and_dropper_does_not_invoke_dropper) used a panicking dropper and asserted the dropper did NOT fire — locking in the buggy behavior as expected output. Rename and rewrite to assert the dropper does fire exactly once. Idiomatic C: free(NULL) is well-defined no-op; user droppers can mirror that on their own side if they care.
max_len=1 path, argc/nreg bounds, errno field rename
Five minor findings from the deep bug-pattern audit:
- B-new-2: end-to-end tests asserted `stdout.contains("777")` /
`contains("111")` / `contains("222")` — a real child pid containing
those substrings (~1.1% per run) would mask a regression. Switch to
exact match.
- B-new-4: action setter tests verified the kind tag and the written
payload field but not that tag-only setters (set_continue, set_hold)
leave the union payload untouched. Plant a sentinel in payload.none
and assert it survives.
- A-new-3: sandlock_mem_read_cstr rejected max_len==1 even though the
header doc says max_len >= 1 is sufficient to fit a NUL. Add a
fast-path that probes the target for readability and writes a NUL
on success.
- D-new-2: argc and nregistrations had no upper bound; a malicious or
buggy C caller passing argc = u32::MAX with a small argv backing
was unbounded-deref UB. Cap both at 4096 and add regression tests.
- E-new-1: the Rust union field `errno` had the C-side counterpart
named `errno_value` (to avoid the <errno.h> macro collision). The
asymmetry was a documentation hazard. Rename the Rust field to
`errno_value` to match.
|
Thanks for the careful review — your 3 comments are addressed across 3 commits, and a follow-up audit caught 8 more bugs in the same patterns. CI is green (8/8). Your 3 comments1. Null-handler ownership leak → You were right: 2. Sync contract self-contradiction → Adopted your preference: the supervisor may dispatch concurrently across notifications, so the C caller is responsible for ensuring 3. handler.rs too large → Split into Proactive audit caught 8 moreYour review prompted me to run an explicit bug-pattern audit on the rest of the branch, looking specifically for repeats of the patterns we'd already hit (doc/impl mismatches, "for show" tests, ownership gaps in error paths, defense omissions). 11 findings total — 8 fixed in this push: 2 critical
3 important
(plus the test-pattern fix for the K1 supervisor-suicide test, bundled with 5 minor
Out of scope / deferredSame as the original PR body, two items unchanged:
Status
Happy to squash the audit-driven commits into the original feature commits before merge if you'd find that cleaner. |
Summary
Adds a C ABI surface for the
Handlertrait introduced in #36 so non-Rust callers (Python via ctypes, Go via cgo, future bindings) can register custom seccomp-notif handlers without writing Rust.Implements the maintainer-approved options from RFC #43:
tokio::task::spawn_blockingHandlerCtxsurface)repr(C)snapshot (sandlock_notif_data_t) for the cheap copy-safe fields; opaque handle (sandlock_mem_handle_t*) for child-memory accessNotifActionsurface)sandlock_action_out_tfilled via setter calls — no tagged union returned by valuesandlock_handler_t*allocated viasandlock_handler_new, freed viasandlock_handler_free(or the supervisor on registration)sandlock_exception_policy_tdefaulting toKill(fail-closed); applied on rc != 0,Unsetaction, or callback panicPublic surface added
Rust modules (new files under
crates/sandlock-ffi/src/):handler.rs— opaque types, setters, accessor functions,FfiHandleradapter, run entry pointsnotif_repr.rs—sandlock_notif_data_tstable layoutC header (
crates/sandlock-ffi/include/sandlock.h):sandlock_notif_data_t,sandlock_mem_handle_t,sandlock_action_out_t(+ tag enum + payload union + nested structs),sandlock_handler_t,sandlock_handler_registration_tsandlock_handler_fn_t,sandlock_handler_ud_drop_t(bothextern "C-unwind"for Rust handlers exposed through the C ABI)sandlock_mem_read_cstr/_read/_write, sevensandlock_action_set_*setters,sandlock_handler_new/_free,sandlock_run_with_handlers/_interactive_with_handlerssandlock-core touch — a single
pub useline atcrates/sandlock-core/src/lib.rs:31:Required because the Handler trait contract already binds
SeccompNotifthroughHandlerCtx.notif; the types must be name-reachable for any external implementer. Narrower than makingmod syspublic.Test plan
cargo build -p sandlock-fficlean.cargo clippy -p sandlock-ffi -- -A clippy::allclean.catch_unwind(verified destructively),Unsetfallback, handler allocation/free withud_dropcounter assertion,run_with_handlersfailure paths (null policy / argv / registrations / handler-in-array, with transactional cleanup), multiple-handler dispatch, real-fdmem_read_cstrthrough an interceptedopenat,c-unwindABI correctness, layout assertions viaoffset_of-style probing, k8s defense-in-depth forchild_pgidresolution.Drop(now uses anAtomicUsizecounter) and a self-referencing assertion onpgidsubstitution (now spawns a real child sopid != pgid).Notable design notes
extern "C-unwind"for handler function pointers. The exception-policy doc-comment advertises panic recovery via the configured policy. With plainextern "C" fnthis was a lie — Rust 1.71+ ABI rules abort on panic acrossextern "C"boundaries. Usingextern "C-unwind"makes the documented behavior real for Rust handlers plugged through the C ABI; pure-C callers cannot panic (C has no unwinding) so the choice is invisible to them. Stable since 1.71 (RFC 2945).InjectFdSendTrackeddiscriminant reserved (no setter shipped). The tracker-callback path requires a registry of pending injections that is out of scope for this PR. The enum discriminant (SANDLOCK_ACTION_INJECT_FD_SEND_TRACKED = 5), payload variant, and inner struct types are preserved for ABI stability; the setter is deliberately omitted.translate_actiontreats the discriminant as Unset and drains any pending srcfd before falling back to the exception policy. The setter will return in a follow-up PR that lands the tracker registry.Run-entry-point ownership contract.
sandlock_run_with_handlersalways consumes the handler pointers in its registration array, regardless of return value. Early-return paths (null policy, invalid argv, invalid name, null handler in array) free the handlers viarelease_registrations, so the C caller never has to worry about a partial-consume window. Documented insandlock.h.child_pgidresolution forKill { pgid: 0 }substitution. Three defense-in-depth guards inFfiHandler::handle:notif.pid <= 0— guards againstgetpgid(0)returning the supervisor's own pgid in nested PID namespaces (Kubernetes pod-in-pod, KubeVirt, DinD). Akillpg(supervisor_pgid)would be a sandbox-escape vector reachable by any malicious child.getpgid()ESRCH — child exited between notification and query.In all three cases, the substitution falls back to the bare pid; the kernel rejects
killpg(pid)ifpiddoes not name a valid process group, but the supervisor survives.Out of scope (deferred follow-ups)
1.
set_inject_fd_sendsrcfd validation. A C handler can today passsrcfd = 2(stderr) or any other supervisor-owned fd;OwnedFd::from_raw_fdconsumes it andclose(2)corrupts supervisor output. A simplesrcfd >= 3 && srcfd != notif_fdcheck would close the most obvious foot-gun. Deferred because the right policy (deny-list vs allow-list, runtime check vs build-time) deserves your input.2. Unchecked
setpgid(0, 0)calls in sandlock-core. While auditing pgid resolution for the FFI defense-in-depth, I noticed two child-path call sites that don't check the syscall return:crates/sandlock-core/src/sandbox.rs:1201(main spawn path, right afterfork())crates/sandlock-core/src/fork.rs:80(COW clone child)context.rs:766already checks and exits on failure. The unchecked sites are not bugs the present PR is trying to fix — sandlock correctly callssetpgidin all three paths — but if either failed (rare:EPERMif the child is already a session leader, or namespace-policy denials), the child would silently inherit the supervisor's pgid. The FFI's third guard catches this, but the underlying core would benefit from matching error handling. Happy to open a small follow-up PR adding error checks if you'd find it useful.3.
nameparameter via builder. The new entry points acceptname: *const c_charfor parity withsandlock_run. Internally the value is applied viaSandbox::with_nameon a clone (same pattern assandlock_runin lib.rs). The core APISandbox::run_with_extra_handlersdoes not currently accept anameparameter directly; happy to thread it through if you prefer that shape, but the builder-on-clone path is identical to the existingsandlock_run.4. Python bindings (PR 2). This PR ships the C ABI only. The Python wrapper module (
sandlock.handler,sandlock.NotifAction) on ctypes is the next PR per the RFC's 3-phase plan.5. Ergonomic helpers (PR 3). Path-deny presets,
policy_fn-style glue, higher-level Python idioms — third PR.Test environment
extern "C-unwind"is stable since 1.71;extern "C"ABI tests cover the unchanged convention)/usr/bin/python3(two end-to-end tests spawn it; CI matrix without Python would need either the binary installed or#[ignore]on those two tests).RFC reference
Implements maintainer-approved options A1 / C2 / C3 / B4 / D5 (Kill default) from #43. Each commit explains the why of its scope; the audit trail across the 17 commits also shows where additional review passes caught real bugs (panic-recovery ABI, fd leaks on error paths, pgid resolution). Happy to squash on request.