shadow-fd: cache sealed memfds across repeated opens #67
Open
brian049 wants to merge 1 commit into
kbox_shadow_create() rebuilt a fresh memfd on every read-only open by copying the entire file contents through LKL via a pread64 loop. For hot files reopened on every fork+exec, such as the dynamic linker, libc, and other shared objects, this redundant copy dominated the open path.

This commit adds an inode-keyed cache that reuses sealed memfds across opens. The cache holds up to 64 entries in an LRU list, and each lookup is a linear scan of a small array. The key is (dev, ino, mtime_sec, mtime_nsec, size). Cached fds are always sealed (F_SEAL_WRITE | F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL) before insertion, and only the read-only path enters the cache.

The new entry point kbox_shadow_create_cached() replaces kbox_shadow_create() at every read-only, seal-bound call site. The two main-binary exec_memfd paths in dispatch-exec.c and image.c keep the original kbox_shadow_create() because they patch PT_INTERP via pwrite() and must own a private, writable memfd.

Callers always receive a dup() of the cached entry, never the cached fd itself, so closing the returned fd never affects the cache.

Change-Id: I31c148d2513983562628feaccdf0abfba6e70ed5
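The seal-then-dup() invariant described in the commit message can be illustrated with a minimal, Linux-only sketch. This is not the kbox implementation: the helper names make_sealed_memfd() and handout_fd() and the memfd name "shadow-sketch" are hypothetical, and the source of the bytes (a plain buffer rather than a pread64 loop through LKL) is a simplifying assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes of `buf` into a new memfd and seal
 * it against writes, growth, shrinking, and further seal changes.
 * Returns the sealed fd, or -1 on error. */
int make_sealed_memfd(const void *buf, size_t len)
{
    assert(buf != NULL);

    int fd = memfd_create("shadow-sketch", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* A short write is treated as failure to keep the sketch small. */
    if (write(fd, buf, len) != (ssize_t) len ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_WRITE | F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: give the caller a duplicate, so that the caller's
 * close() releases only the duplicate and the cached fd stays open. */
int handout_fd(int cached_fd)
{
    return dup(cached_fd);
}
```

Because the duplicate refers to the same sealed file, a caller can read or map it but cannot modify what later cache hits will see.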
This PR resolves issue #37 by introducing an inode-keyed cache for
sealed shadow memfds, eliminating the redundant pread64 copy on every
read-only open of hot files.
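A minimal sketch of how an inode-keyed lookup with stale-entry eviction and LRU replacement might look, assuming a fixed array scanned linearly. All sketch_* names and the struct layout are hypothetical; only the key tuple and the 64-entry limit come from this PR. File descriptors are treated as opaque integers here, so closing a dropped fd is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_MAX 64 /* mirrors KBOX_SHADOW_CACHE_MAX (64 entries) */

struct sketch_key { /* hypothetical layout of the PR's key tuple */
    uint64_t dev, ino;
    int64_t mtime_sec, mtime_nsec, size;
};

struct sketch_entry {
    struct sketch_key key;
    int fd;             /* sealed memfd owned by the cache; -1 = free slot */
    uint64_t last_used; /* LRU stamp, larger = more recent */
};

struct sketch_cache {
    struct sketch_entry slot[SKETCH_CACHE_MAX];
    uint64_t tick;
};

void sketch_cache_init(struct sketch_cache *c)
{
    c->tick = 0;
    for (size_t i = 0; i < SKETCH_CACHE_MAX; i++)
        c->slot[i].fd = -1;
}

/* Returns the cached fd on a hit, -1 on a miss. An entry for the same
 * (dev, ino) whose mtime or size differs is stale and is dropped;
 * real code would also close() its fd at that point. */
int sketch_cache_lookup(struct sketch_cache *c, const struct sketch_key *k)
{
    for (size_t i = 0; i < SKETCH_CACHE_MAX; i++) {
        struct sketch_entry *e = &c->slot[i];
        if (e->fd < 0 || e->key.dev != k->dev || e->key.ino != k->ino)
            continue;
        if (e->key.mtime_sec == k->mtime_sec &&
            e->key.mtime_nsec == k->mtime_nsec && e->key.size == k->size) {
            e->last_used = ++c->tick;
            return e->fd;
        }
        e->fd = -1; /* stale entry */
        return -1;
    }
    return -1;
}

/* Stores fd under k, using a free slot if one exists and otherwise
 * replacing the least recently used entry. Returns the evicted fd for the
 * caller to close, or -1 if nothing was evicted. */
int sketch_cache_insert(struct sketch_cache *c, const struct sketch_key *k,
                        int fd)
{
    size_t victim = 0;
    for (size_t i = 0; i < SKETCH_CACHE_MAX; i++) {
        if (c->slot[i].fd < 0) {
            victim = i;
            break;
        }
        if (c->slot[i].last_used < c->slot[victim].last_used)
            victim = i;
    }
    int evicted = c->slot[victim].fd;
    c->slot[victim].key = *k;
    c->slot[victim].fd = fd;
    c->slot[victim].last_used = ++c->tick;
    return evicted;
}
```

On a hit, a caller following the PR's design would hand out a dup() of the returned fd rather than the fd itself.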
Summary
- kbox_shadow_create_cached(): returns dup() of an existing sealed memfd on cache hit, falls back to kbox_shadow_create() + seal + insert on miss.
- LRU cache holding up to KBOX_SHADOW_CACHE_MAX (64 entries).
- Cache key: (dev, ino, mtime_sec, mtime_nsec, size). Any mismatch evicts the stale entry.
- Replaces kbox_shadow_create() at every read-only, seal-bound call site in seccomp-dispatch.c, dispatch-exec.c, image.c.
- The two main-binary exec_memfd paths in dispatch-exec.c and image.c keep the original kbox_shadow_create() because they patch PT_INTERP via pwrite() and require a private writable memfd.
Verification
Syscall count reduction
Workload:
ls; ls; ls; cat /etc/os-release; cat /etc/os-release inside an Alpine rootfs, traced via strace -c -e trace=memfd_create,pread64.
- memfd_create drops as the ld-musl interpreter is dup()'d on every exec after the first.
- pread64 falls less because most pread64 calls in this workload come from cat reading file contents, not from shadow construction.
Latency (locally measured against the Alpine rootfs)
Targeting /lib/libc.musl-x86_64.so.1 (662 KB) with a loop that times the mmap() syscall, with open and close outside the timer window, for 200 iterations:
The baseline-subtracted breakdown separates what strace cannot
distinguish:
- mmap cost, present in every path.
- Cache hit (fstat + hash lookup + dup + addfd injection into the tracee).
- No cache (the full pread64-copy-and-seal loop for a 662 KB file).
The end-to-end 47× speedup reflects the complete open+mmap round-trip as a caller would observe it; the issue projected "halving" (2×). The baseline-subtracted mechanism speedup of ~86× isolates the shadow-creation cost from the kernel mmap overhead on both sides.
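One way to read the two speedup figures is as plain ratios, with the mechanism speedup computed after removing the shared mmap baseline from both latencies. The sketch below assumes that reading; the function names are hypothetical and no measured values from this PR are reproduced.

```c
#include <assert.h>

/* End-to-end speedup: ratio of the latencies a caller would observe
 * without and with the cache. */
double e2e_speedup(double cold_us, double warm_us)
{
    return cold_us / warm_us;
}

/* Mechanism speedup: subtract the baseline (bare mmap cost, present in
 * every path) from both sides before taking the ratio. Assumes both
 * latencies exceed the baseline. */
double mechanism_speedup(double cold_us, double warm_us, double base_us)
{
    assert(warm_us > base_us && cold_us > base_us);
    return (cold_us - base_us) / (warm_us - base_us);
}
```

With made-up inputs of 110 μs cold, 20 μs warm, and a 10 μs baseline, the end-to-end ratio is 5.5 while the mechanism ratio is 10, which shows why the baseline-subtracted figure is always the larger of the two.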
The warm path's tight distribution (p50 = 13.1 μs, only one tail
sample at 86 μs over 200 iterations) shows the hit path is stable
with no systematic long tail.
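For readers who want to reproduce this kind of measurement, the following is a minimal sketch of a timer that brackets only the mmap() call, plus a nearest-rank percentile over the collected samples. It is not the benchmark used for the numbers above: the names now_ns(), time_mmap_ns(), and percentile_ns() are hypothetical, and it maps a file directly rather than going through the seccomp-dispatched open path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}

/* Times a single mmap() of `path`. open, fstat, munmap, and close all stay
 * outside the timed window. Returns elapsed nanoseconds, or 0 on error. */
uint64_t time_mmap_ns(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return 0;
    }
    uint64_t t0 = now_ns();
    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    uint64_t t1 = now_ns();
    if (p != MAP_FAILED)
        munmap(p, (size_t) st.st_size);
    close(fd);
    return p == MAP_FAILED ? 0 : t1 - t0;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *) a, y = *(const uint64_t *) b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile (pct in 0..100). Sorts `samples` in place. */
uint64_t percentile_ns(uint64_t *samples, size_t n, unsigned pct)
{
    assert(n > 0 && pct <= 100);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t rank = (pct * n + 99) / 100; /* ceil(pct * n / 100) */
    if (rank == 0)
        rank = 1;
    return samples[rank - 1];
}
```

Calling time_mmap_ns() 200 times into an array and then taking percentile_ns() at 50 and 100 gives the median and the worst sample, the two figures quoted for the warm path.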
All existing unit tests pass (make check-unit: 280/280).
Follow-ups (not in this PR)
I have local commits ready for:
- tests/unit/test-shadow-cache.c with six unit tests covering hit, mtime invalidation, LRU eviction, sealed-fd invariant, close-does-not-evict, and reset semantics.
- A benchmark (shadow-bench-test) that produced the latency numbers above.
Closes #37
Summary by cubic
Adds an inode-keyed cache for sealed read-only shadow memfds to reuse them across repeated opens and cut redundant copy work on hot files. Read-only call sites now use the cached path; writable exec_memfd paths stay on the original creator.
- kbox_shadow_create_cached() returns a dup() of a sealed memfd on hit; falls back to create + seal + insert on miss.
- LRU cache capped at KBOX_SHADOW_CACHE_MAX (64), keyed by (dev, ino, mtime_sec, mtime_nsec, size) with stale-entry eviction.
- Adds kbox_shadow_cache_reset() and basic counters (kbox_shadow_cache_size/hits/misses).
- Read-only call sites updated in seccomp-dispatch.c, dispatch-exec.c, and image.c; explicit kbox_shadow_seal() calls removed on those paths.
Written for commit 00000ef. Summary will update on new commits.