
shadow-fd: cache sealed memfds across repeated opens #67

Open

brian049 wants to merge 1 commit into sysprog21:main from brian049:pr/issue37

shadow-fd: cache sealed memfds across repeated opens#67
brian049 wants to merge 1 commit into
sysprog21:mainfrom
brian049:pr/issue37

Conversation


@brian049 brian049 commented May 11, 2026

This PR resolves issue #37 by introducing an inode-keyed cache for
sealed shadow memfds, eliminating the redundant pread64 copy on every
read-only open of hot files.

Summary

  • Adds kbox_shadow_create_cached(): returns dup() of an existing
    sealed memfd on cache hit, falls back to kbox_shadow_create() +
    seal + insert on miss.
  • LRU eviction with capacity KBOX_SHADOW_CACHE_MAX (64 entries).
  • Cache key: (dev, ino, mtime_sec, mtime_nsec, size). Any mismatch
    evicts the stale entry.
  • Replaces kbox_shadow_create() at every read-only, seal-bound call
    site in seccomp-dispatch.c, dispatch-exec.c, image.c.
  • The two main-binary exec_memfd paths in dispatch-exec.c and
    image.c keep the original kbox_shadow_create() because they
    patch PT_INTERP via pwrite() and require a private writable memfd.
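The cache key and staleness check described in the bullets above can be sketched as follows. The struct layout and helper names here are hypothetical illustrations of the (dev, ino, mtime_sec, mtime_nsec, size) key, not the PR's actual code:

```c
#define _GNU_SOURCE
#include <sys/stat.h>

/* Hypothetical key layout: identity (dev, ino) plus staleness
 * guards (mtime, size), mirroring the key described in the PR. */
struct shadow_key {
    dev_t dev;
    ino_t ino;
    long mtime_sec;
    long mtime_nsec;
    off_t size;
};

static struct shadow_key key_from_stat(const struct stat *st)
{
    struct shadow_key k = {
        .dev = st->st_dev,
        .ino = st->st_ino,
        .mtime_sec = st->st_mtim.tv_sec,
        .mtime_nsec = st->st_mtim.tv_nsec,
        .size = st->st_size,
    };
    return k;
}

/* Any field mismatch means the cached memfd is stale and must be
 * evicted, exactly as the bullet above describes. */
static int key_equal(const struct shadow_key *a, const struct shadow_key *b)
{
    return a->dev == b->dev && a->ino == b->ino &&
           a->mtime_sec == b->mtime_sec &&
           a->mtime_nsec == b->mtime_nsec &&
           a->size == b->size;
}
```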

Verification

Syscall count reduction

Workload: ls; ls; ls; cat /etc/os-release; cat /etc/os-release
inside an Alpine rootfs, traced via
strace -c -e trace=memfd_create,pread64.

               before  after
memfd_create       12      7
pread64            34     33

memfd_create drops because the ld-musl interpreter is dup()'d on
every exec after the first. pread64 barely moves because most
pread64 calls in this workload come from cat reading file contents,
not from shadow construction.

Latency (locally measured against the Alpine rootfs)

Targeting /lib/libc.musl-x86_64.so.1 (662 KB) with a loop that times
only the mmap() syscall, keeping open and close outside the timer
window, for 200 iterations:

  anon baseline (kernel mmap only):       6.3 us avg
  cold  (shadow create + mmap):         639.9 us
  warm  (cache hit + mmap):              13.6 us avg
                                          p50=13.1  min=10.0  max=86.1

  Derived (baseline-subtracted):
  shadow create overhead:               626.3 us  (cold - warm avg)
  cache hit + addfd overhead:             7.3 us  (warm avg - anon avg)

  Speedup:
  end-to-end:                            47.0x  (cold / warm avg)
  mechanism:                             85.8x  (shadow_create / cache_hit,
                                                 baseline-subtracted)

The baseline-subtracted breakdown separates what strace cannot
distinguish:

  1. anon baseline (6.3 μs): irreducible kernel mmap cost,
    present in every path.
  2. warm − anon = 7.3 μs: cache machinery on a hit (fstat + hash
    lookup + dup + addfd injection into the tracee).
  3. cold − warm = 626.3 μs: the cost saved per open by the
    cache (the full pread64-copy-and-seal loop for a 662 KB file).

The end-to-end 47× speedup reflects the complete open+mmap
round-trip as a caller would observe it, well beyond the 2×
("halving") projected in the issue. The baseline-subtracted
mechanism speedup of ~86× isolates the shadow-creation cost from
the kernel mmap overhead on both sides.

The warm path's tight distribution (p50 = 13.1 μs, only one tail
sample at 86 μs over 200 iterations) shows the hit path is stable
with no systematic long tail.

All existing unit tests pass (make check-unit: 280/280).

Follow-ups (not in this PR)

I have local commits ready for:

  • tests/unit/test-shadow-cache.c with six unit tests covering hit,
    mtime invalidation, LRU eviction, sealed-fd invariant,
    close-does-not-evict, and reset semantics.
  • A guest-side benchmark binary (shadow-bench-test) that produced
    the latency numbers above.

Closes #37


Summary by cubic

Adds an inode-keyed cache for sealed read-only shadow memfds to reuse them across repeated opens and cut redundant copy work on hot files. Read-only call sites now use the cached path; writable exec_memfd paths stay on the original creator.

  • New Features
    • New kbox_shadow_create_cached() returns a dup() of a sealed memfd on hit; falls back to create + seal + insert on miss.
    • LRU cache capped at KBOX_SHADOW_CACHE_MAX (64), keyed by (dev, ino, mtime_sec, mtime_nsec, size) with stale-entry eviction.
    • Exposes kbox_shadow_cache_reset() and basic counters (kbox_shadow_cache_size/hits/misses).
    • Replaces read-only, seal-bound call sites in seccomp-dispatch.c, dispatch-exec.c, and image.c; explicit kbox_shadow_seal() calls removed on those paths.

Written for commit 00000ef.

kbox_shadow_create() rebuilt a fresh memfd on every read-only open
by copying the entire file contents through LKL via a pread64 loop.
For hot files opened repeatedly across exec (the dynamic linker,
libc, and other shared objects re-opened by every fork+exec), this
redundant copy dominated the open path.

This commit adds an inode-keyed cache that reuses sealed memfds
across opens. The cache holds up to 64 entries in an LRU list,
and each lookup is a linear scan of a small array. The key is
(dev, ino, mtime_sec, mtime_nsec, size).
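A linear-scan LRU over a small fixed array, as described here, can be sketched like this. The entry layout, names, and the simplified single-field key are illustrative, not the actual implementation:

```c
#include <unistd.h>

#define SHADOW_CACHE_MAX 64 /* mirrors KBOX_SHADOW_CACHE_MAX */

struct cache_entry {
    int used;           /* slot occupied */
    unsigned long ino;  /* stand-in for the full (dev, ino, mtime, size) key */
    int fd;             /* sealed memfd owned by the cache */
    unsigned long tick; /* last-use stamp for LRU ordering */
};

static struct cache_entry cache[SHADOW_CACHE_MAX];
static unsigned long lru_tick;

/* Linear scan; returns the slot index on hit, -1 on miss. */
static int cache_lookup(unsigned long ino)
{
    for (int i = 0; i < SHADOW_CACHE_MAX; i++) {
        if (cache[i].used && cache[i].ino == ino) {
            cache[i].tick = ++lru_tick; /* refresh recency on hit */
            return i;
        }
    }
    return -1;
}

/* Insert, evicting the least-recently-used slot when full. */
static int cache_insert(unsigned long ino, int fd)
{
    int victim = 0;
    for (int i = 0; i < SHADOW_CACHE_MAX; i++) {
        if (!cache[i].used) {
            victim = i;
            break;
        }
        if (cache[i].tick < cache[victim].tick)
            victim = i;
    }
    if (cache[victim].used)
        close(cache[victim].fd); /* cache owns the evicted entry's fd */
    cache[victim] = (struct cache_entry){
        .used = 1, .ino = ino, .fd = fd, .tick = ++lru_tick};
    return victim;
}
```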

Cached fds are always sealed (F_SEAL_WRITE | GROW | SHRINK | SEAL)
before insertion, and only the read-only path enters the cache.
The new entry point kbox_shadow_create_cached() replaces every
read-only and seal-bound call site.
The two main-binary exec_memfd paths in dispatch-exec.c and
image.c keep the original kbox_shadow_create() because they patch
PT_INTERP via pwrite() and must own a private and writable memfd.
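The seal set named above can be applied with fcntl() after the contents are written. A minimal standalone demonstration (not kbox code; the function name is made up for illustration):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a memfd, fill it, and apply the full seal set so nothing can
 * write, grow, shrink, or add further seals afterward. Returns the
 * sealed fd, or -1 on error. */
static int make_sealed_memfd(const void *buf, size_t len)
{
    int fd = memfd_create("shadow", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_WRITE | F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that MFD_ALLOW_SEALING must be passed at creation time, or F_ADD_SEALS fails with EPERM.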

Callers always receive a dup() of the cached entry, never the
cached fd itself, so closing the returned fd never affects the
cache.
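The dup-on-return contract is easy to demonstrate in isolation: closing the duplicate leaves the cache's descriptor intact. A standalone sketch (the function name is hypothetical):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hand the caller a duplicate of a cached fd; the caller may close
 * the returned fd freely without touching the cache's own descriptor. */
static int shadow_cache_checkout(int cached_fd)
{
    return fcntl(cached_fd, F_DUPFD_CLOEXEC, 0);
}
```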

Change-Id: I31c148d2513983562628feaccdf0abfba6e70ed5

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 5 files



Development

Successfully merging this pull request may close these issues.

Cache shadow memfds across repeated opens to reduce open+close latency
