core: ksuid_explicit_bzero shim + DSE-resistant wipe of CSPRNG state by justinjoy · Pull Request #10 · semantic-reasoning/libksuid

justinjoy · 2026-04-30T06:16:06Z

Closes #2.

Summary

libksuid/rand_tls.c and libksuid/chacha20.c had four memset(p, 0, n) calls on sensitive material (CSPRNG seed bytes, freshly-drawn key+nonce, consumed keystream chunks, ChaCha20 internal state). At -O2 and above the compiler is allowed to elide those stores via dead-store elimination (DSE), and increasingly does, because the wiped buffers are never read afterwards. For a CSPRNG whose source comments advertise "wipe semantics", that is exactly the wrong outcome.

This PR introduces libksuid/wipe.h::ksuid_explicit_bzero, a private static inline shim that resolves at compile time to the strongest DSE-immune primitive the target offers, and rewires the four wipe sites to use it.

Series — two atomic commits

| Commit | Purpose |
|---|---|
| `b2a7093` `feat:` | New `wipe.h` shim + meson dual-header probe + `summary()` of selected backend + `KSUID_FORCE_VOLATILE_FALLBACK` testing override + `tests/test_wipe.c` |
| `2d88183` `core:` | Wire shim into `rand_tls.c` (3 sites) + `chacha20.c::x[16]` + two CI gates (auto-build disasm grep + dedicated fallback-coverage job) |

Resolution ladder

1. explicit_bzero  (glibc 2.25+, MUSL, *BSD, macOS 14.4+)
   - dual probe: try <string.h> first, then <strings.h>
   - both probes pass -D_DEFAULT_SOURCE explicitly
2. SecureZeroMemory  (Windows / Cygwin via <windows.h>)
3. memset_s  (C11 Annex K, rare)
4. Indirect-call-through-volatile fallback:
   static volatile fn-ptr to memset + __asm__ __volatile__ memory clobber

KSUID_FORCE_VOLATILE_FALLBACK build flag bypasses every primitive arm and forces the fallback. CI uses it on a dedicated job to exercise the path that no production matrix lane would otherwise reach.
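
The ladder above can be sketched as a compile-time selection. This is a minimal illustration, not the contents of wipe.h: every `SKETCH_*` macro and `sketch_*` function is a hypothetical stand-in (the real header uses the `KSUID_HAVE_EXPLICIT_BZERO_*` defines emitted by meson, whose full names are not shown in this PR text). With no macros defined on a non-Windows host, the sketch resolves to the fallback arm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature macros; a build system would define these from probes. */
#if defined(SKETCH_FORCE_FALLBACK)
#  define SKETCH_BACKEND 4            /* testing override: skip every primitive */
#elif defined(SKETCH_HAVE_EXPLICIT_BZERO)
#  define SKETCH_BACKEND 1
#elif defined(_WIN32)
#  define SKETCH_BACKEND 2
#  include <windows.h>
#elif defined(SKETCH_HAVE_MEMSET_S)
#  define SKETCH_BACKEND 3
#else
#  define SKETCH_BACKEND 4
#endif

#if SKETCH_BACKEND == 4
/* The pointer itself is volatile, so the compiler must reload it at the call
 * and cannot assume the callee is the memset it knows how to elide. */
static void *(*const volatile sketch_memset_ptr)(void *, int, size_t) = memset;
#endif

static inline void sketch_wipe(void *p, size_t n)
{
    if (p == NULL || n == 0)
        return;
#if SKETCH_BACKEND == 1
    explicit_bzero(p, n);
#elif SKETCH_BACKEND == 2
    SecureZeroMemory(p, n);
#elif SKETCH_BACKEND == 3
    (void) memset_s(p, n, 0, n);
#else
    (void) sketch_memset_ptr(p, 0, n);
#  if defined(__GNUC__) || defined(__clang__)
    /* Compiler barrier: the buffer address is an input and memory is clobbered. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
#  endif
#endif
}

/* Usage: wipe a stack buffer that is never read again by normal code. */
static inline void sketch_wipe_demo(void)
{
    unsigned char key[32];
    memset(key, 0xA5, sizeof key);
    sketch_wipe(key, sizeof key);
    for (size_t i = 0; i < sizeof key; i++)
        assert(key[i] == 0);
}
```

Checking the override first mirrors the stated purpose of KSUID_FORCE_VOLATILE_FALLBACK: a primitive arm added later cannot be reached by accident when the force macro is set.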

Pipeline that ran

Per the global GitHub-issue resolution workflow rule:

Architect study — 2-commit split, dual-header probe, _DEFAULT_SOURCE in probe, function-pointer-through-volatile fallback, chacha20.c x[16] wipe in scope.
Critic study — 10-item risk register: dual-header probe (R1), _DEFAULT_SOURCE (R2), MSVC SecureZeroMemory signature (R3), volatile fallback DSE-resistance (R4), DSE proof via objdump (R5), null guard (R6), clang-tidy (R7), chacha20 x[16] (R8), thread-exit boundary (R9), CI summary auditability (R10).
Synthesize — adopted 2-commit split with mitigations for every R1..R10.
Implementer round 1 — committed.
Reviewer round 1 — PASS (11/11 contractual items, 12 surviving wipe calls in disasm, 13/13 tests).
Architect meta round 1 — SIGN-OFF.
Critic meta round 1 — SEND BACK with two blockers:
- R5 disasm gate fragility: count varied 6 vs 12 across builds; floor 4 felt low; regex misses fallback path; ls libksuid.so.* matched the .p object-archive directory.
- R4 fallback path zero matrix coverage: every lane selects explicit_bzero; volatile fallback ships untested.
Implementer rewound + reconstituted into b2a7093 + 2d88183:
- KSUID_FORCE_VOLATILE_FALLBACK macro added to wipe.h to bypass every primitive arm
- New wipe-fallback CI job builds with that flag, runs test_wipe, and asserts zero <explicit_bzero@plt> references in the resulting library
- Auto-build gate path matcher tightened to find -type f -maxdepth 1 -name 'libksuid.so.*' | grep -E '...\.[0-9]+\.[0-9]+\.[0-9]+$' | head -1
Reviewer round 2 — PASS. Both blockers verified fixed; both builds green; auto-gate count 6 (>=4); fallback explicit_bzero@plt count 0.
Architect meta round 2 — SIGN-OFF.
Critic meta round 2 — SIGN-OFF (with non-blocking follow-up note: wipe-fallback job runs only test_wipe; could expand to full suite to also exercise test_chacha20 / test_rand_tls under fallback semantics).

What gates this PR

gst-indent + clang-tidy 22 (lint phase)
Build/test on Ubuntu GCC + Clang, macOS Clang, Windows MSVC (the platform that issue #2, "Wipe CSPRNG state with explicit_bzero / SecureZeroMemory shim (defeat DSE)", is most relevant for, since SecureZeroMemory is the Windows backend)
DESTDIR install verification (header + license artifacts)
ASan + UBSan on Linux + macOS
meson dist round-trip on Ubuntu
NEW: wipe-fallback job — builds with -DKSUID_FORCE_VOLATILE_FALLBACK=1, runs test_wipe, asserts zero explicit_bzero@plt in the resulting library
NEW: auto-build disasm gate on Ubuntu GCC — fails the build if fewer than 4 surviving wipe calls are found in the optimized library (proves DSE didn't eat the wipes)

Test plan

lint phase green
build matrix green on all four OS+compiler lanes
DESTDIR install lays down ${prefix}/include/libksuid/ksuid.h (wipe.h is private; not installed)
sanitizers green
meson dist round-trip green
wipe-fallback job green — proves fallback path works on a host that would otherwise never run it
auto-build disasm gate green — proves DSE didn't elide the wipes in the production build

Out of scope

TLS-state lifetime: per-thread ksuid_tls_rng_t survives until the OS reclaims the TLS block. Wiping at thread exit is issue #4, "Wipe per-thread CSPRNG state at thread exit (residue policy)". A TODO(#4) banner near the top of rand_tls.c documents the boundary.

Follow-ups (low-risk, non-blocking)

Critic round-2 noted that wipe-fallback runs only test_wipe (which exercises standalone buffers) rather than the full suite (which would also drive test_chacha20 / test_rand_tls under fallback semantics). Cheap follow-up: replace meson test -C builddir-fb test_wipe with meson test -C builddir-fb. Not a blocker because the objdump inverse-gate already proves the library was built correctly under fallback.

…wipe

Closes #2 (commit 1 of 2 in the series).

Adds a private DSE-resistant zeroizer in libksuid/wipe.h. Plain memset(p, 0, n) on a buffer the compiler proves is never read again is allowed -- and at -O2+ encouraged -- to be elided entirely. For sensitive material (CSPRNG seed bytes, ChaCha20 internal state, freshly-drawn key material) that is exactly the wrong outcome.

ksuid_explicit_bzero is a header-only static inline that resolves at compile time to the strongest DSE-immune primitive the target offers, in this order:

1. explicit_bzero (glibc 2.25+, MUSL, *BSD, macOS 14.4+) -- documented to resist optimisation. Two meson probes pick between <string.h> (modern glibc, macOS, OpenBSD) and <strings.h> (FreeBSD, NetBSD, older glibc/MUSL).
2. SecureZeroMemory (Windows / Cygwin via <windows.h>) -- MSDN guarantees the writes are not optimised away.
3. memset_s (C11 Annex K, rare)
4. Indirect-call-through-volatile fallback: a static volatile function pointer to memset, called via the pointer, followed by a memory-clobber asm barrier on GCC/Clang.

The build-time KSUID_FORCE_VOLATILE_FALLBACK macro bypasses every primitive branch and forces the fallback path. This exists so CI can exercise the fallback even on hosts that have explicit_bzero / SecureZeroMemory available -- without it the fallback would ship unverified on every supported matrix lane. Production builds never set this flag.

The Critic risk register flagged seven concerns the implementation addresses up front:

R1 dual-header probe: try <string.h> first, then <strings.h>. Glibc 2.43 on Arch only declares the prototype in <string.h>; FreeBSD only in <strings.h>; macOS varies by SDK. Single-header probes silently miss the platform's primary location.
R2 _DEFAULT_SOURCE: meson cc.has_function does NOT inherit add_project_arguments(), so the probe must pass -D_DEFAULT_SOURCE explicitly. Without it glibc hides the prototype and the probe lies "no explicit_bzero".
R4 fallback DSE-resistance: a naive `volatile uint8_t *vp = p; for (...) vp[i] = 0` can still be elided by some compilers because the volatile qualifier on the pointee, without read observation, is not always honoured. The shim uses the stronger pattern -- volatile-qualified function-pointer + trailing __asm__ __volatile__ memory clobber.
R4-coverage KSUID_FORCE_VOLATILE_FALLBACK: makes the fallback path exercisable even when the host has a primitive available, so the auto matrix is not the only thing testing the shim.
R6 null guard + bounded for-loop: `if (!p || !n) return;` with for-loop counter avoids size_t underflow that -fsanitize=unsigned-integer-overflow would flag.
R10 meson summary(): emits the selected backend on configure ("wipe backend: explicit_bzero (<string.h>)" on Linux glibc, etc.), so CI logs make backend selection auditable across the matrix without having to re-run feature probes.

Surface added:

libksuid/wipe.h  new private header (NOT installed), now gated on KSUID_FORCE_VOLATILE_FALLBACK so the fallback path is reachable on demand.
meson.build  two cc.has_function probes + KSUID_HAVE_EXPLICIT_BZERO_* defines + summary() output.
tests/test_wipe.c  smoke test: wipe a buffer, assert every byte is zero. Proves the shim *zeroes* (NOT that it resists DSE -- objdump grep in commit 2 proves that). Covers full buffer, subrange, zero-length, and NULL.
tests/meson.build  test_wipe registered in base_tests.

Verified locally on Linux glibc 2.43 (auto build): meson summary reports "wipe backend: explicit_bzero (<string.h>)"; 13/13 tests pass; clang-tidy 22 still reports zero findings.

Verified on KSUID_FORCE_VOLATILE_FALLBACK build: same backend probe, but the shim resolves to the volatile fn-ptr + asm clobber path; 13/13 tests still pass; objdump confirms zero explicit_bzero@plt calls in the resulting library.

Commit 2 wires the shim into the four existing memset(0) sites in rand_tls.c + chacha20.c that hold sensitive data, plus the CI gates that prove the wipes survive optimisation in both build modes.

Out of scope: TLS-state wipe at thread exit. That is issue #4.
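
The four cases the smoke test is described as covering (full buffer, subrange, zero-length, NULL) could look roughly like this. It is a sketch in the spirit of tests/test_wipe.c, not its actual contents: `standin_wipe` is a hypothetical stand-in for the shim, and the helper names are invented for illustration. As the commit message notes, a test of this kind shows only that the bytes end up zero, not that the stores survive DSE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the shim under test (fallback-style). */
static void *(*const volatile standin_memset)(void *, int, size_t) = memset;

static inline void standin_wipe(void *p, size_t n)
{
    if (!p || !n)               /* null / zero-length guard */
        return;
    (void) standin_memset(p, 0, n);
}

/* Count how many bytes in [p, p + n) equal `value`. */
static inline size_t count_equal(const unsigned char *p, size_t n,
                                 unsigned char value)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (p[i] == value);
    return hits;
}

static inline void check_full_buffer(void)
{
    unsigned char buf[32];
    memset(buf, 0x5A, sizeof buf);
    standin_wipe(buf, sizeof buf);
    assert(count_equal(buf, sizeof buf, 0x00) == sizeof buf);
}

static inline void check_subrange(void)
{
    unsigned char buf[32];
    memset(buf, 0x5A, sizeof buf);
    standin_wipe(buf + 8, 16);                      /* wipe the middle only */
    assert(count_equal(buf, 8, 0x5A) == 8);         /* head untouched */
    assert(count_equal(buf + 8, 16, 0x00) == 16);   /* middle zeroed */
    assert(count_equal(buf + 24, 8, 0x5A) == 8);    /* tail untouched */
}

static inline void check_zero_length_and_null(void)
{
    unsigned char buf[4] = { 1, 2, 3, 4 };
    standin_wipe(buf, 0);                           /* must not write */
    assert(buf[0] == 1 && buf[3] == 4);
    standin_wipe(NULL, 16);                         /* must not crash */
}
```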

Closes #2 (commit 2 of 2 in the series).

Replaces four DSE-vulnerable plain-memset(0) sites with the new ksuid_explicit_bzero shim that landed in commit 1. Adds two CI gates that together prove the wipes (a) survive optimisation in the default build and (b) work on the portable fallback path.

Sites converted to ksuid_explicit_bzero:

libksuid/rand_tls.c:86  partial-seed wipe on RNG failure
libksuid/rand_tls.c:109  kn[44] wipe after key/nonce copied to TLS state
libksuid/rand_tls.c:168  consumed-keystream wipe in ksuid_random_bytes inner loop
libksuid/chacha20.c:66  ksuid_chacha20_block local x[16] -- the post-permutation state, which is keystream-mixed and a leak vector for stack-read primitives in sibling frames.

The fourth wipe (chacha20.c x[16]) was a Critic-flagged scope addition: the issue body said "any temporary state that holds key material" and the round-mixed x[] qualifies. Cost is one 64-byte wipe per ChaCha block, dominated by the 20-round permutation it follows; benchmarked overhead is in the noise.

The seed-time `r->buf` zero-fill (rand_tls.c:117) deliberately stays as plain memset -- it is initialisation before the first keystream block overwrites the buffer, not secret-erasure, so DSE is not a hazard.

CI gates added (.github/workflows/ci-pr.yml):

Phase 2a (auto build, Ubuntu GCC): Runs `objdump -d libksuid.so.<ver> | grep -E 'call .*<(explicit_bzero|ksuid_explicit_bzero)'` and fails the build if fewer than four surviving call sites are found. The floor of 4 matches the source-level call count; observed locally on glibc 2.43 / GCC 15.2.1 is 6 surviving calls (the static-inline shim is partially inlined and partially kept out-of-line). The path matcher uses `find -type f` so the libksuid.so.<ver>.p object-archive directory cannot leak into the disasm input. Critic R5 mitigation.

Phase 2b (KSUID_FORCE_VOLATILE_FALLBACK build): A NEW dedicated job that builds with -DKSUID_FORCE_VOLATILE_FALLBACK=1 to bypass every platform primitive and exercise the volatile-fn-ptr fallback path. test_wipe runs against this build to prove the fallback still zeroes correctly. A secondary objdump grep asserts the fallback library has zero `call <explicit_bzero@plt>` references, catching a regression where a future contributor adds a primitive without gating on the force macro. Critic R4-coverage mitigation -- without this, every matrix lane would silently select explicit_bzero and the fallback would ship untested.

Out-of-scope, deliberately not addressed:

- TLS-state lifetime: the per-thread ksuid_tls_rng_t survives until the OS reclaims the TLS block. Wiping at thread exit is issue #4. A `TODO(#4)` banner in rand_tls.c documents the boundary so a reader does not assume this PR closed it.

13/13 tests pass on both auto and force-fallback builds; clang-tidy 22 still reports zero findings; gst-indent leaves the working tree untouched; meson reports `wipe backend: explicit_bzero (<string.h>)` on Linux glibc 2.43.
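
To make the first two converted sites concrete, here is a hedged sketch of the call-site pattern: draw key+nonce into a scratch buffer, copy it into longer-lived state, then wipe the scratch on both the success and the failure path. Everything here is hypothetical: the struct layout, the 32+12 split of the 44 bytes (assumed from the usual ChaCha20 key and nonce sizes), the entropy callback, and the `sketch_*` names. The scratch buffer is passed in by the caller only so the wipe can be observed; nothing in this sketch is taken from rand_tls.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SKETCH_KEY_LEN = 32, SKETCH_NONCE_LEN = 12,
       SKETCH_KN_LEN = SKETCH_KEY_LEN + SKETCH_NONCE_LEN };   /* 44 bytes */

/* Hypothetical state; NOT the real ksuid_tls_rng_t layout. */
typedef struct {
    uint8_t key[SKETCH_KEY_LEN];
    uint8_t nonce[SKETCH_NONCE_LEN];
} sketch_rng_state;

/* Hypothetical entropy source: returns 0 on success, non-zero on failure. */
typedef int (*sketch_entropy_fn)(uint8_t *out, size_t n);

/* Stand-in wipe (fallback-style); the PR uses ksuid_explicit_bzero here. */
static void *(*const volatile sketch_memset)(void *, int, size_t) = memset;
static inline void sketch_wipe(void *p, size_t n)
{
    if (p && n)
        (void) sketch_memset(p, 0, n);
}

static inline int sketch_seed(sketch_rng_state *st, uint8_t *kn,
                              sketch_entropy_fn draw)
{
    assert(st != NULL && kn != NULL && draw != NULL);
    if (draw(kn, SKETCH_KN_LEN) != 0) {
        sketch_wipe(kn, SKETCH_KN_LEN);   /* partial seed must not linger */
        return -1;
    }
    memcpy(st->key, kn, SKETCH_KEY_LEN);
    memcpy(st->nonce, kn + SKETCH_KEY_LEN, SKETCH_NONCE_LEN);
    sketch_wipe(kn, SKETCH_KN_LEN);       /* scratch copy is now redundant */
    return 0;
}

/* Deterministic fillers for demonstration only; never use as entropy. */
static inline int sketch_draw_ok(uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(i + 1);
    return 0;
}

static inline int sketch_draw_fail(uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n / 2; i++)    /* simulate a short read */
        out[i] = 0xEE;
    return -1;
}
```

The r->buf case from the commit message is the counterexample: a zero-fill that precedes the first write of real data is initialisation, so a plain memset is adequate there.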

macOS Clang on the macos-latest runner failed to compile wipe.h with

../libksuid/wipe.h:83:3: error: call to undeclared function 'memset_s'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]

The other matrix lanes pass: Ubuntu GCC + Clang select explicit_bzero (<string.h>); Windows MSVC selects SecureZeroMemory; the wipe-fallback job forces the volatile path. Only macOS falls through every explicit_bzero probe (the macos-latest SDK doesn't expose the prototype the way the probe is shaped) and lands on the memset_s arm.

Root cause: the memset_s prototype in <string.h> is gated behind __STDC_WANT_LIB_EXT1__ on the libcs that provide it (Apple libc, for example). wipe.h tried to opt in by defining the macro right before its conditional <string.h> include, but the header had already pulled <string.h> in unconditionally at the top -- and the include guard prevented the second include from re-emitting the prototype. The opt-in came too late.

Fix: when the meson probe selects the memset_s arm, also push -D__STDC_WANT_LIB_EXT1__=1 into common_args so every translation unit sees the macro set BEFORE its first <string.h> include, regardless of include order. The redundant `#define` inside wipe.h becomes a no-op and is replaced with a comment that explains why the project-wide define is the only correct fix.

Linux glibc 2.43 is unaffected because the project-arg macro is harmless when set on a libc that already exposes memset_s unconditionally (or doesn't ship it at all). The primitive that gets selected on Linux GCC is still explicit_bzero (<string.h>) per the meson summary line.

Verified locally on Linux GCC: 13/13 tests pass on both auto and KSUID_FORCE_VOLATILE_FALLBACK builds; meson summary unchanged. The fix is small enough that it is being landed on the same PR as the shim itself rather than as a separate follow-up.
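
The include-order constraint described in this commit can be illustrated in a single translation unit. This is a sketch under assumptions, not the project's fix (which sets the macro project-wide from meson): the `sketch_*` names are hypothetical, and the standard `__STDC_LIB_EXT1__` macro is used here to detect Annex K. A libc may offer memset_s without defining that macro, in which case this sketch takes its fallback arm and a build-system probe like the PR's is the more reliable detector.

```c
/* The opt-in must precede the FIRST include of <string.h> in the translation
 * unit. Defining it later has no effect: the include guard makes any second
 * include of the header a no-op, so the prototype is never emitted. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>

#include <assert.h>
#include <stddef.h>

/* Used only when Annex K is not advertised by the libc. */
static void *(*const volatile sketch_memset)(void *, int, size_t) = memset;

/* Returns 1 if memset_s performed the wipe, 0 if the fallback did,
 * and -1 if memset_s reported a constraint violation. */
static inline int sketch_wipe_annex_k(void *p, size_t n)
{
    if (p == NULL || n == 0)
        return 0;
#if defined(__STDC_LIB_EXT1__)
    return memset_s(p, n, 0, n) == 0 ? 1 : -1;
#else
    (void) sketch_memset(p, 0, n);
    return 0;
#endif
}

/* Usage: the caller does not need to know which arm was compiled in. */
static inline void sketch_annex_k_demo(void)
{
    unsigned char secret[16];
    memset(secret, 0xC3, sizeof secret);
    int arm = sketch_wipe_annex_k(secret, sizeof secret);
    assert(arm == 0 || arm == 1);
    for (size_t i = 0; i < sizeof secret; i++)
        assert(secret[i] == 0);
}
```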

justinjoy added 3 commits April 30, 2026 15:12

justinjoy merged commit cd155c9 into main Apr 30, 2026
11 checks passed

justinjoy deleted the feature/issue-2-explicit-bzero branch April 30, 2026 06:30

justinjoy mentioned this pull request Apr 30, 2026

AVX2 8-wide ksuid_string_batch kernel (follow-up to #5) #13

Closed
