This repository was archived by the owner on May 1, 2026. It is now read-only.
core: ksuid_explicit_bzero shim + DSE-resistant wipe of CSPRNG state #10
Merged
…wipe

Closes #2 (commit 1 of 2 in the series).

Adds a private DSE-resistant zeroizer in libksuid/wipe.h. Plain memset(p, 0, n) on a buffer the compiler proves is never read again is allowed -- and at -O2+ encouraged -- to be elided entirely. For sensitive material (CSPRNG seed bytes, ChaCha20 internal state, freshly-drawn key material) that is exactly the wrong outcome.

ksuid_explicit_bzero is a header-only static inline that resolves at compile time to the strongest DSE-immune primitive the target offers, in this order:

1. explicit_bzero (glibc 2.25+, MUSL, *BSD, macOS 14.4+) -- documented to resist optimisation. Two meson probes pick between <string.h> (modern glibc, macOS, OpenBSD) and <strings.h> (FreeBSD, NetBSD, older glibc/MUSL).
2. SecureZeroMemory (Windows / Cygwin via <windows.h>) -- MSDN guarantees the writes are not optimised away.
3. memset_s (C11 Annex K, rare)
4. Indirect-call-through-volatile fallback: a static volatile function pointer to memset, called via the pointer, followed by a memory-clobber asm barrier on GCC/Clang.

The build-time KSUID_FORCE_VOLATILE_FALLBACK macro bypasses every primitive branch and forces the fallback path. This exists so CI can exercise the fallback even on hosts that have explicit_bzero / SecureZeroMemory available -- without it the fallback would ship unverified on every supported matrix lane. Production builds never set this flag.

The Critic risk register flagged seven concerns the implementation addresses up front:

- R1 dual-header probe: try <string.h> first, then <strings.h>. glibc 2.43 on Arch only declares the prototype in <string.h>; FreeBSD only in <strings.h>; macOS varies by SDK. Single-header probes silently miss the platform's primary location.
- R2 _DEFAULT_SOURCE: meson cc.has_function does NOT inherit add_project_arguments(), so the probe must pass -D_DEFAULT_SOURCE explicitly. Without it glibc hides the prototype and the probe lies "no explicit_bzero".
- R4 fallback DSE-resistance: a naive `volatile uint8_t *vp = p; for (...) vp[i] = 0` can still be elided by some compilers because the volatile qualifier on the pointee, without read observation, is not always honoured. The shim uses the stronger pattern -- volatile-qualified function-pointer + trailing __asm__ __volatile__ memory clobber.
- R4-coverage KSUID_FORCE_VOLATILE_FALLBACK: makes the fallback path exercisable even when the host has a primitive available, so the auto matrix is not the only thing testing the shim.
- R6 null guard + bounded for-loop: `if (!p || !n) return;` with for-loop counter avoids size_t underflow that -fsanitize=unsigned-integer-overflow would flag.
- R10 meson summary(): emits the selected backend on configure ("wipe backend: explicit_bzero (<string.h>)" on Linux glibc, etc.), so CI logs make backend selection auditable across the matrix without having to re-run feature probes.

Surface added:

- libksuid/wipe.h -- new private header (NOT installed), now gated on KSUID_FORCE_VOLATILE_FALLBACK so the fallback path is reachable on demand.
- meson.build -- two cc.has_function probes + KSUID_HAVE_EXPLICIT_BZERO_* defines + summary() output.
- tests/test_wipe.c -- smoke test: wipe a buffer, assert every byte is zero. Proves the shim *zeroes* (NOT that it resists DSE -- objdump grep in commit 2 proves that). Covers full buffer, subrange, zero-length, and NULL.
- tests/meson.build -- test_wipe registered in base_tests.

Verified locally on Linux glibc 2.43 (auto build): meson summary reports "wipe backend: explicit_bzero (<string.h>)"; 13/13 tests pass; clang-tidy 22 still reports zero findings.

Verified on KSUID_FORCE_VOLATILE_FALLBACK build: same backend probe, but the shim resolves to the volatile fn-ptr + asm clobber path; 13/13 tests still pass; objdump confirms zero explicit_bzero@plt calls in the resulting library.

Commit 2 wires the shim into the four existing memset(0) sites in rand_tls.c + chacha20.c that hold sensitive data, plus the CI gates that prove the wipes survive optimisation in both build modes.

Out of scope: TLS-state wipe at thread exit. That is issue #4.
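For readers who have not seen the fourth arm of the ladder before, the following is a minimal sketch of the indirect-call-through-volatile pattern the commit message describes. It is not the contents of wipe.h: the names `demo_memset_ptr` and `demo_wipe_fallback` are hypothetical, and the exact asm constraint is an assumption about how such a barrier is commonly written.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name. The pointer object itself is volatile, so the
 * compiler must reload it at each call and cannot assume it still
 * points at memset; that blocks the "this is a dead memset" proof. */
static void *(*const volatile demo_memset_ptr)(void *, int, size_t) = memset;

/* Hypothetical name; a sketch of the fallback arm, not the real shim. */
static void demo_wipe_fallback(void *p, size_t n)
{
    /* Null / zero-length guard, as in the R6 item above. */
    if (p == NULL || n == 0)
        return;

    /* Indirect call through the volatile function pointer. */
    demo_memset_ptr(p, 0, n);

#if defined(__GNUC__) || defined(__clang__)
    /* Memory-clobber barrier (assumed form): tells the optimiser that
     * memory reachable from p may be observed after this point. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
#endif
}
```

Calling `demo_wipe_fallback(buf, sizeof buf)` zeroes `buf`; whether the stores survive optimisation on a given toolchain still has to be checked in the disassembly, which is what the objdump gate in commit 2 is for.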
Closes #2 (commit 2 of 2 in the series).

Replaces four DSE-vulnerable plain-memset(0) sites with the new ksuid_explicit_bzero shim that landed in commit 1. Adds two CI gates that together prove the wipes (a) survive optimisation in the default build and (b) work on the portable fallback path.

Sites converted to ksuid_explicit_bzero:

- libksuid/rand_tls.c:86 -- partial-seed wipe on RNG failure
- libksuid/rand_tls.c:109 -- kn[44] wipe after key/nonce copied to TLS state
- libksuid/rand_tls.c:168 -- consumed-keystream wipe in ksuid_random_bytes inner loop
- libksuid/chacha20.c:66 -- ksuid_chacha20_block local x[16]: the post-permutation state, which is keystream-mixed and a leak vector for stack-read primitives in sibling frames.

The fourth wipe (chacha20.c x[16]) was a Critic-flagged scope addition: the issue body said "any temporary state that holds key material" and the round-mixed x[] qualifies. Cost is one 64-byte wipe per ChaCha block, dominated by the 20-round permutation it follows; benchmarked overhead is in the noise.

The seed-time `r->buf` zero-fill (rand_tls.c:117) deliberately stays as plain memset -- it is initialisation before the first keystream block overwrites the buffer, not secret-erasure, so DSE is not a hazard.

CI gates added (.github/workflows/ci-pr.yml):

Phase 2a (auto build, Ubuntu GCC): Runs `objdump -d libksuid.so.<ver> | grep -E 'call .*<(explicit_bzero|ksuid_explicit_bzero)'` and fails the build if fewer than four surviving call sites are found. The floor of 4 matches the source-level call count; observed locally on glibc 2.43 / GCC 15.2.1 is 6 surviving calls (the static-inline shim is partially inlined and partially kept out-of-line). The path matcher uses `find -type f` so the libksuid.so.<ver>.p object-archive directory cannot leak into the disasm input. Critic R5 mitigation.

Phase 2b (KSUID_FORCE_VOLATILE_FALLBACK build): A NEW dedicated job that builds with -DKSUID_FORCE_VOLATILE_FALLBACK=1 to bypass every platform primitive and exercise the volatile-fn-ptr fallback path. test_wipe runs against this build to prove the fallback still zeroes correctly. A secondary objdump grep asserts the fallback library has zero `call <explicit_bzero@plt>` references, catching a regression where a future contributor adds a primitive without gating on the force macro. Critic R4-coverage mitigation -- without this, every matrix lane would silently select explicit_bzero and the fallback would ship untested.

Out-of-scope, deliberately not addressed:

- TLS-state lifetime: the per-thread ksuid_tls_rng_t survives until the OS reclaims the TLS block. Wiping at thread exit is issue #4. A `TODO(#4)` banner in rand_tls.c documents the boundary so a reader does not assume this PR closed it.

13/13 tests pass on both auto and force-fallback builds; clang-tidy 22 still reports zero findings; gst-indent leaves the working tree untouched; meson reports `wipe backend: explicit_bzero (<string.h>)` on Linux glibc 2.43.
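The first two converted sites (partial-seed wipe on failure, kn[44] wipe after the copy) share one shape: a stack temporary receives secret bytes and must be erased on every exit path. The sketch below illustrates that shape only. Every name in it (`demo_wipe`, `demo_rng_state`, `demo_seed`, the two fill helpers) is hypothetical, the 32-byte key + 12-byte nonce split of the 44 bytes is an assumption, and `demo_wipe` is a stand-in for `ksuid_explicit_bzero`, not the real shim.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for ksuid_explicit_bzero (hypothetical): an indirect call
 * through a volatile function pointer, so the store is not provably dead. */
static void *(*const volatile demo_memset)(void *, int, size_t) = memset;
static void demo_wipe(void *p, size_t n)
{
    if (p != NULL && n != 0)
        demo_memset(p, 0, n);
}

/* Assumed layout: 32-byte key followed by 12-byte nonce (44 bytes). */
typedef struct {
    uint8_t key[32];
    uint8_t nonce[12];
} demo_rng_state;

/* Entropy source callback: returns 0 on success, non-zero on failure. */
typedef int (*demo_entropy_fn)(uint8_t *out, size_t n);

static int demo_seed(demo_rng_state *st, demo_entropy_fn fill)
{
    uint8_t kn[44];

    if (fill(kn, sizeof kn) != 0) {
        /* Failure path: kn may hold a partial seed, so wipe it too. */
        demo_wipe(kn, sizeof kn);
        return -1;
    }
    memcpy(st->key, kn, sizeof st->key);
    memcpy(st->nonce, kn + sizeof st->key, sizeof st->nonce);

    /* kn is never read again, which is exactly when a plain memset
     * becomes a dead store the optimiser may drop. */
    demo_wipe(kn, sizeof kn);
    return 0;
}

/* Deterministic helpers for illustration only; NOT entropy sources. */
static int demo_fill_pattern(uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(i + 1);
    return 0;
}

static int demo_fill_fail(uint8_t *out, size_t n)
{
    if (n > 0)
        out[0] = 0xFF; /* simulate a partial write before failing */
    return -1;
}
```

The point of the sketch is that both return paths pass through the wipe; the real call sites and their line numbers are the ones listed in the commit message.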
macOS Clang on the macos-latest runner failed to compile wipe.h with

    ../libksuid/wipe.h:83:3: error: call to undeclared function 'memset_s'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]

The other matrix lanes pass: Ubuntu GCC + Clang select explicit_bzero (<string.h>); Windows MSVC selects SecureZeroMemory; the wipe-fallback job forces the volatile path. Only macOS falls through every explicit_bzero probe (the macos-latest SDK doesn't expose the prototype the way the probe is shaped) and lands on the memset_s arm.

Root cause: the memset_s prototype in <string.h> is gated behind __STDC_WANT_LIB_EXT1__ on the libcs that provide it (Apple libc among them). wipe.h tried to opt in by defining the macro right before its conditional <string.h> include, but the header had already pulled <string.h> in unconditionally at the top -- and the include guard prevented the second include from re-emitting the prototype. The opt-in came too late.

Fix: when the meson probe selects the memset_s arm, also push -D__STDC_WANT_LIB_EXT1__=1 into common_args so every translation unit sees the macro set BEFORE its first <string.h> include, regardless of include order. The redundant `#define` inside wipe.h becomes a no-op and is replaced with a comment that explains why the project-wide define is the only correct fix.

Linux glibc 2.43 is unaffected because the project-arg macro is harmless when set on a libc that already exposes memset_s unconditionally (or doesn't ship it at all). The primitive that gets selected on Linux GCC is still explicit_bzero (<string.h>) per the meson summary line.

Verified locally on Linux GCC: 13/13 tests pass on both auto and KSUID_FORCE_VOLATILE_FALLBACK builds; meson summary unchanged. The fix is small enough that it is being landed on the same PR as the shim itself rather than as a separate follow-up.
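The ordering problem is easiest to see in a single translation unit. The sketch below is an illustration under assumptions, not the project's code: `demo_wipe_annex_k` is a hypothetical name, and treating `__APPLE__` as "memset_s is available once the opt-in macro is set" is an assumption about the Apple SDK. Where Annex K is not available the function reports that and leaves the buffer untouched rather than pretending to wipe.

```c
/* The opt-in must be visible before the FIRST inclusion of <string.h>
 * in this translation unit. Defining it later has no effect, because
 * the header's include guard stops it from being processed again.
 * A project-wide -D__STDC_WANT_LIB_EXT1__=1 achieves the same thing
 * for every file regardless of include order. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.
 * Returns 0 if the buffer was wiped via memset_s,
 *         1 if Annex K is not available on this libc (buffer untouched),
 *        -1 if memset_s reported a runtime-constraint violation. */
static int demo_wipe_annex_k(void *p, size_t n)
{
    if (p == NULL || n == 0)
        return 0;
#if defined(__STDC_LIB_EXT1__) || defined(__APPLE__)
    /* memset_s(dest, destsz, ch, count): Annex K signature. */
    return memset_s(p, n, 0, n) == 0 ? 0 : -1;
#else
    (void)p;
    (void)n;
    return 1;
#endif
}
```

On a libc without Annex K the helper returns 1, which corresponds to the ladder moving on to another arm.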
Closes #2.
Summary
`libksuid/rand_tls.c` and `libksuid/chacha20.c` had four `memset(p, 0, n)` calls on sensitive material (CSPRNG seed bytes, freshly-drawn key+nonce, consumed keystream chunks, ChaCha20 internal state). At `-O2` and beyond the compiler is allowed to elide those stores via dead-store elimination, and increasingly does, because the wiped buffers are not subsequently read. For a CSPRNG whose source comments advertise "wipe semantics", that's exactly the wrong outcome.

This PR introduces `libksuid/wipe.h::ksuid_explicit_bzero`, a private `static inline` shim that resolves at compile time to the strongest DSE-immune primitive the target offers, and rewires the four wipe sites to use it.

Series: two atomic commits

- b2a7093 feat: `wipe.h` shim + meson dual-header probe + `summary()` of selected backend + `KSUID_FORCE_VOLATILE_FALLBACK` testing override + `tests/test_wipe.c`
- 2d88183 core: `rand_tls.c` (3 sites) + `chacha20.c::x[16]` + two CI gates (auto-build disasm grep + dedicated fallback-coverage job)

Resolution ladder

The `KSUID_FORCE_VOLATILE_FALLBACK` build flag bypasses every primitive arm and forces the fallback. CI uses it on a dedicated job to exercise the path that no production matrix lane would otherwise reach.

Pipeline that ran
Per the global GitHub-issue resolution workflow rule:
- … `_DEFAULT_SOURCE` in probe, function-pointer-through-volatile fallback, `chacha20.c x[16]` wipe in scope.
- … `_DEFAULT_SOURCE` (R2), MSVC `SecureZeroMemory` signature (R3), volatile fallback DSE-resistance (R4), DSE proof via objdump (R5), null guard (R6), clang-tidy (R7), chacha20 `x[16]` (R8), thread-exit boundary (R9), CI summary auditability (R10).
- … `ls libksuid.so.*` matched the `.p` object-archive directory.
- … `explicit_bzero`; volatile fallback ships untested.
- … b2a7093 + 2d88183:
  - `KSUID_FORCE_VOLATILE_FALLBACK` macro added to `wipe.h` to bypass every primitive arm
  - `wipe-fallback` CI job builds with that flag, runs `test_wipe`, and asserts zero `<explicit_bzero@plt>` references in the resulting library
  - … `find -type f -maxdepth 1 -name 'libksuid.so.*' | grep -E '...\.[0-9]+\.[0-9]+\.[0-9]+$' | head -1`
- … `explicit_bzero@plt` count 0.
- … `wipe-fallback` job runs only `test_wipe`; could expand to full suite to also exercise `test_chacha20` / `test_rand_tls` under fallback semantics).

What gates this PR
- `gst-indent` + `clang-tidy 22` (lint phase)
- … `SecureZeroMemory` is the Windows backend)
- `meson dist` round-trip on Ubuntu
- `wipe-fallback` job: builds with `-DKSUID_FORCE_VOLATILE_FALLBACK=1`, runs `test_wipe`, asserts zero `explicit_bzero@plt` in the resulting library

Test plan

- … `${prefix}/include/libksuid/ksuid.h` (wipe.h is private; not installed)
- `meson dist` round-trip green
- `wipe-fallback` job green: proves the fallback path works on a host that would otherwise never run it

Out of scope
- TLS-state lifetime: the per-thread `ksuid_tls_rng_t` survives until the OS reclaims the TLS block. Wiping at thread exit is issue #4 ("Wipe per-thread CSPRNG state at thread exit (residue policy)"). A `TODO(#4)` banner near the top of `rand_tls.c` documents the boundary.

Follow-ups (low-risk, non-blocking)

- `wipe-fallback` runs only `test_wipe` (which exercises standalone buffers) rather than the full suite (which would also drive `test_chacha20` / `test_rand_tls` under fallback semantics). Cheap follow-up: replace `meson test -C builddir-fb test_wipe` with `meson test -C builddir-fb`. Not a blocker because the objdump inverse-gate already proves the library was built correctly under fallback.
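For context on what "exercises standalone buffers" means, here is a rough sketch of a smoke test covering the four cases the first commit message lists (full buffer, subrange, zero-length, NULL). It is not the contents of `tests/test_wipe.c`; `demo_check_wipe`, `demo_wipe_fn` and `demo_guarded_memset` are hypothetical names, and the wipe under test is passed in as a function pointer so the same checks could run against any backend. As in the source, a test like this shows only that the bytes are zeroed, not that the stores resist DSE.

```c
#include <stddef.h>
#include <string.h>

/* Signature assumed for the wipe under test (hypothetical typedef). */
typedef void (*demo_wipe_fn)(void *, size_t);

/* Trivial stand-in so the sketch is self-contained. It is a plain
 * guarded memset and makes no DSE-resistance claim. */
static void demo_guarded_memset(void *p, size_t n)
{
    if (p == NULL || n == 0)
        return;
    memset(p, 0, n);
}

/* Returns the number of failed byte checks; 0 means all cases passed. */
static int demo_check_wipe(demo_wipe_fn wipe)
{
    unsigned char buf[64];
    int failures = 0;
    size_t i;

    /* Case 1: full buffer is zeroed. */
    memset(buf, 0xA5, sizeof buf);
    wipe(buf, sizeof buf);
    for (i = 0; i < sizeof buf; i++)
        failures += (buf[i] != 0);

    /* Case 2: subrange [16, 48) is zeroed, neighbours are untouched. */
    memset(buf, 0xA5, sizeof buf);
    wipe(buf + 16, 32);
    for (i = 0; i < sizeof buf; i++) {
        unsigned char want = (i >= 16 && i < 48) ? 0x00 : 0xA5;
        failures += (buf[i] != want);
    }

    /* Case 3: zero-length wipe changes nothing. */
    memset(buf, 0xA5, sizeof buf);
    wipe(buf, 0);
    for (i = 0; i < sizeof buf; i++)
        failures += (buf[i] != 0xA5);

    /* Case 4: NULL pointer must be a no-op rather than a crash. */
    wipe(NULL, sizeof buf);

    return failures;
}
```

A caller would run `demo_check_wipe(demo_guarded_memset)` and treat any non-zero result as a failure.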