This repository was archived by the owner on May 1, 2026. It is now read-only.

core: ksuid_explicit_bzero shim + DSE-resistant wipe of CSPRNG state #10

Merged
justinjoy merged 3 commits into main from feature/issue-2-explicit-bzero
Apr 30, 2026
Conversation

@justinjoy
Contributor

Closes #2.

Summary

libksuid/rand_tls.c and libksuid/chacha20.c had four memset(p, 0, n) calls on sensitive material (CSPRNG seed bytes, freshly drawn key+nonce, consumed keystream chunks, ChaCha20 internal state). At -O2 and above the compiler is allowed to elide those stores via dead-store elimination (DSE), and increasingly does, because the wiped buffers are never read afterwards. For a CSPRNG whose source comments advertise "wipe semantics", that is exactly the wrong outcome.

This PR introduces libksuid/wipe.h::ksuid_explicit_bzero, a private static inline shim that resolves at compile time to the strongest DSE-immune primitive the target offers, and rewires the four wipe sites to use it.
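To make the hazard concrete, here is a minimal sketch of the pattern described above. `demo_mix` is a hypothetical function, not code from libksuid, and the barrier shown is the GCC/Clang inline-asm idiom, not necessarily the primitive the shim ends up selecting.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from libksuid): derives a value from a seed via a
 * stack buffer, standing in for any routine that keeps secrets on the stack.
 * Assumes GCC or Clang because of the inline-asm barrier. */
static uint32_t demo_mix(uint32_t seed, int wipe_with_barrier)
{
    uint8_t secret[32];
    uint32_t acc = 0;

    for (size_t i = 0; i < sizeof secret; i++) {
        secret[i] = (uint8_t)((seed >> ((i % 4) * 8)) ^ i);
        acc = acc * 31u + secret[i];
    }

    /* `secret` is never read after this point, so an optimiser may treat
     * this store as dead and drop it entirely. */
    memset(secret, 0, sizeof secret);

    if (wipe_with_barrier) {
        /* Empty asm that takes the buffer address and clobbers memory: the
         * compiler must assume the zeroed bytes can be observed, so the
         * memset above can no longer be proven dead on this path. */
        __asm__ __volatile__("" : : "r"(secret) : "memory");
    }
    return acc;
}
```

Both paths return the same value; the difference only shows up in the generated code, which is why the PR checks the disassembly and does not rely on a runtime test alone.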

Series: two atomic commits

Commit    Purpose
b2a7093   feat: New wipe.h shim + meson dual-header probe + summary() of selected backend + KSUID_FORCE_VOLATILE_FALLBACK testing override + tests/test_wipe.c
2d88183   core: Wire shim into rand_tls.c (3 sites) + chacha20.c::x[16] + two CI gates (auto-build disasm grep + dedicated fallback-coverage job)

Resolution ladder

1. explicit_bzero  (glibc 2.25+, MUSL, *BSD, macOS 14.4+)
   - dual probe: try <string.h> first, then <strings.h>
   - both probes pass -D_DEFAULT_SOURCE explicitly
2. SecureZeroMemory  (Windows / Cygwin via <windows.h>)
3. memset_s  (C11 Annex K, rare)
4. Indirect-call-through-volatile fallback:
   static volatile fn-ptr to memset + __asm__ __volatile__ memory clobber

KSUID_FORCE_VOLATILE_FALLBACK build flag bypasses every primitive arm and forces the fallback. CI uses it on a dedicated job to exercise the path that no production matrix lane would otherwise reach.
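The ladder can be sketched as a preprocessor cascade. This is an illustration under stated assumptions, not the contents of wipe.h: every `DEMO_*` macro and the name `demo_explicit_bzero` are hypothetical stand-ins for whatever the real build defines, and the force macro is modelled by excluding each primitive arm when it is set.

```c
#include <stddef.h>
#include <string.h>
#if defined(_WIN32)
#include <windows.h>
#endif

/* DEMO_HAVE_EXPLICIT_BZERO, DEMO_HAVE_MEMSET_S and
 * DEMO_FORCE_VOLATILE_FALLBACK are hypothetical macros a build system
 * would define after probing; none of them come from the PR. */
static inline void demo_explicit_bzero(void *p, size_t n)
{
    if (!p || !n)
        return;
#if !defined(DEMO_FORCE_VOLATILE_FALLBACK) && defined(DEMO_HAVE_EXPLICIT_BZERO)
    /* 1. libc primitive (may need _DEFAULT_SOURCE and/or <strings.h>). */
    explicit_bzero(p, n);
#elif !defined(DEMO_FORCE_VOLATILE_FALLBACK) && defined(_WIN32)
    /* 2. Windows primitive. */
    SecureZeroMemory(p, n);
#elif !defined(DEMO_FORCE_VOLATILE_FALLBACK) && defined(DEMO_HAVE_MEMSET_S)
    /* 3. C11 Annex K. */
    (void)memset_s(p, n, 0, n);
#else
    /* 4. Fallback: the call goes through a volatile function pointer, so
     * the compiler cannot assume the callee is memset and cannot reason
     * about the stores it performs. */
    static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;
    (memset_ptr)(p, 0, n);
#if defined(__GNUC__) || defined(__clang__)
    /* Trailing barrier: memory reachable through p may be observed. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
#endif
#endif
}
```

With none of the `DEMO_*` macros defined on a non-Windows host, the sketch compiles to the fallback arm, which is the same situation the force flag creates on purpose.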

Pipeline that ran

Per the global GitHub-issue resolution workflow rule:

  1. Architect study — 2-commit split, dual-header probe, _DEFAULT_SOURCE in probe, function-pointer-through-volatile fallback, chacha20.c x[16] wipe in scope.
  2. Critic study — 10-item risk register: dual-header probe (R1), _DEFAULT_SOURCE (R2), MSVC SecureZeroMemory signature (R3), volatile fallback DSE-resistance (R4), DSE proof via objdump (R5), null guard (R6), clang-tidy (R7), chacha20 x[16] (R8), thread-exit boundary (R9), CI summary auditability (R10).
  3. Synthesize — adopted 2-commit split with mitigations for every R1..R10.
  4. Implementer round 1 — committed.
  5. Reviewer round 1 — PASS (11/11 contractual items, 12 surviving wipe calls in disasm, 13/13 tests).
  6. Architect meta round 1 — SIGN-OFF.
  7. Critic meta round 1 — SEND BACK with two blockers:
    • R5 disasm gate fragility: count varied 6 vs 12 across builds; floor 4 felt low; regex misses fallback path; ls libksuid.so.* matched the .p object-archive directory.
    • R4 fallback path zero matrix coverage: every lane selects explicit_bzero; volatile fallback ships untested.
  8. Implementer rewound + reconstituted into b2a7093 + 2d88183:
    • KSUID_FORCE_VOLATILE_FALLBACK macro added to wipe.h to bypass every primitive arm
    • New wipe-fallback CI job builds with that flag, runs test_wipe, and asserts zero <explicit_bzero@plt> references in the resulting library
    • Auto-build gate path matcher tightened to find -type f -maxdepth 1 -name 'libksuid.so.*' | grep -E '...\.[0-9]+\.[0-9]+\.[0-9]+$' | head -1
  9. Reviewer round 2 — PASS. Both blockers verified fixed; both builds green; auto-gate count 6 (>=4); fallback explicit_bzero@plt count 0.
  10. Architect meta round 2 — SIGN-OFF.
  11. Critic meta round 2 — SIGN-OFF (with non-blocking follow-up note: wipe-fallback job runs only test_wipe; could expand to full suite to also exercise test_chacha20 / test_rand_tls under fallback semantics).

What gates this PR

  • gst-indent + clang-tidy 22 (lint phase)
  • Build/test on Ubuntu GCC + Clang, macOS Clang, Windows MSVC (the platform most relevant to issue #2, since SecureZeroMemory is the Windows backend)
  • DESTDIR install verification (header + license artifacts)
  • ASan + UBSan on Linux + macOS
  • meson dist round-trip on Ubuntu
  • NEW: wipe-fallback job — builds with -DKSUID_FORCE_VOLATILE_FALLBACK=1, runs test_wipe, asserts zero explicit_bzero@plt in the resulting library
  • NEW: auto-build disasm gate on Ubuntu GCC — fails the build if fewer than 4 surviving wipe calls are found in the optimized library (proves DSE didn't eat the wipes)

Test plan

  • lint phase green
  • build matrix green on all four OS+compiler lanes
  • DESTDIR install lays down ${prefix}/include/libksuid/ksuid.h (wipe.h is private; not installed)
  • sanitizers green
  • meson dist round-trip green
  • wipe-fallback job green — proves fallback path works on a host that would otherwise never run it
  • auto-build disasm gate green — proves DSE didn't elide the wipes in the production build

Out of scope

  • TLS-state wipe at thread exit (tracked as issue #4).

Follow-ups (low-risk, non-blocking)

  • Critic round-2 noted that wipe-fallback runs only test_wipe (which exercises standalone buffers) rather than the full suite (which would also drive test_chacha20 / test_rand_tls under fallback semantics). Cheap follow-up: replace meson test -C builddir-fb test_wipe with meson test -C builddir-fb. Not a blocker because the objdump inverse-gate already proves the library was built correctly under fallback.

…wipe

Closes #2 (commit 1 of 2 in the series).

Adds a private DSE-resistant zeroizer in libksuid/wipe.h. Plain
memset(p, 0, n) on a buffer the compiler proves is never read again
is allowed -- and at -O2+ encouraged -- to be elided entirely. For
sensitive material (CSPRNG seed bytes, ChaCha20 internal state,
freshly-drawn key material) that is exactly the wrong outcome.

ksuid_explicit_bzero is a header-only static inline that resolves
at compile time to the strongest DSE-immune primitive the target
offers, in this order:

  1. explicit_bzero  (glibc 2.25+, MUSL, *BSD, macOS 14.4+)
     -- documented to resist optimisation. Two meson probes pick
        between <string.h> (modern glibc, macOS, OpenBSD) and
        <strings.h> (FreeBSD, NetBSD, older glibc/MUSL).
  2. SecureZeroMemory  (Windows / Cygwin via <windows.h>)
     -- MSDN guarantees the writes are not optimised away.
  3. memset_s  (C11 Annex K, rare)
  4. Indirect-call-through-volatile fallback: a static volatile
     function pointer to memset, called via the pointer, followed
     by a memory-clobber asm barrier on GCC/Clang.

The build-time KSUID_FORCE_VOLATILE_FALLBACK macro bypasses every
primitive branch and forces the fallback path. This exists so CI
can exercise the fallback even on hosts that have explicit_bzero /
SecureZeroMemory available -- without it the fallback would ship
unverified on every supported matrix lane. Production builds never
set this flag.

The Critic risk register flagged six concerns the implementation
addresses up front:

  R1 dual-header probe: try <string.h> first, then <strings.h>.
     Glibc 2.43 on Arch only declares the prototype in <string.h>;
     FreeBSD only in <strings.h>; macOS varies by SDK. Single-
     header probes silently miss the platform's primary location.
  R2 _DEFAULT_SOURCE: meson cc.has_function does NOT inherit
     add_project_arguments(), so the probe must pass
     -D_DEFAULT_SOURCE explicitly. Without it glibc hides the
     prototype and the probe lies "no explicit_bzero".
  R4 fallback DSE-resistance: a naive `volatile uint8_t *vp = p;
     for (...) vp[i] = 0` can still be elided by some compilers
     because the volatile qualifier on the pointee, without read
     observation, is not always honoured. The shim uses the
     stronger pattern -- volatile-qualified function-pointer +
     trailing __asm__ __volatile__ memory clobber.
  R4-coverage KSUID_FORCE_VOLATILE_FALLBACK: makes the fallback
     path exercisable even when the host has a primitive available,
     so the auto matrix is not the only thing testing the shim.
  R6 null guard + bounded for-loop: `if (!p || !n) return;` with
     for-loop counter avoids size_t underflow that
     -fsanitize=unsigned-integer-overflow would flag.
  R10 meson summary(): emits the selected backend on configure
     ("wipe backend: explicit_bzero (<string.h>)" on Linux glibc,
     etc.), so CI logs make backend selection auditable across
     the matrix without having to re-run feature probes.
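The R6 point about the guard and the loop counter can be illustrated with a small sketch. `demo_wipe_bytes` is a hypothetical name, and the byte loop shown is the weaker form that R4 warns about; it is used here only to show the guard and the counter shape, not as the shim's fallback.

```c
#include <stddef.h>

/* Hypothetical illustration of the R6 guard and loop shape. */
static void demo_wipe_bytes(void *p, size_t n)
{
    volatile unsigned char *vp;

    /* Null or empty input is a no-op, so callers on error paths do not
     * need to special-case a buffer that was never allocated. */
    if (!p || !n)
        return;

    vp = (volatile unsigned char *)p;

    /* Counting up with an index never decrements a size_t below zero.
     * The common alternative `while (n--) *vp++ = 0;` leaves n wrapped to
     * SIZE_MAX after the final test, which is well-defined C but is the
     * kind of wrap -fsanitize=unsigned-integer-overflow reports. */
    for (size_t i = 0; i < n; i++)
        vp[i] = 0;
}
```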

Surface added:

  libksuid/wipe.h            new private header (NOT installed),
                              now gated on
                              KSUID_FORCE_VOLATILE_FALLBACK so the
                              fallback path is reachable on demand.
  meson.build                two cc.has_function probes
                              + KSUID_HAVE_EXPLICIT_BZERO_*
                              defines + summary() output.
  tests/test_wipe.c          smoke test: wipe a buffer, assert
                              every byte is zero. Proves the
                              shim *zeroes* (NOT that it resists
                              DSE -- objdump grep in commit 2
                              proves that). Covers full buffer,
                              subrange, zero-length, and NULL.
  tests/meson.build          test_wipe registered in base_tests.
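
A smoke test covering the four cases listed for tests/test_wipe.c might look roughly like the following. This is a sketch, not the actual test file: `demo_wipe` stands in for ksuid_explicit_bzero, and `demo_test_wipe` is a hypothetical driver that returns the number of failed checks.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the shim under test (hypothetical, not the real wipe.h). */
static void demo_wipe(void *p, size_t n)
{
    static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;
    if (!p || !n)
        return;
    (memset_ptr)(p, 0, n);
}

/* Returns the number of failed checks; 0 means every case passed. */
static int demo_test_wipe(void)
{
    unsigned char buf[32];
    int failures = 0;
    size_t i;

    /* Full buffer: every byte must end up zero. */
    memset(buf, 0xA5, sizeof buf);
    demo_wipe(buf, sizeof buf);
    for (i = 0; i < sizeof buf; i++)
        failures += (buf[i] != 0);

    /* Subrange: only bytes [8, 24) are zeroed, neighbours stay intact. */
    memset(buf, 0xA5, sizeof buf);
    demo_wipe(buf + 8, 16);
    for (i = 0; i < sizeof buf; i++) {
        unsigned char want = (i >= 8 && i < 24) ? 0 : 0xA5;
        failures += (buf[i] != want);
    }

    /* Zero length: the buffer must be left untouched. */
    memset(buf, 0xA5, sizeof buf);
    demo_wipe(buf, 0);
    for (i = 0; i < sizeof buf; i++)
        failures += (buf[i] != 0xA5);

    /* NULL: must return without dereferencing the pointer. */
    demo_wipe(NULL, sizeof buf);

    return failures;
}
```

As the commit message notes, a test of this shape only shows that the bytes are zeroed when the buffer is read back; reading the buffer is exactly what keeps the stores alive, so it says nothing about DSE.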

Verified locally on Linux glibc 2.43 (auto build): meson summary
reports "wipe backend: explicit_bzero (<string.h>)"; 13/13 tests
pass; clang-tidy 22 still reports zero findings. Verified on
KSUID_FORCE_VOLATILE_FALLBACK build: same backend probe, but the
shim resolves to the volatile fn-ptr + asm clobber path; 13/13
tests still pass; objdump confirms zero explicit_bzero@plt calls
in the resulting library.

Commit 2 wires the shim into the four existing memset(0) sites in
rand_tls.c + chacha20.c that hold sensitive data, plus the CI
gates that prove the wipes survive optimisation in both build
modes.

Out of scope: TLS-state wipe at thread exit. That is issue #4.

Closes #2 (commit 2 of 2 in the series).

Replaces four DSE-vulnerable plain-memset(0) sites with the new
ksuid_explicit_bzero shim that landed in commit 1. Adds two CI
gates that together prove the wipes (a) survive optimisation in
the default build and (b) work on the portable fallback path.

Sites converted to ksuid_explicit_bzero:

  libksuid/rand_tls.c:86   partial-seed wipe on RNG failure
  libksuid/rand_tls.c:109  kn[44] wipe after key/nonce copied to
                            TLS state
  libksuid/rand_tls.c:168  consumed-keystream wipe in
                            ksuid_random_bytes inner loop
  libksuid/chacha20.c:66   ksuid_chacha20_block local x[16] -- the
                            post-permutation state, which is
                            keystream-mixed and a leak vector for
                            stack-read primitives in sibling frames.

The fourth wipe (chacha20.c x[16]) was a Critic-flagged scope
addition: the issue body said "any temporary state that holds key
material" and the round-mixed x[] qualifies. Cost is one 64-byte
wipe per ChaCha block, dominated by the 20-round permutation it
follows; benchmarked overhead is in the noise.
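
For readers unfamiliar with where x[16] sits, the following sketch shows a textbook ChaCha20 block function (the RFC 8439 construction) with the working state wiped before return. It is not the code in libksuid/chacha20.c: `demo_chacha20_block`, `demo_wipe` and the `DEMO_*` macros are hypothetical names, and the real function's signature may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ROTL32(v, c) (((v) << (c)) | ((v) >> (32 - (c))))
#define DEMO_QR(a, b, c, d)                         \
    do {                                            \
        a += b; d ^= a; d = DEMO_ROTL32(d, 16);     \
        c += d; b ^= c; b = DEMO_ROTL32(b, 12);     \
        a += b; d ^= a; d = DEMO_ROTL32(d, 8);      \
        c += d; b ^= c; b = DEMO_ROTL32(b, 7);      \
    } while (0)

/* Stand-in for the project's shim (hypothetical). */
static void demo_wipe(void *p, size_t n)
{
    static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;
    if (!p || !n)
        return;
    (memset_ptr)(p, 0, n);
}

/* in: 16-word ChaCha20 input state; out: 16 keystream words. */
static void demo_chacha20_block(uint32_t out[16], const uint32_t in[16])
{
    uint32_t x[16];
    int i;

    memcpy(x, in, sizeof x);
    for (i = 0; i < 10; i++) {              /* 10 double rounds = 20 rounds */
        DEMO_QR(x[0], x[4], x[8],  x[12]);  /* column rounds */
        DEMO_QR(x[1], x[5], x[9],  x[13]);
        DEMO_QR(x[2], x[6], x[10], x[14]);
        DEMO_QR(x[3], x[7], x[11], x[15]);
        DEMO_QR(x[0], x[5], x[10], x[15]);  /* diagonal rounds */
        DEMO_QR(x[1], x[6], x[11], x[12]);
        DEMO_QR(x[2], x[7], x[8],  x[13]);
        DEMO_QR(x[3], x[4], x[9],  x[14]);
    }
    for (i = 0; i < 16; i++)
        out[i] = x[i] + in[i];

    /* x still holds the post-permutation state; without the final add it
     * is key-dependent, so it is wiped before the frame is released. */
    demo_wipe(x, sizeof x);
}
```

The wipe is a single 64-byte store sequence after 80 quarter-round invocations, which is consistent with the commit's remark that the cost is dominated by the permutation.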

The seed-time `r->buf` zero-fill (rand_tls.c:117) deliberately
stays as plain memset -- it is initialisation before the first
keystream block overwrites the buffer, not secret-erasure, so
DSE is not a hazard.

CI gates added (.github/workflows/ci-pr.yml):

  Phase 2a (auto build, Ubuntu GCC):
    Runs `objdump -d libksuid.so.<ver> | grep -E
    'call .*<(explicit_bzero|ksuid_explicit_bzero)'` and fails
    the build if fewer than four surviving call sites are found.
    The floor of 4 matches the source-level call count; observed
    locally on glibc 2.43 / GCC 15.2.1 is 6 surviving calls (the
    static-inline shim is partially inlined and partially kept
    out-of-line). The path matcher uses `find -type f` so the
    libksuid.so.<ver>.p object-archive directory cannot leak
    into the disasm input. Critic R5 mitigation.

  Phase 2b (KSUID_FORCE_VOLATILE_FALLBACK build):
    A NEW dedicated job that builds with
    -DKSUID_FORCE_VOLATILE_FALLBACK=1 to bypass every platform
    primitive and exercise the volatile-fn-ptr fallback path.
    test_wipe runs against this build to prove the fallback
    still zeroes correctly. A secondary objdump grep asserts
    the fallback library has zero `call <explicit_bzero@plt>`
    references, catching a regression where a future
    contributor adds a primitive without gating on the force
    macro. Critic R4-coverage mitigation -- without this, every
    matrix lane would silently select explicit_bzero and the
    fallback would ship untested.

Out-of-scope, deliberately not addressed:

  - TLS-state lifetime: the per-thread ksuid_tls_rng_t survives
    until the OS reclaims the TLS block. Wiping at thread exit is
    issue #4. A `TODO(#4)` banner in rand_tls.c documents the
    boundary so a reader does not assume this PR closed it.

13/13 tests pass on both auto and force-fallback builds; clang-
tidy 22 still reports zero findings; gst-indent leaves the
working tree untouched; meson reports `wipe backend: explicit_bzero
(<string.h>)` on Linux glibc 2.43.

macOS Clang on the macos-latest runner failed to compile wipe.h with

  ../libksuid/wipe.h:83:3: error: call to undeclared function
  'memset_s'; ISO C99 and later do not support implicit function
  declarations [-Wimplicit-function-declaration]

The other matrix lanes pass: Ubuntu GCC + Clang select
explicit_bzero (<string.h>); Windows MSVC selects SecureZeroMemory;
the wipe-fallback job forces the volatile path. Only macOS falls
through every explicit_bzero probe (the macos-latest SDK doesn't
expose the prototype the way the probe is shaped) and lands on the
memset_s arm.

Root cause: on libcs that ship memset_s (Apple libc here; glibc and
musl do not provide it), the prototype in <string.h> is gated behind
__STDC_WANT_LIB_EXT1__. wipe.h tried to opt in by defining the
macro right before its conditional <string.h> include, but the
header had already pulled <string.h> in unconditionally at the top
-- and the include guard prevented the second include from
re-emitting the prototype. The opt-in came too late.

Fix: when the meson probe selects the memset_s arm, also push
-D__STDC_WANT_LIB_EXT1__=1 into common_args so every translation
unit sees the macro set BEFORE its first <string.h> include,
regardless of include order. The redundant `#define` inside wipe.h
becomes a no-op and is replaced with a comment that explains why
the project-wide define is the only correct fix.
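
The ordering constraint can be shown in a few lines. This is a sketch under assumptions: `demo_wipe_annex_k` is a hypothetical name, and the `__STDC_LIB_EXT1__ || __APPLE__` test is a simplification standing in for the project's meson probe.

```c
/* The opt-in must be visible before the FIRST inclusion of <string.h> in
 * the translation unit. Defining it after any header that already pulled
 * <string.h> in has no effect, because the include guard skips the second
 * pass. A project-wide -D__STDC_WANT_LIB_EXT1__=1 avoids the ordering
 * problem altogether. */
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>
#include <stddef.h>

static void demo_wipe_annex_k(void *p, size_t n)
{
    if (!p || !n)
        return;
#if defined(__STDC_LIB_EXT1__) || defined(__APPLE__)
    /* Annex K: the standard requires memset_s to perform the stores
     * regardless of whether the buffer is read again. */
    (void)memset_s(p, n, 0, n);
#else
    /* No Annex K on this libc (glibc, for instance): a volatile byte loop
     * keeps the sketch compilable; it is not the shim's real fallback. */
    volatile unsigned char *vp = (volatile unsigned char *)p;
    for (size_t i = 0; i < n; i++)
        vp[i] = 0;
#endif
}
```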

Linux glibc 2.43 is unaffected: glibc does not ship memset_s, so
the memset_s arm is never selected there, and the macro would be
harmless even if it were set. The primitive that gets selected on
Linux GCC is still explicit_bzero (<string.h>) per the meson
summary line.

Verified locally on Linux GCC: 13/13 tests pass on both auto and
KSUID_FORCE_VOLATILE_FALLBACK builds; meson summary unchanged. The
fix is small enough that it is being landed on the same PR as the
shim itself rather than as a separate follow-up.
@justinjoy justinjoy merged commit cd155c9 into main Apr 30, 2026
11 checks passed
@justinjoy justinjoy deleted the feature/issue-2-explicit-bzero branch April 30, 2026 06:30
Development

Successfully merging this pull request may close these issues.

Wipe CSPRNG state with explicit_bzero / SecureZeroMemory shim (defeat DSE)
