
A couple of sanitizer-related tweaks #41808

Draft
mrc0mmand wants to merge 5 commits into systemd:main from mrc0mmand:test-san-tweaks

mrc0mmand commented Apr 24, 2026

The sanitizer job was FUBAR on Fedora Rawhide (and, to some extent, RHEL 10) due to several changes:

  • the latest LLVM (v22) introduced a change that occasionally generates a false-positive warning when running with sanitizers
  • several tools had to be "ASan-wrapped" because:
    • util-linux started linking against libsystemd, which propagated to other tools depending on its shared libraries (like libmount)
    • libssh started depending on libfido2, which depends on libudev; this resulted in an interesting dependency chain where the tpm2 utils got a dependency on libudev through libcurl -> libssh -> libfido2
  • polkit added MemoryMax= to its service file, which is incompatible with ASan runs (at least with the current limits)

See the commits for more detailed descriptions.

Also, one note: the sanitizer job is currently still FUBAR on Fedora Rawhide (more specifically, TEST-50-DISSECT and TEST-58-REPART), because mkfs.erofs also gained a dependency on libudev (through libcurl, see above). Wrapping it currently doesn't work, as it also depends on libqpl, which is linked with libtsan (and TSan is incompatible with the other sanitizers). This is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=2461146

github-actions Bot commented Apr 24, 2026

Claude review of PR #41808 (ef3c073)

Suggestions

  • mkfs.erofs ASan/TSan conflict (dismissed; mkosi/mkosi.sanitizers/mkosi.postinst:88): Adding mkfs.erofs to the ASan wrap list while it links libtsan via libqpl causes a sanitizer incompatibility; consider omitting it until resolved, or adding a comment
  • grep condition incomplete for memory limits (mkosi/mkosi.sanitizers/mkosi.postinst:15): The grep only matches MemoryDeny/SystemCall, but the override now also clears MemoryMax/MemorySwapMax; the comment says "drop any memory limits", but not all such units are matched
  • Consider also resetting MemoryHigh= (mkosi/mkosi.sanitizers/mkosi.postinst:19): If the goal is to remove memory pressure from sanitized services, clearing MemoryHigh= (the soft throttling limit) would be consistent with clearing MemoryMax= and MemorySwapMax=
  • Consider also resetting MemoryZSwapMax= (mkosi/mkosi.sanitizers/mkosi.postinst:15): For completeness, consider also matching and resetting MemoryZSwapMax=, another memory-limiting cgroup directive that could interact poorly with sanitizer-inflated memory usage
  • Startup memory limit variants not reset (mkosi/mkosi.sanitizers/mkosi.postinst:20): The drop-in clears MemoryMax/MemoryHigh/MemorySwapMax but not their Startup* counterparts; if a unit explicitly sets e.g. StartupMemoryMax=, that limit survives
  • verify_asan_link_order in LSAN_OPTIONS has no effect (mkosi/mkosi.sanitizers/mkosi.postinst:153): verify_asan_link_order is an ASan flag, but it is placed in LSAN_OPTIONS, where it is ignored; consider moving it to ASAN_OPTIONS or dropping it
  • ASAN_RT_PATH is now dead code (mkosi/mkosi.sanitizers/mkosi.postinst:151): After removing LD_PRELOAD, the ~15 lines computing ASAN_RT_PATH via ldd are unused and include a potentially confusing exit 1 path
  • Stale LD_PRELOAD export in TEST-07-PID1 (test/units/TEST-07-PID1.user-namespace-path.sh:14): The test sources systemd-asan-env and exports LD_PRELOAD, but the env file no longer sets LD_PRELOAD after this PR. The comment says lsns needs the sanitizer preload to work; if so, this test is silently broken

Nits

  • Grammar in comment (test/integration-tests/integration-test-wrapper.py:147): "can throw following warning" should be "can throw the following warning"
  • Inconsistent tpm2 tool examples (dismissed; mkosi/mkosi.sanitizers/mkosi.postinst:110): The commit message uses tpm2_pcrread, but the code comment uses tpm2_readpublic and tpm2_pcrextend
  • Missing TODO/FIXME for temporary workaround (test/integration-tests/integration-test-wrapper.py:144): The comment says "temporarily" but has no TODO/FIXME marker to track removal once the upstream LLVM issue is fixed
  • Regex negative lookahead specificity (dismissed; test/integration-tests/integration-test-wrapper.py:157): The negative lookahead (?!WARNING: ptrace) only excludes warnings starting with exactly that text; a slightly more flexible pattern would be more resilient against upstream wording changes
  • Comment doesn't reflect additional ASan options (mkosi/mkosi.sanitizers/mkosi.postinst:143): The comment only mentions disabling LSan for external binaries, but the ASAN_OPTIONS line now also sets quarantine_size_mb=0 and malloc_context_size=0

bluca commented Apr 24, 2026

  • polkit added MemoryMax= to its service file, which is incompatible with ASan-runs (at least with the current limits)

yeah this is BS, I'll file a revert as it makes no sense

mrc0mmand force-pushed the test-san-tweaks branch 2 times, most recently from 00eaa17 to 3c66f08 (April 24, 2026 13:25)
bluca added the ci-fails/needs-rework label and removed the good-to-merge/with-minor-suggestions label on Apr 24, 2026
mrc0mmand marked this pull request as draft April 25, 2026 10:06
github-actions bot added the please-review label and removed the ci-fails/needs-rework label on Apr 25, 2026
mrc0mmand removed the please-review label on Apr 25, 2026
mrc0mmand force-pushed the test-san-tweaks branch 2 times, most recently from 1449175 to 2e91179 (April 25, 2026 15:05)
Turns out that the util-linux dep on libsystemd caused more fun than I
originally anticipated:

$ lddtree /usr/bin/dfuzzer
dfuzzer => /usr/bin/dfuzzer (interpreter => /lib64/ld-linux-x86-64.so.2)
    libgio-2.0.so.0 => /lib64/libgio-2.0.so.0
        libgmodule-2.0.so.0 => /lib64/libgmodule-2.0.so.0
        libz.so.1 => /lib64/libz.so.1
        libmount.so.1 => /lib64/libmount.so.1
            libblkid.so.1 => /lib64/libblkid.so.1
            libsystemd.so.0 => /lib64/libsystemd.so.0
                libm.so.6 => /lib64/libm.so.6
                    ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
        libselinux.so.1 => /lib64/libselinux.so.1
            libpcre2-8.so.0 => /lib64/libpcre2-8.so.0
...

Also, the tpm2 utils now depend on libudev through the libcurl -> libssh ->
libfido2 dependency chain:

$ lddtree /usr/bin/tpm2_pcrread
tpm2_pcrread => /usr/bin/tpm2_pcrread (interpreter => /lib64/ld-linux-x86-64.so.2)
    ...
    libcurl.so.4 => /lib64/libcurl.so.4
    ...
        libssh.so.4 => /lib64/libssh.so.4
            libfido2.so.1 => /lib64/libfido2.so.1
                libcbor.so.0.13 => /lib64/libcbor.so.0.13
                libudev.so.1 => /lib64/libudev.so.1
                    libgcc_s.so.1 => /lib64/libgcc_s.so.1
...

Follow-up for 8030e0b.
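A rough way to see which tools pull in an ASan-instrumented library is to walk lddtree output like the trees above. The Python sketch below assumes lddtree's four-space indentation per nesting level and treats libsystemd.so.0 and libudev.so.1 as the instrumented libraries in the sanitizer image; `find_dependency_chain()` and `needs_asan_wrap()` are hypothetical helpers, not part of the mkosi scripts.

```python
from typing import Optional

# Assumption: in the sanitizer image these are the ASan-instrumented libraries.
SANITIZED_LIBS = ("libsystemd.so.0", "libudev.so.1")


def find_dependency_chain(lddtree_output: str, target: str) -> Optional[list[str]]:
    """Return the chain of entries from the root binary down to `target`, or None."""
    stack: list[tuple[int, str]] = []  # (indentation, name) of the current ancestor path
    for raw in lddtree_output.splitlines():
        stripped = raw.strip()
        if not stripped or stripped == "...":  # skip empty and elided lines
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        name = stripped.split(" => ", 1)[0]
        # Drop siblings and deeper entries so that only ancestors remain.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        if name == target:
            return [entry for _, entry in stack]
    return None


def needs_asan_wrap(lddtree_output: str) -> bool:
    """A tool needs wrapping if any instrumented library is in its dependency tree."""
    return any(find_dependency_chain(lddtree_output, lib) for lib in SANITIZED_LIBS)


DFUZZER = """\
dfuzzer => /usr/bin/dfuzzer (interpreter => /lib64/ld-linux-x86-64.so.2)
    libgio-2.0.so.0 => /lib64/libgio-2.0.so.0
        libgmodule-2.0.so.0 => /lib64/libgmodule-2.0.so.0
        libz.so.1 => /lib64/libz.so.1
        libmount.so.1 => /lib64/libmount.so.1
            libblkid.so.1 => /lib64/libblkid.so.1
            libsystemd.so.0 => /lib64/libsystemd.so.0
"""

print(" -> ".join(find_dependency_chain(DFUZZER, "libsystemd.so.0")))
# dfuzzer -> libgio-2.0.so.0 -> libmount.so.1 -> libsystemd.so.0
```

The stack-based walk only relies on relative indentation, so the elided "..." lines in the trees above do not break the parent/child relationships.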
Memory usage under sanitizers is quite unpredictable, so let's drop memory limits from the units in the sanitizer runs.

This is currently relevant mainly for Polkit, as it introduced memory
limits for its polkitd.service unit in the latest version [0], which are
very easy to trigger when running under sanitizers (as polkitd depends
on libsystemd, which brings ASan into polkitd's address space).

[0] polkit-org/polkit@7d9c06c
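For reference, such limits can be reset with a unit drop-in. This Python sketch only renders the drop-in contents and its location; the file name `50-sanitizer-memory.conf` is a hypothetical choice, and it assumes (as the review comments suggest) that the override clears MemoryMax= and MemorySwapMax= with empty assignments, which reset those limits to their defaults.

```python
from pathlib import PurePosixPath

# Hypothetical drop-in name; the real postinst script may use a different one.
DROPIN_NAME = "50-sanitizer-memory.conf"

# An empty assignment resets each limit to its default, i.e. no limit.
MEMORY_RESET_DROPIN = """\
[Service]
MemoryMax=
MemorySwapMax=
"""


def dropin_path(unit_path: str) -> PurePosixPath:
    """Drop-ins for foo.service live in a foo.service.d/ directory next to the unit."""
    unit = PurePosixPath(unit_path)
    return unit.parent / f"{unit.name}.d" / DROPIN_NAME


print(dropin_path("/usr/lib/systemd/system/example.service"))
print(MEMORY_RESET_DROPIN, end="")
```

Using a drop-in rather than editing the shipped unit file keeps the vendor unit untouched, so the override is easy to spot and remove once it is no longer needed.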

LLVM 22 introduced an additional check for the ptrace() syscall when
invoking sanitizers [0], which currently produces a false-positive
warning when running some of our units under sanitizers:

[   47.524680] systemd-timedated[740]: ==740==WARNING: ptrace appears to be blocked (is seccomp enabled?). LeakSanitizer may hang.
[   47.524680] systemd-timedated[740]: ==740==Child exited with signal 15.
...
[ 1555.734223] systemd-oomd[93]: ==93==WARNING: ptrace appears to be blocked (is seccomp enabled?). LeakSanitizer may hang.
[ 1555.734223] systemd-oomd[93]: ==93==Child exited with signal 15.
...

It is a false positive because we disable the seccomp filters
system-wide for our units in the sanitizer jobs.

Now, from what I've seen so far, this happens only in
Type=notify(-reload) units that also use bus_event_loop_with_idle().
The ptrace() check from [0] only checks whether its child process was
killed by _any_ signal. Combined, this means that if the unit exits on
its own after becoming idle and something then sends it SIGTERM (either
via an explicit `systemctl stop` or during system shutdown), the SIGTERM
might hit the ptrace()-check child process spawned by the sanitizer
handler (as we also send the signal to all processes in the target
cgroup). The parent process then mistakenly evaluates this as a blocked
ptrace() syscall, even though the check process wasn't killed by SIGSYS.
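The race can be illustrated with a small Python sketch, assuming the check boils down to inspecting how the check child terminated. `naive_ptrace_blocked()` mirrors the behavior described above and `strict_ptrace_blocked()` only accepts SIGSYS; both names are hypothetical, and the child is a plain sleeping process standing in for the sanitizer's check child, not the actual LLVM code.

```python
import signal
import subprocess
import sys


def naive_ptrace_blocked(returncode: int) -> bool:
    # Mirrors the flawed logic: dying from *any* signal counts as "ptrace blocked".
    return returncode < 0


def strict_ptrace_blocked(returncode: int) -> bool:
    # A seccomp filter that kills the caller makes it terminate as if by SIGSYS,
    # so only that signal really indicates a blocked ptrace().
    return returncode == -signal.SIGSYS


def check_child_hit_by_sigterm() -> int:
    # Stand-in for the check child that is still alive when SIGTERM is sent
    # to every process in the unit's cgroup.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGTERM)
    return child.wait()  # Popen reports "killed by signal N" as -N


rc = check_child_hit_by_sigterm()
print(rc, naive_ptrace_blocked(rc), strict_ptrace_blocked(rc))
# prints "-15 True False" on Linux
```

The naive check reports a blocked ptrace() for a child that was merely caught by the unit's SIGTERM, which is exactly the false positive seen in the logs above.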

I reported this to the LLVM project as [1], but let's also temporarily
ignore the warning in the sanitizer report processing, as it currently
causes annoying test failures.

[0] llvm/llvm-project@a708b4b
[1] llvm/llvm-project#193714
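Ignoring the warning in the report processing could look roughly like the Python sketch below. The pattern is an approximation built around the `(?!WARNING: ptrace)` lookahead mentioned in the review; the actual regex in test/integration-tests/integration-test-wrapper.py may differ, and `sanitizer_reports()` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator

# Approximation: match "==PID==ERROR:" / "==PID==WARNING:" sanitizer lines,
# except the LLVM 22 ptrace false positive.
SANITIZER_REPORT = re.compile(
    r"==\d+==(?!WARNING: ptrace appears to be blocked)(ERROR|WARNING):"
)


def sanitizer_reports(journal_lines: Iterable[str]) -> Iterator[str]:
    """Yield journal lines that look like real sanitizer reports."""
    for line in journal_lines:
        if SANITIZER_REPORT.search(line):
            yield line


lines = [
    "systemd-timedated[740]: ==740==WARNING: ptrace appears to be blocked (is seccomp enabled?). LeakSanitizer may hang.",
    "systemd-timedated[740]: ==740==Child exited with signal 15.",
    # Hypothetical real report, for contrast:
    "systemd-foo[42]: ==42==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010",
]
for report in sanitizer_reports(lines):
    print(report)
```

Only the hypothetical AddressSanitizer error is reported; the ptrace warning is skipped by the lookahead, and the "Child exited with signal 15." line never matched in the first place because it carries no ERROR/WARNING tag.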
…aries

Let's drop the quarantine that ASan uses for use-after-free detection,
as it's pointless in wrapped binaries and can consume up to 256 MiB of
memory (with the default configuration). Also, don't keep any stack
traces for allocations & deallocations, which should (slightly) help
with both memory & performance overhead.
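The resulting option set for wrapped binaries can be sketched in Python. The real wrapper is generated by mkosi/mkosi.sanitizers/mkosi.postinst; here `detect_leaks=0` is assumed from the "disabling LSan for external binaries" comment mentioned in the review, and `merge_asan_options()` and `run_wrapped()` are hypothetical helpers.

```python
import os
import subprocess

# ASan runtime flags for tools that only load an instrumented library
# (e.g. via libsystemd/libudev) and are not themselves under test.
WRAPPED_BINARY_ASAN_FLAGS = {
    "detect_leaks": "0",         # assumption: LSan is disabled for external binaries
    "quarantine_size_mb": "0",   # no use-after-free quarantine (256 MiB by default)
    "malloc_context_size": "0",  # don't record stack traces for allocations/frees
}


def merge_asan_options(existing: str, overrides: dict[str, str]) -> str:
    """Merge a colon-separated ASAN_OPTIONS string with overrides (overrides win)."""
    flags: dict[str, str] = {}
    for item in filter(None, existing.split(":")):
        key, _, value = item.partition("=")
        flags[key] = value
    flags.update(overrides)
    return ":".join(f"{key}={value}" for key, value in flags.items())


def run_wrapped(argv: list[str]) -> int:
    """Run an external tool with the wrapped-binary ASan options applied."""
    env = dict(os.environ)
    env["ASAN_OPTIONS"] = merge_asan_options(env.get("ASAN_OPTIONS", ""), WRAPPED_BINARY_ASAN_FLAGS)
    return subprocess.run(argv, env=env, check=False).returncode


print(merge_asan_options("detect_leaks=1:abort_on_error=1", WRAPPED_BINARY_ASAN_FLAGS))
# detect_leaks=0:abort_on_error=1:quarantine_size_mb=0:malloc_context_size=0
```

Merging instead of replacing keeps any options already exported by the test environment, while still forcing the three memory-related settings for the wrapped tool.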

The original `find` also matched our test units, which caused issues
once the check was extended with Memory*= directives: we stripped them
from the TEST-55-OOMD test units, which certainly need them.
Since the stripping was meant primarily for "production-grade" units,
let's limit it to units under /etc/systemd/system/ and
/usr/lib/systemd/system/.
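A Python sketch of the narrowed selection: a unit is only considered if it lives under one of the two production directories and sets one of the trigger directives. The trigger list (MemoryDeny*, SystemCall*, MemoryMax=, MemorySwapMax=) is an assumption pieced together from the review comments, and `should_strip()` is a hypothetical name; `PurePath.is_relative_to()` needs Python 3.9 or newer.

```python
import re
from pathlib import PurePosixPath

# Only "production-grade" unit directories, per the commit message above.
PRODUCTION_UNIT_DIRS = (
    PurePosixPath("/etc/systemd/system"),
    PurePosixPath("/usr/lib/systemd/system"),
)

# Assumed trigger directives, e.g. MemoryDenyWriteExecute=, SystemCallFilter=, MemoryMax=.
TRIGGER = re.compile(
    r"^\s*(MemoryDeny|SystemCall|MemoryMax|MemorySwapMax)\w*=", re.MULTILINE
)


def should_strip(unit_path: str, unit_text: str) -> bool:
    """Return True if the unit should get the sanitizer override drop-in."""
    path = PurePosixPath(unit_path)
    if not any(path.is_relative_to(d) for d in PRODUCTION_UNIT_DIRS):
        return False  # test units (e.g. for TEST-55-OOMD) keep their Memory*= settings
    return TRIGGER.search(unit_text) is not None


print(should_strip("/usr/lib/systemd/system/example.service", "[Service]\nMemoryMax=64M\n"))
```

Checking the location first means a test unit is never touched, no matter which directives it sets, which is the behavior TEST-55-OOMD relies on.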