Add: opt-in compiler sanitizers (ASAN/UBSan + TSAN) for host targets#915
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR introduces comprehensive ASAN/TSAN/UBSAN sanitizer support. A new sanitizer module validates preset/token combinations and maps to runtime libraries. Build and test systems thread sanitizer flags through CMake targets and compiler invocations. CI adds a nightly job to run instrumented scene tests on both platforms. ChangesSanitizer Support (ASAN/TSAN/UBSAN) Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces compiler sanitizer support (ASAN, TSAN, and UBSAN) for host-compiled targets, including the simulation runtime, kernels, orchestration, and onboard host runtime. It adds build-time and run-time configuration options, a centralized sanitizer helper module, and updates to the testing documentation and CI schedule. Feedback on these changes suggests adding --no-build-isolation to the documented pip command in CMakeLists.txt, importing from __future__ import annotations in Python files using PEP 585 generic collections to ensure compatibility with Python versions earlier than 3.10, and quoting CMake path variables like ${SIMPLER_CMAKE_DIR} to robustly handle directories containing spaces.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 561-575: The checkout step currently uses actions/checkout@v5 and
setup-python@v6 unpinned and leaves the workflow token persisted; update the
checkout invocation referenced as "actions/checkout" to include
persist-credentials: false and replace both "uses: actions/checkout@v5" and
"uses: actions/setup-python@v6" with their corresponding immutable commit SHAs
(pin to a specific commit SHA) so the workflow uses pinned action versions and
does not persist credentials.
- Around line 6-9: The workflow-level schedule cron currently triggers the
entire workflow (causing jobs like pre-commit, packaging-matrix, ut, and
detect-changes to run on cron), but only sanitizer-sim was intended; update the
workflow so that either the sanitizer-sim job is the only one gated to the
schedule or move the cron into a separate workflow: add explicit job-level
guards such as if: github.event_name == 'pull_request' to jobs pre-commit,
packaging-matrix, ut, and detect-changes (or relocate the schedule entry into a
new workflow containing only the sanitizer-sim job) so cron runs only execute
the intended sanitizer-sim job.
In `@simpler_setup/sanitizers.py`:
- Around line 51-58: The validate function currently allows the 'thread'
sanitizer everywhere; update validate(tokens: str) to detect non-Linux hosts and
raise a ValueError when 'thread' (TSAN) is requested on non-Linux platforms:
check the platform (e.g., via sys.platform or platform.system()) at the start of
validate and if 'thread' is present in toks and the platform is not Linux, raise
a clear ValueError explaining TSAN is Linux-only; keep the existing
_THREAD_INCOMPATIBLE check and error message for the other incompatibilities.
- Around line 61-75: The preload_lib function currently ignores the raw token
"leak", so callers like resolve() that accept raw token lists never trigger
preloading for leak builds; update preload_lib(tokens: str) to treat "leak" the
same as "address" (return "libasan.so") or alternatively add upfront validation
in resolve() to reject unsupported raw tokens (including "leak") with a clear
error. Locate the preload_lib function and either add "leak" to the toks checks
alongside "address" or implement token validation in resolve() to raise on
unsupported tokens so the sanitizer runtime requirement isn't silently missed.
In `@simpler_setup/scene_test.py`:
- Around line 1355-1368: After resolving and validating the sanitizer tokens via
_san_resolve and _san_validate, perform the same "sanitizer runtime preloaded"
guard that the conftest path uses before assigning KernelCompiler._sanitizers;
if the runtime is not preloaded, call parser.error with an explanatory message
(similar to the conftest guard) so standalone runs (invoking _san_tokens and
KernelCompiler._sanitizers) fail early with an actionable error instead of
hitting dlopen/runtime failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0af69c3c-ca69-4f7e-80d8-80658aa4de1f
📒 Files selected for processing (22)
.github/workflows/ci.ymlCMakeLists.txtcmake/sanitizers.cmakeconftest.pydocs/ci.mddocs/testing.mdsimpler_setup/build_runtimes.pysimpler_setup/kernel_compiler.pysimpler_setup/runtime_compiler.pysimpler_setup/sanitizers.pysimpler_setup/scene_test.pysimpler_setup/toolchain.pysrc/a2a3/platform/onboard/host/CMakeLists.txtsrc/a2a3/platform/sim/aicore/CMakeLists.txtsrc/a2a3/platform/sim/aicpu/CMakeLists.txtsrc/a2a3/platform/sim/host/CMakeLists.txtsrc/a5/platform/onboard/host/CMakeLists.txtsrc/a5/platform/sim/aicore/CMakeLists.txtsrc/a5/platform/sim/aicpu/CMakeLists.txtsrc/a5/platform/sim/host/CMakeLists.txtsrc/common/log/CMakeLists.txtsrc/common/sim_context/CMakeLists.txt
a6e1c15 to
bd91022
Compare
Closes part of hw-native-sys#904. Sanitizers are host-toolchain-only, threaded through a single --sanitizer parameter (no env vars), covering runtime + per-test kernels/orchestration, plus a nightly CI sweep. - cmake/sanitizers.cmake: generic helper, -fsanitize=${SIMPLER_SANITIZERS}; -Wno-error=tsan so TSAN's "atomic_thread_fence not supported" warning doesn't fail the AICPU target's -Werror build (known/acceptable TSAN limitation) - simpler_setup/sanitizers.py: preset table + mutual-exclusion validation + preload lib/command + host_cxx — one source of truth (unit-tested) - toolchain.py: is_host marker; GxxToolchain prefer_g15 — under a sanitizer the sim runtime/helpers/orchestration build with g++-15 to MATCH the sim kernels (mixing g++ and g++-15 sanitizer runtimes is an ABI mismatch that fails at .so load). Onboard host stays g++ (its kernels are device, never instrumented) - runtime + kernel_compiler: --sanitizer -> host targets only (is_host gate) - conftest.py + scene_test.py: --sanitizer compiles kernels to match, validates, and BOTH (pytest + standalone) fail fast with the exact LD_PRELOAD command - top CMakeLists: SIMPLER_SANITIZER cache var (pip install --config-settings) - .github/workflows/sanitizers.yml: SEPARATE nightly workflow (schedule + workflow_dispatch) so cron fires ONLY the sanitizer jobs, never the PR/ self-hosted pipeline (asan/tsan × a2a3sim/a5sim, ubuntu-only) - docs/testing.md + ci.md; tests/ut/py/test_sanitizers.py Sanitizers are Linux-only (macOS gcc-15's libasan isn't found by Apple ld). Validated on Linux (docker): a2a3sim ASAN build + run, dynamic_register 6/6 green (~1.7x overhead); TSAN host/aicpu/aicore build. The pre-existing sim-oversubscription flake is more frequent under ASAN's slower timing, so the nightly job uses a generous timeout and is not a PR gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Implements #904: opt-in compiler sanitizers for host-compiled code, driven by a single
--sanitizerparameter (no env vars), covering the runtime + per-test kernels/orchestration, plus a nightly CI sweep.Scope rule — sanitize iff host-compiled (auto-enforced by
Toolchain.is_host): sim instruments host+aicpu+aicore +sim_context/log+ kernels + orchestration; onboard instruments host only; device toolchains (ccec/aarch64) never. Device custom arenas bypass ASAN redzones (documented limitation).Key design points
--sanitizerparameter, no env vars. Install-time:pip install --config-settings=cmake.define.SIMPLER_SANITIZER=asan .; test-time--sanitizercompiles the per-test kernels to match + fails fast with the exactLD_PRELOADcommand (both pytest and standalone paths)..soload. Onboard host stays g++ (its kernels are device, never instrumented).cmake/sanitizers.cmakeis generic (-fsanitize=${SIMPLER_SANITIZERS});-Wno-error=tsankeeps TSAN's "atomic_thread_fence unsupported" warning from failing the AICPU-Werrorbuild..github/workflows/sanitizers.yml, schedule + workflow_dispatch) so the cron fires only the sanitizer jobs — never the PR/self-hosted pipeline. Not a PR gate.Testing (validated on Linux via docker)
dynamic_register::test_register_after_init_then_run6/6 green (~1.7× overhead). Build instrumentation confirmed vianm(host/aicpu/aicore + helpers);--sanitizer none= 0.__tsaninstrumentation.tests/ut/py/test_sanitizers.py(preset/validate/preload logic) — 20/20.