Skip to content

ci: fix gcc-ASAN workflow — adopt r-hub container on x86#262

Merged
ms609 merged 5 commits into
mainfrom
claude/lucid-burnell-130149
Jul 3, 2026
Merged

ci: fix gcc-ASAN workflow — adopt r-hub container on x86#262
ms609 merged 5 commits into
mainfrom
claude/lucid-burnell-130149

Conversation

@ms609

@ms609 ms609 commented Jul 3, 2026

Copy link
Copy Markdown
Owner

Problem

The gcc-ASAN workflow on main has been broken and disabled_manually.
main's ASan.yml was frozen since Sept 2025 on the bare-runner +
manual-makevars
approach, which hits a kernel ASLR/shadow-memory clash on
current ubuntu-24.04 runners:

==NNNN==Shadow memory range interleaves with an existing memory mapping.
        ASan cannot proceed correctly. ABORTING.
==NNNN==This might be related to ELF_ET_DYN_BASE change in Linux 4.12.

This aborts before any R code runs. (Note: the arm-runner/r-hub-container
breakage seen on unmerged branches never affected main — that was a
separate red herring.)

Rejected fixes (empirically)

Both standard ASLR workarounds were tried and verified to still abort:

Attempt Commit Result
sudo sysctl vm.mmap_rnd_bits=28 8a5de02 sysctl applied, shadow-memory abort persisted (run 28644456099)
setarch "$(uname -m)" -R (disable ASLR outright) 9a2ef4c / f8b1ac0 still aborted identically (run 28645101973)

The shadow-memory clash is x86-bare-runner-specific and not reliably
fixable this way on today's runner images.

Fix

Adopt the r-hub gcc-asan container on ubuntu-latest (x86) via
ms609/actions/asan@main — the container ships R-devel already built with
ASAN/UBSAN and sidesteps the shadow-memory clash by construction. This is the
exact config TreeTools runs green (42 min on 2026-06-16, and a fresh 17-min
green run today). Adds exclude-packages: Rogue (skips Rogue's ~30-min Rfast
source build) and remotes to Config/Needs/memcheck (the vignettes leg's
devtools::build_vignettes() needs it; devtools 2.5.0+ demoted remotes to
Suggests).

Verification

  • Run 28646564439:
    tests + examples green under real ASAN; vignettes failed only on the
    remotes dependency.
  • Run 28648322374:
    all three legs green (tests/examples/vignettes) after the remotes fix.

No memory-safety findings surfaced in TreeSearch's own C++.

Merge note

Please squash-merge: the branch carries 3 abandoned sysctl/setarch commits
(kept for the empirical record) before the working container fix (562dd70)
and the remotes fix (8e9c510).

🤖 Generated with Claude Code

ms609 and others added 5 commits July 3, 2026 08:05
main's ASan.yml never received the vm.mmap_rnd_bits=28 fix from
867df93 (2026-03-24) -- that commit, and everything after it
(container/arm experiments), landed only on unmerged feature
branches. Main was still on the pre-fix Sept-2025 config, so
GitHub's current ubuntu-24.04 runners hit "Shadow memory range
interleaves with an existing memory mapping" (google/sanitizers#856)
on every dispatch.

Also add remotes to Config/Needs/memcheck: the vignettes leg was
failing separately on "The package 'remotes' is required" from
devtools::build_vignettes().

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
vm.mmap_rnd_bits=28 (the original 867df93 fix) no longer prevents
the shadow-memory clash on today's ubuntu-24.04 runners -- verified
run 28644456099 still aborted with "Shadow memory range interleaves"
in both tests and examples legs even with the sysctl applied
(confirmed applied: vm.mmap_rnd_bits = 28 in the log). The March fix
was already probabilistic ("vignettes happened to avoid the
collision"), not deterministic.

Lower rnd_bits to 8 and additionally wrap the ASAN Rscript invocation
in `setarch $(uname -m) -R`, which disables ASLR outright
(personality ADDR_NO_RANDOMIZE) for that process tree -- a
deterministic fix rather than a smaller random window.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
28 is the minimum vm.mmap_rnd_bits the kernel accepts on x86_64
(sysctl rejected 8 with "Invalid argument", killing all three legs
in ~13s). Keep the sysctl at 28 and rely on the setarch -R wrapper
from 9a2ef4c as the actual deterministic ASLR fix.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
… path

The bare-runner + manual-makevars approach on main cannot be made
reliable: it hits a kernel ASLR/shadow-memory clash (google/sanitizers
#856, "Shadow memory range interleaves") that neither sudo sysctl
vm.mmap_rnd_bits=28 (commits 8a5de02) nor `setarch -R` to disable
ASLR outright (9a2ef4c/f8b1ac05) prevents on today's ubuntu-24.04
runners — both were verified to still abort before reaching any R code.

Switch to the r-hub gcc-asan container via ms609/actions/asan@main,
which ships R-devel already built with ASAN/UBSAN and sidesteps the
shadow-memory clash by construction. This is the exact config
TreeTools ran green for 42 min on 2026-06-16 (commit decfe3e9),
before an unrelated ubuntu-latest->ubuntu-24.04-arm switch broke it
(the r-hub image has no arm64 variant; the container dies at init).
We stay on ubuntu-latest (x86) and add exclude-packages: Rogue to
skip Rogue's ~30-min Rfast source build.

Also reverts the now-unnecessary DESCRIPTION Config/Needs/memcheck
remotes addition: the action installs devtools with its hard deps
(which include remotes), so build_vignettes() finds it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Verified run 28646564439 under the r-hub container: tests and examples
passed clean under real ASAN, but vignettes failed on "The package
'remotes' is required" from devtools::build_vignettes(). devtools 2.5.0+
moved remotes to Suggests, so the action's hard-deps install of
Config/Needs/memcheck (devtools) does not pull it. Name it explicitly.
(TreeTools does not hit this — its memcheck uses pkgdown, not devtools.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ms609 ms609 merged commit f2f6de8 into main Jul 3, 2026
10 checks passed
ms609 added a commit that referenced this pull request Jul 3, 2026
The r-hub gcc-asan container image has no arm64 variant; on
ubuntu-24.04-arm the container dies at init before any R work runs
(every run since 2026-06-19 failed in ~30-40s). Cherry-picked from
TreeSearch main's fix (PR #262) so the T-329 fix can be verified
under a working ASan CI.
ms609 added a commit that referenced this pull request Jul 4, 2026
* fix(constraint): reject impossible constraints, guard collapsed-split OOB (T-329)

An impossible (non-laminar) constraint reached random_constrained_tree()
unfiltered: .PrepareConstraint() only filtered splits by size, never
checked pairwise compatibility. In random_constrained_tree(), a split that
loses all its tips to tighter non-laminar splits collapses (split_root =
-1); if it still has a strict-superset parent, the parent's child-split
loop pushed that -1 as a phantom item, causing tree.parent[-1] and
tree.left/right[n_tip-1] out-of-bounds writes.

Primary fix: .PrepareConstraint() now validates pairwise split
compatibility (disjoint or nested) and stops with a clear error before
reaching the C++ builder. Defensive fix: guard split_root[j] >= 0 in
random_constrained_tree()'s child-split collection, mirroring the
existing root-level guard.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

* ci(ASan): switch mem-check runner to ubuntu-latest (x86)

The r-hub gcc-asan container image has no arm64 variant; on
ubuntu-24.04-arm the container dies at init before any R work runs
(every run since 2026-06-19 failed in ~30-40s). Cherry-picked from
TreeSearch main's fix (PR #262) so the T-329 fix can be verified
under a working ASan CI.

* ci(asan): install phangorn for tests leg, exclude MaxMin from vignettes leg

The tests leg only installs hard deps + Config/Needs/memcheck (Suggests are
skipped for tests/examples), but tests/testthat/*.R make heavy unguarded use
of phangorn::phyDat() etc. -> "there is no package called 'phangorn'".  Add
it to Config/Needs/memcheck so it installs regardless of leg.

The vignettes leg installs all immediate Suggests via a single
pak::pkg_install() batch, which resolves plain package names against
CRAN/Bioconductor only - it doesn't see the local DESCRIPTION's Remotes
mapping.  MaxMin is GitHub-only (Remotes: ms609/MaxMin), so that batched
solve fails outright with "Can't find package called MaxMin", taking every
other Suggest down with it.  Exclude MaxMin, mirroring the existing Rogue
exclusion: its only use (WideSample()) is requireNamespace-guarded and no
vignette/example exercises it.

* fix: guard zero-length memcpy in TreeState::load_tip_states

Datasets whose characters are all uninformative (e.g. every observed
state is a singleton) collapse to zero blocks and a zero-length
tip_states vector in build_dataset(). std::vector::data() on an empty
vector may legally return nullptr, which memcpy's nonnull-parameter
attribute forbids — flagged by UBSan (gcc-ASAN run 28662381835) as a
null-pointer violation reached via ts_char_steps() -> init_from_edge()
-> load_tip_states(). Skip the copy when there is nothing to copy.

Covered by the existing "ConcordantInformation() works" regression
test in test-Concordance.R, which already exercises an all-singleton
dataset via ConcordantInformation() -> CharacterLength() ->
ts_char_steps().

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

* fix(constraint): use four-gamete split compatibility, not laminar-subset (T-329 follow-up)

T-329 (19f4f97) added a laminarity gate to .PrepareConstraint() that tested
each constraint character's "1"-side as a full bipartition vs everything-else
and accepted only disjoint-or-nested pairs. This rejected SATISFIABLE
constraints: it (1) folded wildcard "?" and unconstrained tips into the "0"
side though they are free to plot either side, and (2) tested only three of
the four split-compatibility cases, missing the case where the two "0" sides
are disjoint (the "1" sides jointly cover the constrained tips).

The gcc-ASAN CI (run 28662381835) surfaced this on the pre-existing Morphy
test and the tree-search vignette "complex-constraints" chunk (ab|cef & abcd|ef,
g free) -- both displayable on ((a,b),(d,(c,(e,f))))+g. T-329's own "impossible"
example {t1,t2,t3}&{t3,t4,t5} was likewise satisfiable (coexist on
((t1,t2),t3,(t4,t5))).

Fix: build a companion 0-group matrix and apply the four-gamete test on both
groups (wildcards excluded) -- incompatible iff all four group intersections
are non-empty. Valid constraints return to their working pre-T-329 path;
genuinely-impossible ones still error. The C++ OOB guard (ts_wagner.cpp
split_root[j] >= 0) is the real crash fix and is retained unchanged.

Tests: corrected the first T-329 case to a genuinely four-gamete-incompatible
constraint ({t1,t2,t3}&{t2,t3,t4}); added a positive regression asserting the
non-laminar-but-compatible case is accepted. Verified locally: Morphy suite,
vignette chunk, and 446 constraint tests all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(wagner): guard zero-Fitch-words start-tree build (all-hierarchy HSJ/xform)

When every character belongs to a hierarchy block, the recoded equal-weights
dataset has zero blocks, so DataSet::total_words == 0 and the per-word Fitch
state vectors (ds.tip_states, edge_set) are empty. wagner_tree() unconditionally
took the address of element 0 of those empty vectors
(&ds.tip_states[tip * total_words] == &ds.tip_states[0], and likewise
&edge_set[node * tw]) to feed the incremental insertion-cost machinery. That is
undefined behaviour (address of a past-the-end element of an empty vector) and
aborted under the hardened libstdc++ assertions in the gcc-ASAN CI
(run 28662381835, ts-xform group; _GLIBCXX_ASSERTIONS operator[] __n < size()).

With no Fitch characters every insertion cost is identically zero, so guard the
per-word work behind have_words = ds.total_words > 0: skip the edge-set
precompute and evaluate the indirect length as 0, while still running the DFS so
constraints are honoured and the first legal edge is chosen. The search then
optimises the tree from the Sankoff (xform) / hierarchy-DP (HSJ) term alone.
The have_words == true path is behaviourally identical to before.

Regression tests (test-ts-xform.R, test-ts-hsj.R) run an all-hierarchy dataset
through MaximizeParsimony() so the hardened/ASAN CI covers the zero-words path.
Verified locally with a -D_GLIBCXX_ASSERTIONS -O0 -g build: the whole ts-xform
group (incl. SK-01, which previously aborted) and ts-hsj group now pass with
no abort; a normal equal-weights search is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant