Skip to content

ci(asan): fix gcc-ASAN runner + deps, fix 3 bugs it surfaced#264

Merged
ms609 merged 8 commits into
cpp-searchfrom
claude/asan-suggests-fix
Jul 4, 2026
Merged

ci(asan): fix gcc-ASAN runner + deps, fix 3 bugs it surfaced#264
ms609 merged 8 commits into
cpp-searchfrom
claude/asan-suggests-fix

Conversation

@ms609

@ms609 ms609 commented Jul 3, 2026

Copy link
Copy Markdown
Owner

Summary

Restores the gcc-ASAN memcheck workflow to a working state on cpp-search
(runner + dependency-install fixes), which surfaced three genuine bugs in
cpp-search's own code — all three are fixed here too.

CI infrastructure (2 commits):

  • dde27674 — switch mem-check runner to the r-hub gcc-asan container on
    ubuntu-latest (x86); the prior bare-runner config hit a kernel
    ASLR/shadow-memory clash and aborted before any R code ran.
  • b0d96e44 — install phangorn for the tests leg (Config/Needs/memcheck)
    and exclude MaxMin from the vignettes leg's Suggests batch-install
    (GitHub-only Remotes: ref that pak::pkg_install() can't resolve,
    taking ~13 other Suggests down with it). WideSample(), the only
    MaxMin:: use, is requireNamespace-guarded.

Bugs surfaced by the fixed CI, now fixed (5 commits):

  • 19f4f973 / f3ad9fe0 — T-329: .PrepareConstraint() rejected
    satisfiable constraints (laminar-subset check folded wildcards into the
    0-side and missed a 4th compatibility case); fixed to proper four-gamete
    compatibility on both groups, with the OOB guard on collapsed splits
    retained.
  • 6c378377 — UBSan nonnull violation in TreeState::load_tip_states:
    guard the zero-length memcpy so an empty tip-state vector doesn't hand
    a null base pointer to a nonnull-declared parameter.
  • 3d15aaab — all-hierarchy HSJ/xform datasets have zero Fitch words
    (total_words == 0), so wagner_tree() took the address of element 0 of
    empty vectors — UB that aborted under hardened libstdc++ assertions.
    Guarded behind have_words = ds.total_words > 0; insertion cost is
    identically zero on that path, DFS still runs so constraints are honoured.

Verification

  • Merge-tree dry run against current cpp-search tip: clean, no conflicts.
  • All three ASan legs (tests/examples/vignettes) reach real ASan execution
    on claude/asan-suggests-fix before these bugfixes (run 28662381835);
    each individual bugfix was validated by its own targeted regression test
    (test-ts-hsj.R, test-ts-xform.R for the wagner guard; existing
    test-Morphy.R/vignette constraint cases for T-329) plus a local
    -D_GLIBCXX_ASSERTIONS build for the wagner fix.
  • Full ASan re-run against this merged branch not yet triggered (workflow
    is disabled_manually to avoid burning CI minutes outside active use) —
    recommend enabling it once for a final green check before merge.

Test plan

  • gh workflow enable ASan.yml --repo ms609/TreeSearch and dispatch on
    this branch; confirm all three legs green, then disable again.
  • Normal R CMD check / existing test suite (unaffected by these changes
    other than the two new regression tests).

🤖 Generated with Claude Code

ms609 and others added 8 commits July 3, 2026 06:39
… OOB (T-329)

An impossible (non-laminar) constraint reached random_constrained_tree()
unfiltered: .PrepareConstraint() only filtered splits by size, never
checked pairwise compatibility. In random_constrained_tree(), a split that
loses all its tips to tighter non-laminar splits collapses (split_root =
-1); if it still has a strict-superset parent, the parent's child-split
loop pushed that -1 as a phantom item, causing tree.parent[-1] and
tree.left/right[n_tip-1] out-of-bounds writes.

Primary fix: .PrepareConstraint() now validates pairwise split
compatibility (disjoint or nested) and stops with a clear error before
reaching the C++ builder. Defensive fix: guard split_root[j] >= 0 in
random_constrained_tree()'s child-split collection, mirroring the
existing root-level guard.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
The r-hub gcc-asan container image has no arm64 variant; on
ubuntu-24.04-arm the container dies at init before any R work runs
(every run since 2026-06-19 failed in ~30-40s). Cherry-picked from
TreeSearch main's fix (PR #262) so the T-329 fix can be verified
under a working ASan CI.
…es leg

The tests leg only installs hard deps + Config/Needs/memcheck (Suggests are
skipped for tests/examples), but tests/testthat/*.R make heavy unguarded use
of phangorn::phyDat() etc. -> "there is no package called 'phangorn'".  Add
it to Config/Needs/memcheck so it installs regardless of leg.

The vignettes leg installs all immediate Suggests via a single
pak::pkg_install() batch, which resolves plain package names against
CRAN/Bioconductor only - it doesn't see the local DESCRIPTION's Remotes
mapping.  MaxMin is GitHub-only (Remotes: ms609/MaxMin), so that batched
solve fails outright with "Can't find package called MaxMin", taking every
other Suggest down with it.  Exclude MaxMin, mirroring the existing Rogue
exclusion: its only use (WideSample()) is requireNamespace-guarded and no
vignette/example exercises it.
Datasets whose characters are all uninformative (e.g. every observed
state is a singleton) collapse to zero blocks and a zero-length
tip_states vector in build_dataset(). std::vector::data() on an empty
vector may legally return nullptr, which memcpy's nonnull-parameter
attribute forbids — flagged by UBSan (gcc-ASAN run 28662381835) as a
null-pointer violation reached via ts_char_steps() -> init_from_edge()
-> load_tip_states(). Skip the copy when there is nothing to copy.

Covered by the existing "ConcordantInformation() works" regression
test in test-Concordance.R, which already exercises an all-singleton
dataset via ConcordantInformation() -> CharacterLength() ->
ts_char_steps().

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…set (T-329 follow-up)

T-329 (19f4f97) added a laminarity gate to .PrepareConstraint() that tested
each constraint character's "1"-side as a full bipartition vs everything-else
and accepted only disjoint-or-nested pairs. This rejected SATISFIABLE
constraints: it (1) folded wildcard "?" and unconstrained tips into the "0"
side though they are free to plot either side, and (2) tested only three of
the four split-compatibility cases, missing the case where the two "0" sides
are disjoint (the "1" sides jointly cover the constrained tips).

The gcc-ASAN CI (run 28662381835) surfaced this on the pre-existing Morphy
test and the tree-search vignette "complex-constraints" chunk (ab|cef & abcd|ef,
g free) -- both displayable on ((a,b),(d,(c,(e,f))))+g. T-329's own "impossible"
example {t1,t2,t3}&{t3,t4,t5} was likewise satisfiable (coexist on
((t1,t2),t3,(t4,t5))).

Fix: build a companion 0-group matrix and apply the four-gamete test on both
groups (wildcards excluded) -- incompatible iff all four group intersections
are non-empty. Valid constraints return to their working pre-T-329 path;
genuinely-impossible ones still error. The C++ OOB guard (ts_wagner.cpp
split_root[j] >= 0) is the real crash fix and is retained unchanged.

Tests: corrected the first T-329 case to a genuinely four-gamete-incompatible
constraint ({t1,t2,t3}&{t2,t3,t4}); added a positive regression asserting the
non-laminar-but-compatible case is accepted. Verified locally: Morphy suite,
vignette chunk, and 446 constraint tests all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…SJ/xform)

When every character belongs to a hierarchy block, the recoded equal-weights
dataset has zero blocks, so DataSet::total_words == 0 and the per-word Fitch
state vectors (ds.tip_states, edge_set) are empty. wagner_tree() unconditionally
took the address of element 0 of those empty vectors
(&ds.tip_states[tip * total_words] == &ds.tip_states[0], and likewise
&edge_set[node * tw]) to feed the incremental insertion-cost machinery. That is
undefined behaviour (address of a past-the-end element of an empty vector) and
aborted under the hardened libstdc++ assertions in the gcc-ASAN CI
(run 28662381835, ts-xform group; _GLIBCXX_ASSERTIONS operator[] __n < size()).

With no Fitch characters every insertion cost is identically zero, so guard the
per-word work behind have_words = ds.total_words > 0: skip the edge-set
precompute and evaluate the indirect length as 0, while still running the DFS so
constraints are honoured and the first legal edge is chosen. The search then
optimises the tree from the Sankoff (xform) / hierarchy-DP (HSJ) term alone.
The have_words == true path is behaviourally identical to before.

Regression tests (test-ts-xform.R, test-ts-hsj.R) run an all-hierarchy dataset
through MaximizeParsimony() so the hardened/ASAN CI covers the zero-words path.
Verified locally with a -D_GLIBCXX_ASSERTIONS -O0 -g build: the whole ts-xform
group (incl. SK-01, which previously aborted) and ts-hsj group now pass with
no abort; a normal equal-weights search is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Brings in the zero-Fitch-words start-tree guard (3d15aaa) alongside the
other two chip fixes (null-ptr load_tip_states, T-329 four-gamete) and
the ASan CI dependency-install fix, so all three genuine bugs surfaced
by the fixed workflow are now on one branch.
@ms609 ms609 merged commit c9ea624 into cpp-search Jul 4, 2026
8 of 10 checks passed
@ms609 ms609 deleted the claude/asan-suggests-fix branch July 4, 2026 04:46
ms609 added a commit that referenced this pull request Jul 4, 2026
…R#264)

Any dataset that collapses to 0 Fitch blocks (total_words == 0) -- not just
the all-hierarchy HSJ/xform case PR #264 fixed in wagner_tree/load_tip_states,
but also plain equal-weights data where every character is constant or an
autapomorphy -- leaves tip_states/prelim/edge_set genuinely empty
std::vectors. Code that unconditionally took the address of element 0 of
these empty vectors is undefined behaviour: it aborts under hardened
libstdc++ assertions/ASan but is silent (out-of-bounds address, never
dereferenced if the following block loop is also 0-trip) in a plain release
build.

Audited every zero-word-reachable site sharing this pattern and guarded each
with the same have_words ternary wagner_tree already uses (never bare
nullptr+0 arithmetic, which UBSan's pointer-overflow check would still flag):

- ts_sector.cpp: compute_from_above_for_sector (called unconditionally on
  every build_reduced_dataset(), i.e. every sector reduction including the
  default rasStarts=1 -- the most severe of these, reachable with no
  non-default control), build_ras_sector (the RAS-restart start-tree
  builder, reachable via SearchControl(rasStarts>=2)), and
  build_reduced_dataset_collapsed (reachable via sectorCollapseTarget>0).
- ts_prune_reinsert.cpp: build_reduced_dataset and expand_and_reinsert,
  reachable via pruneReinsertCycles>0 (auto-selected by the "large" preset
  at nTip>=120).
- ts_temper.cpp: stochastic_tbr_phase (annealing), same "large"-preset
  reachability, needed its own top-level total_words==0 early return
  matching nni_search/tbr_search/ratchet_search/drift_search's existing
  style.
- ts_tree.cpp: TreeState::save_node_state's prealloc fast path -- a shared
  primitive under TBR/ratchet/temper, guarded once at the source.
- ts_wagner.cpp: wagner_goloboff_scores/wagner_entropy_scores (biased-Wagner
  tip scoring), also reachable via the "large" preset.

Regression tests added to test-ts-sector.R (default control, rasStarts=3,
sectorCollapseTarget=6, pruneReinsertCycles=1, annealCycles=1, all through
MaximizeParsimony() on a 40-tip all-uninformative EW dataset) and
test-ts-wagner.R (ts_wagner_bias_bench directly). Reachability of all 6
sites was confirmed empirically (temporary REprintf probes on each
total_words==0 branch, fired under every new test, then removed).

Verified the tests are non-tautological, not just reachability-proxy: a
negative control reverting build_ras_sector's guard back to the unconditional
&rd.data.tip_states[...] reproduces the exact predicted abort
(Assertion '__n < this->size()' failed) under a local -D_GLIBCXX_ASSERTIONS
-O0 -g build, with a gdb backtrace confirming build_ras_sector (ts_sector.cpp)
-> search_sector -> xss_search -> run_single_replicate -> driven_search ->
ts_driven_search, i.e. reachable from the public MaximizeParsimony() API.
Restoring the guard reruns clean. Full targeted suite (sector, wagner, hsj,
xform, anneal, driven, drift, ratchet, tbr, Concordance) passes under the
same hardened build with no behavioural regression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ms609 added a commit that referenced this pull request Jul 4, 2026
…R#264) (#265)

Any dataset that collapses to 0 Fitch blocks (total_words == 0) -- not just
the all-hierarchy HSJ/xform case PR #264 fixed in wagner_tree/load_tip_states,
but also plain equal-weights data where every character is constant or an
autapomorphy -- leaves tip_states/prelim/edge_set genuinely empty
std::vectors. Code that unconditionally took the address of element 0 of
these empty vectors is undefined behaviour: it aborts under hardened
libstdc++ assertions/ASan but is silent (out-of-bounds address, never
dereferenced if the following block loop is also 0-trip) in a plain release
build.

Audited every zero-word-reachable site sharing this pattern and guarded each
with the same have_words ternary wagner_tree already uses (never bare
nullptr+0 arithmetic, which UBSan's pointer-overflow check would still flag):

- ts_sector.cpp: compute_from_above_for_sector (called unconditionally on
  every build_reduced_dataset(), i.e. every sector reduction including the
  default rasStarts=1 -- the most severe of these, reachable with no
  non-default control), build_ras_sector (the RAS-restart start-tree
  builder, reachable via SearchControl(rasStarts>=2)), and
  build_reduced_dataset_collapsed (reachable via sectorCollapseTarget>0).
- ts_prune_reinsert.cpp: build_reduced_dataset and expand_and_reinsert,
  reachable via pruneReinsertCycles>0 (auto-selected by the "large" preset
  at nTip>=120).
- ts_temper.cpp: stochastic_tbr_phase (annealing), same "large"-preset
  reachability, needed its own top-level total_words==0 early return
  matching nni_search/tbr_search/ratchet_search/drift_search's existing
  style.
- ts_tree.cpp: TreeState::save_node_state's prealloc fast path -- a shared
  primitive under TBR/ratchet/temper, guarded once at the source.
- ts_wagner.cpp: wagner_goloboff_scores/wagner_entropy_scores (biased-Wagner
  tip scoring), also reachable via the "large" preset.

Regression tests added to test-ts-sector.R (default control, rasStarts=3,
sectorCollapseTarget=6, pruneReinsertCycles=1, annealCycles=1, all through
MaximizeParsimony() on a 40-tip all-uninformative EW dataset) and
test-ts-wagner.R (ts_wagner_bias_bench directly). Reachability of all 6
sites was confirmed empirically (temporary REprintf probes on each
total_words==0 branch, fired under every new test, then removed).

Verified the tests are non-tautological, not just reachability-proxy: a
negative control reverting build_ras_sector's guard back to the unconditional
&rd.data.tip_states[...] reproduces the exact predicted abort
(Assertion '__n < this->size()' failed) under a local -D_GLIBCXX_ASSERTIONS
-O0 -g build, with a gdb backtrace confirming build_ras_sector (ts_sector.cpp)
-> search_sector -> xss_search -> run_single_replicate -> driven_search ->
ts_driven_search, i.e. reachable from the public MaximizeParsimony() API.
Restoring the guard reruns clean. Full targeted suite (sector, wagner, hsj,
xform, anneal, driven, drift, ratchet, tbr, Concordance) passes under the
same hardened build with no behavioural regression.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant