Skip to content

Adams: O(kn log n) centroid-path consensus (Jansson, Li & Sung 2017)#7

Open
ms609 wants to merge 2 commits into
mainfrom
feature/adams-fast
Open

Adams: O(kn log n) centroid-path consensus (Jansson, Li & Sung 2017)#7
ms609 wants to merge 2 commits into
mainfrom
feature/adams-fast

Conversation

@ms609
Copy link
Copy Markdown
Owner

@ms609 ms609 commented Jun 2, 2026

Summary

Reimplements Adams() with the O(kn log n) centroid-path algorithm of Jansson, Li & Sung (2017) ("On finding the Adams consensus tree", Inf. Comput. 256:334–347; New_Adams_consensus_k, §3), replacing the pure-R recursion + per-level Newick round-trip and filling the pre-registered adamsConsensusCpp stub. The Adams consensus tree is unique, so the output is identical to the classical definition — only the running time changes.

Algorithm (src/cons_adams.cpp)

  • Per input tree: Euler tour + sparse-table O(1) LCA.
  • Per recursion level the restricted subtree ("nub") is rebuilt via the consecutive-LCA auxiliary-tree construction (root = the true lca_{T_j}(L')).
  • The consensus spine — leaves under the heavy child of every input's root — is expanded iteratively down the trees' centroid paths in unison; only the off-spine "side" blocks are recursed, each ≤ |L′|/2, giving O(log n) recursion depth.
  • Path compression handled correctly: each step's per-tree position is the min branch-level over the current remaining block (= the LCA depth), so a tree whose earlier leaves have split off jumps to its true LCA. (A first fixed-global-level version over-resolved on incongruent inputs; the at-scale oracle caught it and this is the fix.)
  • Reentrant (no file-scope state), VLA-free, RAII; Euler tour, Newick emission and the spine walk are all iterative (no deep recursion on caterpillars).

R/adams.R calls the C++ path on each tree's own root (Adams is rooted — no re-rooting), keeps the trivial guards + integer round-trip, and drops the .AdamsNewick/.AdamsPartition helpers.

Validation

  • Slow-Adams clade oracle (fact.exe rule 512, rooted = 1): clade-exact MATCH on every dataset, including the added larger-n / caterpillar / incongruent ones (n = 50/k20, n = 137/k10, n = 40 caterpillars). dev/oracle/check-oracle.R exits 0; its self-guard now also verifies Adams is on the C++ path.
  • tests/testthat/test-adams.R: 15 → 60 assertions — an independent in-suite R reference (the classical recursion) fuzzed over ~40 random / caterpillar / perturbed profiles, plus degree-1-suppression traps (star, identical caterpillar) and n = 3.
  • Full test suite 454 pass, 0 fail.
  • Profiling (dev/profiling/compare.R Adams): large win (e.g. n = 200 / k = 50: 0.905 s → 0.02 s, ~45×), scaling far better; no flagged regressions.

Housekeeping

NEWS bullet (kept under the shared 0.0.0.9007 dev heading); AGENTS.md FACT_RULE fix (adams=512 slow, not the stale 1024 buggy-fast) + method-roster line; inst/WORDLIST. roxygenise() only — RcppExports.* content untouched.

🤖 Generated with Claude Code

ms609 and others added 2 commits June 2, 2026 16:57
Replace the pure-R Adams recursion (plus per-level Newick round-trip) with a
C++ implementation of the O(kn log n) centroid-path algorithm of Jansson, Li &
Sung (2017), filling the pre-registered adamsConsensusCpp stub.

- src/cons_adams.cpp: per-input-tree Euler tour + sparse-table O(1) LCA; per
  recursion level the restricted subtree (nub) is rebuilt via the consecutive-
  LCA auxiliary-tree construction; the consensus "spine" (leaves under the heavy
  child of every input tree's root) is expanded iteratively down the centroid
  paths in unison -- the remaining block's per-tree position is its LCA depth, so
  path compression is handled -- recursing only on the <= |L'|/2 off-spine
  blocks. Reentrant, VLA-free, RAII.
- R/adams.R: call the C++ path on each tree's OWN root (Adams is rooted; no
  re-rooting); drop the .AdamsNewick/.AdamsPartition helpers; @details updated.
- Validated clade-for-clade against the slow FACT Adams (rule 512, rooted=1)
  across small, n=50/137, caterpillar and incongruent datasets (extended
  dev/oracle/check-oracle.R), plus an in-suite R reference fuzz and degree-1-
  suppression / n=3 traps (tests/testthat/test-adams.R). Full suite green;
  ~45x faster at n=200/k=50, scaling far better.
- Housekeeping: DESCRIPTION 0.0.0.9008; NEWS bullet; AGENTS.md FACT_RULE fix
  (adams=512 slow, not 1024 buggy-fast); inst/WORDLIST.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Integrate the MajorityPlus, Frequency and tooling commits from main.

Conflicts resolved:
- NEWS.md: keep the 0.0.0.9007 dev heading (no separate bump) and both the
  Adams and MajorityPlus bullets.
- inst/WORDLIST: keep both `radix` (main) and `reentrant` (here).
- DESCRIPTION: keep Version 0.0.0.9007.

Also extend the oracle self-guard to verify Adams (not only MajorityPlus) is on
the C++ path. Full test suite (454) green; dev/oracle/check-oracle.R exits 0
(self-guard OK, Adams clade-exact vs slow FACT at scale, all methods MATCH).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 97.70492% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.98%. Comparing base (9063623) to head (53fe6be).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/cons_adams.cpp 97.68% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #7      +/-   ##
==========================================
- Coverage   99.06%   98.98%   -0.08%     
==========================================
  Files          18       18              
  Lines        2879     3154     +275     
==========================================
+ Hits         2852     3122     +270     
- Misses         27       32       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant