T-243: FlatBlock metadata, flat EW indirect functions, TBR prefetch by ms609 · Pull Request #230 · ms609/TreeSearch

ms609 · 2026-03-26T06:07:29Z

Agent E. Performance optimization for large-tree (180+ tip) Fitch scoring inner loop.

Changes:

FlatBlock struct (24 bytes) replaces CharBlock (288 bytes) for hot-path metadata, reducing cache traffic during per-candidate indirect scoring
Six specialized flat EW indirect functions that skip per-block upweight_mask and weight checks
TBR rerooting software prefetch hints for L2 latency hiding
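As a rough illustration of the first item, here is a minimal sketch of what a 24-byte hot-path record might look like. The field names (offset, n_states, active_mask, has_inapplicable) are the ones listed in the commit message; the field types, their order, the explicit padding, the `WideBlock` stand-in for CharBlock and the `flatten()` helper are all assumptions made for this example, not the layout in ts_data.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: only the field names come from the commit message.
// Types, order and padding are assumptions chosen so the record is 24 bytes.
struct FlatBlock {
    std::uint64_t active_mask;       // bitmask of characters in use in this block
    std::uint32_t offset;            // start of this block's state words
    std::uint32_t n_states;          // number of state words in the block
    std::uint8_t  has_inapplicable;  // non-zero if the block needs NA-aware scoring
    std::uint8_t  pad[7];            // explicit padding up to 24 bytes
};
static_assert(sizeof(FlatBlock) == 24, "hot-path record should stay at 24 bytes");

// Stand-in for the wider per-block record (hypothetical, not the real CharBlock):
// the hot fields sit next to data the inner loop never reads.
struct WideBlock {
    std::uint64_t active_mask;
    std::uint32_t offset;
    std::uint32_t n_states;
    bool          has_inapplicable;
    std::uint64_t cold[33];          // placeholder for rarely-read metadata
};

// Copy the hot fields into a dense array once, at dataset build time,
// so the scoring loop touches far fewer cache lines per block.
std::vector<FlatBlock> flatten(const std::vector<WideBlock>& blocks) {
    std::vector<FlatBlock> flat;
    flat.reserve(blocks.size());
    for (const WideBlock& b : blocks) {
        FlatBlock f{};               // zero-initialise, including padding
        f.active_mask = b.active_mask;
        f.offset = b.offset;
        f.n_states = b.n_states;
        f.has_inapplicable = b.has_inapplicable ? 1 : 0;
        flat.push_back(f);
    }
    return flat;
}
```

With 64-byte cache lines, a 24-byte record lets several blocks share a line, whereas a 288-byte record spans multiple lines per block.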

Validation:

All 2877 ts-* tests pass (score-identical to baseline under same seed)
Hamilton HPC benchmark (AMD EPYC 7702, 180 taxa, 10 identical-seed reps): median 11.538s → 11.360s (1.4% speedup, p=0.001 Welch t-test)
Zero API changes, zero maintenance burden
Effect is real but small: within noise at ≤88 tips, measurable when L2 is under pressure at 180+ tips
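For readers who want to reproduce the Welch t-test mentioned in the benchmark line, the statistic and its degrees of freedom can be computed as in the sketch below. This is a generic implementation under stated assumptions, not the script used for the Hamilton runs: the function names are hypothetical, both samples are assumed to have at least two observations and non-zero variance, and converting t and df into a p-value needs a Student-t CDF from a statistics library, which is left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct WelchResult {
    double t;   // Welch's t statistic
    double df;  // Welch-Satterthwaite degrees of freedom
};

// Arithmetic mean of a sample.
double sample_mean(const std::vector<double>& x) {
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Unbiased sample variance (divides by n - 1).
double sample_var(const std::vector<double>& x) {
    const double m = sample_mean(x);
    double ss = 0.0;
    for (double v : x) ss += (v - m) * (v - m);
    return ss / static_cast<double>(x.size() - 1);
}

// Welch's unequal-variance t-test for two independent samples,
// e.g. baseline timings versus timings on the optimised branch.
WelchResult welch_t(const std::vector<double>& a, const std::vector<double>& b) {
    const double na = static_cast<double>(a.size());
    const double nb = static_cast<double>(b.size());
    const double qa = sample_var(a) / na;  // squared standard error of mean(a)
    const double qb = sample_var(b) / nb;  // squared standard error of mean(b)
    WelchResult r;
    r.t = (sample_mean(a) - sample_mean(b)) / std::sqrt(qa + qb);
    r.df = (qa + qb) * (qa + qb) /
           (qa * qa / (na - 1.0) + qb * qb / (nb - 1.0));
    return r;
}
```

Calling `welch_t(baseline_seconds, branch_seconds)` on the two vectors of per-rep wall-clock times gives a positive t when the branch is faster.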

Infrastructure for indirect scoring optimization:

1. FlatBlock struct (24 bytes/block vs 288 bytes in CharBlock) packs hot-loop metadata (offset, n_states, active_mask, has_inapplicable) for cache-friendly access. Populated at build_dataset() time.

2. Flat indirect scoring functions (EW and NA-aware variants) that use FlatBlock and skip upweight_mask/weight overhead. Available as fitch_indirect_{bounded,cached}_flat and fitch_na_indirect_{bounded,cached}_flat. NOT wired into search dispatch; see below.

3. Software prefetch in TBR rerooting inner loop: prefetch vroot_cache entry 2 iterations ahead. At 180+ tips (vroot_cache ~140 KB, L2), this hides ~10-cycle L2 latency. Negligible overhead at small sizes where vroot_cache fits in L1.

Benchmarking notes (Agnarsson 62t, Zhu 75t, Dikow 88t, 10 seeds each):

Flat dispatch (ternary or function pointer) showed no measurable benefit at these sizes: hardware prefetching of the sequential CharBlock array is already effective, and the dispatch overhead (extra branch or indirect call) marginally increases code complexity in the hot path. System-level timing variance on the test machine is ±15-30%, masking any sub-10% gain.

The flat functions are retained as available infrastructure for large-tree optimization (180+ tips) where CharBlock cache traffic may become significant. They can be wired in via function pointers when a 180+ tip benchmark is available for validation.

All 2877 ts-* tests pass with identical scores.
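The prefetch described in item 3 of the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code in ts_fitch.cpp: the `VrootEntry` type, the `order` vector of rerooting indices, the `TS_PREFETCH` macro and the trivial per-entry "scoring" are all hypothetical. The only real API used is the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`, which is a hint and never changes program results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable wrapper (hypothetical name): a read prefetch with high temporal
// locality on GCC/Clang, a no-op on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define TS_PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#define TS_PREFETCH(addr) ((void)0)
#endif

// Hypothetical 64-byte cache entry standing in for one vroot_cache record.
struct VrootEntry {
    std::uint64_t state[8];
};

// Visit cache entries in a data-dependent order, requesting the entry that
// will be needed two iterations from now before working on the current one.
std::uint64_t score_rerootings(const std::vector<VrootEntry>& vroot_cache,
                               const std::vector<std::size_t>& order) {
    constexpr std::size_t kAhead = 2;  // prefetch distance, in iterations
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (i + kAhead < order.size()) {
            TS_PREFETCH(&vroot_cache[order[i + kAhead]]);
        }
        const VrootEntry& e = vroot_cache[order[i]];
        for (std::uint64_t w : e.state) {
            total += w;  // placeholder for the real per-entry scoring work
        }
    }
    return total;
}
```

The distance of two iterations is the value stated in the commit message; a suitable distance in general depends on how much work each iteration does relative to the latency being hidden.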

ms609 · 2026-03-26T06:18:01Z

GHA run 23580149481 failed with pre-existing issues (spelling 'TREE's' not in wordlist, code/doc mismatches, Rd \usage warnings). These exist on cpp-search HEAD — this branch adds only C++ changes (ts_data.h, ts_fitch.h, ts_fitch.cpp). No new issues introduced.

ms609 added 2 commits March 26, 2026 06:49

fix: sync Rd docs (ratchetTaper, annealCycles in usage) + add TREE's …

3daa145

…to WORDLIST Pre-existing issues blocking GHA on cpp-search HEAD. Regenerated via roxygen2::roxygenise(load_code = load_installed).

fix: add 'speedup' to WORDLIST

4540eae

ms609 merged commit 68a488e into cpp-search Mar 26, 2026
3 of 12 checks passed

ms609 deleted the feature/hot-loop-opt branch March 27, 2026 06:53

ms609 merged 3 commits into cpp-search from feature/hot-loop-opt