T-243: FlatBlock metadata, flat EW indirect functions, TBR prefetch #230
Merged
ms609 merged 3 commits into cpp-search on Mar 26, 2026
Infrastructure for indirect scoring optimization:
1. FlatBlock struct (24 bytes/block vs 288 bytes in CharBlock) packs
hot-loop metadata (offset, n_states, active_mask, has_inapplicable)
for cache-friendly access. Populated at build_dataset() time.
2. Flat indirect scoring functions (EW and NA-aware variants) that use
FlatBlock and skip the upweight_mask/weight overhead. Available as
fitch_indirect_{bounded,cached}_flat and
fitch_na_indirect_{bounded,cached}_flat. NOT wired into search dispatch;
see below.
3. Software prefetch in the TBR rerooting inner loop: prefetch the
vroot_cache entry 2 iterations ahead. At 180+ tips (vroot_cache ~140 KB,
resident in L2), this hides the roughly 10-cycle L2 latency. Overhead is
negligible at small sizes, where vroot_cache fits in L1.
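A minimal sketch of what items 1 and 2 could look like. The PR states only that FlatBlock is 24 bytes and holds offset, n_states, active_mask and has_inapplicable; the field types and order, the bit-packed state layout (one 64-bit word per state per 64-character block) and the name fitch_indirect_bounded_flat_sketch are assumptions for illustration, not the package's code. Only the equal-weights (EW) bounded case is shown; an NA-aware variant would also have to handle inapplicable tokens.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: only the size and field names come from the PR.
struct alignas(8) FlatBlock {
  std::uint64_t active_mask;  // bit c set => character slot c is in use
  std::uint32_t offset;       // index of this block's first state word
  std::uint32_t n_states;     // number of state words in this block
  bool has_inapplicable;      // unused in this EW sketch; would select the NA-aware path
};
static_assert(sizeof(FlatBlock) == 24, "FlatBlock should pack to 24 bytes");

// Hypothetical EW indirect Fitch cost of joining nodes a and b, returning
// early once the running total exceeds `bound`. Bit c of word a[offset + s]
// is set if character c can take state s at node a (likewise for b).
int fitch_indirect_bounded_flat_sketch(const std::vector<FlatBlock>& blocks,
                                       const std::uint64_t* a,
                                       const std::uint64_t* b,
                                       int bound) {
  int total = 0;
  for (const FlatBlock& blk : blocks) {
    std::uint64_t shared = 0;  // bit c set => a and b share a state for char c
    for (std::uint32_t s = 0; s < blk.n_states; ++s) {
      shared |= a[blk.offset + s] & b[blk.offset + s];
    }
    // Under equal weights, each active character with no shared state costs one step.
    total += static_cast<int>(std::bitset<64>(blk.active_mask & ~shared).count());
    if (total > bound) return total;  // cannot beat the bound; stop early
  }
  return total;
}
```

The loop reads only the four FlatBlock fields, which is the point of separating them out: assuming 64-byte cache lines, about 2.7 FlatBlocks fit in one line, whereas a single 288-byte CharBlock spans 4.5 lines.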
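The prefetch in item 3 follows a common pattern, sketched here under assumptions: the PR does not show the rerooting loop, so VrootEntry, the visit_order indirection, the scoring callback and best_reroot_sketch are hypothetical stand-ins. __builtin_prefetch is a GCC/Clang builtin; on other compilers the macro below does nothing.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
// GCC/Clang builtin: rw = 0 (read), locality = 3 (keep in all cache levels).
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)(addr))
#endif

// Hypothetical 64-byte cache entry of virtual-root states.
struct VrootEntry {
  std::uint64_t states[8];
};

constexpr std::size_t kPrefetchDistance = 2;  // "2 iterations ahead"
constexpr std::size_t kNoCandidate = std::numeric_limits<std::size_t>::max();

// Visits vroot_cache in rerooting order (which need not be sequential) and
// returns the cache index with the lowest score, or kNoCandidate if
// visit_order is empty.
template <typename ScoreFn>
std::size_t best_reroot_sketch(const std::vector<VrootEntry>& vroot_cache,
                               const std::vector<std::size_t>& visit_order,
                               ScoreFn score) {
  std::size_t best = kNoCandidate;
  int best_score = std::numeric_limits<int>::max();
  for (std::size_t i = 0; i < visit_order.size(); ++i) {
    // Request the entry needed two iterations from now, so its line is
    // (ideally) already in L1 when it is read.
    if (i + kPrefetchDistance < visit_order.size()) {
      PREFETCH_READ(&vroot_cache[visit_order[i + kPrefetchDistance]]);
    }
    const int s = score(vroot_cache[visit_order[i]]);
    if (s < best_score) {
      best_score = s;
      best = visit_order[i];
    }
  }
  return best;
}
```

The bounds check matters even though GCC documents that a prefetch of an invalid address does not fault: forming &vroot_cache[k] for an out-of-range k is itself undefined behaviour.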
Benchmarking notes (Agnarsson 62t, Zhu 75t, Dikow 88t, 10 seeds each):
Flat dispatch (ternary or function pointer) showed no measurable benefit
at these sizes: hardware prefetching of the sequential CharBlock array
is already effective, and the dispatch itself (an extra branch or indirect
call) adds overhead and complexity to the hot path. System-level timing
variance on the test machine is ±15-30%, which would mask any gain below 10%.
The flat functions are retained as available infrastructure for large-tree
optimization (180+ tips) where CharBlock cache traffic may become
significant. They can be wired in via function pointers when a 180+ tip
benchmark is available for validation.
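If the flat functions are later wired in via function pointers, one possible shape is to choose the scorer once per search, outside the candidate loop, so the hot loop makes one indirect call through a pointer whose target never changes. This is a hypothetical sketch, not the package's current dispatch: ScoreArgs, the scorer names, the one-state-word-per-character simplification and the use of 180 tips as the switch-over point are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-candidate inputs. Simplification: one state word per
// character (bit s set => state s possible), not the bit-packed block layout.
struct ScoreArgs {
  const std::uint64_t* a;  // states at one end of the candidate join
  const std::uint64_t* b;  // states at the other end
  const int* weight;       // per-character weights; read only by the weighted path
  std::size_t n_chars;
  int bound;               // stop once the cost exceeds this
};

using IndirectScorer = int (*)(const ScoreArgs&);

// Stand-in for the existing path: each extra step costs its character's weight.
int score_weighted_sketch(const ScoreArgs& s) {
  int cost = 0;
  for (std::size_t i = 0; i < s.n_chars && cost <= s.bound; ++i) {
    if ((s.a[i] & s.b[i]) == 0) cost += s.weight[i];
  }
  return cost;
}

// Stand-in for a flat EW path: every extra step costs 1; weights are never read.
int score_flat_ew_sketch(const ScoreArgs& s) {
  int cost = 0;
  for (std::size_t i = 0; i < s.n_chars && cost <= s.bound; ++i) {
    if ((s.a[i] & s.b[i]) == 0) ++cost;
  }
  return cost;
}

// Assumed switch-over point, taken from the "180+ tips" regime in the notes.
constexpr int kFlatTipThreshold = 180;

// Chosen once per search; the flat path applies only under equal weights.
IndirectScorer select_scorer(int n_tips, bool equal_weights) {
  if (equal_weights && n_tips >= kFlatTipThreshold) return &score_flat_ew_sketch;
  return &score_weighted_sketch;
}

// Hot loop: one indirect call per candidate, always to the same target.
int min_candidate_cost(const std::vector<ScoreArgs>& candidates, IndirectScorer score) {
  int best = std::numeric_limits<int>::max();
  for (const ScoreArgs& c : candidates) {
    const int cost = score(c);
    if (cost < best) best = cost;
  }
  return best;
}
```

Whether this beats a per-call ternary is exactly what the notes say could not be resolved at 62-88 tips, so a sketch like this would still need the 180+ tip benchmark before adoption.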
All 2877 ts-* tests pass with identical scores.
GHA run 23580149481 failed with pre-existing issues (spelling: 'TREE's' not in wordlist; code/doc mismatches; Rd \usage warnings). These issues also exist on cpp-search HEAD; this branch adds only C++ changes.
…to WORDLIST. Pre-existing issues blocking GHA on cpp-search HEAD. Regenerated via roxygen2::roxygenise(load_code = load_installed).
Agent E. Performance optimization of the Fitch scoring inner loop for large trees (180+ tips).
Changes:
FlatBlock struct (24 bytes) replaces CharBlock (288 bytes) for hot-path metadata, reducing cache traffic during per-candidate indirect scoring.
Validation: