T-261+T-262: Eliminate std::fill zeroing, memcpy tip loading (8.6% TBR speedup) #232

Merged: ms609 merged 3 commits into cpp-search from feature/eliminate-fill on Mar 26, 2026

@ms609 ms609 commented Mar 26, 2026

Agent E.

Changes

T-261: Remove redundant std::fill zeroing in reset_states()

Audited all 5 arrays used by the scoring passes (prelim, final_, down2, subtree_actives, local_cost). Every entry is written before it is read by the Fitch downpass/uppass/NA passes, so the zeroing was provably redundant. Removed the 5 std::fill(0) calls from reset_states() in ts_tree.cpp.
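The write-before-read argument can be illustrated with a minimal sketch. This is not the code in ts_tree.cpp: `MiniTree`, `fitch_node`, and `score` are hypothetical names, and the fixed 3-tip tree is an assumption made only to keep the example small. The point it shows is that when every entry a pass reads has already been assigned, the previous contents of the arrays cannot affect the score.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a Fitch downpass over a fixed 3-tip tree:
// nodes 0..2 are tips, node 3 = (0,1), node 4 = (3,2) is the root.
struct MiniTree {
    std::vector<std::uint32_t> prelim;  // one state word per node
    std::vector<int> local_cost;        // one step count per node
    MiniTree() : prelim(5), local_cost(5) {}
};

// Fitch set operation: intersection if non-empty, otherwise union plus one
// step. Both outputs are assigned on every path, so stale values are never
// observed.
inline void fitch_node(MiniTree& t, int node, int left, int right) {
    const std::uint32_t both = t.prelim[left] & t.prelim[right];
    if (both != 0) {
        t.prelim[node] = both;
        t.local_cost[node] = 0;
    } else {
        t.prelim[node] = t.prelim[left] | t.prelim[right];
        t.local_cost[node] = 1;
    }
}

// Scores the tree without clearing any array first.
int score(MiniTree& t, const std::array<std::uint32_t, 3>& tips) {
    assert(t.prelim.size() == 5 && t.local_cost.size() == 5);
    for (int i = 0; i < 3; ++i) t.prelim[i] = tips[i];  // tips written first
    fitch_node(t, 3, 0, 1);                             // then internal nodes,
    fitch_node(t, 4, 3, 2);                             // children before parents
    return t.local_cost[3] + t.local_cost[4];           // only written entries read
}
```

Filling both arrays with arbitrary values before calling `score` gives the same result as starting from zeros, which is the property the audit had to establish for each of the five real arrays.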

T-262: Bulk memcpy for tip state loading

Replaced element-by-element tip copy in load_tip_states() with std::memcpy() for contiguous tip regions (prelim and final_ arrays).
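As a rough sketch of the change, assuming the tip states sit at the start of each destination array as one contiguous run of 64-bit words (the layout, element type, and function names here are assumptions, not taken from load_tip_states()):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Element-by-element copy, as a stand-in for the old loop.
void load_tips_loop(std::vector<std::uint64_t>& dst,
                    const std::vector<std::uint64_t>& tip_states) {
    assert(dst.size() >= tip_states.size());
    for (std::size_t i = 0; i < tip_states.size(); ++i) {
        dst[i] = tip_states[i];
    }
}

// Bulk copy of the same contiguous region with a single memcpy call.
void load_tips_bulk(std::vector<std::uint64_t>& dst,
                    const std::vector<std::uint64_t>& tip_states) {
    assert(dst.size() >= tip_states.size());
    if (tip_states.empty()) return;  // data() may be null for an empty vector
    std::memcpy(dst.data(), tip_states.data(),
                tip_states.size() * sizeof(std::uint64_t));
}
```

Both functions leave the destination in the same state; the bulk version is only valid because the source and destination regions are contiguous, do not overlap, and hold a trivially copyable type.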

Performance

Interleaved A/B benchmark (5 pairs, Dikow2009, 88 tips):

  • Baseline median: 14.92s
  • Optimized median: 13.64s
  • Speedup: 8.6%
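The speedup figure follows from the two medians as (14.92 - 13.64) / 14.92 ≈ 8.6%. A small sketch of that arithmetic is below; `median` and `speedup_percent` are illustrative helpers, not part of the benchmark harness, and the individual run times are not reproduced here because the PR reports only the medians.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of run times (takes a copy so the caller's order is kept).
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Relative reduction in run time, as a percentage of the baseline.
double speedup_percent(double baseline_s, double optimized_s) {
    assert(baseline_s > 0.0);
    return 100.0 * (baseline_s - optimized_s) / baseline_s;
}
```

Calling `speedup_percent(14.92, 13.64)` gives a value that rounds to 8.6.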

Testing

  • GHA 23598212017: PASS (ARM64 + Windows)
  • 10933 tests pass, 0 fail
  • Score verification: Vinther2008=83, Longrich2010=131, DeAssis2011=64 (all correct)

Also includes a test robustness fix: the timeout test at test-ts-na-incremental.R:174 now disables perturbStopFactor, so that early stopping cannot end the search before the timeout on fast hardware and fail the test.
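The race between the two exit paths can be sketched with a simulated search loop. Everything here is hypothetical: `run_search`, `stall_limit`, and the convention that a non-positive limit disables the stall exit are assumptions for illustration, not the package's API. The per-replicate cost of 0.5 ms is derived from the reported figures (46 replicates in about 23 ms).

```cpp
#include <cassert>

struct SearchResult {
    int reps;        // replicates completed
    bool timed_out;  // true only if the timeout ended the search
};

// Simulated search in which no replicate ever improves the score.
// rep_ms: assumed cost of one replicate; stall_limit <= 0 disables the
// "too many non-improving replicates" exit (an assumption of this sketch).
SearchResult run_search(double rep_ms, double timeout_ms, int stall_limit) {
    assert(rep_ms > 0.0 && timeout_ms > 0.0);
    double elapsed = 0.0;
    int stalled = 0;
    int reps = 0;
    while (true) {
        if (elapsed >= timeout_ms) return {reps, true};
        if (stall_limit > 0 && stalled >= stall_limit) return {reps, false};
        elapsed += rep_ms;  // advance simulated time by one replicate
        ++reps;
        ++stalled;
    }
}
```

With a 50 ms timeout, `run_search(0.5, 50.0, 46)` stops after 46 replicates without timing out, while `run_search(0.5, 50.0, 0)` runs until the timeout, which is the behaviour the test needs in order to assert that the search timed out.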

ms609 added 3 commits March 26, 2026 13:39
T-261: Remove all 5 std::fill(0) calls from reset_states().
Every array entry read by score_tree/fitch_na_score is written before
read: prelim by pass 1, final_ by pass 2, down2 by pass 3,
subtree_actives by pass 1+3, local_cost by pass 1.

T-262: Replace element-by-element tip copy loop with bulk memcpy
for prelim and final_ arrays (contiguous in memory).

Interleaved A/B benchmark (Dikow2009, 88t, 5 pairs):
  Baseline median: 14.92s
  Optimized median: 13.64s
  Speedup: 8.6%

1059 targeted tests pass (469 scoring/search + 590 fuse/parallel/etc).
Score verification: Vinther2008=83(NJ), Longrich2010=131, DeAssis2011=64.

T-258 added intraFuse but didn't re-run roxygenise.
Fixes codoc mismatch WARNING in R CMD check.

The timeout test expects timed_out=TRUE, but on fast ARM64 hardware,
perturbStopFactor triggers after 46 consecutive non-improving reps
(~23ms at 23 tips) before the 50ms timeout. Disable perturbation
stopping in this test so the timeout is the only exit path.

@ms609 ms609 merged commit bb287f1 into cpp-search Mar 26, 2026
11 of 14 checks passed
@ms609 ms609 deleted the feature/eliminate-fill branch March 27, 2026 06:53