T-261+T-262: Eliminate std::fill zeroing, memcpy tip loading (8.6% TBR speedup) #232
Merged
T-261: Remove all 5 std::fill(0) calls from reset_states(). Every array entry read by score_tree/fitch_na_score is written before it is read: prelim by pass 1, final_ by pass 2, down2 by pass 3, subtree_actives by pass 1+3, local_cost by pass 1.

T-262: Replace the element-by-element tip copy loop with a bulk memcpy for the prelim and final_ arrays (contiguous in memory).

Interleaved A/B benchmark (Dikow2009, 88t, 5 pairs):
Baseline median: 14.92s
Optimized median: 13.64s
Speedup: 8.6%

1059 targeted tests pass (469 scoring/search + 590 fuse/parallel/etc).
Score verification: Vinther2008=83(NJ), Longrich2010=131, DeAssis2011=64.
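The two changes in this commit can be illustrated with a minimal C++ sketch. It is not the code from the package: the `Buffers` struct, the one-word-per-node layout with tips stored first, and the function names `load_tips_loop`, `load_tips_memcpy`, and `fitch_downpass` are all hypothetical. The sketch only shows why a bulk copy matches the per-element loop when the tip region is contiguous, and why pre-zeroing is unnecessary for entries that are assigned before they are read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: one 64-bit state word per node, tips first,
// internal nodes after. The real arrays may hold several words per node.
struct Buffers {
    std::size_t n_tip;
    std::vector<std::uint64_t> prelim;
    std::vector<std::uint64_t> final_;
    // `fill` stands in for whatever the buffers held before this search.
    Buffers(std::size_t n_tip_, std::size_t n_node, std::uint64_t fill)
        : n_tip(n_tip_), prelim(n_node, fill), final_(n_node, fill) {}
};

// Element-by-element tip copy (the pattern T-262 replaces).
void load_tips_loop(Buffers& b, const std::vector<std::uint64_t>& tips) {
    assert(tips.size() == b.n_tip);
    for (std::size_t i = 0; i < b.n_tip; ++i) {
        b.prelim[i] = tips[i];
        b.final_[i] = tips[i];
    }
}

// Bulk copy: valid because the tip entries occupy one contiguous range.
void load_tips_memcpy(Buffers& b, const std::vector<std::uint64_t>& tips) {
    assert(tips.size() == b.n_tip);
    const std::size_t bytes = b.n_tip * sizeof(std::uint64_t);
    if (bytes == 0) return;  // avoid passing empty-vector pointers to memcpy
    std::memcpy(b.prelim.data(), tips.data(), bytes);
    std::memcpy(b.final_.data(), tips.data(), bytes);
}

// Fitch downpass over internal nodes listed children-before-parents.
// Internal node k has index n_tip + k and children left[k], right[k].
// Each internal prelim entry is assigned (never accumulated into), so
// whatever the buffer held beforehand cannot affect the result.
int fitch_downpass(Buffers& b,
                   const std::vector<std::size_t>& left,
                   const std::vector<std::size_t>& right) {
    assert(left.size() == right.size());
    int steps = 0;
    for (std::size_t k = 0; k < left.size(); ++k) {
        const std::size_t node = b.n_tip + k;
        const std::uint64_t a = b.prelim[left[k]];
        const std::uint64_t c = b.prelim[right[k]];
        const std::uint64_t both = a & c;
        b.prelim[node] = both ? both : (a | c);  // intersection, else union
        if (!both) ++steps;                      // a union costs one step
    }
    return steps;
}
```

Constructing one `Buffers` filled with zeros and another filled with an arbitrary nonzero pattern, loading the same tips through either function, and running `fitch_downpass` should give the same step count and the same `prelim` contents. The argument carries over to the real code only if every pass assigns each entry before reading it, which is what the T-261 audit reports.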
T-258 added intraFuse but didn't re-run roxygenise. Fixes codoc mismatch WARNING in R CMD check.
The timeout test expects timed_out=TRUE, but on fast ARM64 hardware, perturbStopFactor triggers after 46 consecutive non-improving reps (~23ms at 23 tips) before the 50ms timeout. Disable perturbation stopping in this test so the timeout is the only exit path.
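The race between the two exit paths can be sketched in C++ as follows. This is a simplified model, not the package's search loop: `SearchResult`, `run_search`, and the `stop_after` parameter are hypothetical, time is simulated with a fixed cost per replicate so the outcome is deterministic, and the threshold of 46 replicates is passed in directly because the way `perturbStopFactor` maps to it is not stated here.

```cpp
#include <cassert>
#include <chrono>

struct SearchResult {
    bool timed_out;  // true only if the timeout was the exit path
    int reps;        // replicates completed before exiting
};

// Simulated search loop with two exit paths. `stop_after == 0` disables
// the perturbation stop, leaving the timeout as the only way out.
SearchResult run_search(std::chrono::microseconds timeout,
                        int stop_after,
                        std::chrono::microseconds rep_cost) {
    assert(rep_cost.count() > 0);  // otherwise simulated time never advances
    std::chrono::microseconds elapsed{0};
    int reps = 0;
    int non_improving = 0;
    for (;;) {
        if (elapsed >= timeout) return {true, reps};
        elapsed += rep_cost;  // one replicate of simulated work
        ++reps;
        ++non_improving;      // assume no replicate improves the score
        if (stop_after > 0 && non_improving >= stop_after) {
            return {false, reps};
        }
    }
}
```

With a 50 ms timeout and 0.5 ms per replicate (roughly the 23 ms for 46 replicates quoted above), `stop_after = 46` exits through the perturbation stop before the timeout is reached, so `timed_out` is false. Setting `stop_after = 0` leaves the timeout as the only exit, which is the condition the fixed test relies on.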
ms609 added 2 commits that referenced this pull request on Mar 26, 2026
Agent E.
Changes
T-261: Remove redundant `std::fill` zeroing in `reset_states()`
Audited all 5 arrays used by the scoring passes (`prelim`, `final_`, `down2`, `subtree_actives`, `local_cost`). Every entry is written before it is read by the Fitch downpass/uppass/NA passes, so the zeroing was redundant. Removed 5 `std::fill(0)` calls from `reset_states()` in `ts_tree.cpp`.
T-262: Bulk memcpy for tip state loading
Replaced the element-by-element tip copy in `load_tip_states()` with `std::memcpy()` for contiguous tip regions (`prelim` and `final_` arrays).
Performance
Interleaved A/B benchmark (5 pairs, Dikow2009, 88 tips): baseline median 14.92 s, optimized median 13.64 s (8.6% speedup).
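The 8.6% figure is consistent with a reduction in median wall time relative to the baseline, using the medians reported in the commit message (14.92 s and 13.64 s). That reading is inferred from the numbers and is not a definition given in this PR:

$$
\text{speedup} = \frac{t_{\text{baseline}} - t_{\text{optimized}}}{t_{\text{baseline}}}
= \frac{14.92 - 13.64}{14.92}
= \frac{1.28}{14.92}
\approx 0.086
$$

Expressed as a throughput ratio instead, the same medians give $14.92 / 13.64 \approx 1.094$, so the two conventions should not be mixed when comparing against other benchmarks.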
Testing
Also includes a test robustness fix:
`test-ts-na-incremental.R:174` timeout test now disables `perturbStopFactor` to prevent a false negative on fast hardware.