[Autoloop: tsb-perf-evolve] #249
Merged
Combines the pre-partition loop and the separate radix key-init loop into a single O(n) pass, saving one full O(n) loop. Also uses compact fvals indexing (fvals[finCount] instead of fvals[origIdx]) for sequential memory access, and gathers directly from srcIdx after the radix sort, eliminating the intermediate copy-back-to-finSlice step (another O(finCount) loop saved). Net savings per call at n=100k: two fewer O(n) typed-array loop passes.

Run: https://github.com/githubnext/tsessebe/actions/runs/25142029767

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 This PR is maintained by Autoloop. Each accepted iteration adds a commit to this branch.
Goal
Evolve Series.sortValues to be as fast as (or faster than) pandas Series.sort_values on a 100k numeric benchmark. Metric: tsb_mean_ms / pandas_mean_ms (lower is better; < 1.0 means tsb beats pandas).
Current best metric: 27.999 (c003: 155 ms vs 5.56 ms for pandas)
Program issue: #189
State file: tsb-perf-evolve.md
Iteration 28: Merge partition + radix-init into one pass
Combines the pre-partition scan and the separate radix key-initialisation loop into a single O(n) pass. Also switches fvals to compact (sequential) indexing and gathers output directly from srcIdx after the radix sort, eliminating an intermediate O(finCount) copy-back step.

Mechanism: saves roughly two O(n) typed-array loop passes per sortValues call, which means fewer iterations through memory, better cache utilisation, and less work for the JIT.
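The shape of this change can be illustrated with a minimal TypeScript sketch. It is not the code from this PR: the function name sortValuesSketch, the keyHi/keyLo arrays, the 8-bit digit size, and the choice to treat only NaN as the partitioned-out case are all assumptions made for illustration. The names fvals, finCount, and srcIdx are borrowed from the description above, but how the real implementation lays them out is not shown on this page.

```typescript
// Illustrative sketch only. Assumed: NaN is the only value partitioned out,
// keys are 64-bit order-preserving bit patterns split into two Uint32 halves,
// and the radix sort is LSD with 8-bit digits.
function sortValuesSketch(values: Float64Array): Float64Array {
  const n = values.length;
  const fvals = new Float64Array(n); // compact: slot k holds the k-th non-NaN value
  const keyHi = new Uint32Array(n);  // high 32 bits of each radix key
  const keyLo = new Uint32Array(n);  // low 32 bits of each radix key
  let srcIdx = new Uint32Array(n);   // compact slot ids, permuted by the sort
  let dstIdx = new Uint32Array(n);
  const view = new DataView(new ArrayBuffer(8));

  // Fused pass: partition out NaN and initialise radix keys in one loop.
  let finCount = 0;
  for (let i = 0; i < n; i++) {
    const v = values[i];
    if (v !== v) continue;           // NaN: skipped here, appended at the end
    view.setFloat64(0, v);           // DataView defaults to big-endian
    let hi = view.getUint32(0);
    let lo = view.getUint32(4);
    if (hi >>> 31 === 1) {           // negative: flip every bit
      hi = ~hi;
      lo = ~lo;
    } else {                         // non-negative: flip only the sign bit
      hi ^= 0x80000000;
    }
    fvals[finCount] = v;             // sequential write, not fvals[i]
    keyHi[finCount] = hi;            // the Uint32Array store wraps to uint32
    keyLo[finCount] = lo;
    srcIdx[finCount] = finCount;
    finCount++;
  }

  // LSD radix sort of the compact slot ids: 8 stable passes of 8 bits each.
  const count = new Uint32Array(256);
  for (let pass = 0; pass < 8; pass++) {
    const keys = pass < 4 ? keyLo : keyHi;
    const shift = (pass & 3) * 8;
    count.fill(0);
    for (let j = 0; j < finCount; j++) {
      count[(keys[srcIdx[j]] >>> shift) & 0xff]++;
    }
    let sum = 0;
    for (let d = 0; d < 256; d++) {  // exclusive prefix sum gives bucket starts
      const c = count[d];
      count[d] = sum;
      sum += c;
    }
    for (let j = 0; j < finCount; j++) {
      const slot = srcIdx[j];
      dstIdx[count[(keys[slot] >>> shift) & 0xff]++] = slot;
    }
    const tmp = srcIdx;
    srcIdx = dstIdx;
    dstIdx = tmp;
  }

  // Gather straight from srcIdx into the output; no intermediate copy-back.
  const out = new Float64Array(n);
  for (let j = 0; j < finCount; j++) out[j] = fvals[srcIdx[j]];
  out.fill(NaN, finCount);           // NaN values go last
  return out;
}
```

In this sketch the input is read once, in the fused loop, and every later loop runs over finCount compact slots rather than over original indices. That is the access pattern the description credits for the saved passes; whether the real code uses the same key layout or digit width is not something this page confirms.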