Fix SIMD scatter kernel rank computation bug #18
Merged
Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: #17
Root cause: Only simd_lane == 0 wrote digit counts to simd_digit_counts, leaving counts for other digits in the SIMD group as zero. This caused incorrect rank computation and output positions.

Fix: Changed condition from (simd_lane == 0) to (simd_rank == 0) so that the first thread for each digit in the SIMD group writes its count.

Added case study in docs/case-studies/issue-17/.

Fixes #17

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This reverts commit cae5104.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost estimation:
The working session has now ended; feel free to review and add any feedback on the solution draft.
Summary
This PR fixes the GPU radix sort (SIMD) verification failure reported in Issue #17.
Root Cause
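The bug was in `shaders/radix_sort.metal` at lines 325-328, in the `radix_scatter_simd` kernel.

The Problem: Only `simd_lane == 0` wrote to `simd_digit_counts`, and it wrote only the count for its own digit. This left the counts for all other digits in the SIMD group as zero.

Example: If SIMD group 0 has threads with digits [3, 5, 3, 7, 5, ...], lane 0 holds digit 3, so only the count for digit 3 is recorded. The counts for digits 5 and 7 stay zero even though both digits are present in the group.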
The Fix
Changed the condition from `simd_lane == 0` to `simd_rank == 0`.

This ensures that the first thread for each unique digit in the SIMD group writes its count, so every digit present in the group has its count recorded correctly.
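The difference between the two guards can be illustrated with a small CPU-side simulation in Rust. This is only a sketch, not the Metal kernel: the function name `simulate_simd_digit_counts` and its parameters are hypothetical, and it assumes that `simd_rank` is the number of lower lanes in the SIMD group holding the same digit and that the value written is the total number of lanes sharing that digit.

```rust
/// Simulates which per-digit counts get written for one SIMD group.
/// Hypothetical helper for illustration; not part of the repository.
///
/// `fixed == false` models the old guard (`simd_lane == 0`),
/// `fixed == true` models the new guard (`simd_rank == 0`).
/// Every digit must be smaller than `radix`.
fn simulate_simd_digit_counts(digits: &[usize], radix: usize, fixed: bool) -> Vec<u32> {
    let mut counts = vec![0u32; radix];
    for (lane, &digit) in digits.iter().enumerate() {
        // Assumed meaning of simd_rank: lower lanes holding the same digit.
        let rank = digits[..lane].iter().filter(|&&d| d == digit).count();
        // Assumed value written: all lanes in the group holding this digit.
        let total = digits.iter().filter(|&&d| d == digit).count() as u32;
        // Old guard: only lane 0 writes. New guard: first lane per digit writes.
        let writes = if fixed { rank == 0 } else { lane == 0 };
        if writes {
            counts[digit] = total;
        }
    }
    counts
}
```

For the digits [3, 5, 3, 7, 5] with a radix of 8, the lane-0 guard records a count only for digit 3, while the rank-0 guard records counts for digits 3, 5, and 7.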
Changes
- `shaders/radix_sort.metal`: `simd_lane == 0` to `simd_rank == 0`
- `docs/case-studies/issue-17/analysis.md`

Test Plan
- `cargo fmt -- --check` passes
- `cargo clippy -- -D warnings` passes
- `cargo test` passes (47 tests)
- `cargo run --release -- 2684354` (requires manual verification on Apple Silicon)

Expected Results After Merge
Running `cargo run --release -- 2684354` on macOS with Apple Silicon should no longer report the GPU radix sort (SIMD) verification failure from Issue #17.

Fixes #17
🤖 Generated with Claude Code