Fix: guard idle-dispatch do-while against empty cluster mask #566
Merged
poursoul merged 1 commit into hw-native-sys:main on Apr 15, 2026
bc42ada to 1911263
Code Review
This pull request addresses a batch dispatch out-of-bounds (OOB) issue in the aicpu_executor by adding a guard to check for available clusters before popping from valid_cluster_states. If clusters are exhausted by a preceding task in a batch, the remaining tasks are now correctly re-enqueued. The PR also includes a comprehensive regression test suite consisting of AIC/AIV kernels, orchestration logic, and a Python test script. Feedback suggests optimizing the re-enqueueing logic by using push_batch for bulk operations instead of individual push calls to improve efficiency and reduce queue contention.
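The push_batch suggestion can be illustrated with a minimal sketch. It assumes a mutex-protected ready queue, which may not match the real scheduler queue; `LockedQueue`, its members, and the acquisition counter are hypothetical names used only to show why one bulk call touches the shared state less often than N individual calls.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical mutex-protected queue; the real ready queue may be lock-free.
class LockedQueue {
public:
    // One lock acquisition per task.
    void push(int task) {
        std::lock_guard<std::mutex> guard(mu_);
        ++lock_acquisitions_;
        q_.push_back(task);
    }

    // One lock acquisition for the whole batch.
    void push_batch(const int* tasks, std::size_t n) {
        std::lock_guard<std::mutex> guard(mu_);
        ++lock_acquisitions_;
        q_.insert(q_.end(), tasks, tasks + n);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mu_);
        return q_.size();
    }

    // Counts only push/push_batch acquisitions, for illustration.
    std::size_t lock_acquisitions() const {
        std::lock_guard<std::mutex> guard(mu_);
        return lock_acquisitions_;
    }

private:
    mutable std::mutex mu_;
    std::deque<int> q_;
    std::size_t lock_acquisitions_ = 0;
};
```

Re-enqueueing three tasks with `push` takes the lock three times in this sketch, while a single `push_batch` call takes it once and leaves the same queue contents.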
…ve-sys#565) When pop_ready_tasks_batch returns multiple tasks and the first task's do-while drains all idle clusters, subsequent tasks enter the do-while unconditionally (do-while executes body before checking guard), calling pop_first() on an empty bitmask. This returns -1 as cluster_offset, causing core_id_map_[-1] OOB access and undefined shift behavior that corrupts core_states_ and stalls the scheduler. Add a has_value() guard before the do-while: when clusters are exhausted mid-batch, re-enqueue via push_batch and break. Zero overhead on the common path (single branch, predicted taken).
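The failure mode and the guard described in this commit message can be sketched as follows. This is not the executor's code: `ClusterMask` and `dispatch_blocks` are hypothetical stand-ins, and only the names `has_value()`, `pop_first()`, and `cluster_offset` come from the text above. The point is that a do-while body runs once before its condition is tested, so the emptiness check has to happen before the loop.

```cpp
#include <cstdint>

// Hypothetical stand-in for the idle-cluster bitmask (layout is assumed).
struct ClusterMask {
    std::uint64_t bits = 0;

    bool has_value() const { return bits != 0; }

    // Clears and returns the index of the lowest set bit, or -1 if empty.
    int pop_first() {
        if (bits == 0) return -1;
        int idx = 0;
        while (((bits >> idx) & 1ULL) == 0) ++idx;
        bits &= bits - 1;  // clear the lowest set bit
        return idx;
    }
};

// Dispatches up to `blocks` blocks onto idle clusters and records them in
// `busy`. Returns the number of blocks dispatched.
int dispatch_blocks(ClusterMask& idle, int blocks, std::uint64_t& busy) {
    // Guard: without this check the do-while body would run once on an
    // empty mask, pop_first() would return -1, and the shift below would
    // be undefined behavior.
    if (!idle.has_value()) return 0;

    int dispatched = 0;
    do {
        int cluster_offset = idle.pop_first();  // non-negative here
        busy |= 1ULL << cluster_offset;
        ++dispatched;
    } while (dispatched < blocks && idle.has_value());
    return dispatched;
}
```

With three idle clusters and a request for five blocks, the first call dispatches three blocks and empties the mask; a second call returns 0 at the guard instead of entering the loop.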
1911263 to 65b362f
poursoul approved these changes Apr 15, 2026
zhusy54 pushed a commit to zhusy54/simpler that referenced this pull request Apr 15, 2026
Commit 9951499 refactored the two-phase dispatch helper but dropped the guard that PR hw-native-sys#566 (af3b1db) had introduced. When a multi-block task drains all available cores mid-batch, the next task in the batch entered the do-while unconditionally, calling cores.pop_first() on an empty bitmask (returns -1) and passing -1 as core_offset to dispatch_block(), causing OOB access. Re-add the guard before dispatched_any = true in both a2a3 and a5: when cores are exhausted, push_batch() re-enqueues the current and remaining batch tasks atomically and breaks out of the for-loop. The existing regression test (spmd_batch_dispatch_oob) covers this.
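A rough sketch of the batch loop shape this commit message describes, with the guard placed before `dispatched_any = true`. Everything except the names `push_batch` and `dispatched_any` is hypothetical: `Task`, `ReadyQueue`, `dispatch_batch`, and the plain core counter are simplifications, and re-enqueueing a partially dispatched task is an assumption about behavior the text does not spell out.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical task record: id plus the number of blocks still to dispatch.
struct Task {
    int id;
    int blocks_left;
};

// Hypothetical ready queue; only the push_batch name comes from the PR.
struct ReadyQueue {
    std::deque<Task> q;
    void push_batch(const Task* tasks, std::size_t n) {
        q.insert(q.end(), tasks, tasks + n);
    }
};

// Walks a popped batch; returns true if at least one block was dispatched.
bool dispatch_batch(std::vector<Task>& batch, int& idle_cores,
                    ReadyQueue& ready) {
    bool dispatched_any = false;
    for (std::size_t i = 0; i < batch.size(); ++i) {
        // Guard before dispatched_any = true: if an earlier task drained
        // the cores, hand the current and remaining tasks back in one call.
        if (idle_cores == 0) {
            ready.push_batch(&batch[i], batch.size() - i);
            break;
        }
        dispatched_any = true;
        do {
            --idle_cores;            // stands in for cores.pop_first()
            --batch[i].blocks_left;  // stands in for dispatch_block()
        } while (batch[i].blocks_left > 0 && idle_cores > 0);
        // Assumption: a task with blocks left over goes back to the queue.
        if (batch[i].blocks_left > 0) ready.push_batch(&batch[i], 1);
    }
    return dispatched_any;
}
```

With two 48-block tasks and 24 idle cores, the first task takes every core, and the second is returned to the queue untouched rather than entering the do-while.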
poursoul pushed a commit that referenced this pull request Apr 16, 2026
Commit 9951499 refactored the two-phase dispatch helper but dropped the guard that PR #566 (af3b1db) had introduced. When a multi-block task drains all available cores mid-batch, the next task in the batch entered the do-while unconditionally, calling cores.pop_first() on an empty bitmask (returns -1) and passing -1 as core_offset to dispatch_block(), causing OOB access. Re-add the guard before dispatched_any = true in both a2a3 and a5: when cores are exhausted, push_batch() re-enqueues the current and remaining batch tasks atomically and breaks out of the for-loop. The existing regression test (spmd_batch_dispatch_oob) covers this. Co-authored-by: zhusy54 <zhusiyu1@hisilicon.com>
Summary
Fixes #565: out-of-bounds `core_id_map_[-1]` access in the idle-dispatch batch loop.
- When `pop_ready_tasks_batch` returns multiple tasks (got > 1) and the first task's do-while drains all idle clusters (common when `logical_block_num >> cluster_count`, e.g. paged-attention with block_num=256 on 24 clusters), subsequent tasks enter the do-while unconditionally, because `do { ... } while(...)` executes the body before checking the guard, and call `pop_first()` on an empty bitmask, which returns -1
- `-1` is used as `cluster_offset` → `core_id_map_[-1]` (OOB read) and `1ULL << -1` (undefined shift), corrupting `core_states_` and stalling the scheduler
- Add a `has_value()` guard before the do-while; when clusters are exhausted mid-batch, re-enqueue the current and remaining tasks and `break`. Zero overhead on the common path (single predicted-taken branch)
- Applies to both the `a2a3` and `a5` variants

Test plan
- `spmd_batch_dispatch_oob`: submits 2 back-to-back MIX tasks with `block_num=48` (2x cluster count) to force `got=2` and trigger the OOB scenario; passes with the fix, would crash/stall without it
- `spmd_multiblock_mix` test: passes (no regression)