
Fix: guard idle-dispatch do-while against empty cluster mask #566

Merged
poursoul merged 1 commit into hw-native-sys:main from chenshengxin2026:fix/dispatch-cluster-offset-oob
Apr 15, 2026

Conversation

Contributor

@chenshengxin2026 chenshengxin2026 commented Apr 15, 2026

Summary

Fixes #565 — out-of-bounds core_id_map_[-1] access in the idle-dispatch batch loop.

  • When pop_ready_tasks_batch returns multiple tasks (got > 1) and the first task's do-while drains all idle clusters (common when logical_block_num >> cluster_count, e.g. paged-attention with block_num=256 on 24 clusters), subsequent tasks still enter the do-while — do { ... } while(...) executes its body before checking the guard — and call pop_first() on an empty bitmask, which returns -1
  • That -1 is then used as cluster_offset, producing an out-of-bounds core_id_map_[-1] read and an undefined 1ULL << -1 shift, which corrupt core_states_ and stall the scheduler
  • Fix: add a has_value() guard before the do-while; when clusters are exhausted mid-batch, re-enqueue the current and remaining tasks and break. Zero overhead on the common path (a single predicted-taken branch)
  • Applied to both a2a3 and a5 variants

Test plan

  • New test spmd_batch_dispatch_oob: submits 2 back-to-back MIX tasks with block_num=48 (2x cluster count) to force got=2 and trigger the OOB scenario — passes with the fix; crashes or stalls without it
  • Existing spmd_multiblock_mix test: passes (no regression)
  • Hardware device test on a2a3

@chenshengxin2026 chenshengxin2026 force-pushed the fix/dispatch-cluster-offset-oob branch from bc42ada to 1911263 on April 15, 2026 06:46

@gemini-code-assist (bot) left a comment


Code Review

This pull request addresses a batch dispatch out-of-bounds (OOB) issue in the aicpu_executor by adding a guard to check for available clusters before popping from valid_cluster_states. If clusters are exhausted by a preceding task in a batch, the remaining tasks are now correctly re-enqueued. The PR also includes a comprehensive regression test suite consisting of AIC/AIV kernels, orchestration logic, and a Python test script. Feedback suggests optimizing the re-enqueueing logic by using push_batch for bulk operations instead of individual push calls to improve efficiency and reduce queue contention.

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp Outdated
Comment thread src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp Outdated
…ve-sys#565)

When pop_ready_tasks_batch returns multiple tasks and the first task's
do-while drains all idle clusters, subsequent tasks enter the do-while
unconditionally (do-while executes body before checking guard), calling
pop_first() on an empty bitmask. This returns -1 as cluster_offset,
causing core_id_map_[-1] OOB access and undefined shift behavior that
corrupts core_states_ and stalls the scheduler.

Add a has_value() guard before the do-while: when clusters are exhausted
mid-batch, re-enqueue via push_batch and break. Zero overhead on the
common path (single branch, predicted taken).
@chenshengxin2026 chenshengxin2026 force-pushed the fix/dispatch-cluster-offset-oob branch from 1911263 to 65b362f on April 15, 2026 07:10
@poursoul poursoul merged commit af3b1db into hw-native-sys:main Apr 15, 2026
15 checks passed
zhusy54 pushed a commit to zhusy54/simpler that referenced this pull request Apr 15, 2026
Commit 9951499 refactored the two-phase dispatch helper but dropped
the guard that PR hw-native-sys#566 (af3b1db) had introduced. When a multi-block
task drains all available cores mid-batch, the next task in the batch
entered the do-while unconditionally, calling cores.pop_first() on an
empty bitmask (returns -1) and passing -1 as core_offset to
dispatch_block(), causing OOB access.

Re-add the guard before dispatched_any = true in both a2a3 and a5:
when cores are exhausted, push_batch() re-enqueues the current and
remaining batch tasks atomically and breaks out of the for-loop.
The existing regression test (spmd_batch_dispatch_oob) covers this.
poursoul pushed a commit that referenced this pull request Apr 16, 2026
Commit 9951499 refactored the two-phase dispatch helper but dropped
the guard that PR #566 (af3b1db) had introduced. When a multi-block
task drains all available cores mid-batch, the next task in the batch
entered the do-while unconditionally, calling cores.pop_first() on an
empty bitmask (returns -1) and passing -1 as core_offset to
dispatch_block(), causing OOB access.

Re-add the guard before dispatched_any = true in both a2a3 and a5:
when cores are exhausted, push_batch() re-enqueues the current and
remaining batch tasks atomically and breaks out of the for-loop.
The existing regression test (spmd_batch_dispatch_oob) covers this.

Co-authored-by: zhusy54 <zhusiyu1@hisilicon.com>


Development

Successfully merging this pull request may close these issues.

Bug: dispatch batch loop OOB when SPMD task drains all idle clusters

2 participants