
Fix: re-guard dispatch_shape() do-while against empty core mask #569

Merged
poursoul merged 1 commit into hw-native-sys:main from zhusy54:dual-sched-MIX on Apr 16, 2026

Conversation

Contributor

@zhusy54 commented Apr 15, 2026

Summary

  • Re-add the cores.has_value() guard before the block-dispatch do-while loop in dispatch_shape() for both a2a3 and a5
  • Commit 9951499 (the two-phase dispatch refactor) dropped the guard that PR #566 ("Fix: guard idle-dispatch do-while against empty cluster mask", af3b1db) had introduced, reintroducing the OOB regression
  • When a multi-block task drains all available cores mid-batch, the next task in the batch would enter the do-while unconditionally, call cores.pop_first() on an empty bitmask (which returns -1), and pass -1 as core_offset to dispatch_block(), causing an out-of-bounds access

Root Cause

pop_ready_tasks_batch() pops up to want = cores.count() tasks, but a single multi-block task can consume all of those cores in its inner dispatch loop, so subsequent tasks in the same batch see an empty mask. The fix: if cores.has_value() is false before entering the dispatch loop, call push_batch() to re-enqueue the current and remaining batch tasks atomically, then break.

Testing

  • Existing regression test tests/st/a2a3/tensormap_and_ringbuffer/spmd_batch_dispatch_oob/ covers this exact scenario (two back-to-back MIX tasks with block_num=48 on 24 clusters)
  • pre-commit checks pass (clang-format, clang-tidy, cpplint, check-headers)

Related Issues

Regression introduced by commit 9951499; original fix was #566 (#565)

Commit 9951499 refactored the two-phase dispatch helper but dropped
the guard that PR hw-native-sys#566 (af3b1db) had introduced. When a
multi-block task drains all available cores mid-batch, the next task
in the batch enters the do-while unconditionally, calling
cores.pop_first() on an empty bitmask (returns -1) and passing -1 as
core_offset to dispatch_block(), causing OOB access.

Re-add the guard before dispatched_any = true in both a2a3 and a5:
when cores are exhausted, push_batch() re-enqueues the current and
remaining batch tasks atomically and breaks out of the for-loop.
The existing regression test (spmd_batch_dispatch_oob) covers this.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces guards in the AICPU executor for both a2a3 and a5 runtimes to handle scenarios where a task batch exhausts available cores. When no cores are available, the remaining tasks in the batch are re-enqueued to the scheduler's ready queue. I have no feedback to provide.

@poursoul poursoul merged commit b130c22 into hw-native-sys:main Apr 16, 2026
15 checks passed


2 participants