
Fix: re-guard dispatch_shape() do-while against empty core mask #569

Merged
poursoul merged 1 commit into hw-native-sys:main from zhusy54:dual-sched-MIX on Apr 16, 2026

Conversation

Contributor

@zhusy54 commented Apr 15, 2026

Summary

  • Re-add the cores.has_value() guard before the block-dispatch do-while loop in dispatch_shape() for both a2a3 and a5
  • Commit 9951499 (the two-phase dispatch refactor) dropped the guard that PR #566 ("Fix: guard idle-dispatch do-while against empty cluster mask", af3b1db) had introduced, reintroducing the OOB regression
  • When a multi-block task drains all available cores mid-batch, the next task in the batch would enter the do-while unconditionally, call cores.pop_first() on an empty bitmask (which returns -1), and pass -1 as core_offset to dispatch_block(), causing an out-of-bounds access

Root Cause

pop_ready_tasks_batch() pops up to want = cores.count() tasks, but a single multi-block task can consume all of those cores in its inner dispatch loop, so subsequent tasks in the same batch see an empty mask. The fix: if cores.has_value() is false before entering the dispatch loop, call push_batch() to re-enqueue the current and remaining batch tasks atomically, then break.

Testing

  • Existing regression test tests/st/a2a3/tensormap_and_ringbuffer/spmd_batch_dispatch_oob/ covers this exact scenario (two back-to-back MIX tasks with block_num=48 on 24 clusters)
  • pre-commit checks pass (clang-format, clang-tidy, cpplint, check-headers)

Related Issues

Regression introduced by commit 9951499; original fix was #566 (#565)

Commit 9951499 refactored the two-phase dispatch helper but dropped
the guard that PR hw-native-sys#566 (af3b1db) had introduced. When a
multi-block task drains all available cores mid-batch, the next task
in the batch enters the do-while unconditionally, calling
cores.pop_first() on an empty bitmask (returns -1) and passing -1 as
core_offset to dispatch_block(), causing OOB access.

Re-add the guard before dispatched_any = true in both a2a3 and a5:
when cores are exhausted, push_batch() re-enqueues the current and
remaining batch tasks atomically and breaks out of the for-loop.
The existing regression test (spmd_batch_dispatch_oob) covers this.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces guards in the AICPU executor for both a2a3 and a5 runtimes to handle scenarios where a task batch exhausts available cores. When no cores are available, the remaining tasks in the batch are re-enqueued to the scheduler's ready queue. I have no feedback to provide.

@poursoul poursoul merged commit b130c22 into hw-native-sys:main Apr 16, 2026
15 checks passed


2 participants