
Fix: guard idle-dispatch do-while against empty cluster mask #566

Merged
poursoul merged 1 commit into hw-native-sys:main from chenshengxin2026:fix/dispatch-cluster-offset-oob
Apr 15, 2026

Conversation

Contributor

@chenshengxin2026 chenshengxin2026 commented Apr 15, 2026

Summary

Fixes #565 — out-of-bounds core_id_map_[-1] access in the idle-dispatch batch loop.

  • When pop_ready_tasks_batch returns multiple tasks (got > 1) and the first task's do-while drains all idle clusters (common when logical_block_num >> cluster_count, e.g. paged-attention with block_num=256 on 24 clusters), subsequent tasks still enter the do-while — do { ... } while(...) executes its body before checking the guard — and call pop_first() on an empty bitmask, which returns -1
  • That -1 is then used as cluster_offset, producing an out-of-bounds core_id_map_[-1] read and an undefined 1ULL << -1 shift, which corrupt core_states_ and stall the scheduler
  • Fix: add a has_value() guard before the do-while; when clusters are exhausted mid-batch, re-enqueue the current and remaining tasks and break. Zero overhead on the common path (a single predicted-taken branch)
  • Applied to both a2a3 and a5 variants

Test plan

  • New test spmd_batch_dispatch_oob: submits 2 back-to-back MIX tasks with block_num=48 (2x cluster count) to force got=2 and trigger the OOB scenario — passes with the fix; crashes or stalls without it
  • Existing spmd_multiblock_mix test: passes (no regression)
  • Hardware device test on a2a3

@chenshengxin2026 chenshengxin2026 force-pushed the fix/dispatch-cluster-offset-oob branch from bc42ada to 1911263 on April 15, 2026 06:46

@gemini-code-assist (bot) left a comment


Code Review

This pull request addresses a batch dispatch out-of-bounds (OOB) issue in the aicpu_executor by adding a guard to check for available clusters before popping from valid_cluster_states. If clusters are exhausted by a preceding task in a batch, the remaining tasks are now correctly re-enqueued. The PR also includes a comprehensive regression test suite consisting of AIC/AIV kernels, orchestration logic, and a Python test script. Feedback suggests optimizing the re-enqueueing logic by using push_batch for bulk operations instead of individual push calls to improve efficiency and reduce queue contention.

Comment thread src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp Outdated
Comment thread src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp Outdated
…ve-sys#565)

When pop_ready_tasks_batch returns multiple tasks and the first task's
do-while drains all idle clusters, subsequent tasks enter the do-while
unconditionally (do-while executes body before checking guard), calling
pop_first() on an empty bitmask. This returns -1 as cluster_offset,
causing core_id_map_[-1] OOB access and undefined shift behavior that
corrupts core_states_ and stalls the scheduler.

Add a has_value() guard before the do-while: when clusters are exhausted
mid-batch, re-enqueue via push_batch and break. Zero overhead on the
common path (single branch, predicted taken).
@chenshengxin2026 chenshengxin2026 force-pushed the fix/dispatch-cluster-offset-oob branch from 1911263 to 65b362f on April 15, 2026 07:10
@poursoul poursoul merged commit af3b1db into hw-native-sys:main Apr 15, 2026
15 checks passed
zhusy54 pushed a commit to zhusy54/simpler that referenced this pull request Apr 15, 2026
Commit 9951499 refactored the two-phase dispatch helper but dropped
the guard that PR hw-native-sys#566 (af3b1db) had introduced. When a multi-block
task drains all available cores mid-batch, the next task in the batch
entered the do-while unconditionally, calling cores.pop_first() on an
empty bitmask (returns -1) and passing -1 as core_offset to
dispatch_block(), causing OOB access.

Re-add the guard before dispatched_any = true in both a2a3 and a5:
when cores are exhausted, push_batch() re-enqueues the current and
remaining batch tasks atomically and breaks out of the for-loop.
The existing regression test (spmd_batch_dispatch_oob) covers this.
poursoul pushed a commit that referenced this pull request Apr 16, 2026
Commit 9951499 refactored the two-phase dispatch helper but dropped
the guard that PR #566 (af3b1db) had introduced. When a multi-block
task drains all available cores mid-batch, the next task in the batch
entered the do-while unconditionally, calling cores.pop_first() on an
empty bitmask (returns -1) and passing -1 as core_offset to
dispatch_block(), causing OOB access.

Re-add the guard before dispatched_any = true in both a2a3 and a5:
when cores are exhausted, push_batch() re-enqueues the current and
remaining batch tasks atomically and breaks out of the for-loop.
The existing regression test (spmd_batch_dispatch_oob) covers this.

Co-authored-by: zhusy54 <zhusiyu1@hisilicon.com>


Development

Successfully merging this pull request may close these issues.

Bug: dispatch batch loop OOB when SPMD task drains all idle clusters

2 participants