Platform
a5sim (Ascend 950 simulation)
Runtime Variant
tensormap_and_ringbuffer
Description
During drain mode, core_trackers_[t].core_states_ (a plain uint64_t, non-atomic, unprotected) is subject to two classes of data race: read-write conflict and write-write conflict.
The drain_ack_mask barrier only guarantees that a thread has stopped issuing new dispatches. It does not guarantee that the thread has stopped completion polling (change_core_state). As a result, a thread that acks but returns early (because all_acked has not yet been reached) immediately re-enters the main scheduling loop and can call check_running_cores_for_completion, which writes core_states_. Concurrently, another thread may have already been elected and entered drain_worker_dispatch, reading and writing the same core_states_.
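The barrier semantics described above can be sketched as a small model. This is a hedged illustration, not the code in `aicpu_executor.cpp`: `DrainStateModel` and `ack_and_check_elected` are hypothetical names, the election is folded into the return value for brevity (the real code appears to track it separately in `drain_worker_elected`), and the memory ordering is an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of the drain state; only drain_ack_mask is named in the report.
struct DrainStateModel {
    std::atomic<std::uint32_t> drain_ack_mask{0};
};

// Sets the caller's ack bit. Returns true only for the thread whose ack
// completes the mask (the "elected" thread in this model). Every other
// caller gets false and, per the report, goes back to the main loop.
inline bool ack_and_check_elected(DrainStateModel& state, unsigned thread_idx,
                                  std::uint32_t all_acked) {
    const std::uint32_t bit = std::uint32_t{1} << thread_idx;
    const std::uint32_t prev =
        state.drain_ack_mask.fetch_or(bit, std::memory_order_acq_rel);
    return prev != all_acked && (prev | bit) == all_acked;
}
```

Nothing in this model prevents a thread that received `false` from continuing to write `core_states_`; the ack bit records only that the thread has stopped dispatching, which is the gap the report describes.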
Race types
| Race type | Concurrent party A | Concurrent party B |
| --- | --- | --- |
| Read-write conflict | Thread 1 writes core_states_[1] (change_core_state, L451) | Thread 2 (elected) reads core_states_[1] (get_valid_cluster_offset_states) |
| Write-write conflict | Thread 1 writes core_states_[1] (change_core_state, L451) | Thread 2 (elected) writes core_states_[1] (change_core_state, L631) |
Interleaving that triggers the race
Thread 0: ack → ack_mask=0x1 ≠ 0x7 → return to main loop
Thread 1: ack → ack_mask=0x3 ≠ 0x7 → return to main loop
Thread 1: [back in main loop] → change_core_state(bit_pos) ← writes core_trackers_[1].core_states_
← Thread 1's ack bit is still set in mask
Thread 2: ack → ack_mask=0x7 == all_acked → elected → drain_worker_dispatch
Thread 2: get_valid_cluster_offset_states() ← reads core_states_[1]
Thread 2: change_core_state(...) ← writes core_states_[1]
↑ Thread 1 and Thread 2 concurrently access the same core_states_ (data race, UB)
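To make one possible consequence of the write-write conflict concrete, the interleaving can be replayed by hand in single-threaded code. This is a sketch under a stated assumption: that `change_core_state` is a non-atomic read-modify-write that toggles one bit of the word, which the report does not confirm. `CoreTrackerModel`, `replay_lost_update`, and `serialized_update` are hypothetical names. Because a real data race is undefined behavior, this deterministic replay shows only one outcome among many.

```cpp
#include <cstdint>

// Hypothetical stand-in for the tracker; core_states is a plain word as in the report.
struct CoreTrackerModel {
    std::uint64_t core_states = 0;
};

// Replays the racy order: both threads load the word, then both store.
// Assumes (not confirmed by the report) that each update toggles one bit.
inline std::uint64_t replay_lost_update(std::uint64_t initial, unsigned bit_t1,
                                        unsigned bit_t2) {
    CoreTrackerModel tracker{initial};
    const std::uint64_t t1_snapshot = tracker.core_states;  // Thread 1 loads
    const std::uint64_t t2_snapshot = tracker.core_states;  // Thread 2 loads
    tracker.core_states = t1_snapshot ^ (std::uint64_t{1} << bit_t1);  // Thread 1 stores
    tracker.core_states = t2_snapshot ^ (std::uint64_t{1} << bit_t2);  // Thread 2 overwrites
    return tracker.core_states;
}

// Reference result when the two updates are applied one after the other.
inline std::uint64_t serialized_update(std::uint64_t initial, unsigned bit_t1,
                                       unsigned bit_t2) {
    return initial ^ (std::uint64_t{1} << bit_t1) ^ (std::uint64_t{1} << bit_t2);
}
```

With an initial word of 0 and the two threads toggling bits 0 and 1, the replay ends at `0x2` while the serialized order ends at `0x3`: Thread 1's update is lost.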
Steps to Reproduce
Insert the following before `tracker.change_core_state(bit_pos)` at L451 of `aicpu_executor.cpp`:
if ((drain_state_.drain_ack_mask.load(std::memory_order_relaxed)) != 0) { usleep(1000); }
assert((drain_state_.drain_worker_elected.load(std::memory_order_relaxed)) == 0);
Then increase the parameters of the `examples/a5/tensormap_and_ringbuffer/spmd_sync_start_stress` example (and its corresponding golden file) and run. The assert will fail with low probability. An assert failure confirms that the race-prone interleaving was reached; actual memory corruption is undefined behavior and may not produce a visible wrong result on every run.
Expected Behavior
The example runs normally and the assert does not fail.
Actual Behavior
The assert fails.
Git Commit ID
8d5f25b
CANN Version
No response
Driver Version
No response
Host Platform
Linux (aarch64)
Additional Context
No response