Platform
a2a3 (Ascend 910B/C hardware)
Runtime Variant
tensormap_and_ringbuffer
Description
When aligning the standalone high-performance paged-attention test scripts with Case1 from
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py,
both of the newly added cases fail at runtime on hardware instead of completing successfully.
Affected files:
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py
Reference shape from spmd_paged_attention Case1:
batch=256
num_heads=16
kv_head_num=1
head_dim=128
block_size=128
context_len=8192
max_model_len=32768
dtype=bfloat16
The newly added highperf cases were intended to match that shape, but both crash:
bench_pa_performance.py: ("Qwen3-8B b256 h16/kv1 kv8192", 256, 16, 1, 128, 8192, 128)
test_pa_accuracy.py: {"batch": 256, "num_heads": 16, "num_kv_heads": 1, "head_dim": 128, "kv_seq": 8192, "block_size": 128}
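As a sanity check on the shape itself, the sketch below confirms that the two added cases describe the same configuration and derives the block counts that follow from it. It rests on several assumptions that are not confirmed by the scripts: the field order of the benchmark tuple (inferred from the matching values), that kv_head_num and context_len in Case1 correspond to num_kv_heads and kv_seq here, and that the KV cache is laid out densely as [num_blocks, block_size, num_kv_heads, head_dim] with no block sharing.

```python
# Case1-aligned shape as listed in this report.
CASE1 = {
    "batch": 256,
    "num_heads": 16,
    "num_kv_heads": 1,
    "head_dim": 128,
    "kv_seq": 8192,
    "block_size": 128,
}
MAX_MODEL_LEN = 32768
BYTES_PER_BF16 = 2

# ASSUMPTION: field order of the benchmark tuple, inferred from the values.
BENCH_FIELDS = ("name", "batch", "num_heads", "num_kv_heads",
                "head_dim", "kv_seq", "block_size")
bench_case = ("Qwen3-8B b256 h16/kv1 kv8192", 256, 16, 1, 128, 8192, 128)


def bench_tuple_to_dict(case):
    """Convert a benchmark tuple to the dict form used by the accuracy script."""
    shape = dict(zip(BENCH_FIELDS, case))
    shape.pop("name")
    return shape


def derived_sizes(shape, max_model_len=MAX_MODEL_LEN):
    """Derive block counts and cache size (hypothetical dense layout)."""
    blocks_per_seq = -(-shape["kv_seq"] // shape["block_size"])      # ceil div
    max_blocks_per_seq = -(-max_model_len // shape["block_size"])    # ceil div
    total_blocks = shape["batch"] * blocks_per_seq
    bytes_per_block = (shape["block_size"] * shape["num_kv_heads"]
                       * shape["head_dim"] * BYTES_PER_BF16)
    return {
        "blocks_per_seq": blocks_per_seq,
        "max_blocks_per_seq": max_blocks_per_seq,
        "total_blocks": total_blocks,
        "q_heads_per_kv_head": shape["num_heads"] // shape["num_kv_heads"],
        # Size of K alone (V is the same size) under the assumed layout.
        "k_cache_bytes": total_blocks * bytes_per_block,
    }


sizes = derived_sizes(CASE1)
print(bench_tuple_to_dict(bench_case) == CASE1)
print(sizes)
```

Under these assumptions each sequence uses 64 blocks out of a block-table width of 256, the batch needs 16384 blocks in total, and K alone occupies 512 MiB in bfloat16. These figures may help when checking whether the highperf kernels' tiling or buffer limits are exceeded by this shape.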
Steps to Reproduce
- Use commit 57e7a6dd3ac15a28c08b878716a171e65420f26a.
- Modify the highperf scripts to add the Case1-aligned shape:
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
- Build the standalone kernel library if needed:
cd tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels
bash ./compile.sh
- Run the accuracy script:
python ./test_pa_accuracy.py
- Run the benchmark script:
python ./bench_pa_performance.py --bf16
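The build and run steps above can be wrapped in a small driver so that both failures are captured in one pass. This is only a sketch: the helper name run_all and the injectable runner parameter are hypothetical, and it assumes the commands are launched from the repository root with the commit above checked out and the Case1-aligned shape already added.

```python
import subprocess
from pathlib import Path

# Directory and commands taken from the reproduction steps in this report.
KERNELS_DIR = Path(
    "tests/st/a2a3/tensormap_and_ringbuffer/"
    "spmd_paged_attention_highperf/kernels"
)
COMMANDS = [
    ["bash", "./compile.sh"],
    ["python", "./test_pa_accuracy.py"],
    ["python", "./bench_pa_performance.py", "--bf16"],
]


def run_all(cwd=KERNELS_DIR, runner=subprocess.run):
    """Run each step in order and collect (command, return code, stderr).

    `runner` is injectable (hypothetical convenience) so the driver can be
    exercised without NPU hardware.
    """
    results = []
    for cmd in COMMANDS:
        proc = runner(cmd, cwd=cwd, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode, proc.stderr))
    return results
```

Calling run_all() on an affected machine and printing the collected stderr for each step should reproduce the errors listed under Actual Behavior.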
Expected Behavior
The Case1-aligned shape should run successfully in both scripts, matching the behavior of
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py
Case1, and should produce either correctness results (test_pa_accuracy.py) or benchmark
numbers (bench_pa_performance.py) without device/runtime exceptions.
Actual Behavior
Both scripts fail on hardware.
Observed errors include:
EE9999[PID: 3700733] 2026-04-25-16:02:42.600.249 (EE9999): rtDeviceSynchronizeWithTimeout execution failed,
reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
TraceBack (most recent call last):
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
and
RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:564 NPU function error:
SUSPECT REMOTE ERROR, error code is 507057
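For triage across repeated runs, the numeric codes can be pulled out of the captured output mechanically. The sketch below matches only the two phrasings seen in the logs above ("runtime result = N" and "error code is N"); other message formats are not covered, and it makes no claim about what the codes mean.

```python
import re

# Excerpts copied from the errors quoted in this report.
LOG = """\
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:564 NPU function error:
SUSPECT REMOTE ERROR, error code is 507057
"""

# Only the two phrasings observed above are handled.
CODE_PATTERNS = (
    re.compile(r"runtime result = (\d+)"),
    re.compile(r"error code is (\d+)"),
)


def extract_error_codes(text):
    """Return the sorted, de-duplicated numeric codes found in `text`."""
    codes = set()
    for pattern in CODE_PATTERNS:
        for match in pattern.finditer(text):
            codes.add(int(match.group(1)))
    return sorted(codes)


print(extract_error_codes(LOG))
```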
Git Commit ID
57e7a6d
CANN Version
8.5.0.alpha001
Driver Version
Unknown
Host Platform
Linux (aarch64)
Additional Context
Relevant reference case:
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py Case1
Relevant modified files:
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py
This issue is related to the Case1-alignment work associated with PR #655.