[Enhancement] PR#655 related: Case1-aligned spmd_paged_attention_highperf cases crash with aicore exception on a2a3 #677

@chenshengxin2026

Platform

a2a3 (Ascend 910B/C hardware)

Runtime Variant

tensormap_and_ringbuffer

Description

When the standalone high-performance paged-attention test scripts are aligned with Case1 from
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py,
both of the newly added cases fail at runtime on hardware instead of completing successfully.

Affected files:

  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py

Reference shape from spmd_paged_attention Case1:

  • batch=256
  • num_heads=16
  • kv_head_num=1
  • head_dim=128
  • block_size=128
  • context_len=8192
  • max_model_len=32768
  • dtype=bfloat16
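
For triage, it may help to see what this shape implies for the paged KV cache. The following is a rough sketch, not code from the repository: the class name `PagedAttentionShape`, the dense `[num_blocks, block_size, kv_head_num, head_dim]` cache layout, and the assumption that no blocks are shared between sequences are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagedAttentionShape:
    """Hypothetical container for the Case1 reference shape."""
    batch: int
    num_heads: int
    kv_head_num: int
    head_dim: int
    block_size: int
    context_len: int
    max_model_len: int
    dtype_bytes: int  # bfloat16 uses 2 bytes per element

    @property
    def blocks_per_seq(self) -> int:
        # Ceiling division: KV blocks needed to hold context_len tokens.
        return -(-self.context_len // self.block_size)

    @property
    def max_blocks_per_seq(self) -> int:
        # Block-table width if the table is sized for max_model_len (assumption).
        return -(-self.max_model_len // self.block_size)

    @property
    def total_kv_blocks(self) -> int:
        # Assumes every sequence owns its blocks (no sharing).
        return self.batch * self.blocks_per_seq

    @property
    def kv_cache_bytes_per_tensor(self) -> int:
        # Size of K (or V) alone under the assumed dense layout.
        return (self.total_kv_blocks * self.block_size
                * self.kv_head_num * self.head_dim * self.dtype_bytes)


CASE1 = PagedAttentionShape(
    batch=256, num_heads=16, kv_head_num=1, head_dim=128,
    block_size=128, context_len=8192, max_model_len=32768, dtype_bytes=2,
)

print(CASE1.blocks_per_seq)                      # 64
print(CASE1.max_blocks_per_seq)                  # 256
print(CASE1.total_kv_blocks)                     # 16384
print(CASE1.kv_cache_bytes_per_tensor // 2**20)  # 512 (MiB)
```

Under these assumptions, each sequence spans 64 blocks and K and V each occupy 512 MiB, which gives a baseline for comparing against any buffer or tiling limits in the highperf kernels.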

The newly added highperf cases were intended to match that shape, but both crash:

  • bench_pa_performance.py: ("Qwen3-8B b256 h16/kv1 kv8192", 256, 16, 1, 128, 8192, 128)
  • test_pa_accuracy.py: {"batch": 256, "num_heads": 16, "num_kv_heads": 1, "head_dim": 128, "kv_seq": 8192, "block_size": 128}
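
A quick consistency check of the two entries against the reference shape can be sketched as below. This is illustrative only: the tuple field order in `BENCH_FIELDS` is inferred from the values (note that `head_dim` and `block_size` are both 128, so their order cannot be confirmed from the entry alone), and the mapping of `kv_seq` to `context_len` and `num_kv_heads` to `kv_head_num` is an assumption. Neither entry carries `max_model_len` or `dtype`, so those are not compared.

```python
# Case1 reference values, renamed to the keys used by test_pa_accuracy.py
# (kv_seq <-> context_len and num_kv_heads <-> kv_head_num are assumed).
CASE1_REFERENCE = {
    "batch": 256, "num_heads": 16, "num_kv_heads": 1,
    "head_dim": 128, "kv_seq": 8192, "block_size": 128,
}

# Assumed positional layout of a bench_pa_performance.py case after the name.
BENCH_FIELDS = ("batch", "num_heads", "num_kv_heads",
                "head_dim", "kv_seq", "block_size")

BENCH_CASE = ("Qwen3-8B b256 h16/kv1 kv8192", 256, 16, 1, 128, 8192, 128)
ACCURACY_CASE = {"batch": 256, "num_heads": 16, "num_kv_heads": 1,
                 "head_dim": 128, "kv_seq": 8192, "block_size": 128}


def bench_case_to_dict(case):
    """Convert a (name, *values) benchmark tuple into a keyed dict."""
    _name, *values = case
    if len(values) != len(BENCH_FIELDS):
        raise ValueError(f"expected {len(BENCH_FIELDS)} values, got {len(values)}")
    return dict(zip(BENCH_FIELDS, values))


def shape_mismatches(candidate, reference=CASE1_REFERENCE):
    """Return {field: (candidate_value, reference_value)} for differing fields."""
    return {key: (candidate.get(key), expected)
            for key, expected in reference.items()
            if candidate.get(key) != expected}


print(shape_mismatches(bench_case_to_dict(BENCH_CASE)))  # {}
print(shape_mismatches(ACCURACY_CASE))                   # {}
```

Both entries match the reference on the six compared fields, so under these assumptions the crash is not explained by a typo in the shape itself.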

Steps to Reproduce

  1. Use commit 57e7a6dd3ac15a28c08b878716a171e65420f26a.
  2. Modify the highperf scripts to add the Case1-aligned shape:
    • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py
    • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
  3. Build the standalone kernel library if needed:
    cd tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels
    bash ./compile.sh
  4. Run the accuracy script:
    python ./test_pa_accuracy.py
  5. Run the benchmark script:
    python ./bench_pa_performance.py --bf16
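
The steps above can also be driven from a small wrapper that runs each command in its own process and keeps the combined output for attaching to this issue. This is a convenience sketch, not part of the repository; the helper names `run_and_log` and `reproduce` and the log file names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(cmd, cwd=None, log_path=None):
    """Run cmd, merge stderr into stdout, and return (returncode, output)."""
    result = subprocess.run(
        list(cmd), cwd=cwd, text=True, check=False,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    if log_path is not None:
        # log_path is resolved against the caller's working directory, not cwd.
        Path(log_path).write_text(result.stdout)
    return result.returncode, result.stdout


def reproduce(kernels_dir):
    """Run the build, accuracy, and benchmark steps; return exit codes by step."""
    steps = {
        "compile": ["bash", "./compile.sh"],
        "accuracy": [sys.executable, "./test_pa_accuracy.py"],
        "benchmark": [sys.executable, "./bench_pa_performance.py", "--bf16"],
    }
    return {
        name: run_and_log(cmd, cwd=kernels_dir, log_path=f"{name}.log")[0]
        for name, cmd in steps.items()
    }
```

Calling `reproduce("tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels")` from the repository root would write `compile.log`, `accuracy.log`, and `benchmark.log` there. Running each script in a separate process keeps a failure in one from hiding the result of the other.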

Expected Behavior

The Case1-aligned shape should run successfully in both scripts, matching the behavior of
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py
Case1, and should produce either correctness results (test_pa_accuracy.py) or benchmark
numbers (bench_pa_performance.py) without device/runtime exceptions.

Actual Behavior

Both scripts fail on hardware.

Observed errors include:

EE9999[PID: 3700733] 2026-04-25-16:02:42.600.249 (EE9999):  rtDeviceSynchronizeWithTimeout execution failed,
reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
TraceBack (most recent call last):
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]

and

RuntimeError: npuSynchronizeDevice:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:564 NPU function error:
SUSPECT REMOTE ERROR, error code is 507057
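
When comparing logs across runs, the numeric codes and the stated reason can be pulled out mechanically. The sketch below only matches the message formats quoted above; the function name `extract_error_info` is hypothetical, and other CANN or torch_npu messages may be worded differently.

```python
import re

# Patterns taken from the two log excerpts quoted in this issue.
_RUNTIME_RESULT = re.compile(r"runtime result = (\d+)")
_ERROR_CODE = re.compile(r"error code is (\d+)")
_REASON = re.compile(r"reason=([^\[\n]+)")

SAMPLE_LOG = """\
EE9999[PID: 3700733] 2026-04-25-16:02:42.600.249 (EE9999):  rtDeviceSynchronizeWithTimeout execution failed,
reason=aicore exception[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:65]
wait for compute device to finish failed, runtime result = 507015.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:148]
SUSPECT REMOTE ERROR, error code is 507057
"""


def extract_error_info(log_text):
    """Collect the distinct numeric codes and reason strings found in log_text."""
    codes = set()
    for pattern in (_RUNTIME_RESULT, _ERROR_CODE):
        codes.update(int(match) for match in pattern.findall(log_text))
    reasons = [reason.strip() for reason in _REASON.findall(log_text)]
    return {"codes": sorted(codes), "reasons": reasons}


print(extract_error_info(SAMPLE_LOG))
# {'codes': [507015, 507057], 'reasons': ['aicore exception']}
```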

Git Commit ID

57e7a6d

CANN Version

8.5.0.alpha001

Driver Version

Unknown

Host Platform

Linux (aarch64)

Additional Context

Relevant reference case:

  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention/test_spmd_paged_attention.py Case1

Relevant modified files:

  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/test_pa_accuracy.py
  • tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention_highperf/kernels/bench_pa_performance.py

This issue is related to the Case1-alignment work associated with PR #655.

Labels

enhancement
