add _native_npu_attention support mask shape like [B,1,1,S] #13490

Open
chang-zhijie wants to merge 1 commit into huggingface:main from chang-zhijie:native_npu

Conversation

@chang-zhijie

@chang-zhijie chang-zhijie commented Apr 16, 2026

This PR resolves the unsupported atten_mask shape error when running attention on NPU (Ascend) devices.

Problem:
The NPU fusion attention operator (npu_fusion_attention) does not automatically broadcast attention masks.
When a mask of shape [batch, seq_len] or [batch, 1, 1, seq_len] is passed, the operator fails with an error similar to:
get unsupported atten_mask shape, the shape is [B, 1, 1, S] — only shapes like [B, N, S, S], [B, 1, S, S], [1, 1, S, S], or [S, S] are accepted.

Solution:
When running on NPU, explicitly expand the mask to [B, 1, S, S] to satisfy the operator’s shape constraints.
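
The expansion above can be sketched as follows. This is a minimal illustration of the shape fix (not the PR's exact code): a singleton dimension of a [B, 1, 1, S] mask is expanded along the query axis to [B, 1, S, S], which is one of the shapes the operator accepts. `torch.Tensor.expand` returns a view, so no memory is copied.

```python
import torch

# Hypothetical sketch: expand a [B, 1, 1, S] attention mask to
# [B, 1, S, S] because npu_fusion_attention does not broadcast masks.
B, S = 2, 8
mask = torch.zeros(B, 1, 1, S, dtype=torch.bool)  # shape [B, 1, 1, S]

# expand() broadcasts the size-1 query dimension to S without copying data
expanded = mask.expand(B, 1, S, S)  # shape [B, 1, S, S]

print(tuple(expanded.shape))  # (2, 1, 8, 8)
```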

Reference:
Ascend NPU fusion attention API:
https://www.hiascend.com/document/detail/zh/Pytorch/730/apiref/torchnpuCustomsapi/docs/context/torch_npu-npu_fusion_attention.md

Here is an example of running ErnieImage with the NPU backend:

```python
import torch
import torch_npu  # registers the "npu" device with PyTorch
from diffusers import ErnieImagePipeline

pipe = ErnieImagePipeline.from_pretrained("/model_dir/ERNIE-Image", torch_dtype=torch.bfloat16)
pipe = pipe.to("npu")
pipe.transformer.set_attention_backend("_native_npu")
generator = torch.Generator(device="npu")

prompt = "A black and white Chinese rural dog"
images = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=generator,
    use_pe=True,
).images
images[0].save("ernie-image-output.png")
```

@github-actions github-actions bot added models size/S PR with diff < 50 LOC labels Apr 16, 2026