
[WebNN EP] Support MultiHeadAttention(MHA) #24079

Merged
fdwr merged 9 commits into microsoft:main from peishenyan:mha_attention
Apr 23, 2025

Conversation

@peishenyan
Contributor

Description

Adds support for MultiHeadAttention via WebNN matmul, transpose, reshape, and other operations, following the logic of the MHA subgraph below:

 Abbreviations: B is batch_size, S is sequence_length, W is hidden_size, P is past_sequence_length,
                N is the number of attention heads, H is the head size, W=N*H, and h=Sqrt(H)
    Note: If the data type of the inputs (qkv and past kv) is float16, we cast them to float32 to preserve precision.

                 query     key     value
                   |        |        |
           q_Reshape   k_Reshape   v_Reshape  (shape=B,S,H,N)
                   |        |        |
          q_Transpose  k_Transpose v_Transpose (perm=0,2,1,3)
             \           /           |
              \         /            |
present_key<---\----Concat <---------|----past_key
               |      |              |
               |  opt_k_transpose    |
               \  (0,1,3,2)          |
                \    /               |  past_value
                qk_MatMul            |     /
                     |  scale        |    /
                     |   /           |   /
                  qk_Div           Concat------> present_value
                      |              |
                      |              /
                     Add <----------/---------------attention_bias
                      |            /
                    Softmax       /
                       \         /
                        \       /
                      qkv_MatMul
                             |
                          Transpose (perm=0,2,1,3)
                             |
                          Reshape---(shape=B,P,W)
                             |
                           output
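
For readers who want to see the decomposition concretely, here is a minimal TypeScript sketch of the same subgraph expressed against the WebNN MLGraphBuilder JavaScript API. It is illustrative only, not the EP's actual C++ implementation: the tensor sizes, the self-attention-only case (no past key/value, no attention_bias), the conventional (B,S,N,H) head split, and the availability of WebNN typings (MLGraphBuilder, MLOperand, MLGraph) are all assumptions, and the builder method and option names follow the current WebNN spec, which may differ between versions.

```typescript
// Hypothetical sizes, for illustration only.
const B = 1, S = 8, N = 12, H = 64, W = N * H;

async function buildMhaSketch(builder: MLGraphBuilder): Promise<MLGraph> {
  const qkvDesc = { dataType: 'float32', shape: [B, S, W] };
  const query = builder.input('query', qkvDesc);
  const key = builder.input('key', qkvDesc);
  const value = builder.input('value', qkvDesc);
  // float16 inputs would first be cast, e.g. builder.cast(query, 'float32').

  // q/k/v_Reshape + q/k/v_Transpose: (B,S,W) -> (B,S,N,H) -> (B,N,S,H).
  const toHeads = (x: MLOperand) =>
    builder.transpose(builder.reshape(x, [B, S, N, H]), { permutation: [0, 2, 1, 3] });
  const q = toHeads(query);
  const k = toHeads(key);   // with past_key: concat([past_key, k], 2) -> present_key
  const v = toHeads(value); // with past_value: concat([past_value, v], 2) -> present_value

  // qk_MatMul + qk_Div: scores = (q x k^T) / Sqrt(H), shape (B,N,S,S).
  const kT = builder.transpose(k, { permutation: [0, 1, 3, 2] });
  const scale = builder.constant(
    { dataType: 'float32', shape: [1] }, new Float32Array([Math.sqrt(H)]));
  const scores = builder.div(builder.matmul(q, kT), scale);

  // Add (attention_bias, omitted here) + Softmax over the last axis.
  const probs = builder.softmax(scores, 3);

  // qkv_MatMul -> Transpose (perm=0,2,1,3) -> Reshape back to (B,S,W).
  const context = builder.matmul(probs, v);
  const output = builder.reshape(
    builder.transpose(context, { permutation: [0, 2, 1, 3] }), [B, S, W]);

  return builder.build({ output });
}
```

The real implementation additionally wires up past/present key-value and the optional attention_bias via the Concat and Add nodes shown in the diagram above.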

Motivation and Context

@peishenyan peishenyan marked this pull request as draft March 18, 2025 08:20
@peishenyan peishenyan marked this pull request as ready for review April 10, 2025 14:10
@peishenyan
Contributor Author

Made some modifications according to #23416.
Now, this PR is ready for review. @Honry @fdwr @guschmue PTAL. Thanks.

Contributor

@Honry Honry left a comment


Thanks @peishenyan, some comments; please also add this new op to the webnn-operators.md file.

@fdwr
Contributor

fdwr commented Apr 11, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

@fdwr
Contributor

fdwr commented Apr 11, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr
Contributor

fdwr commented Apr 11, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr
Contributor

fdwr commented Apr 11, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@fdwr
Contributor

fdwr commented Apr 12, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

@fdwr
Contributor

fdwr commented Apr 12, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr
Contributor

fdwr commented Apr 12, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr
Contributor

fdwr commented Apr 12, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).


@peishenyan
Contributor Author

peishenyan commented Apr 12, 2025

Oh, my fault... I forgot to format the op_builder_factory.cc file. Maybe I should force-push a new commit? I apologize for the inconvenience.

@fdwr
Contributor

fdwr commented Apr 14, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr
Contributor

fdwr commented Apr 14, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

fdwr previously approved these changes Apr 14, 2025
Contributor

@fdwr fdwr left a comment


👍

@fdwr
Contributor

fdwr commented Apr 14, 2025

Will await @Honry's re-review.

@Honry
Contributor

Honry commented Apr 14, 2025

@peishenyan, you forgot to add the op info to webnn-operators.md; everything else LGTM.

@fdwr
Contributor

fdwr commented Apr 14, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

@fdwr
Contributor

fdwr commented Apr 14, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr
Contributor

fdwr commented Apr 14, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr
Contributor

fdwr commented Apr 14, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).


Contributor

@fdwr fdwr left a comment


👍

Contributor

@Honry Honry left a comment


👍

@peishenyan
Contributor Author

That's so weird. ONNX Runtime CUDA Builds / Windows GPU CUDA CI Pipeline (pull_request) failed...

This test had passed every time before this commit, but I only changed a doc file in this commit.

@peishenyan
Contributor Author

Hi @fdwr, is it possible to re-trigger the test to get a passing result?

@Honry
Contributor

Honry commented Apr 22, 2025

@peishenyan, you may need to rebase the code onto the latest main.

@fdwr
Contributor

fdwr commented Apr 22, 2025

/azp run ONNX Runtime CUDA Builds / Windows GPU CUDA CI Pipeline (pull_request)

@azure-pipelines

No pipelines are associated with this pull request.

@fdwr
Contributor

fdwr commented Apr 22, 2025

I'll retry the 2 required ones again (Linux CI / Build Linux x64 Release / build_test_pipeline (pull_request) ...). If they don't pass today, you'll need to try remerging with main.

@peishenyan
Contributor Author

Amazing... they finally passed😂

@fdwr
Contributor

fdwr commented Apr 23, 2025

Amazing... they finally passed

Merging, as the 5 remaining failing tests are unrelated, pervasive, and persistent infrastructure issues.

@fdwr fdwr merged commit df581e1 into microsoft:main Apr 23, 2025
71 of 76 checks passed
intbf pushed a commit to intbf/onnxruntime that referenced this pull request Apr 25, 2025