
Improve MHA Pattern #617

Merged
xadupre merged 1 commit into main from att8 on Feb 23, 2026

Conversation

Collaborator

xadupre commented Feb 23, 2026

No description provided.

xadupre marked this pull request as ready for review February 23, 2026 21:56
Copilot AI review requested due to automatic review settings February 23, 2026 21:56
Contributor

Copilot AI left a comment


Pull request overview

This pull request improves the Multi-Head Attention (MHA) pattern optimization by adding support for the NoT (No Transpose) variant, which occurs when using FusedMatMul operators instead of regular MatMul with Transpose. The PR corrects the capitalization from "noT" to "NoT" to align with established naming conventions (SW for Switch Where, GQA for Group Query Attention), and adds a comprehensive test to verify the optimization works correctly with FusedMatMul.
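To make the NoT variant concrete: ONNX Runtime's FusedMatMul folds the transpose of one input into the matrix product itself, so the attention subgraph no longer contains the explicit Transpose node the original pattern matched on. A minimal NumPy sketch of the equivalence (illustrative only; shapes and the einsum formulation are my own, not taken from the PR):

```python
import numpy as np

# FusedMatMul(A, B, transB=1) computes A @ B^T in a single op, replacing
# the MatMul + Transpose pair that appears in unfused attention blocks.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4, 8))  # e.g. query: (batch, seq, head_dim)
B = rng.standard_normal((2, 4, 8))  # e.g. key

unfused = A @ np.swapaxes(B, -1, -2)     # MatMul(A, Transpose(B))
fused = np.einsum("bik,bjk->bij", A, B)  # what the fused form computes

assert np.allclose(unfused, fused)
```

Because both forms produce the same scores tensor, the optimizer needs a separate NoT pattern branch to recognize the fused graph shape.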

Changes:

  • Fixed capitalization of the no-transpose suffix from "noT" to "NoT" in FunctionAttentionPattern.apply()
  • Added "NoT_to" prefix to MultiHeadAttention3DPattern's _prefixes_operator_name to recognize LocalAttentionNoT operators
  • Added test case test_multi_head_attention_fused_matmul to verify FusedMatMul-based attention patterns are correctly optimized to MultiHeadAttention
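The capitalization fix matters because the variant suffixes are concatenated into a single operator name that downstream patterns match by prefix. A hypothetical helper sketching that convention (the function and its keyword names are my own; only the base name "LocalAttention" and the suffixes sQ, SW, NoT come from the diff):

```python
# Hypothetical sketch of the suffix convention: optional variant tags are
# appended to the base operator name in a fixed order (SW, then sQ, then
# NoT), so every spelling must match exactly for prefix lookups to work.
def operator_name(base: str, *, switch_where: bool = False,
                  scaled_q: bool = False, no_transpose: bool = False) -> str:
    name = base
    if switch_where:
        name += "SW"
    if scaled_q:
        name += "sQ"
    if no_transpose:
        name += "NoT"
    return name

print(operator_name("LocalAttention", no_transpose=True))  # LocalAttentionNoT
```

With a single fixed spelling ("NoT", not "noT"), a mismatch between the node-creation side and the prefix-matching side silently disables the optimization, which is what the added test guards against.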

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • experimental_experiment/xoptim/patterns_ort/llm_optim.py — Added the NoT_to prefix to MultiHeadAttention3DPattern so it recognizes the LocalAttentionNoT variants created when FusedMatMul is used
  • experimental_experiment/xoptim/patterns/onnx_attention.py — Fixed capitalization from "noT" to "NoT" in FunctionAttentionPattern.apply() for consistency; added a clarifying comment for the FusedMatMul branch
  • _unittests/ut_xoptim/test_graph_pattern_optimization_ort.py — Added a comprehensive test for FusedMatMul-based multi-head attention pattern optimization


Comment on lines 1105 to +1110
_prefixes_operator_name = (
f"{FunctionAttentionPattern._operator_name}_to",
f"{FunctionAttentionPattern._operator_name}sQ_to",
f"{FunctionAttentionPattern._operator_name}SW_to",
f"{FunctionAttentionPattern._operator_name}SWsQ_to",
f"{FunctionAttentionPattern._operator_name}NoT_to",

Copilot AI commented Feb 23, 2026


The _prefixes_operator_name tuple is missing combination prefixes. Since both SW and NoT suffixes can occur together (when switch_where=True and transpose=None), the pattern should also include:
f"{FunctionAttentionPattern._operator_name}SWNoT_to",

Additionally, lines 1107 and 1109 appear incorrect. FunctionAttentionPattern never creates "LocalAttentionsQ" or "LocalAttentionSWsQ" nodes. The "sQ" suffix is part of GQA variants (GQAsQ), not standalone patterns. These lines should likely be removed or replaced with GQA-related prefixes.
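If the SW and NoT suffixes really are independent, the full prefix set can be enumerated rather than hand-listed. A sketch of that idea (illustrative only, not the repository's actual code; the base name "LocalAttention" is assumed from the diff snippet above):

```python
from itertools import product

base = "LocalAttention"  # stands in for FunctionAttentionPattern._operator_name
# Enumerate the combinations the review mentions: SW and NoT can occur
# independently or together, so four prefixes cover all cases.
prefixes = tuple(
    f"{base}{'SW' if sw else ''}{'NoT' if nt else ''}_to"
    for sw, nt in product((False, True), repeat=2)
)
print(prefixes)
# ('LocalAttention_to', 'LocalAttentionNoT_to',
#  'LocalAttentionSW_to', 'LocalAttentionSWNoT_to')
```

Generating the tuple this way would also have prevented the missing SWNoT_to combination flagged in the comment, at the cost of making the recognized set less explicit at a glance.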

Collaborator Author


ok if you can add unittest for each of those


Contributor

Copilot AI commented Feb 23, 2026

@xadupre I've opened a new pull request, #620, to work on those changes. Once the pull request is ready, I'll request review from you.

xadupre merged commit 0799b0b into main Feb 23, 2026
19 checks passed
3 participants