[ONNX] Use onnx Attention operator for scaled_dot_product_attention #149662

@csoiram

Description

🚀 The feature, motivation and pitch

ONNX has introduced an Attention operator in opset 23 (https://onnx.ai/onnx/operators/onnx__Attention.html#l-onnx-op-attention-23). This could be used when exporting scaled_dot_product_attention to the ONNX format. Currently, scaled_dot_product_attention is broken down into its constituent ops when exporting to ONNX, which complicates the model and makes it harder to identify the attention block when compiling the network for inference on custom HW backends.
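For illustration, here is a minimal NumPy sketch of the op chain the exporter currently emits for scaled_dot_product_attention (MatMul, scale, Softmax, MatMul). The function name and tensor shapes are illustrative only, not part of the PyTorch or ONNX APIs; with opset 23, this whole chain could in principle map to a single Attention node instead.

```python
import numpy as np

def sdpa_decomposed(q, k, v):
    """Scaled dot-product attention written as the chain of primitive ops
    (MatMul, scale, Softmax, MatMul) that the exporter currently produces.
    A single opset-23 Attention node could replace this entire subgraph."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # MatMul + scale
    scores -= scores.max(axis=-1, keepdims=True)     # for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)       # Softmax over keys
    return probs @ v                                  # MatMul with values

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 8))  # (batch, seq, head_dim), illustrative
k = rng.standard_normal((1, 4, 8))
v = rng.standard_normal((1, 4, 8))
out = sdpa_decomposed(q, k, v)
print(out.shape)  # (1, 4, 8)
```

Each intermediate here becomes a separate node in the exported graph, which is why a hardware compiler has to pattern-match several ops to recover the attention block.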

Alternatives

No response

Additional context

No response

Metadata

Labels

module: onnx — Related to torch.onnx
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
