[ONNX] Use onnx Attention operator for scaled_dot_product_attention #149662

@csoiram

Description

🚀 The feature, motivation and pitch

ONNX has introduced an Attention operator in opset 23 (https://onnx.ai/onnx/operators/onnx__Attention.html#l-onnx-op-attention-23). This could be used when exporting scaled_dot_product_attention to the ONNX format. Currently, scaled_dot_product_attention is broken down into its constituent ops when exporting to ONNX, which complicates the model and makes it harder to identify the attention block when compiling the network for inference on custom HW backends.
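For illustration, here is a minimal NumPy sketch of the op chain the exporter currently emits for scaled_dot_product_attention (MatMul, scale, Softmax, MatMul). The function name and tensor shapes are illustrative only, not part of the PyTorch or ONNX APIs; with opset 23, this whole chain could in principle map to a single Attention node instead.

```python
import numpy as np

def sdpa_decomposed(q, k, v):
    """Scaled dot-product attention written as the chain of primitive ops
    (MatMul, scale, Softmax, MatMul) that the exporter currently produces.
    A single opset-23 Attention node could replace this entire subgraph."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # MatMul + scale
    scores -= scores.max(axis=-1, keepdims=True)     # for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)       # Softmax over keys
    return probs @ v                                  # MatMul with values

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 8))  # (batch, seq, head_dim), illustrative
k = rng.standard_normal((1, 4, 8))
v = rng.standard_normal((1, 4, 8))
out = sdpa_decomposed(q, k, v)
print(out.shape)  # (1, 4, 8)
```

Each intermediate here becomes a separate node in the exported graph, which is why a hardware compiler has to pattern-match several ops to recover the attention block.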

Alternatives

No response

Additional context

No response

Metadata

Labels

module: onnx — Related to torch.onnx
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
