Conversation

@wangyems (Contributor) commented Jan 12, 2024

Description

  1. Support a causal mask in the MHA CPU kernel
  2. Support a custom rotary_dim in rotary_emb (items 1–2 are sketched below)
  3. Add bf16 support for rotary_emb
  4. Fix a bug in the attention rotary embedding
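
A minimal NumPy sketch of items 1–2, assuming a `(num_heads, seq_len, head_dim)` layout and the non-interleaved ("rotate-half") rotary formulation; it illustrates the concepts only and is not the ONNX Runtime kernel code. The function names, shapes, and the `base` constant are assumptions.

```python
# Conceptual sketch only; layouts, names, and base are assumptions, not ORT internals.
import numpy as np

def causal_attention(q, k, v):
    """q, k, v: (num_heads, seq_len, head_dim). A lower-triangular (causal)
    mask keeps position i from attending to positions j > i."""
    _, seq_len, head_dim = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)        # (H, S, S)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)                   # mask out future tokens
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v                                             # (H, S, head_dim)

def rotary_embedding(x, rotary_dim, base=10000.0):
    """x: (seq_len, head_dim). Only the first rotary_dim channels are rotated;
    the remaining head_dim - rotary_dim channels pass through unchanged."""
    seq_len, _ = x.shape
    half = rotary_dim // 2
    inv_freq = 1.0 / base ** (np.arange(half) / half)            # (half,)
    angles = np.outer(np.arange(seq_len), inv_freq)              # (S, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x_rot, x_pass = x[:, :rotary_dim], x[:, rotary_dim:]
    x1, x2 = x_rot[:, :half], x_rot[:, half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x_pass], axis=-1)
```

With `rotary_dim == head_dim` the rotary function reduces to the usual full rotation; a smaller `rotary_dim` leaves the trailing channels untouched, which is what the custom `rotary_dim` option exposes.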

Motivation and Context

@yufenglee (Member) left a comment

:shipit:

@wangyems wangyems merged commit 21034a2 into main Jan 22, 2024
@wangyems wangyems deleted the wangye/phi2_ops branch January 22, 2024 18:17
mszhanyi added a commit that referenced this pull request Jan 23, 2024
YUNQIUGUO pushed a commit that referenced this pull request Jan 23, 2024
YUNQIUGUO pushed a commit that referenced this pull request Jan 30, 2024
### Description
This PR updates the Whisper export with beam search by adding the
following.

- Fixes a bug when running `DecoderMaskedMultiHeadAttention` in the
Whisper with beam search model
- Sets the default PyTorch attention implementation to `eager` to allow
existing attention fusions to continue working (sketched after this list)
- Re-uses the cache directory when loading the PyTorch model to reduce
memory used on disk
- Adds `--disable_auto_mixed_precision` to the example FP16 export
command
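
As a hedged illustration of the eager-attention and cache-directory bullets above, the snippet below shows how the PyTorch model could be loaded through the Hugging Face `transformers` API. The model name and cache path are placeholders, and this is a sketch of the intended loading behavior, not the exporter's actual code.

```python
# Sketch only: load Whisper with eager attention and a re-used cache directory.
# "openai/whisper-tiny" and "./whisper_cache" are placeholder values.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-tiny",
    cache_dir="./whisper_cache",    # shared cache -> a single on-disk copy of the weights
    attn_implementation="eager",    # eager attention keeps the pattern the ORT fusions expect
)
```

For the FP16 export, the example command additionally passes `--disable_auto_mixed_precision`, which avoids the extra Cast nodes noted in the Motivation and Context below.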

### Motivation and Context
- [This PR](#19112) added
the `is_unidirectional` parameter to `CheckInputs`, but it was not
provided when checking the inputs in `DecoderMaskedMultiHeadAttention`.
- [This PR](#19200)
explains the reasoning behind why `eager` is used to load the
`WhisperAttention` class.
- By re-using the cache directory for loading the PyTorch model, only
one copy of the PyTorch model is saved on disk instead of two copies.
- By providing this flag, there will be fewer Cast nodes in the Whisper
with beam search model to switch between FP16 and FP32 precision.
YUNQIUGUO pushed a commit that referenced this pull request Jan 30, 2024
rohan11235813 pushed a commit to quadric-io/onnxruntime that referenced this pull request Aug 19, 2025
@snnn (Contributor) commented Sep 5, 2025

This PR has been cherry-picked into the rel-1.17.0 branch in PR #19243. Removing the release:1.17.0 label.

rohan11235813 pushed a commit to quadric-io/onnxruntime that referenced this pull request Sep 15, 2025