
relational attention bias cuda op#14149

Merged
gh-yewang merged 10 commits into main from wangye/rpb
Jan 7, 2023

Conversation

Contributor

@gh-yewang gh-yewang commented Jan 5, 2023

Description

This CUDA op implements the compute_bias() method of T5 Attention, including the permutation.

Notes:

  1. bias_table needs to be stored in column-major order; be careful when implementing the fusion script.
  2. The second input (sequence length) is placed on CPU (using a Shape node's output should be fine).
  3. The first dimension of the output is 1, so extra_add_qk in Attention should support broadcasting.
  4. compute_bias() is only used in self-attention in T5.

TODO: docs change will be applied later
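For reference, here is a NumPy sketch of what compute_bias() computes, following the T5 relative position bucketing scheme. Function and argument names are illustrative, not the op's actual signature; num_buckets=32 and max_distance=128 are the usual T5 defaults, assumed here:

```python
import numpy as np

def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map relative positions to bucket indices (T5 scheme): half the
    buckets cover exact small offsets, the rest grow logarithmically
    up to max_distance."""
    relative_buckets = 0
    if bidirectional:
        num_buckets //= 2
        relative_buckets = (relative_position > 0).astype(np.int64) * num_buckets
        relative_position = np.abs(relative_position)
    else:
        relative_position = -np.minimum(relative_position, 0)
    max_exact = num_buckets // 2
    is_small = relative_position < max_exact
    # Log-spaced buckets for larger distances (clamp to avoid log(0)).
    rel_if_large = max_exact + (
        np.log(np.maximum(relative_position, 1) / max_exact)
        / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(np.int64)
    rel_if_large = np.minimum(rel_if_large, num_buckets - 1)
    return relative_buckets + np.where(is_small, relative_position, rel_if_large)

def compute_bias(bias_table, seq_len, bidirectional=True,
                 num_buckets=32, max_distance=128):
    """bias_table: (num_buckets, num_heads). Returns a bias of shape
    (1, num_heads, seq_len, seq_len) -- note the leading 1, which is
    why extra_add_qk must broadcast."""
    context_pos = np.arange(seq_len)[:, None]
    memory_pos = np.arange(seq_len)[None, :]
    buckets = relative_position_bucket(memory_pos - context_pos,
                                       bidirectional, num_buckets, max_distance)
    values = bias_table[buckets]            # (seq, seq, num_heads)
    # The permutation mentioned above: move heads to dim 1.
    return values.transpose(2, 0, 1)[None]  # (1, num_heads, seq, seq)

bias_table = np.arange(64, dtype=np.float32).reshape(32, 2)  # toy table, 2 heads
bias = compute_bias(bias_table, seq_len=4)
```

The CUDA op fuses the bucket computation, the table gather, and the final transpose into one kernel; the transpose is the reason the note above asks for the table in column-major order.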

Motivation and Context

This is part of the effort to optimize T5 attention, as well as T5-based generation models.

@gh-yewang gh-yewang requested a review from tianleiwu January 5, 2023 22:56
Comment thread onnxruntime/contrib_ops/cuda/bert/relative_attn_bias.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/contrib_ops/cuda/cuda_contrib_kernels.cc Outdated
Comment thread onnxruntime/test/contrib_ops/relative_attention_bias_test.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/contrib_ops/cuda/bert/relative_attn_bias_impl.cu
@gh-yewang gh-yewang requested a review from tianleiwu January 6, 2023 21:41
@gh-yewang gh-yewang merged commit 5eac2c1 into main Jan 7, 2023
@gh-yewang gh-yewang deleted the wangye/rpb branch January 7, 2023 01:32
gh-yewang added a commit that referenced this pull request Jan 11, 2023
### Description
Average latency (ms) of the float16 relative position bias CUDA kernel on V100:

| Kernel \ Seq Len | 16 | 32 | 64 | 128 | 256 | 384 | 512 | 768 | 1024 | 2048 | 4096 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Before | 0.0494 | 0.0654 | 0.1519 | 0.4322 | 1.1865 | 2.4091 | 4.3676 | 14.912 | 36.517 | 142.09 | 561.80 |
| After | 0.0483 | 0.0651 | 0.1294 | 0.3858 | 1.1128 | 2.2988 | 3.8391 | 14.290 | 34.542 | 136.13 | 529.54 |

### Motivation and Context
Addresses review feedback on #14149.

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
