
relational attention bias cuda op#14149

Merged
gh-yewang merged 10 commits into main from wangye/rpb
Jan 7, 2023

Conversation

Contributor

@gh-yewang gh-yewang commented Jan 5, 2023

Description

This CUDA op implements the compute_bias() method of T5 Attention, including the permutation.

Notes:

  1. bias_table needs to be stored in column-major order; be careful when implementing the fusion script.
  2. The second input (sequence length) is placed on CPU (using a Shape node's output should be fine).
  3. The first dimension of the output is 1, so extra_add_qk in Attention should support broadcasting.
  4. compute_bias() is only used in self-attention in T5.

TODO: docs change will be applied later
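For reference, here is a NumPy sketch of what compute_bias() computes, following the T5 relative position bucketing scheme. Function and argument names are illustrative, not the op's actual signature; num_buckets=32 and max_distance=128 are the usual T5 defaults, assumed here:

```python
import numpy as np

def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map relative positions to bucket indices (T5 scheme): half the
    buckets cover exact small offsets, the rest grow logarithmically
    up to max_distance."""
    relative_buckets = 0
    if bidirectional:
        num_buckets //= 2
        relative_buckets = (relative_position > 0).astype(np.int64) * num_buckets
        relative_position = np.abs(relative_position)
    else:
        relative_position = -np.minimum(relative_position, 0)
    max_exact = num_buckets // 2
    is_small = relative_position < max_exact
    # Log-spaced buckets for larger distances (clamp to avoid log(0)).
    rel_if_large = max_exact + (
        np.log(np.maximum(relative_position, 1) / max_exact)
        / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(np.int64)
    rel_if_large = np.minimum(rel_if_large, num_buckets - 1)
    return relative_buckets + np.where(is_small, relative_position, rel_if_large)

def compute_bias(bias_table, seq_len, bidirectional=True,
                 num_buckets=32, max_distance=128):
    """bias_table: (num_buckets, num_heads). Returns a bias of shape
    (1, num_heads, seq_len, seq_len) -- note the leading 1, which is
    why extra_add_qk must broadcast."""
    context_pos = np.arange(seq_len)[:, None]
    memory_pos = np.arange(seq_len)[None, :]
    buckets = relative_position_bucket(memory_pos - context_pos,
                                       bidirectional, num_buckets, max_distance)
    values = bias_table[buckets]            # (seq, seq, num_heads)
    # The permutation mentioned above: move heads to dim 1.
    return values.transpose(2, 0, 1)[None]  # (1, num_heads, seq, seq)

bias_table = np.arange(64, dtype=np.float32).reshape(32, 2)  # toy table, 2 heads
bias = compute_bias(bias_table, seq_len=4)
```

The CUDA op fuses the bucket computation, the table gather, and the final transpose into one kernel; the transpose is the reason the note above asks for the table in column-major order.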

Motivation and Context

This is part of the effort to optimize T5 attention, as well as T5-based generation models.

@gh-yewang gh-yewang requested a review from tianleiwu January 5, 2023 22:56
Comment thread onnxruntime/contrib_ops/cuda/bert/relative_attn_bias.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/contrib_ops/cuda/cuda_contrib_kernels.cc Outdated
Comment thread onnxruntime/test/contrib_ops/relative_attention_bias_test.cc Outdated
Comment thread onnxruntime/core/graph/contrib_ops/bert_defs.cc Outdated
Comment thread onnxruntime/contrib_ops/cuda/bert/relative_attn_bias_impl.cu
@gh-yewang gh-yewang requested a review from tianleiwu January 6, 2023 21:41
@gh-yewang gh-yewang merged commit 5eac2c1 into main Jan 7, 2023
@gh-yewang gh-yewang deleted the wangye/rpb branch January 7, 2023 01:32
gh-yewang added a commit that referenced this pull request Jan 11, 2023
### Description
Average latency (ms) of the float16 relative position bias CUDA kernel on V100:

| Kernel \ Seq Len | 16 | 32 | 64 | 128 | 256 | 384 | 512 | 768 | 1024 | 2048 | 4096 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Before | 0.0494 | 0.0654 | 0.1519 | 0.4322 | 1.1865 | 2.4091 | 4.3676 | 14.912 | 36.517 | 142.09 | 561.80 |
| After | 0.0483 | 0.0651 | 0.1294 | 0.3858 | 1.1128 | 2.2988 | 3.8391 | 14.290 | 34.542 | 136.13 | 529.54 |

### Motivation and Context
Addresses review feedback on #14149.

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
