
Add RemovePadding and RestorePadding for BERT model #13701

Merged
tianleiwu merged 2 commits into main from tlwu/bert_pad on Nov 22, 2022
Conversation

@tianleiwu
Contributor

Description

Add two operators, RemovePadding and RestorePadding, based on the idea of Effective Transformer (https://github.com/bytedance/effective_transformer), to improve large-batch inference for the BERT model.

Motivation and Context

"output tensor with shape (total_tokens, hidden_size)",
"T")
.Output(1,
"token_offset",
Member


token_offset

Why do you need token_offset? It is redundant with cumulated_seq_len.


const auto& dims = input->Shape().GetDims();
if (dims.size() != 3) {
return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Input 'input' is expected to have 3 dimensions, got ",
Member


'input'

nit: 0

// total_token_count: 1 + 2 + 4 = 7
// max_token_count: 4
// cumulated_token_count: 0, 1, 1+2, 1+2+4
__global__ void getTokenOffset(int* token_count_buffer,
Member


__global__ void getTokenOffset(int* token_count_buffer,

It can be implemented with cub::BlockScan. The kernel can be launched with Grid: 1, Block: batch. For the kernel:

  1. it uses cub::BlockScan to compute cumulated_token_count first.
  2. then each thread fills its token_offset.

Member


For offsets with idx > token_size, we don't actually need to fill them, because the restore step won't use them.

Contributor Author


__global__ void getTokenOffset(int* token_count_buffer,

It can be implemented with cub::BlockScan. The kernel can be launched with Grid: 1, Block: batch. For the kernel:

  1. it uses cub::BlockScan to compute cumulated_token_count first.
  2. then each thread fills its token_offset.

Good suggestion. There is a related TODO in the comments:
// TODO(tianleiwu): Use cub::DevicePartition::Flagged like BuildGlobalIndex in longformer_global_impl.cu
// to build token_offset when sequence length is large.
I could do it in another pull request later.

Contributor Author


For offsets with idx > token_size, we don't actually need to fill them, because the restore step won't use them.

The purpose is to fill zeros for those padded tokens (to make the result deterministic). Otherwise, we would need to fill the whole output with zeros first, then use another kernel to restore the non-padding tokens.

Another purpose is to keep the shape as (batch_size, sequence_length). Otherwise, we would need to pass these two values to the RestorePadding operator.

@tianleiwu tianleiwu merged commit 8b0e0f4 into main Nov 22, 2022
@tianleiwu tianleiwu deleted the tlwu/bert_pad branch November 22, 2022 18:00
simon-moo pushed a commit to simon-moo/onnxruntime that referenced this pull request Dec 26, 2022
Add two operators, RemovePadding and RestorePadding, based on the idea of Effective Transformer (https://github.com/bytedance/effective_transformer), to improve large-batch inference for the BERT model.