
[ORT] Filter out invalid inputs in ORTModelForXXX forward pass #225

Closed
JingyaHuang wants to merge 44 commits

Conversation

@JingyaHuang (Collaborator) commented Jun 20, 2022

Context

TL;DR

Transformers #17617

Long story to tell...
Long story to tell...
For the DeBERTa model, the tokenizer returns token_type_ids by default. However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export). In this situation:

  1. The forward pass will fail if the user passes the tokenizer output directly to the model (as our documentation snippet does).
  2. Otherwise, they need to add an extra line to filter out the invalid input themselves, which requires a deeper understanding of the model and its tokenizer.

Considering the user experience, I think we should add this filter directly in ORTModelForXXX.
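
A minimal sketch of the kind of filter this means (names and paths are illustrative, not the exact PR code; it assumes the ORT model wraps an onnxruntime.InferenceSession):

```python
import onnxruntime as ort

def filter_onnx_inputs(session: ort.InferenceSession, inputs: dict) -> dict:
    """Keep only the keys the exported graph declares as inputs, e.g. drop
    token_type_ids when DeBERTa was exported with config.type_vocab_size=0."""
    valid_names = {node.name for node in session.get_inputs()}
    return {name: value for name, value in inputs.items() if name in valid_names}

# Usage sketch (hypothetical path; values assumed to already be NumPy arrays):
# session = ort.InferenceSession("deberta-onnx/model.onnx")
# outputs = session.run(None, filter_onnx_inputs(session, tokenizer_outputs))
```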

What does this PR do?

  • Filter out invalid inputs in ORTModelForXXX.

Fixes #207

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@philschmid (Member)

I am not sure we should add an additional check on each forward pass to verify that the input keys match the ones the model expects.
Have you run any before/after model latency checks?
This could have a serious impact on latency, and it sounds more like we are solving the issue in the wrong place.
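
For reference, a micro-benchmark sketch (not part of the PR; the feed dict and input-name set below are made up) gives a rough idea of the per-call cost of such a key filter:

```python
import timeit

# Made-up feed dict and input-name set, just to gauge the cost of the filter.
valid_names = {"input_ids", "attention_mask"}
feed = {
    "input_ids": [[0] * 128],
    "attention_mask": [[1] * 128],
    "token_type_ids": [[0] * 128],
}

n = 100_000
total = timeit.timeit(lambda: {k: v for k, v in feed.items() if k in valid_names}, number=n)
print(f"~{total / n * 1e6:.2f} µs per filtered forward call")
```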

@philschmid (Member)

> However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export).

When the tokenizers return token_type_ids by default, why aren't we exporting them in transformers.onnx.export by default too? We do something similar for GPT2, where attention_mask is also always included in the IR for batching.

@JingyaHuang (Collaborator, Author)

> However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export).

> When the tokenizers return token_type_ids by default, why aren't we exporting them in transformers.onnx.export by default too? We do something similar for GPT2, where attention_mask is also always included in the IR for batching.

In the implementation of DeBERTa there is a control flow that decides whether the token type embedding is instantiated. When config.type_vocab_size=0, token_type_ids is not traced even when it is listed in onnx_config.inputs, which is different from the case of GPT-2.
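
Roughly, that guard looks like the following (a paraphrased sketch, not the exact transformers source for DebertaEmbeddings):

```python
from torch import nn

class DebertaEmbeddingsSketch(nn.Module):
    """Paraphrased sketch of the type_vocab_size guard in DeBERTa's embeddings."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size)
        if config.type_vocab_size > 0:
            # Only built when type_vocab_size > 0; with type_vocab_size=0 the
            # token_type_ids input never reaches an op, so tracing drops it.
            self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)

    def forward(self, input_ids, token_type_ids=None):
        embeddings = self.word_embeddings(input_ids)
        if self.config.type_vocab_size > 0:
            embeddings = embeddings + self.token_type_embeddings(token_type_ids)
        return embeddings
```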

@JingyaHuang (Collaborator, Author)

@philschmid I agree that we should not add it to ORTModel if it has any impact on latency; after all, DeBERTa is very much an edge case. But maybe we can add a tool somewhere else to help users clean their inputs.

@philschmid (Member)

> In the implementation of DeBERTa there is a control flow that decides whether the token type embedding is instantiated. When config.type_vocab_size=0, token_type_ids is not traced even when it is listed in onnx_config.inputs, which is different from the case of GPT-2.

How are transformers and the regular pipeline currently handling this situation? Is there a way to omit the token_type_ids in the tokenizer, for example?

@JingyaHuang (Collaborator, Author)

> In the implementation of DeBERTa there is a control flow that decides whether the token type embedding is instantiated. When config.type_vocab_size=0, token_type_ids is not traced even when it is listed in onnx_config.inputs, which is different from the case of GPT-2.

> How are transformers and the regular pipeline currently handling this situation? Is there a way to omit the token_type_ids in the tokenizer, for example?

I don't think the pipeline in transformers handles this case, as token_type_ids is truly optional for the PyTorch model. But in our case, since InferenceSession does not tolerate invalid inputs, the pipeline might fail. And as for the tokenizer, it is independent of the config, while whether the IR contains token_type_ids depends on config.type_vocab_size, so I can't see an elegant way to work around it. Any ideas?
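
To illustrate the mismatch (the model path below is hypothetical): the exported session declares its own input list, and any feed key outside that list is rejected at run time.

```python
import onnxruntime as ort

# Hypothetical path to a DeBERTa export built with config.type_vocab_size=0.
session = ort.InferenceSession("deberta-onnx/model.onnx")

declared = [node.name for node in session.get_inputs()]
print(declared)  # e.g. ['input_ids', 'attention_mask'] -- no token_type_ids

# Passing a feed key that is not in `declared` (such as token_type_ids coming
# from the tokenizer) makes session.run raise an invalid-argument error.
```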

@philschmid (Member)

> I don't think the pipeline in transformers handles this case, as token_type_ids is truly optional for the PyTorch model. But in our case, since InferenceSession does not tolerate invalid inputs, the pipeline might fail. And as for the tokenizer, it is independent of the config, while whether the IR contains token_type_ids depends on config.type_vocab_size, so I can't see an elegant way to work around it. Any ideas?

Maybe by adjusting/configuring the tokenizer to output token_type_ids or not.

@JingyaHuang (Collaborator, Author)

> I don't think the pipeline in transformers handles this case, as token_type_ids is truly optional for the PyTorch model. But in our case, since InferenceSession does not tolerate invalid inputs, the pipeline might fail. And as for the tokenizer, it is independent of the config, while whether the IR contains token_type_ids depends on config.type_vocab_size, so I can't see an elegant way to work around it. Any ideas?

> Maybe by adjusting/configuring the tokenizer to output token_type_ids or not.

That makes sense to me! Although the case is a little tricky for users, you are right that it would be hacky to introduce this into ORTModel and that it would hurt speed.

As for configuring the tokenizer for inference:

  • Case 1: users use ORTModelForXXX for inference
    This case is fine: users can configure return_token_type_ids to avoid the problem.

```diff
 tokenizer = {processor_class}.from_pretrained("{checkpoint}")
 model = {model_class}.from_pretrained("{checkpoint}")
-inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt")
+inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt", return_token_type_ids=False)
 outputs = model(**inputs)
```
  • Case 2: users use pipeline

In this case, since users can't configure token_type_ids when instantiating the tokenizer, and the preprocess function [doesn't take return_token_type_ids into consideration](https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/token_classification.py#L195-L200), the pipeline will fail when the model doesn't have token_type_ids as an input.

Possible solution:
Add a return_token_type_ids argument to the preprocess function of each pipeline in transformers and set it to False when self.model.config.type_vocab_size < 1. What do you think about that?
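
A minimal sketch of that idea (illustrative only; the helper, its signature, and the checkpoint name are assumptions, not the actual transformers pipeline code):

```python
from transformers import AutoConfig, AutoTokenizer

def preprocess(text, tokenizer, config, framework="pt"):
    # Hypothetical helper mirroring a pipeline preprocess step: only request
    # token_type_ids when the model actually embeds them.
    return_token_type_ids = getattr(config, "type_vocab_size", 0) >= 1
    return tokenizer(text, return_tensors=framework, return_token_type_ids=return_token_type_ids)

# Usage sketch with a hypothetical checkpoint name:
# config = AutoConfig.from_pretrained("some-deberta-checkpoint")
# tokenizer = AutoTokenizer.from_pretrained("some-deberta-checkpoint")
# inputs = preprocess("My name is Philipp and I live in Germany.", tokenizer, config)
```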

@philschmid (Member)

I think you can open an issue in transformers explaining the DeBERTa situation and the solution you propose, to start the conversation about what changes would be needed.

@JingyaHuang (Collaborator, Author)

Issue opened in transformers to continue the discussion. Thanks @philschmid!

@JingyaHuang (Collaborator, Author)

Closing, as this should be handled in transformers.

@JingyaHuang JingyaHuang closed this Jul 8, 2022
@JingyaHuang JingyaHuang deleted the ort-invalid-inputs-filter branch July 12, 2022 16:41