
[ORT] Filter out invalid inputs in ORTModelForXXX forward pass #225

Closed
JingyaHuang wants to merge 44 commits

Conversation

@JingyaHuang (Collaborator) commented Jun 20, 2022

Context

TL;DR

Transformers #17617

Long story to tell...
Long story to tell...
For the DeBERTa model, the tokenizer returns token_type_ids by default. However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export). In this situation:

  1. The forward pass will fail if the user passes the tokenizer output directly to the model (as our documentation snippet does).
  2. Otherwise, they need to add an extra line to filter out the invalid input themselves, which requires a deeper understanding of the model and its tokenizer.

Considering the user experience, I think we should add this filter directly in ORTModelForXXX.
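
A minimal sketch of the kind of filter this means (names and paths are illustrative, not the exact PR code; it assumes the ORT model wraps an onnxruntime.InferenceSession):

```python
import onnxruntime as ort

def filter_onnx_inputs(session: ort.InferenceSession, inputs: dict) -> dict:
    """Keep only the keys the exported graph declares as inputs, e.g. drop
    token_type_ids when DeBERTa was exported with config.type_vocab_size=0."""
    valid_names = {node.name for node in session.get_inputs()}
    return {name: value for name, value in inputs.items() if name in valid_names}

# Usage sketch (hypothetical path; values assumed to already be NumPy arrays):
# session = ort.InferenceSession("deberta-onnx/model.onnx")
# outputs = session.run(None, filter_onnx_inputs(session, tokenizer_outputs))
```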

What does this PR do?

  • Filter out invalid inputs in ORTModelForXXX.

Fixes #207

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@philschmid (Member)

I am not sure we should add an additional check on each forward pass to verify that the input keys match the ones the model expects.
Have you run any before/after model latency checks?
This could have a serious impact on latency, and it sounds more like we are solving the issue in the wrong place.
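
For reference, a micro-benchmark sketch (not part of the PR; the feed dict and input-name set below are made up) gives a rough idea of the per-call cost of such a key filter:

```python
import timeit

# Made-up feed dict and input-name set, just to gauge the cost of the filter.
valid_names = {"input_ids", "attention_mask"}
feed = {
    "input_ids": [[0] * 128],
    "attention_mask": [[1] * 128],
    "token_type_ids": [[0] * 128],
}

n = 100_000
total = timeit.timeit(lambda: {k: v for k, v in feed.items() if k in valid_names}, number=n)
print(f"~{total / n * 1e6:.2f} µs per filtered forward call")
```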

@philschmid (Member)

> However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export).

When the tokenizers return token_type_ids by default, why aren't we exporting them in transformers.onnx.export by default too? We do something similar for GPT2, where attention_mask is also always included in the IR for batching.

@JingyaHuang (Collaborator, Author)

> However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export).

> When the tokenizers return token_type_ids by default, why aren't we exporting them in transformers.onnx.export by default too? We do something similar for GPT2, where attention_mask is also always included in the IR for batching.

In the implementation of DeBERTa there is a control flow that decides whether the token type embedding is instantiated. When config.type_vocab_size=0, token_type_ids is not traced even when it is listed in onnx_config.inputs, which is different from the case of GPT-2.
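
Roughly, that guard looks like the following (a paraphrased sketch, not the exact transformers source for DebertaEmbeddings):

```python
from torch import nn

class DebertaEmbeddingsSketch(nn.Module):
    """Paraphrased sketch of the type_vocab_size guard in DeBERTa's embeddings."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size)
        if config.type_vocab_size > 0:
            # Only built when type_vocab_size > 0; with type_vocab_size=0 the
            # token_type_ids input never reaches an op, so tracing drops it.
            self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)

    def forward(self, input_ids, token_type_ids=None):
        embeddings = self.word_embeddings(input_ids)
        if self.config.type_vocab_size > 0:
            embeddings = embeddings + self.token_type_embeddings(token_type_ids)
        return embeddings
```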

@JingyaHuang (Collaborator, Author)

@philschmid I agree that we should not add it to ORTModel if it has any impact on latency; after all, DeBERTa is very much an edge case. But maybe we can add a tool somewhere else to help users clean their inputs.

@philschmid (Member)

> In the implementation of DeBERTa there is a control flow that decides whether the token type embedding is instantiated. When config.type_vocab_size=0, token_type_ids is not traced even when it is listed in onnx_config.inputs, which is different from the case of GPT-2.

How are transformers and the regular pipeline currently handling this situation? Is there a way to omit the token_type_ids in the tokenizer, for example?

@JingyaHuang (Collaborator, Author)

> In the implementation of DeBERTa there is a control flow that decides whether the token type embedding is instantiated. When config.type_vocab_size=0, token_type_ids is not traced even when it is listed in onnx_config.inputs, which is different from the case of GPT-2.

> How are transformers and the regular pipeline currently handling this situation? Is there a way to omit the token_type_ids in the tokenizer, for example?

I don't think the pipeline in transformers handles this case, as token_type_ids is truly optional for the PyTorch model. But in our case, since InferenceSession does not tolerate invalid inputs, the pipeline might fail. And as for the tokenizer, it is independent of the config, while whether the IR contains token_type_ids depends on config.type_vocab_size, so I can't see an elegant way to work around it. Any ideas?
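
To illustrate the mismatch (the model path below is hypothetical): the exported session declares its own input list, and any feed key outside that list is rejected at run time.

```python
import onnxruntime as ort

# Hypothetical path to a DeBERTa export built with config.type_vocab_size=0.
session = ort.InferenceSession("deberta-onnx/model.onnx")

declared = [node.name for node in session.get_inputs()]
print(declared)  # e.g. ['input_ids', 'attention_mask'] -- no token_type_ids

# Passing a feed key that is not in `declared` (such as token_type_ids coming
# from the tokenizer) makes session.run raise an invalid-argument error.
```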

@philschmid (Member)

> I don't think the pipeline in transformers handles this case, as token_type_ids is truly optional for the PyTorch model. But in our case, since InferenceSession does not tolerate invalid inputs, the pipeline might fail. And as for the tokenizer, it is independent of the config, while whether the IR contains token_type_ids depends on config.type_vocab_size, so I can't see an elegant way to work around it. Any ideas?

Maybe by adjusting/configuring the tokenizer to output token_type_ids or not.

@JingyaHuang (Collaborator, Author)

> I don't think the pipeline in transformers handles this case, as token_type_ids is truly optional for the PyTorch model. But in our case, since InferenceSession does not tolerate invalid inputs, the pipeline might fail. And as for the tokenizer, it is independent of the config, while whether the IR contains token_type_ids depends on config.type_vocab_size, so I can't see an elegant way to work around it. Any ideas?

> Maybe by adjusting/configuring the tokenizer to output token_type_ids or not.

That makes sense to me! Although the case is a little tricky for users, you are right that it would be hacky to introduce this into ORTModel and that it would hurt speed.

As for configuring the tokenizer for inference:

  • Case 1: users use ORTModelForXXX for inference
    This case is fine: users can configure return_token_type_ids to avoid the problem.

```diff
 tokenizer = {processor_class}.from_pretrained("{checkpoint}")
 model = {model_class}.from_pretrained("{checkpoint}")
-inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt")
+inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt", return_token_type_ids=False)
 outputs = model(**inputs)
```
  • Case 2: users use pipeline

In this case, since users can't configure token_type_ids when instantiating the tokenizer, and the preprocess function [doesn't take return_token_type_ids into consideration](https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/token_classification.py#L195-L200), the pipeline will fail when the model doesn't have token_type_ids as an input.

Possible solution:
Add a return_token_type_ids argument to the preprocess function of each pipeline in transformers and set it to False when self.model.config.type_vocab_size < 1. What do you think about that?
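
A minimal sketch of that idea (illustrative only; the helper, its signature, and the checkpoint name are assumptions, not the actual transformers pipeline code):

```python
from transformers import AutoConfig, AutoTokenizer

def preprocess(text, tokenizer, config, framework="pt"):
    # Hypothetical helper mirroring a pipeline preprocess step: only request
    # token_type_ids when the model actually embeds them.
    return_token_type_ids = getattr(config, "type_vocab_size", 0) >= 1
    return tokenizer(text, return_tensors=framework, return_token_type_ids=return_token_type_ids)

# Usage sketch with a hypothetical checkpoint name:
# config = AutoConfig.from_pretrained("some-deberta-checkpoint")
# tokenizer = AutoTokenizer.from_pretrained("some-deberta-checkpoint")
# inputs = preprocess("My name is Philipp and I live in Germany.", tokenizer, config)
```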

@philschmid (Member)

I think you can open an issue in transformers explaining the DeBERTa situation and the solution you propose, to start the conversation about what changes would be needed.

@JingyaHuang (Collaborator, Author)

Issue opened in transformers to continue the discussion. Thanks @philschmid!

@JingyaHuang (Collaborator, Author)

Closing, as this should be handled in transformers.

@JingyaHuang JingyaHuang closed this Jul 8, 2022
@JingyaHuang JingyaHuang deleted the ort-invalid-inputs-filter branch July 12, 2022 16:41