Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Layoutlm onnx support (Issue #13300) #13562

Merged
merged 30 commits into from
Sep 21, 2021

Conversation

nishprabhu
Copy link
Contributor

What does this PR do?

This PR extends ONNX support to LayoutLM as explained in https://huggingface.co/transformers/serialization.html?highlight=onnx#converting-an-onnx-model-using-the-transformers-onnx-package

Fixes Issue #13300

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@NielsRogge @mfuntowicz @LysandreJik

@michaelbenayoun
Copy link
Member

Looks almost ready, great work @nishprabhu!!
Thank you for your contribution.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Do you want to have a look @NielsRogge @mfuntowicz ?

@nishprabhu
Copy link
Contributor Author

Hi, any update about this? Any other changes required?
@LysandreJik @mfuntowicz @NielsRogge

@NielsRogge
Copy link
Contributor

LGTM!

@LysandreJik LysandreJik merged commit ddd4d02 into huggingface:master Sep 21, 2021
@nishprabhu nishprabhu deleted the layoutlm-onnx-support branch September 22, 2021 03:54
@LysandreJik
Copy link
Member

This is making the LayoutLM ONNX test fail for the following reason:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <transformers.models.layoutlm.configuration_layoutlm.LayoutLMOnnxConfig object at 0x7f344811ceb0>
tokenizer = PreTrainedTokenizerFast(name_or_path='microsoft/layoutlm-base-uncased', vocab_size=30522, model_max_len=512, is_fast=T...okens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})
batch_size = -1, seq_length = -1, is_pair = False
framework = <TensorType.PYTORCH: 'pt'>

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
        framework: Optional[TensorType] = None,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter for the specific framework
    
        Args:
            tokenizer: The tokenizer associated with this model configuration
            batch_size: The batch size (int) to export the model for (-1 means dynamic axis)
            seq_length: The sequence length (int) to export the model for (-1 means dynamic axis)
            is_pair: Indicate if the input is a pair (sentence 1, sentence 2)
            framework: The framework (optional) the tokenizer will generate tensor for
    
        Returns:
            Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
        """
    
        input_dict = super().generate_dummy_inputs(tokenizer, batch_size, seq_length, is_pair, framework)
    
        # Generate a dummy bbox
        box = [48, 84, 73, 128]
    
        if not framework == TensorType.PYTORCH:
            raise NotImplementedError("Exporting LayoutLM to ONNX is currently only supported for PyTorch.")
    
        if not is_torch_available():
            raise ValueError("Cannot generate dummy inputs without PyTorch installed.")
        import torch
    
>       input_dict["bbox"] = torch.tensor(
            [
                [0] * 4,
                *[box] * seq_length,
                [self.max_2d_positions] * 4,
            ]
        ).tile(batch_size, 1, 1)
E       RuntimeError: Trying to create tensor with negative dimension -1: [-1, 2, 4]

Could you take a look @nishprabhu @michaelbenayoun?

Will skip this test in the meantime.

@nishprabhu nishprabhu restored the layoutlm-onnx-support branch September 23, 2021 05:15
@nishprabhu nishprabhu deleted the layoutlm-onnx-support branch September 23, 2021 05:20
@nishprabhu
Copy link
Contributor Author

I think we have to import the compute_effective_axis_dimension function from the onnx module and compute the batch_size and seq_length dimensions in the configuration_layoutlm.py file. We are currently using -1 as the value for both dimensions which is causing the error. @michaelbenayoun @LysandreJik

@nishprabhu nishprabhu mentioned this pull request Sep 23, 2021
5 tasks
Narsil pushed a commit to Narsil/transformers that referenced this pull request Sep 25, 2021
* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Removed regression/ folder

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Fixed import error

* Remove unnecessary import statements

* Changed max_2d_positions from class variable to instance variable of the config class

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Add support for exporting PyTorch LayoutLM to ONNX

* cleanup

* Fixed import error

* Changed max_2d_positions from class variable to instance variable of the config class

* Use super class generate_dummy_inputs method

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Add support for Masked LM, sequence classification and token classification

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Removed uncessary import and method

* Fixed code styling

* Raise error if PyTorch is not installed

* Remove unnecessary import statement

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
stas00 pushed a commit to stas00/transformers that referenced this pull request Oct 12, 2021
* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Removed regression/ folder

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Fixed import error

* Remove unnecessary import statements

* Changed max_2d_positions from class variable to instance variable of the config class

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Add support for exporting PyTorch LayoutLM to ONNX

* cleanup

* Fixed import error

* Changed max_2d_positions from class variable to instance variable of the config class

* Use super class generate_dummy_inputs method

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Add support for Masked LM, sequence classification and token classification

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Removed uncessary import and method

* Fixed code styling

* Raise error if PyTorch is not installed

* Remove unnecessary import statement

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 13, 2022
* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Removed regression/ folder

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Fixed import error

* Remove unnecessary import statements

* Changed max_2d_positions from class variable to instance variable of the config class

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Add support for exporting PyTorch LayoutLM to ONNX

* cleanup

* Fixed import error

* Changed max_2d_positions from class variable to instance variable of the config class

* Use super class generate_dummy_inputs method

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Add support for Masked LM, sequence classification and token classification

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Removed uncessary import and method

* Fixed code styling

* Raise error if PyTorch is not installed

* Remove unnecessary import statement

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Removed regression/ folder

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Fixed import error

* Remove unnecessary import statements

* Changed max_2d_positions from class variable to instance variable of the config class

* Add support for exporting PyTorch LayoutLM to ONNX

* Added tests for converting LayoutLM to ONNX

* cleanup

* Add support for exporting PyTorch LayoutLM to ONNX

* cleanup

* Fixed import error

* Changed max_2d_positions from class variable to instance variable of the config class

* Use super class generate_dummy_inputs method

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Add support for Masked LM, sequence classification and token classification

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Removed uncessary import and method

* Fixed code styling

* Raise error if PyTorch is not installed

* Remove unnecessary import statement

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants