[LayoutLMv3] Add TensorFlow implementation (#18678)
Co-authored-by: Esben Toke Christensen <esben.christensen@visma.com>
Co-authored-by: Lasse Reedtz <lasse.reedtz@visma.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
5 people committed Aug 30, 2022
1 parent 7320d95 commit de8548e
Showing 10 changed files with 2,227 additions and 10 deletions.
4 changes: 2 additions & 2 deletions docs/source/en/index.mdx
@@ -38,7 +38,7 @@ The documentation is organized in five parts:
- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** will show you how to achieve a specific goal like fine-tuning a pretrained model for language modeling or how to create a custom model head.
- **CONCEPTUAL GUIDES** provides more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes each class and function, grouped in:

- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
@@ -245,7 +245,7 @@ Flax), PyTorch, and/or TensorFlow.
| ImageGPT | | | | | |
| LayoutLM | | | | | |
| LayoutLMv2 | | | | | |
| LayoutLMv3 | | | | | |
| LED | | | | | |
| LeViT | | | | | |
| Longformer | | | | | |
28 changes: 24 additions & 4 deletions docs/source/en/model_doc/layoutlmv3.mdx
@@ -26,18 +26,18 @@ Tips:

- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
- images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model (see the sketch after this list).
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).
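Below is a minimal sketch of the processor step referenced above. The `document.png` path is a hypothetical local file, and the default `apply_ocr=True` assumes Tesseract is installed so the feature extractor can run OCR:

```python
from PIL import Image
from transformers import LayoutLMv3Processor

# Combines LayoutLMv3FeatureExtractor (image modality) and LayoutLMv3TokenizerFast (text modality).
# With the default apply_ocr=True, the feature extractor runs Tesseract to obtain words and boxes.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")

image = Image.open("document.png").convert("RGB")  # hypothetical document image
encoding = processor(image, return_tensors="pt")

# input_ids, attention_mask, bbox and pixel_values are ready for a LayoutLMv3 model
print({k: v.shape for k, v in encoding.items()})
```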

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/layoutlmv3_architecture.png"
alt="drawing" width="600"/>
alt="drawing" width="600"/>

<small> LayoutLMv3 architecture. Taken from the <a href="https://arxiv.org/abs/2204.08387">original paper</a>. </small>

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [chriskoo](https://huggingface.co/chriskoo), [tokec](https://huggingface.co/tokec), and [lre](https://huggingface.co/lre). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).


## LayoutLMv3Config
@@ -84,3 +84,23 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi

[[autodoc]] LayoutLMv3ForQuestionAnswering
- forward

## TFLayoutLMv3Model

[[autodoc]] TFLayoutLMv3Model
- call

## TFLayoutLMv3ForSequenceClassification

[[autodoc]] TFLayoutLMv3ForSequenceClassification
- call

## TFLayoutLMv3ForTokenClassification

[[autodoc]] TFLayoutLMv3ForTokenClassification
- call

## TFLayoutLMv3ForQuestionAnswering

[[autodoc]] TFLayoutLMv3ForQuestionAnswering
- call
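The new TF classes mirror the PyTorch API. Below is a minimal sketch of token classification with the TensorFlow model; the image path, label count, and word/box values are made up for illustration, and `from_pt=True` is used on the assumption that only PyTorch weights are hosted for the checkpoint:

```python
from PIL import Image
from transformers import LayoutLMv3Processor, TFLayoutLMv3ForTokenClassification

# apply_ocr=False because words and boxes are supplied manually below
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = TFLayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=7, from_pt=True
)

image = Image.open("invoice.png").convert("RGB")  # hypothetical document image
words = ["Invoice", "total:", "$1,000"]  # hypothetical OCR output
boxes = [[10, 10, 80, 30], [90, 10, 150, 30], [160, 10, 230, 30]]  # boxes normalized to 0-1000

encoding = processor(image, words, boxes=boxes, return_tensors="tf")
outputs = model(**encoding)
print(outputs.logits.shape)  # (batch_size, sequence_length, num_labels)
```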
4 changes: 2 additions & 2 deletions docs/source/it/index.mdx
@@ -221,7 +221,7 @@ tokenizer (chiamato "slow"). Un tokenizer "fast" supportato dalla libreria 🤗
| ImageGPT | | | | | |
| LayoutLM | | | | | |
| LayoutLMv2 | | | | | |
| LayoutLMv3 | | | | | |
| LED | | | | | |
| Longformer | | | | | |
| LUKE | | | | | |
@@ -288,4 +288,4 @@ tokenizer (chiamato "slow"). Un tokenizer "fast" supportato dalla libreria 🤗
| YOLOS | | | | | |
| YOSO | | | | | |

<!-- End table-->
<!-- End table-->
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
@@ -2343,6 +2343,16 @@
"TFLayoutLMPreTrainedModel",
]
)
_import_structure["models.layoutlmv3"].extend(
[
"TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFLayoutLMv3ForQuestionAnswering",
"TFLayoutLMv3ForSequenceClassification",
"TFLayoutLMv3ForTokenClassification",
"TFLayoutLMv3Model",
"TFLayoutLMv3PreTrainedModel",
]
)
_import_structure["models.led"].extend(["TFLEDForConditionalGeneration", "TFLEDModel", "TFLEDPreTrainedModel"])
_import_structure["models.longformer"].extend(
[
@@ -4801,6 +4811,14 @@
TFHubertModel,
TFHubertPreTrainedModel,
)
from .models.layoutlmv3 import (
TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST,
TFLayoutLMv3ForQuestionAnswering,
TFLayoutLMv3ForSequenceClassification,
TFLayoutLMv3ForTokenClassification,
TFLayoutLMv3Model,
TFLayoutLMv3PreTrainedModel,
)
from .models.led import TFLEDForConditionalGeneration, TFLEDModel, TFLEDPreTrainedModel
from .models.longformer import (
TF_LONGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
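With the new entries in both halves of `src/transformers/__init__.py`, the TF classes become importable from the package root and are resolved lazily. A minimal sketch, assuming TensorFlow is installed in the environment:

```python
from transformers import is_tf_available

if is_tf_available():
    # Resolved lazily through the _import_structure entries added above
    from transformers import TFLayoutLMv3Model

    print(TFLayoutLMv3Model.__module__)
    # transformers.models.layoutlmv3.modeling_tf_layoutlmv3
```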
6 changes: 4 additions & 2 deletions src/transformers/models/auto/modeling_tf_auto.py
@@ -52,6 +52,7 @@
("gptj", "TFGPTJModel"),
("hubert", "TFHubertModel"),
("layoutlm", "TFLayoutLMModel"),
("layoutlmv3", "TFLayoutLMv3Model"),
("led", "TFLEDModel"),
("longformer", "TFLongformerModel"),
("lxmert", "TFLxmertModel"),
@@ -268,6 +269,7 @@
("gpt2", "TFGPT2ForSequenceClassification"),
("gptj", "TFGPTJForSequenceClassification"),
("layoutlm", "TFLayoutLMForSequenceClassification"),
("layoutlmv3", "TFLayoutLMv3ForSequenceClassification"),
("longformer", "TFLongformerForSequenceClassification"),
("mobilebert", "TFMobileBertForSequenceClassification"),
("mpnet", "TFMPNetForSequenceClassification"),
@@ -297,6 +299,7 @@
("flaubert", "TFFlaubertForQuestionAnsweringSimple"),
("funnel", "TFFunnelForQuestionAnswering"),
("gptj", "TFGPTJForQuestionAnswering"),
("layoutlmv3", "TFLayoutLMv3ForQuestionAnswering"),
("longformer", "TFLongformerForQuestionAnswering"),
("mobilebert", "TFMobileBertForQuestionAnswering"),
("mpnet", "TFMPNetForQuestionAnswering"),
@@ -316,7 +319,6 @@
]
)


TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
[
# Model for Token Classification mapping
@@ -331,6 +333,7 @@
("flaubert", "TFFlaubertForTokenClassification"),
("funnel", "TFFunnelForTokenClassification"),
("layoutlm", "TFLayoutLMForTokenClassification"),
("layoutlmv3", "TFLayoutLMv3ForTokenClassification"),
("longformer", "TFLongformerForTokenClassification"),
("mobilebert", "TFMobileBertForTokenClassification"),
("mpnet", "TFMPNetForTokenClassification"),
@@ -373,7 +376,6 @@
]
)


TF_MODEL_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_MAPPING_NAMES)
TF_MODEL_FOR_PRETRAINING_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_FOR_PRETRAINING_MAPPING_NAMES)
TF_MODEL_WITH_LM_HEAD_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_WITH_LM_HEAD_MAPPING_NAMES)
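Registering the classes in these mappings lets the TF auto classes resolve `layoutlmv3` checkpoints from the config's `model_type`. A minimal sketch (the label count is arbitrary, and `from_pt=True` assumes only PyTorch weights are hosted for the checkpoint):

```python
from transformers import TFAutoModelForTokenClassification

# The "layoutlmv3" model_type in the checkpoint's config now maps to
# TFLayoutLMv3ForTokenClassification via TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES.
model = TFAutoModelForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=7, from_pt=True
)
print(type(model).__name__)  # TFLayoutLMv3ForTokenClassification
```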
31 changes: 31 additions & 0 deletions src/transformers/models/layoutlmv3/__init__.py
@@ -21,6 +21,7 @@
from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_tf_available,
is_tokenizers_available,
is_torch_available,
is_vision_available,
@@ -60,6 +61,21 @@
"LayoutLMv3PreTrainedModel",
]

try:
if not is_tf_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_tf_layoutlmv3"] = [
"TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFLayoutLMv3ForQuestionAnswering",
"TFLayoutLMv3ForSequenceClassification",
"TFLayoutLMv3ForTokenClassification",
"TFLayoutLMv3Model",
"TFLayoutLMv3PreTrainedModel",
]

try:
if not is_vision_available():
raise OptionalDependencyNotAvailable()
@@ -101,6 +117,21 @@
LayoutLMv3PreTrainedModel,
)

try:
if not is_tf_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_tf_layoutlmv3 import (
TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST,
TFLayoutLMv3ForQuestionAnswering,
TFLayoutLMv3ForSequenceClassification,
TFLayoutLMv3ForTokenClassification,
TFLayoutLMv3Model,
TFLayoutLMv3PreTrainedModel,
)

try:
if not is_vision_available():
raise OptionalDependencyNotAvailable()
(The diffs for the remaining 4 changed files are not shown.)
