[WIP] Add UDOP models #21239
Conversation
The documentation is not available anymore as the PR was closed or merged.
@sgugger @NielsRogge The model weights are here: https://huggingface.co/ZinengTang/Udop/tree/main, but how do we get the config for these models?
@raghavanone For reference, someone asked the same question on the UDOP repo: microsoft/i-Code#17
Note: Cannot proceed further without Microsoft releasing the entire weights. Currently the vision decoder weights have not been released.
If I'm not mistaken, the vision decoder weights should not be needed when using only the text-layout decoder part.
@raghavanone is there anything else blocking? It sounds like we can proceed with the given weights, assuming that we notify users that the vision decoder is not trained.
@logan-markewich Yes, I will work on closing this within a couple of days.
@sgugger Need some pointers on how this model should be tested. Can I follow the tests used for the T5 model and replicate similar tests?
@NielsRogge Any pointers here?
I hope it gets merged soon @raghavanone. Nice work :)
Thanks for the PR and the work adding this model! I've done my best to address everything in the first PR - let me know if anything is unclear.
Some general notes:
- There are numerous references to other models, in particular T5, throughout this PR. Can you make sure to update all of these to reflect the new model?
- Make sure to add docstrings to the public classes and functions that can be directly imported from the init.
- Consistent naming patterns: some of the classes' prefix is `Udop`, others `UDOP`. We're moving away from variable casing in model names, and all should be updated to be `Udop`.
- The model folder should be all lower case `udop`, i.e. `src/transformers/models/udop`.
- All custom layers in the modeling file should have a `Udop` prefix, e.g. `UdopBlock`.
- All building-block layers should take the config and initialise themselves with the values it contains, rather than having many keywords in the init.
- Use variable names that are as expressive as possible: e.g. use `batch_size` rather than `B`.
- Utilise `# Copied from` wherever possible.
- Make sure to add tests for all possible input modalities.
- There should be integration tests added for the models checking their logits.
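The config-driven initialisation pattern asked for above can be sketched as follows. Note this is a minimal illustration, not the actual UDOP implementation; the field names are assumed from the T5-style conventions discussed elsewhere in this PR:

```python
from dataclasses import dataclass


@dataclass
class UdopConfig:
    # Illustrative subset of configuration fields.
    d_model: int = 512
    d_ff: int = 2048
    num_layers: int = 6


class UdopBlock:
    """A building block that reads everything it needs from the config
    object, instead of taking many keyword arguments in __init__."""

    def __init__(self, config: UdopConfig):
        self.hidden_size = config.d_model
        self.intermediate_size = config.d_ff


config = UdopConfig()
block = UdopBlock(config)
print(block.hidden_size, block.intermediate_size)  # 512 2048
```

The benefit of this pattern is that adding a new hyperparameter only touches the config class, not every layer's constructor.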
Forgive my naiveté, why do all the tests call
Ah, I see that the test script they provide also uses T5-large, I expected it to use one of those checkpoints
@raghavanone how are things going with this so far? I'm very interested in using this model as soon as it gets integrated - if you need a hand with anything let me know! And thanks for bringing it into the library 😄
@thefirebanks I am working on fixing the last few tests. Hoping to close this PR very soon. Sorry for the delay.
@raghavanone I am currently trying to finetune
I explained what appears to be happening in this comment. It looks like the EDIT: Happy to make the changes myself with repo permissions.
@plamb-viso Yes, removing those parameters was not done in all places; I have fixed it locally. I am working on fixing the failing tests. This is the last step pending for merging. Fixing these tests is taking more time than expected.
README.md
Outdated
@@ -419,6 +419,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[Udop](model_doc/udop)** (from Microsoft) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
1. **[Udop](model_doc/udop)** (from Microsoft) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (from Microsoft) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
Please run `make fix-copies` again to update the other READMEs.
docs/source/en/model_doc/udop.mdx
Outdated
specific language governing permissions and limitations under the License.
-->

# Udop
# Udop
# UDOP
docs/source/en/model_doc/udop.mdx
Outdated

## Overview
The udop models was presented in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal. |
The udop models was presented in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
The UDOP model was presented in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
docs/source/en/model_doc/udop.mdx
Outdated
T5 comes in two variations :

- [udop-uni]

- [udop-dual]
Links will have to be added here. Also T5 -> UDOP
@@ -175,6 +175,8 @@
("transfo-xl", "TransfoXLConfig"),
("trocr", "TrOCRConfig"),
("tvlt", "TvltConfig"),
# ("udop_dual", "UdopConfig"),
@NielsRogge I am not sure how to handle this here. There are two models in UDOP, and both share the same config.
How do I set up `MODEL_FOR_PRETRAINING_MAPPING_NAMES`?
How do I set up `CONFIG_MAPPING_NAMES`?
This just needs to be `("udop", "UdopConfig")`. "udop" is the `model_type`.
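To illustrate why several model classes sharing one config is not a problem here: the auto classes resolve a single `model_type` string to a single config class. Below is a toy sketch of that kind of string-to-config mapping, not the real `transformers` machinery:

```python
# Toy version of the model_type -> config-class mapping kept by the auto classes.
# One "udop" entry suffices even if several model classes share UdopConfig.
CONFIG_MAPPING_NAMES = {
    "t5": "T5Config",
    "udop": "UdopConfig",
}


def config_class_for(model_type: str) -> str:
    """Look up the config class name for a given model_type string."""
    try:
        return CONFIG_MAPPING_NAMES[model_type]
    except KeyError:
        raise ValueError(f"Unrecognised model_type: {model_type!r}")


print(config_class_for("udop"))  # UdopConfig
```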
with tempfile.TemporaryDirectory() as tmpdirname:
    torch.onnx.export(
        model,
        (config_and_inputs[1], config_and_inputs[3], config_and_inputs[2]),
@raghavanone Trying to understand how you are exporting the model here. Judging from `prepare_config_and_inputs`, you are passing values into the model for (the forward function parameters) `input_ids`, `seg_data` and `image`. But these parameters are not the first 3 parameters of the forward function. How does ONNX know that you are inputting values for `input_ids`, `seg_data` and `image` in this tuple?
In short, I'm having a lot of problems trying to `torch.jit.trace` or `torch.onnx.export` `UdopUniModelForConditionalGeneration` because (I believe) the forward function parameters I'm passing data into are not at the beginning of the forward function and in consecutive order. If you try to pass in `None` for the parameters you aren't filling, you get a `Only Tensors and (possibly nested) Lists, Dicts, and Tuples of Tensors can be traced` error (can't pass in a NoneType).
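For what it's worth, exporters pass the example-input tuple to `forward` purely by position, which is why a tuple of `(input_ids, seg_data, image)` lands in the wrong parameters when those are not the first three. A framework-free sketch of that positional binding using the standard library (the signature below is made up for illustration, not UDOP's real one):

```python
import inspect


# Stand-in for a forward() whose desired parameters are NOT the first three.
def forward(input_ids=None, attention_mask=None, encoder_outputs=None,
            seg_data=None, image=None):
    return input_ids, seg_data, image


# Exporters effectively call forward(*inputs) with the example-input tuple:
inputs = ("IDS", "SEG", "IMG")
bound = inspect.signature(forward).bind_partial(*inputs)

# The tuple fills the *first three* parameters, not the ones we intended:
print(dict(bound.arguments))
# {'input_ids': 'IDS', 'attention_mask': 'SEG', 'encoder_outputs': 'IMG'}
```

A common workaround is a small wrapper module whose `forward` takes exactly the filled inputs, in order, and passes them on to the real model by keyword.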
In fact, when I run this test in my env I get:

```
File "python3.9/site-packages/transformers/models/udop/modeling_udop.py", line 1512, in forward
  bbox = torch.clip(bbox, 0.0, 1.0)
TypeError: clip() received an invalid combination of arguments - got (NoneType, float, float), but expected one of:
 * (Tensor input, Number min, Number max, *, Tensor out)
 * (Tensor input, Tensor min, Tensor max, *, Tensor out)
```

Which is the same error I'm getting while trying to export my finetuned version of it.
I am able to get past this error if I rearrange the parameters I'm filling to the beginning of the forward function, in the same order as passed to `torch.onnx.export`. However, more errors arise:

```
RuntimeError: 0 INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/ir/alias_analysis.cpp":614, please report a bug to PyTorch. We don't have an op for aten::full_like but it isn't a special case. Argument types: Tensor, bool, int, int, Device, bool, NoneType,
```
@plamb-viso Exporting to ONNX is yet to be done.
@raghavanone you can remove all ONNX export related code from this PR, we can add ONNX support for UDOP in a follow-up PR
Also ONNX support now lives fully in optimum, not Transformers.
Thanks @NielsRogge, @sgugger, I'll try not to muddy this PR up anymore with unrelated questions, but I'm curious: what would be Hugging Face's process for testing tracing/ONNX export of a new model (e.g. UDOP) after the PR supporting the model's inclusion into transformers has been merged? Just curious if you can set expectations at all for when users could expect some test code that verifies the UDOP model inside transformers can properly export.
This is the configuration class to store the configuration of a [`UdopDualForConditionalGeneration`] or a
[`UdopUnimodelForConditionalGeneration`]. It is used to instantiate a UdopDualForConditionalGeneration or
UdopUnimodelForConditionalGeneration model according to the specified arguments, defining the model architecture.
This is the configuration class to store the configuration of a [`UdopForConditionalGeneration`] or a
[`UdopDualForConditionalGeneration`]. It is used to instantiate a UdopForConditionalGeneration or
UdopDualForConditionalGeneration model according to the specified arguments, defining the model architecture.
Maybe let's simplify the names of the model classes a bit. Let's call the default model `UdopForConditionalGeneration` and the other variant `UdopDualForConditionalGeneration`.
Arguments:
    vocab_size (`int`, *optional*, defaults to 32128):
        Vocabulary size of the Udop model. Defines the number of different tokens that can be represented by the
Vocabulary size of the Udop model. Defines the number of different tokens that can be represented by the
Vocabulary size of the UDOP model. Defines the number of different tokens that can be represented by the
Vocabulary size of the Udop model. Defines the number of different tokens that can be represented by the
`inputs_ids`.
d_model (`int`, *optional*, defaults to 512):
    Size of the encoder layers and the pooler layer.
This probably is the hidden size of both the encoder and decoder?
Size of the intermediate feed forward layer in each `UdopBlock`.
num_layers (`int`, *optional*, defaults to 6):
    Number of hidden layers in the Transformer encoder.
num_decoder_layers (`int`, *optional*):
Any reason there's no `num_encoder_layers` argument?
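One possible convention here (T5 uses it, and UDOP could plausibly mirror it) is that `num_layers` sets the encoder depth and `num_decoder_layers` defaults to it when not given. A hedged sketch of that fallback, not the actual UDOP config code:

```python
class UdopConfigSketch:
    """Illustrative only: T5-style symmetry default for decoder depth."""

    def __init__(self, num_layers=6, num_decoder_layers=None):
        self.num_layers = num_layers
        # Fall back to a symmetric encoder/decoder when not specified.
        self.num_decoder_layers = (
            num_decoder_layers if num_decoder_layers is not None else num_layers
        )


print(UdopConfigSketch().num_decoder_layers)                       # 6
print(UdopConfigSketch(num_decoder_layers=12).num_decoder_layers)  # 12
```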
@raghavanone I saw you closed this PR. Skimming over your work, the PR seemed to be in a rather good state. Were there any blockers you encountered? IMO, it would be nice to add UDOP models to Hugging Face at some point.
@maxjeblick @NielsRogge feels that the code in the original repo is a bit hacky; he is working on a separate PR with a better UDOP implementation, so I closed this one in consultation with him. He should open a PR soon. @NielsRogge please do add more details for the benefit of folks following this PR.
Thanks a lot for the fast reply!
@NielsRogge @raghavanone please link the new PR when it's available for people subscribed to this one
Hi yes I'll open a PR soon! Thanks a lot for your work already @raghavanone, will ping you on the PR
Hi @NielsRogge I saw the large amount of commits on your new UDOP branch, curious if you have any idea on when you think a PR might be ready
Sorry to keep hammering on this, but again I have noticed a flurry of activity on that branch and then almost 2 weeks off. Curious what the plan is for it @NielsRogge
Hi @plamb-viso sorry for the late reply, the model is working, I only have limited time to work on it. I'll open a PR this weekend/Monday. For now you can already use the model if you're curious; check this code example regarding usage. The model is already on the hub here.
Out of curiosity @NielsRogge: did you ever use your implementation to fine-tune it on a task like CORD?
I've fine-tuned the model on a toy dataset of RVL-CDIP; it works well but the model is pretty heavy, got OOM on Google Colab even with batch size = 1 so had to use a bigger GPU. The author only released large variants.
In my original work on @raghavanone's version of the model, I also had to use a batch size of 1 to get it to not OOM on 40GB GPUs.
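When memory forces the per-step batch down to 1, gradient accumulation is the usual way to recover a larger effective batch: averaging per-micro-batch gradients reproduces the full-batch gradient while holding only one example's activations in memory at a time. A framework-agnostic numeric sketch of that equivalence (not UDOP-specific):

```python
# Gradient of mean-squared-error loss w.r.t. a scalar weight w for y_hat = w * x.
def grad(w, xs, ys):
    # d/dw mean((w*x - y)^2) = mean(2*x*(w*x - y))
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Full-batch gradient over all four examples at once.
full = grad(w, xs, ys)

# Accumulate over micro-batches of size 1, then average: the same gradient,
# but only one example is ever "in the batch" at a time.
accum = sum(grad(w, [x], [y]) for x, y in zip(xs, ys)) / len(xs)

print(abs(full - accum) < 1e-12)  # True
```

In a training framework the same idea is one optimizer step per N micro-batch backward passes, with the loss scaled by 1/N.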
#20650