
[WIP] Add UDOP models #21239

Closed — wants to merge 78 commits

Conversation

raghavanone
Contributor

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 22, 2023

The documentation is not available anymore as the PR was closed or merged.

@raghavanone
Contributor Author

@sgugger @NielsRogge The model weights are here: https://huggingface.co/ZinengTang/Udop/tree/main , but how do I get the config for these models?
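When a repo ships weights without a config file, one common trick is to read hyperparameters off the checkpoint's tensor shapes. A minimal sketch of the idea — the state-dict keys and shapes below are hypothetical placeholders, not the actual UDOP checkpoint layout, and with real weights you would load the shapes via torch.load:

```python
# Sketch: infer config values from a checkpoint's tensor shapes.
# Keys and shapes are hypothetical stand-ins for a real state dict.
state_dict_shapes = {
    "shared.weight": (33201, 1024),  # (vocab_size, d_model)
    "encoder.block.0.layer.0.SelfAttention.q.weight": (1024, 1024),
}

def infer_config(shapes):
    vocab_size, d_model = shapes["shared.weight"]
    # Count distinct encoder block indices to recover num_layers.
    num_layers = len({k.split(".")[2] for k in shapes
                      if k.startswith("encoder.block.")})
    return {"vocab_size": vocab_size, "d_model": d_model,
            "num_layers": num_layers}

config = infer_config(state_dict_shapes)
print(config)
```

This only recovers shape-derived values; options like dropout rates or relative-attention settings still have to come from the original training code or paper.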

@logan-markewich

logan-markewich commented Jan 26, 2023

@raghavanone For reference, someone asked the same question on the UDOP repo: microsoft/i-Code#17

@raghavanone
Contributor Author

Note: Cannot proceed further without Microsoft releasing the entire weights. Currently, the vision decoder weights have not been released.

@maxjeblick

If I'm not mistaken, the vision decoder weights should not be needed when using only the text-layout decoder part.
The vision_encoder weights are part of the shared model weights.

@logan-markewich

@raghavanone is there anything else blocking? It sounds like we can proceed with the given weights, assuming that we notify users that the vision decoder is not trained.

@raghavanone
Contributor Author

@logan-markewich Yes, I will work on closing this within a couple of days.

@raghavanone
Contributor Author

raghavanone commented Feb 4, 2023

@sgugger I need some pointers on how this model should be tested. Can I follow the tests used for the T5 model and replicate similar tests?

@raghavanone
Contributor Author

@NielsRogge Any pointers here?

@WaterKnight1998

WaterKnight1998 commented Feb 9, 2023

I hope it gets merged soon @raghavanone . Nice work :)

Collaborator

@amyeroberts left a comment

Thanks for the PR and the work adding this model! I've done my best to address everything in the first PR - let me know if anything is unclear.

Some general notes:

  • There are numerous references to other models, in particular T5, throughout this PR. Can you make sure to update all of these to reflect the new model?
  • Make sure to add docstrings to the public classes and functions that can be directly imported from the init
  • Consistent naming patterns: some classes' prefix is Udop, others UDOP. We're moving away from varied casing in model names, and all should be updated to be Udop.
  • The model folder should be all lower case udop i.e. src/transformers/models/udop.
  • All custom layers in the modeling file should have a Udop prefix e.g. UdopBlock
  • All building-block layers should take the config and initialise themselves with the values it contains, rather than having many keywords in the init
  • Use variable names that are as expressive as possible, e.g. batch_size rather than B
  • Utilise # Copied from wherever possible
  • Make sure to add tests for all possible input modalities
  • There should be integration tests added for the models, checking their logits

(Resolved review threads on src/transformers/__init__.py and src/transformers/models/UDOP/modeling_udop.py)

@plamb-viso

Forgive my naiveté, but why do all the tests call from_pretrained() on some variation of T5? The UDOP model checkpoints are here. Could these be used?

@plamb-viso

plamb-viso commented Feb 22, 2023

Ah, I see that the test script they provide also uses T5-large; I expected it to use one of those checkpoints.

@thefirebanks

@raghavanone how are things going with this so far? I'm very interested in using this model as soon as it gets integrated - if you need a hand with anything let me know! And thanks for bringing it into the library 😄

@raghavanone
Contributor Author

@raghavanone how are things going with this so far? I'm very interested in using this model as soon as it gets integrated - if you need a hand with anything let me know! And thanks for bringing it into the library 😄

@thefirebanks I am working on fixing the last few tests. Hoping to close this PR very soon. Sorry for the delay.

@plamb-viso

plamb-viso commented Mar 6, 2023

@raghavanone I am currently trying to finetune UdopUniModelForConditionalGeneration using this PR. I ran into the following exception while training:

 File "/opt/conda/lib/python3.8/site-packages/transformers/models/udop/modeling_udop.py", line 2422, in forward
 encoder_outputs = self.encoder(
 TypeError: forward() got an unexpected keyword argument 'ids_keep'

I explained what appears to be happening in this comment.

It looks like the ids_keep parameter was removed from UdopUniStack but not removed from the call to it in UdopUniModelForConditionalGeneration

EDIT: Looks like output_attentions also needs to be removed, and in the self.decoder() call, cross_attn_head_mask and output_attentions.

Happy to make the changes myself with repo permissions
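The failure mode above can be reproduced in isolation: calling a function with a keyword argument it no longer accepts raises exactly this TypeError. The real fix is simply deleting the stale arguments at the call sites, but a generic defensive pattern (a sketch only, not the actual transformers code) is to filter kwargs against the callee's signature:

```python
import inspect

def encoder_forward(input_ids=None, attention_mask=None):
    # Stand-in for an encoder forward after ids_keep was removed.
    return {"input_ids": input_ids, "attention_mask": attention_mask}

call_kwargs = {"input_ids": [1, 2], "attention_mask": [1, 1], "ids_keep": [0]}

# Passing ids_keep directly raises:
# TypeError: encoder_forward() got an unexpected keyword argument 'ids_keep'
try:
    encoder_forward(**call_kwargs)
    raised = False
except TypeError:
    raised = True

# Filter kwargs down to parameters the callee actually accepts.
accepted = set(inspect.signature(encoder_forward).parameters)
filtered = {k: v for k, v in call_kwargs.items() if k in accepted}
out = encoder_forward(**filtered)
```

In library code, silently dropping kwargs can hide bugs, which is why deleting the stale arguments at each call site is the cleaner fix.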

@raghavanone
Contributor Author

raghavanone commented Mar 7, 2023

@raghavanone I am currently trying to finetune UdopUniModelForConditionalGeneration using this PR. I ran into the following exception while training:

 File "/opt/conda/lib/python3.8/site-packages/transformers/models/udop/modeling_udop.py", line 2422, in forward
 encoder_outputs = self.encoder(
 TypeError: forward() got an unexpected keyword argument 'ids_keep'

I explained what appears to be happening in this comment.

It looks like the ids_keep parameter was removed from UdopUniStack but not removed from the call to it in UdopUniModelForConditionalGeneration

EDIT: Looks like output_attentions also needs to be removed, and in the self.decoder() call, cross_attn_head_mask and output_attentions.

Happy to make the changes myself with repo permissions

@plamb-viso Yes, removing those parameters was not done in all places; I have fixed it locally. I am working on fixing the failing tests, which is the last step pending before merging. Fixing these tests is taking more time than expected.

README.md Outdated
@@ -419,6 +419,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[Udop](model_doc/udop)** (from Microsoft) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
Contributor

@NielsRogge Mar 7, 2023

Suggested change
1. **[Udop](model_doc/udop)** (from Microsoft) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
1. **[UDOP](https://huggingface.co/docs/transformers/main/model_doc/udop)** (from Microsoft) released with the paper [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.

Please run make fix-copies again to update the other READMEs.

specific language governing permissions and limitations under the License.
-->

# Udop
Contributor

Suggested change
# Udop
# UDOP


## Overview

The udop models was presented in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
Contributor

Suggested change
The udop models was presented in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/pdf/2212.02623.pdf) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.
The UDOP model was presented in [Unifying Vision, Text, and Layout for Universal Document Processing](https://arxiv.org/abs/2212.02623) by Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal.

Comment on lines 34 to 38
T5 comes in two variations :

- [udop-uni]

- [udop-dual]
Contributor

Links will have to be added here. Also T5 -> UDOP

@@ -175,6 +175,8 @@
("transfo-xl", "TransfoXLConfig"),
("trocr", "TrOCRConfig"),
("tvlt", "TvltConfig"),
# ("udop_dual", "UdopConfig"),
Contributor Author

@NielsRogge I am not sure how to handle this here. There are two models in UDOP; both share the same config.

How do I set up MODEL_FOR_PRETRAINING_MAPPING_NAMES ?
How do I set up CONFIG_MAPPING_NAMES ?

Contributor

This just needs to be ("udop", "UdopConfig"). "udop" is the model_type.
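In other words, even though several model classes share one configuration, the auto-class tables are keyed on a single model_type string. A toy sketch of how such mappings behave — these dicts are simplified stand-ins, not the real transformers internals, and the default class name is illustrative:

```python
# Simplified stand-ins for the auto-class mapping tables.
CONFIG_MAPPING_NAMES = {
    "t5": "T5Config",
    "udop": "UdopConfig",  # one entry, even with several Udop* model classes
}
MODEL_MAPPING_NAMES = {
    "udop": "UdopForConditionalGeneration",  # hypothetical default class
}

def config_class_for(model_type):
    # AutoConfig-style lookup: model_type -> config class name.
    return CONFIG_MAPPING_NAMES[model_type]

print(config_class_for("udop"))
```

The non-default variant is then instantiated explicitly by its class rather than through the auto mapping.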

with tempfile.TemporaryDirectory() as tmpdirname:
torch.onnx.export(
model,
(config_and_inputs[1], config_and_inputs[3], config_and_inputs[2]),


@raghavanone Trying to understand how you are exporting the model here. Judging from prepare_config_and_inputs, you are passing values into the model for (the forward function parameters) input_ids, seg_data and image. But these parameters are not the first 3 parameters of the forward function. How does ONNX know that you are inputting values for input_ids, seg_data and image in this tuple?

In short, I'm having a lot of trouble trying to torch.jit.trace or torch.onnx.export UdopUniModelForConditionalGeneration because (I believe) the forward function parameters I'm passing data into are not at the beginning of the forward function in consecutive order. If you try to pass in None for the parameters you aren't filling, you get an "Only Tensors and (possibly nested) Lists, Dicts, and Tuples of Tensors can be traced" error (you can't pass in NoneType).
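This positional-argument issue can usually be worked around without editing the model: wrap it in a thin module whose forward takes exactly the tensors you want to trace, in order, and forwards them by keyword. A framework-agnostic sketch of the pattern — plain Python classes here; with PyTorch the wrapper would subclass nn.Module and be what you hand to torch.onnx.export, and the parameter names are illustrative:

```python
class Model:
    # Stand-in whose forward has many optional keyword parameters,
    # like UdopUniModelForConditionalGeneration.
    def forward(self, input_ids=None, attention_mask=None,
                encoder_outputs=None, seg_data=None, image=None,
                decoder_input_ids=None):
        return {"input_ids": input_ids, "seg_data": seg_data, "image": image}

class ExportWrapper:
    """Expose only the traced inputs, positionally and in a fixed order."""
    def __init__(self, model):
        self.model = model

    def forward(self, input_ids, seg_data, image):
        # Bind by keyword so positions in the export tuple can't drift.
        return self.model.forward(input_ids=input_ids,
                                  seg_data=seg_data, image=image)

wrapped = ExportWrapper(Model())
out = wrapped.forward([101], [[0, 0, 1, 1]], "pixels")
```

Since the wrapper's signature has no optional parameters, the export args tuple maps unambiguously onto it, and no None placeholders are ever traced.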


In fact, when I run this test in my env I get:

  File "python3.9/site-packages/transformers/models/udop/modeling_udop.py", line 1512, in forward
    bbox = torch.clip(bbox, 0.0, 1.0)
TypeError: clip() received an invalid combination of arguments - got (NoneType, float, float), but expected one of:
 * (Tensor input, Number min, Number max, *, Tensor out)
 * (Tensor input, Tensor min, Tensor max, *, Tensor out)

Which is the same error I'm getting while trying to export my finetuned version of it.
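The NoneType error above is consistent with an optional input reaching a tensor op unguarded. A minimal sketch of the guard pattern — a plain-Python clamp stands in for torch.clip, and the function names are illustrative, not the actual modeling_udop code:

```python
def clip(values, lo, hi):
    # Stand-in for torch.clip on a flat list of floats.
    return [max(lo, min(hi, v)) for v in values]

def normalize_bbox(bbox):
    # Guard the optional input instead of calling clip(None, 0.0, 1.0),
    # which is what produces the TypeError in the traceback above.
    if bbox is None:
        return None
    return clip(bbox, 0.0, 1.0)

print(normalize_bbox([-0.5, 0.3, 1.7]))
```

For tracing specifically, such data-dependent None checks are themselves problematic, which is another reason the real model's forward would need reworking before export.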


I am able to get past this error if I arrange the parameters I'm filling at the beginning of the forward function, in the same order as passed to torch.onnx.export. However, more errors arise:

RuntimeError: 0 INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/ir/alias_analysis.cpp":614, please report a bug to PyTorch. We don't have an op for aten::full_like but it isn't a special case.  Argument types: Tensor, bool, int, int, Device, bool, NoneType,

Contributor Author

@plamb-viso Exporting to ONNX is yet to be done.

Contributor

@raghavanone you can remove all ONNX export related code from this PR, we can add ONNX support for UDOP in a follow-up PR

Collaborator

Also ONNX support now lives fully in optimum, not Transformers.


Thanks @NielsRogge, @sgugger. I'll try not to muddy this PR up any more with unrelated questions, but I'm curious: what would be Hugging Face's process for testing tracing/ONNX export of a new model (e.g. UDOP) after the PR adding the model to transformers has been merged? Just curious whether you can set any expectations for when users could expect test code verifying that the UDOP model inside transformers exports properly.

Comment on lines +26 to +28
This is the configuration class to store the configuration of a [`UdopDualForConditionalGeneration`] or a
[`UdopUnimodelForConditionalGeneration`]. It is used to instantiate a UdopDualForConditionalGeneration or
UdopUnimodelForConditionalGeneration model according to the specified arguments, defining the model architecture.
Contributor

@NielsRogge Mar 12, 2023

Suggested change
This is the configuration class to store the configuration of a [`UdopDualForConditionalGeneration`] or a
[`UdopUnimodelForConditionalGeneration`]. It is used to instantiate a UdopDualForConditionalGeneration or
UdopUnimodelForConditionalGeneration model according to the specified arguments, defining the model architecture.
This is the configuration class to store the configuration of a [`UdopForConditionalGeneration`] or a
[`UdopDualForConditionalGeneration`]. It is used to instantiate a UdopForConditionalGeneration or
UdopDualForConditionalGeneration model according to the specified arguments, defining the model architecture.

Maybe let's simplify the names of the model classes a bit. Let's call the default model UdopForConditionalGeneration and the other variant UdopDualForConditionalGeneration.


Arguments:
vocab_size (`int`, *optional*, defaults to 32128):
Vocabulary size of the Udop model. Defines the number of different tokens that can be represented by the
Contributor

Suggested change
Vocabulary size of the Udop model. Defines the number of different tokens that can be represented by the
Vocabulary size of the UDOP model. Defines the number of different tokens that can be represented by the

Vocabulary size of the Udop model. Defines the number of different tokens that can be represented by the
`inputs_ids`.
d_model (`int`, *optional*, defaults to 512):
Size of the encoder layers and the pooler layer.
Contributor

This probably is the hidden size of both the encoder and decoder?

Size of the intermediate feed forward layer in each `UdopBlock`.
num_layers (`int`, *optional*, defaults to 6):
Number of hidden layers in the Transformer encoder.
num_decoder_layers (`int`, *optional*):
Contributor

Any reason there's no num_encoder_layers argument?

@maxjeblick

@raghavanone I saw you closed this PR. Skimming over your work, the PR seemed to be in a rather good state. Were there any blockers you encountered? IMO, it would be nice to add the UDOP models to Hugging Face at some point.

@raghavanone
Contributor Author

@maxjeblick @NielsRogge feels that the code in the original repo is a bit hacky; he is working on a separate PR with a better UDOP implementation, so I closed this one in consultation with him. He should open a PR soon.

@NielsRogge please do add more details for the benefit of folks following this PR

@maxjeblick

Thanks a lot for the fast reply!

@plamb-viso

@NielsRogge @raghavanone please link the new PR when it's available, for people subscribed to this one

@NielsRogge
Contributor

Hi yes I'll open a PR soon! Thanks a lot for your work already @raghavanone, will ping you on the PR

@plamb-viso

Hi @NielsRogge I saw the large amount of commits on your new UDOP branch, curious if you have any idea on when you think a PR might be ready

@plamb-viso

Sorry to keep hammering on this, but again have noticed a flurry of activity on that branch then almost 2 weeks off. Curious what the plan is for it @NielsRogge

@NielsRogge
Contributor

NielsRogge commented Apr 21, 2023

Hi @plamb-viso sorry for the late reply, the model is working, only have limited time to work on it. I'll open a PR this weekend/Monday.

For now you can already use the model if you're curious, check this code example regarding usage. Model is already on the hub here.

@dtiarks

dtiarks commented Apr 22, 2023

Out of curiosity @NielsRogge : did you ever use your implementation to fine tune it on a task like CORD?

@NielsRogge
Contributor

I've fine-tuned the model on a toy dataset of RVL-CDIP; it works well, but the model is pretty heavy. I got OOM on Google Colab even with batch size = 1, so I had to use a bigger GPU. The author only released large variants.
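For readers hitting the same OOM, gradient accumulation is the usual workaround when even batch size 1 barely fits: run several micro-batches and step the optimizer once. A minimal sketch of the bookkeeping only — plain numbers stand in for gradient tensors, and the step/accumulation values are illustrative, not tuned for UDOP:

```python
# Sketch: gradient accumulation — N micro-batches of size 1 per
# optimizer step, emulating a larger effective batch.
micro_batch_size = 1
accumulation_steps = 8
effective_batch_size = micro_batch_size * accumulation_steps

grads = []

def accumulate(step_grad, step):
    """Collect micro-batch gradients; return the averaged update
    only on the step that completes an accumulation window."""
    grads.append(step_grad)
    if (step + 1) % accumulation_steps == 0:
        update = sum(grads) / len(grads)
        grads.clear()
        return update
    return None  # no optimizer step yet

updates = [accumulate(float(i), i) for i in range(8)]
```

In a real training loop the same effect comes from calling loss.backward() each micro-batch and optimizer.step() only every accumulation_steps iterations (many trainers expose this as a gradient_accumulation_steps option).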

@plamb-viso

In my original work on @raghavanone's version of the model, I also had to use a batch size of 1 to avoid OOM on 40 GB GPUs.

@plamb-viso mentioned this pull request May 12, 2023