Add LayoutLMv2 OnnxConfig #16309
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Ok, thanks for the links. I will look at them when I get some time and start improving the actual PR (it seems the last PR was closed due to inactivity, so I will take what I can from it and improve my own PR). Thanks!
@@ -235,6 +235,22 @@ def _generate_dummy_images(
    images.append(Image.fromarray(data.astype("uint8")).convert("RGB"))
    return images

    def _generate_dummy_bbox(self, batch_size: int = 2, image_height: int = 40, image_width: int = 40) -> List[int]:
I have added a way to generate dummy_bbox if required by the preprocessor. The bounding boxes are scaled to the image size by default.
@@ -245,6 +261,7 @@ def generate_dummy_inputs(
    num_channels: int = 3,
    image_width: int = 40,
    image_height: int = 40,
    return_bbox: bool = False,
I also added a new argument to generate_dummy_inputs to handle whether or not the preprocessor needs bounding boxes.
@@ -295,6 +314,10 @@ def generate_dummy_inputs(
    )
    # Generate dummy inputs according to compute batch and sequence
    dummy_input = [" ".join([preprocessor.unk_token]) * seq_length] * batch_size
    # Generate dummy bounding boxes if needed by the preprocessor e.g. for LayoutLMv2
    if return_bbox is True:
If return_bbox is True, the preprocessor needs bounding boxes, and the method returns the appropriate dictionary including dummy_bbox.
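To make the control flow concrete, the branch could merge dummy boxes into the returned encoding roughly as below. This is a hypothetical, stubbed-down sketch (tokenization and box generation are replaced with constants); only the `bbox` key name is taken from LayoutLMv2's expected inputs.

```python
def generate_dummy_inputs(batch_size: int = 2, seq_length: int = 8, return_bbox: bool = False) -> dict:
    # Stub: a real implementation would build these from the preprocessor.
    inputs = {"input_ids": [[0] * seq_length] * batch_size}
    if return_bbox:
        # LayoutLMv2 expects one [x0, y0, x1, y1] box per token under "bbox".
        inputs["bbox"] = [[[0, 0, 1, 1]] * seq_length] * batch_size
    return inputs
```

Note that the number of boxes per example must match the token sequence length, otherwise the model's embedding sums will fail with a shape mismatch.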
So, with the previous additions the dummy input generation is working, but there is a problem when computing embeddings = inputs_embeds + position_embeddings + spatial_position_embeddings + token_type_embeddings
The tensor sizes are: torch.Size([1, 14, 768]), torch.Size([1, 63, 768]), torch.Size([1, 14, 768]), torch.Size([1, 14, 768]). Here is the error trace: Traceback (most recent call last):
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 99, in <module>
main()
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/onnx/__main__.py", line 81, in main
onnx_inputs, onnx_outputs = export(
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/onnx/convert.py", line 308, in export
return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/onnx/convert.py", line 171, in export_pytorch
raise err
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/onnx/convert.py", line 148, in export_pytorch
onnx_export(
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/__init__.py", line 275, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py", line 88, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py", line 689, in _export
_model_to_graph(model, args, verbose, input_names,
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py", line 458, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args,
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py", line 422, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/onnx/utils.py", line 373, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/jit/_trace.py", line 1160, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
graph, out = torch._C._create_graph_by_tracing(
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1039, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py", line 893, in forward
text_layout_emb = self._calc_text_embeddings(
File "/home/chainyo/miniconda3/envs/transformers/lib/python3.8/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py", line 758, in _calc_text_embeddings
embeddings = inputs_embeds + position_embeddings + spatial_position_embeddings + token_type_embeddings
RuntimeError: The size of tensor a (14) must match the size of tensor b (63) at non-singleton dimension 1
I'm investigating to fix the tensor size problem.
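One plausible reading of the 14 vs. 63 mismatch (an assumption, not confirmed in this thread): LayoutLMv2 appends visual tokens to the text sequence by average-pooling the image feature map to a 7×7 grid, i.e. 49 visual tokens, and 14 + 49 = 63, so one embedding tensor appears to cover text + visual positions while the others cover text only. A minimal reproduction of the broadcasting failure, using NumPy for illustration:

```python
import numpy as np

text_len, hidden = 14, 768
visual_len = 7 * 7  # assumed: LayoutLMv2 pools image features to 7x7 = 49 visual tokens

inputs_embeds = np.zeros((1, text_len, hidden))
position_embeddings = np.zeros((1, text_len + visual_len, hidden))  # 63 = 14 + 49

# Mirrors the failing sum in _calc_text_embeddings: dimension 1 differs (14 vs 63).
try:
    _ = inputs_embeds + position_embeddings
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```

If this reading is right, the fix would be to make the dummy inputs (or the position ids) agree on whether the visual tokens are included in the sequence length.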
@chainyo have you found a solution for the tensor size problem? 🤗
It seems that one tensor's size is changing for no apparent reason in my previous tests. I will dig more this week; I had no time last week.
Great, if you want, we can organize a Google Meet and look at the issue together. You can send me an email at: hubarjulien@gmail.com
@chainyo I am getting errors while using your PR at inference time. I am using the code below for token classification (FUNSD): Error: Any help is really appreciated. @chainyo @lewtun @michaelbenayoun
You can try to add token_type_ids to the processor output like this:
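The original snippet was not preserved in this thread, so here is a hedged reconstruction of what the suggestion might look like: a helper that adds zero-filled `token_type_ids` matching the input length to an encoding dict. The helper name and the assumption that zeros are acceptable segment ids are both hypothetical.

```python
def add_token_type_ids(encoding: dict) -> dict:
    # Hypothetical reconstruction: fill token_type_ids with zeros so the
    # encoding carries every input the exported ONNX model expects.
    seq_length = len(encoding["input_ids"][0])
    encoding["token_type_ids"] = [[0] * seq_length for _ in encoding["input_ids"]]
    return encoding
```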
There is a method called evaluation_loop which does inference for loaded ONNX models. It expects only a Hugging Face dataset, as per https://github.com/huggingface/optimum. Please look at the snippet which I posted earlier.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi! Thanks for working on this @chainyo 🙏 I am interested in this work. Are you still working on this? If you no longer have the time or resources to do so, would you be able to provide any next steps as you see it for this ONNX export to work? Thank you for your time and effort 🤗
Hey @chainyo, regarding your error with the tests - my guess is that the dummy data generation is the culprit. My suggestion would be to:
@malcolmgreaves I don't even remember where I was with this issue months ago. But I will try to work on it this week if I can.
@lewtun You are right! It seems that someone has solved what I was trying to achieve, but with LayoutLMv3. So I will check that and see what I can apply to v2.
What does this PR do?
Add LayoutLMv2 OnnxConfig to make this model available for conversion.
I took the same config as
LayoutLM
and added the adapted shebang.

Who can review?
Models: @LysandreJik @lewtun