LayoutXLMProcessor applies the english Tesseract model #14511

Closed
Xargonus opened this issue Nov 24, 2021 · 1 comment · Fixed by #14514

Xargonus commented Nov 24, 2021

🚀 Feature request

LayoutXLMProcessor.__call__ should support a language argument for Tesseract OCR

Motivation

LayoutXLM is a multilingual version of the successful LayoutLMv2 model. The main reason to use it over LayoutLMv2 is to handle different languages, yet the current API does not allow specifying the language used in apply_tesseract, so OCR always runs with the English Tesseract model.

Your contribution

I could submit a PR, but I am not familiar enough with the Transformers library to suggest the best place to add the lang argument.

Xargonus changed the title from "LayoutXLMProcessor applies the english tesseract model" to "LayoutXLMProcessor applies the english Tesseract model" on Nov 24, 2021

NielsRogge commented

Hi,

Great point, thanks for the feature request.

It would be rather straightforward: one should just add a lang parameter to this line.
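
For illustration, the change could look roughly like the sketch below. It assumes the OCR step goes through pytesseract.image_to_data, which accepts a lang keyword; the function name apply_tesseract_sketch and the ocr_fn parameter are hypothetical and only there to make the example self-contained, they are not part of Transformers.

```python
from typing import Callable, List, Optional


def apply_tesseract_sketch(
    image,
    lang: Optional[str] = None,
    ocr_fn: Optional[Callable] = None,
) -> List[str]:
    """Hypothetical helper: run OCR on `image` and return the non-empty words.

    `lang` is a Tesseract language code such as "deu" or "fra". Passing None
    leaves Tesseract on its default model, which is English.
    `ocr_fn` is an assumption of this sketch so the OCR backend can be swapped.
    """
    if ocr_fn is None:
        # Imported lazily so the sketch can be read without Tesseract installed.
        import pytesseract

        ocr_fn = pytesseract.image_to_data

    # Forward the language to the OCR call instead of relying on the default.
    data = ocr_fn(image, lang=lang, output_type="dict")

    # Tesseract emits empty strings for layout-only entries; drop them.
    return [word for word in data["text"] if word.strip()]
```

With something like this in place, the processor's __call__ would only need to accept the language and pass it through, e.g. apply_tesseract_sketch(image, lang="deu") for a German document, provided the matching Tesseract language data is installed.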

Can you open a PR for this?
