
Multilingual LayoutLM Release #236

Closed
wangqihanginthesky opened this issue Sep 9, 2020 · 13 comments


@wangqihanginthesky

Describe
I think you will release the multilingual LayoutLM soon. Will it include a Chinese version and a Japanese version?
Thank you.

@wolfshow
Contributor

wolfshow commented Sep 9, 2020

@wangqihanginthesky, the multilingual LayoutLM is pre-trained on a huge amount of multilingual documents, including Chinese and Japanese ones.

@JosPolfliet

Hi @wolfshow, do you have any indication when the multilingual model would be released? Days, weeks, months?

@wolfshow
Contributor

@JosPolfliet We are working on the multilingual datasets and evaluation. It should be within several weeks.

@archwolf118

> @JosPolfliet We are working on the multilingual datasets and evaluation. It should be within several weeks.

We are waiting! Thanks a lot.

@JosPolfliet

Hi @wolfshow, I know you are all busy people. Any update on the multilingual LayoutLM model? I would love to benchmark it!

@wolfshow
Contributor

wolfshow commented Nov 18, 2020

@JosPolfliet We are actively working on that: not only the multilingual LayoutLM models but also some labeled multilingual benchmark datasets.

@hasansalimkanmaz

Hi @wolfshow, I really appreciate your work, thank you very much. I am subscribed to this thread, but I haven't seen any activity on the multilingual LayoutLM model for a long time. Is there any news about it? I think the community is looking forward to using it.

@wolfshow
Contributor

@hasansalimkanmaz We will have an update on the multilingual version in the coming weeks. The main blocker is that we are building a human-labeled benchmark for multilingual document understanding, and it takes the vendors a lot of time to label and check the data.

@hasansalimkanmaz

Thank you very much for your reply, @wolfshow. I am looking forward to using it.

@dreamhighchina

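I would like to ask about the multilingual model: will it be split into separate models for Chinese, Japanese, and so on, or will everything be combined in a single model?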

@wolfshow
Contributor

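> I would like to ask about the multilingual model: will it be split into separate models for Chinese, Japanese, and so on, or will everything be combined in a single model?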

It will be a single model.

@knitemblazor

knitemblazor commented Mar 28, 2021

I have completed a GitHub repo covering the training and prediction flow for multilingual LayoutLM. Because labelled datasets are limited,
I would suggest building a training dataset for your particular languages and then testing on it.
I have so far tested it on Hindi, Malayalam, and English combinations, and I have released the training flow and model for these languages, generally using an Aadhaar dataset.
https://github.com/knitemblazor/Multilingual_LayoutLM

@wolfshow
Contributor

LayoutXLM is coming, extending LayoutLM with multilingual support!

https://arxiv.org/abs/2104.08836

7 participants