finetune the Docformer #42

Open
jack-gits opened this issue Sep 19, 2022 · 8 comments

Comments

@jack-gits

Can I fine-tune the DocFormer model?
Could you give some instructions, please? Thanks.

@uakarsh
Collaborator

uakarsh commented Sep 19, 2022

You can find the examples here

@jack-gits
Author

What about this notebook (https://github.com/shabie/docformer/blob/master/examples/DocFormer_for_MLM.ipynb)?
How does it differ from the link above?

@jack-gits
Author

DocFormer is based on Microsoft's LayoutLM. Can we use it for commercial purposes?

@uakarsh
Collaborator

uakarsh commented Sep 19, 2022

What about this notebook (https://github.com/shabie/docformer/blob/master/examples/DocFormer_for_MLM.ipynb)? How does it differ from the link above?

That notebook only covers a pre-training strategy (MLM). Since you asked about fine-tuning, I shared the fine-tuning examples instead.

I think LayoutLM was recently allowed for commercial use. You can search for it online.

@jack-gits
Author

Curiously, the license of LayoutLMv3 has been changed back.

@jack-gits
Author

[image]

If use_ocr=False, I can't encode the labels. The input parameters only include words and boxes.
How should I handle the labels?

@uakarsh
Collaborator

uakarsh commented Sep 19, 2022

Curiously, the license of LayoutLMv3 has been changed back.

It looks like they don't want to allow commercial use. However, you can check whether LayoutLM can be used, since we only take the initial embedding weights from it.

@uakarsh
Collaborator

uakarsh commented Sep 19, 2022

[image]

If use_ocr=False, I can't encode the labels. The input parameters only include words and boxes. How should I handle the labels?

I shared the link for using DocFormer for token classification earlier. You can visit it and adapt it for your own purpose.
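
For readers with the same question, here is a minimal sketch of one common way to handle word-level labels when you supply your own words and boxes: expand them to token level yourself, giving the first sub-token of each word the word's label and masking the rest with an ignore index. This is not DocFormer's actual API. The `toy_subword_tokenize` and `align_labels` helpers, the label names, and the sample data are all hypothetical, and padding, truncation, and special tokens are left out.

```python
from typing import Callable, Dict, List, Sequence

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss,
# so positions carrying this value do not contribute to the loss.
IGNORE_INDEX = -100


def toy_subword_tokenize(word: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer: fixed-size chunks."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)] or [""]


def align_labels(
    words: Sequence[str],
    boxes: Sequence[Sequence[int]],
    word_labels: Sequence[str],
    label2id: Dict[str, int],
    tokenize: Callable[[str], List[str]] = toy_subword_tokenize,
) -> Dict[str, list]:
    """Expand word-level boxes and labels to token level."""
    if not (len(words) == len(boxes) == len(word_labels)):
        raise ValueError("words, boxes and word_labels must have the same length")

    tokens: List[str] = []
    token_boxes: List[List[int]] = []
    token_labels: List[int] = []
    for word, box, label in zip(words, boxes, word_labels):
        for i, piece in enumerate(tokenize(word)):
            tokens.append(piece)
            # Every sub-token inherits the bounding box of its word.
            token_boxes.append(list(box))
            # Only the first sub-token keeps the label; the rest are ignored.
            token_labels.append(label2id[label] if i == 0 else IGNORE_INDEX)
    return {"tokens": tokens, "boxes": token_boxes, "labels": token_labels}


# Hypothetical sample: three words with boxes and word-level tags.
label2id = {"O": 0, "B-HEADER": 1, "B-ANSWER": 2}
encoded = align_labels(
    words=["Invoice", "No", "12345678"],
    boxes=[[0, 0, 10, 10], [12, 0, 20, 10], [22, 0, 60, 10]],
    word_labels=["B-HEADER", "O", "B-ANSWER"],
    label2id=label2id,
)
print(encoded["tokens"])
print(encoded["labels"])
```

In practice you would swap `toy_subword_tokenize` for the tokenizer your model actually uses, and pad the label sequence with the same ignore index so that it matches the padded input length.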
