
Demo notebook for LayoutLMForSequenceClassification #287

Open
NielsRogge opened this issue Jan 13, 2021 · 13 comments

Comments

@NielsRogge

NielsRogge commented Jan 13, 2021

Hey there,

I've recently improved LayoutLM in the HuggingFace Transformers library by adding some more documentation + code examples, a demo notebook that illustrates how to fine-tune LayoutLMForTokenClassification on the FUNSD dataset, some integration tests that verify whether the implementation in HuggingFace Transformers gives the same output tensors on the same input data as the original implementation, and finally LayoutLMForSequenceClassification. My PR was merged yesterday :)

Now I'm also preparing a notebook that illustrates how to fine-tune LayoutLMForSequenceClassification on (a small subset of) the RVL-CDIP dataset. However, the model doesn't seem to be able to overfit the tiny subset (I have 16 images per class, and since there are 16 labels, that gives 256 training examples). You can run it here: https://colab.research.google.com/drive/1DUpTi2aL64AuIJ_9g6dGgKfltEEFqQbt?usp=sharing

Any feedback is greatly appreciated!
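
For anyone who wants to reproduce this kind of overfitting sanity check without the dataset, here is a minimal sketch. It is not the notebook's code: the tiny config values, the random token ids and boxes, the learning rate and the step count are all assumptions chosen so that it runs in seconds on CPU. The idea is only that a correctly wired model and training loop should be able to drive the loss down on one small, fixed batch.

```python
import torch
from transformers import LayoutLMConfig, LayoutLMForSequenceClassification

torch.manual_seed(0)

# Tiny randomly initialised model (assumption: stands in for the pretrained
# checkpoint so that nothing has to be downloaded). Dropout is disabled so
# the loss curve is not noisy.
config = LayoutLMConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=1,
    num_attention_heads=2, intermediate_size=64,
    max_position_embeddings=64, max_2d_position_embeddings=128,
    hidden_dropout_prob=0.0, attention_probs_dropout_prob=0.0,
    num_labels=16,
)
model = LayoutLMForSequenceClassification(config)
model.train()

# Synthetic batch (assumption): 16 "documents", one per class.
batch_size, seq_len = 16, 12
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
top_left = torch.randint(0, 50, (batch_size, seq_len, 2))
size = torch.randint(0, 50, (batch_size, seq_len, 2))
# Boxes are (x0, y0, x1, y1) with x1 >= x0 and y1 >= y0.
bbox = torch.cat([top_left, top_left + size], dim=-1)
labels = torch.arange(batch_size)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
losses = []
for step in range(60):
    outputs = model(input_ids=input_ids, bbox=bbox, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(outputs.loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

If the loss does not go down even in a setup like this, the problem is more likely in the inputs (for example the labels or the bounding boxes) than in the amount of data.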

@NielsRogge
Author

NielsRogge commented Jan 14, 2021

Btw, the demo notebook for fine-tuning LayoutLMForTokenClassification on the FUNSD dataset can be found here.

@aritzLizoain

Hi @NielsRogge, thanks for providing the notebooks!

I am working with your demo notebook for fine-tuning LayoutLMForTokenClassification. How can we save the fine-tuned model in order to use it for inference in the future? I don't see any output file after fine-tuning.

Thank you in advance!

@NielsRogge
Author

Hi! In HuggingFace, a model can be saved using model.save_pretrained("name-of-your-directory"). This will save both the weights (pytorch_model.bin file), as well as the configuration (config.json) to the directory.
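
A minimal sketch of the save and reload round trip, ending with a forward pass for inference. The tiny randomly initialised config, the temporary directory and the dummy inputs are assumptions so that the snippet runs without downloading anything; in practice `model` would be the model you just fine-tuned, and the inputs would come from your tokenizer and OCR boxes. Note that recent Transformers versions may write the weights as model.safetensors rather than pytorch_model.bin.

```python
import os
import tempfile

import torch
from transformers import LayoutLMConfig, LayoutLMForTokenClassification

# Stand-in for a fine-tuned model (assumption: tiny random config).
config = LayoutLMConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=1,
    num_attention_heads=2, intermediate_size=64,
    max_position_embeddings=64, num_labels=4,
)
model = LayoutLMForTokenClassification(config)

# Save the weights and config.json to a directory.
save_dir = os.path.join(tempfile.mkdtemp(), "name-of-your-directory")
model.save_pretrained(save_dir)
saved_files = sorted(os.listdir(save_dir))
print(saved_files)

# Later: reload from the same directory and switch to eval mode.
reloaded = LayoutLMForTokenClassification.from_pretrained(save_dir)
reloaded.eval()

# Dummy inputs (assumption): 8 token ids, each with an all-zero box.
input_ids = torch.randint(0, config.vocab_size, (1, 8))
bbox = torch.zeros((1, 8, 4), dtype=torch.long)
with torch.no_grad():
    logits = reloaded(input_ids=input_ids, bbox=bbox).logits
predictions = logits.argmax(-1)  # one predicted label id per token
```

The predicted ids can then be mapped back to label names with whatever id-to-label mapping was used during training.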

@aritzLizoain

Thank you for your prompt reply!

@monuminu

@NielsRogge Can't thank you enough. It really helped. I took your code and implemented it without installing unilm, using basically pure transformers. I would love to add my notebook to your repo.

@VishnuGopireddy

@NielsRogge I am getting "PicklingError: Can't pickle <class 'layoutlm.data.funsd.InputFeatures'>: import of module 'layoutlm.data.funsd' failed" while preparing a dataloader for FUNSD dataset. Can you please help?

@NielsRogge
Author

Hi @monuminu @VishnuGopireddy I have a new notebook that adds visual features from a Resnet-101 backbone in addition to the text + layout features. You can find it here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Add_image_embeddings_to_LayoutLM.ipynb

It relies entirely on HuggingFace Transformers, no need for the unilm repo anymore :)

@VishnuGopireddy

@NielsRogge Awesome!!! Working fine. Big thanks.

@monuminu

@NielsRogge amazing work!

@VishnuGopireddy

Hi @NielsRogge, nice work. I am able to get all the tags for each word. Is there any way or approach to get the correspondence between tags, i.e. to map each question to its answer? Thanks.

@nkrot

nkrot commented Jun 11, 2021

Hi @NielsRogge,
Thanks for the notebook, it is very instructive!
It would be even more useful if it contained an example showing how to use the fine-tuned model.

@vinayakk1094

The link seems to be broken - 'Sorry, the file you have requested does not exist.'

@NielsRogge
Author

@vinayakk1094 hi, all tutorials can be found here (both for LayoutLM and LayoutLMv2): https://github.com/NielsRogge/Transformers-Tutorials


6 participants