Weird output #31
Comments
Sorry for the delay. Could you let me know which layer you extracted the output from? Regards,
Hi @uakarsh, thanks for your response. I tried the code below:

```python
from transformers import BertTokenizerFast
# `modeling` refers to the module from the docformer repo
import modeling

config = {"fp": "img.jpeg"}
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
feature_extractor = modeling.ExtractFeatures(config)
```

and then I visualized the output.
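For reference, the visualization was roughly along these lines (a minimal sketch: the random array stands in for the extractor's actual (512, 768) output, and the file name is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Stand-in for the (512, 768) feature matrix returned by the extractor.
features = np.random.randn(512, 768)

plt.figure(figsize=(8, 6))
plt.imshow(features, aspect="auto", cmap="viridis")
plt.xlabel("hidden dimension (768)")
plt.ylabel("sequence position (512)")
plt.colorbar()
plt.savefig("features.png")
```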
Hi. Actually, we know that the output is (512, 768), and this output results from the attention over three different entities: the text, visual, and spatial embeddings.
Now, when we perform any downstream task, we have an encoded version of these three modalities, so the diagram you plotted shows which encoding the model attends to when performing the downstream task. The same can be seen in Figure 11 (b) on page 15 of the DocFormer paper. Hope it helps.
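To make the shape bookkeeping concrete, here is a rough sketch (the variable names and the additive fusion are illustrative, not the repo's exact API): each modality is encoded to the same (512, 768) shape, so the fused representation that a downstream head consumes keeps that shape.

```python
import numpy as np

SEQ_LEN, HIDDEN = 512, 768

# Stand-ins for the three encoded modalities the multi-modal
# attention operates over (illustrative names, random values).
text_emb = np.random.randn(SEQ_LEN, HIDDEN)
visual_emb = np.random.randn(SEQ_LEN, HIDDEN)
spatial_emb = np.random.randn(SEQ_LEN, HIDDEN)

# Each modality keeps the same (512, 768) shape, so the fused
# encoder output used for downstream tasks is also (512, 768).
fused = text_emb + visual_emb + spatial_emb
assert fused.shape == (SEQ_LEN, HIDDEN)
```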
Thanks for the info. How can I do entity-level classification, as in the FUNSD dataset?
I have almost finished the training script for RVL-CDIP (document classification) and have started working on FUNSD for token classification. You can visit my cloned repo (https://github.com/uakarsh/docformer/tree/master/examples/docformer_pl); in examples/docformer_pl you can find the relevant scripts.
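As a rough sketch of what token classification on FUNSD adds on top of the encoder (this is not the repo's actual head; the label count and weight initialization here are illustrative assumptions): a per-position linear layer maps each 768-dim token encoding to label logits, one prediction per token.

```python
import numpy as np

SEQ_LEN, HIDDEN = 512, 768
NUM_LABELS = 7  # e.g. BIO tags for FUNSD's question/answer/header + "other"

rng = np.random.default_rng(0)

# Stand-in for the DocFormer encoder output for one document page.
encoder_output = rng.standard_normal((SEQ_LEN, HIDDEN))

# A token-classification head is just a per-position linear projection.
W = rng.standard_normal((HIDDEN, NUM_LABELS)) * 0.02
b = np.zeros(NUM_LABELS)

logits = encoder_output @ W + b        # shape: (512, NUM_LABELS)
pred_labels = logits.argmax(axis=-1)   # one label id per token
assert logits.shape == (SEQ_LEN, NUM_LABELS)
```

In a real training setup this projection would be a learned layer (e.g. `torch.nn.Linear(768, num_labels)`) trained with a cross-entropy loss over the token labels.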
Will update you soon!
@uakarsh Hello, any update on NER with FUNSD using DocFormer?
Hi
I ran the code, and the final output looks very strange regardless of which image I use. I am attaching it. Can you explain what it is?
Thanks