
Weird output #31

Closed
kmr2017 opened this issue May 5, 2022 · 7 comments

Comments


kmr2017 commented May 5, 2022

Hi
I ran the code, and the final output looks strange regardless of which image I use. I am attaching it. Can you explain what it is?

[attached image of the plotted output]

Thanks

Collaborator

uakarsh commented May 17, 2022

Sorry for the delay, but could you let me know from which layer you extracted the output?

Regards,

Author

kmr2017 commented May 18, 2022

Hi @uakarsh

Thanks for your response.

I tried the code below:

# Imports were omitted in the original comment; module paths below are a
# guess based on the docformer repo layout
from transformers import BertTokenizerFast
from docformer import dataset, modeling

config = {
"coordinate_size": 96,
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"image_feature_pool_shape": [7, 7, 256],
"intermediate_ff_size_factor": 4,
"max_2d_position_embeddings": 1000,
"max_position_embeddings": 512,
"max_relative_positions": 8,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"shape_size": 96,
"vocab_size": 30522,
"layer_norm_eps": 1e-12,
}

fp = "img.jpeg"

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = dataset.create_features(fp, tokenizer, add_batch_dim=True)

feature_extractor = modeling.ExtractFeatures(config)
docformer = modeling.DocFormerEncoder(config)
v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
output = docformer(v_bar, t_bar, v_bar_s, t_bar_s) # shape (1, 512, 768)

then I visualized the output.
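For context, "visualizing the output" here amounts to rendering the (1, 512, 768) tensor as a heatmap. A minimal numpy sketch of the preprocessing that such a plot assumes, using a random stand-in for the real encoder output (the plot itself would be a call such as matplotlib's plt.imshow(arr)):

```python
import numpy as np

# Random stand-in for the DocFormer encoder output, shape (1, 512, 768);
# the real tensor would come from the docformer(...) call above
output = np.random.randn(1, 512, 768).astype(np.float32)

arr = output.squeeze(0)  # drop the batch dim -> (512, 768)
# Scale to [0, 1] so a heatmap call such as plt.imshow(arr) uses the full color range
arr = (arr - arr.min()) / (arr.max() - arr.min())

print(arr.shape)  # (512, 768)
```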

Collaborator

uakarsh commented May 27, 2022

Hi,

We know the output is (512, 768); this output results from attention over three different entities:

  1. Image feature of (512, 768)
  2. Language Feature of (512, 768)
  3. Spatial Dimension of (512, 768)

When we perform a downstream task, we have an encoded version of these three modalities, so the diagram you plotted helps show which encodings the model can attend to while performing the downstream task.

The same can be seen on page 15, Figure 11(b) of the DocFormer paper. Hope this helps.
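As a rough illustration of that fusion, here is a toy numpy sketch in which the three (512, 768) encodings are summed and passed through one plain self-attention step. This is a simplified, shape-preserving stand-in for DocFormer's multi-modal attention, not the paper's exact mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768  # hidden size from the config above

# Toy stand-ins for the three per-token encodings described above
visual = rng.standard_normal((512, d))    # image features
language = rng.standard_normal((512, d))  # language features
spatial = rng.standard_normal((512, d))   # spatial features

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One plain self-attention pass over the summed modalities
x = visual + language + spatial        # (512, 768)
attn = softmax(x @ x.T / np.sqrt(d))   # (512, 512) token-to-token weights
fused = attn @ x                       # (512, 768): same shape as the encoder output

print(fused.shape)  # (512, 768)
```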

Author

kmr2017 commented Jun 7, 2022

Thanks for the info. How can I do entity-level classification, as in the FUNSD dataset?

Author

kmr2017 commented Jun 7, 2022

@uakarsh

Collaborator

uakarsh commented Jun 7, 2022

I have almost finished the training script for RVL-CDIP (Document Classification), and have started working on FUNSD for token classification.

You can visit my cloned repo (https://github.com/uakarsh/docformer/tree/master/examples/docformer_pl); in examples/docformer_pl you can find:

  1. Data visualization
  2. Dataset creation
  3. MLM with PyTorch Lightning
  4. Document classification with DocFormer (to be uploaded soon)

Next up is NER with FUNSD.

Will update you soon!
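For a sense of what token-level (entity) classification on FUNSD adds on top of the encoder, here is a minimal numpy sketch: a linear head over the (512, 768) output producing one BIO label per token. The label set and the head are illustrative assumptions, not the repo's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DocFormer output for one document: 512 tokens x 768 hidden dims
hidden = rng.standard_normal((512, 768))

# A BIO label set for FUNSD's entity types -- a common choice, not
# necessarily the exact labels the training script will use
labels = ["O", "B-HEADER", "I-HEADER", "B-QUESTION", "I-QUESTION",
          "B-ANSWER", "I-ANSWER"]

# Linear token-classification head: one logit per label for every token
W = rng.standard_normal((768, len(labels))) * 0.02
b = np.zeros(len(labels))
logits = hidden @ W + b            # (512, 7)
pred_ids = logits.argmax(axis=-1)  # predicted label id per token
pred_tags = [labels[i] for i in pred_ids]

print(logits.shape, len(pred_tags))  # (512, 7) 512
```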

@uakarsh uakarsh closed this as completed Jun 21, 2022
@BakingBrains

@uakarsh Hello,

Any update on NER with FUNSD using docformer?
