[LayoutLM] How to reproduce FUNSD result #134

Closed
nv-quan opened this issue May 14, 2020 · 17 comments

nv-quan commented May 14, 2020

Hello,
I ran fine-tuning for the sequence labeling task on the FUNSD dataset, but I couldn't reproduce the result presented in the paper (my precision is only 40%). Here are the scripts and logs I used. Any idea what could be wrong?
Thank you very much.
Training:

#!/bin/bash

python run_seq_labeling.py  --data_dir ~/mnt/data \
                            --model_type layoutlm \
                            --model_name_or_path ~/mnt/model \
                            --do_lower_case \
                            --max_seq_length 512 \
                            --do_train \
                            --num_train_epochs 100.0 \
                            --logging_steps 10 \
                            --save_steps -1 \
                            --output_dir ~/mnt/output \
                            --labels ~/mnt/data/labels.txt \
                            --per_gpu_train_batch_size 16 \
                            --fp16

Testing:

#!/bin/bash

python run_seq_labeling.py --do_predict \
  --model_type layoutlm \
  --model_name_or_path ~/mnt/model \
  --data_dir ~/mnt/data \
  --output_dir ~/mnt/output \
  --labels ~/mnt/data/labels.txt

Some log:

05/14/2020 09:40:45 - INFO - __main__ -   ***** Running training *****
05/14/2020 09:40:45 - INFO - __main__ -     Num examples = 150
05/14/2020 09:40:45 - INFO - __main__ -     Num Epochs = 100
05/14/2020 09:40:45 - INFO - __main__ -     Instantaneous batch size per GPU = 16
05/14/2020 09:40:45 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 16
05/14/2020 09:40:45 - INFO - __main__ -     Gradient Accumulation steps = 1
05/14/2020 09:40:45 - INFO - __main__ -     Total optimization steps = 1000
05/14/2020 09:53:00 - INFO - __main__ -    global_step = 1000, average loss = 0.10387736940692412

05/14/2020 10:17:07 - INFO - __main__ -   ***** Running evaluation  *****
05/14/2020 10:17:07 - INFO - __main__ -     Num examples = 52
05/14/2020 10:17:07 - INFO - __main__ -     Batch size = 8
05/14/2020 10:17:07 - INFO - __main__ -   
           precision    recall  f1-score   support

 QUESTION       0.41      0.70      0.52       771
   HEADER       0.00      0.00      0.00       108
   ANSWER       0.39      0.50      0.44       513

micro avg       0.40      0.57      0.47      1392
macro avg       0.37      0.57      0.45      1392

05/14/2020 10:17:07 - INFO - __main__ -   ***** Eval results  *****
05/14/2020 10:17:07 - INFO - __main__ -     f1 = 0.472115668338743
05/14/2020 10:17:07 - INFO - __main__ -     loss = 2.9291565077645436
05/14/2020 10:17:07 - INFO - __main__ -     precision = 0.400600901352028
05/14/2020 10:17:07 - INFO - __main__ -     recall = 0.5747126436781609

ranpox commented May 14, 2020

Hi @nv-quan,
Could you provide your preprocessing commands? The support numbers in your classification report look incorrect. With a max sequence length of 512, the total count for each entity type should be:

support
QUESTION 1071
ANSWER 809
HEADER 119
micro avg 1999
macro avg 1999
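
For anyone checking their own preprocessed data, here is a minimal sketch for counting entity support directly from the tab-separated test.txt. It assumes the BIOES tag scheme discussed later in this thread, where every entity contributes exactly one B- or S- tag:

#!/bin/bash

# Count entities per type in a CoNLL-style "token<TAB>label" file.
# Each entity starts with exactly one B- (multi-token) or S- (single-token)
# tag, so counting those tags reproduces the seqeval "support" column.
cut -d$'\t' -f 2 ~/mnt/data/test.txt \
  | grep -E '^(B|S)-' \
  | sed -E 's/^[BS]-//' \
  | sort | uniq -c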


nv-quan commented May 14, 2020

Thank you, here are my preprocessing scripts:
Training:

#!/bin/bash

python scripts/funsd_preprocess.py --data_dir ~/mnt/data/training_data/annotations/ \
  --data_split train \
  --output_dir ~/mnt/data \
  --model_name_or_path ~/mnt/model

cat ~/mnt/data/train.txt | cut -d$'\t' -f 2 | grep -v "^$" | sort | uniq > ~/mnt/data/labels.txt

Testing:

#!/bin/bash

python scripts/funsd_preprocess.py --data_dir ~/mnt/data/testing_data/annotations/ \
  --data_split test \
  --output_dir ~/mnt/data \
  --model_name_or_path ~/mnt/model

cat ~/mnt/data/test.txt | cut -d$'\t' -f 2 | grep -v "^$" | sort | uniq > ~/mnt/data/labels.txt


nv-quan commented May 14, 2020

Also, I see a lot of "WARNING maximum sequence length exceeded: No prediction for" messages in the log. Is that normal?


ranpox commented May 14, 2020

I don't think so. Documents longer than 512 tokens should be split into chunks to fit the max sequence length, so these warnings are abnormal. I can generate the data correctly with the preprocessing commands you provided, so please check that they were executed correctly.
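
If you want to double-check that chunking happened, one rough sanity check is to print the longest example in the preprocessed file. This assumes examples are separated by blank lines, CoNLL-style, and it counts whitespace tokens rather than wordpieces, so it is only an approximation:

#!/bin/bash

# Print the token count of the longest example in test.txt.
# After correct chunking, no example should come close to exceeding
# the --max_seq_length budget (512 minus the special tokens).
awk 'NF == 0 { if (n > max) max = n; n = 0; next }
     { n++ }
     END { if (n > max) max = n; print max }' ~/mnt/data/test.txt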

@marythomaa98

Hi @nv-quan, were you able to resolve this issue?


nv-quan commented May 20, 2020

@marythomaa98 not yet. I've been busy and haven't had a chance to look at it, but I'll try to fix this bug tomorrow.

@marythomaa98

@nv-quan okay, sure! Do let me know if it works out; I am getting the same support numbers as you.


nv-quan commented May 21, 2020

@marythomaa98 The preprocessing is totally fine, but for some reason there are fewer predicted labels than input tokens.

@wolfshow

@nv-quan The dataset contains entries with empty text but non-empty labels. I think you may need to remove them.
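
A hypothetical filter for such entries, assuming the tab-separated "token<TAB>label" layout of the preprocessed files. Note that if you drop rows here, companion files such as test_box.txt would need the same rows dropped, or the files will fall out of alignment:

#!/bin/bash

# Drop rows whose text field is empty but which still carry a label;
# blank separator lines and normal "token<TAB>label" rows are kept.
awk -F'\t' '!(NF >= 2 && $1 == "")' ~/mnt/data/test.txt > ~/mnt/data/test.filtered.txt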


nv-quan commented May 21, 2020

@wolfshow I'm comparing the two files output/test_predictions.txt and data/test.txt. Everything looks fine until line 181: the test data still continues for that example_id, while test_predictions prints '\n' (end of the example_id). And the text in the testing data is not empty at all.
[Screenshot: side-by-side view of data/test.txt and output/test_predictions.txt around line 181]
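
One way to reproduce that comparison from the shell, assuming the prediction file uses space-separated "token label" lines while test.txt is tab-separated, is to diff just the token columns and look at the first hunk:

#!/bin/bash

# Show where the prediction file first falls out of sync with the gold file.
diff <(cut -d$'\t' -f 1 ~/mnt/data/test.txt) \
     <(cut -d' ' -f 1 ~/mnt/output/test_predictions.txt) | head -n 20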


ranpox commented May 21, 2020

Hi @nv-quan,
It seems that you didn't set max_seq_length during the evaluation stage. Please add --max_seq_length 512 to your testing command and try again.
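
That is, the testing command from above with the flag appended:

#!/bin/bash

python run_seq_labeling.py --do_predict \
  --model_type layoutlm \
  --model_name_or_path ~/mnt/model \
  --data_dir ~/mnt/data \
  --output_dir ~/mnt/output \
  --labels ~/mnt/data/labels.txt \
  --max_seq_length 512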


nv-quan commented May 21, 2020

@ranpox thank you. The support numbers are now correct, but the results are still off:

f1 = 0.4204204204204205
loss = 3.160606418337141
precision = 0.3364373685791529
recall = 0.560280140070035

@marythomaa98

Hi @nv-quan, adding --do_lower_case and --fp16 works for me.


nv-quan commented May 21, 2020

@marythomaa98 thanks a lot. It works when I add --do_lower_case to my test script and also remove data/cached_test_model_512.
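
Putting the fixes from this thread together, the final predict invocation would look something like the sketch below. Deleting the cached feature file mentioned above forces the features to be rebuilt with the new settings; paths follow the earlier scripts in this thread:

#!/bin/bash

# Remove stale cached features built before --do_lower_case was set.
rm -f ~/mnt/data/cached_test_model_512

python run_seq_labeling.py --do_predict \
  --model_type layoutlm \
  --model_name_or_path ~/mnt/model \
  --data_dir ~/mnt/data \
  --output_dir ~/mnt/output \
  --labels ~/mnt/data/labels.txt \
  --max_seq_length 512 \
  --do_lower_case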


elnazsn1988 commented Aug 4, 2020

@marythomaa98 @nv-quan @ranpox can you paste your final command for prediction here? I am having a bit of trouble understanding where to place my test input, where the test output goes, and where the trained model sits.

Also, @ranpox, I had to set the max sequence length to 128 or CUDA would run out of memory. Is that an issue?

@james-griffin-deepsee

> @wolfshow I'm comparing the two files output/test_predictions.txt and data/test.txt. Everything looks fine until line 181: the test data still continues for that example_id, while test_predictions prints '\n' (end of the example_id). And the text in the testing data is not empty at all.

Could you explain the difference between the labels? I know the difference between Answer, Header, Question, and Other, but what do B-ANSWER vs E-ANSWER vs I-ANSWER vs S-ANSWER mean?


nv-quan commented Jul 22, 2021

> Could you explain the difference between the labels? I know the difference between Answer, Header, Question, and Other, but what do B-ANSWER vs E-ANSWER vs I-ANSWER vs S-ANSWER mean?

As far as I understand, B is beginning, E is end, I is in between (inside), and S is single.
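
For illustration, here is how some hypothetical tokens would be tagged under that BIOES scheme: a multi-token answer runs B- ... I- ... E-, while a single-token answer gets S-:

Licensed   B-ANSWER
to         I-ANSWER
operate    E-ANSWER
Yes        S-ANSWER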
