
Reproduce of checkpoints #15

Closed · kangzhao2 opened this issue Mar 24, 2022 · 1 comment

kangzhao2 commented Mar 24, 2022

Dear authors:

I downloaded the checkpoints (Model checkpoints, ~17G) and evaluated the model using the following command:

python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt

I got the following results:

2022-03-24T11:13:42 INFO: m4c_textvqa: full val:, 41000/24000, val/total_loss: 7.9873, val/m4c_textvqa/m4c_decoding_bce_with_mask: 7.9873, val/m4c_textvqa/textvqa_accuracy: 0.4413

And I found a warning message during the evaluation:

Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors

In my opinion, the accuracy should be 0.4991 as shown in the following table:
[Screenshot: results table, 2022-03-24 11:15:53]

What is wrong with my setup? Does it have something to do with the warning above?

By the way, when I use the OCR-CC checkpoint save/finetuned/textvqa_tap_ocrcc_best.ckpt, the accuracy is 0.4934 (expected: 0.5471), and I see the same warning as above.

The GPU and PyTorch versions are as follows:

2022-03-24T11:09:34 INFO: CUDA Device 0 is: Tesla V100-SXM2-16GB
2022-03-24T11:09:37 INFO: Torch version is: 1.4.0

I hope to get your response. Thanks!

zyang-ur (Contributor) commented

Hi @kangzhao2 ,

The "sequence length" warning comes from the tokenizer; we observed it as well, and it should not affect the results.
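(For context, a minimal sketch of what that warning means: the model's position embeddings only cover a fixed maximum length, so a longer token-id sequence must be clipped before the forward pass. The helper name below is hypothetical, not part of the TAP codebase.)

```python
def truncate_token_ids(token_ids, max_len=512):
    """Clip a token-id sequence to the model's maximum input length.

    A sequence longer than max_len (e.g. the 599 > 512 in the warning)
    would index past the position-embedding table and raise an error,
    so tokenizers warn and the sequence must be truncated.
    """
    if len(token_ids) > max_len:
        return token_ids[:max_len]
    return token_ids

ids = list(range(599))            # stand-in for a 599-token sequence
clipped = truncate_token_ids(ids)
print(len(clipped))               # 512
```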

I'm not sure why that is; we used the same GPU/PyTorch setup. One guess is the OCR features used: as I remember, using MSOCR gave a ~4% improvement.

Meanwhile, please feel free to share any further guesses or observations about this. Thanks!
