Finetuning on Due-Benchmark #71
I think the main thing to focus on is the prompt; finetuning with different prompts affects performance. Properly adding the 2D and 1D positions is also important; anything missing could cause a performance drop.
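As an illustration of the 2D position input mentioned above, a common convention for document models is to normalize each token's bounding box to a 0–1000 grid; whether this repo uses exactly this scheme is an assumption, and the function name is mine:

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale an (x0, y0, x1, y1) box in pixels to a 0-1000 grid,
    the convention used by many layout-aware document models."""
    x0, y0, x1, y1 = bbox
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )

# Example: a box on an 800x1000-pixel page
print(normalize_bbox((80, 200, 400, 250), 800, 1000))  # -> (100, 200, 500, 250)
```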
Thank you for the quick reply. So is it not possible to get the results reported in the paper by running the published code without any changes? What is the exact prompt used for DocVQA? The prompt used in RVL-CDIP code is different than what is mentioned in the paper. So I am not sure if prompt used for training DocVQA is also the same from the paper. It would be really helpful if you can provide all the details that are required to obtain the results reported in the paper. |
The prompt should be the same as in the paper: "question answering on DocVQA. [question]. [context]".
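A minimal sketch of assembling that template, based on my reading of the quoted format (the exact separators and spacing are not verified against the repo, and `build_prompt` is a hypothetical helper):

```python
def build_prompt(question, context):
    """Compose the DocVQA finetuning prompt: task prefix, then the
    question, then the OCR-extracted page text as context."""
    return f"question answering on DocVQA. {question}. {context}"

prompt = build_prompt(
    "What is the invoice date?",
    "Invoice No. 123 Date 01/02/2020 Total 45.00",
)
print(prompt)
```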
I used the same prompt as above. The modifications I made are as follows:
I used this script for finetuning. Training always stops around 4 epochs due to the early-stopping criterion.
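To make the behavior above concrete, here is a generic sketch of the kind of patience-based early-stopping criterion that would halt a run once validation loss stops improving; the class and the patience value are illustrative, not taken from the script:

```python
class EarlyStopping:
    """Stop when the monitored loss fails to improve for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training

stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.72, 0.71]  # improvement stalls after epoch 2
for epoch, loss in enumerate(losses, 1):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch}")  # -> stopping at epoch 4
        break
```

Raising `patience` (or lowering `min_delta`) would let training continue past the point where it currently halts, at the cost of possible overfitting.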
I was using the Unimodal 224 model. However, according to the paper, performance across the variants differs by at most about ±2 points. Anyway, I will try the other models as well. Thanks for the input.
Hi, I tried the other two variants (512 and dual) as well. Neither resulted in any significant improvement. So far the best score obtained on the DocVQA task in due-benchmark is 76.29, with the 512-resolution model.
Could you please provide the following details?
By the way, which checkpoint did you use for evaluation, the one with the lowest validation loss or the last one? I am asking because loss is usually not a good indicator of the language score, so we usually use the last checkpoint.
I used the last checkpoint.
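For anyone reproducing this, a small sketch of picking the last checkpoint from a training output directory; the `checkpoint-<step>` naming is an assumption (it is a common convention, not confirmed for this repo):

```python
import re

def latest_checkpoint(names):
    """Return the directory name with the highest trailing step number,
    e.g. 'checkpoint-1500' from ['checkpoint-500', 'checkpoint-1500']."""
    numbered = {}
    for name in names:
        match = re.search(r"(\d+)$", name)
        if match:
            numbered[int(match.group(1))] = name
    return numbered[max(numbered)]

print(latest_checkpoint(["checkpoint-500", "checkpoint-1000", "checkpoint-1500"]))
# -> checkpoint-1500
```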
What are the resource requirements for finetuning on the DocVQA task?
Hi,
I have been trying to finetune the model on due-benchmark using the provided script. However, the performance is much lower than the reported numbers: for example, DocVQA yields an ANLS score of 75 instead of the reported 84. I have two main queries.
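For reference, the ANLS metric mentioned above can be sketched as follows. This is a pure-Python rendering of the standard DocVQA definition (average, over questions, of the best normalized Levenshtein similarity against the accepted answers, zeroed below a 0.5 threshold); the official evaluator may differ in normalization details:

```python
def levenshtein(a, b):
    """Classic edit distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(predictions, ground_truths, tau=0.5):
    """Average Normalized Levenshtein Similarity.

    ground_truths[i] is the list of accepted answers for question i;
    a similarity at or below 1 - tau scores 0, per the DocVQA metric.
    """
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for ans in answers:
            p, a = pred.strip().lower(), ans.strip().lower()
            nl = levenshtein(p, a) / max(len(p), len(a), 1)
            best = max(best, 1 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)

score = anls(["paris", "42"], [["Paris"], ["43"]])  # -> 0.5
```

Note the metric is case-insensitive here and rewards close-but-imperfect OCR-style answers, so a 75 vs. 84 gap reflects genuinely different predictions rather than formatting noise.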