performance on the VizWiz dataset #20
The default prompt for VizWiz is "Question: {} Short answer: ". If this still doesn't give satisfying performance, there could be other problems. By the way, another popular prompt, employed by LLaVA, is "When the provided information is insufficient, respond with 'Unanswerable'. Answer the question using a single word or phrase." Although we never tried this, it could possibly lead to better performance.
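For reference, a minimal sketch of filling in the two templates mentioned above (`build_prompt` is a hypothetical helper, not part of the BLIVA codebase):

```python
# Sketch: filling in the two VizWiz prompt templates discussed above.
# `build_prompt` is a hypothetical helper, not part of the BLIVA codebase.

DEFAULT_TEMPLATE = "Question: {} Short answer: "
LLAVA_TEMPLATE = (
    "{} When the provided information is insufficient, respond with "
    "'Unanswerable'. Answer the question using a single word or phrase."
)

def build_prompt(question: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the chosen template with the raw question text."""
    return template.format(question.strip())

print(build_prompt("What color is this shirt?"))
```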
Thanks for your reply! Have you ever tried to fine-tune BLIVA on a single dataset? If so, should we use the prompt pool for that single dataset? And do you have any recommended hyperparameters for downstream fine-tuning?
No, we didn't fine-tune on any specific task. Some suggestions: 1) try both keeping the same prompt for all questions and using varied prompts, and compare which works better; 2) the learning rate can usually start at 2e-5 or 1e-5; again, check which one suits your training setting.
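The two suggestions above amount to a small grid search. A sketch of the loop, where `finetune_and_eval` is a hypothetical placeholder for your actual training plus validation run:

```python
# Sketch: a tiny grid over the learning rates and prompting strategies
# suggested above. `finetune_and_eval` is a hypothetical placeholder --
# replace it with a real training + validation loop.
from itertools import product

def finetune_and_eval(lr: float, fixed_prompt: bool) -> float:
    """Placeholder: train with the given config and return val accuracy."""
    return 0.0  # dummy score; substitute a real run

configs = list(product([2e-5, 1e-5], [True, False]))
best_lr, best_fixed = max(configs, key=lambda cfg: finetune_and_eval(*cfg))
```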
It seems that fine-tuning BLIVA is similar to fine-tuning BLIP2, so maybe I can just use the hyperparameters provided by BLIP2? By the way, for zero-shot VQA inference BLIP2 reports that they "set the length-penalty to -1", but your default length-penalty is 0. Do you have any experience with this?
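For context on that setting: beam-search implementations (e.g. in Hugging Face Transformers) commonly score a finished hypothesis as its summed log-probability divided by length raised to the length penalty. A quick sketch of why -1 favors short answers:

```python
# Sketch: effect of length_penalty on beam-search hypothesis scoring,
# using the common formulation score = sum_logprob / length**length_penalty.

def beam_score(sum_logprob: float, length: int, length_penalty: float) -> float:
    return sum_logprob / (length ** length_penalty)

# With length_penalty = -1 the divisor is 1/length, so the (negative)
# summed log-probability is effectively multiplied by the length: longer
# hypotheses score much worse, biasing decoding toward short answers,
# which suits single-word VQA. length_penalty = 0 is length-neutral.
short_ans = beam_score(-2.0, 2, -1.0)   # -2.0 / (1/2) = -4.0
long_ans = beam_score(-5.0, 5, -1.0)    # -5.0 / (1/5) = -25.0
```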
There are two stages for pretraining BLIVA, but I can't find many details about the first stage in your paper. I wonder: do you use the pretrained weights of BLIP2 or InstructBLIP2 as your stage-1 model weights?
We use the InstructBLIP2 weights as our stage-1 model.
I loaded your pretrained model weights and used your default parameters to evaluate on the VizWiz dataset. However, without any prompt template the accuracy I get is around 28.00, which is far from the results in your paper. Could you tell me what's wrong here? Maybe the default parameters?
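In case it helps debugging the score: the VQA-style accuracy used for VizWiz credits a prediction by how many of the ten human annotators gave the same answer. This sketch uses the common min(count/3, 1) simplification and omits the official answer normalization (lowercasing, stripping punctuation and articles) and the average-over-annotator-subsets refinement:

```python
# Sketch of VQA-style accuracy for one question: a prediction counts as
# fully correct if at least 3 of the 10 human answers match it exactly.
# Official evaluation additionally normalizes answers and averages over
# subsets of annotators; both are omitted here.

def vqa_accuracy(pred: str, human_answers: list[str]) -> float:
    matches = sum(ans == pred for ans in human_answers)
    return min(matches / 3.0, 1.0)
```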