performance on the VizWiz dataset #20

Open
qwqwq1445 opened this issue Jan 17, 2024 · 6 comments

@qwqwq1445

I loaded your pretrained model weights and used your default parameters to run an evaluation on the VizWiz dataset. However, the score I get without any prompt template is around 28.00, which is far from the results in your paper. Could you tell me what's wrong here? Maybe the default parameters?

@gordonhu608
Collaborator

The default prompt for VizWiz is "Question: {} Short answer: ". If this still doesn't give satisfactory performance, then there could be other problems. Btw, another popular prompt, the one employed by LLaVA, is "When the provided information is insufficient, respond with 'Unanswerable'. Answer the question using a single word or phrase". Although we never tried it, it could possibly lead to better performance.
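For clarity, here is a minimal Python sketch of the two templates mentioned above; the example question and the exact placement of the LLaVA-style instruction are illustrative assumptions, not taken from the BLIVA codebase.

```python
# Minimal sketch of the two prompt templates discussed above.
# The sample question and how the LLaVA-style suffix is attached are assumptions.
DEFAULT_VIZWIZ_TEMPLATE = "Question: {} Short answer: "
LLAVA_STYLE_TEMPLATE = (
    "{} When the provided information is insufficient, respond with 'Unanswerable'. "
    "Answer the question using a single word or phrase."
)

question = "What color is this shirt?"  # hypothetical VizWiz-style question
print(DEFAULT_VIZWIZ_TEMPLATE.format(question))
print(LLAVA_STYLE_TEMPLATE.format(question))
```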

@qwqwq1445
Author


Thanks for your reply! Have you ever tried to finetune BLIVA on a single dataset? If so, should we use the prompt pool for that single dataset? And do you have any recommended hyperparameters for downstream fine-tuning?

@gordonhu608
Collaborator

No, we didn't finetune on any specific task. But some suggestions: 1) try both keeping the same prompt for all questions and using varied prompts, and compare which works better; 2) the learning rate can usually start at 2e-5 or 1e-5, and again check which one suits your training setting.
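A minimal PyTorch sketch of that second suggestion, assuming a generic fine-tuning setup rather than BLIVA's actual training script (the optimizer choice and weight-decay value are assumptions):

```python
import torch

# Sweep the two suggested starting learning rates and keep whichever gives the
# better validation score. AdamW and weight_decay=0.05 are illustrative
# assumptions, not BLIVA's documented settings.
CANDIDATE_LRS = [2e-5, 1e-5]

def make_optimizer(model: torch.nn.Module, lr: float) -> torch.optim.Optimizer:
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)
```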

@qwqwq1445
Author


It seems that fine-tuning BLIVA is similar to fine-tuning BLIP-2, so maybe I can just use the hyperparameters provided by BLIP-2? By the way, for zero-shot VQA inference, BLIP-2 says they "set the length-penalty to -1", but your default length-penalty is 0. Do you have any experience with this issue?
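For reference, a minimal sketch of what that setting means under a Hugging Face-style generate API (the exact decoding arguments BLIVA uses are an assumption here): length_penalty only takes effect with beam search, where a negative value biases the beam score toward shorter outputs and 0 leaves it length-neutral.

```python
from transformers import GenerationConfig

# BLIP-2's reported zero-shot VQA setting favors short answers via a negative
# length penalty; 0 (the BLIVA default mentioned above) scores beams without
# regard to length. Beam count and token budget are assumptions.
blip2_style_cfg = GenerationConfig(num_beams=5, max_new_tokens=10, length_penalty=-1.0)
bliva_default_cfg = GenerationConfig(num_beams=5, max_new_tokens=10, length_penalty=0.0)
```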

@qwqwq1445
Author

There are two stages for pretraining BLIVA, but I can't find many details about the first stage in your paper. I wonder whether you use the pretrained weights of BLIP-2 or InstructBLIP as your stage-1 model weights?

@gordonhu608
Collaborator

It's the InstructBLIP weights as our stage 1.
