performance on the VizWiz dataset #20

Open
qwqwq1445 opened this issue Jan 17, 2024 · 6 comments

@qwqwq1445

I loaded your pretrained model weights and used your default parameters to run an evaluation on the VizWiz dataset. However, the score I get without any prompt template is around 28.00, which is far from the results in your paper. Could you tell me what's wrong here? Maybe the default parameters?

@gordonhu608
Collaborator

The default prompt for VizWiz is "Question: {} Short answer: ". If this still doesn't give satisfactory performance, then there could be other problems. Btw, another popular prompt, the one employed by LLaVA, is "When the provided information is insufficient, respond with 'Unanswerable'. Answer the question using a single word or phrase". Although we never tried it, it could possibly lead to better performance.
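For clarity, here is a minimal Python sketch of the two templates mentioned above; the example question and the exact placement of the LLaVA-style instruction are illustrative assumptions, not taken from the BLIVA codebase.

```python
# Minimal sketch of the two prompt templates discussed above.
# The sample question and how the LLaVA-style suffix is attached are assumptions.
DEFAULT_VIZWIZ_TEMPLATE = "Question: {} Short answer: "
LLAVA_STYLE_TEMPLATE = (
    "{} When the provided information is insufficient, respond with 'Unanswerable'. "
    "Answer the question using a single word or phrase."
)

question = "What color is this shirt?"  # hypothetical VizWiz-style question
print(DEFAULT_VIZWIZ_TEMPLATE.format(question))
print(LLAVA_STYLE_TEMPLATE.format(question))
```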

@qwqwq1445
Author


Thanks for your reply! Have you ever tried to finetune BLIVA on a single dataset? If so, should we use the prompt pool for that single dataset? And do you have any recommended hyperparameters for downstream fine-tuning?

@gordonhu608
Collaborator

No, we didn't finetune on any specific task. But some suggestions: 1) try both keeping the same prompt for all questions and using varied prompts, and compare which works better; 2) the learning rate can usually start at 2e-5 or 1e-5, and again check which one suits your training setting.
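A minimal PyTorch sketch of that second suggestion, assuming a generic fine-tuning setup rather than BLIVA's actual training script (the optimizer choice and weight-decay value are assumptions):

```python
import torch

# Sweep the two suggested starting learning rates and keep whichever gives the
# better validation score. AdamW and weight_decay=0.05 are illustrative
# assumptions, not BLIVA's documented settings.
CANDIDATE_LRS = [2e-5, 1e-5]

def make_optimizer(model: torch.nn.Module, lr: float) -> torch.optim.Optimizer:
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)
```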

@qwqwq1445
Author


It seems that fine-tuning BLIVA is similar to fine-tuning BLIP-2, so maybe I can just use the hyperparameters provided by BLIP-2? By the way, for zero-shot VQA inference, BLIP-2 says they "set the length-penalty to -1", but your default length-penalty is 0. Do you have any experience with this issue?
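For reference, a minimal sketch of what that setting means under a Hugging Face-style generate API (the exact decoding arguments BLIVA uses are an assumption here): length_penalty only takes effect with beam search, where a negative value biases the beam score toward shorter outputs and 0 leaves it length-neutral.

```python
from transformers import GenerationConfig

# BLIP-2's reported zero-shot VQA setting favors short answers via a negative
# length penalty; 0 (the BLIVA default mentioned above) scores beams without
# regard to length. Beam count and token budget are assumptions.
blip2_style_cfg = GenerationConfig(num_beams=5, max_new_tokens=10, length_penalty=-1.0)
bliva_default_cfg = GenerationConfig(num_beams=5, max_new_tokens=10, length_penalty=0.0)
```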

@qwqwq1445
Author

There are two stages for pretraining BLIVA, but I can't find many details about the first stage in your paper. I wonder whether you use the pretrained weights of BLIP-2 or InstructBLIP as your stage-1 model weights?

@gordonhu608
Collaborator

It's the InstructBLIP weights as our stage 1.
