Unable to reproduce InstructBLIP Vicuna GQA and TextVQA Results #575

Open
suraj-nair-tri opened this issue Nov 5, 2023 · 0 comments
Hello,

I am trying to reproduce the InstructBLIP paper's results on GQA and TextVQA. Using both the HuggingFace and the LAVIS versions of the models, I am consistently getting results 1-10% below the numbers reported in the paper's table. Specifically, InstructBLIP Vicuna-7B is 1% worse on GQA and 10% worse on TextVQA, while InstructBLIP Vicuna-13B is 6% worse on GQA and also 10% worse on TextVQA.
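
For reference, here is a minimal sketch of my HuggingFace evaluation path. The image path, question, and generation settings below are representative placeholders rather than a dump of my actual config:

```python
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b", torch_dtype=torch.float16
).to(device)

image = Image.open("gqa_example.jpg").convert("RGB")  # placeholder image path
# Short-answer prompt template, following Appendix E of the paper
prompt = "Question: What color is the car? Short answer:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, torch.float16)
output_ids = model.generate(
    **inputs,
    do_sample=False,
    num_beams=5,          # one of several beam settings I tried
    max_new_tokens=10,    # short answers only
    min_length=1,
    length_penalty=-1.0,  # biases toward short answers, as in LAVIS VQA configs
)
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(answer)
```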

I have made sure to match the prompting strategy described in Appendix E of the paper. I have also tried a number of decoding strategies (number of beams, sampling hyperparameters), but these change performance by only 1-2% in total. My dataset loading and scoring pipeline has been validated on other open-source models whose published numbers I can reproduce (e.g., LLaVA).
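
And this is roughly the scoring I use, a simplified sketch of the standard metrics (the official evaluators apply additional normalization, e.g., for number words and contractions):

```python
import re

_ARTICLES = re.compile(r"\b(a|an|the)\b")

def normalize(ans: str) -> str:
    # Lowercase, strip punctuation and articles (simplified VQA normalization)
    ans = ans.lower().strip()
    ans = re.sub(r"[^\w\s]", "", ans)
    ans = _ARTICLES.sub("", ans)
    return " ".join(ans.split())

def gqa_accuracy(pred: str, gt: str) -> float:
    # GQA: plain exact match against the single ground-truth answer
    return float(normalize(pred) == normalize(gt))

def vqa_accuracy(pred: str, gt_answers: list[str]) -> float:
    # TextVQA uses the standard VQA metric: an answer counts in proportion
    # to how many of the 10 annotators gave it, capped at 1.0
    matches = sum(normalize(a) == normalize(pred) for a in gt_answers)
    return min(matches / 3.0, 1.0)
```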

Do you have any suggestions on missing hyperparameters that could cause this? Even better would be a script that reproduces the paper's numbers for InstructBLIP; I currently cannot find such a script in LAVIS.

Thank you!
