Unable to reproduce InstructBLIP Vicuna GQA and TextVQA Results #575

Open
suraj-nair-tri opened this issue Nov 5, 2023 · 0 comments
Hello,

I am trying to reproduce the InstructBLIP paper's results on GQA and TextVQA. Using both the HuggingFace and the LAVIS versions of the models, I am consistently getting results 1-10% below the numbers reported in the paper's table. Specifically, InstructBLIP Vicuna-7B is 1% worse on GQA and 10% worse on TextVQA, while InstructBLIP Vicuna-13B is 6% worse on GQA and also 10% worse on TextVQA.
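
For reference, here is a minimal sketch of my HuggingFace evaluation path. The image path, question, and generation settings below are representative placeholders rather than a dump of my actual config:

```python
import torch
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b", torch_dtype=torch.float16
).to(device)

image = Image.open("gqa_example.jpg").convert("RGB")  # placeholder image path
# Short-answer prompt template, following Appendix E of the paper
prompt = "Question: What color is the car? Short answer:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, torch.float16)
output_ids = model.generate(
    **inputs,
    do_sample=False,
    num_beams=5,          # one of several beam settings I tried
    max_new_tokens=10,    # short answers only
    min_length=1,
    length_penalty=-1.0,  # biases toward short answers, as in LAVIS VQA configs
)
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(answer)
```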

I have made sure to match the prompting strategy described in Appendix E of the paper. I have also tried a number of decoding strategies (number of beams, sampling hyperparameters), but these change performance by only 1-2% in total. My dataset loading and scoring pipeline has been validated on other open-source models whose published numbers I can reproduce (e.g., LLaVA).
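
And this is roughly the scoring I use, a simplified sketch of the standard metrics (the official evaluators apply additional normalization, e.g., for number words and contractions):

```python
import re

_ARTICLES = re.compile(r"\b(a|an|the)\b")

def normalize(ans: str) -> str:
    # Lowercase, strip punctuation and articles (simplified VQA normalization)
    ans = ans.lower().strip()
    ans = re.sub(r"[^\w\s]", "", ans)
    ans = _ARTICLES.sub("", ans)
    return " ".join(ans.split())

def gqa_accuracy(pred: str, gt: str) -> float:
    # GQA: plain exact match against the single ground-truth answer
    return float(normalize(pred) == normalize(gt))

def vqa_accuracy(pred: str, gt_answers: list[str]) -> float:
    # TextVQA uses the standard VQA metric: an answer counts in proportion
    # to how many of the 10 annotators gave it, capped at 1.0
    matches = sum(normalize(a) == normalize(pred) for a in gt_answers)
    return min(matches / 3.0, 1.0)
```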

Do you have any suggestions on missing hyperparameters that could cause this? Even better would be a script that reproduces the paper's numbers for InstructBLIP; I currently cannot find such a script in LAVIS.

Thank you!
