Thank you for sharing your work on BLIVA! I have been running the provided code locally and evaluating bliva-vicuna on the VSR-Zeroshot-Test dataset. Using a strict hard-matching strategy for answers, I obtained an accuracy of only 0.5158 (539/1045).
To ensure my setup aligns with yours, could you clarify what prompt was used when evaluating accuracy on this dataset?
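
For reference, the strict hard matching mentioned above is roughly what the sketch below does. It is a minimal illustration rather than the exact script: it assumes a prediction counts as correct only if it equals the ground-truth answer after trimming whitespace and lowercasing. The names `normalize` and `hard_match_accuracy` are hypothetical and not part of the BLIVA codebase, and the sample predictions are made up.

```python
from typing import List, Tuple


def normalize(text: str) -> str:
    """Trim surrounding whitespace and lowercase (assumed normalization)."""
    return text.strip().lower()


def hard_match_accuracy(
    predictions: List[str], references: List[str]
) -> Tuple[int, int, float]:
    """Return (correct, total, accuracy) under exact string matching."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total = len(references)
    # A prediction is correct only if it is identical to the reference
    # after normalization; longer free-form answers are counted as wrong.
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    accuracy = correct / total if total else 0.0
    return correct, total, accuracy


# Made-up sample outputs, for illustration only.
preds = ["yes", "No", "yes, the cat is on the mat", " no "]
refs = ["yes", "no", "yes", "no"]

correct, total, acc = hard_match_accuracy(preds, refs)
print(f"{acc:.4f} ({correct}/{total})")
```

Under this kind of matching, a verbose answer such as the third sample is scored as wrong even though it begins with the right label, which is why the exact prompt matters for reproducing the reported number.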