GQA evaluation #3

wmk897 · 2024-02-08T19:19:36Z

Thank you for the great work! I would like to reproduce the evaluation on GQA, but I am not sure how to do that.

I am working with the 1000 GQA samples that you provided with the code, and I used gpt-3.5-turbo-0613.
However, I got an accuracy of 33.2%, which is more than 10% lower than the reported accuracy.
I used 'results/craft_tools/5_deduplicated_tool.csv' as the toolset, with the default configuration in retrieval_gqa_config.yaml.

Can you help me reproduce the results?
Also, if possible, could you share the output you got from the code when using gpt-3.5?

Thanks in advance!
