How to evaluate SAIL? #8

Luoyang144 · 2024-01-09T02:47:37Z

Hello, I am curious about how SAIL was evaluated. Was it evaluated using GPT-4? Was all of the benchmark data used for evaluation?

luohongyin · 2024-01-09T17:38:10Z

Hi, thanks for asking! Details about the evaluation can be found in https://aclanthology.org/2023.findings-emnlp.242.pdf

Luoyang144 · 2024-01-10T03:07:41Z

Thanks for the reply. In Section 3.1, the title is "Automatic Evaluation with GPT-4", but I didn't see the evaluation details. Did you evaluate the results on all of the test data? That would require a significant amount of time (and money).

luohongyin · 2024-01-10T20:30:36Z

I see. GPT-4 is only used for evaluation on Question-80.

More details can be found here, but I believe they have upgraded many things since I used it.

Luoyang144 · 2024-01-12T02:51:35Z

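Thanks for the reply. I wonder whether exact match (EM) could cause misjudgments during evaluation, for example when a correct answer is phrased differently from the reference. This situation seems unavoidable.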
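To make the EM concern concrete, here is a minimal sketch of exact-match scoring, assuming SQuAD-style answer normalization (lowercasing, then stripping punctuation, articles, and extra whitespace). The SAIL evaluation may normalize answers differently, and the function names below are hypothetical. The sketch shows that normalization forgives case and punctuation differences, but a correct answer wrapped in extra words, or a shorter correct form, still counts as a miss.

```python
import re
import string

# Precomputed set of ASCII punctuation characters to strip.
_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation,
    drop the articles a/an/the, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(g) for g in gold_answers)


gold = ["Barack Obama"]
print(exact_match("barack obama.", gold))               # True: only case/punctuation differ
print(exact_match("The answer is Barack Obama", gold))  # False: extra words
print(exact_match("Obama", gold))                       # False: shorter but arguably correct
```

Listing several acceptable gold answers per question reduces, but does not eliminate, these false negatives, which is one reason model-based judging (such as GPT-4 on Question-80) is used for open-ended answers.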

Luoyang144 changed the title ~~How do you evaluate SAIL?~~ How to evaluate SAIL? Jan 9, 2024
