
How to evaluate SAIL? #8

Open
Luoyang144 opened this issue Jan 9, 2024 · 4 comments

Comments

@Luoyang144

Hello, I am curious about how SAIL was evaluated. Was it evaluated using GPT-4? Was all of the benchmark data used for evaluation?

Luoyang144 changed the title from "How do you evaluate SAIL?" to "How to evaluate SAIL?" on Jan 9, 2024
@luohongyin
Owner

Hi, thanks for asking! Details about evaluation can be found in https://aclanthology.org/2023.findings-emnlp.242.pdf

@Luoyang144
Author

Thanks for the reply. Section 3.1 is titled "Automatic Evaluation with GPT-4", but I didn't see the evaluation details there. Did you evaluate the results on all of the test data? That would require a significant amount of time (and money).

@luohongyin
Owner

I see. GPT-4 is only used for the evaluation on Question-80.

More details can be found here, but I believe they have upgraded many things since I used it.

@Luoyang144
Author

Thanks for the reply. I wonder whether exact match (EM) could cause misjudgments during evaluation. This seems unavoidable.
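
To make the EM concern concrete, here is a minimal sketch of how exact match can misjudge an answer. It is not the evaluation code from the SAIL paper or repo: the SQuAD-style normalization (lowercasing, stripping punctuation and English articles) and the function names `normalize`, `exact_match`, and `contains_match` are assumptions made up for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumed SQuAD-style normalization; the thread does not say what SAIL uses.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """Strict EM: the normalized strings must be identical."""
    return normalize(prediction) == normalize(gold)


def contains_match(prediction: str, gold: str) -> bool:
    """Looser check (hypothetical): normalized gold appears inside the prediction."""
    return normalize(gold) in normalize(prediction)


if __name__ == "__main__":
    # A verbose but correct answer is rejected by strict EM (false negative).
    print(exact_match("It is located in Paris, France", "Paris"))
    # The looser check accepts it, but it also accepts a negated answer
    # (false positive), so neither rule is free of misjudgments.
    print(contains_match("It is located in Paris, France", "Paris"))
    print(contains_match("It is not Paris", "Paris"))
```

Strict EM penalizes long, free-form generations, while the substring variant over-credits them, which is one reason a model-based judge is sometimes used for open-ended questions.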
