hellaswag results are different between opencompass and lm-evaluation-harness #450

Closed Answered by kirliavc
nlpcat asked this question in Q&A


The difference is probably due to few-shot inference. The LLM leaderboard (which runs lm-evaluation-harness) uses 10-shot HellaSwag, while OpenCompass used 0-shot. In OpenCompass, few-shot prompting is currently only implemented for the MMLU dataset.
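For reference, here is a minimal sketch of how the shot counts could be aligned on both sides. It assumes the usual OpenCompass config pattern (a `ZeroRetriever` for 0-shot vs. a `FixKRetriever` with a fixed example list for few-shot, as used in the MMLU configs); exact module paths and the harness CLI may differ between versions:

```python
# Sketch of an OpenCompass dataset config fragment: switching HellaSwag
# from 0-shot to 10-shot by swapping the in-context example retriever.
# Import path follows the usual OpenCompass layout but may vary by version.
from opencompass.openicl.icl_retriever import ZeroRetriever, FixKRetriever

# 0-shot (what OpenCompass used here): no in-context examples are prepended.
zero_shot_retriever = dict(type=ZeroRetriever)

# 10-shot (what the leaderboard used): always prepend the same 10 examples,
# identified by their indices in the in-context example pool.
ten_shot_retriever = dict(
    type=FixKRetriever,
    fix_id_list=list(range(10)),  # indices of the fixed few-shot examples
)

# Roughly equivalent lm-evaluation-harness run for comparison
# (newer-style CLI; older versions used `python main.py` instead):
#   lm_eval --model hf --model_args pretrained=<model> \
#           --tasks hellaswag --num_fewshot 10
```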

Answer selected by tonysy