hellaswag results are different between opencompass and lm-evaluation-harness #450

Closed Answered by kirliavc
nlpcat asked this question in Q&A


The difference is probably due to few-shot inference. The LLM leaderboard (which runs lm-evaluation-harness) uses 10-shot HellaSwag, while OpenCompass used 0-shot. In OpenCompass, few-shot prompting is currently only implemented for the MMLU dataset.
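For reference, here is a minimal sketch of how the shot counts could be aligned on both sides. It assumes the usual OpenCompass config pattern (a `ZeroRetriever` for 0-shot vs. a `FixKRetriever` with a fixed example list for few-shot, as used in the MMLU configs); exact module paths and the harness CLI may differ between versions:

```python
# Sketch of an OpenCompass dataset config fragment: switching HellaSwag
# from 0-shot to 10-shot by swapping the in-context example retriever.
# Import path follows the usual OpenCompass layout but may vary by version.
from opencompass.openicl.icl_retriever import ZeroRetriever, FixKRetriever

# 0-shot (what OpenCompass used here): no in-context examples are prepended.
zero_shot_retriever = dict(type=ZeroRetriever)

# 10-shot (what the leaderboard used): always prepend the same 10 examples,
# identified by their indices in the in-context example pool.
ten_shot_retriever = dict(
    type=FixKRetriever,
    fix_id_list=list(range(10)),  # indices of the fixed few-shot examples
)

# Roughly equivalent lm-evaluation-harness run for comparison
# (newer-style CLI; older versions used `python main.py` instead):
#   lm_eval --model hf --model_args pretrained=<model> \
#           --tasks hellaswag --num_fewshot 10
```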

Answer selected by tonysy