Evaluate LLMs like LLaMA/Alpaca using the `evaluate` library? #433

Hi team, thanks for open-sourcing this awesome tool. I am new to it and have a few questions about LLM evaluation:

1. It seems `evaluate` already provides some evaluators (some libraries call them tasks, I think). Can we use these evaluators for LLM evaluation?
2. I feel different tasks require different datasets. For LLM evaluation, there are popular datasets like MMLU. Is there a tested pairing of datasets and metrics? For example, for QA, I could use dataset1 and dataset2 with metric1 and metric2, etc.
3. What's the difference between huggingface/evaluate and https://github.com/EleutherAI/lm-evaluation-harness?
