OpenCompass v0.2.2.rc1
Pre-release
This release provides more parsed datasets:
OpenCompassData-core-20240207.zip
OpenCompassData-complete-20240207.zip
Important updates compared to the previous version are as follows:
- Subjective: Add AlignBench, MTBench
- Agent: Add T-Eval
- Medicine: Add MedBench
- Code: Add HumanEval-X, DS-1000
- Finance: Add FinanceIQ
- Law: Update LawBench Evaluation Assets
OpenCompassData-core-20240207.zip
AGIEval | ARC | BBH | ceval | CLUE | cmmlu |
commonsenseqa | drop | FewCLUE | flores_first100 | GAOKAO-BENCH | gsm8k |
hellaswag | humaneval | lambada | LCSTS | math | mbpp |
mmlu | nq | openbookqa | piqa | race | siqa |
strategyqa | summedits | SuperGLUE | TheoremQA | triviaqa | tydiqa |
winogrande | xstory_cloze | Xsum |
OpenCompassData-complete-20240207.zip
AGIEval | anli | ARC | BBH | CDME | ceval |
cibench_dataset | cleva | clozeTest-maxmin | CLUE | CMB | cmmlu |
commonsenseqa | commonsenseqa_cn | crowspairs_cn | drop | ds1000_data | FewCLUE |
FinanceIQ | flores200_dataset | flores_first100 | FunctionalMT | game24 | GAOKAO-BENCH |
gpqa | gsm8k | hellaswag | humaneval | humaneval_cn | humaneval_multipl-e |
humanevalx | HungarianExamMath | InfiniteBench | lambada | lanQ | lawbench |
LCSTS | math | math401 | mbpp | mbpp_cn | mbpp_plus |
MedBench | mmlu | MNIST | NPHardEval | nq | nq_cn |
nq-open | openbookqa | piqa | py150 | qabench | race |
scibench | siqa | SQuAD2.0 | strategyqa | alignment_bench | mtbench |
summedits | SuperGLUE | svamp | teval | TheoremQA | triviaqa |
tydiqa | winogrande | xiezhi | xlsum | xstory_cloze | Xsum |
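
Once one of the archives above has been downloaded, it can be unpacked and checked against the listings. The snippet below is a minimal sketch using only the Python standard library. It assumes the zip file already sits in the current directory; the helper name `extract_and_list` is hypothetical and not part of OpenCompass. It makes no assumption about the archive's internal layout and simply reports whatever top-level entries it finds.

```python
import zipfile
from pathlib import Path


def extract_and_list(archive_path, target_dir):
    """Extract a zip archive and return its sorted top-level entry names.

    Hypothetical helper for illustration; not an OpenCompass API.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
        # Keep only the first path component of every archive member.
        top_level = {
            name.split("/", 1)[0] for name in zf.namelist() if name.strip("/")
        }
    return sorted(top_level)


if __name__ == "__main__":
    # Assumed local file name, taken from the release asset list above.
    archive = Path("OpenCompassData-core-20240207.zip")
    if archive.exists():
        for entry in extract_and_list(archive, "."):
            print(entry)
    else:
        print(f"{archive} not found; download it from the release page first.")
```

Running the script next to the downloaded archive prints the top-level entries it contains, which can then be compared with the dataset names listed for that archive.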