[Dataset] Add R-Bench (ICML 2025) #2091
Please update the lint.
I've updated it. Can you help run CI?
Hi, have you tried using OpenCompass to reproduce your reported performance?
Also, please check the pre-commit again. Thanks.
Yes, we ran the experiment using OpenCompass and reproduced the previous results on this PR.
tonysy
left a comment
LGTM
MaiziXiao
left a comment
Tested. LGTM
* [Dataset] Add R-Bench (ICML 2025)
* fixed lint
* format rbench.py by isort
* rbench fix
* r-bench fix
* update

---------

Co-authored-by: leoyizhang <leoyizhang@tencent.com>
Co-authored-by: Myhs-phz <demarcia2014@126.com>
Co-authored-by: MaiziXiao <xxllcc1993@gmail.com>
R-Bench PR Description
Motivation
This PR adds support for the R-Bench dataset to OpenCompass. R-Bench is a graduate-level, multi-disciplinary benchmark designed to evaluate the complex reasoning capabilities of both large language models (LLMs) and multimodal large language models (MLLMs). Incorporating R-Bench into OpenCompass enables evaluation of model performance on challenging reasoning tasks across 19 academic disciplines and over 100 subjects, in both English and Chinese.
Modification
This PR adds the configuration file opencompass/configs/datasets/R-Bench/R-Bench.md, which includes:
The file follows the standard OpenCompass dataset documentation format, similar to other benchmark configurations such as QuALITY.
Use cases
R-Bench can be used to:
Checklist
Before PR:
After PR: