
v1.0.2


Yunnglin released this on 23 Sep 09:30

New Features

  • Code evaluation benchmarks (HumanEval, LiveCodeBench) can now run in a sandbox environment. To use this feature, install ms-enclave first.
  • Added image-text multimodal evaluation benchmarks, including RealWorldQA, AI2D, MMStar, MMBench, and OmniBench, as well as text-only evaluation benchmarks, including Multi-IF, HealthBench, and AMC.
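For readers unfamiliar with why code benchmarks need a sandbox: scoring a HumanEval-style sample means actually executing model-generated code together with the benchmark's `check(candidate)` tests. The sketch below illustrates only that idea, assuming nothing beyond the Python standard library. It runs the program in a separate Python process with a timeout. It does not use ms-enclave or EvalScope APIs, and a separate process is much weaker isolation than a real sandbox. The function name `passes` and the sample program are hypothetical.

```python
import subprocess
import sys


def passes(program: str, timeout: float = 5.0) -> bool:
    """Run candidate code plus its tests in a separate Python process.

    Hypothetical helper for illustration only: it provides process
    separation and a time limit, not the isolation of a real sandbox.
    Returns True only if the process exits with status 0 in time.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,  # keep the candidate's output off our console
            timeout=timeout,      # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return False  # infinite loops and very slow code count as failures
    return result.returncode == 0  # a failed assert exits with a nonzero code


# HumanEval-style layout: generated completion + test code + check call.
completion = "def add(a, b):\n    return a + b\n"
test = "def check(candidate):\n    assert candidate(2, 3) == 5\n"
program = completion + "\n" + test + "\ncheck(add)\n"

if __name__ == "__main__":
    print(passes(program))
```

The time limit matters as much as the separate process: generated code can loop forever, so a harness must be able to give up on a sample and mark it as failed.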

What's Changed

New Contributors

Full Changelog: v1.0.1...v1.0.2