Release v1.0.0 · modelscope/evalscope

New version

Version 1.0 introduces a major overhaul of the evaluation framework, establishing a new, more modular and extensible API layer under evalscope/api. Key improvements include standardized data models for benchmarks, samples, and results; a registry-based design for components such as benchmarks and metrics; and a rewritten core evaluator that orchestrates the new architecture. Existing benchmark adapters have been migrated to this API, resulting in cleaner, more consistent, and easier-to-maintain implementations.

For incompatible (breaking) changes, see the linked reference.
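
To make the registry-based design and standardized data models more concrete, here is a minimal, generic sketch of the pattern in Python. It is not EvalScope's actual API: every name in it (Sample, Result, METRICS, BENCHMARKS, register, evaluate, toy_model) is hypothetical and chosen only for illustration. Refer to the code under evalscope/api for the real interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# NOTE: all names below are hypothetical; this is NOT EvalScope's API.


@dataclass
class Sample:
    """One evaluation item: a model input and its reference answer."""
    input: str
    target: str


@dataclass
class Result:
    """Aggregated scores from running one benchmark."""
    benchmark: str
    scores: Dict[str, float] = field(default_factory=dict)


# Registries map a string name to a component implementation.
METRICS: Dict[str, Callable[[str, str], float]] = {}
BENCHMARKS: Dict[str, Callable[[], List[Sample]]] = {}


def register(registry: dict, name: str):
    """Return a decorator that stores the decorated callable under `name`."""
    def decorator(fn):
        if name in registry:
            raise ValueError(f"{name!r} is already registered")
        registry[name] = fn
        return fn
    return decorator


@register(METRICS, "exact_match")
def exact_match(prediction: str, target: str) -> float:
    # 1.0 when the stripped strings are identical, otherwise 0.0.
    return float(prediction.strip() == target.strip())


@register(BENCHMARKS, "toy_arithmetic")
def toy_arithmetic() -> List[Sample]:
    return [Sample("1+1", "2"), Sample("2+3", "5")]


def evaluate(benchmark: str, metric: str, model: Callable[[str], str]) -> Result:
    """Look up components by name, run the model, and average the metric."""
    samples = BENCHMARKS[benchmark]()
    score_fn = METRICS[metric]
    values = [score_fn(model(s.input), s.target) for s in samples]
    return Result(benchmark, {metric: sum(values) / len(values)})


def toy_model(prompt: str) -> str:
    # Stand-in "model" that adds the two integers in a prompt like "a+b".
    a, b = prompt.split("+")
    return str(int(a) + int(b))


result = evaluate("toy_arithmetic", "exact_match", toy_model)
print(result)
```

The appeal of this pattern is that the evaluator depends only on names and shared data types, so a new benchmark or metric can be added by registering it, without editing the evaluation loop.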

What's Changed

[Feature] Add image edit evaluation by @Yunnglin in #725
[Doc] add tau-bench doc by @Yunnglin in #730
[Fix] ragas local model by @Yunnglin in #732
[Doc] Add qwen-code best practice doc by @Yunnglin in #734
Fix: Incorrect keyword argument in call to csv_to_list() by @Zhuzhenghao in #745
Add SECURITY.md by @wangxingjun778 in #750
Update SECURITY.md by @wangxingjun778 in #752
Update FAQ file by @mushenL in #744
[Refactor] v1.0 by @Yunnglin in #739

New Contributors

@Zhuzhenghao made their first contribution in #745

Full Changelog: v0.17.1...v1.0.0
