v1.0.0
New version
Version 1.0 introduces a major overhaul of the evaluation framework, establishing a new, more modular and extensible API layer under evalscope/api. Key improvements include standardized data models for benchmarks, samples, and results; a registry-based design for components such as benchmarks and metrics; and a rewritten core evaluator that orchestrates the new architecture. Existing benchmark adapters have been migrated to this API, resulting in cleaner, more consistent, and easier-to-maintain implementations.
For breaking changes, please refer to the linked notes.
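To make the registry-based design concrete, here is a minimal, generic sketch of the pattern in Python. It is not evalscope's actual API: every name in it (`Sample`, `Result`, `METRICS`, `BENCHMARKS`, `register`, `evaluate`, `toy_model`) is hypothetical and chosen only for illustration. It shows standardized data models, components registered by name, and a small evaluator that looks them up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# NOTE: all names below are hypothetical illustrations,
# not identifiers from evalscope/api.


@dataclass
class Sample:
    """Standardized data model for one benchmark item."""
    input: str
    target: str


@dataclass
class Result:
    """Standardized data model for one evaluation outcome."""
    benchmark: str
    metric: str
    score: float


# Registries map a string name to a component.
METRICS: Dict[str, Callable[[str, str], float]] = {}
BENCHMARKS: Dict[str, Callable[[], List[Sample]]] = {}


def register(registry: dict, name: str):
    """Return a decorator that stores the decorated object under `name`."""
    def decorator(obj):
        if name in registry:
            raise ValueError(f"'{name}' is already registered")
        registry[name] = obj
        return obj
    return decorator


@register(METRICS, "exact_match")
def exact_match(prediction: str, target: str) -> float:
    # 1.0 when the stripped strings are identical, otherwise 0.0.
    return float(prediction.strip() == target.strip())


@register(BENCHMARKS, "toy_arithmetic")
def toy_arithmetic() -> List[Sample]:
    return [Sample("1+1", "2"), Sample("2+3", "5")]


def evaluate(benchmark_name: str, metric_name: str,
             model: Callable[[str], str]) -> Result:
    """Look up components by name, score each sample, average the scores."""
    samples = BENCHMARKS[benchmark_name]()
    metric = METRICS[metric_name]
    scores = [metric(model(s.input), s.target) for s in samples]
    mean = sum(scores) / len(scores) if scores else 0.0
    return Result(benchmark_name, metric_name, mean)


def toy_model(prompt: str) -> str:
    # Stand-in "model" that adds two integers written as "a+b".
    a, b = prompt.split("+")
    return str(int(a) + int(b))


result = evaluate("toy_arithmetic", "exact_match", toy_model)
print(result)
```

In a design like this, adding a benchmark or metric means registering one new function; the evaluator itself does not change. How the real framework structures its registries may differ.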
What's Changed
- [Feature] Add image edit evaluation by @Yunnglin in #725
- [Doc] add tau-bench doc by @Yunnglin in #730
- [Fix] ragas local model by @Yunnglin in #732
- [Doc] Add qwen-code best practice doc by @Yunnglin in #734
- Fix: Incorrect keyword argument in call to csv_to_list() by @Zhuzhenghao in #745
- Add SECURITY.md by @wangxingjun778 in #750
- Update SECURITY.md by @wangxingjun778 in #752
- Update FAQ file by @mushenL in #744
- [Refactor] v1.0 by @Yunnglin in #739
New Contributors
- @Zhuzhenghao made their first contribution in #745
Full Changelog: v0.17.1...v1.0.0