Release v1.8.1 · modelscope/evalscope

Chinese version

RAG Evaluation: Refactored RAG Eval with MTEB 2.x, RAGAS 0.4.x, and Pydantic configs
SWE-Bench: Added support for custom DockerHub namespace for SWE-Bench images
Image Quality Evaluation: Added full-reference image quality metrics
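
For readers unfamiliar with the term, a full-reference image quality metric scores a distorted image against a pristine reference of the same size. The release notes do not say which metrics were added, so the sketch below uses PSNR purely as a well-known example of the category. The function name `psnr` and its parameters are illustrative and are not taken from evalscope.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float],
         distorted: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (in dB) between two flattened images.

    Illustrative only: `reference` and `distorted` are flat sequences of
    pixel intensities in [0, max_value]; this is not evalscope's API.
    """
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the distorted image is closer to the reference; identical inputs yield infinity.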

Fixed SciCode assistant text block parsing
Fixed Terminal-Bench Docker CLI check before trials
Fixed forwarding reasoning_content as a top-level field in multi-turn conversations
Fixed missing perf dependencies in service optional-dependencies
Fixed truncation for texts exceeding max_seq_length in RAG API encoder/reranker
Fixed stdout whitespace preservation in agent bash tool to prevent patch corruption
Fixed possible Windows PermissionError when writing cache files
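
The stdout whitespace fix is easier to appreciate with a small demonstration of why stripping command output can corrupt a patch: in a unified diff, a context line begins with a single space, and the final newline is significant. The following is a minimal sketch under the assumption that the tool captures output through `subprocess`; `run_command` is a hypothetical helper, not the evalscope implementation.

```python
import subprocess
import sys


def run_command(args: list[str]) -> str:
    """Run a command and return its stdout exactly as emitted (no strip)."""
    completed = subprocess.run(args, capture_output=True, text=True, check=False)
    return completed.stdout


# A diff-like payload: the first line is a context line (leading space).
diff_like = " context line\n+added line\n"

# Emit the payload from a child process so it travels through real stdout.
raw = run_command(
    [sys.executable, "-c", f"import sys; sys.stdout.write({diff_like!r})"]
)

# Stripping drops the leading space and the trailing newline, so the
# result is no longer the patch text that the command produced.
stripped = raw.strip()
```

Returning `raw` keeps the payload byte-for-byte usable by a patch tool, whereas `stripped` would turn the context line into something a patch parser no longer recognises as context.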

fix(scicode): read assistant text blocks by @he-yufeng in #1381
add seed_tts_eval benchmark, solve #1360 by @haoruilee in #1379
feat(benchmarks): add ACEBench support, fix #1025 by @haoruilee in #1386
fix(terminal_bench): check docker cli before trials by @Li-Bailiang in #1389
refactor(rag_eval): MTEB 2.x + RAGAS 0.4.x + Pydantic configs by @Yunnglin in #1383
add Maritime-OCR-Bench support by @K-zhy in #1388
fix(models): forward reasoning_content as top-level field in multi-turn by @Yunnglin in #1396
fix: include perf deps in service optional-dependencies by @Blackteaxx in #1398
Add caption benchmarks by @haoruilee in #1402
fix(rag): truncate texts exceeding max_seq_length in API encoder/reranker by @Yunnglin in #1407
fix(agent): preserve stdout whitespace in bash tool to prevent patch corruption by @Yunnglin in #1409
fix(cache): use persistent jsonl writer to avoid Windows PermissionError by @Yunnglin in #1410
feat: allow custom DockerHub namespace for SWE-Bench images by @Yunnglin in #1417
feat(metric): add full-reference image quality metrics by @haoruilee in #1412
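
PR #1410 describes its approach as a "persistent jsonl writer". One plausible reading, sketched below, is a writer that opens the cache file once in append mode and keeps that handle for its whole lifetime instead of reopening or replacing the file on every write. The class name `PersistentJsonlWriter` and its methods are hypothetical, and the actual evalscope code may differ.

```python
import json
from pathlib import Path
from typing import Any


class PersistentJsonlWriter:
    """Append JSON records to one file through a single long-lived handle.

    Hypothetical sketch: the name and interface are not taken from evalscope.
    """

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._path.parent.mkdir(parents=True, exist_ok=True)
        # Open once; every later write reuses this handle.
        self._fh = self._path.open("a", encoding="utf-8")

    def write(self, record: dict[str, Any]) -> None:
        # One JSON object per line, flushed so readers see complete lines.
        self._fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        self._fh.flush()

    def close(self) -> None:
        if not self._fh.closed:
            self._fh.close()

    def __enter__(self) -> "PersistentJsonlWriter":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

Used as a context manager (`with PersistentJsonlWriter("cache.jsonl") as w: w.write({"id": 1})`), the file is closed exactly once when the block exits.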

Full Changelog: v1.8.0...v1.8.1