
v1.6.1


@Yunnglin released this 24 Apr 04:45

English Version

Benchmark Datasets

  • Added TIR-Bench benchmark

Feature Enhancements

  • Tokenize Prompt: Added a tokenize prompt switch for flexible control over how prompts are tokenized
  • Percentile Metrics: Added support for P50 and P90 percentile statistics
  • Multi-turn Performance: Added multi-turn conversation performance testing (multi turn perf)
  • Custom Multi-turn Performance: Added custom multi-turn performance testing (custom multi_turn perf)
  • Perf in Evaluation: Integrated performance testing into the evaluation workflow (perf in eval)
  • Speculative Metrics: Added speculative decoding performance metrics
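These notes don't say how the percentile statistics are computed. As a rough illustration of what a P50/P90 latency figure means, here is a minimal sketch that assumes linear interpolation between closest ranks (the default method of NumPy's `percentile`). The function names `percentile` and `summarize_latencies` are hypothetical and not part of this project's API.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0-100) of values.

    Assumption: linear interpolation between closest ranks, which matches
    NumPy's default 'linear' method. This is not confirmed as the method
    used by this release.
    """
    if not values:
        raise ValueError("percentile() requires at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted list.
    rank = (len(ordered) - 1) * p / 100
    lo = math.floor(rank)
    hi = math.ceil(rank)
    if lo == hi:
        return ordered[lo]
    # Interpolate between the two neighbouring samples.
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def summarize_latencies(latencies: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: P50/P90 summary of per-request latencies."""
    return {"p50": percentile(latencies, 50), "p90": percentile(latencies, 90)}


if __name__ == "__main__":
    # Illustrative latencies in seconds (made-up numbers, not benchmark results).
    sample = [0.8, 1.2, 0.9, 1.5, 3.0, 1.1, 0.7, 1.0, 1.3, 2.2]
    print(summarize_latencies(sample))
```

P50 is the median request latency, while P90 means 90% of requests finished at or below that value, so P90 exposes tail latency that an average can hide.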

Bug Fixes

  • Fixed an issue with loading the default local dataset
  • Fixed an issue with the length semantics of tokenize-prompt
  • Fixed a tokenize template issue
  • Updated the plot CDN address to avoid access failures when network acceleration is enabled

What's Changed

New Contributors

Full Changelog: v1.6.0...v1.6.1