v0.27.0
What's New in v0.27.0
New Features
- `tool_calls` grader (#187): Validates which tools the agent called during execution. Supports `required_tools`, `forbidden_tools`, `min_calls`, and `max_calls` constraints with partial scoring. (@JasonYeMSFT)
- `output_contains_any` expectation (#137): New YAML field that passes if any of the listed strings appears in the output (OR logic), complementing the existing `output_contains` (AND logic) and `output_not_contains`. (@LarryOsterman)
- `max_response_time_ms` behavior rule (#136): Enforces response time limits on eval tasks. The behavior check fails if execution exceeds the configured threshold. (@LarryOsterman)
- `prompt_file` for task prompts (#157): Loads task prompts from external files instead of inline YAML. Supports `prompt_file: path/to/prompt.md` with path traversal protection. (@LarryOsterman)
Bug Fixes
- Windows CI fix (#204): The webserver test now skips gracefully when frontend assets aren't built, fixing the persistent `windows-latest` CI failure that blocked all PRs today.
- Cross-platform test fix: The absolute path test in the suggest package uses `runtime.GOOS` for Windows compatibility.
Documentation
All 4 new features include updated docs:
- Graders guide (`graders.mdx`): `tool_calls` section added
- Eval YAML guide (`eval-yaml.mdx`): `output_contains_any`, `max_response_time_ms`, and `prompt_file` documented
- Schema reference (`schema.mdx`): all new fields added
Full Changelog: v0.26.0...v0.27.0