v0.27.0

spboyer released this 21 Apr 18:29

7fc7f07

What's New in v0.27.0

New Features

tool_calls grader (#187) — Validate which tools the agent called during execution. Supports required_tools, forbidden_tools, min_calls, and max_calls constraints with partial scoring. (@JasonYeMSFT)
output_contains_any expectation (#137) — New YAML field that passes if ANY of the listed strings appear in output (OR logic), complementing the existing output_contains (AND logic) and output_not_contains. (@LarryOsterman)
max_response_time_ms behavior rule (#136) — Enforce response time limits on eval tasks. Fails the behavior check if execution exceeds the configured threshold. (@LarryOsterman)
prompt_file for task prompts (#157) — Load task prompts from external files instead of inline YAML. Supports prompt_file: path/to/prompt.md with path traversal protection. (@LarryOsterman)
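
To make the AND/OR distinction between output_contains and output_contains_any concrete, here is a minimal Go sketch of the two matching rules. The function names, the case-sensitive substring matching, and the choice to treat an empty list as "no match" for the OR case are assumptions made for illustration; they are not taken from the project's source.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAll reports whether every needle appears in output (AND logic),
// which is how these notes describe output_contains.
// Hypothetical helper, not the project's actual implementation.
func containsAll(output string, needles []string) bool {
	for _, n := range needles {
		if !strings.Contains(output, n) {
			return false
		}
	}
	return true
}

// containsAny reports whether at least one needle appears in output (OR logic),
// which is how these notes describe output_contains_any.
// Assumption: an empty list never matches.
func containsAny(output string, needles []string) bool {
	for _, n := range needles {
		if strings.Contains(output, n) {
			return true
		}
	}
	return false
}

func main() {
	output := "Deployment succeeded in 42s"
	needles := []string{"succeeded", "completed"}

	// Only one of the two strings is present, so the AND check fails
	// while the OR check passes.
	fmt.Println("all:", containsAll(output, needles))
	fmt.Println("any:", containsAny(output, needles))
}
```

With the sample output above, only "succeeded" is present, so an AND-style expectation would fail where an OR-style expectation passes. That is the case output_contains_any is meant to cover, for example when an agent may phrase success in more than one way.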

Bug Fixes

Windows CI fix (#204) — The webserver test now skips gracefully when frontend assets aren't built, fixing the persistent windows-latest CI failure that had been blocking all PRs.
Cross-platform test fix — The absolute-path test in the suggest package now checks runtime.GOOS so that it also passes on Windows.
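
As an illustration of the runtime.GOOS pattern mentioned in the cross-platform fix, the sketch below picks an absolute path fixture that is valid on the current operating system. The function names and the example paths are hypothetical. The actual test in the suggest package may be structured differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// absoluteFixture returns a path that counts as absolute on the current OS.
// A Unix-style "/workspace/..." path is not absolute on Windows because it
// lacks a drive letter, so a Windows-specific fixture is needed there.
// The paths themselves are made up for this example.
func absoluteFixture() string {
	if runtime.GOOS == "windows" {
		return `C:\workspace\skill.md`
	}
	return "/workspace/skill.md"
}

// fixtureIsAbsolute checks the fixture with filepath.IsAbs, which applies
// the rules of the platform the program is running on.
func fixtureIsAbsolute() bool {
	return filepath.IsAbs(absoluteFixture())
}

func main() {
	fmt.Println(absoluteFixture(), fixtureIsAbsolute())
}
```

Hard-coding a single Unix-style path is a common reason why path tests pass on Linux and macOS but fail on windows-latest runners. Branching on runtime.GOOS is one simple way to avoid that.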

Documentation

All four new features ship with updated docs:

Graders guide (graders.mdx) — tool_calls section added
Eval YAML guide (eval-yaml.mdx) — output_contains_any, max_response_time_ms, prompt_file documented
Schema reference (schema.mdx) — All new fields added

Full Changelog: v0.26.0...v0.27.0

Contributors

LarryOsterman and JasonYeMSFT
