
v0.27.0


@spboyer released this 21 Apr 18:29
7fc7f07

What's New in v0.27.0

New Features

  • tool_calls grader (#187) — Validate which tools the agent called during execution. Supports required_tools, forbidden_tools, min_calls, and max_calls constraints with partial scoring. (@JasonYeMSFT)

  • output_contains_any expectation (#137) — New YAML field that passes if ANY of the listed strings appears in the output (OR logic), complementing the existing output_contains (AND logic) and output_not_contains. (@LarryOsterman)

  • max_response_time_ms behavior rule (#136) — Enforce a response time limit, in milliseconds, on eval tasks. The behavior check fails if execution exceeds the configured threshold. (@LarryOsterman)

  • prompt_file for task prompts (#157) — Load task prompts from external files instead of inline YAML. Supports prompt_file: path/to/prompt.md with path traversal protection. (@LarryOsterman)

Bug Fixes

  • Windows CI fix (#204) — The webserver test now skips gracefully when frontend assets aren't built, fixing the persistent windows-latest CI failure that had been blocking all PRs.
  • Cross-platform test fix — The absolute-path test in the suggest package now uses runtime.GOOS for Windows compatibility.

Documentation

Documentation has been updated for all four new features:

  • Graders guide (graders.mdx) — tool_calls section added
  • Eval YAML guide (eval-yaml.mdx) — output_contains_any, max_response_time_ms, prompt_file documented
  • Schema reference (schema.mdx) — All new fields added

Full Changelog: v0.26.0...v0.27.0