Consolidating the keyword and regex graders into a single text grader. #41

Merged
richardpark-msft merged 11 commits into microsoft:main from richardpark-msft:wz-update-graders
Mar 3, 2026

@richardpark-msft
Member

Consolidating similar graders to reduce the number of top-level graders people have to figure out.

Overall, this should be a bit easier for people: instead of having to think about which grader to use, they only choose which kind of text matching they want, and a single grader handles them all.
Contributor

Copilot AI left a comment

Pull request overview

This PR consolidates the existing regex and keyword graders into a single text grader, and updates schemas, tests, examples, and documentation to reflect the new unified configuration surface.

Changes:

  • Introduces a new text grader implementation (substring + regex checks) and removes the standalone regex/keyword grader implementations.
  • Updates Go tests, web E2E fixtures, and example eval/task YAML to use type: text and regex_match / regex_not_match (and related substring keys).
  • Updates JSON schemas and documentation to reflect the new grader type and configuration keys.
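To make the consolidation concrete, here is a minimal Go sketch of how a single grader could run all four kinds of check. It is not the code from internal/graders/text_grader.go: the `TextConfig` struct and `GradeText` function are hypothetical names, and the case-insensitive behaviour of `contains` / `not_contains` is an assumption inferred from the existence of separate `*_cs` keys.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// TextConfig is a hypothetical stand-in for the text grader's config.
// Field names mirror the YAML keys named in this PR; the real struct may differ.
type TextConfig struct {
	Contains      []string // substrings that must appear
	NotContains   []string // substrings that must not appear
	RegexMatch    []string // patterns that must match
	RegexNotMatch []string // patterns that must not match
}

// GradeText runs every configured check against output and returns the
// failure messages. An empty result means the output passed.
func GradeText(cfg TextConfig, output string) ([]string, error) {
	var failures []string
	// Assumption: plain substring checks ignore case.
	lower := strings.ToLower(output)

	for _, s := range cfg.Contains {
		if !strings.Contains(lower, strings.ToLower(s)) {
			failures = append(failures, fmt.Sprintf("missing substring %q", s))
		}
	}
	for _, s := range cfg.NotContains {
		if strings.Contains(lower, strings.ToLower(s)) {
			failures = append(failures, fmt.Sprintf("forbidden substring %q", s))
		}
	}
	for _, p := range cfg.RegexMatch {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid regex_match pattern %q: %w", p, err)
		}
		if !re.MatchString(output) {
			failures = append(failures, fmt.Sprintf("pattern %q did not match", p))
		}
	}
	for _, p := range cfg.RegexNotMatch {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid regex_not_match pattern %q: %w", p, err)
		}
		if re.MatchString(output) {
			failures = append(failures, fmt.Sprintf("pattern %q matched but must not", p))
		}
	}
	return failures, nil
}

func main() {
	cfg := TextConfig{
		Contains:      []string{"function"},
		RegexNotMatch: []string{`(?i)password\s*=`},
	}
	failures, err := GradeText(cfg, "This function returns the sum.")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("passed:", len(failures) == 0, "failures:", failures)
}
```

Note that in this sketch a config with no checks yields no failures, which is the "always passing" behaviour the review comments warn about when unsupported keys are used.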

Reviewed changes

Copilot reviewed 55 out of 55 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| web/e2e/run-detail.spec.ts | Updates E2E assertions to expect the new text grader badge. |
| web/e2e/fixtures/mock-data.ts | Updates mocked run detail grader types from regex to text. |
| skills/waza/SKILL.md | Updates skill example task grader type from regex to text. |
| skills/waza-interactive/tests/eval.yaml | Updates interactive skill eval validators to use type: text with regex_match. |
| site/src/content/docs/reference/schema.mdx | Updates schema reference examples toward text grader usage. |
| site/src/content/docs/index.mdx | Updates landing-page YAML snippet to type: text. |
| site/src/content/docs/guides/graders.mdx | Replaces Regex/Keyword sections with a consolidated Text grader section and updates examples. |
| site/src/content/docs/guides/eval-yaml.mdx | Updates eval.yaml guide examples to use type: text. |
| site/src/content/docs/getting-started.mdx | Updates getting-started examples to use type: text. |
| site/src/content/docs/about.mdx | Updates about-page YAML snippet to show type: text. |
| schemas/task.schema.json | Replaces regex with text in the task schema grader type enum and config refs; adds textGraderConfig. |
| schemas/eval.schema.json | Replaces regex/keyword with text in the eval schema and adds textGraderConfig. |
| internal/webapi/additional_test.go | Updates web API test data to use GraderKindText. |
| internal/transcript/transcript_test.go | Updates transcript tests to expect GraderKindText. |
| internal/suggest/suggest_test.go | Updates suggestion-related tests to use text grader types in prompts/parsing. |
| internal/suggest/suggest.go | Keeps suggestion validation flow intact while consuming updated grader kinds. |
| internal/suggest/prompt.go | Updates embedded prompt/schema examples and grader-type lists to use text. |
| internal/suggest/grader_docs.go | Updates grader summaries to describe the unified text grader. |
| internal/session/session_test.go | Updates session event rendering tests to use text grader identifiers/types. |
| internal/scaffold/scaffold_test.go | Updates scaffold tests to expect type: text in generated eval.yaml. |
| internal/scaffold/scaffold.go | Updates scaffolded eval.yaml template to use type: text and regex_match. |
| internal/reporting/junit_test.go | Updates JUnit formatting tests to reflect new grader types/names. |
| internal/orchestration/runner_orchestration_test.go | Updates orchestration tests and inline YAML snippets to use text + regex_match. |
| internal/models/spec_test.go | Updates spec parsing/weight tests to use type: text with regex_match. |
| internal/models/outcome.go | Replaces GraderKindRegex/GraderKindKeyword constants with GraderKindText. |
| internal/graders/text_grader_test.go | Adds comprehensive unit tests for the new TextGrader. |
| internal/graders/text_grader.go | Adds new TextGrader implementation (contains/not_contains + regex match/not-match). |
| internal/graders/regex_grader_test.go | Removes old regex grader tests. |
| internal/graders/regex_grader.go | Removes old regex grader implementation. |
| internal/graders/keyword_grader_test.go | Removes old keyword grader tests. |
| internal/graders/keyword_grader.go | Removes old keyword grader implementation. |
| internal/graders/grader.go | Updates factory to construct TextGrader instead of RegexGrader/KeywordGrader. |
| internal/cache/cache_test.go | Updates cache tests to use GraderKindText. |
| examples/grader-showcase/tasks/skill-invocation-example.yaml | Updates example to use type: text with regex_match. |
| examples/grader-showcase/tasks/regex-task.yaml | Renames regex demo task to text demo and updates grader type/config keys. |
| examples/grader-showcase/eval.yaml | Updates global safety grader to type: text with regex_not_match. |
| examples/grader-showcase/README.md | Updates grader showcase docs/examples toward text. |
| examples/code-explainer/eval.yaml | Updates example eval to use type: text with regex_not_match. |
| docs/research/waza-msbench-integration-design.md | Updates research doc references to type: text. |
| docs/research/waza-eval-registry-design.md | Updates research doc YAML examples to type: text and regex_* keys. |
| docs/graders/text.md | Adds new standalone docs page for the text grader. |
| docs/graders/regex.md | Removes old regex grader docs page. |
| docs/graders/keyword.md | Removes old keyword grader docs page. |
| docs/graders/file.md | Updates file grader docs examples (content pattern key names). |
| docs/graders/README.md | Updates grader reference examples toward text. |
| docs/TUTORIAL.md | Updates tutorial snippets from regex to text with regex_* keys. |
| docs/SKILLS_CI_INTEGRATION.md | Updates CI integration docs to use type: text and regex_match. |
| docs/INTEGRATION-TESTING.md | Updates integration testing docs to use type: text and regex_match. |
| docs/GUIDE.md | Updates guide snippets referencing regex to text. |
| docs/GETTING-STARTED.md | Updates getting-started doc snippet to use type: text. |
| cmd/waza/reporter_test.go | Updates CLI reporter tests to use GraderKindText. |
| cmd/waza/cmd_run_suggest_test.go | Updates run-suggest tests to use GraderKindText. |
| cmd/waza/cmd_new_test.go | Updates waza new tests to expect type: text. |
| README.md | Updates README examples and grader table entry from regex to text. |
| AGENTS.md | Updates repository overview/examples to reference text grader instead of regex. |

Comments suppressed due to low confidence (7)

internal/suggest/suggest_test.go:193

  • The text grader config in this eval YAML sample uses must_include, which isn’t a supported text grader option (it will be ignored by mapstructure and the grader will end up with zero checks, always passing). Update this example to use contains / not_contains and/or regex_match / regex_not_match so it actually exercises the text grader.

schemas/task.schema.json:222

  • validatorInline.type no longer allows keyword (removed from the enum), but the schema still contains an if/then branch for type: keyword (and a $defs/keywordGraderConfig). This makes the schema internally inconsistent and leaves dead/unreachable validation rules. Remove the keyword branch/defs (or map it to the new text grader) to keep the task schema aligned with the consolidated grader types.

site/src/content/docs/guides/eval-yaml.mdx:29

  • This eval.yaml example uses type: text but config still uses pattern. pattern is not part of the text grader config (supported keys are contains/not_contains/*_cs and regex_match/regex_not_match). Update the example to use regex_match (list) or contains so it matches the new consolidated grader. This issue also appears on line 94 of the same file.

site/src/content/docs/index.mdx:85

  • In this file the grader type is updated to text, but the config still uses pattern, which isn’t a supported field for the text grader (it will be ignored). Replace pattern with regex_match (list) and/or contains depending on what you want to validate.

site/src/content/docs/getting-started.mdx:195

  • This getting-started snippet switches to type: text but still uses pattern, which isn’t supported by the text grader (supported keys are contains/not_contains/*_cs and regex_match/regex_not_match). Update the example to use regex_match (list) or contains so it matches the implementation.

site/src/content/docs/guides/eval-yaml.mdx:112

  • In the “Graders Section” example, type: text is paired with pattern and later keywords/must_include_all. None of those fields are part of the new text grader config (and will be ignored), so the docs won’t produce the expected validation behavior. Update these examples to use contains/not_contains and/or regex_match/regex_not_match.

docs/GETTING-STARTED.md:191

  • This docs example updates the grader to type: text but keeps pattern, which the text grader doesn’t recognize (so the check would be skipped). Replace pattern with regex_match (array) or contains / not_contains depending on intent.
  - type: text
    name: explains_concepts
    config:
      pattern: "(?i)(function|variable|parameter|return|logic)"
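
The suppressed comments above share one failure mode: an unsupported key such as pattern is silently dropped, so the grader ends up with zero checks and always passes. A small lint step could catch this before a run. The Go sketch below is only an illustration under stated assumptions: `UnknownKeys` and `HasChecks` are hypothetical helpers, not part of waza, and `contains_cs` / `not_contains_cs` are an assumed expansion of the `*_cs` keys mentioned in the review.

```go
package main

import (
	"fmt"
	"sort"
)

// supportedKeys lists the text grader config keys named in the review.
// contains_cs and not_contains_cs are an assumed expansion of "*_cs".
var supportedKeys = map[string]bool{
	"contains":        true,
	"not_contains":    true,
	"contains_cs":     true,
	"not_contains_cs": true,
	"regex_match":     true,
	"regex_not_match": true,
}

// UnknownKeys returns, in sorted order, the config keys that the text
// grader would not recognize.
func UnknownKeys(config map[string]any) []string {
	var unknown []string
	for k := range config {
		if !supportedKeys[k] {
			unknown = append(unknown, k)
		}
	}
	sort.Strings(unknown)
	return unknown
}

// HasChecks reports whether at least one supported key is present.
// A config without any would make the grader pass unconditionally.
func HasChecks(config map[string]any) bool {
	for k := range config {
		if supportedKeys[k] {
			return true
		}
	}
	return false
}

func main() {
	// Mirrors the stale docs snippet: type: text paired with the old key.
	cfg := map[string]any{
		"pattern": "(?i)(function|variable|parameter|return|logic)",
	}
	fmt.Println("unknown keys:", UnknownKeys(cfg))
	fmt.Println("has checks:", HasChecks(cfg))
}
```

If the real decoder is mapstructure, as the first comment suggests, its strict-decoding option may achieve the same effect; that is worth confirming against the library's documentation rather than taking from this sketch.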
  

@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@richardpark-msft richardpark-msft merged commit 0cdf5ec into microsoft:main Mar 3, 2026
6 checks passed
@richardpark-msft richardpark-msft deleted the wz-update-graders branch March 3, 2026 02:03
spboyer added a commit that referenced this pull request Mar 3, 2026
* feat: implement trigger accuracy metrics (#41)

* feat: implement behavior quality metrics (#42)

* feat: implement waza compare command (#27)

* feat: implement parallel task execution (#45)
spboyer pushed a commit that referenced this pull request Mar 3, 2026
#41)

Consolidating the keyword and regex graders into a single text grader.
3 participants