Consolidating the keyword and regex graders into a single text grader. by richardpark-msft · Pull Request #41 · microsoft/waza

richardpark-msft · 2026-03-03T01:18:44Z

Consolidating similar graders to reduce the number of top-level graders people have to figure out.

Overall, this should be a bit easier for people - instead of having to think about which grader to use, it's more about which text matching type they want to use, with a single grader to handle them all.

Copilot

Pull request overview

This PR consolidates the existing regex and keyword graders into a single text grader, and updates schemas, tests, examples, and documentation to reflect the new unified configuration surface.

Changes:

Introduces a new text grader implementation (substring + regex checks) and removes the standalone regex/keyword grader implementations.
Updates Go tests, web E2E fixtures, and example eval/task YAML to use type: text and regex_match / regex_not_match (and related substring keys).
Updates JSON schemas and documentation to reflect the new grader type and configuration keys.
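
As a rough illustration of what "substring + regex checks" in a single grader can look like, here is a minimal Go sketch. It is not the code from internal/graders/text_grader.go: the `TextChecks` struct and `Grade` function are hypothetical names, the fields simply mirror the config keys named in this PR, and case-insensitive substring matching is an assumption (the `*_cs` variants are left out).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// TextChecks is a hypothetical config shape. The field names mirror the
// contains / not_contains / regex_match / regex_not_match keys from this PR,
// but the real waza struct may differ.
type TextChecks struct {
	Contains      []string // substrings that must appear (case-insensitive here, by assumption)
	NotContains   []string // substrings that must not appear
	RegexMatch    []string // patterns that must match
	RegexNotMatch []string // patterns that must not match
}

// Grade runs every configured check against output and returns one message
// per failed check. An empty slice means all checks passed.
func Grade(output string, c TextChecks) ([]string, error) {
	var failures []string
	lower := strings.ToLower(output)

	for _, s := range c.Contains {
		if !strings.Contains(lower, strings.ToLower(s)) {
			failures = append(failures, fmt.Sprintf("missing substring %q", s))
		}
	}
	for _, s := range c.NotContains {
		if strings.Contains(lower, strings.ToLower(s)) {
			failures = append(failures, fmt.Sprintf("forbidden substring %q", s))
		}
	}
	for _, p := range c.RegexMatch {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid regex_match pattern %q: %w", p, err)
		}
		if !re.MatchString(output) {
			failures = append(failures, fmt.Sprintf("pattern %q did not match", p))
		}
	}
	for _, p := range c.RegexNotMatch {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid regex_not_match pattern %q: %w", p, err)
		}
		if re.MatchString(output) {
			failures = append(failures, fmt.Sprintf("pattern %q matched but must not", p))
		}
	}
	return failures, nil
}

func main() {
	checks := TextChecks{
		Contains:      []string{"function"},
		RegexNotMatch: []string{`(?i)password`},
	}
	failures, err := Grade("This function returns the parsed value.", checks)
	fmt.Println(failures, err)
}
```

The point of the sketch is the shape of the configuration: one grader type, with the matching style chosen per key rather than per grader.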

Reviewed changes

Copilot reviewed 55 out of 55 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
web/e2e/run-detail.spec.ts	Updates E2E assertions to expect the new `text` grader badge.
web/e2e/fixtures/mock-data.ts	Updates mocked run detail grader types from `regex` to `text`.
skills/waza/SKILL.md	Updates skill example task grader type from `regex` to `text`.
skills/waza-interactive/tests/eval.yaml	Updates interactive skill eval validators to use `type: text` with `regex_match`.
site/src/content/docs/reference/schema.mdx	Updates schema reference examples toward `text` grader usage.
site/src/content/docs/index.mdx	Updates landing-page YAML snippet to `type: text`.
site/src/content/docs/guides/graders.mdx	Replaces `Regex`/`Keyword` sections with a consolidated `Text` grader section and updates examples.
site/src/content/docs/guides/eval-yaml.mdx	Updates eval.yaml guide examples to use `type: text`.
site/src/content/docs/getting-started.mdx	Updates getting-started examples to use `type: text`.
site/src/content/docs/about.mdx	Updates about-page YAML snippet to show `type: text`.
schemas/task.schema.json	Replaces `regex` with `text` in the task schema grader type enum and config refs; adds `textGraderConfig`.
schemas/eval.schema.json	Replaces `regex`/`keyword` with `text` in the eval schema and adds `textGraderConfig`.
internal/webapi/additional_test.go	Updates web API test data to use `GraderKindText`.
internal/transcript/transcript_test.go	Updates transcript tests to expect `GraderKindText`.
internal/suggest/suggest_test.go	Updates suggestion-related tests to use `text` grader types in prompts/parsing.
internal/suggest/suggest.go	Keeps suggestion validation flow intact while consuming updated grader kinds.
internal/suggest/prompt.go	Updates embedded prompt/schema examples and grader-type lists to use `text`.
internal/suggest/grader_docs.go	Updates grader summaries to describe the unified `text` grader.
internal/session/session_test.go	Updates session event rendering tests to use `text` grader identifiers/types.
internal/scaffold/scaffold_test.go	Updates scaffold tests to expect `type: text` in generated eval.yaml.
internal/scaffold/scaffold.go	Updates scaffolded eval.yaml template to use `type: text` and `regex_match`.
internal/reporting/junit_test.go	Updates JUnit formatting tests to reflect new grader types/names.
internal/orchestration/runner_orchestration_test.go	Updates orchestration tests and inline YAML snippets to use `text` + `regex_match`.
internal/models/spec_test.go	Updates spec parsing/weight tests to use `type: text` with `regex_match`.
internal/models/outcome.go	Replaces `GraderKindRegex`/`GraderKindKeyword` constants with `GraderKindText`.
internal/graders/text_grader_test.go	Adds comprehensive unit tests for the new `TextGrader`.
internal/graders/text_grader.go	Adds new `TextGrader` implementation (contains/not_contains + regex match/not-match).
internal/graders/regex_grader_test.go	Removes old regex grader tests.
internal/graders/regex_grader.go	Removes old regex grader implementation.
internal/graders/keyword_grader_test.go	Removes old keyword grader tests.
internal/graders/keyword_grader.go	Removes old keyword grader implementation.
internal/graders/grader.go	Updates factory to construct `TextGrader` instead of `RegexGrader`/`KeywordGrader`.
internal/cache/cache_test.go	Updates cache tests to use `GraderKindText`.
examples/grader-showcase/tasks/skill-invocation-example.yaml	Updates example to use `type: text` with `regex_match`.
examples/grader-showcase/tasks/regex-task.yaml	Renames regex demo task to text demo and updates grader type/config keys.
examples/grader-showcase/eval.yaml	Updates global safety grader to `type: text` with `regex_not_match`.
examples/grader-showcase/README.md	Updates grader showcase docs/examples toward `text`.
examples/code-explainer/eval.yaml	Updates example eval to use `type: text` with `regex_not_match`.
docs/research/waza-msbench-integration-design.md	Updates research doc references to `type: text`.
docs/research/waza-eval-registry-design.md	Updates research doc YAML examples to `type: text` and `regex_*` keys.
docs/graders/text.md	Adds new standalone docs page for the `text` grader.
docs/graders/regex.md	Removes old regex grader docs page.
docs/graders/keyword.md	Removes old keyword grader docs page.
docs/graders/file.md	Updates file grader docs examples (content pattern key names).
docs/graders/README.md	Updates grader reference examples toward `text`.
docs/TUTORIAL.md	Updates tutorial snippets from `regex` to `text` with `regex_*` keys.
docs/SKILLS_CI_INTEGRATION.md	Updates CI integration docs to use `type: text` and `regex_match`.
docs/INTEGRATION-TESTING.md	Updates integration testing docs to use `type: text` and `regex_match`.
docs/GUIDE.md	Updates guide snippets referencing `regex` to `text`.
docs/GETTING-STARTED.md	Updates getting-started doc snippet to use `type: text`.
cmd/waza/reporter_test.go	Updates CLI reporter tests to use `GraderKindText`.
cmd/waza/cmd_run_suggest_test.go	Updates run-suggest tests to use `GraderKindText`.
cmd/waza/cmd_new_test.go	Updates `waza new` tests to expect `type: text`.
README.md	Updates README examples and grader table entry from `regex` to `text`.
AGENTS.md	Updates repository overview/examples to reference `text` grader instead of `regex`.

Comments suppressed due to low confidence (7)

internal/suggest/suggest_test.go:193

The text grader config in this eval YAML sample uses must_include, which isn’t a supported text grader option (it will be ignored by mapstructure and the grader will end up with zero checks, always passing). Update this example to use contains / not_contains and/or regex_match / regex_not_match so it actually exercises the text grader.
schemas/task.schema.json:222
validatorInline.type no longer allows keyword (removed from the enum), but the schema still contains an if/then branch for type: keyword (and a $defs/keywordGraderConfig). This makes the schema internally inconsistent and leaves dead/unreachable validation rules. Remove the keyword branch/defs (or map it to the new text grader) to keep the task schema aligned with the consolidated grader types.
site/src/content/docs/guides/eval-yaml.mdx:29
This eval.yaml example uses type: text but config still uses pattern. pattern is not part of the text grader config (supported keys are contains/not_contains/*_cs and regex_match/regex_not_match). Update the example to use regex_match (list) or contains so it matches the new consolidated grader.

This issue also appears on line 94 of the same file.
site/src/content/docs/index.mdx:85

In this file the grader type is updated to text, but the config still uses pattern, which isn’t a supported field for the text grader (it will be ignored). Replace pattern with regex_match (list) and/or contains depending on what you want to validate.
site/src/content/docs/getting-started.mdx:195
This getting-started snippet switches to type: text but still uses pattern, which isn’t supported by the text grader (supported keys are contains/not_contains/*_cs and regex_match/regex_not_match). Update the example to use regex_match (list) or contains so it matches the implementation.
site/src/content/docs/guides/eval-yaml.mdx:112
In the “Graders Section” example, type: text is paired with pattern and later keywords/must_include_all. None of those fields are part of the new text grader config (and will be ignored), so the docs won’t produce the expected validation behavior. Update these examples to use contains/not_contains and/or regex_match/regex_not_match.
docs/GETTING-STARTED.md:191
This docs example updates the grader to type: text but keeps pattern, which the text grader doesn’t recognize (so the check would be skipped). Replace pattern with regex_match (array) or contains / not_contains depending on intent.

  - type: text
    name: explains_concepts
    config:
      pattern: "(?i)(function|variable|parameter|return|logic)"
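
Several of the comments above describe the same failure mode: a legacy key such as pattern is silently dropped during decoding, the grader ends up with zero checks, and it passes vacuously. The Go sketch below reproduces that behavior under stated assumptions. It uses the standard library's encoding/json as a stand-in for mapstructure (both ignore unknown keys by default), and `TextConfig`, `DecodeLenient`, and `DecodeStrict` are hypothetical names, not waza's API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// TextConfig is a hypothetical, trimmed-down config with two of the
// supported keys. Note that there is no "pattern" field.
type TextConfig struct {
	Contains   []string `json:"contains"`
	RegexMatch []string `json:"regex_match"`
}

// CheckCount reports how many checks the config actually defines.
func (c TextConfig) CheckCount() int {
	return len(c.Contains) + len(c.RegexMatch)
}

// DecodeLenient mimics default decoding: unknown keys are ignored silently.
func DecodeLenient(raw []byte) (TextConfig, error) {
	var c TextConfig
	err := json.Unmarshal(raw, &c)
	return c, err
}

// DecodeStrict rejects unknown keys and configs that define no checks.
func DecodeStrict(raw []byte) (TextConfig, error) {
	var c TextConfig
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&c); err != nil {
		return TextConfig{}, err
	}
	if c.CheckCount() == 0 {
		return TextConfig{}, errors.New("text grader config defines no checks")
	}
	return c, nil
}

func main() {
	// Same shape as the flagged docs snippet: type text, but the old key.
	legacy := []byte(`{"pattern": "(?i)(function|variable)"}`)

	c, _ := DecodeLenient(legacy)
	fmt.Println("lenient check count:", c.CheckCount()) // prints "lenient check count: 0"

	_, err := DecodeStrict(legacy)
	fmt.Println("strict decode error:", err)
}
```

With lenient decoding, the legacy config yields zero checks, so a grader that loops over its checks has nothing to fail on. Strict decoding, or a simple "at least one check" validation, surfaces the stale key instead. mapstructure offers a comparable strictness switch through its decoder configuration (`ErrorUnused`); whether waza enables it is not stated in this PR.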

site/src/content/docs/guides/graders.mdx

docs/GUIDE.md

docs/research/waza-msbench-integration-design.md

site/src/content/docs/guides/graders.mdx

docs/graders/README.md

docs/graders/file.md

examples/grader-showcase/README.md

site/src/content/docs/about.mdx

cmd/waza/cmd_run_suggest_test.go

site/src/content/docs/reference/schema.mdx

codecov-commenter · 2026-03-03T01:38:41Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️


Consolidating the keyword and regex graders into a single text grader.

7906f7c

Overall, this should be a bit easier for people - instead of having to think about which grader to use, it's more about which text matching type they want to use, with a single grader to handle them all.

richardpark-msft requested review from chlowell and spboyer as code owners March 3, 2026 01:18

richardpark-msft requested review from Copilot and removed request for chlowell and spboyer March 3, 2026 01:18

Copilot started reviewing on behalf of richardpark-msft March 3, 2026 01:20

gofmt

f0dbcfb

Copilot AI reviewed Mar 3, 2026


Richard Park added 9 commits March 3, 2026 01:41

Fixing map to use the real attribute - it looks like it's not a material part of the test.

4759836

Fixing ToC for the grader reference.

e232a7e

Went too far with search and replace!

062c4cd

regex_not_match

2e2a0af

Some more overzealous changes.

ba86d8a

Yet another spot where we were referencing keyword/regex.

29469d9

Fixing graders ToC

022900a

Some more overzealous changes.

55067f3

Missed another spot using this key

f80ad0f

richardpark-msft merged commit 0cdf5ec into microsoft:main Mar 3, 2026
6 checks passed

richardpark-msft deleted the wz-update-graders branch March 3, 2026 02:03

spboyer pushed a commit that referenced this pull request Mar 3, 2026

Consolidating the keyword and regex graders into a single text grader. (#41)

33c7bc7

richardpark-msft merged 11 commits into microsoft:main from richardpark-msft:wz-update-graders
