Consolidating the keyword and regex graders into a single text grader. #41
Merged
richardpark-msft merged 11 commits into microsoft:main on Mar 3, 2026
Overall, this should be a bit easier for people: instead of having to think about which grader to use, they only choose which text-matching type they want, with a single grader to handle them all.
Pull request overview
This PR consolidates the existing regex and keyword graders into a single text grader, and updates schemas, tests, examples, and documentation to reflect the new unified configuration surface.
Changes:
- Introduces a new `text` grader implementation (substring + regex checks) and removes the standalone `regex`/`keyword` grader implementations.
- Updates Go tests, web E2E fixtures, and example eval/task YAML to use `type: text` and `regex_match`/`regex_not_match` (and related substring keys).
- Updates JSON schemas and documentation to reflect the new grader type and configuration keys.
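
To make the consolidated configuration surface concrete, here is a minimal Go sketch of how a single grader might evaluate substring and regex checks together. It is not the code in `internal/graders/text_grader.go`: the `TextConfig` struct, its field names, and the `Grade` function are hypothetical, and treating `contains`/`not_contains` as case-insensitive is an assumption inferred only from the existence of the `*_cs` variants mentioned in the review.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// TextConfig is a hypothetical stand-in for the text grader's config.
// The key names follow the PR; the struct itself is an assumption.
type TextConfig struct {
	Contains      []string // assumed case-insensitive required substrings
	NotContains   []string // assumed case-insensitive forbidden substrings
	ContainsCS    []string // assumed case-sensitive variant (the "*_cs" keys)
	RegexMatch    []string // patterns that must match the output
	RegexNotMatch []string // patterns that must not match the output
}

// Grade runs every configured check against output and returns one
// message per failed check. An empty result means all checks passed.
func Grade(cfg TextConfig, output string) ([]string, error) {
	var failures []string
	lower := strings.ToLower(output)

	for _, s := range cfg.Contains {
		if !strings.Contains(lower, strings.ToLower(s)) {
			failures = append(failures, fmt.Sprintf("missing substring %q", s))
		}
	}
	for _, s := range cfg.NotContains {
		if strings.Contains(lower, strings.ToLower(s)) {
			failures = append(failures, fmt.Sprintf("forbidden substring %q", s))
		}
	}
	for _, s := range cfg.ContainsCS {
		if !strings.Contains(output, s) {
			failures = append(failures, fmt.Sprintf("missing case-sensitive substring %q", s))
		}
	}
	for _, p := range cfg.RegexMatch {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid regex_match pattern %q: %w", p, err)
		}
		if !re.MatchString(output) {
			failures = append(failures, fmt.Sprintf("pattern %q did not match", p))
		}
	}
	for _, p := range cfg.RegexNotMatch {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid regex_not_match pattern %q: %w", p, err)
		}
		if re.MatchString(output) {
			failures = append(failures, fmt.Sprintf("pattern %q matched but must not", p))
		}
	}
	return failures, nil
}

func main() {
	cfg := TextConfig{
		Contains:      []string{"function"},
		RegexMatch:    []string{`(?i)(variable|parameter)`},
		RegexNotMatch: []string{`(?i)password\s*=`},
	}
	failures, err := Grade(cfg, "This Function takes one parameter.")
	fmt.Println(failures, err)
}
```

The point of the sketch is the shape of the design: one grader type, with the matching style chosen per key rather than per grader.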
Reviewed changes
Copilot reviewed 55 out of 55 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| web/e2e/run-detail.spec.ts | Updates E2E assertions to expect the new text grader badge. |
| web/e2e/fixtures/mock-data.ts | Updates mocked run detail grader types from regex to text. |
| skills/waza/SKILL.md | Updates skill example task grader type from regex to text. |
| skills/waza-interactive/tests/eval.yaml | Updates interactive skill eval validators to use type: text with regex_match. |
| site/src/content/docs/reference/schema.mdx | Updates schema reference examples toward text grader usage. |
| site/src/content/docs/index.mdx | Updates landing-page YAML snippet to type: text. |
| site/src/content/docs/guides/graders.mdx | Replaces Regex/Keyword sections with a consolidated Text grader section and updates examples. |
| site/src/content/docs/guides/eval-yaml.mdx | Updates eval.yaml guide examples to use type: text. |
| site/src/content/docs/getting-started.mdx | Updates getting-started examples to use type: text. |
| site/src/content/docs/about.mdx | Updates about-page YAML snippet to show type: text. |
| schemas/task.schema.json | Replaces regex with text in the task schema grader type enum and config refs; adds textGraderConfig. |
| schemas/eval.schema.json | Replaces regex/keyword with text in the eval schema and adds textGraderConfig. |
| internal/webapi/additional_test.go | Updates web API test data to use GraderKindText. |
| internal/transcript/transcript_test.go | Updates transcript tests to expect GraderKindText. |
| internal/suggest/suggest_test.go | Updates suggestion-related tests to use text grader types in prompts/parsing. |
| internal/suggest/suggest.go | Keeps suggestion validation flow intact while consuming updated grader kinds. |
| internal/suggest/prompt.go | Updates embedded prompt/schema examples and grader-type lists to use text. |
| internal/suggest/grader_docs.go | Updates grader summaries to describe the unified text grader. |
| internal/session/session_test.go | Updates session event rendering tests to use text grader identifiers/types. |
| internal/scaffold/scaffold_test.go | Updates scaffold tests to expect type: text in generated eval.yaml. |
| internal/scaffold/scaffold.go | Updates scaffolded eval.yaml template to use type: text and regex_match. |
| internal/reporting/junit_test.go | Updates JUnit formatting tests to reflect new grader types/names. |
| internal/orchestration/runner_orchestration_test.go | Updates orchestration tests and inline YAML snippets to use text + regex_match. |
| internal/models/spec_test.go | Updates spec parsing/weight tests to use type: text with regex_match. |
| internal/models/outcome.go | Replaces GraderKindRegex/GraderKindKeyword constants with GraderKindText. |
| internal/graders/text_grader_test.go | Adds comprehensive unit tests for the new TextGrader. |
| internal/graders/text_grader.go | Adds new TextGrader implementation (contains/not_contains + regex match/not-match). |
| internal/graders/regex_grader_test.go | Removes old regex grader tests. |
| internal/graders/regex_grader.go | Removes old regex grader implementation. |
| internal/graders/keyword_grader_test.go | Removes old keyword grader tests. |
| internal/graders/keyword_grader.go | Removes old keyword grader implementation. |
| internal/graders/grader.go | Updates factory to construct TextGrader instead of RegexGrader/KeywordGrader. |
| internal/cache/cache_test.go | Updates cache tests to use GraderKindText. |
| examples/grader-showcase/tasks/skill-invocation-example.yaml | Updates example to use type: text with regex_match. |
| examples/grader-showcase/tasks/regex-task.yaml | Renames regex demo task to text demo and updates grader type/config keys. |
| examples/grader-showcase/eval.yaml | Updates global safety grader to type: text with regex_not_match. |
| examples/grader-showcase/README.md | Updates grader showcase docs/examples toward text. |
| examples/code-explainer/eval.yaml | Updates example eval to use type: text with regex_not_match. |
| docs/research/waza-msbench-integration-design.md | Updates research doc references to type: text. |
| docs/research/waza-eval-registry-design.md | Updates research doc YAML examples to type: text and regex_* keys. |
| docs/graders/text.md | Adds new standalone docs page for the text grader. |
| docs/graders/regex.md | Removes old regex grader docs page. |
| docs/graders/keyword.md | Removes old keyword grader docs page. |
| docs/graders/file.md | Updates file grader docs examples (content pattern key names). |
| docs/graders/README.md | Updates grader reference examples toward text. |
| docs/TUTORIAL.md | Updates tutorial snippets from regex to text with regex_* keys. |
| docs/SKILLS_CI_INTEGRATION.md | Updates CI integration docs to use type: text and regex_match. |
| docs/INTEGRATION-TESTING.md | Updates integration testing docs to use type: text and regex_match. |
| docs/GUIDE.md | Updates guide snippets referencing regex to text. |
| docs/GETTING-STARTED.md | Updates getting-started doc snippet to use type: text. |
| cmd/waza/reporter_test.go | Updates CLI reporter tests to use GraderKindText. |
| cmd/waza/cmd_run_suggest_test.go | Updates run-suggest tests to use GraderKindText. |
| cmd/waza/cmd_new_test.go | Updates waza new tests to expect type: text. |
| README.md | Updates README examples and grader table entry from regex to text. |
| AGENTS.md | Updates repository overview/examples to reference text grader instead of regex. |
Comments suppressed due to low confidence (7)
internal/suggest/suggest_test.go:193
- The text grader config in this eval YAML sample uses `must_include`, which isn’t a supported `text` grader option (it will be ignored by mapstructure and the grader will end up with zero checks, always passing). Update this example to use `contains`/`not_contains` and/or `regex_match`/`regex_not_match` so it actually exercises the text grader.

schemas/task.schema.json:222
- `validatorInline.type` no longer allows `keyword` (removed from the enum), but the schema still contains an `if`/`then` branch for `type: keyword` (and a `$defs/keywordGraderConfig`). This makes the schema internally inconsistent and leaves dead/unreachable validation rules. Remove the keyword branch/defs (or map it to the new `text` grader) to keep the task schema aligned with the consolidated grader types.

site/src/content/docs/guides/eval-yaml.mdx:29
- This eval.yaml example uses `type: text` but config still uses `pattern`. `pattern` is not part of the text grader config (supported keys are `contains`/`not_contains`/`*_cs` and `regex_match`/`regex_not_match`). Update the example to use `regex_match` (list) or `contains` so it matches the new consolidated grader. This issue also appears on line 94 of the same file.

site/src/content/docs/index.mdx:85
- In this file the grader `type` is updated to `text`, but the config still uses `pattern`, which isn’t a supported field for the text grader (it will be ignored). Replace `pattern` with `regex_match` (list) and/or `contains` depending on what you want to validate.

site/src/content/docs/getting-started.mdx:195
- This getting-started snippet switches to `type: text` but still uses `pattern`, which isn’t supported by the text grader (supported keys are `contains`/`not_contains`/`*_cs` and `regex_match`/`regex_not_match`). Update the example to use `regex_match` (list) or `contains` so it matches the implementation.

site/src/content/docs/guides/eval-yaml.mdx:112
- In the “Graders Section” example, `type: text` is paired with `pattern` and later `keywords`/`must_include_all`. None of those fields are part of the new text grader config (and will be ignored), so the docs won’t produce the expected validation behavior. Update these examples to use `contains`/`not_contains` and/or `regex_match`/`regex_not_match`.

docs/GETTING-STARTED.md:191
- This docs example updates the grader to `type: text` but keeps `pattern`, which the text grader doesn’t recognize (so the check would be skipped). Replace `pattern` with `regex_match` (array) or `contains`/`not_contains` depending on intent.
- type: text
  name: explains_concepts
  config:
    pattern: "(?i)(function|variable|parameter|return|logic)"
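
The failure mode these comments describe (an unrecognized key is silently dropped, leaving a grader with zero checks that always passes) can be illustrated with a small Go sketch. This is an assumption-laden stand-in rather than the project's decoding path: the review mentions mapstructure, whereas the sketch uses the standard library's `encoding/json`, which also ignores unknown fields by default. The `textConfig` type and the helper functions are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// textConfig is a hypothetical, trimmed-down config holding only the regex keys.
type textConfig struct {
	RegexMatch    []string `json:"regex_match"`
	RegexNotMatch []string `json:"regex_not_match"`
}

// checkCount reports how many checks a decoded config would actually run.
func checkCount(c textConfig) int {
	return len(c.RegexMatch) + len(c.RegexNotMatch)
}

// decodeLenient silently ignores unknown keys (encoding/json's default).
func decodeLenient(raw []byte) (textConfig, error) {
	var c textConfig
	err := json.Unmarshal(raw, &c)
	return c, err
}

// decodeStrict rejects unknown keys, so a stale `pattern` key fails loudly.
func decodeStrict(raw []byte) (textConfig, error) {
	var c textConfig
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&c)
	return c, err
}

func main() {
	stale := []byte(`{"pattern": "(?i)(function|variable)"}`)
	fixed := []byte(`{"regex_match": ["(?i)(function|variable)"]}`)

	c, _ := decodeLenient(stale)
	fmt.Println("stale config checks:", checkCount(c))

	c, _ = decodeLenient(fixed)
	fmt.Println("fixed config checks:", checkCount(c))

	_, err := decodeStrict(stale)
	fmt.Println("strict decode rejected stale config:", err != nil)
}
```

With lenient decoding the stale `pattern` config yields no checks at all, which is why the flagged examples would appear to pass without validating anything. Strict decoding is shown only as one possible way to surface such mistakes, not as something this PR does.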
added 9 commits
March 3, 2026 01:41
…ial part of the test.
spboyer added a commit that referenced this pull request on Mar 3, 2026
spboyer pushed a commit that referenced this pull request on Mar 3, 2026
#41) Consolidating the keyword and regex graders into a single text grader.
Consolidating similar graders to reduce the number of top-level graders people have to figure out.