Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new trigger heuristic grader type to validate whether a prompt should (or should not) activate a skill, closing #80.
Changes:
- Introduces
triggergrader implementation + unit tests. - Wires
triggerinto grader type registries, creation factory, and JSON schemas. - Adds end-user documentation for configuring and interpreting trigger grading output.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| schemas/task.schema.json | Adds trigger grader type and triggerGraderConfig definition for task schema validation. |
| schemas/eval.schema.json | Adds trigger grader type and documented triggerGraderConfig definition for eval schema validation. |
| internal/suggest/suggest.go | Includes trigger in the set of grader types surfaced by suggest. |
| internal/suggest/grader_docs.go | Adds a short description for the trigger grader in suggest docs. |
| internal/models/outcome.go | Defines GraderKindTrigger constant for the new grader kind. |
| internal/graders/trigger_grader.go | Implements the trigger heuristic grader (keyword + phrase scoring). |
| internal/graders/trigger_grader_test.go | Adds tests for constructor validation, mode behavior, thresholds, and factory creation. |
| internal/graders/grader.go | Extends Create factory to construct the trigger grader from config. |
| docs/graders/trigger.md | Documents trigger grader configuration, scoring behavior, and output details. |
| docs/graders/README.md | Links the new trigger grader docs from the graders index. |
| .squad/log/2026-03-05T00-36-issue-assignment-pipeline.md | Adds session log entry (process documentation). |
| .squad/log/2026-03-05T00-26-rusty-token-diff-design.md | Adds session log entry (process documentation). |
| .squad/decisions.md | Records decision notes (process documentation). |
| .squad/agents/linus/history.md | Records learnings summary for #80 (process documentation). |
Comments suppressed due to low confidence (4)
internal/graders/trigger_grader.go:163
bestPhraseScorere-tokenizes the prompt even thoughscorePromptalready computedpromptTokens. Consider passingpromptTokens(and/or a precomputed token set + lowercased prompt) intobestPhraseScoreto avoid duplicate tokenization and allocation on every grade call.
func (g *triggerHeuristicGrader) scorePrompt(prompt string) (score float64, phraseScore float64, matched []string) {
promptTokens := tokenize(prompt)
if len(promptTokens) == 0 {
return 0, 0, nil
}
seen := make(map[string]bool)
for _, token := range promptTokens {
if _, ok := g.keywords[token]; ok && !seen[token] {
seen[token] = true
matched = append(matched, token)
}
}
tokenScore := float64(len(matched)) / float64(len(promptTokens))
phraseScore = g.bestPhraseScore(prompt)
if phraseScore > tokenScore {
return phraseScore, phraseScore, matched
}
return tokenScore, phraseScore, matched
}
func (g *triggerHeuristicGrader) bestPhraseScore(prompt string) float64 {
if len(g.triggerPhrases) == 0 {
return 0
}
promptLower := strings.ToLower(prompt)
promptTokenSet := make(map[string]struct{})
for _, token := range tokenize(prompt) {
promptTokenSet[token] = struct{}{}
}
internal/graders/trigger_grader.go:203
- The error messages hardcode
SKILL.md, butskillPathmay be any filename/path (even though docs recommend SKILL.md). Consider changing these messages to reference “skill file” (or include the provided path without assuming the basename) to avoid misleading users during debugging.
func loadTriggerHeuristicData(skillPath string) (map[string]struct{}, []string, error) {
data, err := os.ReadFile(skillPath)
if err != nil {
return nil, nil, fmt.Errorf("reading SKILL.md %s: %w", skillPath, err)
}
var sk skill.Skill
if err := sk.UnmarshalText(data); err != nil {
return nil, nil, fmt.Errorf("parsing SKILL.md %s: %w", skillPath, err)
}
schemas/task.schema.json:868
triggerGraderConfigintask.schema.jsonis missing thedescriptionfields that were added ineval.schema.json(for the object and its properties). Adding the same descriptions here would keep the schemas consistent and improve generated docs / editor intellisense for task configs.
"triggerGraderConfig": {
"type": "object",
"required": [
"skill_path",
"mode"
],
"additionalProperties": false,
"properties": {
"skill_path": {
"type": "string",
"minLength": 1
},
"mode": {
"type": "string",
"enum": [
"positive",
"negative"
]
},
"threshold": {
"type": "number",
"minimum": 0,
"maximum": 1,
"default": 0.6
}
}
},
internal/graders/trigger_grader_test.go:141
- Prefer
require.GreaterOrEqual(t, result.Score, threshold)for clearer failure output and consistency with other assertions in this test file.
require.True(t, result.Score >= threshold)
require.True(t, result.Passed)
|
I like the consistency of having everything be a grader instead of having graders and trigger tests defined separately as we do today (for example). Do you want to remove the existing trigger test feature in this PR? |
spboyer
left a comment
There was a problem hiding this comment.
Rusty (Opus 4.6) —
The trigger heuristic grader implementation is solid architecturally — clean grader pattern, constructor validation, 4 test functions, \docs/graders/trigger.md, schema updates. I want to approve this.
Blocker: Merge conflict. GitHub reports \mergeable: CONFLICTING. Only 1 CI check (CLA) ran — the full Go CI matrix didn't execute.
Action needed:
- Rebase onto main to resolve the merge conflict (likely in \internal/models/outcome.go\ where fields overlap with PR #91)
- Verify all 7 CI checks pass
- Push the rebased branch
Once CI is green, I'll approve immediately.
7d59e0e to
671e8a9
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #90 +/- ##
=======================================
Coverage ? 72.27%
=======================================
Files ? 129
Lines ? 14409
Branches ? 0
=======================================
Hits ? 10414
Misses ? 3219
Partials ? 776
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
671e8a9 to
9fec0b1
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
9fec0b1 to
3f422a9
Compare
Closes #80