Skip to content

feat: Add trigger heuristic grader#90

Open
spboyer wants to merge 1 commit intomicrosoft:mainfrom
spboyer:squad/80-trigger-heuristic-grader
Open

feat: Add trigger heuristic grader#90
spboyer wants to merge 1 commit intomicrosoft:mainfrom
spboyer:squad/80-trigger-heuristic-grader

Conversation

@spboyer
Copy link
Member

@spboyer spboyer commented Mar 5, 2026

Closes #80

@spboyer spboyer requested a review from chlowell as a code owner March 5, 2026 01:35
Copilot AI review requested due to automatic review settings March 5, 2026 01:35
@spboyer spboyer self-assigned this Mar 5, 2026
@github-actions github-actions bot enabled auto-merge (squash) March 5, 2026 01:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new trigger heuristic grader type to validate whether a prompt should (or should not) activate a skill, closing #80.

Changes:

  • Introduces trigger grader implementation + unit tests.
  • Wires trigger into grader type registries, creation factory, and JSON schemas.
  • Adds end-user documentation for configuring and interpreting trigger grading output.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
schemas/task.schema.json Adds trigger grader type and triggerGraderConfig definition for task schema validation.
schemas/eval.schema.json Adds trigger grader type and documented triggerGraderConfig definition for eval schema validation.
internal/suggest/suggest.go Includes trigger in the set of grader types surfaced by suggest.
internal/suggest/grader_docs.go Adds a short description for the trigger grader in suggest docs.
internal/models/outcome.go Defines GraderKindTrigger constant for the new grader kind.
internal/graders/trigger_grader.go Implements the trigger heuristic grader (keyword + phrase scoring).
internal/graders/trigger_grader_test.go Adds tests for constructor validation, mode behavior, thresholds, and factory creation.
internal/graders/grader.go Extends Create factory to construct the trigger grader from config.
docs/graders/trigger.md Documents trigger grader configuration, scoring behavior, and output details.
docs/graders/README.md Links the new trigger grader docs from the graders index.
.squad/log/2026-03-05T00-36-issue-assignment-pipeline.md Adds session log entry (process documentation).
.squad/log/2026-03-05T00-26-rusty-token-diff-design.md Adds session log entry (process documentation).
.squad/decisions.md Records decision notes (process documentation).
.squad/agents/linus/history.md Records learnings summary for #80 (process documentation).
Comments suppressed due to low confidence (4)

internal/graders/trigger_grader.go:163

  • bestPhraseScore re-tokenizes the prompt even though scorePrompt already computed promptTokens. Consider passing promptTokens (and/or a precomputed token set + lowercased prompt) into bestPhraseScore to avoid duplicate tokenization and allocation on every grade call.
func (g *triggerHeuristicGrader) scorePrompt(prompt string) (score float64, phraseScore float64, matched []string) {
	promptTokens := tokenize(prompt)
	if len(promptTokens) == 0 {
		return 0, 0, nil
	}

	seen := make(map[string]bool)
	for _, token := range promptTokens {
		if _, ok := g.keywords[token]; ok && !seen[token] {
			seen[token] = true
			matched = append(matched, token)
		}
	}

	tokenScore := float64(len(matched)) / float64(len(promptTokens))
	phraseScore = g.bestPhraseScore(prompt)
	if phraseScore > tokenScore {
		return phraseScore, phraseScore, matched
	}
	return tokenScore, phraseScore, matched
}

func (g *triggerHeuristicGrader) bestPhraseScore(prompt string) float64 {
	if len(g.triggerPhrases) == 0 {
		return 0
	}

	promptLower := strings.ToLower(prompt)
	promptTokenSet := make(map[string]struct{})
	for _, token := range tokenize(prompt) {
		promptTokenSet[token] = struct{}{}
	}

internal/graders/trigger_grader.go:203

  • The error messages hardcode SKILL.md, but skillPath may be any filename/path (even though docs recommend SKILL.md). Consider changing these messages to reference “skill file” (or include the provided path without assuming the basename) to avoid misleading users during debugging.
func loadTriggerHeuristicData(skillPath string) (map[string]struct{}, []string, error) {
	data, err := os.ReadFile(skillPath)
	if err != nil {
		return nil, nil, fmt.Errorf("reading SKILL.md %s: %w", skillPath, err)
	}

	var sk skill.Skill
	if err := sk.UnmarshalText(data); err != nil {
		return nil, nil, fmt.Errorf("parsing SKILL.md %s: %w", skillPath, err)
	}

schemas/task.schema.json:868

  • triggerGraderConfig in task.schema.json is missing the description fields that were added in eval.schema.json (for the object and its properties). Adding the same descriptions here would keep the schemas consistent and improve generated docs / editor intellisense for task configs.
    "triggerGraderConfig": {
      "type": "object",
      "required": [
        "skill_path",
        "mode"
      ],
      "additionalProperties": false,
      "properties": {
        "skill_path": {
          "type": "string",
          "minLength": 1
        },
        "mode": {
          "type": "string",
          "enum": [
            "positive",
            "negative"
          ]
        },
        "threshold": {
          "type": "number",
          "minimum": 0,
          "maximum": 1,
          "default": 0.6
        }
      }
    },

internal/graders/trigger_grader_test.go:141

  • Prefer require.GreaterOrEqual(t, result.Score, threshold) for clearer failure output and consistency with other assertions in this test file.
	require.True(t, result.Score >= threshold)
	require.True(t, result.Passed)

@chlowell
Copy link
Member

chlowell commented Mar 5, 2026

I like the consistency of having everything be a grader instead of having graders and trigger tests defined separately as we do today (for example). Do you want to remove the existing trigger test feature in this PR?

Copy link
Member Author

@spboyer spboyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rusty (Opus 4.6) — ⚠️ Blocked: Merge Conflict

The trigger heuristic grader implementation is solid architecturally — clean grader pattern, constructor validation, 4 test functions, \docs/graders/trigger.md, schema updates. I want to approve this.

Blocker: Merge conflict. GitHub reports \mergeable: CONFLICTING. Only 1 CI check (CLA) ran — the full Go CI matrix didn't execute.

Action needed:

  1. Rebase onto main to resolve the merge conflict (likely in \internal/models/outcome.go\ where fields overlap with PR #91)
  2. Verify all 7 CI checks pass
  3. Push the rebased branch

Once CI is green, I'll approve immediately.

@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 7d59e0e to 671e8a9 Compare March 5, 2026 15:18
@codecov-commenter
Copy link

codecov-commenter commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 77.70701% with 35 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@a75477e). Learn more about missing BASE report.

Files with missing lines Patch % Lines
internal/graders/trigger_grader.go 78.14% 22 Missing and 11 partials ⚠️
internal/graders/grader.go 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #90   +/-   ##
=======================================
  Coverage        ?   72.27%           
=======================================
  Files           ?      129           
  Lines           ?    14409           
  Branches        ?        0           
=======================================
  Hits            ?    10414           
  Misses          ?     3219           
  Partials        ?      776           
Flag Coverage Δ
go-implementation 72.27% <77.70%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 671e8a9 to 9fec0b1 Compare March 5, 2026 17:12
Copilot AI review requested due to automatic review settings March 5, 2026 17:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the squad/80-trigger-heuristic-grader branch from 9fec0b1 to 3f422a9 Compare March 5, 2026 17:46
chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add trigger heuristic grader

4 participants