feat: sensei scoring parity — WHEN: triggers, spec-security, Invalid level, advisory checks 16-18 by spboyer · Pull Request #79 · microsoft/waza

spboyer · 2026-03-04T23:38:26Z

Summary

Brings waza's scoring engine in line with spboyer/sensei v1.3.0. Adds 7 features across scoring and checks:

Changes

Issue	Feature	Type
Closes #72	`WHEN:` trigger pattern recognition	Scoring
Closes #73	`spec-security` check (XML tags, reserved name prefixes)	Spec compliance
Closes #74	`Invalid` score level for >1024 char descriptions	Scoring
Closes #75	Cross-model description density check (advisory 16)	Advisory
Closes #76	Body structure quality check (advisory 17)	Advisory
Closes #77	Progressive disclosure check (advisory 18)	Advisory
Closes #78	Context-dependent anti-trigger risk assessment	Scoring
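To make the `spec-security` row (#73) concrete, here is a minimal sketch of a check that flags XML-like tags in a description and reserved prefixes in a skill name. The function name `securityIssues`, the regex, and the `reservedPrefixes` list are assumptions for illustration only; the actual patterns and prefixes used by waza or sensei are not shown in this PR description.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// xmlTagRE matches text that looks like an opening or closing XML tag,
// e.g. "<admin>" or "</admin>". A bare "<" followed by a space does not match.
var xmlTagRE = regexp.MustCompile(`</?[A-Za-z][A-Za-z0-9_-]*[^<>]*>`)

// reservedPrefixes is a HYPOTHETICAL list; the real reserved prefixes
// are not given in the source.
var reservedPrefixes = []string{"system-", "internal-"}

// securityIssues returns one message per problem found.
func securityIssues(name, description string) []string {
	var issues []string
	if xmlTagRE.MatchString(description) {
		issues = append(issues, "description contains XML-like tags")
	}
	lower := strings.ToLower(name)
	for _, p := range reservedPrefixes {
		if strings.HasPrefix(lower, p) {
			issues = append(issues, fmt.Sprintf("name uses reserved prefix %q", p))
		}
	}
	return issues
}

func main() {
	fmt.Println(len(securityIssues("system-helper", "Use <admin> mode")))    // 2
	fmt.Println(len(securityIssues("pdf-tools", "Extract text from PDFs"))) // 0
}
```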

Files Changed (777 additions, 12 deletions)

internal/checks/advisory_checks.go — New: CrossModelDensityChecker, BodyStructureChecker, ProgressiveDisclosureChecker
internal/checks/advisory_checks_test.go — Tests for all 3 advisory checkers
internal/checks/score_checkers.go — Register new checkers in pipeline
internal/checks/spec_checks.go — New: SpecSecurityChecker
internal/checks/spec_checks_test.go — Tests for security checker
internal/scoring/scoring.go — WHEN: trigger, Invalid level, context-dependent anti-triggers
internal/scoring/scoring_test.go — Tests for all scoring changes

Review History

Linus (implementation) → Rusty rejected (dead code + panic risk)
Turk (fixes) → Rusty approved ✅

All tests pass. go vet clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Aligns waza’s scoring and compliance checks with sensei v1.3.0 by extending the heuristic scoring model and adding new spec/advisory checkers.

Changes:

Adds AdherenceInvalid and short-circuits scoring when description length exceeds 1024 characters; adds WHEN: to trigger detection and introduces catalog-size-based anti-trigger risk warnings.
Introduces SpecSecurityChecker and registers it in the spec checker pipeline.
Adds and registers advisory checkers for cross-model description density, body structure quality, and progressive disclosure, with corresponding tests.
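The short-circuit described in the first item can be sketched roughly as follows. Only `AdherenceInvalid`, the 1024-character limit, and the `WHEN:` marker come from this PR; the other adherence levels, the `USE FOR:` marker, the function name `score`, and the choice to count runes rather than bytes are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

type Adherence string

const (
	AdherenceInvalid Adherence = "Invalid"
	AdherenceLow     Adherence = "Low"  // hypothetical level
	AdherenceHigh    Adherence = "High" // hypothetical level
)

const maxDescriptionLen = 1024

// triggerMarkers: "WHEN:" is named in the PR; "USE FOR:" is an assumption.
var triggerMarkers = []string{"USE FOR:", "WHEN:"}

// score returns Invalid before any other heuristic runs when the
// description is too long (assumed here to be measured in runes).
func score(description string) Adherence {
	if utf8.RuneCountInString(description) > maxDescriptionLen {
		return AdherenceInvalid
	}
	upper := strings.ToUpper(description)
	for _, m := range triggerMarkers {
		if strings.Contains(upper, m) {
			return AdherenceHigh
		}
	}
	return AdherenceLow
}

func main() {
	fmt.Println(score("Parse PDFs. WHEN: the user asks to extract text."))
	fmt.Println(score(strings.Repeat("a", 1025)))
}
```

Returning early keeps an over-length description from being reported as anything other than `Invalid`, regardless of which trigger markers it contains.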

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


File	Description
internal/scoring/scoring.go	Adds `Invalid` adherence, `WHEN:` trigger detection, and context-dependent anti-trigger risk logic.
internal/scoring/scoring_test.go	Adds/updates tests for `Invalid`, `WHEN:`, and anti-trigger risk behavior.
internal/checks/spec_checks.go	Adds `SpecSecurityChecker` implementation.
internal/checks/spec_checks_test.go	Adds tests for `SpecSecurityChecker`.
internal/checks/score_checkers.go	Registers the new spec-security and advisory checkers in the pipeline.
internal/checks/advisory_checks.go	Adds advisory checkers (density, body structure, progressive disclosure).
internal/checks/advisory_checks_test.go	Adds tests for the three new advisory checkers.

Comments suppressed due to low confidence (3)

internal/checks/advisory_checks.go:476

regexp.MustCompile is called inside BodyStructureChecker.Check, meaning the regex is recompiled on every check run. Since this runs per skill, consider hoisting this regex to a package-level var (similar to other patterns in this file) to avoid repeated compilation and keep the checker cheaper to run.

This issue also appears on line 544 of the same file.

	hasCodeBlocks := strings.Contains(content, "```")
	hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)
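A minimal sketch of the hoisting suggested above: the pattern is compiled once when the package initializes and reused on every call. The variable name `numberedStepsRE` follows the name suggested in a later review comment; the wrapper function `hasNumberedSteps` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// numberedStepsRE is compiled once at package initialization instead of
// on every Check call.
var numberedStepsRE = regexp.MustCompile(`(?m)^\s*\d+\.\s+`)

// hasNumberedSteps reports whether any line starts with "N. ".
func hasNumberedSteps(content string) bool {
	return numberedStepsRE.MatchString(content)
}

func main() {
	fmt.Println(hasNumberedSteps("Intro\n1. Install\n2. Run\n")) // true
	fmt.Println(hasNumberedSteps("Version 1.2 released"))        // false
}
```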

internal/checks/advisory_checks.go:546

codeBlockPattern := regexp.MustCompile(...) is created inside ProgressiveDisclosureChecker.Check, so it recompiles for every skill. Consider making it a package-level var so the regex is compiled once and reused across checks.

	// Count large code blocks (>50 lines)
	codeBlockPattern := regexp.MustCompile("(?s)```[^`]*```")
	blocks := codeBlockPattern.FindAllString(sk.RawContent, -1)
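As an alternative to the regex, large code blocks can be counted by scanning fence lines directly. This is a sketch of a different technique, not the code in this PR; `countLargeBlocks` and `fencedBlock` are hypothetical names, and the scanner assumes every fence starts a line with three backticks.

```go
package main

import (
	"fmt"
	"strings"
)

// countLargeBlocks counts fenced code blocks whose body has more than
// threshold lines. Fence lines themselves are not counted.
func countLargeBlocks(content string, threshold int) int {
	inBlock := false
	lines := 0
	large := 0
	for _, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			if inBlock && lines > threshold {
				large++
			}
			inBlock = !inBlock
			lines = 0
			continue
		}
		if inBlock {
			lines++
		}
	}
	return large
}

// fencedBlock builds a fenced block with n body lines (demo helper).
func fencedBlock(n int) string {
	return "```go\n" + strings.Repeat("x\n", n) + "```\n"
}

func main() {
	fmt.Println(countLargeBlocks(fencedBlock(51), 50)) // 1
	fmt.Println(countLargeBlocks(fencedBlock(50), 50)) // 0
}
```

Unlike the `[^`]*` pattern, a line scanner is not confused by single backticks inside a block body.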

internal/checks/advisory_checks.go:543

ProgressiveDisclosureChecker counts lines and code blocks over sk.RawContent, which includes YAML frontmatter. The advisory definition is about the SKILL.md body, so this can over-count and incorrectly trigger warnings. Consider running these checks over sk.Body (or otherwise excluding the frontmatter block) so the thresholds apply to the body content only.

func (*ProgressiveDisclosureChecker) Check(sk skill.Skill) (*CheckResult, error) {
	lines := strings.Split(sk.RawContent, "\n")
	bodyLines := len(lines)
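If a parsed body field were not available, the frontmatter could be excluded before counting lines. The sketch below assumes frontmatter is delimited by `---` lines at the very top of the file; `bodyOnly` is a hypothetical helper, not a function from waza.

```go
package main

import (
	"fmt"
	"strings"
)

// bodyOnly returns raw without a leading YAML frontmatter block.
// If the file does not start with "---" or the block is never closed,
// raw is returned unchanged.
func bodyOnly(raw string) string {
	const delim = "---"
	lines := strings.Split(raw, "\n")
	if strings.TrimSpace(lines[0]) != delim {
		return raw
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == delim {
			return strings.Join(lines[i+1:], "\n")
		}
	}
	return raw
}

func main() {
	raw := "---\nname: demo\n---\n# Title\nBody"
	body := bodyOnly(raw)
	fmt.Printf("%q\n", body)
	fmt.Println(len(strings.Split(body, "\n"))) // 2 body lines instead of 5 raw lines
}
```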


…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Orchestration logs:
- Linus: Implementation of 7 sensei scoring parity features
- Rusty (review): Initial code review — 2 must-fix issues identified
- Turk: Fixes applied — checker registration, panic guard
- Rusty (re-review): Approval — all issues resolved

Session log:
- /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md
- Gap analysis, 7 issues created (#72-#78)
- Implementation, rejection, fix, and approval workflow
- PR #79 ready for merge

Decision merges from inbox:
- User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6
- Sensei parity code review (initial + re-review)
- Inbox files cleaned

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Orchestration log: Linus fixed all 6 PR review comments
- Session log: All threads resolved, tests passing
- Model: gpt-5.3-codex on sync execution


Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

internal/checks/advisory_checks.go:477

BodyStructureChecker recompiles the numbered-steps regexp on every Check() call via regexp.MustCompile(...). Since this checker can run across many skills, consider moving this to a package-level precompiled regexp (e.g., var numberedStepsRE = regexp.MustCompile(...)) to avoid repeated compilation overhead.

	hasCodeBlocks := strings.Contains(content, "```")
	hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)


spboyer

✅ Reviewed by Rusty — LGTM (third review). All 7 sensei parity features (#72-#78) implemented correctly. Checkers registered, panic guard in place, Invalid adherence level, WHEN: triggers, context-dependent risk. 777 additions with comprehensive test coverage. Turk's fixes resolved both must-fix issues. CI green across all checks. Ship it. Ready to merge (can't self-approve since you authored this).


Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


wbreza

Code Review: PR #79 - feat: sensei scoring parity

What Looks Good

All 8 prior Copilot review comments addressed - thorough iteration
Comprehensive test coverage - 487 lines of new tests with boundary cases
Clean architecture - all checkers follow ComplianceChecker interface
sectionForHeader() isolates sections correctly - solves trigger/anti-trigger overlap
Invalid adherence level short-circuit - well-designed
Context-dependent anti-trigger risk fires only when anti-triggers are present
skillBodyContent() prefers sk.Body over raw parsing with test verification
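The context-dependent anti-trigger risk noted above can be illustrated with a small sketch. The source says only that the risk is catalog-size-based and fires only when anti-triggers are present; the threshold of 15, the direction (larger catalog means higher risk), the level names, and the function name `antiTriggerRisk` are all assumptions.

```go
package main

import "fmt"

type Risk string

const (
	RiskNone Risk = "none"
	RiskLow  Risk = "low"
	RiskHigh Risk = "high"
)

// largeCatalogThreshold is a HYPOTHETICAL cutoff chosen for this example.
const largeCatalogThreshold = 15

// antiTriggerRisk reports no risk unless anti-triggers are present, then
// scales the risk with the number of skills in the catalog (assumed direction).
func antiTriggerRisk(hasAntiTriggers bool, catalogSize int) Risk {
	if !hasAntiTriggers {
		return RiskNone
	}
	if catalogSize >= largeCatalogThreshold {
		return RiskHigh
	}
	return RiskLow
}

func main() {
	fmt.Println(antiTriggerRisk(false, 40)) // none
	fmt.Println(antiTriggerRisk(true, 3))   // low
	fmt.Println(antiTriggerRisk(true, 40))  // high
}
```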

Suggestions (non-blocking)

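sectionForHeader() calls strings.ToUpper() repeatedly - Consider computing the uppercased string once and reusing it. Minor performance impact.
errorHandlingPatterns includes the bare word error - This is a broad match that can fire on ordinary prose. Consider heading-level patterns instead.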
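One way to read the second suggestion: match error-handling keywords only in Markdown headings rather than anywhere in the text. The sketch below is an illustration under that reading; `errorHeadingRE`, `hasErrorSection`, and the keyword list are hypothetical and not taken from `errorHandlingPatterns`.

```go
package main

import (
	"fmt"
	"regexp"
)

// errorHeadingRE matches a Markdown heading (1 to 6 '#' characters) whose
// text contains an error-handling keyword. The keyword list is an assumption.
var errorHeadingRE = regexp.MustCompile(
	`(?mi)^#{1,6}\s+.*\b(?:error handling|errors|troubleshooting)\b`)

// hasErrorSection reports whether content has a dedicated error-handling heading.
func hasErrorSection(content string) bool {
	return errorHeadingRE.MatchString(content)
}

func main() {
	fmt.Println(hasErrorSection("## Error Handling\nRetry on 429.")) // true
	fmt.Println(hasErrorSection("This never returns an error."))     // false
}
```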

Summary

Priority	Count
Critical	0
High	0
Medium	2
Low	2

Overall Assessment: Approve - solid feature PR with comprehensive testing.


Copilot AI review requested due to automatic review settings March 4, 2026 23:38

spboyer requested review from chlowell and richardpark-msft as code owners March 4, 2026 23:38

spboyer added the sensei-parity Parity with spboyer/sensei scoring label Mar 4, 2026

github-actions bot enabled auto-merge (squash) March 4, 2026 23:38

Copilot started reviewing on behalf of spboyer March 4, 2026 23:39

Copilot AI reviewed Mar 4, 2026


spboyer added a commit that referenced this pull request Mar 4, 2026

fix: address PR #79 review feedback — punctuation stripping, fence parsing, slice traversal, WHEN: count, body field, summary format

fd138c1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer added a commit that referenced this pull request Mar 4, 2026

docs(squad): log PR #79 feedback fix session

d838cc8

- Orchestration log: Linus fixed all 6 PR review comments
- Session log: All threads resolved, tests passing
- Model: gpt-5.3-codex on sync execution

spboyer added a commit that referenced this pull request Mar 4, 2026

fix: address PR #79 review feedback — punctuation stripping, fence parsing, slice traversal, WHEN: count, body field, summary format

e80ddc7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/sensei-parity branch from fd138c1 to e80ddc7 Compare March 4, 2026 23:59

spboyer added a commit that referenced this pull request Mar 5, 2026

docs(squad): log PR #79 feedback fix session

398180e

- Orchestration log: Linus fixed all 6 PR review comments
- Session log: All threads resolved, tests passing
- Model: gpt-5.3-codex on sync execution

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address PR #79 review feedback — punctuation stripping, fence parsing, slice traversal, WHEN: count, body field, summary format

e9dc6f2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 00:18

spboyer force-pushed the squad/sensei-parity branch from e80ddc7 to c30d412 Compare March 5, 2026 00:18

Copilot AI mentioned this pull request Mar 5, 2026

fix: update docs link to GitHub Pages URL #87

Open

Copilot started reviewing on behalf of spboyer March 5, 2026 00:19

Copilot AI reviewed Mar 5, 2026


spboyer commented Mar 5, 2026


spboyer force-pushed the squad/sensei-parity branch from c30d412 to e8a1573 Compare March 5, 2026 16:00

spboyer added a commit that referenced this pull request Mar 5, 2026

fix: address review feedback on PR #79

da4aa80

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 5, 2026 17:32

spboyer and others added 2 commits March 5, 2026 12:46

feat: sensei scoring parity — WHEN triggers, spec-security, Invalid level, advisory checks 16-18

d7c33dc

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: address review feedback on PR #79

ad8b723

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer force-pushed the squad/sensei-parity branch from da4aa80 to ad8b723 Compare March 5, 2026 17:46

Copilot AI reviewed Mar 5, 2026


chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026

remove python tests folder as well (microsoft#79)

ad328f8

Co-authored-by: Richard Park <ripark@microsoft.com>

wbreza previously approved these changes Mar 5, 2026


fix: tighten sensei scoring/header checks for PR79

6bceac7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

spboyer dismissed wbreza’s stale review via 6bceac7 March 5, 2026 23:07

spboyer wants to merge 3 commits into main from squad/sensei-parity

3 participants
