feat: sensei scoring parity — WHEN: triggers, spec-security, Invalid level, advisory checks 16-18#79
feat: sensei scoring parity — WHEN: triggers, spec-security, Invalid level, advisory checks 16-18#79
Conversation
There was a problem hiding this comment.
Pull request overview
Aligns waza’s scoring and compliance checks with sensei v1.3.0 by extending the heuristic scoring model and adding new spec/advisory checkers.
Changes:
- Adds
AdherenceInvalidand short-circuits scoring when description length exceeds 1024 characters; addsWHEN:to trigger detection and introduces catalog-size-based anti-trigger risk warnings. - Introduces
SpecSecurityCheckerand registers it in the spec checker pipeline. - Adds and registers advisory checkers for cross-model description density, body structure quality, and progressive disclosure, with corresponding tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/scoring/scoring.go | Adds Invalid adherence, WHEN: trigger detection, and context-dependent anti-trigger risk logic. |
| internal/scoring/scoring_test.go | Adds/updates tests for Invalid, WHEN:, and anti-trigger risk behavior. |
| internal/checks/spec_checks.go | Adds SpecSecurityChecker implementation. |
| internal/checks/spec_checks_test.go | Adds tests for SpecSecurityChecker. |
| internal/checks/score_checkers.go | Registers the new spec-security and advisory checkers in the pipeline. |
| internal/checks/advisory_checks.go | Adds advisory checkers (density, body structure, progressive disclosure). |
| internal/checks/advisory_checks_test.go | Adds tests for the three new advisory checkers. |
Comments suppressed due to low confidence (3)
internal/checks/advisory_checks.go:476
regexp.MustCompileis called insideBodyStructureChecker.Check, meaning the regex is recompiled on every check run. Since this runs per skill, consider hoisting this regex to a package-levelvar(similar to other patterns in this file) to avoid repeated compilation and keep the checker cheaper to run.
This issue also appears on line 544 of the same file.
hasCodeBlocks := strings.Contains(content, "```")
hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)
internal/checks/advisory_checks.go:546
codeBlockPattern := regexp.MustCompile(...)is created insideProgressiveDisclosureChecker.Check, so it recompiles for every skill. Consider making it a package-levelvarso the regex is compiled once and reused across checks.
// Count large code blocks (>50 lines)
codeBlockPattern := regexp.MustCompile("(?s)```[^`]*```")
blocks := codeBlockPattern.FindAllString(sk.RawContent, -1)
internal/checks/advisory_checks.go:543
ProgressiveDisclosureCheckercounts lines and code blocks oversk.RawContent, which includes YAML frontmatter. The advisory definition is about the SKILL.md body, so this can over-count and incorrectly trigger warnings. Consider running these checks oversk.Body(or otherwise excluding the frontmatter block) so the thresholds apply to the body content only.
func (*ProgressiveDisclosureChecker) Check(sk skill.Skill) (*CheckResult, error) {
lines := strings.Split(sk.RawContent, "\n")
bodyLines := len(lines)
…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Orchestration logs: - Linus: Implementation of 7 sensei scoring parity features - Rusty (review): Initial code review — 2 must-fix issues identified - Turk: Fixes applied — checker registration, panic guard - Rusty (re-review): Approval — all issues resolved Session log: - /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md - Gap analysis, 7 issues created (#72-#78) - Implementation, rejection, fix, and approval workflow - PR #79 ready for merge Decision merges from inbox: - User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6 - Sensei parity code review (initial + re-review) - Inbox files cleaned Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Orchestration log: Linus fixed all 6 PR review comments - Session log: All threads resolved, tests passing - Model: gpt-5.3-codex on sync execution
…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fd138c1 to
e80ddc7
Compare
Orchestration logs: - Linus: Implementation of 7 sensei scoring parity features - Rusty (review): Initial code review — 2 must-fix issues identified - Turk: Fixes applied — checker registration, panic guard - Rusty (re-review): Approval — all issues resolved Session log: - /Users/shboyer/github/waza/.squad/log/2026-03-04T2320-sensei-parity.md - Gap analysis, 7 issues created (#72-#78) - Implementation, rejection, fix, and approval workflow - PR #79 ready for merge Decision merges from inbox: - User directive: Code in GPT-5.3-Codex, reviews in Opus 4.6 - Sensei parity code review (initial + re-review) - Inbox files cleaned Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Orchestration log: Linus fixed all 6 PR review comments - Session log: All threads resolved, tests passing - Model: gpt-5.3-codex on sync execution
…rsing, slice traversal, WHEN: count, body field, summary format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e80ddc7 to
c30d412
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
internal/checks/advisory_checks.go:477
- BodyStructureChecker recompiles the numbered-steps regexp on every Check() call via regexp.MustCompile(...). Since this checker can run across many skills, consider moving this to a package-level precompiled regexp (e.g.,
var numberedStepsRE = regexp.MustCompile(...)) to avoid repeated compilation overhead.
hasCodeBlocks := strings.Contains(content, "```")
hasNumberedSteps := regexp.MustCompile(`(?m)^\s*\d+\.\s+`).MatchString(content)
spboyer
left a comment
There was a problem hiding this comment.
✅ Reviewed by Rusty — LGTM (third review). All 7 sensei parity features (#72-#78) implemented correctly. Checkers registered, panic guard in place, Invalid adherence level, WHEN: triggers, context-dependent risk. 777 additions with comprehensive test coverage. Turk's fixes resolved both must-fix issues. CI green across all checks. Ship it. Ready to merge (can't self-approve since you authored this).
c30d412 to
e8a1573
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…evel, advisory checks 16-18 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
da4aa80 to
ad8b723
Compare
Co-authored-by: Richard Park <ripark@microsoft.com>
wbreza
left a comment
There was a problem hiding this comment.
Code Review: PR #79 - feat: sensei scoring parity
What Looks Good
- All 8 prior Copilot review comments addressed - thorough iteration
- Comprehensive test coverage - 487 lines of new tests with boundary cases
- Clean architecture - all checkers follow ComplianceChecker interface
- sectionForHeader() isolates sections correctly - solves trigger/anti-trigger overlap
- Invalid adherence level short-circuit - well-designed
- Context-dependent anti-trigger risk fires only when anti-triggers are present
- skillBodyContent() prefers sk.Body over raw parsing with test verification
Suggestions (non-blocking)
- sectionForHeader() repeated strings.ToUpper() - Consider pre-computing once. Minor perf.
- errorHandlingPatterns includes error - Broad match. Consider heading-level patterns.
Summary
| Priority | Count |
|---|---|
| Critical | 0 |
| High | 0 |
| Medium | 2 |
| Low | 2 |
Overall Assessment: Approve - solid feature PR with comprehensive testing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Brings waza's scoring engine in line with spboyer/sensei v1.3.0. Adds 7 features across scoring and checks:
Changes
WHEN:trigger pattern recognitionspec-securitycheck (XML tags, reserved name prefixes)Invalidscore level for >1024 char descriptionsFiles Changed (777 additions, 12 deletions)
internal/checks/advisory_checks.go— New: CrossModelDensityChecker, BodyStructureChecker, ProgressiveDisclosureCheckerinternal/checks/advisory_checks_test.go— Tests for all 3 advisory checkersinternal/checks/score_checkers.go— Register new checkers in pipelineinternal/checks/spec_checks.go— New: SpecSecurityCheckerinternal/checks/spec_checks_test.go— Tests for security checkerinternal/scoring/scoring.go— WHEN: trigger, Invalid level, context-dependent anti-triggersinternal/scoring/scoring_test.go— Tests for all scoring changesReview History
All tests pass.
go vetclean.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com