Context
From the Scout analysis of issue #25311, this workflow addresses Problem 4: False Comfort from Tests.
"Tests created a similar false comfort. Having 500+ tests felt reassuring... there were several times in the vibe-coding phase where a new test case revealed that the design of some component was completely wrong."
High test counts create an illusion of safety. The real signal is whether tests cover behavioral contracts and design invariants — not just happy-path implementations.
Objective
Create a new gh-aw agentic workflow called test-quality-sentinel that analyzes test quality beyond code coverage percentages on every PR.
Workflow Prompt
Create an agentic GitHub Actions workflow called "test-quality-sentinel" that
analyzes test quality beyond code coverage percentages.
The workflow must:
1. On every PR, analyze new and changed tests to detect:
- Tests that only test implementation details (mocking internal functions
rather than testing observable behavior)
- Tests that lack assertions about error/edge cases (only testing the
happy path)
- Test files that grew proportionally faster than the code they test
(possible "test inflation" — quantity without quality)
- Duplicated test logic that suggests tests are generated without intent
2. Use an AI agent to review the tests in the PR diff and answer:
a. "What design invariant does this test enforce?"
b. "What would break in the system if this test were deleted?"
c. "Does this test cover a behavioral contract or just an implementation detail?"
3. Post a PR comment with:
- A "Test Quality Score" (0–100) based on the above criteria
- Specific tests flagged for review with AI-generated improvement suggestions
- A distinction between "design tests" (high value) vs "implementation tests"
(low value, prone to false assurance)
4. Fail the check if >30% of new tests are classified as low-value
implementation tests
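The "test inflation" bullet in item 1 can be approximated mechanically before any AI review runs. A minimal sketch under stated assumptions: it reads a simplified unified diff (no `---`/`@@` lines), treats `tests/` directories and `_test.py` / `_test.rs` suffixes as test paths, and uses a 3.0 ratio as an illustrative threshold — none of which gh-aw actually prescribes.

```python
# Hypothetical helper: ratio of added test lines to added source lines in a
# PR diff. Path conventions and the ~3.0 threshold are assumptions for
# illustration; real diffs also carry ---/@@ lines this sketch ignores.
import re

TEST_PATH = re.compile(r"(^|/)tests?/|_test\.(py|rs)$")

def inflation_ratio(diff: str) -> float:
    in_test_file = False
    test_added = src_added = 0
    for line in diff.splitlines():
        if line.startswith("+++ "):  # file header names the target path
            in_test_file = bool(TEST_PATH.search(line[4:].strip()))
        elif line.startswith("+") and not line.startswith("+++"):
            if in_test_file:
                test_added += 1
            else:
                src_added += 1
    return test_added / src_added if src_added else float("inf")

DIFF = """\
+++ b/src/parser.py
+def parse(x):
+    return int(x)
+++ b/tests/test_parser.py
+def test_parse_1():
+    assert parse("1") == 1
+def test_parse_2():
+    assert parse("2") == 2
+def test_parse_3():
+    assert parse("3") == 3
"""

ratio = inflation_ratio(DIFF)
print(round(ratio, 1))  # → 3.0
# A ratio far above ~3 suggests test volume outpacing the code under test.
```

A ratio alone proves nothing — it is only a flag that routes the new tests to the AI review pass in item 2.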
Use AST analysis + AI review. Support pytest (Python) and #[test] blocks (Rust).
Files to Create
.github/workflows/test-quality-sentinel.md — the workflow markdown file
Acceptance Criteria
- The workflow supports both pytest (Python) and Rust tests (#[test])
- The compiled .lock.yml is generated via make recompile

Related to Blog analysis #25311

Generated by Plan Command for issue #25311