
[plan] Create "test-quality-sentinel" agentic workflow #25320

@github-actions

Description

Context

From the Scout analysis of issue #25311, this workflow addresses Problem 4: False Comfort from Tests.

"Tests created a similar false comfort. Having 500+ tests felt reassuring... there were several times in the vibe-coding phase where a new test case revealed that the design of some component was completely wrong."

High test counts create an illusion of safety. The real signal is whether tests cover behavioral contracts and design invariants — not just happy-path implementations.

Objective

Create a new gh-aw agentic workflow called test-quality-sentinel that analyzes test quality beyond code coverage percentages on every PR.

Workflow Prompt

Create an agentic GitHub Actions workflow called "test-quality-sentinel" that
analyzes test quality beyond code coverage percentages.

The workflow must:
1. On every PR, analyze new and changed tests to detect:
   - Tests that only test implementation details (mocking internal functions
     rather than testing observable behavior)
   - Tests that lack assertions about error/edge cases (only testing the
     happy path)
   - Test files that grew proportionally faster than the code they test
     (possible "test inflation" — quantity without quality)
   - Duplicated test logic that suggests tests are generated without intent
2. Use an AI agent to review the tests in the PR diff and answer:
   a. "What design invariant does this test enforce?"
   b. "What would break in the system if this test were deleted?"
   c. "Does this test cover a behavioral contract or just an implementation detail?"
3. Post a PR comment with:
   - A "Test Quality Score" (0–100) based on the above criteria
   - Specific tests flagged for review with AI-generated improvement suggestions
   - A distinction between "design tests" (high value) vs "implementation tests"
     (low value, prone to false assurance)
4. Fail the check if >30% of new tests are classified as low-value
   implementation tests

Use AST analysis + AI review. Support pytest (Python) and #[test] blocks (Rust).

Files to Create

  • .github/workflows/test-quality-sentinel.md — the workflow markdown file
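A gh-aw workflow file of this shape might look roughly as follows. This is an illustrative sketch only: the frontmatter keys shown are assumptions about gh-aw's markdown-plus-YAML format and should be checked against the gh-aw documentation before use.

```markdown
---
on:
  pull_request:
permissions:
  contents: read
  pull-requests: write
---

# Test Quality Sentinel

Analyze the new and changed tests in this pull request. For each test,
answer: what design invariant does it enforce, what would break if it
were deleted, and does it cover a behavioral contract or an
implementation detail? Post a "Test Quality Score" (0-100) comment and
fail the check if more than 30% of new tests are low-value
implementation tests.
```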

Acceptance Criteria

  • Workflow triggers on every PR
  • Detects implementation-detail tests, happy-path-only tests, test inflation, and duplication
  • AI agent answers the 3 quality questions per test
  • Posts "Test Quality Score" (0–100) comment with per-test feedback
  • Distinguishes "design tests" vs "implementation tests"
  • Fails check if >30% of new tests are low-value
  • Supports Python (pytest) and Rust (#[test])
  • Compiled .lock.yml generated via make recompile

Related to Blog analysis #25311
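The scoring and gating criteria above can be sketched with a small helper. Only the 0–100 scale and the 30% threshold come from the spec; the function names and the "score = share of design tests" formula are illustrative assumptions:

```python
def quality_score(design_tests: int, implementation_tests: int) -> int:
    """Share of high-value 'design tests' among new tests, scaled to 0-100."""
    total = design_tests + implementation_tests
    if total == 0:
        return 100  # no new tests: nothing to penalize
    return round(100 * design_tests / total)


def check_passes(design_tests: int, implementation_tests: int,
                 max_low_value_ratio: float = 0.30) -> bool:
    """Fail if more than 30% of new tests are low-value implementation tests."""
    total = design_tests + implementation_tests
    if total == 0:
        return True
    return implementation_tests / total <= max_low_value_ratio


print(quality_score(7, 3), check_passes(7, 3))  # 70 True  (exactly at 30%)
print(quality_score(5, 5), check_passes(5, 5))  # 50 False (above threshold)
```

Note the boundary choice: 30% exactly still passes, since the spec fails the check only when the low-value share exceeds 30%.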

Generated by Plan Command for issue #25311
