Detector: Test Coherence
Jacob Centner edited this page Apr 10, 2026 · 1 revision
Compares test functions against their implementation counterparts using an LLM to detect stale or meaningless tests.
| Property | Value |
|---|---|
| Name | test-coherence |
| Tier | LLM_ASSISTED |
| Languages | Python |
| External tool | None |
| LLM required | Yes, minimum basic capability tier |
| Confidence | 0.60 (basic), 0.75 (enhanced) |
Tests that no longer meaningfully validate the code they claim to test:
- Test assertions that don't match the current function behavior
- Tests that test an old API signature
- Tests that are trivially always-passing
- Tests that duplicate other tests without adding coverage
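As a rough illustration of two of the categories above, the sketch below shows tests that pass yet say little about the code they name. Everything in it is hypothetical: `load_config`, its `overrides` parameter, and both test names are invented for the example, and only the two model names are borrowed from the sample finding on this page.

```python
def load_config(overrides=None):
    # Hypothetical implementation, not the project's actual code.
    config = {"model": "qwen3.5:4b", "skip_llm": False}
    config.update(overrides or {})
    return config


def test_load_config_default_model():
    # Stale: the test supplies the old value itself, so it passes
    # without checking what load_config() actually defaults to.
    config = load_config({"model": "llama2"})
    assert config["model"] == "llama2"


def test_load_config_runs():
    # Trivially always-passing: no assertion depends on the return value.
    load_config()
    assert True
```

Both tests stay green under a test runner, which is why a coherence check that reads the test next to its implementation can surface problems that a passing test run hides.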
- Finds test files matching `test_*.py` or `*_test.py`
- Pairs test files with implementation files via:
  - Naming convention: `test_config.py` → `config.py`
  - Import analysis: follows imports in the test file
- Extracts matched (test function, implementation function) pairs via AST
- Sends each pair to the LLM for coherence assessment
- LLM judges whether the test meaningfully validates the current implementation
| Parameter | Value |
|---|---|
| Max pairs per file | 5 (prevents runaway LLM costs) |
| Min function lines | 3 (skips trivial functions) |
| Truncation (basic) | 1500 chars |
| Truncation (enhanced) | 3000 chars |
| Tier | Behavior |
|---|---|
| `basic` | Binary coherent/stale judgment, smaller context |
| `standard+` | Nuanced assessment with severity levels, larger context |
[TEST-COHERENCE] tests/test_config.py:test_load_config → src/sentinel/config.py:load_config
Test checks for 'model' field default of 'llama2' but the implementation
default was changed to 'qwen3.5:4b'. Test is stale.
Severity: MEDIUM, Confidence: 0.60
[sentinel]
model_capability = "basic" # minimum for this detector
skip_llm = false # must be false
- Python-only
- Requires a healthy LLM provider
- Quality depends on model capability
- Only pairs tests via naming convention and imports — misses indirect test coverage
- Not yet validated at scale against real-world repos
- Max 5 pairs per file may miss issues in files with many functions
Local Repo Sentinel · MIT License