Detector: Intent Comparison

Jacob Centner edited this page Apr 13, 2026 · 1 revision

Multi-artifact triangulation — compares code, docstring, tests, and documentation for contradictions.

| Property | Value |
|---|---|
| Name | intent-comparison |
| Tier | LLM_ASSISTED |
| Languages | Python |
| External tool | None |
| LLM required | Yes; minimum advanced capability tier (frontier-class model recommended) |
| Confidence | 0.55 (basic), 0.70 (advanced) |

What it detects

Contradictions across up to 4 independent sources of intent for the same symbol:

  1. Code body — what the function actually does
  2. Docstring — what the author says it does
  3. Test functions — what the tests expect it to do
  4. Documentation sections — what the docs tell users it does

Catches inconsistencies that pairwise detectors (semantic-drift, test-coherence, inline-comment-drift) miss — e.g., tests expect one behavior, docs describe another, and the code does a third thing.

How it works

  1. Discovers Python implementation files (excludes test files)
  2. Extracts symbols (functions/classes with ≥3 code lines)
  3. Builds lookup tables for test functions (test_<name> matching) and doc sections (backtick references in .md/.rst)
  4. For each symbol, gathers the available artifacts and proceeds only if at least 3 are present. This is the triangulation requirement: code always counts as one artifact, so at least 2 of docstring, tests, and doc sections must also exist
  5. Sends all artifacts in a single prompt asking for contradictions between any pair
  6. Creates one finding per contradiction

Limits

| Parameter | Value |
|---|---|
| Min artifacts | 3 (code + 2 others) |
| Max per file | 10 symbols |
| Max per scan | 50 symbols |
| Max test functions per symbol | 3 |
| Max doc sections per symbol | 2 |
| Default num_ctx | 4096 |
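
One way these caps could be applied is sketched below. The page does not say which symbols are kept when a cap is exceeded, so this sketch assumes the first ones in discovery order; `select_symbols` and `trim_artifacts` are hypothetical names.

```python
MAX_PER_FILE = 10
MAX_PER_SCAN = 50
MAX_TESTS_PER_SYMBOL = 3
MAX_DOCS_PER_SYMBOL = 2


def select_symbols(symbols_by_file):
    """Pick (path, name) pairs, honoring the per-file and per-scan caps.

    Assumption: symbols are kept in discovery order and the rest are dropped.
    """
    selected = []
    for path, names in symbols_by_file.items():
        for name in names[:MAX_PER_FILE]:
            if len(selected) >= MAX_PER_SCAN:
                return selected
            selected.append((path, name))
    return selected


def trim_artifacts(tests, docs):
    """Keep at most 3 test functions and 2 doc sections for one symbol."""
    return tests[:MAX_TESTS_PER_SYMBOL], docs[:MAX_DOCS_PER_SYMBOL]
```

With six files of twelve symbols each, the per-file cap would admit 60 symbols, so the per-scan cap stops selection partway through the last file that still fits.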

Capability tiers

| Tier | Behavior |
|---|---|
| basic | Concise "find factual contradictions" prompt, shorter context, confidence 0.55 |
| advanced | Detailed role-based prompt with explicit ignore list (style, verbosity, coverage), larger context, confidence 0.70 |
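
The tier split could be modeled along these lines. The confidence values and the ignore list come from the table above; the prompt wording, the `TierSettings` type, and raising an error on an unknown tier are assumptions made for illustration.

```python
from dataclasses import dataclass

IGNORE_LIST = ("style", "verbosity", "coverage")


@dataclass(frozen=True)
class TierSettings:
    confidence: float
    prompt_style: str


TIERS = {
    "basic": TierSettings(confidence=0.55, prompt_style="concise"),
    "advanced": TierSettings(confidence=0.70, prompt_style="role-based"),
}


def settings_for(model_capability):
    """Look up tier settings; unknown tiers are rejected (an assumption)."""
    try:
        return TIERS[model_capability]
    except KeyError:
        raise ValueError(f"unknown capability tier: {model_capability!r}") from None


def build_prompt(model_capability, artifacts):
    """Render all artifacts into one prompt. The wording here is hypothetical."""
    settings = settings_for(model_capability)
    if settings.prompt_style == "concise":
        header = "Find factual contradictions between any pair of these artifacts."
    else:
        header = (
            "You are reviewing several descriptions of the same symbol. "
            "Report factual contradictions between any pair of artifacts. "
            "Ignore differences in " + ", ".join(IGNORE_LIST) + "."
        )
    body = "\n\n".join(f"[{label}]\n{text}" for label, text in artifacts.items())
    return f"{header}\n\n{body}"
```

Keeping confidence next to the prompt style in one record means a finding can take its confidence directly from the tier that produced it.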

Example finding

[INTENT-COMPARISON] src/sentinel/config.py:parse_config (line 22)
  Contradiction between test vs documentation: Test expects parse_config()
  to raise ValueError on missing keys, but the architecture doc says it
  returns defaults for missing keys.
  Severity: MEDIUM, Confidence: 0.55
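
A finding like the one above could be represented and rendered as in the sketch below. The `Finding` class and its field names are hypothetical, and the line wrapping is only an approximation of the layout shown.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Finding:
    path: str
    symbol: str
    line: int
    source_a: str      # e.g. "test"
    source_b: str      # e.g. "documentation"
    explanation: str
    severity: str
    confidence: float

    def render(self):
        """Format the finding as a header, a wrapped body, and a footer."""
        header = f"[INTENT-COMPARISON] {self.path}:{self.symbol} (line {self.line})"
        body = textwrap.fill(
            f"Contradiction between {self.source_a} vs {self.source_b}: "
            f"{self.explanation}",
            width=74,
            initial_indent="  ",
            subsequent_indent="  ",
        )
        footer = f"  Severity: {self.severity}, Confidence: {self.confidence:.2f}"
        return "\n".join([header, body, footer])
```

Because the detector creates one finding per contradiction, a symbol with two conflicting pairs of artifacts would produce two such records.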

Model recommendation

This detector requires frontier-class models. Its multi-artifact prompts are significantly larger than those of pairwise detectors, and reliable cross-reference reasoning needs a stronger model. Use gpt-5.4-nano or better, configured through a per-detector provider:

[sentinel]
model = "qwen3.5:4b"   # main model for other detectors

[sentinel.detector_providers.intent-comparison]
provider = "openai"
model = "gpt-5.4-nano"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"
model_capability = "advanced"
