Detector: Semantic Drift

Jacob Centner edited this page Apr 10, 2026 · 1 revision

Compares documentation sections against referenced source code using an LLM to detect semantic inconsistencies.

| Property | Value |
|---|---|
| Name | semantic-drift |
| Tier | LLM_ASSISTED |
| Languages | All (documentation vs. any source code) |
| External tool | None |
| LLM required | Yes (minimum basic capability tier) |
| Confidence | 0.60 (basic), 0.75 (enhanced) |

What it detects

Documentation sections that describe code behavior inaccurately: function signatures that have since changed, outdated API descriptions, examples that no longer work, and configuration docs that don't match the actual schema.

This is Sentinel's core differentiating detector: it targets mismatches between what the prose claims and what the code does, which static tools generally cannot catch.

How it works

  1. Parses markdown files into heading-delimited sections
  2. Finds file and function references:
    • Backtick paths: `src/config.py`
    • Markdown links: [config](src/config.py)
    • Prose references: the config module
    • Backtick symbols: `load_config()`
  3. Extracts relevant code from referenced files (Python AST for functions, regex for others)
  4. Sends (doc section, code excerpt) pairs to the LLM
  5. LLM judges whether the documentation accurately describes the code
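
Steps 1 to 3 above can be sketched in a few lines of Python. This is an illustrative sketch, not Sentinel's actual implementation: the names `split_sections`, `find_references`, and `extract_function` are hypothetical, only ATX-style (`#`) headings are handled, and prose references such as "the config module" are not resolved here.

```python
import ast
import re
from typing import Optional

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")       # ATX headings only
BACKTICK_RE = re.compile(r"`([^`\n]+)`")            # `src/config.py`, `load_config()`
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")    # [config](src/config.py)


def split_sections(markdown: str) -> list:
    """Step 1: split markdown into (heading, body) pairs."""
    sections = []
    heading, body = "(preamble)", []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    # Drop the preamble when nothing precedes the first heading.
    return [s for s in sections if s != ("(preamble)", "")]


def find_references(text: str) -> dict:
    """Step 2: collect backtick paths, backtick symbols, and link targets."""
    paths, symbols = [], []
    for token in BACKTICK_RE.findall(text):
        if token.endswith("()"):
            symbols.append(token[:-2])
        elif "/" in token or "." in token:
            paths.append(token)
    # Keep relative link targets only; skip external URLs.
    paths.extend(t for t in LINK_RE.findall(text) if "://" not in t)
    return {"paths": sorted(set(paths)), "symbols": sorted(set(symbols))}


def extract_function(source: str, name: str) -> Optional[str]:
    """Step 3 (Python files): return the source text of a named function."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None
```

A caller would pair each section body with the excerpts returned for its references and pass those pairs to the LLM. Note that this simple splitter would also treat a `#` comment inside a fenced code block as a heading; a production parser would need to track fences.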

Capability tiers

| Tier | Behavior |
|---|---|
| basic | Binary comparison (accurate/inaccurate), truncated to 800/2000 chars |
| standard+ | Enhanced prompts, larger context (1500/3000 chars), nuanced severity |
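
As a rough illustration of the per-tier truncation, the sketch below clips a (doc section, code excerpt) pair before it is sent to the LLM. The function name `truncate_pair` is hypothetical, and the assignment of the first limit to the documentation text and the second to the code excerpt is an assumption; the table gives only the number pairs.

```python
# Limits taken from the tier table. ASSUMPTION: the first number applies to
# the doc section and the second to the code excerpt.
TIER_LIMITS = {
    "basic": (800, 2000),
    "standard": (1500, 3000),
}


def truncate_pair(doc_section: str, code_excerpt: str, tier: str = "basic"):
    """Clip both inputs to the character limits of the given capability tier."""
    # "standard+" is read here as: every tier other than basic uses the larger limits.
    key = "basic" if tier == "basic" else "standard"
    doc_limit, code_limit = TIER_LIMITS[key]
    return doc_section[:doc_limit], code_excerpt[:code_limit]
```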

Key docs scanned

Only key documentation files are scanned: README, CONTRIBUTING, API, GUIDE, ARCHITECTURE, SETUP, INSTALL, CHANGELOG, MIGRATION.
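
One plausible way to apply this filter is sketched below. The matching rule is an assumption, not documented behavior: the file stem is split into words and compared case-insensitively against the list, so that a file like `docs/api-guide.md` qualifies while `capital.md` (which merely contains the letters "api") does not. The helper name `is_key_doc` and the restriction to `.md` files are likewise hypothetical.

```python
import re
from pathlib import PurePosixPath

KEY_DOC_NAMES = {
    "README", "CONTRIBUTING", "API", "GUIDE", "ARCHITECTURE",
    "SETUP", "INSTALL", "CHANGELOG", "MIGRATION",
}


def is_key_doc(path: str) -> bool:
    """Return True if the path looks like one of the key documentation files."""
    p = PurePosixPath(path)
    if p.suffix.lower() != ".md":
        return False
    # Split "api-guide" into {"API", "GUIDE"} and look for any key name.
    words = {w.upper() for w in re.split(r"[^A-Za-z0-9]+", p.stem) if w}
    return bool(words & KEY_DOC_NAMES)
```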

Example finding

[SEMANTIC-DRIFT] docs/api-guide.md → src/sentinel/config.py:load_config
  Documentation says load_config accepts a string path, but the function
  signature now accepts str | Path. The return type documentation is also
  missing the new skip_llm field.
  Severity: MEDIUM, Confidence: 0.60

Configuration

[sentinel]
model_capability = "basic"     # minimum for this detector
skip_llm = false              # must be false

Or use a dedicated model for this detector:

[sentinel.detector_providers.semantic-drift]
provider = "openai"
model = "gpt-4o-mini"
api_base = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"
model_capability = "standard"

Known limitations

  • Requires a healthy LLM provider
  • Quality depends on model capability — small models may miss subtle inconsistencies
  • Only scans key documentation files (not every .md in the repo)
  • Not yet validated at scale against real-world repos
