- π€ LLM usage: $0.5083 (6 commits)
- π€ Human dev: ~$424 (4.2h @ $100/h, 30min dedup)
Generated on 2026-04-20 using openrouter/qwen/qwen3-coder-next
Validate and refactor Markdown documentation against source code β detect outdated, orphaned, duplicate, and invalid docs using heuristics + optional LLM.
docs/ βββ chunk by heading βββ heuristic checks βββ cross-ref with code βββ (optional) LLM βββ report/fix
Three validation layers, each progressively deeper:
- Heuristic validator (fast, free) β empty sections, broken internal links, TODO/FIXME markers, duplicate detection via
difflib, stale version references, archive path detection, explicit deprecation markers - Cross-reference validator (fast, free) β checks that backtick-quoted symbols (
ClassName,function_name), import paths in code blocks, and CLI commands actually exist in the project source - LLM validator (optional, paid) β semantic validation via
litellmfor chunks that heuristics couldn't resolve with high confidence
pip install docvalWith LLM support:
pip install docval[llm]From source:
git clone https://github.com/wronai/docval.git
cd docval
pip install -e ".[dev]"docval scan docs/
docval scan docs/ --project /path/to/repo -v
docval scan docs/ -o report.md
docval scan docs/ -o report.jsondocval fix docs/ # preview changes
docval fix docs/ --no-dry-run # apply fixes
docval fix docs/ --no-dry-run --llm # with LLM validationdocval patch docs/ -o fixes.txt
docval patch docs/ --llm --model gpt-4o -o fixes.txtdocval stats docs/export OPENAI_API_KEY=sk-...
docval scan docs/ --llm --model gpt-4o-mini
docval scan docs/ --llm --model anthropic/claude-sonnet-4-20250514
docval scan docs/ --llm --model groq/llama-3.3-70b-versatileAny model supported by litellm works.
from pathlib import Path
from docval.pipeline import scan
from docval.reporters import ConsoleReporter, MarkdownReporter
# Run validation
result = scan(
docs_dir=Path("docs/"),
project_root=Path("."),
use_llm=False,
)
# Print to console
ConsoleReporter(verbose=True).report(result)
# Write markdown report
MarkdownReporter().report(result, Path("validation-report.md"))from docval.chunker import chunk_directory
from docval.context import build_context
from docval.validators import HeuristicValidator, CrossRefValidator
# Chunk docs
doc_files = chunk_directory(Path("docs/"))
# Build project context
ctx = build_context(Path("."))
# Run heuristics
heuristic = HeuristicValidator(ctx=ctx)
heuristic.validate(doc_files)
# Cross-reference check
crossref = CrossRefValidator(ctx=ctx)
crossref.validate(doc_files)
# Inspect results
for f in doc_files:
for chunk in f.chunks:
if chunk.issues:
print(f"{f.relative_path}:{chunk.line_start} [{chunk.status.value}] {chunk.heading}")
for issue in chunk.issues:
print(f" {issue.severity.value}: {issue.message}")| Check | Layer | Example |
|---|---|---|
| Empty sections | Heuristic | Heading with no body text |
| Broken internal links | Heuristic | [guide](./deleted-file.md) |
| Deprecated markers | Heuristic | DEPRECATED, OBSOLETE, DO NOT USE |
| Archive path | Heuristic | Files in docs/archive/ directories |
| Stale versions | Heuristic | References to v1.x when project is v3.x |
| Duplicates | Heuristic | >80% similar content across files |
| TODO/FIXME | Heuristic | Unfinished documentation markers |
| Orphaned code refs | CrossRef | `NonExistentClass` in backticks |
| Broken imports | CrossRef | from mypackage.deleted import X in code blocks |
| Semantic accuracy | LLM | Content that doesn't match actual project behavior |
src/docval/
βββ cli.py # Click CLI: scan, fix, patch, stats
βββ pipeline.py # Orchestrates: discover β chunk β validate β report
βββ models.py # Data models: DocChunk, DocFile, ValidationResult
βββ chunker.py # MD β heading-based semantic chunks
βββ context.py # Build project context (AST, git, .toon files)
βββ validators/
β βββ heuristic.py # Rule-based checks (free, fast)
β βββ crossref.py # Code β docs cross-reference
β βββ llm_validator.py # Semantic validation via litellm
βββ actions/
β βββ executor.py # Apply fixes: delete, archive, patch
βββ reporters/
βββ console.py # Rich CLI output
βββ markdown_report.py # .md report
βββ json_report.py # .json for CI/CD
docval understands .toon.yaml files from the code2llm ecosystem. When present, it extracts module names, class names, and exported functions for cross-referencing, giving more accurate orphaned-reference detection.
Licensed under Apache-2.0.
Last updated by taskill at 2026-04-25 13:37 UTC
| Metric | Value |
|---|---|
| HEAD | 4fba32f |
| Coverage | β |
| Failing tests | β |
| Commits in last cycle | 6 |
Add markdown output for documentation generation (docs feature). Commit also added inclusion of commit messages in the markdown output.