rag-lint

A linter for RAG corpus access controls. Catches three classification-boundary bugs before you index a document folder.

Runs offline. No LLM calls at runtime.

What it checks

Rule	Severity	Flags
R001 · `missing-classification`	high	Document has no `classification` in frontmatter.
R002 · `near-duplicate-across-classifications`	medium	Two docs at different classifications are ≥30% near-identical paragraphs (cosine ≥0.90).
R003 · `cross-classification-semantic-overlap`	high	Two docs at different classifications share a paragraph (cosine ≥0.75). Suppressed when R002 fires for the same pair.

R001 catches "never labeled." R002 catches "wrong label on the whole doc." R003 catches "wrong label on one paragraph."

Threat model

Per-document ACLs work per-document. Documents have relationships — missing labels, misclassifications, cross-classification paragraph overlap — that per-document ACLs don't see. rag-lint runs before indexing and catches the three corpus-shape conditions that make ACL-based safety fragile.

It catches the precondition for a paraphrase leak across classifications. It does not prove a leak has occurred.

Install

Python 3.11+, uv.

git clone https://github.com/efuu/rag-lint
cd rag-lint
uv sync

First run downloads MiniLM (~90 MB). Subsequent runs are offline.

Usage

# stdout (default)
uv run rag-lint path/to/corpus

# HTML report with side-by-side hero view
uv run rag-lint path/to/corpus --format html -o report.html

Exit codes: 0 clean, 1 findings present, 2 usage error.

Frontmatter per document:

---
classification: restricted   # public | internal | confidential | restricted
---

Example

A 14-document financial-services corpus ships under data/corpus/acme_financial/:

uv run rag-lint data/corpus/acme_financial

Expect three findings (one per rule). Baselines committed in runs/fixtures/.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data/corpus/acme_financial		data/corpus/acme_financial
examples		examples
runs/fixtures		runs/fixtures
src/rag_lint		src/rag_lint
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rag-lint

What it checks

Threat model

Install

Usage

Example

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

rag-lint

What it checks

Threat model

Install

Usage

Example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages