GitHub - lipex360x/sourcemap-indexer: Codebase indexer powered by local LLM

sourcemap-indexer

Index any codebase into SQLite and enrich file metadata via an LLM — so an AI assistant can understand large projects through SQL queries instead of reading every file.

Index

#	Section
1	How it works
2	Prerequisites
3	Installation
4	Quickstart
5	Commands
6	Configuration
7	Ignoring files
8	Custom layers
9	Project metadata
10	AI assistant skill
11	Post-commit hook
12	SQLite schema
13	Dev setup
14	Code quality

1. How it works

sourcemap-indexer runs in three phases. Each phase writes into the same SQLite database, adding a new layer of information on top of what the previous phase produced:

flowchart LR
    A["sourcemap init<br/><i>one-time</i>"] --> B[("index.db<br/>empty schema")]
    B --> C["sourcemap walk<br/><i>after code changes</i>"]
    C --> D[("index.db<br/>paths, languages,<br/>hashes, line counts")]
    D --> E["sourcemap enrich<br/><i>calls an LLM</i>"]
    E --> F[("index.db<br/>+ purpose, layer,<br/>tags, side effects")]

Note

init and walk are fully offline — no LLM required. Only enrich calls an external model.

Phase 1 — `sourcemap init`

Creates the directory structure needed by the other commands:

your-project/
├── .sourcemap/
│   ├── index.db          ← SQLite database (all metadata lives here)
│   ├── index.yaml        ← YAML snapshot of the last walk (intermediate file)
│   ├── layers.yaml       ← user-defined layer names (optional)
│   ├── project.yaml      ← project metadata shown by brief (optional)
│   └── logs/             ← LLM debug logs (only when SOURCEMAP_LLM_LOG=1)
└── .sourcemapignore      ← gitignore-syntax exclusion rules

Note

The output directory defaults to .sourcemap/ and can be changed via SOURCEMAP_MAPS_DIR. See Common variables in section 6 (Configuration).

Note

init is idempotent — safe to run multiple times. It never overwrites an existing .sourcemapignore or database.

Phase 2 — `sourcemap walk`

Scans the project tree and updates the database in three internal steps:

1. Scan: traverses all files (respecting .gitignore and .sourcemapignore) and collects path, language, line count, size, content hash, and last-modified timestamp. On the second and subsequent runs, files whose mtime and size match the SQLite record are skipped entirely, so only changed files are read and re-hashed. This lets walk scale to large codebases: a 10 000-file tree with 5 changed files reads 5 files instead of 10 000.
2. Write: serializes the result to index.yaml inside .sourcemap/ (a human-readable snapshot of every tracked file; planned for removal in a future release once SQLite becomes the sole source of truth).
3. Sync: reads index.yaml and reconciles the SQLite database:
   - New file → inserted with needs_llm = true
   - File changed (hash diff) → updated with needs_llm = true
   - File removed → soft-deleted (kept in DB with deleted_at timestamp)
   - File unchanged → skipped
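
The reconciliation rules above can be sketched in a few lines of Python. This is an illustration only, not the project's code: the function name `plan_sync`, the dict-of-hashes input shape, and the action labels are all hypothetical.

```python
def plan_sync(indexed: dict[str, str], scanned: dict[str, str]) -> dict[str, str]:
    """Classify each path by comparing stored hashes with freshly scanned ones.

    Both arguments map a file path to its content hash (hypothetical shape).
    """
    actions: dict[str, str] = {}
    for path, content_hash in scanned.items():
        if path not in indexed:
            actions[path] = "insert"       # new file, needs_llm = true
        elif indexed[path] != content_hash:
            actions[path] = "update"       # hash differs, needs_llm = true
        else:
            actions[path] = "skip"         # unchanged
    for path in indexed.keys() - scanned.keys():
        actions[path] = "soft_delete"      # row is kept, deleted_at is set
    return actions


indexed = {"src/a.py": "h1", "src/b.py": "h2", "src/c.py": "h3"}
scanned = {"src/a.py": "h1", "src/b.py": "h2-changed", "src/d.py": "h4"}
print(plan_sync(indexed, scanned))
```

In this sample, `src/b.py` is marked for update, `src/d.py` for insert, and `src/c.py` for soft deletion because it no longer appears in the scan.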

What index.yaml looks like

version: 1
generated_at: 1745000000
root: /path/to/your-project
files:
  - path: src/auth/login.ts
    language: ts
    lines: 82
    size_bytes: 2104
    content_hash: a3f1...
    last_modified: 1744900000
  - path: src/auth/logout.ts
    ...

Checking this file into source control is optional; doing so gives a plain-text audit trail of what was indexed.

What you get without an LLM

After walk, the database already holds language, line count, size, and hash for every file. Run sourcemap stats to see the structural breakdown:

╭─ Sync ──────────────────────────────────────────────────────────╮
│ Inserted: 298   Updated: 0   Soft-deleted: 0                    │
╰─────────────────────────────────────────────────────────────────╯
╭─ Stats ─────────────────────────────────────────────────────────╮
│ Root   /your/project                                            │
│ LLM    not configured                                           │
│ Total: 298      Enriched: 0      Pending: 298                   │
│ ○○○○○○○○○○○○○○○○○○○○  0%                                        │
╰─────────────────────────────────────────────────────────────────╯
╭─ By layer ──────────────────────────────────────────────────────╮
│   unknown   298  ○○○○○○○○○○○○○○○○○○○○                           │
╰─────────────────────────────────────────────────────────────────╯
╭─ By language ───────────────────────────────────────────────────╮
│   py      114  ○○○○○○○○○○○○○○○○○○○○                             │
│   tsx      46  ○○○○○○○○                                         │
│   ts       43  ○○○○○○○○                                         │
│   md        9  ○○                                               │
│   sql       9  ○○                                               │
│   yaml      8  ○                                                │
│   json      7  ○                                                │
╰─────────────────────────────────────────────────────────────────╯
              ● all enriched  |  ● has pending  |  ○ not yet enriched

All files start at layer unknown — layers are assigned by the LLM during enrich. Language detection is immediate and requires no enrichment.

Phase 3 — `sourcemap enrich`

For every file marked needs_llm = true, enrichment:

1. Reads the file content from disk
2. Sends path + language + content to the LLM with a structured prompt
3. Stores the LLM response back into SQLite:

Field	What it contains
`purpose`	One-sentence description of what the file does
`layer`	Architectural layer (`domain`, `infra`, `application`, `cli`, `lib`, …)
`stability`	`core`, `stable`, `experimental`, or `deprecated`
`tags`	Semantic keywords (e.g. `authentication`, `rate-limiting`)
`side_effects`	I/O boundaries (`network`, `writes_fs`, `git`, `spawns_process`, `environ`)
`invariants`	Key behavioral contracts the file upholds

After enrichment, needs_llm is cleared and llm_hash is set to the content hash at the time of enrichment — so future walks can detect drift.
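
A minimal sketch of how such drift could be detected with plain SQL, assuming a reduced `items` table that uses only column names from the schema section. The query actually run by `sourcemap stale` is not shown in this README, and `drifted_paths` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (path TEXT, content_hash TEXT, llm_hash TEXT, needs_llm INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("src/a.py", "aaa", "aaa", 0),    # enriched, unchanged since
        ("src/b.py", "bbb2", "bbb1", 1),  # content changed after enrichment
        ("src/c.py", "ccc", None, 1),     # never enriched
    ],
)


def drifted_paths(connection: sqlite3.Connection) -> list[str]:
    """Return files whose current hash differs from the hash seen at enrichment."""
    rows = connection.execute(
        "SELECT path FROM items "
        "WHERE llm_hash IS NOT NULL AND llm_hash != content_hash "
        "ORDER BY path"
    )
    return [path for (path,) in rows]


print(drifted_paths(conn))
```

Only `src/b.py` is reported: it was enriched once and its content hash has moved on since.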

Important

Enrichment calls the LLM for every pending file. For large codebases, use --limit N to process in batches and avoid timeouts or rate limits.

Note

Set SOURCEMAP_LLM_LOG=1 to record every LLM request and response to a timestamped YAML file. Logs land in .sourcemap/logs/ by default (or inside the directory set by SOURCEMAP_MAPS_DIR). Each enrich session produces one file (llm-YYYYMMDD-HHMMSSffffff.yaml) containing one YAML document per enriched file — useful for debugging prompts or auditing model output.
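
If you need to locate or generate matching log names in your own tooling, the pattern can be reproduced with `strftime`. This sketch assumes the trailing `ffffff` stands for six-digit microseconds; `llm_log_name` is a hypothetical helper, not part of sourcemap.

```python
from datetime import datetime


def llm_log_name(started_at: datetime) -> str:
    # %f is zero-padded microseconds (6 digits), assumed to match "ffffff".
    return started_at.strftime("llm-%Y%m%d-%H%M%S%f.yaml")


print(llm_log_name(datetime(2025, 4, 18, 9, 30, 5, 123456)))
# prints llm-20250418-093005123456.yaml
```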

↑ back to top

2. Prerequisites

Requirement	Version	Notes
uv	any	Used for installation and tool management
Python	3.11+	Managed automatically by `uv tool install`
An OpenAI-compatible LLM	—	Required only for `sourcemap enrich`

Note

uv tool install pulls the correct Python version automatically. You do not need to install Python separately.

Important

sourcemap enrich calls an LLM. Without a reachable endpoint (SOURCEMAP_LLM_URL), walk and stats work fine — only enrichment is blocked.

Installing uv

macOS

curl -LsSf https://astral.sh/uv/install.sh | sh

Or via Homebrew:

brew install uv

Linux

curl -LsSf https://astral.sh/uv/install.sh | sh

Add ~/.local/bin to your PATH if not already present (the installer will prompt you).

Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Or via WinGet:

winget install --id=astral-sh.uv -e

After installation, restart your terminal and verify with uv --version.

↑ back to top

3. Installation

uv tool install "git+https://github.com/lipex360x/sourcemap-indexer.git@main"

To upgrade:

uv tool upgrade sourcemap-indexer

To uninstall:

uv tool uninstall sourcemap-indexer

The binary lives at ~/.local/bin/sourcemap. The tool environment is at ~/.local/share/uv/tools/sourcemap-indexer/.

↑ back to top

4. Quickstart

cd <your-project>
sourcemap init    # create .sourcemap/, .sourcemapignore, index.db
sourcemap walk    # scan files and sync into SQLite
sourcemap enrich  # call LLM to annotate each file
sourcemap stats   # auto-walks first, then shows totals and pending files

Note

sourcemap stats automatically runs walk before displaying data — no need to run walk manually before stats.

↑ back to top

5. Commands

All commands are invoked as sourcemap <command>.

Setup

Command	Description
`init`	Create the maps directory, `.sourcemapignore`, and `index.db`
`walk`	Scan files and sync metadata into SQLite

Enrichment

Command	Description
`enrich`	Send pending files to the LLM
`stale`	List files whose content changed since the last enrich run

enrich flags:

Flag	Description
`--limit N`	Process at most N files per run
`--force`	Re-enrich already enriched files
`--file <path>`	Target a single specific file
`--layer <layer>`	Filter by architectural layer
`--language <lang>`	Filter by language
`-m "<msg>"`	Inject an extra instruction into the LLM prompt
`--with-context`	Inject depth-1 import context from indexed dependencies into the prompt (Python, TypeScript, JavaScript, TSX; off by default)
`--export-llm-prompt`	Write the active prompt to a `.md` file before running (defaults to `prompt.md` inside the maps directory)
`--output <path>`	Destination `.md` file for `--export-llm-prompt`

--with-context resolves each file's direct imports (depth 1 only), looks up their purpose from the SQLite index, and prepends a context block to the LLM prompt:

Context from direct imports:
- src/domain/cart.py: validates cart items and calculates totals
- src/infra/payment.py: handles Stripe API calls

Pending files are automatically sorted by their dependency graph (topological order) before enrichment — leaf files are processed first. This means --with-context produces non-empty context blocks in a single pass, even on a fresh index.
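
The ordering idea can be illustrated with the standard library's `graphlib`. This is a sketch under assumptions: the import graph below is made up, `enrichment_order` is a hypothetical name, and the README does not say which algorithm or library sourcemap uses internally.

```python
from graphlib import TopologicalSorter

# Hypothetical import graph: each file maps to the files it imports directly.
imports = {
    "src/cli/checkout.py": {"src/domain/cart.py", "src/infra/payment.py"},
    "src/domain/cart.py": {"src/domain/money.py"},
    "src/infra/payment.py": set(),
    "src/domain/money.py": set(),
}


def enrichment_order(graph: dict[str, set[str]]) -> list[str]:
    # TopologicalSorter treats the mapped values as predecessors, so a file
    # is emitted only after every file it imports.
    return list(TopologicalSorter(graph).static_order())


order = enrichment_order(imports)
print(order)
```

With this order, `src/domain/cart.py` already has a purpose in the index by the time `src/cli/checkout.py` is enriched. Note that `TopologicalSorter` raises `CycleError` on circular imports; how sourcemap treats cycles is not covered here.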

Supported languages: Python, TypeScript, JavaScript, TSX. For TS/JS/TSX, the extractor returns extension candidates (.ts, .tsx, .js, .jsx, index.ts, index.tsx) and the index disambiguates automatically. export … from re-exports and tsconfig path aliases are not resolved.

Constraints: depth 1 only (no transitive traversal); context is capped at 2000 characters — imports beyond the budget are dropped silently; unknown languages and imports not yet indexed produce no context (silent degradation).
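
To make the character budget concrete, here is a small sketch of a capped context builder. It assumes entries are added in order until the next one would exceed the budget; whether sourcemap stops at the first overflow or keeps trying shorter entries is not stated, and `build_context` is a hypothetical name.

```python
CONTEXT_BUDGET = 2000  # character cap stated above


def build_context(purposes: list[tuple[str, str]], budget: int = CONTEXT_BUDGET) -> str:
    """Build the context block, dropping entries once the budget is spent."""
    lines = ["Context from direct imports:"]
    used = len(lines[0])
    for path, purpose in purposes:
        entry = f"- {path}: {purpose}"
        if used + 1 + len(entry) > budget:  # +1 for the joining newline
            break                           # remaining imports are dropped silently
        lines.append(entry)
        used += 1 + len(entry)
    # No resolvable imports means no context block at all.
    return "\n".join(lines) if len(lines) > 1 else ""


print(build_context([
    ("src/domain/cart.py", "validates cart items and calculates totals"),
    ("src/infra/payment.py", "handles Stripe API calls"),
]))
```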

Exploration

Command	Description
`brief`	Single-call project briefing — architecture, domain files, tags, side effects, risk areas (includes project metadata when `.sourcemap/project.yaml` is present)
`brief --verbose` (or `-v`)	Same as `brief` plus a Files by layer section listing every enriched file with its 1-line `purpose`, grouped by layer — use when aggregate counts hide the concept you are looking for (common on documentation-heavy projects)
`chapters`	Table of contents — enriched files grouped by layer and sorted by path (ideal for documentation-heavy projects)
`contracts`	Invariants grouped by layer and file — the semantic contracts captured during enrichment
`validate`	CI gate — verify every file on disk is indexed. Outputs `PASS:sourcemap-db` (exit 0) or one `MISSING:path` per unindexed file (exit 1). Run after `walk` in pre-commit hooks
`profile`	Language distribution, inferred layers, test ratio, top files by size
`stats`	Auto-runs walk; counts by layer and language; bar width = relative file count; green = enriched, yellow = pending
`overview`	Layer × language matrix
`domain`	Enriched domain-layer files with their purpose
`effects`	Files with network or git side effects
`tags`	Top 30 semantic tags by frequency
`unstable`	Experimental or deprecated files
`find`	Search files by tag, layer, or language
`show <path>`	Full metadata for a specific file
`query "<sql>"`	Free-form SQL against the index database

chapters and contracts flags:

Flag	Description
`--layer L`	Restrict the output to a single layer

stats flags:

Flag	Description
`--files`	List pending files below the counts
`--page N`	Paginate the pending list (requires `--files`)

find flags:

Flag	Description
`--tag T`	Filter by semantic tag
`--layer L`	Filter by architectural layer
`--language L`	Filter by language

Maintenance

Command	Description
`reset`	Delete the index (offers a timestamped backup before wiping)
`restore`	Restore `index.db` from a previously saved `.bak` file
`install-skill`	Copy the skill file to your AI assistant's skills directory (`--target <dir>`)

↑ back to top

6. Configuration

sourcemap enrich reads a .env file from the project root before resolving env vars. Variables already present in the shell environment take precedence.
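
The precedence rule can be sketched as follows. This is a simplified illustration, not sourcemap's loader: `load_dotenv_text` is a hypothetical helper, it ignores quoting and `export` prefixes, and it works on a plain dict that stands in for `os.environ`.

```python
def load_dotenv_text(text: str, environ: dict[str, str]) -> None:
    """Apply KEY=VALUE lines to environ without overriding existing keys."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # setdefault leaves variables that are already set untouched.
        environ.setdefault(key.strip(), value.strip())


env = {"SOURCEMAP_LLM_MODEL": "from-shell"}
load_dotenv_text(
    "# .env\nSOURCEMAP_LLM_PROVIDER=http\nSOURCEMAP_LLM_MODEL=from-dotenv\n",
    env,
)
print(env)
```

The shell value of `SOURCEMAP_LLM_MODEL` survives, while `SOURCEMAP_LLM_PROVIDER` is filled in from the file.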

Pick a provider

Provider	Auth	Cost	Best for
`http`	API key (Bearer)	per-token	OpenAI / OpenRouter / z.ai / local LM Studio / any OpenAI-compatible endpoint
`claude-cli`	Claude Code login	Claude.ai subscription	Claude Code subscribers
`opencode`	OpenCode CLI config	depends on configured backend	OpenCode users routing through their existing setup
`gemini-cli`	Google OAuth	free tier (~1 000 req/day)	Gemini free-tier users

Set SOURCEMAP_LLM_PROVIDER to one of the above (default: http).

Common variables

These apply to every provider:

Variable	Default	Description
`SOURCEMAP_LLM_PROVIDER`	`http`	Backend selector — see table above
`SOURCEMAP_LLM_LOG`	(off)	`1` writes LLM request/response logs to `logs/` inside the maps directory
`SOURCEMAP_PAGE_SIZE`	`20`	Number of pending files shown per page in `stats`
`SOURCEMAP_MAPS_DIR`	`.sourcemap`	Output directory for `index.db`, `index.yaml`, `layers.yaml`, and logs — relative to project root or absolute
`SOURCEMAP_IMPORT_LLM_PROMPT`	(off)	Path to a `.md` file — `enrich` reads it and sends its contents as the system prompt instead of the built-in default. Must have `.md` extension

Tip

Typical workflow: run sourcemap enrich --export-llm-prompt once to dump the default prompt, edit the generated file, then set SOURCEMAP_IMPORT_LLM_PROMPT to its path for subsequent runs.

After choosing a provider, copy the matching .env template below and run:

sourcemap enrich --limit 10

`http` — OpenAI-compatible endpoint (default)

Provider-specific variables:

Variable	Required	Description
`SOURCEMAP_LLM_URL`	yes	Endpoint URL (any OpenAI-compatible chat completions API)
`SOURCEMAP_LLM_MODEL`	yes	Model name passed to the endpoint
`SOURCEMAP_LLM_API_KEY`	depends	Bearer token — required by hosted providers, not needed for local servers

.env — OpenAI:

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=http
SOURCEMAP_LLM_URL=https://api.openai.com/v1/chat/completions
SOURCEMAP_LLM_MODEL=gpt-4o
SOURCEMAP_LLM_API_KEY=sk-...

.env — OpenRouter (free tier available):

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=http
SOURCEMAP_LLM_URL=https://openrouter.ai/api/v1/chat/completions
SOURCEMAP_LLM_MODEL=openai/gpt-oss-120b:free
SOURCEMAP_LLM_API_KEY=sk-or-v1-...
SOURCEMAP_LLM_LOG=1

.env — z.ai:

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=http
SOURCEMAP_LLM_URL=https://api.z.ai/api/coding/paas/v4/chat/completions
SOURCEMAP_LLM_MODEL=glm-5.1
SOURCEMAP_LLM_API_KEY=your-api-key

.env — local (LM Studio, no API key needed):

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=http
SOURCEMAP_LLM_URL=http://localhost:1234/v1/chat/completions
SOURCEMAP_LLM_MODEL=your-loaded-model-name

`claude-cli` — Claude Code subscription

Note

Runs via claude -p (Claude Code CLI). Requires Claude Code installed and authenticated — does not work with other claude CLI tools.

When SOURCEMAP_LLM_PROVIDER=claude-cli, the SOURCEMAP_LLM_URL, SOURCEMAP_LLM_MODEL, and SOURCEMAP_LLM_API_KEY variables are ignored — you can keep them in .env without conflict.

Provider-specific variables:

Variable	Default	Description
`SOURCEMAP_LLM_CLI_MODEL`	(Claude default)	Model — e.g. `claude-haiku-4-5-20251001`, `claude-sonnet-4-6`, `claude-opus-4-7`
`SOURCEMAP_LLM_CLI_EFFORT`	(Claude default)	Thinking budget — `low`, `medium`, `high`, `xhigh`, `max`

Setup:

npm install -g @anthropic-ai/claude-code
claude auth login

.env:

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=claude-cli
SOURCEMAP_LLM_CLI_MODEL=claude-sonnet-4-6
SOURCEMAP_LLM_CLI_EFFORT=high
SOURCEMAP_LLM_LOG=1

`opencode` — OpenCode CLI

Note

Runs via opencode run. Requires OpenCode installed and configured with at least one model provider.

When SOURCEMAP_LLM_PROVIDER=opencode, the SOURCEMAP_LLM_URL, SOURCEMAP_LLM_MODEL, SOURCEMAP_LLM_API_KEY, and SOURCEMAP_LLM_CLI_EFFORT variables are ignored.

Provider-specific variables:

Variable	Default	Description
`SOURCEMAP_LLM_CLI_MODEL`	(OpenCode default)	Any model ID recognised by your OpenCode config — e.g. `anthropic/claude-sonnet-4-6`, `openrouter/openai/gpt-oss-120b:free`

Setup:

npm install -g opencode-ai

.env:

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=opencode
SOURCEMAP_LLM_CLI_MODEL=openrouter/openai/gpt-oss-120b:free
SOURCEMAP_LLM_LOG=1

Note

When routing through OpenRouter, set your API key in OpenCode's own provider config — sourcemap passes the prompt to opencode run and does not forward SOURCEMAP_LLM_API_KEY to it.

`gemini-cli` — Google Gemini free tier

Note

Runs via gemini -p (Gemini CLI) authenticated with a personal Google account. The default model gemini-3-flash-preview is recommended — gemini-2.5-pro exists but the free tier exhausts quota quickly under enrichment workloads.

When SOURCEMAP_LLM_PROVIDER=gemini-cli, the SOURCEMAP_LLM_URL, SOURCEMAP_LLM_MODEL, SOURCEMAP_LLM_API_KEY, and SOURCEMAP_LLM_CLI_EFFORT variables are ignored.

Provider-specific variables:

Variable	Default	Description
`SOURCEMAP_LLM_CLI_MODEL`	(Gemini default — `gemini-3-flash-preview`)	Model — e.g. `gemini-3-flash-preview`, `gemini-2.5-pro`

Setup:

brew install gemini-cli
gemini   # interactive — sign in with your Google account on first run

.env:

# .env  (add to .gitignore)
SOURCEMAP_LLM_PROVIDER=gemini-cli
SOURCEMAP_LLM_LOG=1

Note

sourcemap always passes --skip-trust to gemini so headless runs are not blocked by the trusted-folder gate. Stderr noise from gemini (housekeeping warnings, IDE-companion errors) is ignored — only the response field of the JSON output is used. Quota errors (exhausted your capacity) short-circuit to gemini-cli-quota-exhausted instead of waiting for the binary's internal retry loop.

↑ back to top

7. Ignoring files

.sourcemapignore uses the same syntax as .gitignore. Both files are read automatically — no extra config needed.

Built-in defaults (always excluded)

node_modules/   .git/         .venv/        __pycache__/
dist/           build/        .next/        .turbo/
coverage/       .sourcemap/   *.pyc         *.min.js
*.lock          *.db          *.sqlite      *.map

If you change SOURCEMAP_MAPS_DIR, add the new directory to .sourcemapignore as well so it is not indexed.

Add project-specific patterns to .sourcemapignore:

# exclude by extension
*.png
*.jpg
*.svg
*.ico
*.woff2

# exclude directories
secrets/
storybook-static/
public/assets/

# exclude specific files
src/generated/schema.ts

Pattern rules:

Pattern	Effect
`*.png`	All `.png` files anywhere in the tree
`assets/`	Entire directory (trailing slash = directory)
`src/generated/`	Subdirectory under a specific path
`#` at line start	Comment — line is ignored
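
For intuition, the pattern shapes in the table can be approximated with `fnmatch`. This is a deliberately simplified sketch: it does not implement full gitignore semantics (no negation with `!`, no `**`, and `*` may cross `/`), and `is_ignored` is a hypothetical helper rather than sourcemap's matcher.

```python
from fnmatch import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified matcher for the pattern shapes listed in the table above."""
    parts = path.split("/")
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue                              # blank line or comment
        if pattern.endswith("/"):
            directory = pattern.rstrip("/")
            if "/" in directory:
                # Path-anchored directory such as src/generated/
                if path.startswith(directory + "/"):
                    return True
            elif directory in parts[:-1]:
                # Bare directory name such as assets/ matches at any depth
                return True
        elif "/" in pattern:
            if fnmatch(path, pattern):            # one specific file path
                return True
        elif fnmatch(parts[-1], pattern):         # e.g. *.png anywhere in the tree
            return True
    return False


patterns = ["# images", "*.png", "assets/", "src/generated/"]
print(is_ignored("docs/logo.png", patterns), is_ignored("src/app.py", patterns))
```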

↑ back to top

8. Custom layers

By default, the LLM assigns one of the built-in layers (domain, infra, application, cli, lib, config, hook, doc, test, unknown). If your project uses a different architecture, you can declare additional layer names in .sourcemap/layers.yaml:

layers:
  - presentation
  - gateway
  - jobs

sourcemap init creates the file with a commented-out example. Any layer name listed here is treated as valid — the LLM can assign it and find --layer will match it.

Note

Built-in layers always remain valid. layers.yaml only adds names; it does not replace the defaults.
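
The additive behaviour amounts to a set union. The sketch below uses the built-in layer names listed above; the fallback to `unknown` for unrecognised names is an assumption made for illustration, and `valid_layers` and `normalise_layer` are hypothetical names.

```python
DEFAULT_LAYERS = frozenset(
    {"domain", "infra", "application", "cli", "lib",
     "config", "hook", "doc", "test", "unknown"}
)


def valid_layers(user_layers: list[str]) -> frozenset[str]:
    # User-defined names extend the defaults; they never replace them.
    return DEFAULT_LAYERS | frozenset(user_layers)


def normalise_layer(candidate: str, allowed: frozenset[str]) -> str:
    # Assumed fallback: anything outside the allowed set becomes "unknown".
    return candidate if candidate in allowed else "unknown"


allowed = valid_layers(["presentation", "gateway", "jobs"])
print(sorted(allowed))
```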

Tip

After adding new layers, run sourcemap enrich --force --layer unknown to re-classify files that were previously unrecognised.

Documentation-heavy projects

Default layers are code-oriented. For projects that are primarily documentation (blueprints, standards, specifications), declare layers that match the document taxonomy so brief, chapters, and find --layer work at the right granularity. Example for a governance/blueprint repository:

layers:
  - foundations    # principles, philosophy, testing mandates
  - enforcement    # tools, scripts, cognitive-load limits
  - operations     # logging, error contracts, roadmap
  - shared         # stack-invariant configuration
  - stacks         # per-stack tool mappings
  - meta           # manifest-like documents

After declaring the layers, run sourcemap enrich --force so the LLM reclassifies every file using the richer taxonomy.

Note

The system prompt is extended automatically when custom layers are declared. The LLM is told to prefer a user-defined layer over a generic default (doc, config, unknown) whenever the file's top-level directory name matches a custom layer.

If a mismatch slips through anyway — e.g. the model chose doc for a file under foundations/ — sourcemap enrich prints a highlighted Layer mismatches section at the end of the run. No need to open logs or query the DB: the warning lists each path with its chosen and expected layer. Re-run sourcemap enrich --force --file <path> to retry.

↑ back to top

9. Project metadata

brief can display a short metadata header when .sourcemap/project.yaml exists. All fields are optional — any field left out is skipped from the output:

name: engineering-blueprint
version: 1
purpose: Language-agnostic foundation for building software with Claude-assisted workflows.
audience:
  - claude
  - engineer
license: MIT

audience accepts a list or a single string. version accepts any scalar (string, integer, etc.) and is rendered verbatim.

When the file is absent, brief renders without a project section — no regressions for existing projects.

Note

project.yaml is purely informational. It does not change enrichment behaviour and is never sent to the LLM.

↑ back to top

10. AI assistant skill

Install the bundled skill file so your AI assistant can query the index directly:

# Claude Code
sourcemap install-skill --target ~/.claude/skills

# Any other tool — point to its skills directory
sourcemap install-skill --target <your-tool-skills-dir>

↑ back to top

11. Post-commit hook (auto-walk on every commit)

bash scripts/bash/install-hook.sh

Installs a post-commit hook that runs sourcemap walk after every commit, keeping the index current.

Note

Enrichment is not automatic — it calls the LLM and can be slow. Run sourcemap enrich manually when you want updated metadata.

↑ back to top

12. SQLite schema

One core table (items) holds a row per file. Three satellite tables store the multi-valued LLM output (a file has many tags, many side effects, many invariants):

erDiagram
    items ||--o{ tags : has
    items ||--o{ side_effects : has
    items ||--o{ invariants : has

    items {
        int id PK
        string path
        string name
        string language
        string layer
        string stability
        string purpose
        int lines
        int size_bytes
        string content_hash
        string llm_hash
        bool needs_llm
        timestamp deleted_at
        timestamp llm_at
    }
    tags {
        int item_id FK
        string tag
    }
    side_effects {
        int item_id FK
        string effect
    }
    invariants {
        int item_id FK
        string invariant
    }

Walk fills: path, name, language, lines, size_bytes, content_hash, needs_llm, deleted_at. Enrich fills: purpose, layer, stability, llm_hash, llm_at, plus rows in tags / side_effects / invariants.

Side effects: writes_fs | spawns_process | network | git | environ
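
As an illustration of how the satellite tables join back to `items`, here is a self-contained sketch against an in-memory database. The table and column names come from the diagram above; the SQL column types, the sample rows, and the helper `files_with_effect` are assumptions, and the real schema has more columns and may declare constraints differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        path TEXT, layer TEXT, purpose TEXT, deleted_at TEXT
    );
    CREATE TABLE tags (item_id INTEGER REFERENCES items(id), tag TEXT);
    CREATE TABLE side_effects (item_id INTEGER REFERENCES items(id), effect TEXT);

    -- Sample rows, invented for this example
    INSERT INTO items VALUES
        (1, 'src/auth/login.ts', 'application', 'sample purpose for login', NULL),
        (2, 'src/infra/payment.py', 'infra', 'handles Stripe API calls', NULL);
    INSERT INTO tags VALUES (1, 'authentication'), (2, 'payments');
    INSERT INTO side_effects VALUES (1, 'network'), (2, 'network'), (2, 'writes_fs');
    """
)


def files_with_effect(connection: sqlite3.Connection, effect: str) -> list[tuple[str, str]]:
    """Return (path, purpose) for live files that declare the given side effect."""
    rows = connection.execute(
        """
        SELECT items.path, items.purpose
        FROM items
        JOIN side_effects ON side_effects.item_id = items.id
        WHERE side_effects.effect = ? AND items.deleted_at IS NULL
        ORDER BY items.path
        """,
        (effect,),
    )
    return rows.fetchall()


print(files_with_effect(conn, "writes_fs"))
```

The same SELECT statement can be passed to `sourcemap query "<sql>"` against a real index, adjusted to the actual schema.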

↑ back to top

13. Dev setup

git clone https://github.com/lipex360x/sourcemap-indexer.git
cd sourcemap-indexer
uv sync
uv run pytest

↑ back to top

14. Code quality

Every commit passes a pre-commit pipeline that enforces the following gates:

Using `validate` in pre-commit hooks

sourcemap validate is designed as a CI gate: it checks that every file on disk was indexed by the last walk run. Pair them in a pre-commit hook:

sourcemap walk --root "$PROJECT_ROOT"
sourcemap validate --root "$PROJECT_ROOT"

Exit codes: 0 = all files indexed, 1 = one or more files missing from the index. Output is machine-parseable: PASS:sourcemap-db on success, MISSING:<path> per unindexed file on failure.
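
Because the output is line-oriented, a wrapper script can parse it in a few lines. The sketch below relies only on the two output forms documented above; `parse_validate_output` is a hypothetical helper and the sample text is made up.

```python
def parse_validate_output(output: str) -> tuple[bool, list[str]]:
    """Return (passed, missing_paths) from the text printed by `sourcemap validate`."""
    missing: list[str] = []
    passed = False
    for line in output.splitlines():
        line = line.strip()
        if line == "PASS:sourcemap-db":
            passed = True
        elif line.startswith("MISSING:"):
            missing.append(line[len("MISSING:"):])
    # Any MISSING line means failure, even if a PASS line is also present.
    return passed and not missing, missing


sample = "MISSING:src/new_module.py\nMISSING:docs/guide.md\n"
print(parse_validate_output(sample))
```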

Automated gates (pre-commit / pre-push)

Tool	What it checks	Config
ruff	Style, imports, simplification (`SIM`), returns (`RET`), bugbear (`B`), upgrades (`UP`), security (`S`)	`pyproject.toml [tool.ruff.lint]`
ruff format	Consistent formatting (replaces Black)	`pyproject.toml [tool.ruff]`
McCabe complexity	No function exceeds cyclomatic complexity 5 (`C901`)	`pyproject.toml [tool.ruff.lint.mccabe]`
mypy	Full strict type checking — no `Any`, no untyped functions	`pyproject.toml [tool.mypy]`
bandit	Deep security scan — severity/confidence filtering, broader rule set	`pyproject.toml [tool.bandit]`
vulture	Dead code detection — unused functions and variables	—
pylint C0103	Naming convention enforcement — no abbreviations (`msg`, `cfg`, `err`, …)	`pyproject.toml [tool.pylint]`
pytest + coverage	Test suite must pass at ≥ 95% line coverage	`pyproject.toml [tool.pytest]`

Testing strategy

- TDD mandatory: every behaviour is covered by a test written before the implementation (Red → Green)
- No mocks on persistence: tests hit a real in-memory SQLite database (":memory:"); concurrency tests use a file-based DB via tmp_path (:memory: cannot be shared between threads)
- No mocks on the filesystem: tests use tmp_path fixtures with real files
- Integration tests run the full CLI via typer.testing.CliRunner end-to-end
- Coverage minimum: 95%, enforced both by pytest and by the pre-push hook

Design decisions

Decision	Why
`Either[str, T]` monad	Explicit error propagation without exceptions — every fallible function returns `Left(error_token)` or `Right(value)`. No hidden control flow.
`Layer = str` (not StrEnum)	User-defined layers loaded from `layers.yaml` are unknown at import time. A `str` alias accepts any value; validation happens at the application boundary in `run_enrich`.
No comments in source	Names carry meaning. Comments that explain what code does rot as code evolves; the only permitted comments are for non-obvious why — hidden constraints, workarounds, subtle invariants.
Single output directory (`.sourcemap/`)	Config (`layers.yaml`, `ignore`) and data (`index.db`, `index.yaml`, `logs/`) live under one root. No two directories for the same concern.
`_DEFAULT_LAYERS \| user_layers`	The full valid-layer set is the union of built-in defaults and user-defined additions, computed at startup and passed through to `run_enrich` and `LlmClient`.
`BEGIN IMMEDIATE` + WAL in `init_db`	Migration apply is wrapped in a `BEGIN IMMEDIATE` transaction so two concurrent processes (e.g. parallel `walk` + `enrich`) cannot both pass the "already applied?" check and duplicate a migration. WAL journal mode reduces `SQLITE_BUSY` errors under concurrent readers.
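
To show what the `Either[str, T]` style looks like in practice, here is a minimal sketch. Only the names `Left`, `Right`, and `Either` come from the table above; the dataclass layout, the field names, and the `parse_limit` example are assumptions and may differ from the project's own types.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

L = TypeVar("L")
R = TypeVar("R")


@dataclass(frozen=True)
class Left(Generic[L]):
    error: L    # error token, e.g. "invalid-limit"


@dataclass(frozen=True)
class Right(Generic[R]):
    value: R    # successful result


Either = Union[Left[L], Right[R]]


def parse_limit(raw: str) -> Either[str, int]:
    """Parse a positive integer, returning an error token instead of raising."""
    if not raw.isdecimal() or int(raw) == 0:
        return Left("invalid-limit")
    return Right(int(raw))


def describe(result: Either[str, int]) -> str:
    # Callers branch on the variant explicitly; nothing is thrown.
    if isinstance(result, Left):
        return f"error: {result.error}"
    return f"ok: {result.value}"


print(describe(parse_limit("10")), describe(parse_limit("ten")))
```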

↑ back to top
