# LLM Integration — Feature Planning

> **Core principle:** Compute once at build time, serve compressed at query time. The graph tells you what's connected, the LLM tells you what it means, and the consuming AI gets both without reading raw code.

## Architecture

Two layers:

1. **Build-time LLM enrichment** — during `codegraph build`, an LLM annotates each function/class with semantic metadata (summaries, purpose, side effects, etc.) and stores it in the graph DB.
2. **Query-time token savings** — the consuming AI model (via MCP) gets pre-digested context instead of raw source code.

```
Code changes → codegraph build (+ LLM enrichment) → SQLite DB with semantic metadata
                                                                ↓
                                                  AI model queries via MCP
                                                                ↓
                                                  Gets structured summaries,
                                                  not raw code → saves tokens
```
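
The build-time layer could be sketched as a simple incremental pass. Everything here is an assumption for illustration: the node shape (`{ id, source, hash }`), the `llm.summarize` provider interface, and the `Map` standing in for the SQLite-backed store. A real provider call would be async and likely batched.

```javascript
// Incremental enrichment pass at build time (sketch; shapes are assumed).
function enrichGraph(nodes, llm, store) {
  for (const node of nodes) {
    // Incremental: skip nodes whose code hash is unchanged since last build.
    const prev = store.get(node.id);
    if (prev && prev.hash === node.hash) continue;
    store.set(node.id, { summary: llm.summarize(node.source), hash: node.hash });
  }
  return store;
}

// Usage with stand-ins for the LLM provider and the store:
const store = enrichGraph(
  [{ id: "fn1", source: "function add(a, b) { return a + b; }", hash: "h1" }],
  { summarize: (src) => "Adds two numbers." },
  new Map()
);
```

Keying the skip check on a content hash is what makes re-runs cheap: unchanged nodes never hit the LLM.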

---

## Features by Category

### Understanding & Documentation

#### "What problem does this function solve?"
- `summaries` table — LLM-generated one-liner per node, stored at build time
- MCP tool: `explain_purpose <name>` — returns summary + caller context ("it's called by X to do Y")

#### "Summarize this module in plain English"
- Module-level rollup summaries — aggregate function summaries + dependency direction into a module narrative
- MCP tool: `explain_module <file>` — returns module purpose, key exports, role in the system

#### "Auto-generate meaningful docstrings"
- `docstrings` column on nodes — LLM-generated, aware of callers/callees/types
- CLI command: `codegraph annotate` — generates or updates docstrings for changed functions
- Diff-aware: only regenerate for functions whose code or dependencies changed

---

### Code Review & Quality

#### "Is this function doing too much?"
- `complexity_notes` column — LLM assessment stored at build time: responsibility count, cohesion rating
- Graph metrics feed into the assessment: fan-in, fan-out, edge count
- MCP tool: `assess <name>` — returns complexity rating + specific concerns
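
The graph-metric half of `assess` is cheap to compute from the edge list alone. A minimal sketch, assuming an edge shape of `{ from, to }` and an arbitrary fan-out threshold; the LLM layer would add the qualitative responsibility/cohesion notes on top of these numbers:

```javascript
// Compute fan-in/fan-out for one node and apply a crude "too much" heuristic.
function graphMetrics(name, edges) {
  const fanIn = edges.filter((e) => e.to === name).length;   // callers
  const fanOut = edges.filter((e) => e.from === name).length; // callees
  return {
    fanIn,
    fanOut,
    // Crude heuristic (assumed threshold): many callees often means many responsibilities.
    doingTooMuch: fanOut > 7,
  };
}

// Hypothetical mini-graph:
const m = graphMetrics("handleSubmit", [
  { from: "handleSubmit", to: "validate" },
  { from: "handleSubmit", to: "createOrder" },
  { from: "router", to: "handleSubmit" },
]);
// m → { fanIn: 1, fanOut: 2, doingTooMuch: false }
```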

#### "Are there naming inconsistencies?"
- `naming_conventions` metadata per module — detected patterns (camelCase, snake_case, verb-first, etc.)
- CLI command: `codegraph lint-names` — LLM compares names against detected conventions, flags outliers

#### "Smart PR review"
- `diff-review` command — takes a diff, walks the graph for affected nodes, fetches their summaries
- Returns: what changed, what's affected, risk assessment, suggested review focus areas
- MCP tool: `review_diff <ref>` — structured review the consuming AI can relay to the user

---

### Refactoring Assistance

#### "Can I safely split this file?"
- `split_analysis <file>` — graph identifies clusters of tightly-coupled functions within the file, LLM suggests groupings
- Returns: proposed split, edges that would cross file boundaries, risk of circular imports
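
The clustering step behind `split_analysis` can be as simple as connected components over intra-file call edges. A sketch under assumed shapes (`fns` is the file's function names, edges are `{ from, to }`); a real version would also weight edge strength and check for import cycles across the proposed boundary:

```javascript
// Group a file's functions into connected components using only
// intra-file edges; each component is a candidate new file.
function splitClusters(fns, edges) {
  const inFile = new Set(fns);
  const adj = new Map(fns.map((f) => [f, []]));
  for (const { from, to } of edges) {
    if (inFile.has(from) && inFile.has(to)) {
      adj.get(from).push(to);
      adj.get(to).push(from); // treat coupling as undirected for clustering
    }
  }
  const seen = new Set();
  const clusters = [];
  for (const f of fns) {
    if (seen.has(f)) continue;
    const stack = [f], cluster = [];
    while (stack.length) {
      const cur = stack.pop();
      if (seen.has(cur)) continue;
      seen.add(cur);
      cluster.push(cur);
      stack.push(...adj.get(cur));
    }
    clusters.push(cluster.sort());
  }
  return clusters;
}

// Two independent pairs → two proposed files:
const clusters = splitClusters(
  ["a", "b", "c", "d"],
  [{ from: "a", to: "b" }, { from: "c", to: "d" }]
);
// clusters → [["a", "b"], ["c", "d"]]
```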

#### "Which functions are extraction candidates?"
- `extraction_candidates` query — find functions called from multiple modules (high fan-in, low internal coupling)
- LLM ranks them by utility: "this is a pure helper" vs "this has side effects, risky to move"

#### "Suggest a backward-compatible signature change"
- `signature_impact <name>` — graph provides all call sites, LLM reads each one
- Returns: suggested new signature, adapter pattern if needed, list of call sites that need updating
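
The graph half of `signature_impact` is just a call-site lookup the LLM can then read one by one. A minimal sketch; the edge (`{ from, to }`) and node (`{ file }`) shapes are assumptions, not the real schema:

```javascript
// Collect every call site of `name`, with the caller's file for context.
function callSites(name, edges, nodes) {
  return edges
    .filter((e) => e.to === name)
    .map((e) => ({ caller: e.from, file: nodes[e.from]?.file ?? "?" }));
}

// Hypothetical graph with two callers of createOrder:
const sites = callSites(
  "createOrder",
  [
    { from: "handleSubmit", to: "createOrder" },
    { from: "retryJob", to: "createOrder" },
    { from: "handleSubmit", to: "validate" },
  ],
  { handleSubmit: { file: "routes.js" }, retryJob: { file: "jobs.js" } }
);
// sites → [{ caller: "handleSubmit", file: "routes.js" },
//          { caller: "retryJob", file: "jobs.js" }]
```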

---

### Architecture & Design

#### "Why does module A depend on module B?"
- `dependency_path <A> <B>` — graph finds shortest path(s), LLM narrates each hop
- Returns: "A imports X from B because A needs to validate tokens, and B owns the token schema"

#### "What's the most fragile part of the codebase?"
- `fragility_report` — combines graph metrics (high fan-in + high fan-out + on many paths) with LLM reasoning
- `risk_score` column per node — computed at build time from graph centrality + LLM complexity assessment
- CLI command: `codegraph hotspots` — ranked list of riskiest nodes with explanations
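
One way `risk_score` could blend the two signals: a cheap centrality proxy (fan-in × fan-out, log-damped) plus an LLM-assigned complexity rating in [0, 1]. The weights here are assumptions for illustration, not a tuned formula:

```javascript
// Blend graph centrality with the LLM's complexity rating into one score.
function riskScore({ fanIn, fanOut, llmComplexity }) {
  const centrality = Math.log1p(fanIn * fanOut); // dampen huge hubs
  return +(0.6 * centrality + 0.4 * llmComplexity).toFixed(3);
}

// A leaf with no callers scores only its LLM-rated complexity share:
const leaf = riskScore({ fanIn: 0, fanOut: 5, llmComplexity: 1 }); // → 0.4
// A well-connected node with the same rating scores higher:
const hub = riskScore({ fanIn: 10, fanOut: 10, llmComplexity: 1 });
```

Storing the computed number per node (rather than recomputing per query) is what lets `codegraph hotspots` be a plain `ORDER BY risk_score` over the table.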

#### "Suggest better module boundaries"
- `boundary_analysis` — graph clustering algorithm identifies tightly-coupled groups that span modules
- LLM suggests reorganization: "these 4 functions in 3 different files all deal with auth, consider consolidating"

---

### Onboarding & Navigation

#### "Where should I start reading?"
- `entry_points` query — graph finds roots (high fan-out, low fan-in) + LLM ranks by importance
- `onboarding_guide` command — generates a reading order based on dependency layers
- MCP tool: `get_started` — returns ordered list: "start here, then read this, then this"
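
The root-finding part of `entry_points` falls straight out of the degree counts. A sketch with assumed shapes (`nodes` as names, edges as `{ from, to }`); the LLM would re-rank this shortlist by importance:

```javascript
// Roots: nodes nothing calls (fan-in 0) that call something (fan-out > 0),
// sorted by fan-out so the biggest orchestrators come first.
function entryPoints(nodes, edges) {
  const fanIn = new Map(nodes.map((n) => [n, 0]));
  const fanOut = new Map(nodes.map((n) => [n, 0]));
  for (const { from, to } of edges) {
    fanOut.set(from, fanOut.get(from) + 1);
    fanIn.set(to, fanIn.get(to) + 1);
  }
  return nodes
    .filter((n) => fanIn.get(n) === 0 && fanOut.get(n) > 0)
    .sort((a, b) => fanOut.get(b) - fanOut.get(a));
}

// Hypothetical graph: main → util, main → helper, util → helper.
const roots = entryPoints(
  ["main", "util", "helper"],
  [
    { from: "main", to: "util" },
    { from: "main", to: "helper" },
    { from: "util", to: "helper" },
  ]
);
// roots → ["main"]
```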

#### "What's the flow when a user clicks submit?"
- `trace_flow <entry_point>` — graph walks the call chain, LLM narrates each step
- Returns sequential narrative: "1. handler validates input → 2. calls createOrder → 3. writes to DB → 4. emits event"
- `flow_narratives` table — pre-computed for key entry points at build time
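
A simplified sketch of the `trace_flow` walk, following the first callee at each hop (real traversal would branch); the per-node one-liners here stand in for the stored `summaries`:

```javascript
// Walk a call chain from `entry` and emit the numbered narrative.
function traceFlow(entry, edges, summaries, maxDepth = 10) {
  const steps = [];
  let cur = entry;
  while (cur && steps.length < maxDepth) {
    steps.push(`${steps.length + 1}. ${summaries[cur] ?? cur}`);
    cur = edges.find((e) => e.from === cur)?.to; // first callee only (sketch)
  }
  return steps.join(" → ");
}

// Hypothetical chain: handler → createOrder → saveOrder.
const narrative = traceFlow(
  "handler",
  [
    { from: "handler", to: "createOrder" },
    { from: "createOrder", to: "saveOrder" },
  ],
  {
    handler: "validates input",
    createOrder: "creates the order",
    saveOrder: "writes to DB",
  }
);
// narrative → "1. validates input → 2. creates the order → 3. writes to DB"
```

The `maxDepth` guard keeps the walk bounded on recursive call chains.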

#### "What would I need to change to add feature X?"
- `change_plan <description>` — LLM reads the description, graph identifies relevant modules, LLM maps out touch points
- Returns: files to modify, functions to change, new functions needed, test coverage gaps

---

### Bug Investigation

#### "What upstream functions could cause this bug?"
- `trace_upstream <name>` — graph walks callers recursively, LLM reads each and flags suspects
- `side_effects` column per node — pre-computed: "mutates state", "writes DB", "calls external service"
- Returns ranked list: "most likely cause is X because it modifies the same state"

#### "What are the side effects of calling this function?"
- `effect_analysis <name>` — graph walks the full callee tree, aggregates `side_effects` from every descendant
- Returns: "calling X will: write to DB (via Y), send email (via Z), log to file (via W)"
- Pre-computed at build time, invalidated when any descendant changes
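
The aggregation step of `effect_analysis` is a union of tags over the callee tree. A sketch under assumed shapes (`callees` as an adjacency map, `effects` as the per-node `side_effects` tags), made cycle-safe with a visited set:

```javascript
// Union side-effect tags over the full callee tree of `name`.
function aggregateEffects(name, callees, effects, seen = new Set()) {
  if (seen.has(name)) return new Set(); // cycle guard
  seen.add(name);
  const out = new Set(effects[name] ?? []);
  for (const child of callees[name] ?? []) {
    for (const tag of aggregateEffects(child, callees, effects, seen)) {
      out.add(tag);
    }
  }
  return out;
}

// Hypothetical tree: checkout calls saveOrder and notify.
const effects = aggregateEffects(
  "checkout",
  { checkout: ["saveOrder", "notify"] },
  { saveOrder: ["writes DB"], notify: ["sends email"] }
);
// effects → Set { "writes DB", "sends email" }
```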

---

## New Infrastructure Required

| What | Where | When computed |
|------|-------|---------------|
| `summaries` — one-line purpose per node | `nodes` table column | Build time, incremental |
| `side_effects` — mutation/IO tags | `nodes` table column | Build time, incremental |
| `complexity_notes` — risk assessment | `nodes` table column | Build time, incremental |
| `risk_score` — fragility metric | `nodes` table column | Build time, from graph + LLM |
| `flow_narratives` — traced call stories | New table | Build time for entry points |
| `module_summaries` — file-level rollups | New table | Build time, re-rolled on change |
| `naming_conventions` — detected patterns | Metadata table | Build time per module |
| LLM provider abstraction | `llm.js` | Config: local/API/none |
| Cascade invalidation | `builder.js` | When a node changes, mark dependents for re-enrichment |
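
The cascade-invalidation walk in `builder.js` could look like the sketch below: reverse the call edges, then BFS from the changed nodes so every transitive caller gets marked for re-enrichment. The edge shape (`{ from, to }` meaning "from calls to") is an assumption:

```javascript
// Mark every transitive dependent of the changed nodes as stale.
function markStale(changed, edges) {
  const callersOf = new Map(); // reverse index: callee → [callers]
  for (const { from, to } of edges) {
    if (!callersOf.has(to)) callersOf.set(to, []);
    callersOf.get(to).push(from);
  }
  const stale = new Set(changed);
  const queue = [...changed];
  while (queue.length) {
    for (const caller of callersOf.get(queue.shift()) ?? []) {
      if (!stale.has(caller)) {
        stale.add(caller);
        queue.push(caller);
      }
    }
  }
  return stale; // changed nodes plus every transitive dependent
}

// Hypothetical chain main → handler → validate; editing validate
// stales everything above it:
const stale = markStale(
  ["validate"],
  [
    { from: "main", to: "handler" },
    { from: "handler", to: "validate" },
  ]
);
// stale → Set { "validate", "handler", "main" }
```

The `stale` set is exactly what the enrichment pass would re-run on, keeping incremental builds proportional to the blast radius of a change rather than the whole graph.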