docs: reposition around always-fresh graph + optional LLM enhancement

github-actions[bot] · github-actions[bot] · commit a403accbbe97 · 2026-02-22T03:35:43.000-07:00
Shift core messaging from "zero cloud / fully local" to "always-fresh
incremental graph at zero cost, optionally enhanced with your LLM."

- README: new hero tagline, add "Why most tools can't keep up" section,
  reframe feature comparison around rebuild speed and LLM-optional mode
- COMPETITIVE_ANALYSIS: lead with incremental builds and dual-mode as
  top differentiators, add LLM provider integration to Tier 2 roadmap
- FOUNDATION: principle 1 becomes "graph is always current", principle 4
  becomes "zero-cost core, LLM-enhanced when you choose", update
  competitive position around the three questions no competitor answers
diff --git a/COMPETITIVE_ANALYSIS.md b/COMPETITIVE_ANALYSIS.md
@@ -17,7 +17,7 @@ Ranked by weighted score across 6 dimensions (each 1–5):
 | 4 | 3.9 | [harshkedia177/axon](https://github.com/harshkedia177/axon) | 29 | Python | None | 11-phase pipeline, KuzuDB, Leiden community detection, dead code, change coupling |
 | 5 | 3.8 | [anrgct/autodev-codebase](https://github.com/anrgct/autodev-codebase) | 111 | TypeScript | None | 40+ languages, 7 embedding providers, Cytoscape.js visualization, LLM reranking |
 | 6 | 3.7 | [Anandb71/arbor](https://github.com/Anandb71/arbor) | 85 | Rust | MIT | Native GUI, confidence scoring, architectural role classification, fuzzy search, MCP |
-| **7** | **3.6** | **[@optave/codegraph](https://github.com/optave/codegraph)** | — | **JS/Rust** | **Apache-2.0** | **Dual engine (native Rust + WASM), 11 languages, SQLite, MCP, semantic search, zero-cloud** |
+| **7** | **3.6** | **[@optave/codegraph](https://github.com/optave/codegraph)** | — | **JS/Rust** | **Apache-2.0** | **Sub-second incremental rebuilds, dual engine (native Rust + WASM), 11 languages, MCP, zero-cost core + optional LLM enhancement** |
 | 8 | 3.4 | [Durafen/Claude-code-memory](https://github.com/Durafen/Claude-code-memory) | 72 | Python | None | Memory Guard quality gate, persistent codebase memory, Voyage AI + Qdrant |
 | 9 | 3.3 | [NeuralRays/codexray](https://github.com/NeuralRays/codexray) | 2 | TypeScript | MIT | 16 MCP tools, TF-IDF semantic search (~50MB), dead code, complexity, path finding |
 | 10 | 3.2 | [al1-nasir/codegraph-cli](https://github.com/al1-nasir/codegraph-cli) | 11 | Python | MIT | CrewAI multi-agent system, 6 LLM providers, browser explorer, DOCX export |
@@ -77,11 +77,12 @@ Ranked by weighted score across 6 dimensions (each 1–5):
 
 | Strength | Details |
 |----------|---------|
-| **Zero-dependency deployment** | `npm install` and done. No Docker, no cloud, no API keys needed. Most competitors require Docker (Memgraph, Neo4j, Dgraph, Qdrant) or cloud APIs |
+| **Always-fresh graph (incremental rebuilds)** | File-level MD5 hashing means only changed files are re-parsed. Change 1 file in a 3,000-file project → rebuild in under a second. No other tool in this space offers this. Competitors re-index everything from scratch — making them unusable in commit hooks, watch mode, or agent-driven loops |
+| **Zero-cost core, LLM-enhanced when you choose** | The full graph pipeline (parse, resolve, query, impact analysis) runs with no API keys, no cloud, no cost. LLM features (richer embeddings, semantic search) are an optional layer on top — using whichever provider the user already works with. Competitors either require cloud APIs for core features (code-graph-rag, autodev-codebase) or offer no AI enhancement at all (CKB, axon). Nobody else offers both modes in one tool |
+| **Data goes only where you send it** | Your code reaches exactly one place: the AI agent you already chose (via MCP). No additional third-party services, no surprise cloud calls. Competitors like code-graph-rag, autodev-codebase, and Claude-code-memory send your code to additional AI providers beyond the agent you're using |
 | **Dual engine architecture** | Only project with native Rust (napi-rs) + automatic WASM fallback. Others are pure Rust OR pure JS/Python — never both |
 | **Single-repo MCP isolation** | Security-conscious default: tools have no `repo` property unless `--multi-repo` is explicitly enabled. Most competitors default to exposing everything |
-| **Incremental builds** | File-hash-based skip of unchanged files. Some competitors re-index everything |
-| **Platform binaries** | Published `@optave/codegraph-{platform}-{arch}` optional packages — true npm-native distribution |
+| **Zero-dependency deployment** | `npm install` and done. No Docker, no external databases, no Python, no SCIP toolchains. Published platform-specific binaries (`@optave/codegraph-{platform}-{arch}`) resolve automatically |
 | **Import resolution depth** | 6-level priority system with confidence scoring — more sophisticated than most competitors' resolution |
 
 ---
@@ -135,6 +136,7 @@ Ranked by weighted score across 6 dimensions (each 1–5):
 ### Tier 2: High impact, medium effort
 | Feature | Inspired by | Why |
 |---------|------------|-----|
+| **Optional LLM provider integration** | code-graph-rag, autodev-codebase | Bring-your-own provider (OpenAI, etc.) for richer embeddings and AI-powered search. Enhancement layer only — core graph never depends on it. No other tool offers both zero-cost local and LLM-enhanced modes in one package |
 | **Compound MCP tools** | CKB | `explore`/`understand` meta-tools that batch deps + fn + map into single responses. Biggest token-savings opportunity |
 | **Token counting on responses** | glimpse, arbor | tiktoken-based counts so agents know context budget consumed |
 | **Node classification** | arbor | Auto-tag Entry Point / Core / Utility / Adapter from in-degree/out-degree patterns |
@@ -153,10 +155,10 @@ Ranked by weighted score across 6 dimensions (each 1–5):
 | Feature | Why skip |
 |---------|----------|
 | Memgraph/Neo4j/KuzuDB | Our SQLite = zero Docker, simpler deployment. Query gap matters less than simplicity |
-| Multi-provider AI | We're deliberately cloud-free — that's a feature, not a limitation |
 | SCIP indexing | Would require maintaining SCIP toolchains per language. Tree-sitter + native Rust is the right bet |
 | CrewAI multi-agent | Overengineered for a code analysis tool. Keep the scope focused |
 | Clipboard/LLM-dump mode | Different product category (glimpse). We're a graph tool, not a context-packer |
+| Cloud APIs for core features | We will add LLM provider support, but as an **optional enhancement layer** — the core graph must always work with zero API keys and zero cost. This is the opposite of code-graph-rag's approach where cloud APIs are required for core functionality |
 
 ---
 
diff --git a/FOUNDATION.md b/FOUNDATION.md
@@ -8,27 +8,27 @@
 
 ## Why Codegraph Exists
 
-There are 20+ code analysis and code graph tools in the open-source ecosystem. Most require Docker, Python environments, cloud API keys, or external databases. None of them ship as a single npm package with native performance.
+There are 20+ code analysis and code graph tools in the open-source ecosystem. They all force a choice: **fast local analysis with no AI, or powerful AI features that require full re-indexing through cloud APIs on every change.** None of them give you an always-current graph that you can rebuild on every commit and optionally enhance with the LLM provider you already use.
 
-Codegraph exists to be **the code intelligence engine for the JavaScript ecosystem** — the one you `npm install` and it just works, on every platform, with nothing else to set up.
+Codegraph exists to be **the code intelligence engine that keeps up with your commits** — an always-fresh graph that works at zero cost out of the box, with optional LLM enhancement through the provider you choose. Your code only goes where you send it.
 
 ---
 
 ## Core Principles
 
 These principles define what codegraph is and is not. Every feature decision, PR review, and architectural choice should be measured against them.
 
-### 1. Zero-infrastructure deployment
+### 1. The graph is always current
 
-**Codegraph must never require anything beyond `npm install`.**
+**Codegraph must rebuild fast enough to run on every commit, every save, in every agent loop.**
 
-No Docker. No external databases. No cloud accounts. No API keys for core functionality. No Python. No Go toolchain. No manual compilation steps.
+This is our single most important differentiator. Every competitor in this space either re-indexes from scratch on every change (making them unusable in tight loops) or requires cloud API calls baked into the rebuild pipeline (making them slow and costly to run frequently).
 
-SQLite is our database because it's embedded. WASM grammars are our fallback because they run everywhere Node.js runs. Optional dependencies (`@huggingface/transformers`, `@modelcontextprotocol/sdk`) are lazy-loaded and degrade gracefully.
+File-level MD5 hashing means only changed files are re-parsed. Change one file in a 3,000-file project → rebuild in under a second. This makes commit hooks, watch mode, and AI-agent-triggered rebuilds practical. The graph is never stale.
 
-This is our single most important differentiator. Every competitor that adds Docker to their install instructions loses users we should capture.
+The core pipeline is pure local computation — tree-sitter + SQLite. No API calls, no network latency, no cost. This isn't about being anti-cloud. It's about being fast enough that the graph can stay current without waiting on anything external.
 
-*Test: can a developer on a fresh machine run `npm install @optave/codegraph && codegraph build .` with zero prior setup? If not, we broke this principle.*
+*Test: after changing one file in a 1000-file project, does `codegraph build .` complete in under 500ms? Can it run in a commit hook without the developer noticing?*
 
 ### 2. Native speed, universal reach
 
@@ -52,15 +52,17 @@ This principle extends beyond import resolution. When we add features — dead c
 
 *Test: does every query result include enough context for the consumer to judge its reliability?*
 
-### 4. Incremental by default
+### 4. Zero-cost core, LLM-enhanced when you choose
 
-**Never re-parse what hasn't changed.**
+**The full graph works with no API keys. AI features are an optional layer on top.**
 
-File-level MD5 hashing tracks what changed between builds. Only modified files get re-parsed, and their stale nodes/edges are cleaned before re-insertion. This makes watch-mode and AI-agent loops practical — rebuilds drop from seconds to milliseconds.
+The core pipeline — parse, resolve, store, query, impact analysis — runs entirely locally with zero cost. No accounts, no API keys, no cloud calls. This is the mode that runs on every commit.
 
-This is not a feature flag. It's the default behavior. The graph is always fresh with minimum work.
+LLM-powered features (richer embeddings, semantic search, AI-enhanced analysis) are an optional enhancement layer. When enabled, they use whichever provider the user already works with (OpenAI, etc.). Your code goes to exactly one place: the provider you chose. No additional third-party services, no surprise cloud calls.
 
-*Test: after changing one file in a 1000-file project, does `codegraph build .` complete in under 500ms?*
+This dual-mode approach is unique in the competitive landscape. Competitors either require cloud APIs for core functionality (code-graph-rag, autodev-codebase) or offer no AI enhancement at all (CKB, axon, arbor). Nobody else offers both modes in one tool.
+
+*Test: does every core command (`build`, `query`, `fn`, `deps`, `impact`, `diff-impact`, `cycles`, `map`) work with zero API keys? Are LLM features additive, never blocking?*
 
 ### 5. Embeddable first, CLI second
 
@@ -116,34 +118,37 @@ Staying in our lane means we can be embedded inside tools that do those things 
 - Features that improve **result quality**: fuzzy search, confidence scoring, node classification, compound queries that reduce agent round-trips
 - Features that improve **speed**: faster native parsing, smarter incremental builds, lighter-weight search alternatives (FTS5/TF-IDF alongside full embeddings)
 - Features that improve **embeddability**: better programmatic API, streaming results, output format options
+- **Optional LLM provider integration**: bring-your-own provider (OpenAI, etc.) for richer embeddings, AI-powered search, and enhanced analysis — always as an additive layer that never blocks the core pipeline (Principle 4)
 
 ### We will not build
 
-- External database backends (Memgraph, Neo4j, Qdrant, etc.) — violates Principle 1
-- Cloud API integrations for core functionality — violates Principle 1
+- External database backends (Memgraph, Neo4j, Qdrant, etc.) — violates Principle 1 (speed) and zero-infrastructure goal
+- Cloud API calls in the core pipeline — violates Principle 1 (the graph must always rebuild in under a second) and Principle 4 (zero-cost core)
 - AI-powered code generation or editing — violates Principle 8
 - Multi-agent orchestration — violates Principle 8
 - Native desktop GUI — outside our lane; we're a library
-- Features that require non-npm dependencies — violates Principle 1
+- Features that require non-npm dependencies — keeps deployment simple
 
 ---
 
 ## Competitive Position
 
 As of February 2026, codegraph is **#7 out of 22** in the code intelligence tool space (see [COMPETITIVE_ANALYSIS.md](./COMPETITIVE_ANALYSIS.md)).
 
-Six tools rank above us on feature breadth and community size. But none of them occupy our niche: **the npm-native, zero-config, dual-engine code intelligence library.**
+Six tools rank above us on feature breadth and community size. But none of them can answer yes to all three questions:
+
+1. **Can you rebuild the graph on every commit in a large codebase?** — Only codegraph has incremental builds. Everyone else re-indexes from scratch.
+2. **Does the core pipeline work with zero API keys and zero cost?** — Tools like code-graph-rag and autodev-codebase require cloud APIs for core features. Codegraph's full graph pipeline is local and costless.
+3. **Can you optionally enhance with your LLM provider?** — Local-only tools (CKB, axon, arbor) have no AI enhancement path. Cloud-dependent tools force it. Only codegraph makes it optional.
 
-| What competitors need | What codegraph needs |
-|-----------------------|----------------------|
-| Docker (Memgraph, Neo4j, Qdrant, Dgraph) | Nothing |
-| Python environment | Nothing |
-| Cloud API keys (OpenAI, Gemini, Voyage AI) | Nothing |
-| Manual Rust/Go compilation | Nothing |
-| External secret management setup | Nothing |
-| `npm install @optave/codegraph` | That's it |
+| What competitors force you to choose | What codegraph gives you |
+|--------------------------------------|--------------------------|
+| Fast local analysis **or** AI-powered features | Both — zero-cost core + optional LLM layer |
+| Full re-index on every change **or** stale graph | Always-current graph via incremental builds |
+| Code goes to multiple cloud services **or** no AI at all | Code goes only to the one provider you chose |
+| Docker + Python + external DB **or** nothing works | `npm install` and done |
 
-Our path to #1 is not feature parity with every competitor. It's making codegraph **the obvious default for any JavaScript developer or tool that needs code intelligence** — because it's the only one that doesn't ask them to leave the npm ecosystem.
+Our path to #1 is not feature parity with every competitor. It's being **the only code intelligence tool where the graph is always current, works at zero cost, and optionally gets smarter with the LLM you already use.**
 
 ---
 
diff --git a/README.md b/README.md