Unified IDE context engine that merges semantic codebase search with episodic project memory into a single MCP server.
Every time you start a new AI coding session, your agent starts from zero. It doesn't remember the bug you fixed yesterday, the architectural decision you made last week, or even what files exist in your project. You end up re-explaining context, watching it hallucinate stale assumptions, and losing momentum to the "goldfish memory" problem.
Krusch Context MCP fixes this. It gives your AI coding agent persistent, searchable memory across every session — paired with semantic search over your entire codebase — so your agent always knows what your code does, why you built it that way, and what went wrong last time.
A single Model Context Protocol server exposing 26 tools to any MCP-compatible IDE agent (Cursor, Claude Code, Windsurf, Gemini CLI, etc.):
| Capability | What It Provides |
|---|---|
| 🔍 Semantic Codebase Search | Search the meaning of your code, not just filenames. "How do we handle auth?" returns the actual implementation. |
| 🧠 Episodic Memory | Bugs, decisions, and lessons persist across sessions, retrieved by semantic relevance with temporal decay. |
| 💎 Steering Nudges | Lightweight key-value facts (preferences, conventions) give the agent behavioral continuity without re-prompting. |
| 📖 Documentation Search | Ingested external docs are searchable locally — your agent references your versions, not its training data. |
| 🌍 Zero-Trust Deep Search | One tool call cross-references codebase reality with historical memory to verify understanding before acting. |
🛡️ Everything stays on your hardware: embeddings and tag generation run on local Ollama (bge-large + llama3.2). Storage is PostgreSQL + pgvector + SQLite. Zero API costs, full data sovereignty.
🔄 Switch models without losing context: memory is decoupled from the reasoning engine. Swap between Gemini, Claude, GPT-4o, or local models mid-project, and every model inherits the same context.
⚡ One server, not three: codebase search, episodic memory, and steering nuggets run in a single process with a shared connection pool and embedding pipeline.
Prerequisites: Node.js 22+ · Ollama with bge-large and llama3.2 · PostgreSQL with pgvector
# 1. Install [PG-Git-MCP](https://github.com/kruschdev/pg-git-mcp) (codebase ingestion engine)
npm install -g pg-git-mcp
# 2. Clone and install
git clone https://github.com/kruschdev/krusch-context-mcp.git
cd krusch-context-mcp
npm install
cp .env.example .env # Configure your database connection
# 3. Start
npm start

Add to your IDE MCP settings (e.g., .cursor/mcp.json, claude_desktop_config.json):
{
"mcpServers": {
"krusch-context-mcp": {
"command": "node",
"args": ["/path/to/krusch-context-mcp/src/index.js"]
}
}
}

Restart your IDE; your agent now has access to all 26 tools.
Upgrading?
git pull origin main && npm install && npm start
Idempotent migrations run on startup.
graph TD;
A[Agent Tool Call] --> B{Krusch Context MCP};
B -- Semantic Code Search --> C[(PG-Git: blobs)];
B -- Read/Write --> D[(SQLite Compute Cache)];
B -- Read/Write --> E[(Postgres Object Storage)];
D -. Async Pull/Push .-> E;
B -- Deep Search --> C;
B -- Deep Search --> D;
B -- Deep Search --> E;
F[Ollama Fleet] -. embeddings .-> B;
| Component | Details |
|---|---|
| Storage | Hybrid: Local SQLite (per-project) + PostgreSQL (global & codebase) |
| Embeddings | Ollama bge-large @ 1024 dims, fleet load-balanced |
| Tagging | SpectralQuant KV Compression for automatic keyword extraction |
| Temporal Decay | score = similarity × e^(-0.01 × age_days) — relevance drops ~26% after 30 days |
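
As a quick check of the Temporal Decay row, here is a minimal sketch of the formula in plain JavaScript. The function and constant names are illustrative assumptions, not the server's actual identifiers; only the formula itself comes from the table.

```javascript
// Temporal decay as stated in the table: score = similarity × e^(-0.01 × age_days).
// DECAY_RATE and decayedScore are illustrative names, not taken from the codebase.
const DECAY_RATE = 0.01;

function decayedScore(similarity, ageDays) {
  return similarity * Math.exp(-DECAY_RATE * ageDays);
}

// A perfect match (similarity 1.0) at three different ages.
for (const ageDays of [0, 30, 90]) {
  console.log(`${ageDays} days: ${decayedScore(1.0, ageDays).toFixed(3)}`);
}
// 0 days: 1.000
// 30 days: 0.741  (a drop of about 26%)
// 90 days: 0.407
```

Because the decay multiplies the similarity rather than replacing it, an old memory that matches strongly can still outrank a recent one that matches weakly.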
- Lakebase Architecture: Local SQLite for zero-latency reads, async write-behind to durable PostgreSQL. A +0.3 local scoring bias mitigates Ebbinghaus forgetting as the global corpus grows. Inspired by Neon.
- Hybrid Retrieval: Memories are auto-tagged via SpectralQuant KV Cache Compression to address pure-cosine failure modes (negation, numeric, role-swap) while maintaining massive context windows without running out of memory. Per Sentra.
- Consolidation: Semantic dedup via L2-normalized centroid averaging without re-embedding. From Geometry of Consolidation.
- Holographic Nuggets: Lightweight steering facts adapted from NeoVertex1/nuggets.
- Resilient Multi-Tier Tagging (v1.1): Implements robust failover from SpectralQuant KV compression to local Ollama endpoints (running llama3.2:1b), with fuzzy parsing of varied list index styles to guarantee 100% keyword extraction reliability in offline or proxy-down states.
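
The Consolidation item above can be illustrated with a generic sketch of L2-normalized centroid averaging. This is an assumption about the general technique, not a copy of the logic in memory-engine.js; `l2Normalize` and `consolidateCentroid` are hypothetical names.

```javascript
// Generic sketch: merge duplicate memories by averaging their unit-length
// embeddings, so the merged vector needs no new call to the embedding model.
// Function names are hypothetical and not taken from the project source.
function l2Normalize(vec) {
  const norm = Math.hypot(...vec);
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return vec.map((x) => x / norm);
}

function consolidateCentroid(embeddings) {
  if (embeddings.length === 0) throw new Error("need at least one embedding");
  const dims = embeddings[0].length;
  const sum = new Array(dims).fill(0);
  for (const emb of embeddings) {
    if (emb.length !== dims) throw new Error("dimension mismatch");
    const unit = l2Normalize(emb);
    for (let i = 0; i < dims; i++) sum[i] += unit[i];
  }
  // Normalizing the sum gives the same direction as normalizing the mean.
  return l2Normalize(sum);
}

// Two toy 2-dimensional embeddings; real bge-large vectors have 1024 dims.
console.log(consolidateCentroid([[2, 0], [0, 5]]));
```

Normalizing each input first keeps a long vector from dominating the merged direction, and the result stays unit length so cosine similarity against it behaves like it does for the originals.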
Implements the three-layer organizational memory model from the Sentra "Company Brain" research:
- Factual Memory — Raw codebase state + episodic events → "what happened"
- Interaction Memory — Parent-child UUID lineage, attribution, conflict resolution → "why it happened"
- Action Memory — Autonomous state compilation and graph traversal → "what to do next"
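
To make the parent-child lineage in the Interaction Memory layer concrete, here is a small in-memory sketch. The record shape (`id`, `parentId`, `author`, `content`) and the `lineage` helper are hypothetical; the README does not show the real schema or the output of get_provenance.

```javascript
// Hypothetical state records linked by parent IDs. Short string IDs stand in
// for the UUIDs mentioned above; the real schema is not shown in this README.
const states = new Map([
  ["a1", { id: "a1", parentId: null, author: "agent-1", content: "DB port is 5441" }],
  ["b2", { id: "b2", parentId: "a1", author: "agent-2", content: "DB port is 5442" }],
  ["c3", { id: "c3", parentId: "b2", author: "agent-1", content: "5441 clashes with legacy DB" }],
]);

// Walk from a state back to its root, newest first, guarding against cycles.
function lineage(store, id) {
  const chain = [];
  const seen = new Set();
  let current = store.get(id);
  while (current) {
    if (seen.has(current.id)) throw new Error("cycle detected in lineage");
    seen.add(current.id);
    chain.push(current);
    current = current.parentId ? store.get(current.parentId) : undefined;
  }
  return chain;
}

for (const state of lineage(states, "c3")) {
  console.log(`${state.id} by ${state.author}: ${state.content}`);
}
```

Keeping the author on every record is what lets a chain like this answer "why it happened" instead of only "what happened".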
You: "That fixed the port conflict! Save this."
Agent: [add_memory] Saved to 'bugs': port 5441 conflicts with legacy DB, use 5442.
You: "How did we structure the auth system?"
Agent: [search_memory] From 'lessons': chose singleton JWT factory to avoid circular dependencies.
You: "How does our auth middleware work?"
Agent: [search_code] Found 3 files. Here's the implementation in lib/auth.js...
You: "Before we start, verify what you know about the DB schema."
Agent: [deep_search] Cross-referencing codebase + memory: schema uses pgvector 1024 dims, last session added the tags column.
You: "Always use const over let in this project."
Agent: [nugget_remember] Saved: coding-style:const-over-let.
You: "The previous agent was wrong about the database port."
Agent: [resolve_conflict] Merged conflicting states. Deprecated old branches, created unified resolution.
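
Each bracketed tool name in the exchanges above corresponds to an MCP `tools/call` request sent by the IDE agent. The sketch below shows roughly what such a message looks like for add_memory, assuming the standard JSON-RPC 2.0 framing from the MCP specification. The `toolCallRequest` helper is hypothetical, and the argument names are taken from the workflow examples in this README.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request as defined by the MCP specification.
// toolCallRequest is a hypothetical helper, not part of this server or any SDK.
let nextId = 1;

function toolCallRequest(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = toolCallRequest("add_memory", {
  category: "bugs",
  content: "port 5441 conflicts with legacy DB, use 5442",
});

// Over the stdio transport, each message is one line of JSON.
console.log(JSON.stringify(request));
```

In practice the IDE's MCP client builds these messages for you. Writing one by hand is mainly useful when debugging the server over stdio.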
🛑 ZERO-TRUST CONTEXT VERIFICATION (MANDATORY FIRST STEP)
Root Cause Vulnerability: An agent operating in a new session suffers from "goldfish memory" and will confidently hallucinate an understanding of decoupled architectures if it hasn't anchored itself in recent memory.
Rule: Before executing a research task, writing code, or answering architectural questions in a new session, the agent MUST independently execute a Vector Database query to pull the latest codebase realities.
Execution: You are forbidden from guessing context. You must physically execute the Context MCP tool:
search_memory({ category: "lessons", query: "<current_topic_or_project>" })
If you proceed without querying this database first, you are violating the core partnership agreement.
1. search_memory({ category: "lessons", query: "<topic>" })
→ Pull the latest codebase realities and historical context
2. deep_search({ query: "<topic>", project: "<project>" })
→ Verify codebase + memory in one call
3. nugget_nudges({ query: "<task>", active_project: "<project>" })
→ Load conventions and preferences
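
The three session-start steps above can be sketched as an ordered plan. This assumes a generic `callTool(name, args)` function supplied by whatever MCP client you use; `sessionStartPlan`, `runPlan`, and the stub client are all hypothetical names for illustration.

```javascript
// Return the session-start tool calls in the order listed above.
// Tool names and argument keys come from this README; the helpers are hypothetical.
function sessionStartPlan(topic, project, task) {
  return [
    { name: "search_memory", arguments: { category: "lessons", query: topic } },
    { name: "deep_search", arguments: { query: topic, project } },
    { name: "nugget_nudges", arguments: { query: task, active_project: project } },
  ];
}

// Run the steps strictly in sequence so each result is available before the next call.
async function runPlan(callTool, plan) {
  const results = [];
  for (const step of plan) {
    results.push(await callTool(step.name, step.arguments));
  }
  return results;
}

// Stub client for demonstration only; a real client would send the call to the server.
const stubCallTool = async (name, args) => ({ tool: name, received: args });

runPlan(stubCallTool, sessionStartPlan("auth", "my-app", "refactor login"))
  .then((results) => console.log(results.map((r) => r.tool).join(" -> ")));
```

Running the calls sequentially mirrors the numbered order: historical context first, then verification against the codebase, then conventions.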
1. search_memory({ category: "bugs", query: "<symptoms>" }) → Check history
2. search_code({ query: "<error>", project: "<project>" }) → Find implementation
3. [Fix the bug]
4. add_memory({ category: "bugs", content: "<root cause + fix>" }) → Document
1. add_memory({ category: "outcomes", content: "<decisions and results>" })
2. nugget_remember({ key: "<project>:last-session", value: "<in-progress work>" })
3. consolidate({ category: "activity", project: "<project>", dry_run: true })
Full parameter details, defaults, and examples → Tool Reference (docs/TOOL_REFERENCE.md)
| Tool | Description |
|---|---|
| Episodic Memory | |
| add_memory | Store a memory (bug, lesson, priority, outcome, activity) |
| search_memory | Semantic search with temporal decay |
| list_memories | List recent memories by category |
| delete_memory / update_memory | CRUD by ID |
| consolidate | Merge semantically duplicate memories |
| compile_state | Contextmaxxing: compile full project state |
| Company Brain v2 | |
| write_state | Stateful write with concurrency control and attribution |
| resolve_conflict | Merge conflicting sibling states |
| get_provenance | Trace version history and lineage |
| search_lens | Role-filtered semantic retrieval |
| traverse_graph | Navigate parent/child lineage and linked blobs |
| update_ontology / link_blob | Tag management and codebase linking |
| Codebase Search | |
| search_code | Semantic search over indexed files |
| deep_search | Composite zero-trust search (memory + codebase) |
| list_repos / read_tree / read_blob | Browse indexed repositories |
| Nuggets | |
| nugget_remember / nugget_nudges / nugget_forget / nugget_list | Steering fact CRUD |
| System | |
| docs_list / docs_search | External documentation search |
| health_check | Server status verification |
krusch-context-mcp/
├── src/
│ ├── index.js # MCP server entry — tool registration & dispatch
│ ├── memory-engine.js # Episodic memory CRUD + consolidation
│ ├── v2-engine.js # Company Brain v2 substrate
│ ├── nuggets-engine.js # Holographic Nuggets CRUD
│ ├── sqlite-engine.js # Lakebase SQLite layer (pull/push sync)
│ └── llm-tags.js # Shared LLM tag generation
├── scripts/ # Benchmarking, evaluation, and maintenance
├── tests/ # *.test.js = automated, test_*.js = smoke
├── docs/
│ ├── TOOL_REFERENCE.md # Full parameter reference for all 26 tools
│ ├── SETUP.md # Configuration, storage routing, troubleshooting
│ └── research/ # Sentra Company Brain research essays
└── package.json
npm test # Automated (node:test, *.test.js)
npm run test:smoke # JSON-RPC stdio smoke tests
node tests/test_client.js # All 26 tools against live DB
node scripts/benchmark_latency.js # End-to-end latency
node scripts/eval_accuracy.js # Precision/recall

Convention: *.test.js = automated tests · test_*.js = stdio smoke tests
| Project | Role |
|---|---|
| PG-Git-MCP | Semantic codebase search engine (sibling dependency) |
| Krusch Memory MCP | Legacy standalone memory (superseded) |
| Krusch Sequential MCP | Sequential thinking with PG persistence |
| Krusch Cascade Router | Automated LLM inference routing |
| NeoVertex Nuggets | Original Holographic Nuggets architecture |
The evolution from a simple RAG cache to a stateful Company Brain Substrate is deeply inspired by the Sentra "Company Brain" Essay Series. We recommend reading their work on why organizational memory is an infrastructure problem.
The automated, continuous optimization of agent tool usage through execution tracing and LLM analysis is powered by the HALO RLM Engine.
Tag generation and context analysis rely on the massive context extensions enabled by SpectralQuant KV Cache Compression, authored by Ashwin Gopinath. Our production proxy bridge seamlessly handles both agentic reasoning tasks and native /api/embeddings pass-through for RAG, and is open-source at the SpectralQuant Ollama Bridge standalone repository.
We welcome contributions! Please ensure tests pass and adhere to the project formatting standards.
MIT License © 2026 kruschdev
