A production-grade personal knowledge base with semantic search, designed to run on a single machine. Index codebases, web pages, PDFs, YouTube transcripts, and notes — then search everything with natural language through an MCP server, HTTP API, or CLI.
Built for developers who want their own searchable knowledge graph without cloud dependencies.
LLMs are powerful but stateless. Every conversation starts from scratch. Knowledge Hub gives your tools persistent memory:
- Index your entire codebase — AST-aware chunking understands Python class/function boundaries, not just line counts
- Search with natural language — "how does the exit manager calculate stop losses" finds the exact code
- Inject into any Claude Code session — MCP server makes your knowledge base a native tool
- RAG queries — ask questions, get synthesized answers with source citations
- Zero cloud lock-in — runs entirely on your machine (Ollama embeddings, Qdrant vector DB, SQLite metadata)
```
┌─────────────────────────────────────────────────────┐
│                     Interfaces                      │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────┐     │
│ │ MCP Server│ │ HTTP API    │ │ CLI           │     │
│ │ (stdio)   │ │ (port 8006) │ │               │     │
│ └─────┬─────┘ └──────┬──────┘ └───────┬───────┘     │
│       └──────────────┼────────────────┘             │
│                      ▼                              │
│             ┌─────────────────┐                     │
│             │ IngestPipeline  │                     │
│             │ (orchestrator)  │                     │
│             └────────┬────────┘                     │
│       ┌──────────────┼──────────────┐               │
│       ▼              ▼              ▼               │
│ ┌──────────┐  ┌────────────┐  ┌───────────┐         │
│ │ Ingestors │ │ Chunker    │  │ Search    │         │
│ │ Web/PDF/  │ │ AST/Prose/ │  │ Engine    │         │
│ │ YouTube/  │ │ Markdown   │  │ (hybrid)  │         │
│ │ Code/Text │ │            │  │           │         │
│ └──────────┘  └────────────┘  └───────────┘         │
│       ┌──────────────┼──────────────┐               │
│       ▼              ▼              ▼               │
│ ┌──────────┐  ┌──────────┐  ┌───────────┐           │
│ │ Ollama    │ │ Qdrant   │  │ SQLite    │           │
│ │ Embeddings│ │ Vectors  │  │ Metadata  │           │
│ │ (768-dim) │ │ (HNSW)   │  │ (WAL)     │           │
│ └──────────┘  └──────────┘  └───────────┘           │
└─────────────────────────────────────────────────────┘
```
| Component | Technology | Purpose |
|---|---|---|
| Vector DB | Qdrant (Docker) | Cosine similarity search, HNSW index, mmap disk storage |
| Embeddings | nomic-embed-text (Ollama) | 768-dim vectors, local inference, free |
| Metadata | SQLite (WAL mode) | Document tracking, dedup via content hashing |
| Chunking | Python ast module | Class/function boundary detection for code |
| Search | Hybrid scoring | Semantic similarity + time decay + source credibility |
| RAG | Claude API | Synthesized answers with source citations |
| MCP | FastMCP (stdio) | Native Claude Code integration |
| HTTP | FastAPI | REST API for remote access |
- Python 3.11+
- Docker (for Qdrant)
- Ollama with the `nomic-embed-text` model
```bash
# Install Ollama and pull the embedding model
ollama pull nomic-embed-text
```

```bash
git clone https://github.com/sowmith95/KnowledgeHub.git
cd KnowledgeHub

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install
pip install -e .
```

```bash
cd docker
docker compose up -d qdrant
```

```bash
# Index a codebase
knowledge-hub ingest /path/to/your/project --code --name "My Project"

# Index a web page
knowledge-hub ingest https://docs.python.org/3/tutorial/classes.html

# Index a PDF
knowledge-hub ingest /path/to/paper.pdf

# Index a YouTube video
knowledge-hub ingest https://www.youtube.com/watch?v=dQw4w9WgXcQ

# Store a note
knowledge-hub ingest --text "Redis HGETALL returns all fields as strings" --title "Redis Notes"
```

```bash
# Semantic search
knowledge-hub search "how does authentication work"

# Search only code
knowledge-hub search "database connection pooling" --type code

# RAG query (requires ANTHROPIC_API_KEY)
knowledge-hub ask "What is the retry strategy for failed API calls?"
```

The primary interface. Add to your Claude Code config (`~/.claude.json`):
```json
{
  "mcpServers": {
    "knowledge-hub": {
      "command": "/path/to/KnowledgeHub/.venv/bin/python3",
      "args": ["-m", "knowledge_hub"],
      "env": {
        "QDRANT_HOST": "localhost",
        "QDRANT_PORT": "6333",
        "OLLAMA_URL": "http://localhost:11434",
        "KB_SQLITE_PATH": "/path/to/knowledge-hub-data/metadata.db"
      }
    }
  }
}
```

Once configured, these tools are available in every Claude Code session:
| Tool | Description | Key Parameters |
|---|---|---|
| `kb_search` | Semantic search across all indexed content | `query`, `top_k` (default 10), `source_type` filter |
| `kb_ingest_url` | Ingest a web page, YouTube video, or PDF | `url`, `force` (re-ingest even if unchanged) |
| `kb_ingest_code` | Index a local codebase directory | `path`, `name` |
| `kb_ingest_text` | Store raw text or markdown | `text`, `title`, `source_type` |
| `kb_query` | RAG: search + synthesize answer with citations | `question`, `top_k` (default 8) |
| `kb_list` | List all indexed documents | `source_type` filter, `limit` |
| `kb_stats` | System health and document counts | — |
| `kb_delete` | Remove a document and its vectors | `doc_id` |
Start the HTTP server for remote access (e.g., via Tailscale):
```bash
knowledge-hub serve --host 0.0.0.0 --port 8006
```

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | System health check |
| POST | `/ingest/url` | Ingest URL (web/PDF/YouTube) |
| POST | `/ingest/text` | Ingest raw text |
| POST | `/ingest/code` | Index a codebase directory |
| POST | `/search` | Semantic search |
| POST | `/query` | RAG query with synthesized answer |
| GET | `/documents` | List indexed documents |
| DELETE | `/documents/{doc_id}` | Delete a document |
| GET | `/stats` | System statistics |
```bash
curl -X POST http://localhost:8006/search \
  -H "Content-Type: application/json" \
  -d '{"query": "retry logic for API calls", "top_k": 5}'
```

Source → Detect Type → Extract Content → Hash Check → Chunk → Embed → Store
- Detect: IngestorRegistry tries YouTube → PDF → Code → Web (most specific first)
- Extract: Pull text content, title, metadata from source
- Hash Check: SHA-256 content hash; skip if unchanged (unless `force=True`)
- Chunk: Split into semantically meaningful pieces (strategy depends on content type)
- Embed: Generate 768-dim vectors via Ollama nomic-embed-text (batches of 32)
- Store: Upsert vectors to Qdrant + document metadata to SQLite
Python Code (AST-based)
- Parses with `ast.parse()` for exact node boundaries
- Imports grouped as one chunk
- Each function/class becomes its own chunk
- Large classes split into per-method chunks with class signature as context
- Decorators stay attached to their definitions
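The core idea can be sketched in a few lines (an illustration only; the real chunker also groups imports, keeps decorators attached, and splits large classes per method):

```python
import ast


def chunk_python(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function/class.

    Minimal sketch of AST-based chunking: each top-level def/class
    becomes its own chunk, using the node's exact source boundaries.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment reads the node's lineno/end_lineno offsets
            chunks.append(ast.get_source_segment(source, node))
    return chunks
```

Because chunks follow syntactic boundaries rather than line counts, a search hit always returns a complete, compilable definition.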
Prose (sentence-aware)
- Splits at sentence boundaries (`.`, `!`, `?` followed by uppercase)
- 256-word target chunks with 32-word overlap
- Never splits mid-sentence
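The packing logic behind those three rules can be sketched roughly as follows (an illustration of the strategy, not the actual implementation):

```python
import re


def chunk_prose(text: str, target: int = 256, overlap: int = 32) -> list[str]:
    """Greedily pack whole sentences into ~target-word chunks.

    Sketch: split where ./!/? is followed by whitespace and an
    uppercase letter, pack sentences until the word budget is hit,
    then carry ~overlap trailing words into the next chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    chunks, current, fresh = [], [], False
    for sentence in sentences:
        current.append(sentence)
        fresh = True  # current now holds at least one new sentence
        if sum(len(s.split()) for s in current) >= target:
            chunks.append(" ".join(current))
            # carry trailing sentences totalling ~overlap words
            carried, words = [], 0
            for s in reversed(current):
                carried.insert(0, s)
                words += len(s.split())
                if words >= overlap:
                    break
            current, fresh = carried, False
    if fresh and current:
        chunks.append(" ".join(current))
    return chunks
```

Because only whole sentences are moved, no chunk ever starts or ends mid-sentence.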
Markdown (heading-aware)
- Splits at heading boundaries (`#` through `######`)
- Maintains parent heading context for sub-sections
- Falls back to sentence chunking for large sections
Safety limits:
- Max 5,000 characters per chunk
- Max 3,500 characters sent to embedding model (nomic-embed-text has 8,192 token context)
- Minimum 10 words per chunk (filters noise)
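These limits reduce to a small filtering pass, sketched here with hypothetical constant and function names:

```python
MAX_CHUNK_CHARS = 5_000   # hard cap per stored chunk
MAX_EMBED_CHARS = 3_500   # cap on text sent to the embedding model
MIN_WORDS = 10            # drop noise chunks below this


def apply_safety_limits(chunks: list[str]) -> list[str]:
    """Enforce the chunk-size safety limits described above (sketch)."""
    kept = []
    for chunk in chunks:
        if len(chunk.split()) < MIN_WORDS:
            continue  # too short to be a useful search result
        kept.append(chunk[:MAX_CHUNK_CHARS])
    return kept


def embed_input(chunk: str) -> str:
    """Truncate what is actually sent to nomic-embed-text."""
    return chunk[:MAX_EMBED_CHARS]
```

The stored chunk keeps up to 5,000 characters for display, while the embedding sees at most 3,500, comfortably inside the model's 8,192-token context.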
Hybrid scoring combines three signals:
```
final_score = (similarity × 0.70) + (time_score × 0.15) + (source_weight × 0.15)
```
| Signal | Weight | Formula |
|---|---|---|
| Semantic similarity | 70% | Cosine distance from Qdrant |
| Time decay | 15% | `exp(-0.693 × days_old / 90)`, a half-life of 90 days |
| Source credibility | 15% | code: 1.3, pdf: 1.2, markdown: 1.1, article: 1.0, youtube: 0.9, text: 0.8 |
Deduplication: Max 3 chunks per document in results to prevent one large file from dominating.
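The scoring formula and the per-document cap can be sketched together (illustrative; the `doc_id` field name is an assumption):

```python
import math

SOURCE_WEIGHTS = {"code": 1.3, "pdf": 1.2, "markdown": 1.1,
                  "article": 1.0, "youtube": 0.9, "text": 0.8}


def hybrid_score(similarity: float, days_old: float, source_type: str) -> float:
    """Combine the three ranking signals with a 70/15/15 split."""
    time_score = math.exp(-0.693 * days_old / 90)  # halves every 90 days
    source_weight = SOURCE_WEIGHTS.get(source_type, 1.0)
    return similarity * 0.70 + time_score * 0.15 + source_weight * 0.15


def cap_per_document(results: list[dict], max_per_doc: int = 3) -> list[dict]:
    """Keep at most max_per_doc chunks from any single document.

    Assumes results are already sorted by final score, descending.
    """
    seen, kept = {}, []
    for r in results:
        n = seen.get(r["doc_id"], 0)
        if n < max_per_doc:
            kept.append(r)
            seen[r["doc_id"]] = n + 1
    return kept
```

A fresh code chunk with similarity 0.9 scores 0.9·0.70 + 1.0·0.15 + 1.3·0.15 = 0.975; the same chunk at 90 days old loses half of its time component.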
When you use `kb_query` or the `/query` endpoint (requires `ANTHROPIC_API_KEY`):
- Search for the top-k most relevant chunks
- Build a context window with numbered sources
- Send to Claude (claude-sonnet-4-20250514) with a system prompt enforcing context-only answers
- Return a synthesized answer with `[Source N]` citations
Falls back to raw search results if no API key is configured.
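The context-building step might look like this sketch (the `title`/`text` field names and the prompt wording are assumptions, not the actual schema):

```python
def build_context(chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite [Source N]."""
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        blocks.append(f"[Source {i}] ({chunk['title']})\n{chunk['text']}")
    return "\n\n".join(blocks)


# Hypothetical system prompt enforcing context-only answers
SYSTEM_PROMPT = (
    "Answer using ONLY the numbered sources below. "
    "Cite them as [Source N]. If the sources do not contain "
    "the answer, say so."
)
```

Numbering the sources in the prompt is what lets the model's `[Source N]` citations be mapped back to the original documents in the response.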
| Type | Ingestor | Details |
|---|---|---|
| Codebases | CodeIngestor | 25+ file extensions, respects .gitignore, skips node_modules/tests/.venv, 1MB file limit |
| Web pages | WebIngestor | Strips nav/footer/ads, extracts `<article>` or `<main>` content |
| PDFs | PDFIngestor | Local files or URLs, extracts via PyPDF2 |
| YouTube | YouTubeIngestor | Transcript extraction (manual → auto-generated → any language) |
| Raw text | Direct | Notes, analysis results, anything you want searchable |
| Markdown | Direct | Heading-aware chunking with hierarchy context |
Python (AST-parsed), JavaScript, TypeScript, TSX/JSX, Go, Rust, Java, C, C++, SQL, Shell/Bash, YAML, TOML, JSON, HTML, CSS, Svelte, Vue, Terraform, HCL, Dockerfile, Makefile.
All configuration via environment variables with sensible defaults:
| Variable | Default | Description |
|---|---|---|
| `QDRANT_HOST` | `localhost` | Qdrant server host |
| `QDRANT_PORT` | `6333` | Qdrant HTTP port |
| `QDRANT_GRPC_PORT` | `6334` | Qdrant gRPC port |
| `KB_COLLECTION` | `knowledge_hub` | Qdrant collection name |
| `KB_EMBEDDING_DIM` | `768` | Embedding vector dimensions |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |
| `KB_EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model name |
| `KB_CHUNK_SIZE` | `256` | Target words per chunk |
| `KB_CHUNK_OVERLAP` | `32` | Overlap words between chunks |
| `KB_SQLITE_PATH` | `~/knowledge-hub-data/metadata.db` | SQLite database path |
| `KB_API_HOST` | `0.0.0.0` | HTTP server bind address |
| `KB_API_PORT` | `8006` | HTTP server port |
| `ANTHROPIC_API_KEY` | — | Required for RAG queries (`kb_query`) |
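Reading these variables follows the usual environment-with-defaults pattern; a sketch of what a `config.py` along these lines might contain (illustrative, not the actual file):

```python
import os

# Defaults mirror the table above; every value can be overridden via env.
QDRANT_HOST = os.environ.get("QDRANT_HOST", "localhost")
QDRANT_PORT = int(os.environ.get("QDRANT_PORT", "6333"))
KB_CHUNK_SIZE = int(os.environ.get("KB_CHUNK_SIZE", "256"))
KB_CHUNK_OVERLAP = int(os.environ.get("KB_CHUNK_OVERLAP", "32"))
KB_SQLITE_PATH = os.path.expanduser(
    os.environ.get("KB_SQLITE_PATH", "~/knowledge-hub-data/metadata.db")
)
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")  # optional: RAG only
```

Overriding a value is just a matter of exporting it before starting any interface, e.g. `KB_API_PORT=9000 knowledge-hub serve`.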
For running both Qdrant and the HTTP API server:
```bash
cd docker

# Start everything
docker compose up -d

# Or just Qdrant (if running MCP server locally)
docker compose up -d qdrant
```

The `docker-compose.yml` binds data to host directories for persistence:

- Qdrant data: `/Users/srp/knowledge-hub-data/qdrant`
- SQLite + app data: `/Users/srp/knowledge-hub-data`

Update the volume paths in `docker-compose.yml` for your system.
```
KnowledgeHub/
├── knowledge_hub/
│   ├── __init__.py
│   ├── __main__.py        # Entry point (MCP server)
│   ├── config.py          # All configuration (env vars)
│   ├── models.py          # Domain models (Document, Chunk, SearchResult)
│   ├── pipeline.py        # Orchestrator (ingest, search, query)
│   ├── embeddings.py      # Ollama nomic-embed-text client
│   ├── vector_store.py    # Qdrant wrapper (upsert, search, delete)
│   ├── metadata_db.py     # SQLite metadata (documents, dedup)
│   ├── chunker.py         # AST/sentence/markdown chunking
│   ├── api.py             # FastAPI HTTP server
│   ├── cli.py             # CLI interface
│   ├── ingestors/
│   │   ├── base.py        # BaseIngestor ABC
│   │   ├── web.py         # Web page ingestor
│   │   ├── pdf.py         # PDF ingestor
│   │   ├── youtube.py     # YouTube transcript ingestor
│   │   └── code.py        # Codebase directory ingestor
│   ├── search/
│   │   └── engine.py      # Hybrid search + RAG query
│   └── mcp_server/
│       └── server.py      # FastMCP tool definitions
├── docker/
│   ├── docker-compose.yml
│   └── Dockerfile
├── scripts/
│   ├── start.sh
│   ├── stop.sh
│   ├── configure-claude-code.sh
│   └── index-complextading.sh
├── pyproject.toml
└── README.md
```
MIT