Semantic search over your documentation via the Model Context Protocol (MCP)
A production-ready MCP server that provides semantic search and exact pattern matching over your documentation using pgvector embeddings and OpenAI.
- Semantic search - Find concepts even when you don't know exact terms
- Fast exact matching - Grep-like search for known function/method names
- Multi-library support - Index multiple documentation sets
- High accuracy - Scores 100% on the project's evaluation tests
- Low cost - ~$0.10 setup, ~$0.10/month for typical usage
- Easy deployment - Works locally or on any VPS
# 1. Clone and setup
git clone https://github.com/YOUR_USERNAME/mcp-docs-server.git
cd mcp-docs-server
bash scripts/setup.sh
# 2. Add your documentation
mkdir -p docs/my-library
cp -r /path/to/docs/*.md docs/my-library/
# 3. Ingest documentation
bun run ingest
# 4. Test locally
bun run dev

Requirements:

- PostgreSQL with pgvector extension
- OpenAI API key (for embeddings)
- Bun runtime
See SETUP.md for detailed installation instructions and DATABASE.md for production database configuration.
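Behind the scenes, each documentation chunk is stored in Postgres next to its embedding. A minimal sketch of what that storage can look like, assuming the node-postgres (`pg`) client and an illustrative `doc_chunks` table; the repo's actual schema is created by its own setup scripts (see DATABASE.md):

```ts
// Illustrative only: a minimal pgvector layout for documentation chunks.
// The table name, columns, and the `pg` client are assumptions, not this repo's
// exact schema. text-embedding-3-small produces 1536-dimensional vectors.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

await pool.query(`CREATE EXTENSION IF NOT EXISTS vector`);
await pool.query(`
  CREATE TABLE IF NOT EXISTS doc_chunks (
    id        BIGSERIAL PRIMARY KEY,
    library   TEXT NOT NULL,          -- top-level folder under docs/
    path      TEXT NOT NULL,          -- source markdown file
    content   TEXT NOT NULL,          -- chunk text
    embedding VECTOR(1536) NOT NULL   -- one embedding per chunk
  )
`);
await pool.end();
```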
Organize your documentation in the docs/ folder with subdirectories for each library:
docs/
├── my-library/
│ ├── getting-started.md
│ ├── api-reference.md
│ └── guides/
│ └── authentication.md
└── another-library/
└── README.md
Each top-level subdirectory becomes a searchable library.
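The folder-to-library mapping boils down to listing the top-level directories under docs/. A rough sketch of that step (the real ingestion logic lives in src/ingest.ts; `DOCS_DIR` mirrors the environment variable documented below):

```ts
// Sketch of the docs/ -> library mapping; the real ingestion logic is src/ingest.ts.
import { readdir } from "node:fs/promises";
import { join } from "node:path";

const DOCS_DIR = process.env.DOCS_DIR ?? "./docs";

// Every directory directly under docs/ is exposed as one searchable library.
const entries = await readdir(DOCS_DIR, { withFileTypes: true });
const libraries = entries.filter((e) => e.isDirectory()).map((e) => e.name);

for (const library of libraries) {
  // Markdown files at any depth inside the library folder belong to that library.
  const files = (await readdir(join(DOCS_DIR, library), { recursive: true }))
    .filter((name) => name.endsWith(".md"));
  console.log(`${library}: ${files.length} markdown file(s) to chunk and embed`);
}
```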
The server provides three MCP tools:
search_docs - Semantic search for concepts and features. Example parameters:
{
"query": "how to authenticate users",
"library": "my-library",
"limit": 5
}

Use when:
- You don't know exact method/function names
- Looking for conceptual information
- Finding features by description
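Under the hood, a semantic search boils down to embedding the query and ranking stored chunks by vector distance. A minimal sketch, assuming the illustrative `doc_chunks` table and `pg` client from above (the repo's actual query lives in src/server.ts and may differ):

```ts
// Sketch of the semantic-search path: embed the query, then rank by cosine distance.
// Table/column names and the `pg` client are assumptions; see src/server.ts for the real code.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function searchDocs(query: string, library: string, limit = 5) {
  // 1. Embed the query with the same model used at ingestion time.
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: query }),
  });
  const embedding: number[] = (await res.json()).data[0].embedding;

  // 2. Rank stored chunks by cosine distance (<=> is pgvector's cosine operator).
  const { rows } = await pool.query(
    `SELECT path, content, 1 - (embedding <=> $1::vector) AS similarity
       FROM doc_chunks
      WHERE library = $2
      ORDER BY embedding <=> $1::vector
      LIMIT $3`,
    [JSON.stringify(embedding), library, limit],
  );
  return rows;
}
```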
grep_docs - Fast exact pattern matching (like grep). Example parameters:
{
"pattern": "authenticateUser",
"library": "my-library",
"limit": 5
}

Use when:
- You know the exact method/function name
- Looking for specific code examples
- Need precise matches
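grep_docs, by contrast, needs no embedding call at all; a plain SQL pattern match over the stored chunk text is enough. A sketch under the same illustrative `doc_chunks` and `pg` assumptions:

```ts
// Sketch of the exact-match path: no embedding call, just a text pattern filter.
// Table/column names and the `pg` client are assumptions; see src/server.ts for the real code.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function grepDocs(pattern: string, library: string, limit = 5) {
  // ILIKE gives case-insensitive substring matching, similar to `grep -i`.
  const { rows } = await pool.query(
    `SELECT path, content
       FROM doc_chunks
      WHERE library = $1 AND content ILIKE '%' || $2 || '%'
      LIMIT $3`,
    [library, pattern, limit],
  );
  return rows;
}
```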
list_libraries - List all available documentation libraries. Example parameters:
{
"response_format": "markdown"
}

To connect a client, register the server in its MCP configuration. For Claude Desktop, add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"docs": {
"command": "bun",
"args": ["run", "/path/to/mcp-docs-server/src/server.ts"],
"env": {
"DATABASE_URL": "postgresql://user:pass@localhost:5432/docs_db",
"OPENAI_API_KEY": "sk-..."
}
}
}
}

For Claude Code, add to ~/.config/claude-code/mcp.json:
{
"mcpServers": {
"docs": {
"command": "bun",
"args": ["run", "/path/to/mcp-docs-server/src/server.ts"],
"env": {
"DATABASE_URL": "postgresql://user:pass@localhost:5432/docs_db",
"OPENAI_API_KEY": "sk-..."
}
}
}
}

Deploy to a Linux VPS with systemd:
# Copy files to VPS
scp -r . user@server:/opt/mcp-docs-server
# SSH and deploy
ssh user@server
cd /opt/mcp-docs-server
bash scripts/deploy.sh

The deploy script will:
- Install dependencies
- Set up systemd service
- Start the server
- Configure auto-restart on failure
// Claude Desktop or Code can now use:
search_docs({
query: "implement user authentication with sessions",
library: "my-api-docs"
})

grep_docs({
pattern: "validateToken",
library: "my-api-docs"
})

list_libraries({
response_format: "markdown"
})

Cost estimates:

Setup (one-time):
- Ingestion: ~$0.10 for 50,000 words of documentation
- Uses OpenAI text-embedding-3-small ($0.020 per 1M tokens)
Runtime (ongoing):
- Searches: ~$0.0001 per search query
- Monthly (100 searches/day): ~$0.30
- Self-hosted: no server cost on existing hardware, or ~$5-10/month for a VPS
Database:
- ~1 MB per 1,000 documentation chunks
- Typical library: 5-20 MB
- Free tier Supabase works well
┌─────────────────┐
│ Claude Client │
│ (Desktop/Code) │
└────────┬────────┘
│ MCP Protocol (stdio)
│
┌────────▼────────┐
│ MCP Server │
│ (this repo) │
└────────┬────────┘
│
┌────┴────┐
│ │
┌───▼───┐ ┌──▼─────┐
│ PG+ │ │ OpenAI │
│ Vector│ │ API │
└───────┘ └────────┘
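The stdio leg of the diagram is handled by the MCP SDK's standard transport. A rough sketch of how the tools could be registered (handler body stubbed; exact method names and signatures depend on the @modelcontextprotocol/sdk version, and the repo's real entry point is src/server.ts):

```ts
// Sketch of wiring the three tools over stdio with the MCP TypeScript SDK.
// Handler bodies are stubbed; exact SDK signatures depend on the installed SDK version.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "mcp-docs-server", version: "1.0.0" });

server.tool(
  "search_docs",
  { query: z.string(), library: z.string(), limit: z.number().optional() },
  async ({ query, library, limit }) => ({
    // Real handler: embed the query, run the pgvector similarity search, format results.
    content: [
      { type: "text", text: `results for "${query}" in ${library} (limit ${limit ?? 5})` },
    ],
  }),
);

// grep_docs and list_libraries are registered the same way.

await server.connect(new StdioServerTransport());
```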
Troubleshooting:

If the pgvector extension is missing, enable it:

-- Run in PostgreSQL
CREATE EXTENSION vector;

Check that your OpenAI API key has credits and is active:
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"Ensure docs are in correct structure:
# Good
docs/my-lib/guide.md
# Bad (won't be found)
my-lib/guide.md

Reduce CHUNK_SIZE in src/ingest.ts or remove redundant documentation.
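CHUNK_SIZE controls how much text goes into each embedded chunk: smaller chunks give more precise matches, larger chunks carry more context per result. A simplified, character-based sketch of such a chunker (the real implementation in src/ingest.ts may split on headings or sentences instead):

```ts
// Simplified character-based chunker; the real chunking logic lives in src/ingest.ts
// and may split on markdown headings or sentences rather than a fixed size.
const CHUNK_SIZE = 1500; // target characters per chunk

function chunkMarkdown(text: string): string[] {
  const paragraphs = text.split(/\n\s*\n/); // keep paragraphs together where possible
  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    if (current && current.length + para.length > CHUNK_SIZE) {
      chunks.push(current.trim());
      current = "";
    }
    current += para + "\n\n";
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

For local development: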
# Install dependencies
bun install
# Run tests (if you add them)
bun test
# Format code
bun fmt
# Type check
bun run tsc --noEmit

| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | Yes | - | PostgreSQL connection string with pgvector |
| `OPENAI_API_KEY` | Yes | - | OpenAI API key for embeddings |
| `DOCS_DIR` | No | `./docs` | Custom docs directory |
| `DB_POOL_MAX` | No | `10` | Maximum database connections |
| `DB_POOL_MIN` | No | `2` | Minimum database connections |
| `DB_IDLE_TIMEOUT_MS` | No | `30000` | Idle connection timeout (ms) |
| `DB_CONNECTION_TIMEOUT_MS` | No | `5000` | Connection timeout (ms) |
| `DB_STATEMENT_TIMEOUT_MS` | No | `30000` | Query timeout (ms) |
For advanced database configuration and performance tuning, see DATABASE.md.
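Taken together, the pool-related variables above map onto the database client roughly as follows; the option names shown are node-postgres (`pg`) conventions and an assumption about this repo's client, so adjust them if a different driver is used.

```ts
// Sketch of feeding the environment variables above into a connection pool.
// Option names follow node-postgres (pg); other clients spell these differently.
import { Pool } from "pg";

const env = (name: string, fallback: number) =>
  Number(process.env[name] ?? fallback);

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: env("DB_POOL_MAX", 10),
  min: env("DB_POOL_MIN", 2),                              // supported by recent pg versions
  idleTimeoutMillis: env("DB_IDLE_TIMEOUT_MS", 30000),
  connectionTimeoutMillis: env("DB_CONNECTION_TIMEOUT_MS", 5000),
  statement_timeout: env("DB_STATEMENT_TIMEOUT_MS", 30000), // enforced server-side per query
});
```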
MIT License - see LICENSE file for details.
Contributions welcome! See CONTRIBUTING.md.
Built with Bun, TypeScript, PostgreSQL + pgvector, OpenAI embeddings, and the Model Context Protocol.

Support:
- GitHub Issues: Report bugs and request features
- Discussions: Ask questions and share tips