raintree-technology/docs-mcp

MCP Documentation Server

Semantic search over your documentation via the Model Context Protocol (MCP)

A production-ready MCP server that provides semantic search and exact pattern matching over your documentation, using OpenAI embeddings stored in PostgreSQL with pgvector.

Features

  • Semantic search - Find concepts even when you don't know exact terms
  • Fast exact matching - Grep-like search for known function/method names
  • Multi-library support - Index multiple documentation sets
  • High accuracy - 100% accuracy on the project's evaluation tests
  • Low cost - ~$0.10 setup, ~$0.10/month for typical usage
  • Easy deployment - Works locally or on any VPS

Quick Start

# 1. Clone and setup
git clone https://github.com/YOUR_USERNAME/mcp-docs-server.git
cd mcp-docs-server
bash scripts/setup.sh

# 2. Add your documentation
mkdir -p docs/my-library
cp -r /path/to/docs/*.md docs/my-library/

# 3. Ingest documentation
bun run ingest

# 4. Test locally
bun run dev

Requirements

  • PostgreSQL with pgvector extension
  • OpenAI API key (for embeddings)
  • Bun runtime

See SETUP.md for detailed installation instructions and DATABASE.md for production database configuration.

Documentation Structure

Organize your documentation in the docs/ folder with subdirectories for each library:

docs/
├── my-library/
│   ├── getting-started.md
│   ├── api-reference.md
│   └── guides/
│       └── authentication.md
└── another-library/
    └── README.md

Each top-level subdirectory becomes a searchable library.
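
The directory-to-library rule can be illustrated with a small helper. This is a minimal sketch, not the repository's actual ingestion code: the function name libraryForPath is hypothetical, and it assumes paths are given relative to the docs/ folder.

```typescript
// Hypothetical helper (not from the repository): derive the library name
// from a file path that is relative to the docs/ root.
function libraryForPath(relativePath: string): string | null {
  // Normalize Windows separators and drop empty or "." segments.
  const segments = relativePath
    .replace(/\\/g, "/")
    .split("/")
    .filter((segment) => segment.length > 0 && segment !== ".");

  // A file sitting directly in docs/ has no enclosing library directory.
  if (segments.length < 2) return null;

  // The first path segment is the top-level subdirectory, i.e. the library.
  return segments[0] ?? null;
}

console.log(libraryForPath("my-library/guides/authentication.md"));
console.log(libraryForPath("another-library/README.md"));
```

Under this assumption, nested folders such as guides/ do not create new libraries; only the first path segment counts.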

MCP Tools

The server provides three MCP tools:

search_docs

Semantic search for concepts and features.

{
  "query": "how to authenticate users",
  "library": "my-library",
  "limit": 5
}

Use when:

  • You don't know exact method/function names
  • Looking for conceptual information
  • Finding features by description
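
Semantic search works by comparing the embedding of the query with the embeddings of stored documentation chunks. In this project that comparison happens inside PostgreSQL via pgvector; the in-memory version here is only an illustration. The DocChunk type, the function names, and the choice of cosine similarity as the metric are assumptions, not details taken from the repository.

```typescript
// Illustrative types and names; the real server delegates ranking to pgvector.
type DocChunk = { id: string; embedding: number[] };

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Treat a zero vector as dissimilar to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return the `limit` chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: DocChunk[], limit: number): DocChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, limit)
    .map((ranked) => ranked.chunk);
}
```

The limit argument plays the same role as the "limit" field in the tool input above.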

grep_docs

Fast exact pattern matching (like grep).

{
  "pattern": "authenticateUser",
  "library": "my-library",
  "limit": 5
}

Use when:

  • You know the exact method/function name
  • Looking for specific code examples
  • Need precise matches
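
For contrast with semantic search, exact matching can be pictured as a literal, case-sensitive substring scan. The sketch below is a simplified illustration under that assumption; grepLines and GrepHit are hypothetical names, and the real tool may support richer patterns than a plain substring.

```typescript
// Hypothetical result shape: 1-based line number plus the matching line.
type GrepHit = { line: number; text: string };

// Scan text line by line and collect lines containing the literal pattern,
// stopping once `limit` hits have been found.
function grepLines(text: string, pattern: string, limit: number): GrepHit[] {
  const hits: GrepHit[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length && hits.length < limit; i++) {
    const current = lines[i] ?? "";
    // includes() is case-sensitive and treats the pattern literally.
    if (current.includes(pattern)) {
      hits.push({ line: i + 1, text: current });
    }
  }
  return hits;
}
```

Because the match is literal, a query such as "authenticateuser" would not find "authenticateUser"; that precision is the point of using this tool for known identifiers.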

list_libraries

List all available documentation libraries.

{
  "response_format": "markdown"
}

Configuration

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):

{
  "mcpServers": {
    "docs": {
      "command": "bun",
      "args": ["run", "/path/to/mcp-docs-server/src/server.ts"],
      "env": {
        "DATABASE_URL": "postgresql://user:pass@localhost:5432/docs_db",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Claude Code (CLI)

Add to ~/.config/claude-code/mcp.json:

{
  "mcpServers": {
    "docs": {
      "command": "bun",
      "args": ["run", "/path/to/mcp-docs-server/src/server.ts"],
      "env": {
        "DATABASE_URL": "postgresql://user:pass@localhost:5432/docs_db",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

VPS Deployment

Deploy to a Linux VPS with systemd:

# Copy files to VPS
scp -r . user@server:/opt/mcp-docs-server

# SSH and deploy
ssh user@server
cd /opt/mcp-docs-server
bash scripts/deploy.sh

The deploy script will:

  1. Install dependencies
  2. Set up systemd service
  3. Start the server
  4. Configure auto-restart on failure

Usage Examples

Search for authentication concepts

// Claude Desktop or Claude Code can now call:
search_docs({
  query: "implement user authentication with sessions",
  library: "my-api-docs"
})

Find specific function

grep_docs({
  pattern: "validateToken",
  library: "my-api-docs"
})

Discover available docs

list_libraries({
  response_format: "markdown"
})
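
The calls above are shorthand. On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call, and over the stdio transport each message is one line of JSON. The sketch below builds such a request; buildToolCall and the ToolCallRequest type are illustrative names, not part of this repository.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper: wrap a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

const searchRequest = buildToolCall(1, "search_docs", {
  query: "implement user authentication with sessions",
  library: "my-api-docs",
});

// Over stdio, the client writes this single line followed by a newline.
console.log(JSON.stringify(searchRequest));
```

Clients such as Claude Desktop construct these messages themselves, so this is useful mainly for debugging the server by hand.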

Cost Breakdown

Setup (one-time):

  • Ingestion: ~$0.10 for 50,000 words of documentation
  • Uses OpenAI text-embedding-3-small ($0.020 per 1M tokens)

Runtime (ongoing):

  • Searches: ~$0.0001 per search query
  • Monthly (100 searches/day): ~$0.30
  • Hosting: free when run locally, or $5-10/month on a VPS

Database:

  • ~1 MB per 1,000 documentation chunks
  • Typical library: 5-20 MB
  • Free tier Supabase works well
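
To adapt these figures to your own documentation, the arithmetic can be written out. This is a rough sketch: the function names are illustrative, the per-token price and per-search cost are the ones quoted above, and a 30-day month is assumed. Token counts for your documents have to come from your own measurement.

```typescript
// Price quoted above for text-embedding-3-small, in USD per 1M tokens.
const PRICE_PER_MILLION_TOKENS_USD = 0.02;

// One-time ingestion cost for a given number of embedded tokens.
function embeddingCostUsd(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS_USD;
}

// Ongoing cost: searches per day, times days per month, times cost per search.
function monthlySearchCostUsd(
  searchesPerDay: number,
  costPerSearchUsd: number,
  daysPerMonth = 30,
): number {
  return searchesPerDay * daysPerMonth * costPerSearchUsd;
}

// 100 searches/day at ~$0.0001 each over 30 days.
console.log(monthlySearchCostUsd(100, 0.0001).toFixed(2)); // "0.30"
```

Re-embedding after documentation changes repeats the ingestion cost for whatever is re-processed, so it is worth budgeting for more than one run.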

Architecture

┌─────────────────┐
│  Claude Client  │
│ (Desktop/Code)  │
└────────┬────────┘
         │ MCP Protocol (stdio)
         │
┌────────▼────────┐
│  MCP Server     │
│  (this repo)    │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
┌───▼───┐ ┌──▼─────┐
│ PG+   │ │ OpenAI │
│ Vector│ │ API    │
└───────┘ └────────┘

Troubleshooting

"pgvector extension not found"

-- Run in PostgreSQL, in the database the server connects to
CREATE EXTENSION IF NOT EXISTS vector;

"Failed to generate embedding"

Check that your OpenAI API key is active and has credits. The request below confirms that the key is valid:

curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
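
Listing models only proves the key is accepted. To exercise the embeddings endpoint itself, a request like the following can be sent. This is a sketch with a hypothetical helper name (buildEmbeddingRequest); it only builds the request so that it can be inspected before anything is sent, and it assumes the text-embedding-3-small model mentioned in the cost section.

```typescript
// Hypothetical helper: describe a POST to the OpenAI embeddings endpoint.
type EmbeddingRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

function buildEmbeddingRequest(apiKey: string, input: string): EmbeddingRequest {
  return {
    url: "https://api.openai.com/v1/embeddings",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // The endpoint expects a model name and the text to embed.
      body: JSON.stringify({ model: "text-embedding-3-small", input }),
    },
  };
}

const probe = buildEmbeddingRequest("sk-...", "ping");
console.log(probe.url, probe.init.body);
```

Passing probe.url and probe.init to fetch sends the request. A 401 response typically points to an invalid key, while a 429 can indicate rate limiting or an exhausted quota.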

"No markdown files found"

Ensure the docs are in the correct structure:

# Good
docs/my-lib/guide.md

# Bad (won't be found)
my-lib/guide.md

High ingestion costs

Reduce CHUNK_SIZE in src/ingest.ts or remove redundant documentation.
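
For orientation, chunking usually means splitting each document into pieces that are embedded separately. The source does not show how CHUNK_SIZE is interpreted in src/ingest.ts, so the sketch below simply assumes it is a number of words per chunk, with no overlap; chunkWords is a hypothetical name.

```typescript
// Hypothetical chunker: split text into consecutive groups of `chunkSize` words.
function chunkWords(text: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  // Split on any whitespace and discard empty fragments.
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkSize) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
  }
  return chunks;
}

console.log(chunkWords("one two three four five", 2));
```

How the chunk size affects cost depends on details such as overlap between chunks, which are not described here, so it is worth measuring token usage before and after a change.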

Development

# Install dependencies
bun install

# Run tests (if you add them)
bun test

# Format code
bun fmt

# Type check
bun run tsc --noEmit

Environment Variables

Variable                  Required  Default  Description
DATABASE_URL              Yes       -        PostgreSQL connection string with pgvector
OPENAI_API_KEY            Yes       -        OpenAI API key for embeddings
DOCS_DIR                  No        ./docs   Custom docs directory
DB_POOL_MAX               No        10       Maximum database connections
DB_POOL_MIN               No        2        Minimum database connections
DB_IDLE_TIMEOUT_MS        No        30000    Idle connection timeout
DB_CONNECTION_TIMEOUT_MS  No        5000     Connection timeout
DB_STATEMENT_TIMEOUT_MS   No        30000    Query timeout

For advanced database configuration and performance tuning, see DATABASE.md.
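
One way to read these variables with their defaults is sketched here. It is not the repository's actual loader: loadConfig and the ServerConfig shape are hypothetical, while the variable names and default values are the ones documented in this section.

```typescript
// Hypothetical config shape mirroring the documented variables.
type ServerConfig = {
  databaseUrl: string;
  openaiApiKey: string;
  docsDir: string;
  poolMax: number;
  poolMin: number;
  idleTimeoutMs: number;
  connectionTimeoutMs: number;
  statementTimeoutMs: number;
};

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Required variables fail fast with a clear message.
  const required = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required environment variable: ${key}`);
    return value;
  };
  // Optional integers fall back to the documented default when unset.
  const integer = (key: string, fallback: number): number => {
    const raw = env[key];
    if (raw === undefined || raw === "") return fallback;
    const parsed = Number.parseInt(raw, 10);
    if (Number.isNaN(parsed)) throw new Error(`${key} must be an integer, got "${raw}"`);
    return parsed;
  };
  return {
    databaseUrl: required("DATABASE_URL"),
    openaiApiKey: required("OPENAI_API_KEY"),
    docsDir: env["DOCS_DIR"] || "./docs",
    poolMax: integer("DB_POOL_MAX", 10),
    poolMin: integer("DB_POOL_MIN", 2),
    idleTimeoutMs: integer("DB_IDLE_TIMEOUT_MS", 30000),
    connectionTimeoutMs: integer("DB_CONNECTION_TIMEOUT_MS", 5000),
    statementTimeoutMs: integer("DB_STATEMENT_TIMEOUT_MS", 30000),
  };
}
```

In a Bun or Node process this would be called as loadConfig(process.env); taking the environment as a parameter keeps the function easy to test.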

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! See CONTRIBUTING.md.

Credits

Built with:

Support

  • GitHub Issues: Report bugs and request features
  • Discussions: Ask questions and share tips
