Python Documentation MCP Servers

Two Python-based MCP servers for documentation scraping and reading. Heavily inspired by docs-mcp-server.

Overview

  • Scraper Server: Crawls documentation sites and stores content in SQLite
  • Reader Server: Searches and retrieves stored documentation

Features

Scraper Server

  • 🕷️ Site crawling with configurable depth and page limits (see the sketch after this list)
  • 📄 HTML content extraction (title, content, metadata)
  • 💾 SQLite storage with FTS5 full-text search
  • 🔧 MCP tools for Claude integration
  • 💻 CLI for direct usage
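
The crawling and extraction bullets above can be pictured with a minimal sketch. It assumes requests and BeautifulSoup and is not the project's actual scraper/crawler.py:

import collections
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(start_url: str, max_depth: int = 2, max_pages: int = 50) -> list[dict]:
    """Breadth-first crawl bounded by depth and total page count."""
    seen = {start_url}
    queue = collections.deque([(start_url, 0)])
    pages: list[dict] = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        pages.append({
            "url": url,
            "title": soup.title.get_text(strip=True) if soup.title else "",
            "content": soup.get_text(" ", strip=True),
        })
        if depth < max_depth:
            for link in soup.find_all("a", href=True):
                target = urljoin(url, link["href"])
                # Stay on the starting site and never revisit a URL.
                if urlparse(target).netloc == urlparse(start_url).netloc and target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return pages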

Reader Server

  • 🔍 Keyword search across all documentation
  • 📚 List available libraries and versions
  • 📖 Browse documentation by library
  • 🎯 Retrieve specific documents by URL
  • 🔧 5 MCP tools for Claude integration

Installation

# Clone the repository and enter the project directory
cd docs-mcp-server-python

# Install dependencies
uv sync --extra dev

Docker Deployment

Deploy both servers using Docker for production or containerized development.

Quick Start

# Build images
cd docker && ./build.sh

# Deploy services
cd ../deploy && docker-compose up -d

Both servers are now running: the scraper on http://localhost:6281 and the reader on http://localhost:6282.

Key Features

  • 🐳 Containerized: Isolated, reproducible environments
  • 🔒 Network Isolation: Reader server cannot access the internet (security)
  • 💾 Shared Database: Single SQLite database via Docker volume
  • 🔄 Auto-restart: Services restart on failure
  • ⚙️ Configurable: Customize via .env file

Architecture

Scraper (6281)  ──┬──> Shared SQLite Database
                  │
Reader (6282)   ──┘    (Network Isolated)

Documentation

See docker/README.md for build documentation and deploy/README.md for the deployment guide.

Docker Commands

# Service management
docker-compose up -d              # Start all
docker-compose logs -f reader     # View logs
docker-compose restart scraper    # Restart service
docker-compose down               # Stop all

# Individual services
docker-compose up -d scraper      # Start scraper only
docker-compose stop reader        # Stop reader only

Usage

Scraper Server

Via CLI

# Scrape React documentation
uv run python scraper/cli.py scrape \
    --url https://react.dev \
    --library react \
    --version 19.0 \
    --max-depth 2 \
    --max-pages 50

# View all options
uv run python scraper/cli.py scrape --help

Via MCP Server (Local Development)

Add to your MCP client configuration (e.g., Claude Desktop) for local stdio-based execution:

{
  "mcpServers": {
    "docs-scraper": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/docs-mcp-server-python",
        "run",
        "python",
        "scraper/server.py"
      ],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

Then use the scrape_documentation tool in Claude.
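
For example, a call might look like the following (parameter names are assumed to mirror the CLI flags above; the actual tool signature may differ):

# Hypothetical tool invocation; arguments mirror the CLI options shown earlier.
scrape_documentation(
    url="https://react.dev",
    library="react",
    version="19.0",
    max_depth=2,
    max_pages=50,
)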

Note: For Docker deployments, see MCP Configuration for Docker Containers below.

Reader Server

Via MCP Server (Local Development)

Add to your MCP client configuration for local stdio-based execution:

{
  "mcpServers": {
    "docs-reader": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/docs-mcp-server-python",
        "run",
        "python",
        "reader/server.py"
      ],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

Available MCP tools:

  • search_documentation - Search with keywords
  • list_libraries - List all available libraries
  • list_versions - List versions for a library
  • get_document - Retrieve document by URL
  • browse_library - Browse all documents for a library
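
The tool names are as listed above; the registration sketch below assumes the servers are built on the MCP Python SDK's FastMCP, which may not match the actual reader/server.py:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-reader")

@mcp.tool()
def list_libraries() -> list[str]:
    """List all libraries with scraped documentation."""
    # Hypothetical body: the real tool queries the shared SQLite database.
    return []

@mcp.tool()
def search_documentation(query: str, limit: int = 10) -> list[dict]:
    """Search stored documentation with keywords."""
    # Hypothetical body and parameters; the actual signature may differ.
    return []

if __name__ == "__main__":
    mcp.run(transport="stdio")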

MCP Configuration for Docker Containers

The servers run with HTTP transport in Docker, making them accessible as network services.

Using Claude CLI (Recommended)

After starting the Docker containers (docker-compose up -d), add the servers:

# Add scraper server
claude mcp add --transport http docs-scraper http://localhost:6281/mcp

# Add reader server
claude mcp add --transport http docs-reader http://localhost:6282/mcp

Manual Configuration

Alternatively, add to your MCP client configuration:

Scraper Server (Docker)

{
  "mcpServers": {
    "docs-scraper": {
      "transport": "http",
      "url": "http://localhost:6281/mcp"
    }
  }
}

Reader Server (Docker)

{
  "mcpServers": {
    "docs-reader": {
      "transport": "http",
      "url": "http://localhost:6282/mcp"
    }
  }
}

Requirements:

  • Docker containers must be running (docker-compose up -d)
  • Servers use HTTP transport for remote access
  • Database is shared between containers via bind mount at deploy/data/

Note: The servers auto-detect the transport mode. They use HTTP by default (for Docker), but you can force stdio mode by setting the MCP_TRANSPORT=stdio environment variable.
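
A minimal sketch of such auto-detection, again assuming FastMCP from the MCP Python SDK (the transport identifiers and the HTTP default are assumptions based on the note above):

import os
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-scraper")

if __name__ == "__main__":
    # HTTP ("streamable-http") by default so Docker clients can connect;
    # MCP_TRANSPORT=stdio switches to local stdio mode.
    mcp.run(transport=os.environ.get("MCP_TRANSPORT", "streamable-http"))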

Configuration

Environment variables (optional):

# Database location (default: ~/.docs-mcp/documentation.db)
export DOCS_MCP_DB_PATH=/path/to/database.db

# Scraping settings
export DOCS_MCP_USER_AGENT="MyBot/1.0"
export DOCS_MCP_TIMEOUT=30
export DOCS_MCP_DELAY=0.5
export DOCS_MCP_MAX_DEPTH=3
export DOCS_MCP_MAX_PAGES=100
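
For reference, a hypothetical loader for these variables; the real shared/config.py may differ, and all defaults except the documented database path simply reuse the example values above:

import os
from pathlib import Path

def load_config() -> dict:
    """Read the DOCS_MCP_* environment variables with fallback defaults."""
    return {
        "db_path": Path(os.environ.get(
            "DOCS_MCP_DB_PATH",
            str(Path.home() / ".docs-mcp" / "documentation.db"),
        )),
        "user_agent": os.environ.get("DOCS_MCP_USER_AGENT", "docs-mcp/1.0"),  # default assumed
        "timeout": int(os.environ.get("DOCS_MCP_TIMEOUT", "30")),
        "delay": float(os.environ.get("DOCS_MCP_DELAY", "0.5")),
        "max_depth": int(os.environ.get("DOCS_MCP_MAX_DEPTH", "3")),
        "max_pages": int(os.environ.get("DOCS_MCP_MAX_PAGES", "100")),
    }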

Project Structure

docs-mcp-server-python/
├── scraper/              # Scraper server
│   ├── server.py         # MCP server
│   ├── cli.py            # CLI commands
│   ├── crawler.py        # Site crawler
│   ├── extractors/       # Content extractors
│   │   ├── base.py       # Base interface
│   │   └── html.py       # HTML extractor
│   └── strategies/       # Scraping strategies
│       ├── base.py       # Base interface
│       └── site_crawler.py # Site crawler strategy
├── reader/               # Reader server
│   ├── server.py         # MCP server
│   ├── search.py         # Search functionality
│   └── query.py          # Query builder
├── shared/               # Shared components
│   ├── database.py       # Database manager
│   ├── schema.py         # Database schema
│   ├── models.py         # Data models
│   └── config.py         # Configuration
├── docker/               # Docker image building
│   ├── Dockerfile.scraper    # Scraper image
│   ├── Dockerfile.reader     # Reader image
│   ├── build.sh              # Build script
│   ├── push.sh               # Push to registry
│   └── README.md             # Build documentation
├── deploy/               # Deployment configuration
│   ├── docker-compose.yml    # Service orchestration
│   ├── .env.example          # Configuration template
│   └── README.md             # Deployment guide
└── tests/                # Tests

Database Schema

Simple schema with FTS5 full-text search:

documents (
    id, library, version, url, title, content,
    raw_html, metadata, scraped_at, updated_at
)
documents_fts (FTS5 virtual table for search)
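
To illustrate how FTS5 keyword search over this schema works, here is a self-contained sketch; the exact set of columns indexed by documents_fts is an assumption:

import sqlite3

conn = sqlite3.connect(":memory:")  # use the real database path in practice
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    library TEXT, version TEXT, url TEXT, title TEXT, content TEXT,
    raw_html TEXT, metadata TEXT, scraped_at TEXT, updated_at TEXT
);
-- External-content FTS5 table; indexing title and content is assumed.
CREATE VIRTUAL TABLE documents_fts USING fts5(
    title, content, content='documents', content_rowid='id'
);
""")
conn.execute(
    "INSERT INTO documents (library, version, url, title, content) VALUES (?, ?, ?, ?, ?)",
    ("python", "3.13", "https://docs.python.org/3/reference/", "Expressions",
     "Asynchronous generator functions and async generators."),
)
# External-content tables must be kept in sync explicitly (or via triggers).
conn.execute(
    "INSERT INTO documents_fts(rowid, title, content) SELECT id, title, content FROM documents"
)
for url, title in conn.execute("""
    SELECT d.url, d.title
    FROM documents_fts
    JOIN documents AS d ON d.id = documents_fts.rowid
    WHERE documents_fts MATCH ?
    ORDER BY documents_fts.rank
    LIMIT 10
""", ("async generators",)):
    print(title, url)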

Development

Running Tests

# Run all tests
uv run pytest -v

# Run specific test file
uv run pytest tests/shared/test_database.py -v

Type Checking

uv run mypy shared/ scraper/ reader/ --strict

Linting

uv run ruff check .
uv run ruff format .

Examples

Example 1: Scrape and Search

# 1. Scrape Python documentation
uv run python scraper/cli.py scrape \
    --url https://docs.python.org/3/ \
    --library python \
    --version 3.13

# 2. Start reader server and search
# (In Claude with reader MCP configured)
# Use: search_documentation("async generators")

Example 2: Browse Libraries

# Start reader MCP server
# (In Claude)
# Use: list_libraries()
# Use: browse_library("python", "3.13")

Roadmap

See Implementation Plan for:

  • Future enhancements (Playwright, semantic search, etc.)
  • Testing strategy
  • Detailed implementation notes

MVP Status

✅ Complete - All core functionality implemented:

  • Phase 1: Foundation & Shared Components
  • Phase 2: Scraper Server (MCP + CLI)
  • Phase 3: Reader Server (MCP tools)

License

MIT

Acknowledgments

Heavily inspired by docs-mcp-server.
