Two Python-based MCP servers for documentation scraping and reading. Heavily inspired by docs-mcp-server.
- Scraper Server: Crawls documentation sites and stores content in SQLite
- Reader Server: Searches and retrieves stored documentation
Scraper server features:

- 🕷️ Site crawling with configurable depth and page limits
- 📄 HTML content extraction (title, content, metadata)
- 💾 SQLite storage with FTS5 full-text search
- 🔧 MCP tools for Claude integration
- 💻 CLI for direct usage

Reader server features:

- 🔍 Keyword search across all documentation
- 📚 List available libraries and versions
- 📖 Browse documentation by library
- 🎯 Retrieve specific documents by URL
- 🔧 5 MCP tools for Claude integration
```bash
# Clone the repository
cd docs-mcp-server-python
# Install dependencies
uv sync --extra dev
```

Deploy both servers using Docker for production or containerized development.

```bash
# Build images
cd docker && ./build.sh
# Deploy services
cd ../deploy && docker-compose up -d
```

Both servers are now running:
- Scraper MCP Server: http://localhost:6281
- Reader MCP Server: http://localhost:6282
- 🐳 Containerized: Isolated, reproducible environments
- 🔒 Network Isolation: The reader server cannot access the internet (security)
- 💾 Shared Database: A single SQLite database via a Docker volume
- 🔄 Auto-restart: Services restart on failure
- ⚙️ Configurable: Customize via the `.env` file
```
Scraper (6281) ──┬──> Shared SQLite Database
                 │
Reader (6282) ───┘  (Network Isolated)
```
- Building Images: See docker/README.md
- Deployment Guide: See deploy/README.md
- Full Details: Complete instructions, troubleshooting, and configuration options in the READMEs above
```bash
# Service management
docker-compose up -d # Start all
docker-compose logs -f reader # View logs
docker-compose restart scraper # Restart service
docker-compose down # Stop all
# Individual services
docker-compose up -d scraper # Start scraper only
docker-compose stop reader    # Stop reader only
```

Scrape documentation from the CLI:

```bash
# Scrape React documentation
uv run python scraper/cli.py scrape \
--url https://react.dev \
--library react \
--version 19.0 \
--max-depth 2 \
--max-pages 50
# View all options
uv run python scraper/cli.py scrape --help
```

Add the scraper to your MCP client configuration (e.g., Claude Desktop) for local stdio-based execution:

```json
{
"mcpServers": {
"docs-scraper": {
"command": "uv",
"args": [
"--directory",
"/path/to/docs-mcp-server-python",
"run",
"python",
"scraper/server.py"
],
"env": {
"MCP_TRANSPORT": "stdio"
}
}
}
}
```

Then use the `scrape_documentation` tool in Claude.
Note: For Docker deployments, see MCP Configuration for Docker Containers below.
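Outside of Claude, the same server can be exercised with the official MCP Python SDK. A minimal sketch; the `scrape_documentation` argument names mirror the CLI flags above and are assumptions:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Same launch settings as the JSON config above.
    params = StdioServerParameters(
        command="uv",
        args=[
            "--directory", "/path/to/docs-mcp-server-python",
            "run", "python", "scraper/server.py",
        ],
        env={"MCP_TRANSPORT": "stdio"},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Argument names mirror the CLI flags; assumed, not verified.
            result = await session.call_tool(
                "scrape_documentation",
                {"url": "https://react.dev", "library": "react", "version": "19.0"},
            )
            print(result.content)


asyncio.run(main())
```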
Add the reader to your MCP client configuration for local stdio-based execution:

```json
{
"mcpServers": {
"docs-reader": {
"command": "uv",
"args": [
"--directory",
"/path/to/docs-mcp-server-python",
"run",
"python",
"reader/server.py"
],
"env": {
"MCP_TRANSPORT": "stdio"
}
}
}
}
```

Available MCP tools:
- `search_documentation` - Search with keywords
- `list_libraries` - List all available libraries
- `list_versions` - List versions for a library
- `get_document` - Retrieve a document by URL
- `browse_library` - Browse all documents for a library
The servers run with HTTP transport in Docker, making them accessible as network services.
After starting the Docker containers (`docker-compose up -d`), add the servers:

```bash
# Add scraper server
claude mcp add --transport http docs-scraper http://localhost:6281/mcp
# Add reader server
claude mcp add --transport http docs-reader http://localhost:6282/mcp
```

Alternatively, add to your MCP client configuration:
Scraper Server (Docker):

```json
{
"mcpServers": {
"docs-scraper": {
"transport": "http",
"url": "http://localhost:6281/mcp"
}
}
}
```

Reader Server (Docker):

```json
{
"mcpServers": {
"docs-reader": {
"transport": "http",
"url": "http://localhost:6282/mcp"
}
}
}
```

Requirements:
- Docker containers must be running (`docker-compose up -d`)
- Servers use HTTP transport for remote access
- The database is shared between containers via a bind mount at `deploy/data/`
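For reference, a plain Python client can reach the reader container over HTTP via the official MCP SDK's streamable-HTTP client. A sketch; the `search_documentation` parameter name (`query`) is an assumption:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main() -> None:
    # Port 6282 is the reader container from docker-compose.
    async with streamablehttp_client("http://localhost:6282/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("tools:", [tool.name for tool in tools.tools])
            # "query" as the parameter name is an assumption.
            result = await session.call_tool("search_documentation", {"query": "hooks"})
            print(result.content)


asyncio.run(main())
```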
Note: The servers auto-detect the transport mode. They default to HTTP (for Docker), but you can force stdio mode by setting the `MCP_TRANSPORT=stdio` environment variable.
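A sketch of that detection logic, assuming the servers are built on the SDK's FastMCP (this mirrors the described behavior, not the project's literal code):

```python
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-reader")

if __name__ == "__main__":
    # Default to HTTP for Docker; honor MCP_TRANSPORT=stdio when set.
    if os.environ.get("MCP_TRANSPORT") == "stdio":
        mcp.run(transport="stdio")
    else:
        mcp.run(transport="streamable-http")
```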
Environment variables (optional):

```bash
# Database location (default: ~/.docs-mcp/documentation.db)
export DOCS_MCP_DB_PATH=/path/to/database.db
# Scraping settings
export DOCS_MCP_USER_AGENT="MyBot/1.0"
export DOCS_MCP_TIMEOUT=30
export DOCS_MCP_DELAY=0.5
export DOCS_MCP_MAX_DEPTH=3
export DOCS_MCP_MAX_PAGES=100
```
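For illustration, a hypothetical sketch of how `shared/config.py` might consume these variables; field names and all defaults other than the documented database path are assumptions:

```python
import os
from dataclasses import dataclass, field
from pathlib import Path


def _default_db() -> Path:
    # Documented default: ~/.docs-mcp/documentation.db
    return Path(os.environ.get(
        "DOCS_MCP_DB_PATH",
        str(Path.home() / ".docs-mcp" / "documentation.db"),
    ))


@dataclass(frozen=True)
class Config:
    db_path: Path = field(default_factory=_default_db)
    # Defaults below are placeholders, not the project's actual values.
    user_agent: str = os.environ.get("DOCS_MCP_USER_AGENT", "docs-mcp/1.0")
    timeout: int = int(os.environ.get("DOCS_MCP_TIMEOUT", "30"))
    delay: float = float(os.environ.get("DOCS_MCP_DELAY", "0.5"))
    max_depth: int = int(os.environ.get("DOCS_MCP_MAX_DEPTH", "3"))
    max_pages: int = int(os.environ.get("DOCS_MCP_MAX_PAGES", "100"))


config = Config()
```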
Project structure:

```
docs-mcp-server-python/
├── scraper/                 # Scraper server
│   ├── server.py            # MCP server
│   ├── cli.py               # CLI commands
│   ├── crawler.py           # Site crawler
│   ├── extractors/          # Content extractors
│   │   ├── base.py          # Base interface
│   │   └── html.py          # HTML extractor
│   └── strategies/          # Scraping strategies
│       ├── base.py          # Base interface
│       └── site_crawler.py  # Site crawler strategy
├── reader/                  # Reader server
│   ├── server.py            # MCP server
│   ├── search.py            # Search functionality
│   └── query.py             # Query builder
├── shared/                  # Shared components
│   ├── database.py          # Database manager
│   ├── schema.py            # Database schema
│   ├── models.py            # Data models
│   └── config.py            # Configuration
├── docker/                  # Docker image building
│   ├── Dockerfile.scraper   # Scraper image
│   ├── Dockerfile.reader    # Reader image
│   ├── build.sh             # Build script
│   ├── push.sh              # Push to registry
│   └── README.md            # Build documentation
├── deploy/                  # Deployment configuration
│   ├── docker-compose.yml   # Service orchestration
│   ├── .env.example         # Configuration template
│   └── README.md            # Deployment guide
└── tests/                   # Tests
```
Simple schema with FTS5 full-text search:

```
documents (
id, library, version, url, title, content,
raw_html, metadata, scraped_at, updated_at
)

documents_fts (FTS5 virtual table for search)
```
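To make the search mechanics concrete, here is a self-contained sketch of an FTS5 external-content setup with a ranked MATCH query; the exact SQL used by `reader/search.py` is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY, library TEXT, version TEXT,
        url TEXT, title TEXT, content TEXT
    );
    -- External-content FTS5 index over the documents table.
    CREATE VIRTUAL TABLE documents_fts USING fts5(
        title, content, content='documents', content_rowid='id'
    );
""")
conn.execute(
    "INSERT INTO documents (library, version, url, title, content) VALUES (?, ?, ?, ?, ?)",
    ("react", "19.0", "https://react.dev/learn", "Quick Start", "Hooks let you use state."),
)
# Keep the FTS index in sync with the content table.
conn.execute(
    "INSERT INTO documents_fts(rowid, title, content) SELECT id, title, content FROM documents"
)

# Rank matches with BM25 (lower bm25() scores are better matches).
rows = conn.execute("""
    SELECT d.url, d.title
    FROM documents_fts f JOIN documents d ON d.id = f.rowid
    WHERE documents_fts MATCH ?
    ORDER BY bm25(documents_fts)
""", ("hooks",)).fetchall()
print(rows)
```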
Run the tests:

```bash
# Run all tests
uv run pytest -v
# Run specific test file
uv run pytest tests/shared/test_database.py -v
```

Type checking:

```bash
uv run mypy shared/ scraper/ reader/ --strict
```

Linting and formatting:

```bash
uv run ruff check .
uv run ruff format .
```

Example workflow:

```bash
# 1. Scrape Python documentation
uv run python scraper/cli.py scrape \
--url https://docs.python.org/3/ \
--library python \
--version 3.13
# 2. Start reader server and search
# (In Claude with reader MCP configured)
# Use: search_documentation("async generators")
```

Browsing example:

```bash
# Start reader MCP server
# (In Claude)
# Use: list_libraries()
# Use: browse_library("python", "3.13")
```

See the Implementation Plan for:
- Future enhancements (Playwright, semantic search, etc.)
- Testing strategy
- Detailed implementation notes
Status: ✅ Complete - All core functionality implemented:
- Phase 1: Foundation & Shared Components
- Phase 2: Scraper Server (MCP + CLI)
- Phase 3: Reader Server (MCP tools)
License: MIT