sandeshghanta/documentation-scraping-mcp

Docs MCP Server - Secure Local Setup

Run the Docs MCP Server (https://github.com/arabold/docs-mcp-server) completely isolated from the internet with local data storage.

Quick Start

1. Index Documentation (needs internet)

Note: Server does NOT need to be running for scraping. The scraper writes directly to ./data/docs.db.

./scrape-docs.sh react https://react.dev/reference/react 18.3.1
./scrape-docs.sh typescript https://www.typescriptlang.org/docs/

Data stored in: ./data/docs.db
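When several libraries need indexing, the calls above can be driven from a small list. The following is a minimal sketch, assuming only the `<library> <url> [version]` argument order shown above; the `scrape_all` function, the whitespace-separated list format, and the `DRY_RUN` variable are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch: index several libraries from a whitespace-separated list.
# Each input line: <library> <url> [version].
# scrape_all, the list format, and DRY_RUN are illustrative assumptions.

scrape_all() {
  # Reads the list from stdin; skips blank lines and '#' comments.
  while read -r library url version || [ -n "$library" ]; do
    case "$library" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: print the command instead of running it.
      echo "./scrape-docs.sh $library $url${version:+ $version}"
    else
      # </dev/null keeps the scraper from consuming the rest of the list.
      ./scrape-docs.sh "$library" "$url" ${version:+"$version"} </dev/null
    fi
  done
}

# Dry run: prints the two commands from the Quick Start.
DRY_RUN=1 scrape_all <<'EOF'
react https://react.dev/reference/react 18.3.1
typescript https://www.typescriptlang.org/docs/
EOF
```

Setting `DRY_RUN=0` would run the scraper for each entry; the sketch defaults to printing so that a mistyped list does not trigger network requests.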

2. Start Server (no internet, reads from database)

./start-docs-mcp-server.sh

Web interface: http://localhost:6280

3. Configure MCP Client

Add to your MCP client config (Claude Desktop, VS Code, Cline, etc.):

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "docs-mcp-server": {
      "type": "sse",
      "url": "http://localhost:6280/sse"
    }
  }
}

Restart your MCP client after updating the config.
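If the client fails to connect, a quick check that something is listening on the server port can narrow down the problem. This is a rough sketch using bash's `/dev/tcp` pseudo-device; the `port_open` helper is hypothetical, and the check only shows that the TCP port accepts connections, not that the endpoint speaks MCP.

```shell
#!/usr/bin/env bash
# Sketch: check that something accepts TCP connections on the server port.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
# port_open is an illustrative helper, not part of this repository.

port_open() {
  local host="$1" port="$2"
  # Open the connection in a subshell so the descriptor closes right away.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_open 127.0.0.1 6280; then
  echo "Something is listening on port 6280"
else
  echo "Nothing is listening on port 6280; run ./start-docs-mcp-server.sh first"
fi
```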

4. Use It

Ask your AI assistant:

Search the React documentation for useState hooks

Security Features

✅ Complete network isolation - Cannot reach external internet
✅ Read-only mode - Cannot modify or delete indexed data
✅ Manual scraping only - You control what gets indexed (no AI-initiated scraping)
✅ SHA-pinned images - Verified checksums prevent tampering
✅ Local data storage - Everything in ./data/ directory
✅ No telemetry - No analytics or tracking

Available Commands

Scraping

# Web URL
./scrape-docs.sh library https://example.com/docs [version]

# Local files
./scrape-docs.sh mylib file:///path/to/docs

Server Management

./start-docs-mcp-server.sh   # Start server
./stop-docs-mcp-server.sh    # Stop server
docker logs -f docs-mcp-server  # View logs

Image Updates

./update-image.sh  # Update to latest image (with SHA verification)
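"SHA-pinned" here means the image is referenced by content digest (`name@sha256:...`) rather than by a mutable tag such as `latest`. The sketch below shows one way to check that a reference has that form. The `is_digest_pinned` helper is hypothetical, the image name is assumed from the upstream project, and the all-zero digest is a placeholder, not a real image digest.

```shell
#!/usr/bin/env bash
# Sketch: tell a digest-pinned image reference from a tag-based one.
# is_digest_pinned is an illustrative helper, not part of this repository.

is_digest_pinned() {
  # Succeeds only for references of the form name@sha256:<64 hex chars>.
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Placeholder digest (64 zeros), used only to demonstrate the format.
placeholder_digest="$(printf '%064d' 0)"

for ref in \
  "ghcr.io/arabold/docs-mcp-server@sha256:${placeholder_digest}" \
  "ghcr.io/arabold/docs-mcp-server:latest"
do
  if is_digest_pinned "$ref"; then
    echo "pinned:     $ref"
  else
    echo "not pinned: $ref"
  fi
done
```

Docker checks pulled content against the digest when an image is referenced this way, which is why a pinned reference resists tag tampering.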

MCP Tools Available

Once connected, your AI can use:

  • search_docs - Search indexed documentation
  • list_libraries - Show all indexed libraries
  • find_version - Find matching versions

Important Security Note:

  • The server runs in read-only mode with no internet access
  • scrape_docs and fetch_url tools are disabled/non-functional
  • You are responsible for manually indexing documentation via ./scrape-docs.sh
  • This prevents the AI from making unauthorized network requests

Directory Structure

docs-mcp/
├── data/                    # SQLite database (auto-created)
├── start-docs-mcp-server.sh # Start isolated server
├── stop-docs-mcp-server.sh  # Stop server
├── scrape-docs.sh          # Index documentation
├── update-image.sh         # Update Docker image
└── README.md               # This file

How It Works

┌────────────────────────────────────────────┐
│ Step 1: Scraping (separate container)     │
│  ./scrape-docs.sh                          │
│  • Temporary Docker container              │
│  • Has internet access                     │
│  • Fetches docs from web                   │
│  • Writes to ./data/docs.db                │
│  • Container exits when done               │
└────────────────────────────────────────────┘
                    ↓ (writes to)
              ./data/docs.db
                    ↓ (reads from)
┌────────────────────────────────────────────┐
│ Step 2: Server (persistent container)     │
│  ./start-docs-mcp-server.sh                │
│  • Isolated network (no internet)          │
│  • Read-only mode                          │
│  • Reads from ./data/docs.db               │
│  • Serves MCP tools on port 6280           │
└────────────────────────────────────────────┘
                    ↓
┌────────────────────────────────────────────┐
│ Step 3: MCP Client                         │
│  Connects via http://localhost:6280/sse    │
└────────────────────────────────────────────┘

Key Points:

  • Scraping and serving are separate containers
  • Server does NOT need to run during scraping
  • Both use the same ./data/docs.db file (the scraper writes to it, the server reads from it)
  • Scrape anytime (even with server running)

Advanced

Backup Data

cp -r ./data/ ./data-backup/
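For repeated backups, a timestamped destination avoids overwriting an earlier copy. This is a minimal sketch; the `backup_data` function and the `data-backup-<timestamp>` naming scheme are hypothetical. It is safest to run it when no scrape is in progress, since copying a SQLite file while it is being written can produce an inconsistent copy.

```shell
#!/usr/bin/env bash
# Sketch: copy the data directory into a timestamped backup directory.
# backup_data and the naming scheme are illustrative assumptions.

backup_data() {
  local src="${1:-./data}"
  local dest_root="${2:-.}"
  local dest
  dest="${dest_root}/data-backup-$(date +%Y%m%d-%H%M%S)"
  [ -d "$src" ] || { echo "No such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  # "src/." copies the directory contents, including dotfiles.
  cp -R "$src/." "$dest/" || return 1
  # Print the backup location so callers can record it.
  echo "$dest"
}
```

Calling `backup_data ./data ./backups` would create something like `./backups/data-backup-<timestamp>/docs.db` and print that directory's path.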

View Database

sqlite3 ./data/docs.db "SELECT name FROM libraries;"

Change Network Settings

Edit start-docs-mcp-server.sh to modify:

  • PORT - Server port
  • NETWORK_NAME - Docker network name
  • IMAGE - Docker image SHA (update via ./update-image.sh)

Security Notes

During scraping: Container has internet access to fetch documentation

During serving: Container is completely isolated:

  • Custom bridge network with no masquerading
  • DNS disabled (resolver set to 127.0.0.1)
  • Network capabilities dropped (NET_RAW, NET_ADMIN)
  • SHA-pinned Docker images with checksum verification
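The isolation measures listed above map onto standard Docker options. The sketch below prints, without executing, the kind of commands that could produce such a setup. It is an illustration under assumptions, not the contents of start-docs-mcp-server.sh: the `print_isolated_run` function, the default network name, and the `/data` mount path are hypothetical, and the image reference is whatever pinned value you pass in.

```shell
#!/usr/bin/env bash
# Sketch: print docker commands matching the isolation measures listed above.
# Nothing is executed. print_isolated_run, the default network name, and the
# /data mount path are illustrative assumptions.

print_isolated_run() {
  local image="$1" port="${2:-6280}" network="${3:-docs-mcp-isolated}"
  cat <<EOF
# Bridge network with IP masquerading (outbound NAT) turned off
docker network create --driver bridge \\
  -o com.docker.network.bridge.enable_ip_masquerade=false $network
# Server container: DNS pointed at loopback, network capabilities dropped,
# port published on the host's loopback interface only
docker run -d --name docs-mcp-server \\
  --network $network \\
  --dns 127.0.0.1 \\
  --cap-drop NET_RAW --cap-drop NET_ADMIN \\
  -p 127.0.0.1:$port:$port \\
  -v "\$PWD/data:/data" \\
  $image
EOF
}
```

For example, `print_isolated_run "$IMAGE" 6280` prints the two commands for review, where `IMAGE` holds a digest-pinned reference.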

Your documentation stays private and local!

About

Wrappers over arabold/docs-mcp-server to run it in a trusted way
