MCP server that fetches URLs and extracts clean, readable content as Markdown or plain text. Built on the Model Context Protocol so AI assistants (Claude, Cline, OpenCode, etc.) can read web pages.
- Extracts main article content from any URL using Readability
- Converts HTML to clean Markdown via Turndown
- Extracts metadata (title, author, publish date, Open Graph tags)
- Optionally extracts all links from the page
- Optional JavaScript rendering via Playwright for SPAs and JS-heavy sites
- Exposes a single
webReadertool over MCP Streamable HTTP transport
bun install
bun run devThe server starts on http://localhost:3001 (or PORT env var).
| Parameter | Type | Default | Description |
|---|---|---|---|
url |
string (required) |
— | URL to fetch |
return_format |
"markdown" | "text" |
"markdown" |
Output format |
timeout |
number (1–60) |
20 |
Request timeout in seconds |
retain_images |
boolean |
true |
Keep image references in output |
with_links_summary |
boolean |
false |
Include a summary of all links |
render_js |
boolean |
false |
Use Playwright to render JavaScript |
{
"title": "Article Title",
"url": "https://example.com/article",
"content": "# Article Title\n\nArticle content in markdown...",
"excerpt": "A brief summary of the article.",
"byline": "Author Name",
"siteName": "Example",
"publishedTime": "2025-01-15T10:00:00Z",
"metadata": { "og:description": "..." }
}{
"mcpServers": {
"web-reader": {
"url": "http://localhost:3001/mcp"
}
}
}{
"mcpServers": {
"web-reader": {
"url": "http://localhost:3001/mcp",
"transportType": "streamable-http"
}
}
}By default, pages are fetched with a simple HTTP client (got). For JavaScript-heavy sites (SPAs, React apps), enable Playwright:
bun add playwright
npx playwright install chromiumThen set render_js: true when calling the tool. Note that Playwright requires ~300–500 MB RAM per browser tab.
| Command | Description |
|---|---|
bun run dev |
Development server with hot reload |
bun run build |
Compile TypeScript |
bun run start |
Production server (node dist/index.js) |
bun run typecheck |
Type-check without emitting |
docker compose up --buildRuns on port 3000 with health checks enabled.
src/
├── index.ts # Express server + MCP session management
├── types.ts # Shared TypeScript interfaces
├── tools/
│ └── web-reader.ts # Tool registration (Zod schema + handler)
└── lib/
├── fetcher.ts # HTTP fetching (got)
├── fetcher-playwright.ts # JS rendering (Playwright, optional)
├── parser.ts # HTML → structured content (Readability)
└── converter.ts # HTML → Markdown (Turndown)
- Runtime: Node.js (ES2022)
- Package Manager: Bun
- HTTP Server: Express
- MCP SDK:
@modelcontextprotocol/sdkv1 - HTML Parsing: jsdom + Readability
- Markdown: Turndown
- HTTP Client: got
- JS Rendering: Playwright (optional)
MIT