rdrr

Convert any URL to clean markdown for AI agents.

npx rdrr https://react.dev/learn

Features

Fast: no headless browser, lightweight
Smart: 20+ site-specific extractors (Wiki, Reddit, X, Hacker News, GitHub, ChatGPT, Claude, Substack, ...)
LLM-ready: strips ads, navigation, footers; keeps code blocks, tables, math
Versatile: webpages, GitHub issues/PRs, PDFs, X profiles, YouTube transcripts

Install

npm install rdrr

Quick start

CLI

# Webpage
rdrr https://react.dev/learn

# YouTube transcript
rdrr https://www.youtube.com/watch?v=dQw4w9WgXcQ

# GitHub issue with comments
rdrr https://github.com/mozilla/readability/issues/1

# X timeline
rdrr https://x.com/discotune -n 10

# Save to file
rdrr https://example.com -o article.md

# JSON with metadata
rdrr https://example.com --json

Library

import { parse } from "rdrr"

const result = await parse("https://en.wikipedia.org/wiki/TypeScript")

result.title       // "TypeScript"
result.content     // clean markdown
result.wordCount   // 2847
result.siteName    // "Wikipedia"

CLI flags

| Flag | Description |
| --- | --- |
| `-o, --output <file>` | Save to file instead of stdout |
| `-j, --json` | Full JSON with metadata |
| `-p, --property <name>` | Extract a single field (`title`, `content`, ...) |
| `-l, --language <code>` | Preferred language (BCP 47) |
| `-n, --limit <n>` | Max items for aggregate URLs (default: `10`) |
| `--order <order>` | `newest` (default) or `oldest` |
| `--check` | Probe if URL is readable (exit 0/1) |
| `--llms` | Append site's `/llms.txt` |
| `--debug` | Pipeline diagnostics to stderr |
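
When driving the CLI from a script, it can help to assemble the flags programmatically. The sketch below builds an argument list using only flags from the table above. The `RdrrCliOptions` type and the `buildRdrrArgs` helper are hypothetical names introduced for illustration; they are not part of rdrr.

```typescript
// Hypothetical helper, not part of rdrr. It only maps options onto the
// documented CLI flags so a script can hand them to a process spawner.
interface RdrrCliOptions {
  output?: string; // -o, --output <file>
  json?: boolean; // -j, --json
  property?: string; // -p, --property <name>
  language?: string; // -l, --language <code>
  limit?: number; // -n, --limit <n>
  order?: "newest" | "oldest"; // --order <order>
  llms?: boolean; // --llms
}

function buildRdrrArgs(url: string, opts: RdrrCliOptions = {}): string[] {
  const args: string[] = [url];
  if (opts.output !== undefined) args.push("--output", opts.output);
  if (opts.json) args.push("--json");
  if (opts.property !== undefined) args.push("--property", opts.property);
  if (opts.language !== undefined) args.push("--language", opts.language);
  if (opts.limit !== undefined) args.push("--limit", String(opts.limit));
  if (opts.order !== undefined) args.push("--order", opts.order);
  if (opts.llms) args.push("--llms");
  return args;
}

// Example: up to 5 items from an aggregate URL, as JSON.
const args = buildRdrrArgs("https://x.com/discotune", { json: true, limit: 5 });
console.log(["rdrr", ...args].join(" "));
// prints "rdrr https://x.com/discotune --json --limit 5"
```

The resulting array can be passed to something like Node's `child_process.spawn("npx", ["rdrr", ...args])`. Keeping the arguments as an array rather than a single string avoids shell-quoting problems with URLs.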

API

`parse(url, options?)`

import { parse } from "rdrr"

const result = await parse(url, {
  language: "en",
  includeLlmsTxt: true,
})

Returns a ParseResult with type, title, author, content, description, domain, siteName, published, wordCount, readTime, and more. The result is narrowed by type: "webpage", "youtube", "github", "pdf", or "x-profile".
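
Because the result carries a `type` tag, a `switch` on that field lets the compiler check that every source type is handled. The sketch below uses a simplified, locally defined stand-in for `ParseResult`: the five type tags and the `title`, `content`, and `wordCount` fields come from the description above, while `ResultSketch` and `label` are hypothetical names, and the real type-specific fields are not modeled.

```typescript
// Simplified stand-in for rdrr's ParseResult (an assumption for illustration).
// The five type tags are taken from the README; real results carry more fields.
type SourceType = "webpage" | "youtube" | "github" | "pdf" | "x-profile";

interface ResultSketch {
  type: SourceType;
  title: string;
  content: string;
  wordCount: number;
}

// Produce a one-line label per source type. The `never` assignment makes the
// compiler flag any type tag that is added later but not handled here.
function label(result: ResultSketch): string {
  switch (result.type) {
    case "webpage":
      return `Article: ${result.title} (${result.wordCount} words)`;
    case "youtube":
      return `Transcript: ${result.title}`;
    case "github":
      return `GitHub thread: ${result.title}`;
    case "pdf":
      return `PDF: ${result.title}`;
    case "x-profile":
      return `X timeline: ${result.title}`;
    default: {
      const unreachable: never = result.type;
      return unreachable;
    }
  }
}

console.log(label({ type: "webpage", title: "TypeScript", content: "...", wordCount: 2847 }));
// prints "Article: TypeScript (2847 words)"
```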

`parseHtml(html, options?)`

Runs the extraction engine on raw HTML. Useful for saved pages or pipelines where you already have the bytes.

import { parseHtml } from "rdrr"

const result = await parseHtml(html, {
  url: "https://example.com/article",
})

`isProbablyReaderable(input)`

Lightweight pre-check: will this URL yield a meaningful article? Useful for routing in AI agents.

import { isProbablyReaderable } from "rdrr"

await isProbablyReaderable("https://example.com") // true | false
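
One way to use the pre-check for routing is to parse only when it passes and skip the URL otherwise. The sketch below is a minimal illustration, not rdrr's own agent integration: `routeUrl`, `Check`, `Parse`, and `Routed` are hypothetical names, and the two functions are injected as parameters so the example runs without network access. In real use you would presumably pass `isProbablyReaderable` and `parse` imported from "rdrr", assuming their signatures match the calls shown above.

```typescript
// The two function types mirror how the README calls isProbablyReaderable
// and parse. They are injected so this sketch runs offline with stubs.
type Check = (url: string) => Promise<boolean>;
type Parse = (url: string) => Promise<{ title: string; content: string }>;

type Routed =
  | { kind: "markdown"; title: string; content: string }
  | { kind: "skipped"; url: string };

// Hypothetical agent helper: run the full parse only when the pre-check passes.
async function routeUrl(url: string, check: Check, parse: Parse): Promise<Routed> {
  if (!(await check(url))) {
    return { kind: "skipped", url };
  }
  const { title, content } = await parse(url);
  return { kind: "markdown", title, content };
}

// Stubs standing in for the real functions.
const stubCheck: Check = async (url) => url.endsWith("/article");
const stubParse: Parse = async () => ({ title: "Example", content: "# Example" });

routeUrl("https://example.com/article", stubCheck, stubParse).then((r) => console.log(r.kind));
// prints "markdown"
```

Returning a tagged result instead of throwing keeps the "not readable" case an ordinary branch, so the caller can fall back to another tool.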

Also available as direct imports: parseWeb, parseYouTube, parseGitHub, parsePdf, detectUrlType, extractVideoId, normalizeUrl.

Supported sources

| Type | What it handles |
| --- | --- |
| Webpages | Any HTML page with 20+ site-specific extractors |
| YouTube | Transcripts with chapters, speakers, timestamps |
| GitHub | Issues, PRs (with comments), raw files |
| PDFs | Any public `.pdf` (requires optional `unpdf`) |
| X/Twitter | Single posts and full profile timelines |
| llms.txt | Appended on demand via `--llms` or `includeLlmsTxt` |

Community

Discussion, questions, site-extractor requests: GitHub Discussions
Bugs: GitHub Issues
Security: see SECURITY.md

Contributing

Contributions welcome! See CONTRIBUTING.md.

Want to add a site extractor? Check out src/extract/sites/, where each extractor is a self-contained file.

License

MIT
