Convert HTML or a URL to Markdown.
pip install htmlquill
Optional Playwright backend:
pip install "htmlquill[browser]"
playwright install chromium
# Auto-save using the first Markdown heading
htmlquill convert https://example.com/article
# Manual output path
htmlquill convert https://example.com/article -o article.md
# Preview generated filename without saving
htmlquill convert https://example.com/article --filename-only
# Print Markdown content without saving
htmlquill convert https://example.com/article --stdout
# Save generated filename to a target directory
htmlquill convert https://example.com/article --output-dir notes
# Limit generated filename stem length
htmlquill convert https://example.com/article --filename-max-length 60
# Inspect effective config
htmlquill config show https://example.com
# Initialize config and inspect paths
htmlquill config init
htmlquill config path
# Run diagnostics
htmlquill doctor
# Count generated Markdown structure
htmlquill analyse example.md
# Preview Markdown in the terminal
htmlquill preview example.md
`htmlquill SOURCE` is retained as shorthand for `htmlquill convert SOURCE`; it now follows the same auto-save behavior unless `--stdout` is used.
- `htmlquill convert SOURCE [options]`
- `htmlquill config path|show|init|validate`
- `htmlquill auth path|show|init`
- `htmlquill doctor [--url URL] [--fetch] [--json] [--strict]`
- `htmlquill analyse SOURCE` (alias: `htmlquill analyze SOURCE`)
- `htmlquill preview SOURCE`
| Option | Description |
|---|---|
| `SOURCE` | URL (`https://...`), HTML file path, or `-` for stdin |
| `-o, --output PATH` | Manual output file path. Overrides generated filename. |
| `--stdout` | Print converted Markdown to stdout and do not save. |
| `--filename-only` | Print resolved output filename and do not save. |
| `--filename-max-length N` | Max generated filename stem length, excluding `.md`. Default: 80. |
| `--output-dir DIR` | Directory for generated output files. Default: current directory. |
| `--force` | Overwrite generated output target instead of adding a numeric suffix. |
| `--timeout` | HTTP timeout override in seconds |
| `--user-agent` | Custom HTTP User-Agent header |
| `--browser` | Fetching mode override: `auto`, `requests`, `playwright`, `chromium` |
| `--config PATH` | Use this config file |
| `--no-config` | Disable config loading |
| `--auth-file PATH` | Use this auth file |
| `--no-auth` | Disable auth loading |
| `--profile NAME` | Force a named auth profile |
| `--print-config` | Deprecated; use `htmlquill config show URL` |
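The filename options in the table (auto-save from the first Markdown heading, `--filename-max-length`, and `--force` versus a numeric suffix) can be pictured with a small sketch. This is not HtmlQuill's actual implementation: the function names `stem_from_markdown` and `resolve_target`, the slug rules, the `untitled` fallback, and the `-2` suffix format are all assumptions made for illustration.

```python
import re
from pathlib import Path


def stem_from_markdown(markdown: str, max_length: int = 80) -> str:
    """Derive a filename stem from the first Markdown heading (hypothetical rules)."""
    match = re.search(r"^#{1,6}\s+(.+?)\s*$", markdown, flags=re.MULTILINE)
    title = match.group(1) if match else "untitled"
    # Replace runs of non-word characters with a single hyphen.
    stem = re.sub(r"[^\w]+", "-", title).strip("-").lower()
    # The limit applies to the stem only, excluding the ".md" extension.
    return stem[:max_length].rstrip("-") or "untitled"


def resolve_target(stem: str, output_dir: Path, force: bool = False) -> Path:
    """Return the output path, adding a numeric suffix unless force is set."""
    target = output_dir / f"{stem}.md"
    if force or not target.exists():
        return target
    counter = 2  # assumed suffix scheme: name-2.md, name-3.md, ...
    while (output_dir / f"{stem}-{counter}.md").exists():
        counter += 1
    return output_dir / f"{stem}-{counter}.md"
```

Under these assumptions, a page whose first heading is `# Hello, World!` would be saved as `hello-world.md`, and a second conversion into the same directory without `--force` would produce `hello-world-2.md`.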
- `auto` (default): tries `requests` first; on HTTP 403 or a detected challenge page, falls back to system Chromium, then Playwright.
- `requests`: plain HTTP via `requests`.
- `chromium`: uses system Chromium via subprocess.
- `playwright`: uses Playwright Chromium (optional dependency).
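The `auto` fallback order can be sketched as a loop over fetchers. This is an illustrative outline, not HtmlQuill's code: `FetchBlocked`, `looks_like_challenge`, and `fetch_auto` are hypothetical names, and each fetcher is assumed to be a callable that returns HTML or raises `FetchBlocked` on HTTP 403.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class FetchBlocked(Exception):
    """Hypothetical error a fetcher raises when the server answers HTTP 403."""


def looks_like_challenge(html: str, markers: Iterable[str]) -> bool:
    """Return True if any configured marker string appears in the page."""
    return any(marker in html for marker in markers)


def fetch_auto(
    url: str,
    fetchers: List[Tuple[str, Callable[[str], str]]],
    markers: Iterable[str],
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order; return the first usable result."""
    markers = list(markers)
    last_error: Optional[Exception] = None
    for name, fetch in fetchers:
        try:
            html = fetch(url)
        except FetchBlocked as exc:
            last_error = exc  # blocked: move on to the next backend
            continue
        if looks_like_challenge(html, markers):
            last_error = FetchBlocked(f"{name}: challenge page detected")
            continue
        return name, html
    raise last_error or FetchBlocked("no fetchers configured")
```

In this sketch the caller would pass the backends in the documented order, for example `[("requests", ...), ("chromium", ...), ("playwright", ...)]`, so a blocked plain-HTTP fetch falls through to the browser-based ones.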
htmlquill resolves config file paths in this order:
1. `--config PATH`
2. `HTMLQUILL_CONFIG`
3. `$XDG_CONFIG_HOME/htmlquill/config.toml`
4. `~/.config/htmlquill/config.toml`
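As a rough illustration of this lookup order, the following sketch returns the first candidate that is set. The function name `resolve_config_path` is hypothetical, and the sketch assumes no existence check at each step; the README does not say whether HtmlQuill skips candidates that are missing on disk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the config path using the documented precedence (assumed semantics)."""
    env = os.environ if env is None else env
    if cli_path:  # 1. --config PATH
        return Path(cli_path).expanduser()
    if env.get("HTMLQUILL_CONFIG"):  # 2. environment variable
        return Path(env["HTMLQUILL_CONFIG"]).expanduser()
    if env.get("XDG_CONFIG_HOME"):  # 3. XDG config directory
        return Path(env["XDG_CONFIG_HOME"]) / "htmlquill" / "config.toml"
    # 4. Default location under the home directory
    return Path.home() / ".config" / "htmlquill" / "config.toml"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.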
Example config.toml:
version = 1
[defaults]
adapter = "html"
browser = "auto"
timeout = 30.0
fail_on_challenge = true
fallback_on_challenge = true
[paths]
auth_file = "~/.config/htmlquill/auth.json"
[challenge]
markers = [
"Performing security verification",
"verifies you are not a bot",
"You've been blocked by network security",
"blocked by network security",
"If you think you've been blocked by mistake, file a ticket",
]
[sites."medium.com"]
browser = "chromium"
timeout = 60.0
auth = "medium"
HtmlQuill supports browser-state auth profiles through auth.json.
Use this when a site works in an already-authenticated browser session and you want HtmlQuill to reuse that state.
Auth file resolution order:
1. `--auth-file PATH`
2. `HTMLQUILL_AUTH`
3. `[paths].auth_file` from config
4. `$XDG_CONFIG_HOME/htmlquill/auth.json` or `~/.config/htmlquill/auth.json`
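The auth lookup differs from the config lookup in that a value from the loaded config sits between the environment variable and the default location. A compact sketch, assuming the config has already been parsed into a dictionary and that the first non-empty candidate wins; `resolve_auth_path` is a hypothetical name, not part of HtmlQuill's API.

```python
import os
from pathlib import Path
from typing import Any, Mapping, Optional


def resolve_auth_path(
    cli_path: Optional[str] = None,
    config: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the auth file path using the documented precedence (assumed semantics)."""
    env = os.environ if env is None else env
    config_value = (config or {}).get("paths", {}).get("auth_file")
    # Fall back to ~/.config when XDG_CONFIG_HOME is unset or empty.
    xdg_base = env.get("XDG_CONFIG_HOME") or "~/.config"
    candidates = [
        cli_path,                           # 1. --auth-file PATH
        env.get("HTMLQUILL_AUTH"),          # 2. environment variable
        config_value,                       # 3. [paths].auth_file
        f"{xdg_base}/htmlquill/auth.json",  # 4. default location
    ]
    first = next(candidate for candidate in candidates if candidate)
    return Path(first).expanduser()
```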
Example auth.json:
{
"version": 1,
"profiles": {
"medium": {
"kind": "browser_state",
"playwright_storage_state": "~/.config/htmlquill/auth/medium.storage-state.json",
"chromium_user_data_dir": "~/.config/htmlquill/chromium/medium"
}
}
}
Security notes:
- Do not commit auth files, storage-state files, or browser profile directories.
- Recommended permissions: `chmod 600 ~/.config/htmlquill/auth.json`.
- Recommended browser profile directory permissions: `chmod 700 ~/.config/htmlquill/chromium/medium`.
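The recommended permissions can be checked from a script before a run. The helper below is a hypothetical sketch (HtmlQuill itself may or may not perform such a check, for example in `htmlquill doctor`); it relies on POSIX mode bits, so it is only meaningful on Linux and macOS.

```python
import stat
from pathlib import Path
from typing import List


def permission_problems(path: Path, expected_mode: int) -> List[str]:
    """Return warnings if path grants permission bits beyond expected_mode."""
    mode = stat.S_IMODE(path.stat().st_mode)
    extra = mode & ~expected_mode  # bits set on disk but not allowed
    if extra:
        return [
            f"{path}: mode {mode:o} grants extra bits {extra:o}; "
            f"expected at most {expected_mode:o}"
        ]
    return []
```

For instance, `permission_problems(Path("~/.config/htmlquill/auth.json").expanduser(), 0o600)` would return an empty list when the file is readable and writable by its owner only.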
HtmlQuill no longer ships a Reddit API/OAuth adapter. Reddit URLs go through the normal HTML fetch path, the same as any other URL. If Reddit returns a network-security or login interstitial, use a browser-based fetch profile, retry later, or save the page manually and convert the saved file. The `htmlquill auth login reddit` command is intentionally not available.
from htmlquill import html_to_markdown, url_to_markdown
markdown = html_to_markdown("<h1>Hello</h1><p>World</p>")
markdown = url_to_markdown("https://example.com")
# Optional keyword controls
markdown = url_to_markdown(
"https://example.com",
browser="requests",
config=True,
auth=False,
)
pip install -e ".[dev]"
pytest -q
ruff check .