Broken link scanner for Markdown and HTML with proper AST parsing, GitHub-style anchor resolution, and per-host rate limiting. Designed to drop into a CI pipeline next to your existing linters: --no-external skips the network for fast pre-commit hooks, --format github emits Actions annotations inline on PRs, and exit codes are CI-correct (0 OK, 1 broken, 2 config error).

```sh
# Clone
git clone https://github.com/sen-ltd/linkcheck
cd linkcheck
# Build & run via Docker (no Python install needed)
docker build -t linkcheck .
docker run --rm -v "$PWD:/work" linkcheck --no-external docs/
```

Three of the obvious tools each have a gap:

- lychee is excellent and very fast, but it's Rust, and adding it to a Python or JS project means another toolchain. Worth it for big sites; overkill for a 30-page repo.
- markdown-link-check is Node-only and famously slow on larger trees, and it doesn't speak HTML.
- html-linkchecker has been quiet since 2019 and doesn't understand Markdown at all.

linkcheck is a small Python CLI — pip install or docker run — that handles both Markdown and HTML in one pass, resolves internal anchors against the actual heading slugs each file would generate on GitHub, and respects per-host rate limits so you don't accidentally DoS a site you're trying to validate.

```sh
# Default: scan a directory, check everything (local + external)
linkcheck docs/
# Fast, deterministic, no network: perfect for pre-commit
linkcheck --no-external docs/
# CI-ready: GitHub Actions annotations
linkcheck --format github docs/
# JSON for piping into anything else
linkcheck --format json docs/ > report.json
# Filter to a subset
linkcheck --include 'docs/api/*.md' --exclude '*draft*' docs/
```

| Link kind | How |
|---|---|
| External `http(s)` | Async HTTP `HEAD` via httpx; falls back to `GET` on 405/501 |
| Internal file paths | Filesystem `Path.exists()` |
| Internal anchors | Parse target file, generate GitHub-style heading slugs, compare |
| Same-doc anchors | Same as above, but against the current file |
| `mailto:`, `tel:` | Skipped with a note (no false positives) |
| Other schemes | Skipped (`data:`, `javascript:`, `ftp:`, ...) |

Anchor resolution is the part most existing tools cut corners on. When you write `[Usage](./guide.md#usage)`, the anchor `usage` only exists if `guide.md` has a heading whose GitHub slug is exactly `usage`. linkcheck parses the target file, runs every heading through the same slug algorithm GitHub uses (lowercase, strip punctuation, replace spaces with hyphens, suffix duplicates with `-1`, `-2`, ...), and compares the results against the requested anchor. This catches a large class of "I renamed a section and forgot to update the link to it" bugs.
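A minimal sketch of those four slug steps, assuming heading text has already been extracted as plain text (linkcheck parses with markdown-it-py; this sketch does not parse at all). The character class kept here (word characters, hyphen, space) is an approximation of GitHub's exact punctuation rules, and `Slugger` and `anchor_exists` are hypothetical names.

```python
from __future__ import annotations

import re

# Approximation: drop everything except word chars (letters, digits, "_"),
# hyphens, and spaces. Emoji and punctuation fall out here.
_STRIP = re.compile(r"[^\w\- ]")


def base_slug(text: str) -> str:
    """Lowercase, strip punctuation, turn each space into a hyphen."""
    return _STRIP.sub("", text.lower()).replace(" ", "-")


class Slugger:
    """Tracks slugs already issued within one document to suffix duplicates."""

    def __init__(self) -> None:
        self._seen: dict[str, int] = {}

    def slug(self, heading: str) -> str:
        original = base_slug(heading)
        result = original
        # Second "Usage" becomes "usage-1", third "usage-2", and so on.
        while result in self._seen:
            self._seen[original] += 1
            result = f"{original}-{self._seen[original]}"
        self._seen[result] = 0
        return result


def anchor_exists(headings: list[str], anchor: str) -> bool:
    """Does any heading in the target file produce exactly this anchor?"""
    slugger = Slugger()  # fresh per file: duplicate counting is per document
    return anchor in {slugger.slug(h) for h in headings}
```

For example, the headings `Usage`, `Usage`, `Hello, World!` produce `usage`, `usage-1`, `hello-world`, so a link to `#usage-1` resolves while `#usage-2` is reported as broken.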
External checks run concurrently, but `--per-host N` (default 4) caps how many in-flight requests can hit any single hostname. Without this cap, a docs tree with 200 links to `https://docs.python.org/...` becomes 200 simultaneous connections: unfriendly at best, banned at worst. The limiter is per host: `github.com` and `docs.python.org` each get their own 4-slot pool, so you still get parallelism across hosts.
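One natural way to express a per-host cap is one `asyncio.Semaphore` per hostname. The sketch below is an assumption about how such a limiter could look, not linkcheck's implementation: `PerHostLimiter` is a hypothetical name, and `fake_fetch` stands in for an httpx request so the demo runs offline.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


class PerHostLimiter:
    """Hypothetical helper: at most `per_host` in-flight calls per hostname."""

    def __init__(self, per_host: int = 4) -> None:
        self._per_host = per_host
        self._sems: dict[str, asyncio.Semaphore] = {}

    def _sem_for(self, url: str) -> asyncio.Semaphore:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
        # Created lazily; there is no await between the check and the insert,
        # so two tasks cannot race to create the same semaphore.
        if host not in self._sems:
            self._sems[host] = asyncio.Semaphore(self._per_host)
        return self._sems[host]

    async def run(self, url, fetch):
        async with self._sem_for(url):
            return await fetch(url)


async def demo(per_host: int = 4) -> dict[str, int]:
    """Fire 20 fake requests at each of two hosts; report peak concurrency per host."""
    limiter = PerHostLimiter(per_host)
    in_flight: dict[str, int] = defaultdict(int)
    peak: dict[str, int] = defaultdict(int)

    async def fake_fetch(url: str) -> str:
        host = urlsplit(url).hostname or ""
        in_flight[host] += 1
        peak[host] = max(peak[host], in_flight[host])
        await asyncio.sleep(0.01)  # stands in for network latency
        in_flight[host] -= 1
        return url

    urls = [f"https://docs.python.org/3/{i}" for i in range(20)]
    urls += [f"https://github.com/{i}" for i in range(20)]
    await asyncio.gather(*(limiter.run(u, fake_fetch) for u in urls))
    return dict(peak)


print(asyncio.run(demo()))  # {'docs.python.org': 4, 'github.com': 4}
```

All 40 tasks are scheduled at once, yet neither host ever sees more than 4 concurrent requests, while the two hosts still proceed in parallel.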

| Flag | Behavior |
|---|---|
| `paths` (positional) | One or more files or directories. Required. |
| `--include GLOB` | Only scan matching files. Repeatable. |
| `--exclude GLOB` | Skip matching files. Repeatable. |
| `--no-external` | Skip http/https checks entirely. Fast and deterministic. |
| `--timeout SECONDS` | HTTP timeout (default 10). |
| `--retries N` | Retry transient failures with backoff (default 2). |
| `--per-host N` | Max concurrent requests per host (default 4). |
| `--status-accept 200,301,...` | Treat the listed HTTP status codes as success. |
| `--user-agent STRING` | Override the request UA (defaults to `linkcheck-cli/<version>`). |
| `--insecure` | Skip TLS verification. Off by default. |
| `--format {human,json,github}` | Output format. Default `human`. |
| `--no-color` | Disable ANSI color in human output. |
| `--version` | Print version and exit. |

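The external check ties several of these flags together. A hedged sketch of the per-URL logic, assuming that any 2xx status counts as OK, that `--status-accept` codes are added on top of that, and that `--retries` uses exponential backoff (the actual schedule is not documented here). `send` and `TransientError` are hypothetical stand-ins for an httpx request and its timeout/connection errors, injected so the sketch runs without a network.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical transport: (method, url) -> HTTP status code.
Send = Callable[[str, str], Awaitable[int]]


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, resets)."""


async def check_url(
    url: str,
    send: Send,
    extra_accept: frozenset[int] = frozenset(),  # e.g. from --status-accept
    retries: int = 2,                            # e.g. from --retries
    base_delay: float = 0.5,
) -> tuple[bool, int | None]:
    """Return (ok, status); status is None if every attempt failed."""
    for attempt in range(retries + 1):
        try:
            status = await send("HEAD", url)
            if status in (405, 501):
                # Some servers reject HEAD; retry the same URL with GET.
                status = await send("GET", url)
            return (200 <= status < 300 or status in extra_accept), status
        except TransientError:
            if attempt < retries:
                await asyncio.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, ...
    return False, None
```

Keeping the transport injectable is also what makes this style easy to unit-test offline, in the same spirit as the `httpx.MockTransport` setup the test suite uses.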

Exit codes:

| Code | Meaning |
|---|---|
| 0 | All links OK. |
| 1 | At least one broken link found. |
| 2 | Configuration error (missing path, bad flag). |

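In CI, the contract has two halves: the annotation lines and the exit status. A minimal sketch, assuming one GitHub Actions `::error` workflow command per broken link. The `Broken` record, the message wording, and the helper names are hypothetical; the `::error file=...,line=...::message` shape and its `%`-escaping of data and properties follow GitHub's workflow-command format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Broken:
    """Hypothetical record for one broken link."""
    path: str
    line: int
    url: str
    reason: str


def escape_data(s: str) -> str:
    # "%" must be escaped first so the later escapes aren't double-encoded.
    return s.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(s: str) -> str:
    # Property values additionally escape the ":" and "," separators.
    return escape_data(s).replace(":", "%3A").replace(",", "%2C")


def github_annotation(b: Broken) -> str:
    message = f"Broken link {b.url}: {b.reason}"
    return f"::error file={escape_property(b.path)},line={b.line}::{escape_data(message)}"


def exit_code(broken: list[Broken], config_error: bool = False) -> int:
    """0 = all OK, 1 = at least one broken link, 2 = configuration error."""
    if config_error:
        return 2
    return 1 if broken else 0
```

Printing each annotation to stdout and then calling `sys.exit(exit_code(...))` is enough for Actions to mark the step as failed and pin each error to the right line in the PR diff.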
```sh
pip install .
linkcheck --no-external docs/
```

Requires Python 3.10+. Dependencies: httpx, markdown-it-py. That's it.
```sh
docker run --rm --entrypoint pytest linkcheck -q
```

Tests use `httpx.MockTransport`, so no real network calls are made: the suite is fully self-contained and runs in under a second.
MIT. See LICENSE.
