linkcheck

Broken link scanner for Markdown and HTML with proper AST parsing, GitHub-style anchor resolution, and per-host rate limiting. Designed to drop into a CI pipeline next to your existing linters: --no-external skips the network for fast pre-commit hooks, --format github emits Actions annotations inline on PRs, and exit codes are CI-correct (0 OK, 1 broken, 2 config error).

# Clone
git clone https://github.com/sen-ltd/linkcheck
cd linkcheck

# Build & run via Docker (no Python install needed)
docker build -t linkcheck .
docker run --rm -v "$PWD:/work" linkcheck --no-external docs/

Why another link checker?

The three obvious alternatives each have a gap:

  • lychee is excellent and very fast, but it's Rust — adding it to a Python or JS project means another toolchain. Worth it for big sites; overkill for a 30-page repo.
  • markdown-link-check is Node-only and famously slow on larger trees, and it doesn't speak HTML.
  • html-linkchecker has been quiet since 2019 and doesn't understand Markdown at all.

linkcheck is a small Python CLI — pip install or docker run — that handles both Markdown and HTML in one pass, resolves internal anchors against the actual heading slugs each file would generate on GitHub, and respects per-host rate limits so you don't accidentally DoS a site you're trying to validate.

Two-second tour

# Default: scan a directory, check everything (local + external)
linkcheck docs/

# Fast, deterministic, no network — perfect for pre-commit
linkcheck --no-external docs/

# CI-ready: GitHub Actions annotations
linkcheck --format github docs/

# JSON for piping into anything else
linkcheck --format json docs/ > report.json

# Filter to a subset
linkcheck --include 'docs/api/*.md' --exclude '*draft*' docs/
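
One plausible reading of how --include and --exclude combine, sketched with the standard library's fnmatch. The function name `selected` and the matching rules (patterns tested against the relative path, `*` allowed to cross `/`, exclude winning over include) are assumptions for illustration, not linkcheck's confirmed behavior.

```python
from fnmatch import fnmatchcase


def selected(path, include=(), exclude=()):
    """Decide whether a file should be scanned (hypothetical helper).

    Assumed semantics: with no include globs every file is a candidate;
    any matching exclude glob then removes it.
    """
    if include and not any(fnmatchcase(path, g) for g in include):
        return False
    return not any(fnmatchcase(path, g) for g in exclude)


files = ["docs/api/auth.md", "docs/api/auth-draft.md", "docs/guide.md"]
kept = [f for f in files if selected(f, ["docs/api/*.md"], ["*draft*"])]
print(kept)
```

With the globs from the example above, only docs/api/auth.md survives the filter.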

What it checks

| Link kind | How |
| --- | --- |
| External http(s) | Async HTTP HEAD via httpx, fall back to GET on 405/501 |
| Internal file paths | Filesystem Path.exists() |
| Internal anchors | Parse target file, generate GitHub-style heading slugs, compare |
| Same-doc anchors | Same as above but against the current file |
| mailto:, tel: | Skipped with a note (no false positives) |
| Other schemes | Skipped (data:, javascript:, ftp:, ...) |
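
The link kinds listed above could be told apart with a small classifier like the one below. This is a sketch using only urllib.parse; the function name `classify_link` and the returned labels are hypothetical, and edge cases such as protocol-relative URLs are not handled.

```python
from urllib.parse import urlsplit


def classify_link(href: str) -> str:
    """Map a link target to one of the kinds in the table (labels are illustrative)."""
    if href.startswith("#"):
        return "same-doc-anchor"          # e.g. "#usage"
    parts = urlsplit(href)
    if parts.scheme in ("http", "https"):
        return "external"                 # checked over the network
    if parts.scheme:
        return "skipped"                  # mailto:, tel:, data:, javascript:, ftp:, ...
    if parts.fragment:
        return "internal-anchor"          # e.g. "./guide.md#usage"
    return "internal-file"                # e.g. "./guide.md"


for href in ["https://example.com/a", "#usage", "./guide.md#usage", "mailto:a@example.com"]:
    print(href, "->", classify_link(href))
```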

Anchor resolution

This is the bit most existing tools cut corners on. When you write [Usage](./guide.md#usage), the anchor usage only exists if guide.md has a heading whose GitHub slug is exactly usage. linkcheck parses the target file, runs every heading through the same slug algorithm GitHub uses (lowercase, strip punctuation, hyphenate spaces, suffix duplicates with -1, -2, ...), and compares against the requested anchor. This catches a huge class of "I renamed a section and forgot the link to it" bugs.
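
The slug steps described above can be sketched in a few lines. This is an approximation for illustration, not linkcheck's actual code: the helper names are hypothetical, and the sketch assumes hyphens and underscores survive the punctuation strip.

```python
import re


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, turn spaces into hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep letters, digits, "_", "-", spaces
    return slug.replace(" ", "-")


def heading_slugs(headings):
    """Slug every heading in order, suffixing repeats with -1, -2, ..."""
    seen = {}
    slugs = []
    for heading in headings:
        base = slugify(heading)
        count = seen.get(base, 0)
        seen[base] = count + 1
        slugs.append(base if count == 0 else f"{base}-{count}")
    return slugs


def anchor_exists(headings, anchor: str) -> bool:
    """True if some heading in the target file produces exactly this anchor."""
    return anchor in heading_slugs(headings)


print(heading_slugs(["Usage", "Why another link checker?", "Usage"]))
```

Under these assumptions, renaming the "Usage" heading to "Getting started" makes `anchor_exists(..., "usage")` return False, which is the failure the checker reports.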

Per-host rate limiting

External checks run concurrently, but --per-host N (default 4) caps how many in-flight requests can hit any single hostname. Without this, a docs tree with 200 links to https://docs.python.org/... becomes 200 simultaneous connections — unfriendly at best, banned at worst. The limiter is per-host: github.com and python.org each get their own 4-slot pool, so you still get parallelism across hosts.
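
A per-host cap like this is commonly built from one asyncio.Semaphore per hostname. The sketch below shows that pattern; `PerHostLimiter`, `check_all`, and the injected `fetch` coroutine are hypothetical names, and nothing here is taken from linkcheck's source.

```python
import asyncio
from urllib.parse import urlsplit


class PerHostLimiter:
    """Hand out one semaphore per hostname, each with `per_host` slots."""

    def __init__(self, per_host: int = 4):
        self._per_host = per_host
        self._semaphores = {}

    def slot(self, url: str) -> asyncio.Semaphore:
        host = urlsplit(url).hostname or ""
        if host not in self._semaphores:
            self._semaphores[host] = asyncio.Semaphore(self._per_host)
        return self._semaphores[host]


async def check_all(urls, fetch, per_host: int = 4):
    """Run fetch(url) for every URL, never exceeding per_host in flight per host."""
    limiter = PerHostLimiter(per_host)

    async def check_one(url):
        async with limiter.slot(url):  # waits only on this URL's own host
            return await fetch(url)

    return await asyncio.gather(*(check_one(u) for u in urls))
```

Because each hostname has its own semaphore, a slow host only blocks requests to itself; requests to other hosts proceed in parallel.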

Flags

| Flag | Behavior |
| --- | --- |
| paths (positional) | One or more files or directories. Required. |
| --include GLOB | Only scan matching files. Repeatable. |
| --exclude GLOB | Skip matching files. Repeatable. |
| --no-external | Skip http/https checks entirely. Fast and deterministic. |
| --timeout SECONDS | HTTP timeout (default 10). |
| --retries N | Retry transient failures with backoff (default 2). |
| --per-host N | Max concurrent requests per host (default 4). |
| --status-accept 200,301,... | Whitelist HTTP status codes as success. |
| --user-agent STRING | Override the request UA (defaults to linkcheck-cli/<version>). |
| --insecure | Skip TLS verification. Off by default. |
| --format {human,json,github} | Output format. Default human. |
| --no-color | Disable ANSI color in human output. |
| --version | Print version and exit. |
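
To illustrate what "retry transient failures with backoff" usually means, here is a minimal sketch. The exponential schedule, the 0.5-second base delay, and the choice of ConnectionError and TimeoutError as "transient" are all assumptions; the README does not say which errors or delays linkcheck uses.

```python
import time


def with_retries(call, retries: int = 2, base_delay: float = 0.5, sleep=time.sleep):
    """Call `call()` up to retries + 1 times, doubling the wait after each failure."""
    for attempt in range(retries + 1):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == retries:
                raise                          # out of retries: surface the error
            sleep(base_delay * 2 ** attempt)   # 0.5s, 1.0s, 2.0s, ...
```

Passing `sleep` as a parameter keeps the helper testable without real delays.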

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | All links OK. |
| 1 | At least one broken link found. |
| 2 | Configuration error (missing path, bad flag). |
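
The mapping in the table can be written down as a small enum, which is handy when a wrapper script needs to interpret the result. The names `ExitCode` and `exit_code` are hypothetical; only the three numeric values come from the table.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0             # all links OK
    BROKEN = 1         # at least one broken link
    CONFIG_ERROR = 2   # missing path, bad flag


def exit_code(broken_links: int, config_error: bool = False) -> ExitCode:
    """Pick the process exit status; a configuration error takes precedence."""
    if config_error:
        return ExitCode.CONFIG_ERROR
    return ExitCode.BROKEN if broken_links > 0 else ExitCode.OK
```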

Running without Docker

pip install .
linkcheck --no-external docs/

Requires Python 3.10+. Dependencies: httpx, markdown-it-py. That's it.

Tests

docker run --rm --entrypoint pytest linkcheck -q

Tests use httpx.MockTransport so no real network calls are made — the suite is fully self-contained and runs in under a second.

License

MIT. See LICENSE.
