Release v1.13.1 · jztan/pdf-mcp

What's New in v1.13.1

Changed

Release automation (scripts/release.py) reordered: the GitHub
release is now created only after the publish-pypi.yml workflow
reports success and the version is live on PyPI, so the pip install
line in the release notes is never a lie. A new preflight step runs
the same pip-audit invocation CI uses, so a vulnerability that
would block publishing is caught locally before the tag is pushed.
When the publish step does fail post-tag, the script now prints the
exact recovery steps (rerun the workflow vs. burn the version)
instead of exiting silently.
The pip-audit ignore list now lives in a single script,
scripts/audit.sh, invoked by ci.yml, dependency-review.yml,
publish-pypi.yml, and the release preflight, so the four call
sites cannot drift.

Security

Bumped idna 3.11 → 3.15 to clear CVE-2026-45409.
Added PYSEC-2025-183 (pyjwt, transitive via mcp, no upstream
fix yet) to the pip-audit ignore list, alongside the existing
CVE-2026-4539 (pygments, dev-only) and CVE-2026-3219 (pip,
build-time only) entries.

BREAKING

pdf_read_pages response shape: per-image dicts now carry image_id
(content-addressed basename) instead of path (absolute filesystem
path), and the per-page render_path field is replaced by
render_id. Rationale: API hygiene. The previous path field
embedded the current cache directory, so the value was unstable
across runs and across PDF_MCP_CACHE_DIR changes; the new IDs are
stable opaque tokens. Callers that need bytes resolve the ID against
images_dir / renders_dir from pdf_cache_stats, or call
pdf_render_pages (which inlines PNG content blocks for vision
models). No compatibility shim, since these keys have never
appeared in a released version.
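
Resolving an ID against the cache directories could look roughly like the sketch below. It assumes that image_id and render_id are plain basenames inside images_dir and renders_dir, as the description above suggests. The helper name resolve_cached_file and the sample values are hypothetical, not part of pdf-mcp.

```python
from pathlib import Path


def resolve_cached_file(base_dir: str, file_id: str) -> Path:
    """Join an opaque ID onto a cache directory, refusing path traversal."""
    # Assumption: IDs are plain basenames, so anything that carries a
    # directory component is rejected instead of being resolved.
    if not file_id or file_id in {".", ".."} or Path(file_id).name != file_id:
        raise ValueError(f"not a plain basename: {file_id!r}")
    return Path(base_dir) / file_id


# Hypothetical response fragments, shaped after the field names above.
cache_stats = {
    "images_dir": "/tmp/pdf-mcp/images",
    "renders_dir": "/tmp/pdf-mcp/renders",
}
image = {"image_id": "3f2a9c.png"}

image_path = resolve_cached_file(cache_stats["images_dir"], image["image_id"])
print(image_path)
```

Re-reading images_dir from pdf_cache_stats on each run, rather than storing the joined path, keeps callers independent of PDF_MCP_CACHE_DIR changes.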

Added

pdf_cache_stats response now includes images_dir and
renders_dir so callers can resolve the opaque image_id /
render_id returned by pdf_read_pages to disk paths when they
need to read bytes directly. The tool description marks
pdf_cache_stats as cache diagnostics.
[limits].max_response_bytes config option (default 200 KB, max 2 MB)
capping pdf_read_all and section-granularity pdf_search response
payloads. New response fields: truncated, truncated_pages,
truncated_bytes, bytes_returned, bytes_available, next_page
(on pdf_read_all) and matches_omitted (on section search).
Untrusted-content security preamble on every MCP tool that returns
PDF-derived text/OCR/section content, visible to non-Claude-Code
clients via the tool description field.
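
To make the truncation fields concrete, here is a minimal sketch of a byte cap applied on page boundaries. It is not pdf-mcp's implementation. The function name cap_pages, the choice to count UTF-8 bytes, the rule that at least one page is always returned, and the exact meaning given to each field are all assumptions made for illustration.

```python
def cap_pages(pages: list[str], max_bytes: int, start_page: int = 1) -> dict:
    """Concatenate whole pages until the UTF-8 byte budget is spent."""
    # Assumption: the cap is measured in UTF-8 bytes and applied on page
    # boundaries; the real tool may count or cut differently.
    remaining = pages[start_page - 1:]
    bytes_available = sum(len(p.encode("utf-8")) for p in remaining)
    kept: list[str] = []
    used = 0
    for text in remaining:
        size = len(text.encode("utf-8"))
        # Always keep the first page so a resume loop makes progress,
        # even if that single page is larger than the budget.
        if kept and used + size > max_bytes:
            break
        kept.append(text)
        used += size
    truncated = len(kept) < len(remaining)
    return {
        "full_text": "".join(kept),
        "truncated": truncated,
        "bytes_returned": used,
        "bytes_available": bytes_available,
        # 1-based page to resume from, or None when nothing was dropped.
        "next_page": start_page + len(kept) if truncated else None,
    }


result = cap_pages(["aaaa", "bbbb", "cccc"], max_bytes=8)
print(result["truncated"], result["next_page"])
```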

Security

url_fetcher now rejects non-PDF content-types (text/*,
application/json, image/audio/video, etc.) before buffering bytes.
Expanded IPv6 SSRF deny list: ::ffff:0:0/96 (IPv4-mapped),
64:ff9b::/96, 100::/64, 2001:db8::/32, fd00:ec2::254/128
(AWS IMDS over IPv6), and ::/128 (unspecified). IPv4-mapped IPv6
addresses are unwrapped and re-tested against the IPv4 deny list.
url_fetcher now pins the DNS-resolved IP per redirect hop,
closing the TOCTOU gap between SSRF validation and TCP connect.
Cache directory is now chmod 0o700 after creation (defense-in-depth).
pdf-mcp's supported deployment is single-user, so this does not patch
an in-scope threat; it tightens permissions to match images/ and
renders/, which were already 0o700, and reduces blast radius if the
supported model ever expands.
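
The deny-list check described above can be sketched with the standard ipaddress module. The IPv6 ranges are the ones quoted in these notes. The IPv4 list is an assumed, partial stand-in (loopback, RFC 1918, link-local), and is_blocked is a hypothetical name, not url_fetcher's actual API.

```python
import ipaddress

# IPv6 ranges quoted from the notes above.
IPV6_DENY = [ipaddress.ip_network(n) for n in (
    "::ffff:0:0/96", "64:ff9b::/96", "100::/64",
    "2001:db8::/32", "fd00:ec2::254/128", "::/128",
)]
# Assumption: a partial IPv4 list (loopback, RFC 1918, link-local).
IPV4_DENY = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "169.254.0.0/16",
)]


def is_blocked(host_ip: str) -> bool:
    """Return True if a resolved IP falls on either deny list."""
    addr = ipaddress.ip_address(host_ip)
    if isinstance(addr, ipaddress.IPv4Address):
        return any(addr in net for net in IPV4_DENY)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) and re-test it as IPv4.
    mapped = addr.ipv4_mapped
    if mapped is not None and any(mapped in net for net in IPV4_DENY):
        return True
    # In this sketch the ::ffff:0:0/96 entry blocks every mapped address
    # anyway; the unwrap step is kept because the notes describe it.
    return any(addr in net for net in IPV6_DENY)


print(is_blocked("::ffff:10.0.0.1"), is_blocked("8.8.8.8"))
```

A check like this only helps if the connection is then made to the same IP that was validated, which is what the per-hop DNS pinning above addresses.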

Changed

PDF_MCP_CACHE_DIR and PDF_MCP_CACHE_TTL environment variables
are now honored at server startup (previously declared in the MCP
registry manifest but not wired into the Python code). PDF_MCP_CACHE_TTL
must parse as an integer in [0, 8760] hours (up to one year); bad
values fail loudly at startup rather than silently falling back to
the default.
pdf_read_all now accepts start_page: int (default 1) and
echoes the post-clamp value in the response. The pre-existing
next_page field in the response is now consumable: pass it back
as start_page to resume the read on a clean page boundary.
Previously next_page named a continuation cursor the tool had
no parameter to accept, forcing callers to fall back to
pdf_read_pages for the resume. A regression test enforces the
invariant that iterating start_page=next_page covers every page
exactly once.
The MCP initialize handshake now reports pdf-mcp's __version__
as serverInfo.version. Previously the field carried FastMCP's
framework version (e.g. 3.2.4) because no explicit version=
was passed to FastMCP(...), so MCP clients could not tell
pdf-mcp releases apart from the handshake alone.
SSRF rejection now surfaces a self-describing error
("URL host resolves to a blocked IP on the SSRF deny list (loopback /
RFC 1918 / link-local / IMDS / IPv6 ULA): …") instead of the previous
generic "URL does not point to a valid PDF file" wrapper, so security
blocks are no longer indistinguishable from format problems or
filesystem 404s.
URLFetcher.is_url now recognises http:// URLs as well as
https://, routing them through the validator so callers get a clear
"Only HTTPS URLs are supported" error rather than the misleading
"PDF file not found" path-resolution error.
pdf_search section-mode docstring clarifies that matches_omitted
counts byte-cap drops only — drops caused by a low max_results are
not counted there (re-query with a higher max_results to see them).
pdf_info docstring clarifies that the toc field is gated by
toc_entry_count <= 50, independent of the detail flag (which only
controls per-page text_coverage arrays).
pdf_search @mcp.tool description corrected from "keyword,
semantic, or hybrid (RRF) modes" to "keyword, semantic, or auto
(hybrid RRF) modes": the public mode name is auto, and hybrid is
rejected. The runtime has always accepted only auto/keyword/semantic;
the description was wrong, so a caller reading it would try
mode="hybrid" and get an inline error.
pdf_search and pdf_info tool descriptions now carry the
matches_omitted byte-cap-only semantics and the toc ≤50 gating
note. Previously these clarifications lived only in function
docstrings, which FastMCP does not surface as description= on the
wire — so LLM callers couldn't see them.
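
The start_page / next_page resume contract for pdf_read_all described above can be exercised with a loop like the following sketch. The read_all callable stands in for however the client invokes the tool. fake_read_all, the pages key, and the assumption that next_page is None on the final chunk are hypothetical and exist only to make the example runnable.

```python
from typing import Callable, Iterator


def iter_chunks(read_all: Callable[..., dict], path: str) -> Iterator[dict]:
    """Yield successive pdf_read_all-style responses until none remain."""
    start_page = 1
    while start_page is not None:
        response = read_all(path=path, start_page=start_page)
        yield response
        # Assumption: next_page is absent or None on the final chunk.
        next_page = response.get("next_page")
        if next_page is not None and next_page <= start_page:
            raise RuntimeError("next_page did not advance")
        start_page = next_page


def fake_read_all(path: str, start_page: int = 1) -> dict:
    # Stand-in for the real tool: a 5-page document, 2 pages per chunk.
    last = min(start_page + 1, 5)
    return {
        "start_page": start_page,
        "pages": list(range(start_page, last + 1)),
        "next_page": last + 1 if last < 5 else None,
    }


seen = [p for chunk in iter_chunks(fake_read_all, "doc.pdf") for p in chunk["pages"]]
print(seen)
```

The guard against a non-advancing next_page is a defensive choice in this sketch, so that a malformed response cannot turn the loop into an infinite one.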

Documentation

Clarified [limits].max_response_bytes docstring: the cap bounds
the text content field (full_text on pdf_read_all; section
titles + overhead on section-mode pdf_search), not the wire-level
MCP TextContent envelope. The envelope adds ~300–500 bytes of
other response fields and JSON framing on top of the cap.

Installation

pip install pdf-mcp==1.13.1
