What's New in v1.13.1
Changed
- Release automation (scripts/release.py) reordered: the GitHub
release is now created only after the publish-pypi.yml workflow
reports success and the version is live on PyPI, so the pip install
line in the release notes is never a lie. A new preflight step runs
the same pip-audit invocation CI uses, so a vulnerability that
would block publishing is caught locally before the tag is pushed.
When the publish step does fail post-tag, the script now prints the
exact recovery steps (rerun the workflow vs. burn the version)
instead of exiting silently.
- The pip-audit ignore list now lives in a single
scripts/audit.sh, invoked by ci.yml,
dependency-review.yml, publish-pypi.yml, and the release
preflight, so the four call sites cannot drift.
Security
- Bumped idna 3.11 → 3.15 to clear CVE-2026-45409.
- Added PYSEC-2025-183 (pyjwt, transitive via mcp, no upstream
fix yet) to the pip-audit ignore list, alongside the existing
CVE-2026-4539 (pygments, dev-only) and CVE-2026-3219 (pip,
build-time only) entries.
BREAKING
- pdf_read_pages response shape: per-image dicts now carry image_id
(content-addressed basename) instead of path (absolute filesystem
path), and the per-page render_path field is replaced by
render_id. Rationale: API hygiene. The previous path field
embedded the current cache directory, so the value was unstable
across runs and across PDF_MCP_CACHE_DIR changes; the new IDs are
stable opaque tokens. Callers that need bytes resolve the ID against
images_dir / renders_dir from pdf_cache_stats, or call
pdf_render_pages (which inlines PNG content blocks for vision
models). No compatibility shim, since these keys have never
appeared in a released version.
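A minimal sketch of the ID resolution described above. The helper name resolve_cached_file and the example ID are hypothetical; the directory argument is whatever pdf_cache_stats reports as images_dir or renders_dir.

```python
from pathlib import Path


def resolve_cached_file(cache_subdir: str, file_id: str) -> Path:
    """Join an opaque image_id / render_id onto its cache directory.

    cache_subdir is the images_dir or renders_dir value taken from a
    pdf_cache_stats response. This helper is illustrative only and is
    not part of pdf-mcp.
    """
    # The IDs are described as basenames, so refuse anything that looks
    # like a path before joining it onto the directory.
    if file_id in ("", ".", "..") or Path(file_id).name != file_id:
        raise ValueError(f"not a bare basename: {file_id!r}")
    return Path(cache_subdir) / file_id
```

Because the IDs are opaque, the caller should not parse them; joining and reading bytes is the only operation this sketch assumes.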
Added
- pdf_cache_stats response now includes images_dir and
renders_dir so callers can resolve the opaque image_id /
render_id returned by pdf_read_pages to disk paths when they
need to read bytes directly. The tool description marks
pdf_cache_stats as cache diagnostics.
- [limits].max_response_bytes config option (default 200 KB, max 2 MB)
capping pdf_read_all and section-granularity pdf_search response
payloads. New response fields: truncated, truncated_pages,
truncated_bytes, bytes_returned, bytes_available, next_page
(on pdf_read_all) and matches_omitted (on section search).
- Untrusted-content security preamble on every MCP tool that returns
PDF-derived text/OCR/section content, visible to non-Claude-Code
clients via the tool description field.
Security
- url_fetcher now rejects non-PDF content-types (text/*,
application/json, image/audio/video, etc.) before buffering bytes.
- Expanded IPv6 SSRF deny list: ::ffff:0:0/96 (IPv4-mapped),
64:ff9b::/96, 100::/64, 2001:db8::/32, fd00:ec2::254/128
(AWS IMDS over IPv6), and ::/128 (unspecified). IPv4-mapped IPv6
addresses are unwrapped and re-tested against the IPv4 deny list.
- url_fetcher now pins the DNS-resolved IP per redirect hop,
closing the TOCTOU gap between SSRF validation and TCP connect.
- Cache directory is now chmod 0o700 after creation
(defense-in-depth). pdf-mcp's supported deployment is single-user,
so this does not patch an in-scope threat; it tightens permissions
to match images/ and renders/, which were already 0o700, and
reduces blast radius if the supported model ever expands.
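The IPv6 deny-list check with IPv4-mapped unwrapping can be sketched with the standard ipaddress module. This is an illustration, not url_fetcher's code: the function name is_blocked, the contents of the IPv4 list, and the choice to unwrap before testing (which is why ::ffff:0:0/96 is not listed as a network here) are all assumptions.

```python
import ipaddress

# IPv6 ranges named in the release notes. The IPv4-mapped range
# (::ffff:0:0/96) is handled by the unwrap step below in this sketch.
IPV6_DENY = [ipaddress.ip_network(n) for n in (
    "64:ff9b::/96",
    "100::/64",
    "2001:db8::/32",
    "fd00:ec2::254/128",  # AWS IMDS over IPv6
    "::/128",             # unspecified address
)]

# Assumed IPv4 deny list (loopback, RFC 1918, link-local); the notes
# do not enumerate the real one.
IPV4_DENY = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8",
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
    "169.254.0.0/16",
)]


def is_blocked(host_ip: str) -> bool:
    """Return True if the resolved address falls in a denied range."""
    ip = ipaddress.ip_address(host_ip)
    # Unwrap ::ffff:a.b.c.d so it is judged by the IPv4 rules.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    nets = IPV4_DENY if ip.version == 4 else IPV6_DENY
    return any(ip in net for net in nets)
```

Without the unwrap step, an address such as ::ffff:127.0.0.1 would be compared only against IPv6 networks and could slip past a loopback rule written for IPv4.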
Changed
- PDF_MCP_CACHE_DIR and PDF_MCP_CACHE_TTL environment variables
are now honored at server startup (previously declared in the MCP
registry manifest but not wired into the Python code).
PDF_MCP_CACHE_TTL must parse as an integer number of hours in
[0, 8760] (up to one year); bad values fail loudly at startup
rather than silently falling back to the default.
- pdf_read_all now accepts start_page: int (default 1) and
echoes the post-clamp value in the response. The pre-existing
next_page field in the response is now consumable: pass it back
as start_page to resume the read on a clean page boundary.
Previously next_page named a continuation cursor the tool had
no parameter to accept, forcing callers to fall back to
pdf_read_pages for the resume. A regression test enforces the
invariant that iterating start_page=next_page covers every page
exactly once.
- The MCP initialize handshake now reports pdf-mcp's __version__
as serverInfo.version. Previously the field carried FastMCP's
framework version (e.g. 3.2.4) because no explicit version=
was passed to FastMCP(...), so MCP clients could not tell
pdf-mcp releases apart from the handshake alone.
- SSRF rejection now surfaces a self-describing error
("URL host resolves to a blocked IP on the SSRF deny list (loopback /
RFC 1918 / link-local / IMDS / IPv6 ULA): …") instead of the previous
generic "URL does not point to a valid PDF file" wrapper, so security
blocks are no longer indistinguishable from format problems or
filesystem 404s.
- URLFetcher.is_url now recognises http:// URLs as well as
https://, routing them through the validator so callers get a clear
"Only HTTPS URLs are supported" error rather than the misleading
"PDF file not found" path-resolution error.
- pdf_search section-mode docstring clarifies that matches_omitted
counts byte-cap drops only. Drops caused by a low max_results are
not counted there (re-query with a higher max_results to see them).
- pdf_info docstring clarifies that the toc field is gated by
toc_entry_count <= 50, independent of the detail flag (which only
controls per-page text_coverage arrays).
- pdf_search @mcp.tool description corrected from "keyword,
semantic, or hybrid (RRF) modes" to "keyword, semantic, or auto
(hybrid RRF) modes". The public mode name is auto; hybrid is
rejected. The runtime always accepted only auto/keyword/semantic,
but the description was wrong, so a caller reading it would try
mode="hybrid" and get an inline error.
- pdf_search and pdf_info tool descriptions now carry the
matches_omitted byte-cap-only semantics and the toc ≤50 gating
note. Previously these clarifications lived only in function
docstrings, which FastMCP does not surface as description= on the
wire, so LLM callers couldn't see them.
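The start_page / next_page resume loop for pdf_read_all described above might look like the following from a caller's side. The call_read_all callable, its path argument, and the fake_read_all stand-in are hypothetical; only the start_page parameter and the truncated, next_page, and full_text response fields come from these notes.

```python
from typing import Callable, Iterator, Optional


def read_all_chunks(call_read_all: Callable[..., dict], path: str) -> Iterator[dict]:
    """Yield successive pdf_read_all responses until nothing is left."""
    start_page = 1
    while True:
        resp = call_read_all(path=path, start_page=start_page)
        yield resp
        next_page: Optional[int] = resp.get("next_page")
        # Stop when the response was not truncated or has no cursor.
        if not resp.get("truncated") or next_page is None:
            break
        start_page = next_page


def fake_read_all(path: str, start_page: int = 1) -> dict:
    """Stand-in for the real tool: 5 pages, at most 2 per response."""
    pages = ["p1", "p2", "p3", "p4", "p5"]
    end = min(start_page - 1 + 2, len(pages))
    truncated = end < len(pages)
    return {
        "full_text": "".join(pages[start_page - 1:end]),
        "start_page": start_page,
        "truncated": truncated,
        "next_page": end + 1 if truncated else None,
    }


text = "".join(r["full_text"] for r in read_all_chunks(fake_read_all, "doc.pdf"))
```

With the stand-in, the loop makes three calls (start_page 1, 3, 5) and each page appears exactly once in text, which mirrors the invariant the regression test enforces.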
Documentation
- Clarified [limits].max_response_bytes docstring: the cap bounds
the text content field (full_text on pdf_read_all; section
titles + overhead on section-mode pdf_search), not the wire-level
MCP TextContent envelope. The envelope adds ~300–500 bytes
of other response fields and JSON framing on top of the cap.
Installation
pip install pdf-mcp==1.13.1