v1.13.1

@jztan released this 21 May 11:49
· 244 commits to develop since this release

What's New in v1.13.1

Changed

  • Release automation (scripts/release.py) reordered: the GitHub
    release is now created only after the publish-pypi.yml workflow
    reports success and the version is live on PyPI, so the pip install
    line in the release notes is never a lie. A new preflight step runs
    the same pip-audit invocation CI uses, so a vulnerability that
    would block publishing is caught locally before the tag is pushed.
    When the publish step does fail post-tag, the script now prints the
    exact recovery steps (rerun the workflow vs. burn the version)
    instead of exiting silently.
  • The pip-audit ignore list now lives in a single
    scripts/audit.sh, invoked by ci.yml,
    dependency-review.yml, publish-pypi.yml, and the release
    preflight, so the four call sites cannot drift.

Security

  • Bumped idna 3.11 → 3.15 to clear CVE-2026-45409.
  • Added PYSEC-2025-183 (pyjwt, transitive via mcp, no upstream
    fix yet) to the pip-audit ignore list, alongside the existing
    CVE-2026-4539 (pygments, dev-only) and CVE-2026-3219 (pip,
    build-time only) entries.

BREAKING

  • pdf_read_pages response shape: per-image dicts now carry image_id
    (content-addressed basename) instead of path (absolute filesystem
    path), and the per-page render_path field is replaced by
    render_id. Rationale: API hygiene. The previous path field
    embedded the current cache directory, so the value was unstable
    across runs and across PDF_MCP_CACHE_DIR changes; the new IDs are
    stable opaque tokens. Callers that need bytes resolve the ID against
    images_dir / renders_dir from pdf_cache_stats, or call
    pdf_render_pages (which inlines PNG content blocks for vision
    models). No compatibility shim, since these keys have never
    appeared in a released version.
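A minimal sketch of the resolution step described above, assuming image_id is a plain file basename that can be joined onto images_dir. The helper name resolve_cached_file and the sample values are hypothetical; only the field names image_id, images_dir, and renders_dir come from these notes.

```python
from pathlib import Path


def resolve_cached_file(base_dir: str, opaque_id: str) -> Path:
    """Join an opaque ID onto a cache directory, refusing directory components."""
    # The notes describe image_id as a content-addressed basename, so this
    # sketch treats anything containing a path separator or ".." as invalid.
    if not opaque_id or opaque_id in {".", ".."} or Path(opaque_id).name != opaque_id:
        raise ValueError(f"not a plain basename: {opaque_id!r}")
    return Path(base_dir) / opaque_id


# Hypothetical values shaped like the fields named in the notes.
stats = {"images_dir": "/tmp/pdf-mcp/images", "renders_dir": "/tmp/pdf-mcp/renders"}
image = {"image_id": "3f2a9c.png"}

image_path = resolve_cached_file(stats["images_dir"], image["image_id"])
print(image_path)
```

Rejecting anything that is not a bare basename keeps a malformed ID from escaping the cache directory; whether pdf-mcp itself performs such a check is not stated here.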

Added

  • pdf_cache_stats response now includes images_dir and
    renders_dir so callers can resolve the opaque image_id /
    render_id returned by pdf_read_pages to disk paths when they
    need to read bytes directly. The tool description marks
    pdf_cache_stats as cache diagnostics.
  • [limits].max_response_bytes config option (default 200 KB, max 2 MB)
    capping pdf_read_all and section-granularity pdf_search response
    payloads. New response fields: truncated, truncated_pages,
    truncated_bytes, bytes_returned, bytes_available, next_page
    (on pdf_read_all) and matches_omitted (on section search).
  • Untrusted-content security preamble on every MCP tool that returns
    PDF-derived text/OCR/section content, visible to non-Claude-Code
    clients via the tool description field.

Security

  • url_fetcher now rejects non-PDF content-types (text/*,
    application/json, image/audio/video, etc.) before buffering bytes.
  • Expanded IPv6 SSRF deny list: ::ffff:0:0/96 (IPv4-mapped),
    64:ff9b::/96, 100::/64, 2001:db8::/32, fd00:ec2::254/128
    (AWS IMDS over IPv6), and ::/128 (unspecified). IPv4-mapped IPv6
    addresses are unwrapped and re-tested against the IPv4 deny list.
  • url_fetcher now pins the DNS-resolved IP per redirect hop,
    closing the TOCTOU gap between SSRF validation and TCP connect.
  • Cache directory is now chmod 0o700 after creation (defense-in-
    depth). pdf-mcp's supported deployment is single-user, so this
    does not patch an in-scope threat — it tightens permissions to
    match images/ and renders/ which were already 0o700, and
    reduces blast radius if the supported model ever expands.
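The deny-list check above can be illustrated with Python's standard ipaddress module. This is a sketch under stated assumptions, not url_fetcher's actual code: the IPv6 ranges are copied from the notes, the IPv4 ranges are an illustrative subset, and the function name is_blocked and the order of the two checks are hypothetical.

```python
import ipaddress

# IPv6 ranges as listed in the release notes.
IPV6_DENY = [ipaddress.ip_network(n) for n in (
    "::ffff:0:0/96",      # IPv4-mapped
    "64:ff9b::/96",
    "100::/64",
    "2001:db8::/32",
    "fd00:ec2::254/128",  # AWS IMDS over IPv6
    "::/128",             # unspecified
)]

# Assumption: an illustrative IPv4 subset; the real IPv4 list is not given here.
IPV4_DENY = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "169.254.0.0/16",
)]


def is_blocked(host_ip: str) -> bool:
    """Return True if the address falls in either deny list."""
    addr = ipaddress.ip_address(host_ip)
    if isinstance(addr, ipaddress.IPv4Address):
        return any(addr in net for net in IPV4_DENY)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) and re-test it as IPv4.
    mapped = addr.ipv4_mapped
    if mapped is not None and any(mapped in net for net in IPV4_DENY):
        return True
    return any(addr in net for net in IPV6_DENY)


for candidate in ("::ffff:127.0.0.1", "fd00:ec2::254", "2001:4860:4860::8888"):
    print(candidate, is_blocked(candidate))
```

With ::ffff:0:0/96 on the IPv6 list, the unwrap step acts as a second layer in this sketch: a mapped loopback address is caught by either check.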

Changed

  • PDF_MCP_CACHE_DIR and PDF_MCP_CACHE_TTL environment variables
    are now honored at server startup (previously declared in the MCP
    registry manifest but not wired into the Python code). CACHE_TTL
    must parse as an integer in [0, 8760] hours (up to one year) —
    bad values fail loud at startup rather than silently falling back
    to the default.
  • pdf_read_all now accepts start_page: int (default 1) and
    echoes the post-clamp value in the response. The pre-existing
    next_page field in the response is now consumable: pass it back
    as start_page to resume the read on a clean page boundary.
    Previously next_page named a continuation cursor the tool had
    no parameter to accept, forcing callers to fall back to
    pdf_read_pages for the resume. A regression test enforces the
    invariant that iterating start_page=next_page covers every page
    exactly once.
  • The MCP initialize handshake now reports pdf-mcp's __version__
    as serverInfo.version. Previously the field carried FastMCP's
    framework version (e.g. 3.2.4) because no explicit version=
    was passed to FastMCP(...), so MCP clients could not tell
    pdf-mcp releases apart from the handshake alone.
  • SSRF rejection now surfaces a self-describing error
    ("URL host resolves to a blocked IP on the SSRF deny list (loopback /
    RFC 1918 / link-local / IMDS / IPv6 ULA): …") instead of the previous
    generic "URL does not point to a valid PDF file" wrapper, so security
    blocks are no longer indistinguishable from format problems or
    filesystem 404s.
  • URLFetcher.is_url now recognises http:// URLs as well as
    https://, routing them through the validator so callers get a clear
    "Only HTTPS URLs are supported" error rather than the misleading
    "PDF file not found" path-resolution error.
  • pdf_search section-mode docstring clarifies that matches_omitted
    counts byte-cap drops only — drops caused by a low max_results are
    not counted there (re-query with a higher max_results to see them).
  • pdf_info docstring clarifies that the toc field is gated by
    toc_entry_count <= 50, independent of the detail flag (which only
    controls per-page text_coverage arrays).
  • pdf_search @mcp.tool description corrected from "keyword,
    semantic, or hybrid (RRF) modes" to "keyword, semantic, or auto
    (hybrid RRF) modes": the public mode name is auto, and hybrid is
    rejected. The runtime always accepted only auto/keyword/semantic;
    the description was wrong, so a caller reading the tool description
    would try mode="hybrid" and get an inline error.
  • pdf_search and pdf_info tool descriptions now carry the
    matches_omitted byte-cap-only semantics and the toc ≤50 gating
    note. Previously these clarifications lived only in function
    docstrings, which FastMCP does not surface as description= on the
    wire — so LLM callers couldn't see them.
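The start_page / next_page resume loop described above might look like the following on the client side. This is a sketch, not pdf-mcp's client API: call_tool stands in for whatever MCP client call is in use, fake_pdf_read_all is a hypothetical stub that returns two pages per call, and the assumption that next_page is absent or None on the final chunk is not confirmed by these notes.

```python
from typing import Callable, Iterator


def read_all_pages(call_tool: Callable[..., dict], path: str) -> Iterator[dict]:
    """Yield pdf_read_all responses, feeding next_page back as start_page."""
    start_page = 1
    while start_page is not None:
        response = call_tool("pdf_read_all", path=path, start_page=start_page)
        yield response
        next_page = response.get("next_page")  # assumed None/absent when done
        if next_page is not None and next_page <= start_page:
            raise RuntimeError("next_page did not advance")  # avoid looping forever
        start_page = next_page


def fake_pdf_read_all(tool: str, *, path: str, start_page: int = 1) -> dict:
    """Hypothetical stand-in for the server: a 5-page PDF, 2 pages per response."""
    total_pages, pages_per_call = 5, 2
    last = min(start_page + pages_per_call - 1, total_pages)
    return {
        "start_page": start_page,
        "full_text": " ".join(f"[page {n}]" for n in range(start_page, last + 1)),
        "truncated": last < total_pages,
        "next_page": last + 1 if last < total_pages else None,
    }


chunks = list(read_all_pages(fake_pdf_read_all, "report.pdf"))
text = " ".join(chunk["full_text"] for chunk in chunks)
print(len(chunks), text)
```

Against the stub, the loop issues three calls and sees each of the five pages exactly once, which is the invariant the regression test is said to enforce.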

Documentation

  • Clarified [limits].max_response_bytes docstring: the cap bounds
    the text content field (full_text on pdf_read_all; section
    titles + overhead on section-mode pdf_search), not the wire-
    level MCP TextContent envelope. The envelope adds ~300–500 bytes
    of other response fields and JSON framing on top of the cap.

Installation

pip install pdf-mcp==1.13.1

Links