pymupdf.open() processes .zip file without raising #4700

@JohnBurantSFI

Description of the bug

Attempts to open a .zip file using pymupdf.open() succeed, leading to unexpected results.

How to reproduce the bug

To reproduce, use this code:

import pymupdf
import tempfile

zipfile_content = b'PK\x03\x04\n\x00\x00\x00\x00\x00\x19U0[\xf40\x8b&\x1b\x00\x00\x00\x1b\x00\x00\x00\x08\x00\x1c\x00textfileUT\t\x00\x03\x92"\xc9h\x94"\xc9hux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00This is a plain text file.\nPK\x01\x02\x1e\x03\n\x00\x00\x00\x00\x00\x19U0[\xf40\x8b&\x1b\x00\x00\x00\x1b\x00\x00\x00\x08\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81\x00\x00\x00\x00textfileUT\x05\x00\x03\x92"\xc9hux\x0b\x00\x01\x04\xf5\x01\x00\x00\x04\x14\x00\x00\x00PK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00N\x00\x00\x00]\x00\x00\x00\x00\x00'

tmpfile = tempfile.NamedTemporaryFile(suffix='.zip', delete=True)
with open(tmpfile.name, 'wb') as f:
    f.write(zipfile_content)

with pymupdf.open(tmpfile.name) as doc:
    print(f"doc.page_count={doc.page_count}")

This (very short!) .zip file contains one plain text file.

The code executes cleanly and prints "doc.page_count=0".

Expectation: PyMuPDF would recognize that the file content is not a PDF and raise an exception.

PyMuPDF does fail in this example when called as pymupdf.open(tmpfile, filetype='pdf'). But:

(1) I'd expect it to fail even without filetype being specified.
(2) With longer .zip files it succeeds even when filetype='pdf' is specified. In the instance I tried, it reported 120 PDF pages in the .zip file. To be clear, there was no PDF content in that zip file. I can share it if needed, but I expect the behavior documented here to be problematic enough to merit a fix.
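
Until this is addressed, a caller-side check of the leading bytes is one possible workaround. The following is a minimal sketch and not part of PyMuPDF: the names sniff_bytes and sniff_file are hypothetical, and it assumes a PDF begins exactly with "%PDF-", which is stricter than what many readers accept.

```python
# Hypothetical helpers (not PyMuPDF API): classify a file by its magic bytes.
PDF_MAGIC = b"%PDF-"
# Local file header, empty-archive end record, and spanned-archive marker.
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def sniff_bytes(head: bytes) -> str:
    """Return 'pdf', 'zip', or 'unknown' for the leading bytes of a file."""
    # Assumption: the PDF header sits at offset 0; PDFs with leading
    # junk before the header would be reported as 'unknown' here.
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGICS):
        return "zip"
    return "unknown"


def sniff_file(path, probe_size: int = 8) -> str:
    """Read the first few bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(probe_size))
```

A caller could run sniff_file(tmpfile.name) before pymupdf.open(tmpfile.name) and raise its own error unless the result is 'pdf'. This only guards the calling code; it does not change how PyMuPDF itself detects file types.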

PyMuPDF version

1.26.4

Operating system

MacOS

Python version

3.13
