
Unable to Extract Images from PDF with PyMuPDF and PyMuPDF4llm #4342

@HanXiaoyou


Description of the bug

Both PyMuPDF and PyMuPDF4llm fail to extract images from one specific PDF, even though the document visibly contains images. No error is raised during extraction; the images are simply not returned.

Versions:

PyMuPDF: 1.25.1
PyMuPDF4llm: 0.0.17
Additional Info:

PDF is not encrypted.
Tried different extraction methods, but the issue persists.
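The report does not say which extraction methods were tried. For reference, a minimal PyMuPDF reproduction might look like the sketch below. It assumes the common `Page.get_images()` / `Document.extract_image()` route; the function name `extract_raster_images`, the `out_dir` parameter and the output file naming are illustrative only.

```python
import os


def extract_raster_images(pdf_path: str, out_dir: str = "extracted") -> int:
    """Save every xref-referenced raster image in the PDF; return how many were written."""
    # Recent PyMuPDF releases (including 1.25.x) accept this import name;
    # older code uses "import fitz" instead.
    import pymupdf

    os.makedirs(out_dir, exist_ok=True)
    saved = 0
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            # full=True returns tuples whose first element is the image's xref
            images = page.get_images(full=True)
            print(f"page {page.number}: {len(images)} image reference(s)")
            for info in images:
                xref = info[0]
                # dict with "image" (raw bytes) and "ext" (file extension) keys
                data = doc.extract_image(xref)
                if not data:
                    continue  # this xref could not be extracted as an image
                name = os.path.join(out_dir, f"p{page.number}-x{xref}.{data['ext']}")
                with open(name, "wb") as f:
                    f.write(data["image"])
                saved += 1
    return saved
```

Calling `extract_raster_images("a.pdf")` prints the number of image references found on each page; a count of 0 on every page would match the behaviour described in this issue.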

How to reproduce the bug

Sample file: a.pdf
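For the PyMuPDF4llm side, a comparable check (hypothetical, not taken from the report) converts the attached file with image writing enabled. It assumes the installed release accepts the `write_images`, `image_path` and `image_size_limit` keywords of `pymupdf4llm.to_markdown()`; `markdown_with_images` is an illustrative name. Per the PyMuPDF4LLM documentation, `image_size_limit` (default 0.05) skips images that are small relative to the page, so lowering it rules out that filter as the cause.

```python
import os


def markdown_with_images(pdf_path: str, image_dir: str = "images") -> list:
    """Convert a PDF to Markdown, writing images to image_dir; return the files written."""
    import pymupdf4llm

    os.makedirs(image_dir, exist_ok=True)
    md = pymupdf4llm.to_markdown(
        pdf_path,
        write_images=True,      # save images to disk and reference them in the Markdown
        image_path=image_dir,   # folder that receives the saved image files
        # Assumption: a smaller limit than the default keeps small images
        # that would otherwise be skipped by the size filter.
        image_size_limit=0.01,
    )
    print(md[:500])  # first part of the Markdown, to see whether image links appear
    return sorted(os.listdir(image_dir))
```

If `markdown_with_images("a.pdf")` returns an empty list while the raw PyMuPDF check also finds no image references, the two libraries agree and the problem is more likely in how the PDF stores its pictures than in PyMuPDF4llm itself.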

PyMuPDF version

1.25.1

Operating system

Windows

Python version

3.9
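One way to narrow this down, offered as an assumption rather than a confirmed diagnosis, is to check what kind of content the pages actually hold. `Page.get_images()` only lists images referenced by xref, so inline images and vector drawings never show up there; `Page.get_image_info()` also reports inline images, and `Page.get_drawings()` counts vector paths. The helper name `diagnose_page` and its output naming are hypothetical.

```python
import os


def diagnose_page(pdf_path: str, page_no: int = 0, out_dir: str = ".", dpi: int = 150) -> dict:
    """Count xref images, all image placements and vector paths on one page, then render it."""
    import pymupdf

    with pymupdf.open(pdf_path) as doc:
        page = doc[page_no]
        result = {
            # images referenced through xrefs (what get_images / extract_image see)
            "xref_images": len(page.get_images(full=True)),
            # every image shown on the page, inline images included
            "image_placements": len(page.get_image_info()),
            # vector paths: lines, curves and fills drawn with PDF operators
            "vector_paths": len(page.get_drawings()),
        }
        # Rendering captures the visible content regardless of how it is stored.
        out = os.path.join(out_dir, f"page-{page_no}.png")
        page.get_pixmap(dpi=dpi).save(out)
        result["rendered"] = out
    return result
```

If `xref_images` is 0 but `vector_paths` is large, the visible figures are probably drawn as vector graphics; rendering the page (or a region via `get_pixmap(clip=...)`) is then the practical way to obtain them as images.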

Metadata


Assignees

No one assigned

Labels

not a bug / user error / unable to reproduce
