get_images() returns 0 images for a page that has an image #4618

@erotavlas

Description of the bug

I have a test PDF file that appears to contain one image. The entire page is a screenshot that I saved and then converted to a PDF; I don't remember how, since it was several years ago. The only ways I can think of are Word or Print to PDF from a browser.

In any case, I tested it two ways. First, I tried getting the text blocks from the page, and there are none.

Second, I wrote a function to detect whether a page contains images, and it does not detect the image in this one PDF file.

So according to my code, the page contains neither text nor images. The file is attached, and the code I used to detect images on the page is below.


import pymupdf

print(pymupdf.__doc__)  # show the PyMuPDF version in use
file_path = r"D:\SOFTWARE_DEVELOPMENT\_APPS\temp\debug_files_for_ocr\testpdf_image1.pdf"


def get_images(file_name: str) -> None:
    """Print the cumulative page area and the area of every image found on each page."""
    total_page_area = 0.0

    doc = pymupdf.open(file_name)

    for page_index in range(len(doc)):  # iterate over PDF pages
        page = doc[page_index]  # get the page
        total_page_area += abs(page.bound())  # abs() of a Rect is its area
        print("total page area: ", total_page_area)
        image_list = page.get_images(full=True)

        # report the number of images found on the page
        if image_list:
            print(f"Found {len(image_list)} images on page {page_index}")
            for img in image_list:
                bbox = page.get_image_bbox(img)  # bounding box of the image on the page
                area = bbox.width * bbox.height  # area of the image
                print("image area:", area)
        else:
            print("No images found on page", page_index)

    doc.close()


get_images(file_path)


How to reproduce the bug

Run the code above on the attached file. The output is:

total page area:  484704.0
No images found on page 0
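
One way to narrow this down is to ask PyMuPDF what it sees on the page through more than one route. As far as I understand the documentation, `page.get_images()` lists images referenced by the page's resources, while `page.get_image_info()` describes the images actually displayed, and `page.get_drawings()` returns vector paths. The sketch below counts each kind. The helper names `summarize_page` and `summarize_file` are made up for illustration, and whether any of these counts explains the attached file is an assumption that has not been checked against it.

```python
def summarize_page(page) -> dict:
    """Count the kinds of content PyMuPDF reports for one page.

    `page` is expected to be a pymupdf.Page (hypothetical helper, not part of PyMuPDF).
    """
    blocks = page.get_text("dict")["blocks"]
    return {
        # block type 0 is text, block type 1 is image
        "text_blocks": sum(1 for b in blocks if b["type"] == 0),
        "image_blocks": sum(1 for b in blocks if b["type"] == 1),
        # images referenced by the page's resources
        "referenced_images": len(page.get_images(full=True)),
        # images displayed by the page
        "displayed_images": len(page.get_image_info()),
        # vector graphics (lines, rectangles, curves)
        "vector_paths": len(page.get_drawings()),
    }


def summarize_file(file_name: str) -> list:
    """Return one summary dict per page of the given file."""
    import pymupdf  # imported here so summarize_page stays usable on its own

    with pymupdf.open(file_name) as doc:
        return [summarize_page(page) for page in doc]
```

Calling `summarize_file(file_path)` with the path from the script above returns one dictionary per page. If `displayed_images` or `image_blocks` is non-zero while `referenced_images` is zero, the picture is reaching the page by a route that `get_images()` does not list; if only `vector_paths` is non-zero, the page may not contain a raster image at all.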

testpdf_image1.pdf

PyMuPDF version

1.26.3

Operating system

Windows

Python version

3.13

Metadata

Labels: invalid (something is wrong here)