get_images() returns 0 images for a page that has an image #4618
Description
Description of the bug
I have a test PDF file that appears to contain a single image. The entire page is a screenshot that I saved and then converted to a PDF. I don't remember how I did the conversion, since it was several years ago; the only ways I can think of are Word or Print to PDF from a browser.
I tested the file in two ways. First, I tried getting the text blocks from the page, and there are none.
Second, I wrote a function to detect whether a page contains images. When I ran it, it did not detect the image in this one PDF file.
So according to my code, the page contains no text and no images. The file is attached, and the code I used to detect images on the page is below.
import pymupdf

print(pymupdf.__doc__)  # show PyMuPDF / MuPDF version information

file_path = r"D:\SOFTWARE_DEVELOPMENT\_APPS\temp\debug_files_for_ocr\testpdf_image1.pdf"


def get_images(file_name: str) -> None:
    """Print the page area and the area of every image that get_images() reports."""
    total_page_area = 0.0
    doc = pymupdf.open(file_name)
    for page_index in range(len(doc)):  # iterate over pdf pages
        page = doc[page_index]  # get the page
        total_page_area = total_page_area + abs(page.bound())  # abs(Rect) is its area
        print("total page area: ", total_page_area)
        image_list = page.get_images(full=True)
        # print the number of images found on the page
        if image_list:
            print(f"Found {len(image_list)} images on page {page_index}")
            # Iterate through the images on the page
            for img in image_list:
                bbox = page.get_image_bbox(img)  # Get the bounding box of the image
                area = bbox.width * bbox.height  # Calculate the area of the image
                print("image area:", area)
        else:
            print("No images found on page", page_index)
    doc.close()


get_images(file_path)
How to reproduce the bug
Run the code provided above on the attached file. The output is:
total page area: 484704.0
No images found on page 0
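One possible cross-check, offered as a sketch and not as a confirmed diagnosis of this file: `page.get_images()` lists images referenced from the page's resources, while `page.get_image_info()` reports the images actually displayed on the page, which can include inline images. The helper names `image_coverage` and `report_displayed_images` below are hypothetical and not part of PyMuPDF; whether this finds the screenshot in the attached file is an assumption that has not been verified.

```python
def image_coverage(image_infos, page_area):
    """Fraction of the page covered by the given image bboxes, capped at 1.0.

    `image_infos` is a list of dicts with a "bbox" entry (x0, y0, x1, y1),
    the shape returned by page.get_image_info(). Overlapping images are
    counted more than once, which is why the result is capped.
    """
    if not page_area:
        return 0.0
    total = 0.0
    for info in image_infos:
        x0, y0, x1, y1 = info["bbox"]
        total += abs(x1 - x0) * abs(y1 - y0)
    return min(total / page_area, 1.0)


def report_displayed_images(file_name):
    """Hypothetical helper: print displayed-image count and coverage per page."""
    import pymupdf  # imported here so image_coverage() has no dependency

    with pymupdf.open(file_name) as doc:
        for page in doc:
            infos = page.get_image_info()  # images displayed on the page
            coverage = image_coverage(infos, abs(page.rect))
            print(f"page {page.number}: {len(infos)} displayed image(s), "
                  f"coverage {coverage:.0%}")
```

Calling `report_displayed_images(file_path)` on the attached file would show whether any displayed images are reported even though `get_images()` returns an empty list.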
PyMuPDF version
1.26.3
Operating system
Windows
Python version
3.13