bug : Issues during image extraction #1863

pubpub-zz · 2023-05-26T18:43:34Z

Environment

Mac OS 12.6.6

$ python3 -c "import pypdf;print(pypdf.__version__)"
3.9.0

Code + PDF

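from pypdf import PdfReader

pdf_file = "input.pdf"  # placeholder: path to the PDF under test
reader = PdfReader(pdf_file)
for index, page in enumerate(reader.pages):
    # Print "<page index>-<image name>" for each image on the page
    for img in page.images:
        print(f"{index}-{img.name}")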

Tested with the following pdf: https://ufile.io/o1whh9b3
The error seems to occur on page 34.
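To confirm which page fails without stopping at the first error, the loop can be wrapped so that each page's failure is recorded and the scan continues. This is only a minimal sketch: `find_failing_pages` is a hypothetical helper name, and it assumes the failure surfaces as a `ValueError` while iterating `page.images`, as in the traceback reported here.

```python
from typing import Iterable, List, Tuple


def find_failing_pages(pages: Iterable) -> List[Tuple[int, str]]:
    """Return (page_index, error_message) for each page whose images fail to load.

    `pages` is any iterable of objects exposing an `images` attribute,
    for example `PdfReader(path).pages` from pypdf.
    """
    failures = []
    for index, page in enumerate(pages):
        try:
            # The try block covers both the access to page.images and the
            # iteration, so it does not matter where decoding happens.
            for img in page.images:
                print(f"{index}-{img.name}")
        except ValueError as exc:
            # Record the failure and move on to the next page.
            failures.append((index, str(exc)))
    return failures


# Possible usage with pypdf (the path is a placeholder):
#     from pypdf import PdfReader
#     print(find_failing_pages(PdfReader("input.pdf").pages))
```

Indexes are zero-based, as in the original loop, so the page reported above may appear as either 33 or 34 depending on how it was counted.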

Traceback

Traceback (most recent call last):
  File "/Users/eric/Downloads/pdf_extract_images.py", line 60, in scan_directory
    for img in page.images:
                      ^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pypdf/_page.py", line 463, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pypdf/filters.py", line 707, in _xobj_to_image
    alpha = Image.frombytes("L", size, x_object_obj[G.S_MASK].get_data())
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/PIL/Image.py", line 2970, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/PIL/Image.py", line 826, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
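The last frames show Pillow rejecting the soft-mask bytes passed to `Image.frombytes("L", size, ...)`. Mode "L" stores one byte per pixel, so the raw decoder needs at least width × height bytes for the given `size`. The sketch below only illustrates that size check. `missing_bytes_for_l` is a hypothetical helper, it calls neither pypdf nor Pillow, and it makes no claim about why the mask in this PDF is short.

```python
from typing import Tuple


def missing_bytes_for_l(size: Tuple[int, int], data: bytes) -> int:
    """Return how many bytes `data` is short of an 8-bit grayscale image of `size`."""
    width, height = size
    expected = width * height  # mode "L": one byte per pixel
    return max(0, expected - len(data))


# A 4x4 mask needs 16 bytes; supplying only 8 leaves it 8 bytes short.
print(missing_bytes_for_l((4, 4), b"\x00" * 8))
```

A non-zero result for the mask stream would be consistent with the `ValueError` above, for instance if the mask's dimensions or bit depth differ from those of the image it belongs to (an assumption, not verified against this file).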

Originally posted by @ericgonzadev in #1814 (comment)

pubpub-zz · 2023-05-26T18:45:04Z

For testing: o1whh9b3

pubpub-zz · 2023-05-26T18:45:45Z

@ericgonzadev
issue created
fixed in #1834

closes py-pdf#1863

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue May 26, 2023

fix issues from py-pdf#1863

34a1f31

closes py-pdf#1863

pubpub-zz mentioned this issue May 26, 2023

BUG: Fix RGB FlateEncode Images(PNG) and transparency #1834

Merged

pubpub-zz changed the title from "@pubpub-zz" to "bug : Issues during image extraction" May 26, 2023

pubpub-zz added labels May 26, 2023:
is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF)
workflow-images (From a user's perspective, image handling is the affected feature/workflow)

MartinThoma closed this as completed in #1834 Jun 18, 2023

MartinThoma closed this as completed in 68e2cf0 Jun 18, 2023
