bug : Issues during image extraction #1863

Closed
pubpub-zz opened this issue May 26, 2023 · 2 comments · Fixed by #1834
Labels
is-bug: From a user's perspective, this is a bug (a violation of the expected behavior with a compliant PDF)
workflow-images: From a user's perspective, image handling is the affected feature/workflow

Comments

@pubpub-zz
Collaborator

from @ericgonzadev

Environment

Mac OS 12.6.6

$ python3 -c "import pypdf;print(pypdf.__version__)"
3.9.0

Code + PDF

from pypdf import PdfReader

# pdf_file: path to the PDF linked below
reader = PdfReader(pdf_file)
for index, page in enumerate(reader.pages):
    for img in page.images:
        print(f"{index}-{img.name}")

Tested with the following pdf: https://ufile.io/o1whh9b3
The error seems to occur on page 34.
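
To pin down which pages trigger the error, the loop can be wrapped so that a failure on one page does not stop the scan. This is only a sketch: `find_failing_pages` is a hypothetical helper name, not part of pypdf, and it assumes the failure surfaces as a `ValueError` while `page.images` is being read, as in the traceback in this report.

```python
from typing import Any, Iterable, List, Tuple


def find_failing_pages(pages: Iterable[Any]) -> List[Tuple[int, str]]:
    """Return (page_index, error_message) for each page whose images fail to load.

    `pages` is any iterable of objects exposing an `images` attribute,
    for example `PdfReader(...).pages` from pypdf.
    """
    failures: List[Tuple[int, str]] = []
    for index, page in enumerate(pages):
        try:
            # Reading and iterating page.images is what triggers image
            # decoding, so the access itself must sit inside the try block.
            for _ in page.images:
                pass
        except ValueError as exc:
            # Record the zero-based index and keep scanning the other pages.
            failures.append((index, str(exc)))
    return failures
```

With pypdf installed, calling `find_failing_pages(PdfReader(pdf_file).pages)` would list the zero-based indexes of the affected pages together with the error text.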

Traceback

Traceback (most recent call last):
  File "/Users/eric/Downloads/pdf_extract_images.py", line 60, in scan_directory
    for img in page.images:
                      ^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pypdf/_page.py", line 463, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pypdf/filters.py", line 707, in _xobj_to_image
    alpha = Image.frombytes("L", size, x_object_obj[G.S_MASK].get_data())
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/PIL/Image.py", line 2970, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/PIL/Image.py", line 826, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
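
The last frames show Pillow's `Image.frombytes("L", size, ...)` rejecting the soft-mask data. Mode "L" stores one byte per pixel, so Pillow needs at least width × height bytes for the given `size`. One plausible explanation, which this report does not confirm, is that the soft-mask stream is shorter than that, for instance because its dimensions or bit depth differ from those of the base image. The sketch below shows that length check in plain Python; both function names are hypothetical and neither belongs to pypdf or Pillow.

```python
from typing import Tuple


def bytes_needed_for_mode_l(size: Tuple[int, int]) -> int:
    """Bytes required for an 8-bit grayscale ("L") image of the given size."""
    width, height = size
    return width * height  # one byte per pixel


def mask_has_enough_data(size: Tuple[int, int], data: bytes) -> bool:
    """Check whether `data` is long enough to fill a mode "L" image of `size`."""
    return len(data) >= bytes_needed_for_mode_l(size)
```

Running such a check on the mask data before calling `Image.frombytes` would make it possible to report the mismatch (expected versus actual byte count) instead of failing with the generic "not enough image data" message.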

Originally posted by @ericgonzadev in #1814 (comment)

@pubpub-zz
Collaborator Author

File for testing: o1whh9b3

@pubpub-zz
Collaborator Author

pubpub-zz commented May 26, 2023

@ericgonzadev
I created this issue to track the problem. It is fixed in #1834.

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue May 26, 2023
pubpub-zz changed the title @pubpub-zz bug : Issues during image extraction May 26, 2023
pubpub-zz added the is-bug and workflow-images labels May 26, 2023