CMYK image with filter_type equal to flate_decode return "not enough image data" error #2321

Closed
jianfan123 opened this issue Nov 29, 2023 · 4 comments · Fixed by #2322

@jianfan123

I am trying to extract images from this PDF file. The image on page 6 fails with "not enough image data"; the images on pages 9 and 11 get extracted from this PDF file.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-4.18.0-477.27.1.el8_8.x86_64-x86_64-with-glibc2.2.5

$ python -c "import pypdf;print(pypdf._debug_versions)"
# pypdf==3.17.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.0.0

## Code + PDF
[Addressing_Adversarial_Attacks.pdf](https://github.com/py-pdf/pypdf/files/13501846/Addressing_Adversarial_Attacks.pdf)


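```python
from pypdf import PdfReader

doc = PdfReader("./Addressing_Adversarial_Attacks.pdf")
for page_idx, page in enumerate(doc.pages):
    count = 0
    # Iterating page.images decodes each image; the ValueError is raised here.
    for image_file_object in page.images:
        # Prefix the image name with a per-page counter and write the raw bytes.
        with open(str(count) + image_file_object.name, "wb") as fp:
            fp.write(image_file_object.data)
        count += 1
```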

Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!

Traceback

This is the complete traceback I see:

# TODO: Your traceback goes here (if applicable)
@stefan6419846
Collaborator

Please provide the complete traceback in the corresponding field.

@jianfan123
Author

TODO: Your traceback goes here (if applicable)

Traceback (most recent call last):
  File "", line 1, in
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_page.py", line 2717, in __iter__
    yield self[i]
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_page.py", line 2713, in __getitem__
    return self.get_function(lst[index])
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_page.py", line 547, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/filters.py", line 781, in _xobj_to_image
    img, image_format, extension, invert_color = _handle_flate(
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_xobj_image_helpers.py", line 163, in _handle_flate
    img = Image.frombytes(mode, size, data)
  File "/opt/envs/torch/lib64/python3.8/site-packages/PIL/Image.py", line 2951, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/opt/envs/torch/lib64/python3.8/site-packages/PIL/Image.py", line 804, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
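
For context, here is a minimal sketch of the size arithmetic behind this error. Pillow's `Image.frombytes` raises "not enough image data" when the buffer is shorter than width × height × bytes-per-pixel for the requested mode (4 for `CMYK`, 3 for `RGB`, 1 for `L`, assuming 8 bits per component). The helper names `required_bytes` and `has_enough_data` are hypothetical and not part of pypdf or Pillow. Whether a mode mismatch is the actual cause for this PDF is not stated in the thread.

```python
# Bytes per pixel for a few 8-bit Pillow modes (raw decoder layout).
BYTES_PER_PIXEL = {"L": 1, "RGB": 3, "RGBA": 4, "CMYK": 4}


def required_bytes(mode, size):
    """Return the minimum buffer length needed for this mode and size."""
    width, height = size
    return width * height * BYTES_PER_PIXEL[mode]


def has_enough_data(mode, size, data):
    """Hypothetical pre-check: True if `data` is long enough for `mode` and `size`."""
    return len(data) >= required_bytes(mode, size)


# A 2x2 buffer sized for RGB (12 bytes) is too short to be read as CMYK (16 bytes).
rgb_sized = bytes(12)
print(has_enough_data("RGB", (2, 2), rgb_sized))   # True
print(has_enough_data("CMYK", (2, 2), rgb_sized))  # False
```

Comparing `len(data)` against `required_bytes(mode, size)` for the failing image would show how many bytes are missing for the mode that was chosen.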

@pubpub-zz
Collaborator

extracted images for test:
p5

p10
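
On an affected version, a loop like the one in the report stops at the first failing image, because the exception is raised while iterating. A possible workaround, sketched below, fetches images one index at a time and records failures instead of aborting. It assumes the image collection supports `len()` and integer indexing, which the traceback suggests (`__iter__` yields `self[i]`) but which is not verified here. `collect_by_index` and `FakeImages` are hypothetical names used only for illustration; `FakeImages` stands in for `page.images`.

```python
def collect_by_index(items):
    """Fetch items one index at a time so a single failure does not stop the loop.

    `items` is assumed to support len() and integer indexing (assumption).
    """
    good, failed = [], []
    for i in range(len(items)):
        try:
            good.append((i, items[i]))
        except ValueError as exc:  # e.g. "not enough image data"
            failed.append((i, str(exc)))
    return good, failed


class FakeImages:
    """Stand-in for page.images: index 1 fails the way the issue describes."""

    def __init__(self, names):
        self._names = names

    def __len__(self):
        return len(self._names)

    def __getitem__(self, index):
        if index == 1:
            raise ValueError("not enough image data")
        return self._names[index]


good, failed = collect_by_index(FakeImages(["Im0.png", "Im1.png", "Im2.png"]))
print(good)    # [(0, 'Im0.png'), (2, 'Im2.png')]
print(failed)  # [(1, 'not enough image data')]
```

With a real reader, the same helper would be called as `collect_by_index(page.images)` under the assumption above, and the `failed` list would identify which images need the fix.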

@jianfan123
Author

Thank you!
