CMYK image with filter_type equal to flate_decode return "not enough image data" error #2321

jianfan123 · 2023-11-29T14:48:26Z

I am trying to extract the images from this PDF file. Extracting the image on page 6 fails with "not enough image data", while the images on pages 9 and 11 are extracted without problems.

## Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-4.18.0-477.27.1.el8_8.x86_64-x86_64-with-glibc2.2.5

$ python -c "import pypdf;print(pypdf._debug_versions)"
# pypdf==3.17.1, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.0.0

## Code + PDF
[Addressing_Adversarial_Attacks.pdf](https://github.com/py-pdf/pypdf/files/13501846/Addressing_Adversarial_Attacks.pdf)


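```python
from pypdf import PdfReader

doc = PdfReader("./Addressing_Adversarial_Attacks.pdf")
for page_idx, page in enumerate(doc.pages):
    count = 0
    # Iterating page.images decodes each image; the ValueError is raised here.
    for image_file_object in page.images:
        # Prefix with page and image index so files from different pages do not collide.
        with open(f"{page_idx}_{count}_{image_file_object.name}", "wb") as fp:
            fp.write(image_file_object.data)
        count += 1
```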


## Traceback

This is the complete traceback I see:

# TODO: Your traceback goes here (if applicable)


stefan6419846 · 2023-11-29T16:02:06Z

Please provide the complete traceback in the corresponding field.
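For anyone unsure how to collect a complete traceback from a script, the following is a minimal sketch using only the standard library. The helper `run_and_capture` and the stand-in `failing_step` are hypothetical names used for illustration; in practice the wrapped call would be the image extraction loop.

```python
import traceback


def run_and_capture(func, *args, **kwargs):
    """Call func; return (result, None) on success or (None, traceback text) on failure."""
    try:
        return func(*args, **kwargs), None
    except Exception:
        # format_exc() returns the full traceback of the exception being handled.
        return None, traceback.format_exc()


def failing_step():
    # Hypothetical stand-in for the failing extraction call.
    raise ValueError("not enough image data")


result, tb_text = run_and_capture(failing_step)
print(tb_text)
```

The printed text can be pasted into the issue as-is, which keeps the file names and line numbers that maintainers need.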

jianfan123 · 2023-11-29T17:28:32Z


Traceback (most recent call last):
  File "", line 1, in
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_page.py", line 2717, in __iter__
    yield self[i]
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_page.py", line 2713, in __getitem__
    return self.get_function(lst[index])
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_page.py", line 547, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/filters.py", line 781, in _xobj_to_image
    img, image_format, extension, invert_color = _handle_flate(
  File "/opt/envs/torch/lib64/python3.8/site-packages/pypdf/_xobj_image_helpers.py", line 163, in _handle_flate
    img = Image.frombytes(mode, size, data)
  File "/opt/envs/torch/lib64/python3.8/site-packages/PIL/Image.py", line 2951, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/opt/envs/torch/lib64/python3.8/site-packages/PIL/Image.py", line 804, in frombytes
    raise ValueError(msg)
ValueError: not enough image data
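The traceback ends in `Image.frombytes(mode, size, data)`, which needs a buffer large enough for every pixel in the given mode. The sketch below shows the arithmetic behind that check. It assumes, purely for illustration, that the decoded stream carries one byte per pixel while the mode is `CMYK` (four bytes per pixel); the helper names and the 4x2 size are hypothetical and not taken from the PDF in this issue.

```python
# Bytes per pixel for a few 8-bit image modes.
BYTES_PER_PIXEL = {"L": 1, "RGB": 3, "CMYK": 4}


def required_bytes(mode, size):
    """Number of bytes a raw buffer must hold for the given mode and (width, height)."""
    width, height = size
    return width * height * BYTES_PER_PIXEL[mode]


def missing_bytes(mode, size, data):
    """How many bytes the buffer is short by (0 if it is large enough)."""
    return max(0, required_bytes(mode, size) - len(data))


size = (4, 2)               # hypothetical image dimensions
one_channel = bytes(4 * 2)  # assumed: a single byte per pixel

print(missing_bytes("CMYK", size, one_channel))  # buffer is too short for CMYK
print(missing_bytes("L", size, one_channel))     # same buffer fits a 1-channel mode
```

A buffer that is short for the requested mode is the condition under which Pillow reports "not enough image data".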

pubpub-zz · 2023-11-29T20:44:31Z

extracted images for test:

closes py-pdf#2321

jianfan123 · 2023-11-30T18:35:48Z

Thank you!

Closes #2321

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Nov 29, 2023

cope with deflated images with CMYK Black Only

ec4e1ca

closes py-pdf#2321

pubpub-zz mentioned this issue Nov 29, 2023

BUG: Cope with deflated images with CMYK Black Only #2322

Merged

MartinThoma closed this as completed in #2322 Dec 2, 2023

MartinThoma pushed a commit that referenced this issue Dec 2, 2023

BUG: Cope with deflated images with CMYK Black Only (#2322)

a3742ae

Closes #2321
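The commit title mentions deflated images with "CMYK Black Only". One way to picture such data is a stream holding only the black (K) channel that has to be widened to four channels before it can be read as `CMYK`. The sketch below illustrates that idea only; it is an assumption about the data layout and is not the change made in #2322. The function name `black_only_to_cmyk` is hypothetical.

```python
def black_only_to_cmyk(k_data: bytes) -> bytes:
    """Expand one K byte per pixel into interleaved C, M, Y, K with C = M = Y = 0."""
    out = bytearray(len(k_data) * 4)  # zero-filled: C, M and Y stay 0
    out[3::4] = k_data                # every fourth byte, starting at index 3, is K
    return bytes(out)


k_only = bytes([0, 128, 255])         # three pixels, black channel only
cmyk = black_only_to_cmyk(k_only)
print(len(cmyk))                      # four bytes per pixel after expansion
```

Whether a K value needs inverting, and whether a single-channel mode is a better target than `CMYK`, depends on the image's color space and decode settings, so treat this as a starting point rather than a fix.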
