Broken image extraction if no filters and CMYK colorspace #2522

Closed
stefan6419846 opened this issue Mar 15, 2024 · 0 comments · Fixed by #2557
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-images: From a user's perspective, image handling is the affected feature/workflow

Comments

@stefan6419846
Collaborator

Image extraction fails when `isinstance(lfilters, NullObject)` (the image stream has no filters) and `mode == "CMYK"`, in this branch of

pypdf/pypdf/filters.py, lines 818 to 826 in 0106904:

```python
else:
    if mode == "":
        raise PdfReadError(f"ColorSpace field not found in {x_object_obj}")
    img, image_format, extension, invert_color = (
        Image.frombytes(mode, size, data),
        "PNG",
        ".png",
        False,
    )
```
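This branch always selects PNG as the output format, but PNG has no CMYK color type, so the later `img.save(img_byte_arr, format=image_format)` call in `_xobj_to_image` fails with `OSError: cannot write mode CMYK as PNG`.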
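A minimal sketch of one possible fix, assuming the unfiltered samples are available as `mode`, `size` and `data` like in the excerpt: pick an output format that can hold CMYK losslessly (TIFF) instead of always using PNG. The helper name `encode_raw_image` is hypothetical, and this is not necessarily how #2557 resolves the issue.

```python
from io import BytesIO
from typing import Tuple

from PIL import Image


def encode_raw_image(mode: str, size: Tuple[int, int], data: bytes) -> Tuple[bytes, str, str]:
    """Hypothetical helper: wrap unfiltered samples and encode them.

    Returns (encoded_bytes, image_format, extension).
    """
    img = Image.frombytes(mode, size, data)
    if mode == "CMYK":
        # PNG has no CMYK color type; TIFF stores the four CMYK channels losslessly.
        image_format, extension = "TIFF", ".tiff"
    else:
        image_format, extension = "PNG", ".png"
    buffer = BytesIO()
    img.save(buffer, format=image_format)
    return buffer.getvalue(), image_format, extension


# Usage: a 2x2 CMYK image carries 4 bytes per pixel, so 16 bytes in total.
encoded, fmt, ext = encode_raw_image("CMYK", (2, 2), bytes(16))
print(fmt, ext, Image.open(BytesIO(encoded)).mode)
```

Converting with `img.convert("RGB")` before saving as PNG would also avoid the error, but it discards the original CMYK values. JPEG can store CMYK as well, but it is lossy.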

Environment

Which environment were you using when you encountered the problem?

```console
$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.2.0
```

Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader


reader = PdfReader('file.pdf')
for page in reader.pages:
    print(page)
    for key in page.images.keys():
        print(key)
        # Indexing page.images decodes the image; this raises OSError for the CMYK image.
        print(page.images[key])
```

An anonymized version of the file is attached as out3.pdf.

Traceback

This is the complete traceback I see:

```
Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1279, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'CMYK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 876, in _xobj_to_image
    img.save(img_byte_arr, format=image_format)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/Image.py", line 2439, in save
    save_handler(self, fp, filename)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1282, in _save
    raise OSError(msg) from e
OSError: cannot write mode CMYK as PNG

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1279, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'CMYK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stefan/tmp/run.py", line 9, in <module>
    print(page.images[key])
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2420, in __getitem__
    return self.get_function(index)
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 501, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 880, in _xobj_to_image
    img.save(img_byte_arr, format=image_format)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/Image.py", line 2439, in save
    save_handler(self, fp, filename)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1282, in _save
    raise OSError(msg) from e
OSError: cannot write mode CMYK as PNG
```
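The underlying Pillow limitation can be reproduced without pypdf or the PDF file. A minimal sketch; the function name `png_save_error` is illustrative only:

```python
from io import BytesIO
from typing import Optional

from PIL import Image


def png_save_error(mode: str) -> Optional[str]:
    """Return Pillow's error message if an image in `mode` cannot be saved as PNG, else None."""
    img = Image.new(mode, (2, 2))
    try:
        img.save(BytesIO(), format="PNG")
    except OSError as exc:
        return str(exc)
    return None


print(png_save_error("CMYK"))  # the OSError message seen in the traceback
print(png_save_error("RGB"))   # None, since RGB is a valid PNG mode
```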
stefan6419846 added the is-bug and workflow-images labels on Mar 15, 2024
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue on Mar 29, 2024