Invalid image lookup tables not handled correctly #2110

Closed
stefan6419846 opened this issue Aug 23, 2023 · 2 comments · Fixed by #2128

Comments

@stefan6419846
Collaborator

Invalid image lookup tables do not seem to be handled correctly and might end up trying to iterate over None:

pypdf/pypdf/filters.py

Lines 900 to 926 in 89eb626

img = img.convert(conv)
if len(lookup) != (hival + 1) * nb:
    logger_warning(
        f"Invalid Lookup Table in {obj_as_text}", __name__
    )
    lookup = None
if mode == "L":
    # gray lookup does not work : it is converted to a similar RGB lookup
    lookup = b"".join([bytes([b, b, b]) for b in lookup])
    mode = "RGB"
# TODO : cf https://github.com/py-pdf/pypdf/pull/2039
# this is a work around until PIL is able to process CMYK images
elif mode == "CMYK":
    _rgb = []
    for _c, _m, _y, _k in (
        lookup[n : n + 4]
        for n in range(0, 4 * (len(lookup) // 4), 4)
    ):
        _r = int(255 * (1 - _c / 255) * (1 - _k / 255))
        _g = int(255 * (1 - _m / 255) * (1 - _k / 255))
        _b = int(255 * (1 - _y / 255) * (1 - _k / 255))
        _rgb.append(bytes((_r, _g, _b)))
    lookup = b"".join(_rgb)
    mode = "RGB"
if lookup is not None:
    img.putpalette(lookup, rawmode=mode)
img = img.convert("L" if base == ColorSpaces.DEVICE_GRAY else "RGB")

Here you can see that in line 905 the lookup table will be set to None, but both line 908 and lines 915-916 try to iterate over a possibly None value. The condition in line 924 is too late to prevent issues.
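
One way to avoid the problem is to check validity before any conversion touches the table. The following is only a sketch under stated assumptions, not the change merged in #2128: `normalize_lookup` is a hypothetical helper name, and its parameters mirror the local variables `lookup`, `mode`, `hival` and `nb` from the snippet above. It returns `None` as the palette whenever the table is missing or has the wrong length, so the gray and CMYK branches never see a `None` value.

```python
from typing import Optional, Tuple


def normalize_lookup(
    lookup: Optional[bytes], mode: str, hival: int, nb: int
) -> Tuple[Optional[bytes], str]:
    """Hypothetical helper: return (palette, mode); palette is None if invalid."""
    # Validate first, so nothing below ever iterates over None.
    if lookup is None or len(lookup) != (hival + 1) * nb:
        return None, mode

    if mode == "L":
        # Expand each gray level into an equivalent RGB triple.
        return b"".join(bytes([b, b, b]) for b in lookup), "RGB"

    if mode == "CMYK":
        # Same CMYK -> RGB arithmetic as in the quoted snippet.
        rgb = []
        for n in range(0, 4 * (len(lookup) // 4), 4):
            c, m, y, k = lookup[n : n + 4]
            r = int(255 * (1 - c / 255) * (1 - k / 255))
            g = int(255 * (1 - m / 255) * (1 - k / 255))
            b = int(255 * (1 - y / 255) * (1 - k / 255))
            rgb.append(bytes((r, g, b)))
        return b"".join(rgb), "RGB"

    # Any other mode is passed through unchanged.
    return lookup, mode
```

The caller would then apply the palette only when the returned value is not `None`, which keeps the existing check from line 924 but makes it effective.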

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.81-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf.__version__)"
3.15.2

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader


for page in PdfReader('file.pdf').pages:
    for key in page.images.keys():
        print(key)
        page.images[key].image.convert('RGB').save(key[1:] + '.png')

Traceback

This is the complete traceback I see:

Invalid Lookup Table in {'/BitsPerComponent': 8, '/ColorSpace': IndirectObject(37, 0, 140090665353664), '/Filter': '/FlateDecode', '/Height': 77, '/Subtype': '/Image', '/Type': '/XObject', '/Width': 106}
Traceback (most recent call last):
  File "/home/stefan/temp/run.py", line 9, in <module>
    print(page.images[key].indirect_reference)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2636, in __getitem__
    return self.get_function(index)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 544, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 1026, in _xobj_to_image
    img, image_format, extension, invert_color = _handle_flate(
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 916, in _handle_flate
    for n in range(0, 4 * (len(lookup) // 4), 4)
TypeError: object of type 'NoneType' has no len()
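
Because the traceback shows the exception being raised from `__getitem__` on `page.images`, a caller on an affected version could skip the broken images instead of aborting the whole run. This is a sketch of a user-side workaround, not part of pypdf: `iter_decodable` is a hypothetical helper name, and it assumes the failure always surfaces as a `TypeError` when the image is accessed by key.

```python
import sys
from typing import Any, Iterable, Iterator, Tuple


def iter_decodable(images: Any, keys: Iterable[Any]) -> Iterator[Tuple[Any, Any]]:
    """Hypothetical helper: yield (key, image) pairs, skipping failing lookups."""
    for key in keys:
        try:
            # The lookup itself triggers decoding, so it is what can fail.
            image = images[key]
        except TypeError as exc:
            print(f"Skipping {key!r}: {exc}", file=sys.stderr)
            continue
        yield key, image
```

With the script above, the inner loop would become `for key, image in iter_decodable(page.images, page.images.keys()):`, with the body using `image` in place of `page.images[key]`.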
@pubpub-zz
Collaborator

@stefan6419846 Can you provide the PDF file?

@stefan6419846
Collaborator Author

A reproducing file has been sent to Martin directly for privacy reasons.
