"OSError: encoder error -2 when writing image file" while enumerating images #2265

Closed
michelcrypt4d4mus opened this issue Oct 24, 2023 · 7 comments · Fixed by #2595
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
is-regression: Regression introduced as a side-effect of another change
workflow-images: From a user's perspective, image handling is the affected feature/workflow

Comments

michelcrypt4d4mus commented Oct 24, 2023

Exception while enumerating images.

This seems to be a regression: when clown_sort depended on pypdf 3.14.0, I rarely if ever had issues enumerating page images. Now I hit them in a large percentage of PDFs.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
3.11.5

$ python -c "import pypdf;print(pypdf._debug_versions)"
3.16.4
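
Since the traceback below passes through Pillow (`PIL`) as well as pypdf, the Pillow version may matter too. A minimal sketch for collecting both versions, using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical:

```python
from importlib import metadata
from typing import Dict, Iterable


def installed_versions(packages: Iterable[str] = ("pypdf", "pillow")) -> Dict[str, str]:
    """Return the installed version of each distribution, or a marker if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```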

Code + PDF

See exception text. PDF attached; you can add it to your tests.

Traceback

  ➤ OSError: encoder error -2 when writing image file while parsing embedded image 1 on page 3...
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:515 in _save                                                   │
│                                                                                                  │
│   512 │   # a tricky case.                                                                       │
│   513 │   bufsize = max(MAXBLOCK, bufsize, im.size[0] * 4)  # see RawEncode.c                    │
│   514 │   try:                                                                                   │
│ ❱ 515 │   │   fh = fp.fileno()                                                                   │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnsupportedOperation: fileno

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/filters.py:872 in _xobj_to_image                                          │
│                                                                                                  │
│   869 │                                                                                          │
│   870 │   img_byte_arr = BytesIO()                                                               │
│   871 │   try:                                                                                   │
│ ❱ 872 │   │   img.save(img_byte_arr, format=image_format)                                        │
│   873 │   except OSError:  # pragma: no cover                                                    │
│   874 │   │   # odd error                                                                        │
│   875 │   │   img_byte_arr = BytesIO()                                                           │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Image.py:2438 in save                                                       │
│                                                                                                  │
│   2435 │   │   │   │   fp = builtins.open(filename, "w+b")                                       │
│   2436 │   │                                                                                     │
│   2437 │   │   try:                                                                              │
│ ❱ 2438 │   │   │   save_handler(self, fp, filename)                                              │
│   2439 │   │   except Exception:                                                                 │
│   2440 │   │   │   if open_fp:                                                                   │
│   2441 │   │   │   │   fp.close()                                                                │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Jpeg2KImagePlugin.py:385 in _save                                           │
│                                                                                                  │
│   382 │   │   plt,                                                                               │
│   383 │   )                                                                                      │
│   384 │                                                                                          │
│ ❱ 385 │   ImageFile._save(im, fp, [("jpeg2k", (0, 0) + im.size, 0, kind)])                       │
│   386                                                                                            │
│   387                                                                                            │
│   388 # ------------------------------------------------------------                             │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:519 in _save                                                   │
│                                                                                                  │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
│ ❱ 519 │   │   _encode_tile(im, fp, tile, bufsize, None, exc)                                     │
│   520 │   if hasattr(fp, "flush"):                                                               │
│   521 │   │   fp.flush()                                                                         │
│   522                                                                                            │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:547 in _encode_tile                                            │
│                                                                                                  │
│   544 │   │   │   │   │   errcode = encoder.encode_to_file(fh, bufsize)                          │
│   545 │   │   │   if errcode < 0:                                                                │
│   546 │   │   │   │   msg = f"encoder error {errcode} when writing image file"                   │
│ ❱ 547 │   │   │   │   raise OSError(msg) from exc                                                │
│   548 │   │   finally:                                                                           │
│   549 │   │   │   encoder.cleanup()                                                              │
│   550                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: encoder error -2 when writing image file

During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:515 in _save                                                   │
│                                                                                                  │
│   512 │   # a tricky case.                                                                       │
│   513 │   bufsize = max(MAXBLOCK, bufsize, im.size[0] * 4)  # see RawEncode.c                    │
│   514 │   try:                                                                                   │
│ ❱ 515 │   │   fh = fp.fileno()                                                                   │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnsupportedOperation: fileno

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/uzor/workspace/clown_sort/clown_sort/files/pdf_file.py:61 in extracted_text      │
│                                                                                                  │
│    58 │   │   │   │                                                                              │
│    59 │   │   │   │   # Extracting images is a bit fraught (lots of PIL and pypdf exceptions h   │
│    60 │   │   │   │   try:                                                                       │
│ ❱  61 │   │   │   │   │   for image_number, image in enumerate(page.images, start=1):            │
│    62 │   │   │   │   │   │   image_name = f"Page {page_number}, Image {image_number}"           │
│    63 │   │   │   │   │   │   self._log_to_stderr(f"   Processing {image_name}...")              │
│    64 │   │   │   │   │   │   page_buffer.print(Panel(image_name, expand=False))                 │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/_page.py:2722 in __iter__                                                 │
│                                                                                                  │
│   2719 │                                                                                         │
│   2720 │   def __iter__(self) -> Iterator[ImageFile]:                                            │
│   2721 │   │   for i in range(len(self)):                                                        │
│ ❱ 2722 │   │   │   yield self[i]                                                                 │
│   2723 │                                                                                         │
│   2724 │   def __str__(self) -> str:                                                             │
│   2725 │   │   p = [f"Image_{i}={n}" for i, n in enumerate(self.ids_function())]                 │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/_page.py:2718 in __getitem__                                              │
│                                                                                                  │
│   2715 │   │   │   index = len_self + index                                                      │
│   2716 │   │   if index < 0 or index >= len_self:                                                │
│   2717 │   │   │   raise IndexError("sequence index out of range")                               │
│ ❱ 2718 │   │   return self.get_function(lst[index])                                              │
│   2719 │                                                                                         │
│   2720 │   def __iter__(self) -> Iterator[ImageFile]:                                            │
│   2721 │   │   for i in range(len(self)):                                                        │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/_page.py:547 in _get_image                                                │
│                                                                                                  │
│    544 │   │   │   │   │   raise KeyError("no inline image can be found")                        │
│    545 │   │   │   │   return self.inline_images[id]                                             │
│    546 │   │   │                                                                                 │
│ ❱  547 │   │   │   imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))                      │
│    548 │   │   │   extension, byte_stream = imgd[:2]                                             │
│    549 │   │   │   f = ImageFile(                                                                │
│    550 │   │   │   │   name=f"{id[1:]}{extension}",                                              │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/pypdf/filters.py:876 in _xobj_to_image                                          │
│                                                                                                  │
│   873 │   except OSError:  # pragma: no cover                                                    │
│   874 │   │   # odd error                                                                        │
│   875 │   │   img_byte_arr = BytesIO()                                                           │
│ ❱ 876 │   │   img.save(img_byte_arr, format=image_format)                                        │
│   877 │   data = img_byte_arr.getvalue()                                                         │
│   878 │                                                                                          │
│   879 │   try:  # temporary try/except until other fixes of images                               │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Image.py:2438 in save                                                       │
│                                                                                                  │
│   2435 │   │   │   │   fp = builtins.open(filename, "w+b")                                       │
│   2436 │   │                                                                                     │
│   2437 │   │   try:                                                                              │
│ ❱ 2438 │   │   │   save_handler(self, fp, filename)                                              │
│   2439 │   │   except Exception:                                                                 │
│   2440 │   │   │   if open_fp:                                                                   │
│   2441 │   │   │   │   fp.close()                                                                │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/Jpeg2KImagePlugin.py:385 in _save                                           │
│                                                                                                  │
│   382 │   │   plt,                                                                               │
│   383 │   )                                                                                      │
│   384 │                                                                                          │
│ ❱ 385 │   ImageFile._save(im, fp, [("jpeg2k", (0, 0) + im.size, 0, kind)])                       │
│   386                                                                                            │
│   387                                                                                            │
│   388 # ------------------------------------------------------------                             │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:519 in _save                                                   │
│                                                                                                  │
│   516 │   │   fp.flush()                                                                         │
│   517 │   │   _encode_tile(im, fp, tile, bufsize, fh)                                            │
│   518 │   except (AttributeError, io.UnsupportedOperation) as exc:                               │
│ ❱ 519 │   │   _encode_tile(im, fp, tile, bufsize, None, exc)                                     │
│   520 │   if hasattr(fp, "flush"):                                                               │
│   521 │   │   fp.flush()                                                                         │
│   522                                                                                            │
│                                                                                                  │
│ /Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-BrYcfkKs-py3.11/lib/python3. │
│ 11/site-packages/PIL/ImageFile.py:547 in _encode_tile                                            │
│                                                                                                  │
│   544 │   │   │   │   │   errcode = encoder.encode_to_file(fh, bufsize)                          │
│   545 │   │   │   if errcode < 0:                                                                │
│   546 │   │   │   │   msg = f"encoder error {errcode} when writing image file"                   │
│ ❱ 547 │   │   │   │   raise OSError(msg) from exc                                                │
│   548 │   │   finally:                                                                           │
│   549 │   │   │   encoder.cleanup()                                                              │
│   550                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

[Binance discovery responses 2 gov.uscourts.dcd.256060.140.1.pdf](https://github.com/py-pdf/pypdf/files/13126365/Binance.discovery.responses.2.gov.uscourts.dcd.256060.140.1.pdf)

@pubpub-zz
Collaborator

@michelcrypt4d4mus
Please provide the PDF file that shows the issue and a clear, simple code sample to evaluate it; reading the stack trace is awful.

@michelcrypt4d4mus
Author

This is the line of code that is causing the error.

Sorry, I thought I had attached the file, but somehow it did not attach. Trying again:
Binance discovery responses 2 gov.uscourts.dcd.256060.140.1.pdf

@michelcrypt4d4mus
Author

Like #2266, this issue seems to go away when I downgrade to pypdf 3.14.0.

MartinThoma added the workflow-images, is-regression, and is-bug labels on Oct 28, 2023
@pubpub-zz
Collaborator

@michelcrypt4d4mus
Can you confirm the issue is still present with the latest release? Can you also provide a simple code sample?

@stefan6419846
Collaborator

It seems like this still raises the same issue:

>>> from pypdf import PdfReader
>>> reader = PdfReader('../Binance.discovery.responses.2.gov.uscourts.dcd.256060.140.1.pdf')
>>> for page in reader.pages:
...   print(page)
...   for image in page.images:
...     print(image)
...     print(image.image)
... 
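
To narrow down which image fails without aborting the whole page, the loop above could be adapted to fetch images one at a time by index. This is a minimal sketch, assuming `page.images` supports `len()` and integer indexing (as the `__iter__` and `__getitem__` frames in the traceback suggest); the helper name `iter_images_safely` is hypothetical and not part of pypdf:

```python
from typing import Any, Iterator, Optional, Tuple


def iter_images_safely(
    page: Any,
) -> Iterator[Tuple[int, Optional[Any], Optional[Exception]]]:
    """Yield (index, image, error) for every image on a page.

    Each image is fetched individually, so an OSError raised while decoding
    one image does not prevent the remaining images from being visited.
    """
    images = page.images  # assumed to support len() and integer indexing
    for index in range(len(images)):
        try:
            image = images[index]
        except OSError as exc:  # e.g. "encoder error -2 when writing image file"
            yield index, None, exc
        else:
            yield index, image, None
```

With a `PdfReader`, this could be used as `for index, image, error in iter_images_safely(page): ...`, logging `error` when it is not `None` and processing `image` otherwise.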

@pubpub-zz
Collaborator

The error is due to Pillow not handling all formats as requested (JPEG2000 with palette encoding).
The worst part is that, depending on the Pillow version, we may or may not get errors, and the image may or may not be corrupted. 😫😫
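
As an illustration of one possible workaround on the caller's side, the sketch below retries a failed save after converting the image to RGB. It assumes that the failure comes from the image mode (for example a palette image) and that losing the palette or any transparency is acceptable; it is not necessarily what the fix in #2595 does. `save` and `convert` are the standard Pillow `Image` methods; the helper name `save_with_rgb_fallback` is hypothetical:

```python
from io import BytesIO
from typing import Any


def save_with_rgb_fallback(img: Any, image_format: str) -> bytes:
    """Encode img as image_format, retrying once in RGB mode on OSError.

    img is expected to behave like a PIL.Image.Image, i.e. to provide
    save(fp, format=...) and convert(mode).
    """
    buffer = BytesIO()
    try:
        img.save(buffer, format=image_format)
    except OSError:
        # Use a fresh buffer: the failed attempt may have written partial data.
        buffer = BytesIO()
        # Assumption: an RGB copy is an acceptable substitute for the original mode.
        img.convert("RGB").save(buffer, format=image_format)
    return buffer.getvalue()
```

If the second attempt also fails, the `OSError` propagates to the caller, so the failure stays visible rather than producing an empty or corrupted image silently.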

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Apr 10, 2024
@michelcrypt4d4mus
Author

As mentioned, this was not an issue with pypdf 3.14.0.

stefan6419846 pushed a commit that referenced this issue Apr 16, 2024