UnsupportedImageTypeError('/CCITTFaxDecode with decode parameter /EndOfBlock not equal True') when converting PDF to image #517

Closed
oooyiyangc opened this issue Sep 5, 2023 · 3 comments

oooyiyangc commented Sep 5, 2023

Hi, I've encountered an issue using pikepdf to convert PDFs to images. Here is my conversion code:

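from pikepdf import Pdf, PdfImage

# filepath is the path to the input PDF
pdf_file = Pdf.open(filepath)
page1 = pdf_file.pages[0]

# Take the first image found on the first page
relevant_key = list(page1.images.keys())[0]
rawimage = page1.images[relevant_key]

# Wrap the raw image object and decode it to a PIL image
pdfimage = PdfImage(rawimage)
image = pdfimage.as_pil_image()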

Issue

The issue only occurs in later pikepdf versions, and only for some PDFs. My observations are listed below.

  • pikepdf==7.1.0 through 8.4.0
    In these versions, the conversion fails with the error message:
    UnsupportedImageTypeError('/CCITTFaxDecode with decode parameter /EndOfBlock not equal True')
  • pikepdf==7.0.0
    In this version, the conversion works, but the resulting image has inverted colors (e.g. pixels are white when they should be black).
  • pikepdf==3.1.0
    Everything works.
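
The inverted-color symptom reported for 7.0.0 can be checked mechanically. Below is a minimal sketch, assuming both the produced and the expected image have been flattened to sequences of 0/255 pixel values (for a 1-bit Pillow image, `list(image.getdata())` typically yields such values). The helpers `is_inverted` and `invert` are hypothetical names used for illustration; they are not part of pikepdf or of check_conversion.py.

```python
from typing import List, Sequence


def is_inverted(actual: Sequence[int], expected: Sequence[int], white: int = 255) -> bool:
    """Return True if every pixel in `actual` is the complement of `expected`.

    Assumes both sequences hold values in the range 0..white.
    """
    if len(actual) != len(expected) or len(expected) == 0:
        return False
    return all(a == white - e for a, e in zip(actual, expected))


def invert(pixels: Sequence[int], white: int = 255) -> List[int]:
    """Flip each pixel (black becomes white and vice versa)."""
    return [white - p for p in pixels]
```

If `is_inverted(actual, expected)` holds, applying `invert` to the produced pixels recovers the expected ones, which would distinguish a pure polarity problem from a genuinely corrupted decode.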

To replicate:

Here are two sample PDFs where the above issue occurs: 65150963-idaho-state-journal-Jun-11-1972-p-1, 101250036-jefferson-city-post-tribune-Feb-16-1967-p-1.pdf

Alternatively, go to my repo, https://github.com/oooyiyangc/pdf2img_test, and run check_conversion.py. The required packages are numpy, Pillow, and pikepdf.

My results on Ubuntu 20.04 LTS:

  • pikepdf==7.1.0 through 8.4.0
============================
Testing pdf 1 ... (should pass)
Converting ................. Pass
Matching expected output ... Pass

============================
Testing pdf 2 ... (should fail)
UnsupportedImageTypeError('/CCITTFaxDecode with decode parameter /EndOfBlock not equal True')
Converting ................. Fail
Matching expected output ... Skipped

============================
Testing pdf 3 ... (should fail)
UnsupportedImageTypeError('/CCITTFaxDecode with decode parameter /EndOfBlock not equal True')
Converting ................. Fail
Matching expected output ... Skipped
============================
  • pikepdf==7.0.0
============================
Testing pdf 1 ... (should pass)
Converting ................. Pass
Matching expected output ... Pass

============================
Testing pdf 2 ... (should fail)
Converting ................. Pass
Matching expected output ... Fail

============================
Testing pdf 3 ... (should fail)
Converting ................. Pass
Matching expected output ... Fail
============================
  • pikepdf==3.1.0
============================
Testing pdf 1 ... (should pass)
Converting ................. Pass
Matching expected output ... Pass

============================
Testing pdf 2 ... (should fail)
Converting ................. Pass
Matching expected output ... Pass

============================
Testing pdf 3 ... (should fail)
Converting ................. Pass
Matching expected output ... Pass
============================

jbarlow83 (Member) commented

#269

jbarlow83 (Member) commented

Fixed in 8.4.1
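
For anyone pinning versions, the observations in this thread can be summarized as a small lookup. This is only a sketch based on the versions reported above (3.1.0, 7.0.0, 7.1.0 through 8.4.0, and the 8.4.1 fix); `classify_pikepdf_version` is a hypothetical helper, it assumes plain `X.Y.Z` version strings, and versions not mentioned in the thread are reported as unknown rather than guessed.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def classify_pikepdf_version(text: str) -> str:
    """Map a pikepdf version to the behavior reported in this issue."""
    version = parse_version(text)
    if version >= (8, 4, 1):
        return "fixed"
    if version >= (7, 1, 0):
        # Reported range 7.1.0 through 8.4.0
        return "raises UnsupportedImageTypeError"
    if version == (7, 0, 0):
        return "inverted colors"
    if version == (3, 1, 0):
        return "reported working"
    # Anything else was not tested in this thread
    return "unknown"
```

The installed version string can be obtained with the standard library call `importlib.metadata.version("pikepdf")` and passed to `classify_pikepdf_version`.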

oooyiyangc (Author) commented

Thank you, @jbarlow83! Really appreciate it!
