PDF Optimization Error: getStreamData called on unfilterable stream #285

Closed
jsetton opened this issue Aug 19, 2018 · 4 comments

@jsetton

jsetton commented Aug 19, 2018

I have been trying to perform lossless optimizations on the PDF generated by the EPSON Scan application. It works fine for the b/w (gray/ccitt) and color (rgb) image types, but I get the error below for the grayscale type.

  DEBUG - PdfError('/tmp/com.github.ocrmypdf.8v5q65k6/metafix.pdf (offset 1593): getStreamData called on unfilterable stream',)
  DEBUG - Optimizable images: JBIG2 groups: 0 JPEGs: 0 PNGs: 0 Errors: 1
   INFO - Optimize ratio: 1.00 savings: 0.0%

I used the following command to generate the output file:

$ ocrmypdf -v -j2 -O3 --clean --output-type pdf <input>.pdf <output>.pdf

Below are the PDF input files for each format and the verbose log output.

B/W

Color

Grayscale

@jbarlow83
Collaborator

Thanks for the detailed report.

The issue should be fixed by upgrading pikepdf to v0.3.2.
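
To confirm that an environment already has the fixed release, the installed pikepdf version can be compared against 0.3.2. The following is a minimal sketch, not part of pikepdf or ocrmypdf: the helper names `version_tuple` and `has_fix` are hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
import re
from importlib import metadata

# Version named in this thread as containing the fix.
MIN_PIKEPDF = (0, 3, 2)


def version_tuple(text):
    """Convert a version string such as '0.3.2' into a 3-part tuple of ints."""
    nums = [int(n) for n in re.findall(r"\d+", text)[:3]]
    # Pad short versions like '1.0' to three components.
    return tuple(nums + [0] * (3 - len(nums)))


def has_fix(installed):
    """Return True if the given pikepdf version is at least MIN_PIKEPDF."""
    return version_tuple(installed) >= MIN_PIKEPDF


if __name__ == "__main__":
    try:
        installed = metadata.version("pikepdf")
    except metadata.PackageNotFoundError:
        print("pikepdf is not installed")
    else:
        print(installed, "ok" if has_fix(installed) else "needs upgrade")
```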

@jsetton
Author

jsetton commented Aug 20, 2018

@jbarlow83 thanks for the quick turnaround. I was able to confirm that my issue was resolved with the changes you made to pikepdf.

However, I initially tried to rebuild my docker image to use the latest version of pikepdf (previously 0.3.0), and there seems to be an incompatibility, per the pip error snippet below, with qpdf 8.0.2, which is currently included in the Ubuntu 18.04 image.

I couldn't find explicit information about an updated minimum requirement for the latest pikepdf version, but I was able to resolve the issue by bumping the docker base image to Ubuntu 18.10 (devel), which includes qpdf 8.2.1. Is this expected behavior?

For reference, I am running the docker image on a Raspberry Pi.

    building 'pikepdf._qpdf' extension
    creating build/temp.linux-armv7l-3.6
    creating build/temp.linux-armv7l-3.6/src
    creating build/temp.linux-armv7l-3.6/src/qpdf
    arm-linux-gnueabihf-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-55P5Ug/python3.6-3.6.5=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Isrc/vendor/pybind11/include -Isrc/vendor/pybind11/include -I/appenv/include -I/usr/include/python3.6m -c src/qpdf/object.cpp -o build/temp.linux-armv7l-3.6/src/qpdf/object.o -DVERSION_INFO="0.3.2" -std=c++14 -fvisibility=hidden
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/qpdf/object.cpp: In lambda function:
    src/qpdf/object.cpp:776:38: error: ‘newUnicodeString’ is not a member of ‘QPDFObjectHandle’
                 return QPDFObjectHandle::newUnicodeString(utf8);
                                          ^~~~~~~~~~~~~~~~
    error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1

Update
I did a quick search, and this error seems to be related to the change below, which implies that qpdf 8.1.0 is required for pikepdf>=0.3.1.

pikepdf/pikepdf@c34e89b

@jbarlow83
Collaborator

jbarlow83 commented Aug 20, 2018

Yes, pikepdf 0.3.1 does require qpdf 8.1.0 or higher, so Ubuntu 18.04 users must upgrade or build QPDF from source. .travis.yml does this of necessity.
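
A quick pre-build check for the qpdf requirement could look like the sketch below. It is illustrative only: the helper names are hypothetical, and it assumes `qpdf --version` prints a `major.minor.patch` number somewhere in its output. It also inspects the qpdf command-line tool, which may not match the version of the libqpdf-dev headers that pip actually compiles against.

```python
import re
import subprocess

# Minimum qpdf version stated in this thread for pikepdf 0.3.1 and later.
MIN_QPDF = (8, 1, 0)


def parse_qpdf_version(output):
    """Extract the first major.minor.patch number from qpdf's version output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no version number found in: {!r}".format(output))
    return tuple(int(group) for group in match.groups())


def qpdf_is_new_enough(output):
    """Return True if the reported qpdf version is at least MIN_QPDF."""
    return parse_qpdf_version(output) >= MIN_QPDF


if __name__ == "__main__":
    # Assumes the qpdf executable is on PATH.
    result = subprocess.run(
        ["qpdf", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    found = parse_qpdf_version(result.stdout)
    status = "ok" if found >= MIN_QPDF else "too old for pikepdf >= 0.3.1"
    print(".".join(map(str, found)), status)
```

With the versions mentioned in this thread, 8.0.2 (Ubuntu 18.04) would fail the check and 8.2.1 (Ubuntu 18.10) would pass.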

I'd be quite interested in adding anything you've learned about installation, usage and performance on RaspPI to the documentation if you're willing to write it up.

@jsetton
Author

jsetton commented Aug 21, 2018

> so Ubuntu 18.04 users must upgrade or build QPDF from source.

I initially went with building QPDF from source, but it's pretty heavy to compile on a Raspberry Pi, so I went with the base image upgrade instead until the stable one includes the updated packages. Another advantage of using the devel base image is that you also get the latest version of tesseract-ocr.

> I'd be quite interested in adding anything you've learned about installation, usage and performance on RaspPI to the documentation if you're willing to write it up.

In terms of installation, I had to add the Ubuntu packages below (initially using base image 18.04) to the current list in order to compile the latest cffi and pikepdf Python packages. I would expect these packages to be necessary on any system, though.

libffi-dev
libqpdf-dev
python3-dev

In terms of usage, it's pretty straightforward, although I am now using a modified docker image that includes other PDF tools, such as pdftk to reorder pages and pdfcrop to remove unnecessary white space on non-standard letter size documents, prior to running ocrmypdf. In terms of performance, it's not blazing fast, but it's functional. I had to limit the job option to 2 cores since I have other containers running on the device; otherwise, the OOM killer would start acting up.
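
The job limit described above can be sketched as a small wrapper that caps the worker count before building the command line. This is only an illustration: `pick_jobs` and `build_command` are hypothetical helpers, the cap of 2 is the value that worked on this particular device, and the flags simply mirror the command from the original report.

```python
import os


def pick_jobs(cap=2, cpu_count=None):
    """Choose a job count no higher than the cap or the available cores."""
    available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cap, available))


def build_command(input_pdf, output_pdf, jobs):
    """Build the ocrmypdf argument list used in this thread with a given -j value."""
    return [
        "ocrmypdf", "-v", "-j{}".format(jobs), "-O3", "--clean",
        "--output-type", "pdf", input_pdf, output_pdf,
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run() to execute it.
    print(build_command("input.pdf", "output.pdf", pick_jobs()))
```

Capping the job count trades speed for a smaller peak memory footprint, which matters when other containers share the same device.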
