PDF Optimization Error: getStreamData called on unfilterable stream #285

jsetton · 2018-08-19T16:32:35Z

I have been trying to run lossless optimization on PDFs generated by the EPSON Scan application. It works fine for the b/w (gray/ccitt) and color (rgb) image types, but I get the error below for the grayscale type.

  DEBUG - PdfError('/tmp/com.github.ocrmypdf.8v5q65k6/metafix.pdf (offset 1593): getStreamData called on unfilterable stream',)
  DEBUG - Optimizable images: JBIG2 groups: 0 JPEGs: 0 PNGs: 0 Errors: 1
   INFO - Optimize ratio: 1.00 savings: 0.0%
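For anyone scripting around this, the summary line in the log above can be checked automatically. This is only a minimal sketch: the regular expression is written against the exact wording of the DEBUG line shown here, and the helper name `optimizer_errors` is hypothetical, not part of ocrmypdf.

```python
import re

# Pattern written against the summary line in the log above; the wording
# may differ in other ocrmypdf versions (assumption).
SUMMARY = re.compile(
    r"Optimizable images: JBIG2 groups: (\d+) JPEGs: (\d+) PNGs: (\d+) Errors: (\d+)"
)


def optimizer_errors(log_text):
    """Return the error count from the summary line, or None if it is absent."""
    match = SUMMARY.search(log_text)
    return int(match.group(4)) if match else None


log = """\
  DEBUG - Optimizable images: JBIG2 groups: 0 JPEGs: 0 PNGs: 0 Errors: 1
   INFO - Optimize ratio: 1.00 savings: 0.0%
"""

errors = optimizer_errors(log)
if errors:
    print(f"optimizer reported {errors} error(s)")
```

A nonzero count together with a 0.0% savings figure is the pattern reported in this issue.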

I used the following command to generate the output file:

$ ocrmypdf -v -j2 -O3 --clean --output-type pdf <input>.pdf <output>.pdf

Below are the PDF input files for each format, along with the verbose log output.

B/W

Color

Grayscale


jbarlow83 · 2018-08-20T06:46:09Z

Thanks for the detailed report.

The issue should be fixed by upgrading pikepdf to v0.3.2.

jsetton · 2018-08-20T16:33:08Z

@jbarlow83 thanks for the quick turnaround. I was able to confirm that my issue was resolved with the changes you made to pikepdf.

However, I initially tried to rebuild my docker image to use the latest version of pikepdf (previously 0.3.0), and, per the pip error snippet below, there seems to be an incompatibility with qpdf 8.0.2, which is currently included in the Ubuntu 18.04 image.

I couldn't find explicit information about an updated minimum requirement for the latest pikepdf version, but I was able to resolve the issue by bumping the docker base image to Ubuntu 18.10 (devel), which includes the latest version of qpdf, 8.2.1. Is this expected behavior?

For reference, I am running the docker image on a Raspberry Pi.

    building 'pikepdf._qpdf' extension
    creating build/temp.linux-armv7l-3.6
    creating build/temp.linux-armv7l-3.6/src
    creating build/temp.linux-armv7l-3.6/src/qpdf
    arm-linux-gnueabihf-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-55P5Ug/python3.6-3.6.5=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Isrc/vendor/pybind11/include -Isrc/vendor/pybind11/include -I/appenv/include -I/usr/include/python3.6m -c src/qpdf/object.cpp -o build/temp.linux-armv7l-3.6/src/qpdf/object.o -DVERSION_INFO="0.3.2" -std=c++14 -fvisibility=hidden
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    src/qpdf/object.cpp: In lambda function:
    src/qpdf/object.cpp:776:38: error: ‘newUnicodeString’ is not a member of ‘QPDFObjectHandle’
                 return QPDFObjectHandle::newUnicodeString(utf8);
                                          ^~~~~~~~~~~~~~~~
    error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1

Update
I did a quick search, and this error seems to be related to the change below, which implies that qpdf 8.1.0 is required for pikepdf>=0.3.1.

pikepdf/pikepdf@c34e89b

jbarlow83 · 2018-08-20T22:32:53Z

Yes, pikepdf 0.3.1 does require qpdf 8.1.0 or higher, so Ubuntu 18.04 users must upgrade or build QPDF from source. The project's .travis.yml does this out of necessity.
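The version requirement can be checked before attempting a build. The following is a rough sketch, not something pikepdf ships: the function names are hypothetical, and it assumes the version is available as text containing a dotted `major.minor.patch` number (for example, the output of `qpdf --version`).

```python
import re

# Minimum qpdf version stated above for pikepdf 0.3.1.
MIN_QPDF = (8, 1, 0)


def parse_version(text):
    """Extract the first major.minor.patch triple found in text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def qpdf_is_new_enough(version_text, minimum=MIN_QPDF):
    """Return True if the version in version_text is at least minimum."""
    return parse_version(version_text) >= minimum


# Versions mentioned in this thread: Ubuntu 18.04 and Ubuntu 18.10 (devel).
for reported in ("qpdf version 8.0.2", "qpdf version 8.2.1"):
    status = "ok" if qpdf_is_new_enough(reported) else "too old"
    print(reported, "->", status)
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "8.10.0" would sort before "8.2.0".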

I'd be quite interested in adding anything you've learned about installation, usage and performance on RaspPI to the documentation if you're willing to write it up.

jsetton · 2018-08-21T01:53:10Z

> so Ubuntu 18.04 users must upgrade or build QPDF from source.

I initially went with building QPDF from source, but it's pretty heavy to compile on a Raspberry Pi, so I went with the base image upgrade instead until the stable one includes the updated packages. Another advantage of using the devel base image is that you also get the latest version of tesseract-ocr.

> I'd be quite interested in adding anything you've learned about installation, usage and performance on RaspPI to the documentation if you're willing to write it up.

In terms of installation, I had to add the Ubuntu packages below (initially using base image 18.04) to the current list in order to compile the latest cffi and pikepdf Python packages. I would expect these packages to be necessary on any system, though.

libffi-dev
libqpdf-dev
python3-dev

In terms of usage, it's pretty straightforward, although I am now using a modified docker image that includes other PDF tools, such as pdftk to reorder pages and pdfcrop to remove unnecessary white space on non-standard letter size documents, prior to running ocrmypdf. In terms of performance, it's not blazing fast but it's functional. I had to limit the job option to 2 cores since I have other containers running on the device; otherwise, the OOM killer would start acting up.
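As an illustration of capping the job count, here is a small sketch that builds the same command line used at the top of this issue while never asking for more jobs than a chosen limit. The helper name `build_ocrmypdf_command` and the default cap of 2 are assumptions for this example; the flags themselves are the ones from the original command.

```python
import os


def build_ocrmypdf_command(input_pdf, output_pdf, max_jobs=2):
    """Build the ocrmypdf argument list with a capped -j value."""
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    jobs = max(1, min(max_jobs, cpu_count))
    return [
        "ocrmypdf", "-v", f"-j{jobs}", "-O3", "--clean",
        "--output-type", "pdf", input_pdf, output_pdf,
    ]


command = build_ocrmypdf_command("input.pdf", "output.pdf")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Keeping the cap below the core count leaves headroom for other containers, which is what kept the OOM killer quiet in my setup.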

jbarlow83 closed this as completed Aug 20, 2018
