PDF Optimization Error: getStreamData called on unfilterable stream #285
Comments
Thanks for the detailed report. The issue should be fixed by upgrading pikepdf to v0.3.2.
@jbarlow83 thanks for the quick turnaround. I was able to confirm that my issue was resolved by the changes you made to pikepdf. However, when I first tried to rebuild my Docker image to use the latest version of pikepdf (previously 0.3.0), I hit an incompatibility, per the pip error snippet below, with qpdf 8.0.2, which is currently included in the Ubuntu 18.04 image. I couldn't find explicit information about an updated minimum requirement for the latest pikepdf version, but I was able to resolve the issue by bumping the Docker base image to Ubuntu 18.10 (devel), which includes the latest version of qpdf, 8.2.1. Is this expected behavior? For reference, I am running the Docker image on a Raspberry Pi.
Update
Yes, pikepdf 0.3.1 does require qpdf 8.1.0 or higher, so Ubuntu 18.04 users must upgrade or build QPDF from source. I'd be quite interested in adding anything you've learned about installation, usage and performance on the Raspberry Pi to the documentation if you're willing to write it up.
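For anyone who wants to check this before running pip, here is a minimal sketch that compares the installed qpdf against the 8.1.0 minimum mentioned above. It assumes `qpdf --version` prints a banner containing the version number (for example `qpdf version 8.0.2`); the helper names are made up for illustration and are not part of pikepdf or qpdf.

```python
import re
import shutil
import subprocess

# Minimum qpdf version stated in this thread for pikepdf 0.3.1.
MIN_QPDF = (8, 1, 0)


def parse_qpdf_version(text):
    """Extract (major, minor, patch) from a qpdf version banner.

    Hypothetical helper: assumes the banner contains something like '8.0.2'.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def qpdf_is_new_enough(text, minimum=MIN_QPDF):
    """Return True if the version in the banner is at least `minimum`."""
    return parse_qpdf_version(text) >= minimum


def installed_qpdf_banner():
    """Return the output of `qpdf --version`, or None if qpdf is not on PATH."""
    exe = shutil.which("qpdf")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    banner = installed_qpdf_banner()
    if banner is None:
        print("qpdf not found on PATH")
    elif qpdf_is_new_enough(banner):
        print("qpdf is new enough:", parse_qpdf_version(banner))
    else:
        print("qpdf is too old:", parse_qpdf_version(banner))
```

With the versions from this thread, the qpdf 8.0.2 shipped in Ubuntu 18.04 fails the check and the 8.2.1 from Ubuntu 18.10 passes.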
I initially went with building QPDF from source, but it's pretty heavy to compile on a Raspberry Pi, so I went with the base image upgrade instead until the stable one includes the updated packages. Another advantage of using the devel base image is that you also get the latest version of tesseract-ocr.
In terms of installation, I had to add the Ubuntu packages below (initially using base image 18.04) to the current list in order to compile the latest cffi and pikepdf Python packages. But I would somehow expect these packages to be necessary for any system.
In terms of usage, it's pretty straightforward, although I am now using a modified Docker image that includes other PDF tools, such as pdftk to reorder pages and pdfcrop to remove unnecessary white space on non-standard letter size documents prior to running ocrmypdf. In terms of performance, it's not blazing fast but it's functional. I had to limit the jobs option to 2 cores since I have other containers running on the device; otherwise, the OOM killer would start acting up.
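For reference, a rough sketch of how the two-core limit could be scripted. It relies on ocrmypdf's `--jobs` option; the file names and the `build_ocrmypdf_command` helper are hypothetical, and no other options from my actual setup are shown here.

```python
import shlex


def build_ocrmypdf_command(input_pdf, output_pdf, jobs=2):
    """Build an ocrmypdf argument list that caps the number of worker jobs.

    Hypothetical helper for illustration; only the --jobs option is set.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["ocrmypdf", "--jobs", str(jobs), input_pdf, output_pdf]


if __name__ == "__main__":
    # Placeholder file names, not the files attached to this issue.
    cmd = build_ocrmypdf_command("input.pdf", "output.pdf", jobs=2)
    # Print the shell-quoted command; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Capping the job count trades speed for memory headroom, which is what kept the OOM killer quiet on my device.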
I have been trying to perform lossless optimizations on the PDFs generated by the EPSON Scan application. It works fine for the b/w (gray/ccitt) and color (rgb) image types, but I am getting the error below for the grayscale type.
I used the following command to generate the output file
Below are the PDF input files for each format and the verbose log output.
B/W
Color
Grayscale