Improve optimization - add option to remove unreferenced images #807
Comments
I think we can do something about this. |
This is fixed with the latest commit. |
Good morning, thank you very much! Just compiled the latest version and tested against some of our PDFs. Here's the result with a representative file we have:
So |
Please consider becoming a pdfcpu sponsor. |
Sure, done 😃 I've stumbled across another test case of our PDF test suite where the optimization does not fully work. The first page with the image of the source file |
Appreciated! |
Should be fixed with the latest commit. |
Wow, thanks for the quick fix! Just compiled the latest commit, now this test case also succeeds. |
In some cases, PDFs may contain image resources which are not referenced on pages anymore.
Example files:
pdf-optimization-original.pdf
pdf-optimization-page-removed.pdf
The original file had a second page containing an otter image. That page was removed with a third-party PDF editing tool, but the image resource is still in the PDF.
If you compare the two PDFs, their sizes are almost the same: the image from the removed second page is still taking up space in the file.
Would it be possible to add an optimization option to remove such "orphan" image resources?
The ocrmypdf tool does something similar during its PDF optimization (see optimize.py). Being able to perform this optimization via pdfcpu as well would allow us to simplify our toolchain and reduce the number of required dependencies.
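To illustrate what is meant by an "orphan" image, here is a minimal sketch of one way such resources could be detected: walk the object graph from the document root and flag every image object that is never reached. This is only a toy model under stated assumptions. The object table, the `refs` field, and the `find_orphans` helper are hypothetical and do not reflect how pdfcpu or ocrmypdf actually implement their optimization.

```python
from collections import deque


def find_orphans(objects, root):
    """Return the sorted ids of all objects that cannot be reached from root.

    `objects` is a hypothetical, simplified object table:
    object id -> {"type": str, "refs": [ids of objects it points to]}.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # Follow every outgoing reference once (breadth-first traversal).
        for ref in objects.get(current, {}).get("refs", []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return sorted(set(objects) - seen)


# Toy stand-in for a file whose second page was removed (all ids are made up).
objects = {
    1: {"type": "Catalog", "refs": [2]},
    2: {"type": "Pages", "refs": [3]},
    3: {"type": "Page", "refs": [4, 5]},
    4: {"type": "Contents", "refs": []},
    5: {"type": "Image", "refs": []},  # image still used on page one
    6: {"type": "Image", "refs": []},  # image of the removed page
}

# Keep only the unreachable objects that are images.
orphan_images = [
    obj_id for obj_id in find_orphans(objects, root=1)
    if objects[obj_id]["type"] == "Image"
]
```

In this model, an optimizer would drop the objects listed in `orphan_images` when writing the file. A real PDF may need extra care, for example when an image is still listed in a page's resource dictionary but never drawn by the content stream.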
EDIT: optimization output via ocrmypdf:
PDF optimized with ocrmypdf: pdf-optimization-page-removed-optimized.pdf
Thank you very much, best regards from Tyrol
Andreas