lossy JPEG image optimization #95
About lossless JPEG image optimization, see #41. (It's not yet implemented.) I've repurposed this bug to lossy JPEG optimization, including the use of external applications. Currently this is not implemented in pdfsizeopt, and it's unlikely to be implemented any time soon, unless somebody volunteers, does the implementation, and sends a patch (pull request). (Search for For PNG optimization using external programs, use the |
See also #123 for jpeg-recompress command-lines and lossy JPEG and JPEG 2000 optimizers. Currently pdfsizeopt doesn't do any lossy optimizations (image or other). It would be possible to add lossy optimizations (which can be enabled with a command-line flag) in general and lossy image optimizations with external tools such as jpeg-recompress in particular, but that would need substantial software development and maintenance work, and that would need either funding or volunteering (i.e. pull requests). Closing this issue until funding or volunteering is proposed. |
I am moving the question from #123: Can https://github.com/strichter/img2pdf be used instead of sam2p? |
@pts, for reflection: https://github.com/ImageProcessing-ElectronicPublications/python-pdf-jpeg-extract In the case of |
@zvezdochiot : How would pdfsizeopt benefit from https://github.com/strichter/img2pdf ? What is the use case? Do you have an example PDF input file? |
@zvezdochiot : How would pdfsizeopt benefit from https://github.com/ImageProcessing-ElectronicPublications/python-pdf-jpeg-extract ? pdfsizeopt already contains code which can find image objects, detect JPEG compression (/Filter /DCTDecode) and extract the compressed JPEG data. Do you have an example PDF input file? |
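As a rough illustration of the detection step mentioned above (finding image objects with /Filter /DCTDecode), here is a minimal sketch. It is not pdfsizeopt's actual code: the function name `is_jpeg_image_object` is hypothetical, and it assumes the object dictionary is available as uncompressed bytes with a single, unabbreviated filter name.

```python
import re

# Matches "/Filter /DCTDecode" or a single-filter array "/Filter [/DCTDecode]".
# Assumption: no chained filters and no abbreviated filter names; a real PDF
# parser has to handle many more cases than this regular expression does.
_DCT_FILTER_RE = re.compile(rb"/Filter\s*(?:\[\s*)?/DCTDecode\b")
_IMAGE_SUBTYPE_RE = re.compile(rb"/Subtype\s*/Image\b")


def is_jpeg_image_object(obj_head: bytes) -> bool:
    """Return True if a PDF object dictionary looks like a JPEG image."""
    has_image_subtype = _IMAGE_SUBTYPE_RE.search(obj_head) is not None
    has_dct_filter = _DCT_FILTER_RE.search(obj_head) is not None
    return has_image_subtype and has_dct_filter
```

For such an object, the stream data is the JPEG file itself, so it can be written to disk unchanged and handed to an external tool.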
@pts said:
img2pdf can generate a PDF from a JPEG without re-encoding (it inserts the JPEG into an object wrapper). This makes it worth thinking about how to process DCTDecode. PS: True, img2pdf uses PIL, so it has limitations on color mode and TIFF encoding. |
@zvezdochiot : This bug is about adding this feature to pdfsizeopt: run a lossy JPEG optimizer (which degrades visual quality and makes the file smaller) and copy its JPEG output to a PDF image object with /Filter /DCTDecode. img2pdf could help in the copy step, but pdfsizeopt doesn't need such help, because it already has such code. Colorspace processing can be tricky though: some of the colorspace information is in PDF-specific JPEG markers, some is in the PDF object header, and the JPEG optimizer doesn't see the PDF object header. It's unlikely that this feature gets implemented soon unless somebody volunteers to implement it in pdfsizeopt, or pdfsizeopt receives funding. |
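The "run an optimizer, then copy its output back" step described above could be sketched roughly as follows. This is only an illustration, not code from pdfsizeopt: the function `recompress_jpeg` and the `{src}`/`{dst}` placeholder convention are assumptions made for this example, and the sketch ignores the colorspace issue entirely.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def recompress_jpeg(jpeg_data: bytes, cmd_template: Sequence[str]) -> bytes:
    """Run an external optimizer on JPEG bytes and return the smaller result.

    cmd_template is a hypothetical argv list in which the items "{src}" and
    "{dst}" are replaced with the temporary input and output file names.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "in.jpg")
        dst = os.path.join(tmpdir, "out.jpg")
        with open(src, "wb") as f:
            f.write(jpeg_data)
        cmd = [a.replace("{src}", src).replace("{dst}", dst) for a in cmd_template]
        subprocess.run(cmd, check=True)  # Raises if the tool exits with an error.
        with open(dst, "rb") as f:
            new_data = f.read()
    # Keep the original if the output lacks the JPEG SOI marker or is not smaller.
    if new_data[:2] != b"\xff\xd8" or len(new_data) >= len(jpeg_data):
        return jpeg_data
    return new_data
```

A call might look like `recompress_jpeg(data, ["jpeg-recompress", "{src}", "{dst}"])`, assuming the tool is installed and accepts an input and an output file name in that order. The returned bytes would then replace the stream of the /DCTDecode image object, with /Length updated accordingly.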
@pts said:
No! See pdfsizeopt/lib/pdfsizeopt/main.py, lines 7283 to 7284 in 33ec5e5.
The only way I can work with /DCTDecode (JPEG) is via csplit: rbrito#1 (comment)
|
how much funding would you need? |
On February 8, 2020 10:12:48 PM GMT-03:00, Stephan Busch wrote:
how much funding would you need?
I implemented two scripts that use a Python module based on qpdf to remove metadata, thumbnails, and JavaScript, and to run jpgcrush losslessly on RGB or Gray JPEGs.
Running that before pdfsizeopt gives an overall great reduction of the size of the original PDF in most cases...
It would, of course, be great to have everything like this in a single program...
|
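The "RGB or Gray" restriction mentioned above needs some way to tell JPEGs apart by their number of color components. The sketch below shows one possible check that reads the component count from the JPEG frame header (SOF marker). It is an assumption about how such filtering could be done, not a description of the scripts themselves, and the name `jpeg_component_count` is hypothetical.

```python
import struct

# SOF0..SOF15 carry the frame header, except 0xC4 (DHT), 0xC8 (JPG extension)
# and 0xCC (DAC), which fall in the same numeric range.
_SOF_MARKERS = frozenset(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_component_count(data: bytes):
    """Return the number of color components in a JPEG, or None if unknown."""
    if data[:2] != b"\xff\xd8":  # Missing SOI marker: not a JPEG.
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # Lost sync with the marker structure.
        marker = data[pos + 1]
        if marker == 0xFF:  # Fill byte before a marker.
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or SOS reached without a frame header.
            return None
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in _SOF_MARKERS:
            # Layout: length(2) precision(1) height(2) width(2) components(1).
            if pos + 10 > len(data):
                return None
            return data[pos + 9]
        pos += 2 + seg_len  # Skip the marker and its segment.
    return None
```

A count of 1 usually means Gray and 3 usually means YCbCr or RGB, while 4 indicates CMYK or YCCK, which a cautious script would leave untouched.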
@rbrito would you mind sharing your script here? I would love to test it. |
@rbrito said:
I want to draw your attention to the possibility of applying lossy operations to JPEG coefficients, such as https://github.com/ImageProcessing-ElectronicPublications/jpegquant. Or even full JPEG transcoding: https://github.com/ilyakurdyukov/jpeg-quantsmooth (https://github.com/ImageProcessing-ElectronicPublications/jpeg-quantsmooth) + https://github.com/danielgtaylor/jpeg-archive (https://github.com/ImageProcessing-ElectronicPublications/jpeg-recompress). PS: https://github.com/rbrito/pkg-jpgcrush (https://github.com/ImageProcessing-ElectronicPublications/jpegrescan-perl) |
@rbrito Is that the script you are talking about? |
@StephanBusch said:
No. @rbrito is talking about another script (which I don't know). This applies PS: (https://github.com/StephanBusch/FastECC) -> see (https://github.com/fridex/rscode-correction). |
@StephanBusch, I uploaded the scripts (that should be merged into one) that I'm writing to my repository https://github.com/rbrito/scripts/ I usually run https://github.com/rbrito/scripts/blob/master/using_pikepdf.py, then https://github.com/rbrito/scripts/blob/master/optimize_jpegs.py and, finally, https://github.com/rbrito/scripts/blob/master/best_pdf_compression.py (which calls I packaged @zvezdochiot, I don't follow your note to use ECC (other than the topic being of my interest too). Hope this helps, Rogério Brito. |
@rbrito said:
So this is not for you. This is for @StephanBusch. Thanks for the scripts (#95 (comment), #41 (comment)). |
@rbrito thank you very much |
I have used an external application for optimising images in a PDF, via the CMD_PATTERN option. However, it looks like pdfsizeopt does not invoke the external application for JPG images. Is there any option to optimise JPG images inside a PDF file? I want to invoke an external application to optimise both JPG and PNG images.