lossy JPEG image optimization #95

shailenderjain · 2018-07-28T02:17:12Z

I have used an external application for optimising images in a PDF, via the CMD_PATTERN option. However, it looks like this utility does not invoke the external application for JPG images. Is there any option to optimise JPG images inside a PDF file? I want to invoke an external application to optimise both JPG and PNG images.

pts · 2018-07-28T22:36:05Z

About lossless JPEG image optimization, see #41. (It's not yet implemented.)

I've repurposed this bug to lossy JPEG optimization, including the use of external applications. Currently this is not implemented in pdfsizeopt, and it's unlikely to be implemented any time soon, unless somebody volunteers, does the implementation, and sends a patch (pull request). (Search for /DCTDecode in main.py.) The original design philosophy of pdfsizeopt is that it does only optimizations which don't change the visual appearance of the PDF, thus lossy JPEG optimization is not allowed. However, if it gets implemented, we can enable it with a command-line flag which is turned off by default.

For PNG optimization using external programs, use the --use-image-optimizer=... flag described in the Image optimizers section in the README (https://github.com/pts/pdfsizeopt).
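As a rough illustration of the flag mentioned above, here is a small Python sketch that builds a pdfsizeopt command line. Only the --use-image-optimizer=... flag itself comes from this thread. The helper name build_pdfsizeopt_command, the comma-separated list form, and the optimizer names in the usage note are assumptions, so check them against the Image optimizers section of the README before relying on them.

```python
def build_pdfsizeopt_command(input_pdf, output_pdf, optimizers):
    """Build an argv list for pdfsizeopt (hypothetical helper, not part of pdfsizeopt).

    Assumption: several optimizers can be given as a comma-separated list.
    """
    if not optimizers:
        raise ValueError('at least one image optimizer is needed')
    return ['pdfsizeopt',
            '--use-image-optimizer=' + ','.join(optimizers),
            input_pdf, output_pdf]
```

The resulting list could be passed to subprocess.run, for example build_pdfsizeopt_command('in.pdf', 'out.pdf', ['optipng']), where optipng is only an assumed optimizer name. Note that this affects PNG-style optimization only; JPEG (/DCTDecode) images are left untouched.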

pts · 2019-07-04T14:43:03Z

See also #123 for jpeg-recompress command-lines and lossy JPEG and JPEG 2000 optimizers.

Currently pdfsizeopt doesn't do any lossy optimizations (image or other). It would be possible to add lossy optimizations (which can be enabled with a command-line flag) in general and lossy image optimizations with external tools such as jpeg-recompress in particular, but that would need substantial software development and maintenance work, and that would need either funding or volunteering (i.e. pull requests). Closing this issue until funding or volunteering is proposed.

zvezdochiot · 2019-07-06T18:50:50Z

I'm moving the question here from #123:

Can https://github.com/strichter/img2pdf be used instead of sam2p?

zvezdochiot · 2019-07-25T19:29:05Z

@pts, for reflection:

https://github.com/ImageProcessing-ElectronicPublications/python-pdf-jpeg-extract

In the case of pdfsizeopt, the find operation must be applied to obj.stream (in the case if ('/DCTDecode ' in filter2):).

pts · 2020-02-05T18:51:50Z

@zvezdochiot : How would pdfsizeopt benefit from https://github.com/strichter/img2pdf ? What is the use case? Do you have an example PDF input file?

pts · 2020-02-05T18:54:11Z

@zvezdochiot : How would pdfsizeopt benefit from https://github.com/ImageProcessing-ElectronicPublications/python-pdf-jpeg-extract ? pdfsizeopt already contains code which can find image objects, detect JPEG compression (/Filter /DCTDecode) and extract the compressed JPEG data. Do you have an example PDF input file?
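To make the detection step concrete, here is a minimal Python 3 sketch of checking whether a PDF image object is stored as plain JPEG. It is not pdfsizeopt's actual code. The function names get_filters and is_plain_jpeg are hypothetical, and the regular expression is a simplification that assumes the object head (the dictionary before the stream) is available as a byte string.

```python
import re

# Matches the /Filter entry of a PDF object head: either a single name
# (/Filter /DCTDecode) or an array (/Filter [/FlateDecode /DCTDecode]).
_FILTER_RE = re.compile(rb'/Filter\s*(\[[^\]]*\]|/[^\s/<>\[\]()]+)')


def get_filters(head):
    """Return the list of filter names (as bytes) found in a PDF object head."""
    match = _FILTER_RE.search(head)
    if not match:
        return []
    return re.findall(rb'/[^\s/<>\[\]()]+', match.group(1))


def is_plain_jpeg(head):
    """True if the stream is JPEG data with no other filter applied on top."""
    return get_filters(head) == [b'/DCTDecode']
```

When is_plain_jpeg returns True, the stream bytes can be written to a .jpg file as they are. A filter array such as [/FlateDecode /DCTDecode] (see #127) would first need the outer filter undone.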

zvezdochiot · 2020-02-05T19:12:09Z

@pts said:

How would pdfsizeopt benefit from https://github.com/strichter/img2pdf ?

img2pdf can generate a PDF from a JPEG without re-encoding it (it inserts the JPEG into an object wrapper). This makes it a useful reference when thinking about how to process DCTDecode streams.

PS: True, img2pdf uses PIL, so it has limitations on color mode and TIFF encoding.

pts · 2020-02-06T10:41:13Z

@zvezdochiot : This bug is about adding this feature to pdfsizeopt: run a lossy JPEG optimizer (which degrades visual quality and makes the file smaller) and copy its JPEG output to a PDF image object with /Filter /DCTDecode. img2pdf could help in the copy step, but pdfsizeopt doesn't need such help, because it already has such code. Colorspace processing can be tricky though: some of the colorspace information is in PDF-specific JPEG markers, some is in the PDF object header, and the JPEG optimizer doesn't see the PDF object header.

It's unlikely that this feature gets implemented soon unless somebody volunteers to implement it in pdfsizeopt, or pdfsizeopt receives funding.
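The colorspace concern above can be sketched as a safety check, in Python 3. This is an illustration under stated assumptions, not pdfsizeopt code. The names jpeg_color_info and may_replace are hypothetical. The parser only looks at the marker segments before the first scan, assumes the Adobe APP14 marker (if any) precedes the frame header, and ignores the PDF object header entirely. The idea is to accept the output of an external lossy optimizer only if it is smaller and keeps the same component count and Adobe color transform flag, so that the unchanged PDF object header still describes the image correctly.

```python
import struct


def jpeg_color_info(data):
    """Return (num_components, adobe_transform) for JPEG bytes.

    adobe_transform is None if no Adobe APP14 marker precedes the frame header.
    """
    if data[:2] != b'\xff\xd8':
        raise ValueError('not a JPEG: missing SOI marker')
    pos = 2
    adobe_transform = None
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError('bad marker at offset %d' % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # Fill byte before a marker, skip it.
            pos += 1
            continue
        (size,) = struct.unpack('>H', data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + size]
        # Adobe APP14: 'Adobe', version (2), flags0 (2), flags1 (2), transform (1).
        if marker == 0xEE and payload[:5] == b'Adobe' and len(payload) >= 12:
            adobe_transform = payload[11]
        # Frame headers are SOF0..SOF15, except DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # Payload: precision (1), height (2), width (2), component count (1).
            return payload[5], adobe_transform
        if marker == 0xDA:  # Start of scan: no frame header was seen.
            break
        pos += 2 + size
    raise ValueError('no SOF marker found')


def may_replace(original, candidate):
    """Accept candidate only if it is smaller and keeps the same color layout."""
    return (len(candidate) < len(original) and
            jpeg_color_info(candidate) == jpeg_color_info(original))
```

Here candidate would be the output of an external tool such as jpeg-recompress. A real implementation would also have to compare against /ColorSpace and /Decode in the PDF object header, which this sketch does not do.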

zvezdochiot · 2020-02-06T13:45:58Z

@pts said:

but pdfsizeopt doesn't need such help, it already has such code.

No!

pdfsizeopt/lib/pdfsizeopt/main.py

Lines 7283 to 7284 in 33ec5e5

    if ('/JPXDecode ' in filter2 or '/DCTDecode ' in filter2):
      continue

The only way I can work with /DCTDecode (JPEG) is via csplit: rbrito#1 (comment)

StephanBusch · 2020-02-09T01:12:47Z

how much funding would you need?

rbrito · 2020-02-09T03:00:28Z

On February 8, 2020, Stephan Busch wrote: how much funding would you need?

I implemented two scripts that use a Python module based on qpdf to remove metadata, thumbnails, and JavaScript, and to losslessly run jpgcrush on RGB or grayscale JPEGs. Running them before pdfsizeopt gives a great overall reduction in the size of the original PDF in most cases. It would, of course, be great to have all of this in a single program.

StephanBusch · 2020-02-09T05:33:22Z

@rbrito would you mind sharing your script here? I would love to test it.

zvezdochiot · 2020-02-09T07:42:36Z

@rbrito said:

to losslessly call jpgcrush on RGB or Gray JPEG's.

I want to draw your attention to the possibility of applying lossy operations to the JPEG coefficients, such as https://github.com/ImageProcessing-ElectronicPublications/jpegquant. Or even full JPEG transcoding: https://github.com/ilyakurdyukov/jpeg-quantsmooth (https://github.com/ImageProcessing-ElectronicPublications/jpeg-quantsmooth) + https://github.com/danielgtaylor/jpeg-archive (https://github.com/ImageProcessing-ElectronicPublications/jpeg-recompress).

PS: https://github.com/rbrito/pkg-jpgcrush (https://github.com/ImageProcessing-ElectronicPublications/jpegrescan-perl)

StephanBusch · 2020-02-09T14:25:09Z

@rbrito Is that the script you are talking about?
PS: https://github.com/rbrito/pkg-jpgcrush (https://github.com/ImageProcessing-ElectronicPublications/jpegrescan-perl)

zvezdochiot · 2020-02-09T14:38:21Z

@StephanBusch said:

Is that the script you are talking about?

No. @rbrito is talking about another script (which I don’t know). This one applies jpegtran to JPEG files (not PDFs).

PS: (https://github.com/StephanBusch/FastECC) -> see (https://github.com/fridex/rscode-correction).

rbrito · 2020-02-10T22:01:38Z

@StephanBusch, I uploaded the scripts I'm writing (which should be merged into one) to my repository https://github.com/rbrito/scripts/

I usually run https://github.com/rbrito/scripts/blob/master/using_pikepdf.py, then https://github.com/rbrito/scripts/blob/master/optimize_jpegs.py and, finally, https://github.com/rbrito/scripts/blob/master/best_pdf_compression.py (which calls pdfsizeopt as a "garbage collector" and removes unused objects).

I packaged jpgcrush for my own use and uploaded it to https://launchpad.net/~rbrito/+archive/ubuntu/ppa for the convenience of other people too.

@zvezdochiot, I don't follow your note about using ECC (other than the topic being of interest to me too).

Hope this helps,

Rogério Brito.

zvezdochiot · 2020-02-11T09:56:28Z

@rbrito said:

I don't follow your note to use ECC

That note was not for you. It was for @StephanBusch.

Thanks for the scripts (#95 (comment) , #41 (comment)).

StephanBusch · 2020-02-12T13:38:31Z

@rbrito thank you very much

pts changed the title ~~JPG Image optimization~~ lossy JPG Image optimization Jul 28, 2018

pts changed the title ~~lossy JPG Image optimization~~ lossy JPEG image optimization Jul 28, 2018

pts added the enhancement label Jul 28, 2018

pts mentioned this issue Jul 4, 2019

lossy JPEG image optimizers (e.g. jpeg-recompress) through command line #123

Closed

zvezdochiot mentioned this issue Jul 21, 2019

change /Filter [/FlateDecode /DCTDecode] to /Filter /DCTDecode #127

Open

zvezdochiot mentioned this issue Jan 4, 2021

PDF file containing JPEG images is not getting any smaller #147

Closed

pts mentioned this issue Feb 27, 2023

add lossless optimizations for JPEG images embedded into the PDF #41

Open

zvezdochiot mentioned this issue Jun 16, 2023

I'm excited! #166

Open

zvezdochiot mentioned this issue Apr 19, 2024

Lossy JPEG compression #67

Closed
