lossy JPEG image optimization #95

Open
shailenderjain opened this issue Jul 28, 2018 · 18 comments

Comments

@shailenderjain

I have used an external application for optimising images in PDF files, and I am using the CMD_PATTERN option. However, it looks like this utility does not invoke the external application for JPG images. Is there any option to optimise JPG images inside a PDF file? I want to invoke an external application to optimise both JPG and PNG images.

@pts pts changed the title JPG Image optimization lossy JPG Image optimization Jul 28, 2018
@pts pts changed the title lossy JPG Image optimization lossy JPEG image optimization Jul 28, 2018
@pts
Owner

pts commented Jul 28, 2018

About lossless JPEG image optimization, see #41. (It's not yet implemented.)

I've repurposed this bug for lossy JPEG optimization, including the use of external applications. Currently this is not implemented in pdfsizeopt, and it's unlikely to be implemented any time soon unless somebody volunteers, does the implementation, and sends a patch (pull request). (Search for /DCTDecode in main.py.) The original design philosophy of pdfsizeopt is to do only optimizations which don't change the visual appearance of the PDF, so lossy JPEG optimization is not allowed. However, if it gets implemented, we can enable it with a command-line flag which is turned off by default.
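A minimal Python sketch of such a flag-gated lossy step: it writes one JPEG stream to a temporary file, runs an external optimizer on it, and keeps the result only if it looks like a JPEG and is smaller. The function name recompress_jpeg, the {src}/{dst} placeholder syntax and the enabled switch are hypothetical; this is not pdfsizeopt's code or its command-line syntax.

```python
import os
import shlex
import subprocess
import tempfile


def recompress_jpeg(jpeg_data, cmd_pattern, enabled=False):
    """Run a hypothetical external lossy JPEG optimizer on jpeg_data.

    cmd_pattern is a hypothetical template with {src} and {dst} placeholders
    (literal braces in other arguments are not supported by this sketch).
    """
    if not enabled:  # lossy changes stay opt-in: off by default
        return jpeg_data
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.jpg")
        dst = os.path.join(tmp, "out.jpg")
        with open(src, "wb") as f:
            f.write(jpeg_data)
        # Split the template into argv, then substitute the temp file paths.
        cmd = [arg.format(src=src, dst=dst) for arg in shlex.split(cmd_pattern)]
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0 or not os.path.exists(dst):
            return jpeg_data  # optimizer failed: keep the original image
        with open(dst, "rb") as f:
            out = f.read()
    # Accept only output that starts with the JPEG SOI marker and is smaller.
    if out.startswith(b"\xff\xd8") and len(out) < len(jpeg_data):
        return out
    return jpeg_data
```

An invocation might look like recompress_jpeg(data, "jpeg-recompress {src} {dst}", enabled=True), assuming the optimizer takes input and output paths as positional arguments. Keeping the original bytes whenever anything goes wrong means the flag can only make a PDF smaller, never break it.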

For PNG optimization using external programs, use the --use-image-optimizer=... flag described in the Image optimizers section in the README (https://github.com/pts/pdfsizeopt).

@pts
Owner

pts commented Jul 4, 2019

See also #123 for jpeg-recompress command-lines and lossy JPEG and JPEG 2000 optimizers.

Currently pdfsizeopt doesn't do any lossy optimizations (image or other). It would be possible to add lossy optimizations (which can be enabled with a command-line flag) in general and lossy image optimizations with external tools such as jpeg-recompress in particular, but that would need substantial software development and maintenance work, and that would need either funding or volunteering (i.e. pull requests). Closing this issue until funding or volunteering is proposed.

@zvezdochiot

I'm moving the question over from #123:

Can https://github.com/strichter/img2pdf be used instead of sam2p?

@zvezdochiot

zvezdochiot commented Jul 25, 2019

@pts, something to consider:

https://github.com/ImageProcessing-ElectronicPublications/python-pdf-jpeg-extract

In pdfsizeopt's case, the find operation would have to be applied to obj.stream (in the if ('/DCTDecode ' in filter2): branch).
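The find operation can be sketched as a byte search for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. This assumes, without checking python-pdf-jpeg-extract's source, that it works along these lines; the name find_jpeg is hypothetical.

```python
def find_jpeg(stream):
    """Return the bytes from the first SOI marker to the last EOI marker, or None."""
    start = stream.find(b"\xff\xd8")   # JPEG start-of-image
    end = stream.rfind(b"\xff\xd9")    # JPEG end-of-image (last occurrence)
    if start < 0 or end < start:
        return None
    return stream[start:end + 2]       # include the 2-byte EOI marker
```

Taking the last FF D9 is a heuristic: it skips past EOI markers of embedded thumbnails, but any trailing bytes that happen to contain FF D9 would be included.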

@pts
Owner

pts commented Feb 5, 2020

@zvezdochiot : How would pdfsizeopt benefit from https://github.com/strichter/img2pdf ? What is the use case? Do you have an example PDF input file?

@pts
Owner

pts commented Feb 5, 2020

@zvezdochiot : How would pdfsizeopt benefit from https://github.com/ImageProcessing-ElectronicPublications/python-pdf-jpeg-extract ? pdfsizeopt already contains code which can find image objects, detect JPEG compression (/Filter /DCTDecode) and extract the compressed JPEG data. Do you have an example PDF input file?
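A minimal sketch of such a detection step, assuming the /Filter value is available as a string and the stream bytes are still encoded. It accepts a stream only if /DCTDecode is the sole filter (a chain such as [/FlateDecode /DCTDecode] would need Flate decoding first) and the data starts with the JPEG SOI marker. The function name jpeg_payload is hypothetical; pdfsizeopt's actual code in main.py may work differently.

```python
import re


def jpeg_payload(filter_value, stream):
    """Return the raw JPEG bytes if the image stream is a bare JPEG, else None."""
    # Tokenize PDF names such as /DCTDecode instead of substring matching.
    names = re.findall(r"/([^\s/\[\]<>()]+)", filter_value)
    if names != ["DCTDecode"]:
        return None  # no filter, another filter, or a filter chain
    if not stream.startswith(b"\xff\xd8"):
        return None  # does not begin with a JPEG SOI marker
    return stream
```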

@zvezdochiot

@pts wrote:

How would pdfsizeopt benefit from https://github.com/strichter/img2pdf ?

img2pdf can generate a PDF from a JPEG without re-encoding it (it inserts the JPEG data into a PDF object wrapper). This makes it possible to think about processing /DCTDecode images.
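The core of wrapping a JPEG without re-encoding can be sketched in Python: read the width, height and component count from the JPEG's SOF segment, then emit an image XObject whose /Filter is /DCTDecode and whose stream is the unchanged JPEG bytes. This is an illustration, not img2pdf's code; the helper names are hypothetical, and the sketch assumes 8-bit samples and 1, 3 or 4 components, ignoring ICC profiles and other colorspace subtleties a real tool has to handle.

```python
import struct

# Start-of-frame markers (baseline, progressive, etc.); C4, C8, CC are not SOF.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
COLORSPACES = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}


def jpeg_size(data):
    """Return (width, height, components) from the JPEG's SOF segment."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("bad marker")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            i += 1
            continue
        seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
        if marker in SOF_MARKERS:
            # SOF payload: precision (1), height (2), width (2), components (1)
            _, height, width, ncomp = struct.unpack(">BHHB", data[i + 4:i + 10])
            return width, height, ncomp
        i += 2 + seg_len  # jump over the marker and its segment
    raise ValueError("no SOF marker found")


def jpeg_image_object(data):
    """Wrap JPEG bytes in a PDF image XObject body without re-encoding."""
    width, height, ncomp = jpeg_size(data)
    header = ("<< /Type /XObject /Subtype /Image /Width %d /Height %d "
              "/ColorSpace %s /BitsPerComponent 8 /Filter /DCTDecode /Length %d >>"
              % (width, height, COLORSPACES[ncomp], len(data)))
    return header.encode("ascii") + b"\nstream\n" + data + b"\nendstream"
```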

PS: Admittedly, img2pdf uses PIL, so it has limitations regarding color modes and TIFF encoding.

@pts
Owner

pts commented Feb 6, 2020

@zvezdochiot : This bug is about adding this feature to pdfsizeopt: run a lossy JPEG optimizer (which degrades visual quality and makes the file smaller) and copy its JPEG output to a PDF image object with /Filter /DCTDecode. img2pdf could help in the copy step, but pdfsizeopt doesn't need such help: it already has such code. Colorspace processing can be tricky, though: some of the colorspace information is in PDF-specific JPEG markers, some is in the PDF object header, and the JPEG optimizer doesn't see the PDF object header.
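To illustrate the colorspace point: an Adobe APP14 marker inside the JPEG data carries a transform code (0 = none, 1 = YCbCr, 2 = YCCK). As I read the PDF specification's DCTDecode rules, a reader follows this marker when present and otherwise defaults to ColorTransform 1 for three components and 0 for other counts, so an optimizer that drops or rewrites the marker can change the decoded colors. The names adobe_transform and color_transform are hypothetical, and the sketch ignores any explicit /ColorTransform entry in /DecodeParms.

```python
def adobe_transform(data):
    """Return the transform byte of a JPEG's APP14 'Adobe' segment, or None."""
    i = 2  # skip the SOI marker
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more header segments
            break
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        payload = data[i + 4:i + 2 + seg_len]
        # Layout: "Adobe"(5) version(2) flags0(2) flags1(2) transform(1)
        if marker == 0xEE and payload.startswith(b"Adobe") and len(payload) >= 12:
            return payload[11]
        i += 2 + seg_len
    return None


def color_transform(data, ncomp):
    """Transform code a DCTDecode reader would apply, per the rules above."""
    t = adobe_transform(data)
    if t is not None:
        return t  # the marker inside the JPEG data decides
    return 1 if ncomp == 3 else 0  # default when there is no Adobe marker
```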

It's unlikely that this feature will be implemented soon unless somebody volunteers to implement it in pdfsizeopt, or pdfsizeopt receives funding.

@zvezdochiot

zvezdochiot commented Feb 6, 2020

@pts wrote:

but pdfsizeopt doesn't need such help, it already has such code.

No!

if ('/JPXDecode ' in filter2 or '/DCTDecode ' in filter2):
    continue
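One hypothetical way to turn this unconditional skip into an opt-in step is to hand /DCTDecode streams to an optimizer callback only when the user asked for it, keeping today's behavior by default. The objs layout (object number mapped to a (filter, stream) pair) and the lossy_jpeg_optimizer parameter are made up for illustration and do not reflect pdfsizeopt's internal data structures.

```python
def process_images(objs, lossy_jpeg_optimizer=None):
    """objs: hypothetical dict mapping object number -> (filter_value, stream)."""
    for obj_num, (filter_value, stream) in list(objs.items()):
        if '/JPXDecode' in filter_value:
            continue  # JPEG 2000 images are still left untouched
        if '/DCTDecode' in filter_value:
            if lossy_jpeg_optimizer is None:
                continue  # default: no lossy changes, same as the current skip
            new_stream = lossy_jpeg_optimizer(stream)
            if len(new_stream) < len(stream):  # keep only real savings
                objs[obj_num] = (filter_value, new_stream)
```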

The only way I can work with /DCTDecode (JPEG) is via csplit: rbrito#1 (comment)

@StephanBusch

How much funding would you need?

@rbrito

rbrito commented Feb 9, 2020 via email

@StephanBusch

@rbrito would you mind sharing your script here? I would love to test it.

@zvezdochiot

zvezdochiot commented Feb 9, 2020

@StephanBusch

@rbrito, is that the script you are talking about?
PS: https://github.com/rbrito/pkg-jpgcrush (https://github.com/ImageProcessing-ElectronicPublications/jpegrescan-perl)

@zvezdochiot

zvezdochiot commented Feb 9, 2020

@StephanBusch wrote:

Is that the script you are talking about?

No. @rbrito is talking about another script (which I don't know). The one you linked applies jpegtran to JPEG files (not PDFs).
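For reference, running jpegtran over a single JPEG can be sketched in Python by piping the bytes through it, since jpegtran reads from stdin and writes to stdout when no file names are given. The function name jpegtran_optimize and the keep-only-if-smaller policy are illustrative assumptions; the linked tools build on jpegtran in more elaborate ways. jpegtran is lossless, so this corresponds to #41 rather than to the lossy optimization this issue is about.

```python
import subprocess


def jpegtran_optimize(jpeg_data, cmd=("jpegtran", "-optimize")):
    """Losslessly re-encode JPEG bytes via jpegtran (stdin -> stdout)."""
    # check=True raises CalledProcessError if jpegtran exits with an error.
    result = subprocess.run(list(cmd), input=jpeg_data, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, check=True)
    out = result.stdout
    # Keep the new bytes only if they still look like a JPEG and are smaller.
    if out.startswith(b"\xff\xd8") and len(out) < len(jpeg_data):
        return out
    return jpeg_data
```

-optimize only rebuilds the Huffman tables; adding -copy none would also drop optional markers, which, given the colorspace caveat earlier in this thread, deserves care for images taken out of PDFs.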

PS: (https://github.com/StephanBusch/FastECC) -> see (https://github.com/fridex/rscode-correction).

@rbrito

rbrito commented Feb 10, 2020

@StephanBusch, I uploaded the scripts I'm writing (they should eventually be merged into one) to my repository https://github.com/rbrito/scripts/

I usually run https://github.com/rbrito/scripts/blob/master/using_pikepdf.py, then https://github.com/rbrito/scripts/blob/master/optimize_jpegs.py and, finally, https://github.com/rbrito/scripts/blob/master/best_pdf_compression.py (which calls pdfsizeopt as a "garbage collector" and removes unused objects).

I packaged jpgcrush for my own use and uploaded it to https://launchpad.net/~rbrito/+archive/ubuntu/ppa for convenience of other people too.

@zvezdochiot, I don't follow your note to use ECC (other than the topic being of my interest too).

Hope this helps,

Rogério Brito.

@zvezdochiot

zvezdochiot commented Feb 11, 2020

@rbrito wrote:

I don't follow your note to use ECC

So this is not for you. This is for @StephanBusch.

Thanks for the scripts (#95 (comment) , #41 (comment)).

@StephanBusch

@rbrito, thank you very much!

Projects
None yet
Development

No branches or pull requests

5 participants