MRC #88

v217 · 2014-09-19T16:14:39Z

Hi. Especially for scans, integrating jbig2enc for better compression of the text image layer would make this software perfect.

jbarlow83 · 2014-09-20T06:33:15Z

pdfbeads (a Ruby project) attempts to do that, although it has trouble aligning the hidden OCR text layer with the image, has some crash bugs, and is documented mainly in Russian.

I've looked into making the changes for OCRmyPDF. It would be a major overhaul/rewrite and would call for a new PDF generation backend.

v217 · 2014-09-20T07:57:02Z

jbig2enc itself is quite stable, and it now also detects the resolution of images quite well. There is also support for basic foreground/background separation. There is a one-page Python script for generating multilayer PDFs, written for an earlier version of jbig2enc. When that script was written, detecting the resolution of the PDF did not yet work reliably. In short, JBIG2 is a must for scans, but on Linux it is still not available.

v217 · 2014-09-20T08:38:25Z

If one is willing to use more than one graphics library (Leptonica, written in C, for jbig2enc, and Gamera, written in Python, which didjvu uses for text foreground/background separation), all the ingredients are already there and well tested.
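
For readers unfamiliar with the idea, the foreground/background separation discussed in this thread can be sketched roughly as follows. This is a minimal illustration in plain Python, not the algorithm used by jbig2enc, didjvu, or Gamera. The function names, the fixed threshold, and the white fill value are all assumptions made for the example; real tools choose the mask adaptively and typically fill the holes more carefully.

```python
# Hypothetical sketch of MRC-style layer separation on a grayscale page.
# A page is a list of rows, each row a list of ints in 0..255.

def split_mrc_layers(gray, threshold=128, fill=255):
    """Split a page into (mask, foreground, background).

    mask       -- 1 where a pixel is darker than `threshold` (treated as text)
    foreground -- original pixel value under the mask, `fill` elsewhere
    background -- original pixel value outside the mask, `fill` under it
    """
    mask = [[1 if px < threshold else 0 for px in row] for row in gray]
    foreground = [
        [px if m else fill for px, m in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]
    background = [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]
    return mask, foreground, background


def merge_mrc_layers(mask, foreground, background):
    """Recombine the layers: take foreground where mask is 1, else background."""
    return [
        [f if m else b for m, f, b in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


if __name__ == "__main__":
    page = [
        [250, 20, 245],
        [240, 30, 10],
    ]
    mask, fg, bg = split_mrc_layers(page)
    print(mask)
    print(merge_mrc_layers(mask, fg, bg) == page)
```

The point of the split is that each layer can then be compressed with a codec suited to it: the bitonal mask is the kind of data JBIG2 targets, while the foreground and background can be stored at lower resolution with a lossy image codec. Which codecs and resolutions to use is a design choice this sketch leaves open.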

OCRmyPDF-issuebot mentioned this issue Sep 14, 2015

MRC ocrmypdf/OCRmyPDF#9

Closed

jbarlow83 closed this as completed Dec 5, 2015

rmast mentioned this issue Jan 1, 2022

Introduce a way to radically reduce the output file size (sacrificing image quality) ocrmypdf/OCRmyPDF#541

Closed
