MRC #88

Closed
v217 opened this issue Sep 19, 2014 · 3 comments

Comments

@v217

v217 commented Sep 19, 2014

Hi, integration with jbig2enc for better compression of the text image layer, especially for scans, would make this software perfect.

@jbarlow83
Collaborator

pdfbeads (a Ruby project) attempts to do that, although it has issues aligning the hidden OCR text layer with the image, has some crash bugs, and is documented mainly in Russian.

I've looked into making the changes for OCRmyPDF. It would be a major overhaul/rewrite and would call for a new PDF generation backend.

@v217
Author

v217 commented Sep 20, 2014

jbig2enc itself is quite stable, and it now also recognises the resolution of the images quite well. There is also support for basic foreground/background separation. There's a one-page Python script for generating multilayer PDFs, written for an earlier version of jbig2enc; when that script was written, recognition of the PDF's resolution still did not work reliably. In short, JBIG2 is a must for scans, but on Linux this is still not available.

@v217
Author

v217 commented Sep 20, 2014

If one is willing to use more than one graphics library (Leptonica, written in C, for jbig2enc, and Gamera, written in Python, which didjvu uses for text foreground/background separation), all the ingredients are already there and well tested.
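
To make the foreground/background separation idea concrete, here is a minimal pure-Python sketch of splitting a grayscale scan into a 1-bit text mask plus foreground and background layers. It is not how jbig2enc, Leptonica, Gamera, or didjvu actually do it: the global Otsu threshold is a stand-in technique, and the names `otsu_threshold`, `split_layers`, and `fill` are hypothetical. In a real pipeline the mask would presumably be the layer handed to a JBIG2 encoder.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed input format)


def otsu_threshold(pixels: Gray) -> int:
    """Return t such that pixels <= t are 'dark', maximising between-class variance."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def split_layers(pixels: Gray, fill: int = 255) -> Tuple[Gray, Gray, Gray]:
    """Split a page into (mask, foreground, background).

    mask:       1 where the pixel is classified as text, else 0
    foreground: original value under the mask, 0 elsewhere (don't-care)
    background: original value outside the mask, `fill` where text was
    """
    t = otsu_threshold(pixels)
    mask = [[1 if v <= t else 0 for v in row] for row in pixels]
    foreground = [[v if m else 0 for v, m in zip(row, mrow)]
                  for row, mrow in zip(pixels, mask)]
    background = [[fill if m else v for v, m in zip(row, mrow)]
                  for row, mrow in zip(pixels, mask)]
    return mask, foreground, background


# Tiny made-up "scan": dark strokes on light paper.
page = [
    [250, 248, 20, 251],
    [249, 15, 18, 247],
]
mask, foreground, background = split_layers(page)
```

The split matters because the mask carries the sharp text edges and compresses well as a bilevel image, while the background, with text holes filled, can be stored at lower resolution or with lossy compression.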
