Pagesegmode #40
Comments
No official way, but you can try (ab)using the --tesseract-config argument, which forwards one argument at a time to tesseract, e.g. for a single text line. I'm not sure if I'd implement this, since most PDF images contain a page of text, not a single line or word.
Hey, thanks for your help!
It's not really about a one-line or one-word PDF. My problem is the automatic column detection, which ruins my OCR (the page is a mix of two-column and one-column text).
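For illustration only, here is a minimal Python sketch of how a page segmentation mode could be passed when calling the tesseract command-line program directly. It is not the example from the comment above and not the option this project later added. The helper names `build_tesseract_command` and `run_ocr` are hypothetical, the mode numbers are taken from the tesseract manual page, and the `--psm` spelling assumes Tesseract 4 or later (3.x releases used `-psm`).

```python
import shutil
import subprocess

# Subset of the page segmentation modes listed in the tesseract manual page.
PSM_AUTO = 3            # fully automatic page segmentation (the default)
PSM_SINGLE_COLUMN = 4   # assume a single column of text of variable sizes
PSM_SINGLE_BLOCK = 6    # assume a single uniform block of text
PSM_SINGLE_LINE = 7     # treat the image as a single text line


def build_tesseract_command(image_path, output_base, psm):
    """Return the argv list for running tesseract with an explicit PSM.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"unsupported page segmentation mode: {psm}")
    # Assumption: Tesseract 4+ flag spelling; 3.x releases used "-psm".
    return ["tesseract", image_path, output_base, "--psm", str(psm)]


def run_ocr(image_path, output_base, psm=PSM_SINGLE_BLOCK):
    """Run tesseract if it is installed; it writes output_base + '.txt'."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, psm),
        check=True,
    )
```

For a page where automatic column detection goes wrong, trying `PSM_SINGLE_COLUMN` or `PSM_SINGLE_BLOCK` instead of the default is a reasonable first experiment. Whether either mode helps on a page that mixes one-column and two-column text depends on the layout.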
Implemented in commit 8d323ae.
Officially released in v3.2.
Hey,
is there a way to define the pagesegmode for the tesseract OCR?
(https://tesseract-ocr.googlecode.com/git/doc/tesseract.1.html)
Thank you very much
tuxasus