Auto correct image rotation (-180, -90, 0, +90) #46
Comments
It seems that orientation detection will be supported in the next version of the tesseract command-line interface.
I have been testing with v3.04 (compiled from the git source). With -psm 0 it reports the orientation along with a confidence and an integer, but that means you have to run tesseract-ocr over the page twice (first for orientation, then for OCR). In -psm 1 mode it adds a 'textangle ###' attribute to the tags in the hocr file, so at the moment I am using the following to detect the rotation and correct it, after [...]
Unfortunately this doesn't work. If I rotate the image after OCR (and orientation detection), but before calling [...]. If I rotate the image after the PDF is generated, it doesn't rotate correctly and/or the OCR'ed text is correct but not laid out correctly. So it looks like the only way to do it properly is to call tesseract-ocr twice: once to determine the orientation and rotate the image if necessary, and a second time to perform the actual OCR. Edit: [...]
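For anyone wanting to try the two-pass approach described above, here is a rough Python sketch, not a tested implementation. It assumes that `-psm 0` prints a line of the form `Orientation in degrees: N`, that a clockwise rotation by `360 - N` corrects the page (worth checking against a page of known orientation), and that ImageMagick's `convert` is available for the rotation. The function names are made up for illustration.

```python
import re
import subprocess

def parse_osd_degrees(osd_text):
    """Return the detected orientation in degrees, or None if not found.

    ASSUMPTION: the -psm 0 output contains a line such as
    'Orientation in degrees: 90'.
    """
    match = re.search(r"Orientation in degrees:\s*(\d+)", osd_text)
    return int(match.group(1)) % 360 if match else None

def correction_angle(detected_degrees):
    """Clockwise rotation to apply to the image.

    ASSUMPTION: the correction is the complement of the detected angle.
    """
    return (360 - detected_degrees) % 360

def two_pass_ocr(image_path, rotated_path, output_base):
    # Pass 1: orientation detection only (-psm 0).
    osd = subprocess.run(
        ["tesseract", image_path, "stdout", "-psm", "0"],
        capture_output=True, text=True,
    )
    # The report may land on stdout or stderr, so search both.
    detected = parse_osd_degrees(osd.stdout + osd.stderr) or 0
    angle = correction_angle(detected)

    # Rotate before OCR so the hocr/PDF layout matches the upright image.
    subprocess.run(
        ["convert", image_path, "-rotate", str(angle), rotated_path],
        check=True,
    )

    # Pass 2: the real OCR run on the corrected image.
    subprocess.run(
        ["tesseract", rotated_path, output_base, "-psm", "1", "hocr"],
        check=True,
    )
    return angle
```

Calling `two_pass_ocr("scan.png", "scan_upright.png", "scan")` would write `scan.hocr` (or `scan.html`, depending on the tesseract version) from the rotated copy; the file names here are placeholders.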
I ported this to a node library (here); part of that was implementing auto-rotation. I added a prototype that finds the general rotation of the page by picking the textangle value that occurs most often in the hocr. It also climbs up/down the DOM to reach the ocr_line class.
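The "most frequent textangle" idea can be sketched in a few lines of Python using only the standard library. This is not the node library's code, just an illustration. It assumes that the angle appears as `textangle N` inside the `title` attribute of `ocr_line` elements, and that a line without a textangle entry counts as 0 degrees; the class and function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Matches e.g. "textangle 90" or "textangle -90" inside a title attribute.
TEXTANGLE_RE = re.compile(r"textangle\s+(-?\d+(?:\.\d+)?)")

class LineAngleCollector(HTMLParser):
    """Collects one angle per ocr_line element found in the hocr."""

    def __init__(self):
        super().__init__()
        self.angles = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "ocr_line" not in (attr.get("class") or "").split():
            return
        match = TEXTANGLE_RE.search(attr.get("title") or "")
        # ASSUMPTION: a line with no textangle entry is upright (0 degrees).
        self.angles.append(float(match.group(1)) if match else 0.0)

def dominant_textangle(hocr):
    """Return the most common line angle, or None if there are no lines."""
    collector = LineAngleCollector()
    collector.feed(hocr)
    if not collector.angles:
        return None
    return Counter(collector.angles).most_common(1)[0][0]
```

Voting per line rather than trusting a single element keeps a stray rotated caption or stamp from flipping the whole page.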