Tesseract error in preprocessing #7
Comments
That last line of the error message is informative.
Try running that command directly and see if it gives you any more useful info. Looking at that image of the FirstBank logo, though, it doesn't look like something this library was written for. I don't think it will parse anything from that as-is; it would require a huge amount of customization.
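One way to follow that suggestion without leaving Python is to rerun the same command while capturing stderr, since `subprocess.check_output` only reports the exit status. The sketch below is a minimal illustration, not part of this library: `run_and_report` and `tesseract_osd_command` are hypothetical helper names, the argument list is copied from the traceback in this issue, and `page.png` is a placeholder path.

```python
import subprocess
import sys


def tesseract_osd_command(image_path):
    """Build the same argument list that appears in the traceback."""
    return ["tesseract", "--psm", "0", "--oem", "0", image_path, "-"]


def run_and_report(command):
    """Run a command and return (exit_code, stdout, stderr) without raising.

    Uses stdout/stderr pipes and universal_newlines so it also works on
    Python 3.6, the version shown in the traceback.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError as exc:
        # The executable itself is missing or not on PATH.
        return None, "", str(exc)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # "page.png" is a placeholder; pass the real image path as an argument.
    image_path = sys.argv[1] if len(sys.argv) > 1 else "page.png"
    code, out, err = run_and_report(tesseract_osd_command(image_path))
    print("exit status:", code)
    print("stderr:", err.strip())
```

Whatever Tesseract writes to stderr is usually the useful part; the Python exception only carries the exit status.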
This library will need a lot of customization to work with a table like that. The code detects a table by looking for vertical and horizontal lines, and when it finds a cell, it expects that cell to contain a single line of text. Your example is incompatible with both of those expectations. If you want to use anything from this library, the best I can suggest is to treat it as a reference while writing a lot of custom code.
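To illustrate what "looking for vertical and horizontal lines" means in practice, here is a toy sketch. It is not this library's implementation; it assumes an already binarized image given as a list of rows of 0/1 values, and `has_long_run`, `find_ruling_lines`, and `min_fraction` are hypothetical names chosen for the example. An image with no long ruling lines, such as a logo, yields no rows or columns, so no table would be found.

```python
def has_long_run(pixels, min_length):
    """Return True if `pixels` contains at least `min_length` consecutive 1s."""
    run = 0
    for pixel in pixels:
        run = run + 1 if pixel else 0
        if run >= min_length:
            return True
    return False


def find_ruling_lines(image, min_fraction=0.8):
    """Find rows and columns that look like table ruling lines.

    `image` is a list of equal-length rows of 0/1 values (1 = ink).
    A row or column counts as a line when it has an unbroken run of ink
    covering at least `min_fraction` of the image width or height.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    min_row_run = max(1, int(width * min_fraction))
    min_col_run = max(1, int(height * min_fraction))

    rows = [y for y, row in enumerate(image) if has_long_run(row, min_row_run)]
    columns = [
        x
        for x in range(width)
        if has_long_run([row[x] for row in image], min_col_run)
    ]
    return rows, columns


# A 5x5 "table" with a border and one ink pixel of "text" in the middle.
bordered = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(find_ruling_lines(bordered))
```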
Attempting to OCR a table and I keep getting an error.
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/table_ocr/pdf_to_images/__init__.py", line 69, in preprocess_img
    rotate = get_rotate(filepath, tess_params)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/table_ocr/pdf_to_images/__init__.py", line 79, in get_rotate
    subprocess.check_output(tess_command)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['tesseract', '--psm', '0', '--oem', '0', '/Users/andrewmcfadden/Documents/GitHub/one2many.github.io/image-table-ocr/dance/ga-20190131-001.png', '-']' returned non-zero exit status 1.
The image is the logo at the top of the page (every page).