Error while using hocr-pdf file #121
Comments
Can you provide an hOCR file that causes this error? How did you create it?
I used Tesseract 4.0.0 to generate the hOCR.
Is there any other solution for getting a table from hOCR data?
This works for me as well, after I renamed the image and converted it to a JPG file.
Are you able to generate a searchable PDF?
Tesseract has an option to output to PDF. Did you try it?
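For reference, here is a minimal sketch of calling that Tesseract option from Python. It relies on the standard command-line form `tesseract <image> <output base> pdf`, which writes `<output base>.pdf`. The helper names and the file names `page.jpg` and `page` are placeholders for illustration, not something taken from this thread.

```python
import subprocess


def tesseract_command(image_path, output_base, output_format="pdf"):
    """Build the argument list for the Tesseract CLI.

    output_format names a Tesseract config such as "pdf" or "hocr";
    Tesseract appends the matching extension to output_base itself.
    """
    return ["tesseract", image_path, output_base, output_format]


def run_tesseract(image_path, output_base, output_format="pdf"):
    """Run Tesseract and raise CalledProcessError if it exits with an error."""
    subprocess.run(tesseract_command(image_path, output_base, output_format), check=True)


# Hypothetical usage (requires the tesseract binary on PATH):
# run_tesseract("page.jpg", "page")          # writes page.pdf
# run_tesseract("page.jpg", "page", "hocr")  # writes page.hocr
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.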
Yes, I see a searchable PDF, but I am working on Linux. In the Windows terminal the encoding can be a problem. You can check the encoding Python uses in the Windows terminal by starting Python and running
>>> import sys
>>> sys.stdout.encoding
If that is now UTF-8 then you can try to run the command with
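The same encoding check can also be done from a script. The exact command suggested in this comment is not shown here, so the following is only a sketch under my own assumptions: it tests whether the reported encoding is an alias of UTF-8 and, on Python 3.7 or later, switches `sys.stdout` to UTF-8 in place. The function names are hypothetical.

```python
import codecs
import sys


def is_utf8(encoding_name):
    """Return True if encoding_name is a known alias of UTF-8 (e.g. "utf8", "UTF-8")."""
    if not encoding_name:
        return False
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown codec names are treated as "not UTF-8".
        return False


def ensure_utf8_stdout():
    """Reconfigure sys.stdout to UTF-8 when possible and return the encoding in use."""
    current = getattr(sys.stdout, "encoding", None)
    # reconfigure() exists on text streams since Python 3.7.
    if not is_utf8(current) and hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
    return getattr(sys.stdout, "encoding", None)
```

Calling `ensure_utf8_stdout()` at the start of a script is one option. Setting the `PYTHONIOENCODING` environment variable to `utf-8` before starting Python is another standard way to get the same effect without changing any code.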
This is with Git Bash on Windows, right? Can you upload your result here?
@shekarnode There is text in your generated PDF, and I can search for it as well.
I was using Adobe Reader the whole time and was not able to search. Now that I have opened the PDF in a browser, I found that it is searchable. Thanks @zuphilip for helping out.
The PDF produced by Tesseract is also searchable.
While using the command below, I am getting an error related to a character.
Please help out.