query lines, words, paragraphs, blocks get error no text returned #249
Comments
I'm not an expert with tesseract's API but is it possible that the |
Yes, I agree, but how can I reset the rectangle after each call?
I have added before each loop: api.SetRectangle(0, 0, *image.size), where image is the Pillow image instance and size returns the width and height of the image in pixels. It works in general: I get boxes for words, symbols, lines, etc. But the ordering does not seem to be correct. For example, the word boxes are not in the same order as the original text, so although I get all the words, I cannot recreate the original text by concatenating them. My goal is to get the whole text at different box detail levels.
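One possible workaround for the ordering problem is to sort the collected boxes yourself. The sketch below is not part of tesserocr. It assumes each word is a (text, (left, top, right, bottom)) tuple and that the page has a single column; `reading_order` and `line_tolerance` are hypothetical names chosen for illustration.

```python
def reading_order(words, line_tolerance=0.5):
    """Sort (text, (left, top, right, bottom)) tuples top-to-bottom, left-to-right.

    Words whose vertical centers differ by less than line_tolerance times the
    taller box height are treated as belonging to the same line.
    Assumes a single-column layout.
    """
    # Visit words from top to bottom by vertical center.
    by_center = sorted(words, key=lambda w: (w[1][1] + w[1][3]) / 2)

    lines = []
    for word in by_center:
        _, top, _, bottom = word[1]
        center = (top + bottom) / 2
        height = bottom - top
        if lines:
            last = lines[-1]
            limit = line_tolerance * max(height, last["height"])
            if abs(center - last["center"]) <= limit:
                last["words"].append(word)
                continue
        # Start a new line anchored at this word.
        lines.append({"center": center, "height": height, "words": [word]})

    ordered = []
    for line in lines:
        # Within a line, order by the left edge.
        ordered.extend(sorted(line["words"], key=lambda w: w[1][0]))
    return ordered


words = [
    ("world", (60, 10, 100, 30)),
    ("Hello", (10, 12, 50, 32)),
    ("again", (10, 50, 50, 70)),
]
text = " ".join(t for t, _ in reading_order(words))
```

Multi-column pages would need the block or paragraph boxes to be ordered first, so this is only a starting point.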
I also see your use of
You want to use that function before triggering layout analysis or recognition, not afterwards. Since you are already using the page iterator (via The use case for |
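A minimal sketch of the iterator-only approach described above: recognize once, then walk a fresh result iterator per level, without calling SetRectangle in between. `ElementData` here is a hypothetical stand-in for the dataclass mentioned in the original post, and `collect_elements` is an illustrative name. The API object, the level constants, and the iteration helper are passed in as parameters, so the function itself does not import tesserocr.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ElementData:
    # Hypothetical container; the original post does not show its fields.
    level: str
    text: str
    box: Optional[Tuple[int, int, int, int]]


def collect_elements(api, levels, iterate_level):
    """Collect text and bounding boxes for every requested level.

    api           -- an object exposing Recognize() and GetIterator()
    levels        -- mapping of a label to a page iterator level constant
    iterate_level -- callable yielding iterator positions for one level
    """
    api.Recognize()  # run layout analysis and recognition once
    elements = []
    for name, level in levels.items():
        iterator = api.GetIterator()  # new iterator, starts at the page top
        if iterator is None:
            break  # nothing was recognized
        for item in iterate_level(iterator, level):
            try:
                text = item.GetUTF8Text(level)
            except RuntimeError:
                # Raised with "No text returned" for empty elements; skip them.
                continue
            elements.append(ElementData(name, text.strip(), item.BoundingBox(level)))
    return elements


# Intended usage with tesserocr (assumes `image` is a PIL image):
#
# from tesserocr import PyTessBaseAPI, RIL, iterate_level
# with PyTessBaseAPI() as api:
#     api.SetImage(image)
#     levels = {"block": RIL.BLOCK, "para": RIL.PARA, "line": RIL.TEXTLINE,
#               "word": RIL.WORD, "symbol": RIL.SYMBOL}
#     elements = collect_elements(api, levels, iterate_level)
```

Skipping elements that raise RuntimeError is a design choice for this sketch; depending on the page, you may prefer to record them with an empty string instead.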
@sirfz, again, the problem is already in the usage example of the current README: Lines 181 to 187 in 711cbab
Hello,
I am trying to get all the boxes of lines, words, paragraphs, blocks and symbols, but on the second call I get the error "No text returned". I have written a method to iterate over all boxes
ElementData is a dataclass for storing the data and api is the api reference, I call this function with:
The input passed to SetImage is a PIL image of a single page (a JPEG file). The page image has a header, a footer with some text, multiple paragraphs and multiple lines with words. How can I do this?