Hi,
I really like your tool; its recognition seems to be better than Tesseract's in some cases. Tesseract, however, has a more detailed hOCR output:
Each word is wrapped in a span with class ocrx_word and carries a bbox and an x_wconf property.
The bbox property for each word lets users write their own layout detection, while x_wconf makes it possible to omit words that were probably not recognized correctly.
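To make the use case concrete, here is a rough sketch of how these two properties could be consumed. It relies only on the Python standard library; the sample markup, the helper names (`HocrWordParser`, `parse_hocr_words`, `confident_words`) and the threshold of 60 are made up for illustration and are not taken from Tesseract or ocropy. It also assumes that word spans contain no nested spans.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample in the style of Tesseract's hOCR output (values are invented).
SAMPLE = """
<span class='ocr_line' title='bbox 36 92 580 122'>
  <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
  <span class='ocrx_word' id='word_1_2' title='bbox 109 92 181 122; x_wconf 41'>w0rld</span>
</span>
"""


class HocrWordParser(HTMLParser):
    """Collects text, bbox and x_wconf for every span with class ocrx_word."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": float(conf.group(1)) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Simplification: assumes no nested <span> inside a word span.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


def confident_words(words, min_conf=60.0):
    """Keep only words whose x_wconf is at least min_conf (arbitrary threshold)."""
    return [w for w in words if w["conf"] is not None and w["conf"] >= min_conf]


if __name__ == "__main__":
    for word in confident_words(parse_hocr_words(SAMPLE)):
        print(word["text"], word["bbox"], word["conf"])
```

The bounding boxes returned this way are what a custom layout detection would work on, and the confidence filter is the "omit doubtful words" step.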
Is this also possible with ocropy or is this planned?
Thank you.
kraken has word and character bounding boxes with character confidences and is, as far as I know, completely compatible with ocropus models.
Keep in mind, though, that the model does not really segment the line into words and characters (or rather grapheme clusters). It is trained to produce the correct labels in the right order (if I understand CTC correctly), so character cuts are often not quite correct even when the recognition result is. Similarly, ocrx_word elements are calculated "artificially" from the recognition result, because ocropus has no notion of words, whereas Tesseract does some word-based postprocessing.
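To illustrate what "artificially" calculated words could look like, here is a minimal sketch that groups per-character results into words by splitting on whitespace. This is not kraken's actual code: the input format (a list of `(char, bbox, confidence)` tuples), the function names, and the choice of the minimum character confidence as the word confidence are all assumptions made for the example.

```python
def merge_chars(group):
    """Merge a run of (char, (x0, y0, x1, y1), conf) tuples into one word."""
    text = "".join(ch for ch, _, _ in group)
    x0 = min(box[0] for _, box, _ in group)
    y0 = min(box[1] for _, box, _ in group)
    x1 = max(box[2] for _, box, _ in group)
    y1 = max(box[3] for _, box, _ in group)
    # Assumption: a word is only as trustworthy as its weakest character.
    conf = min(c for _, _, c in group)
    return text, (x0, y0, x1, y1), conf


def words_from_chars(chars):
    """Split recognized characters into words at whitespace characters."""
    words, current = [], []
    for ch, box, conf in chars:
        if ch.isspace():
            if current:
                words.append(merge_chars(current))
                current = []
        else:
            current.append((ch, box, conf))
    if current:
        words.append(merge_chars(current))
    return words


if __name__ == "__main__":
    # Invented recognition result for the line "Hi yo".
    line = [
        ("H", (0, 0, 10, 20), 0.9),
        ("i", (10, 0, 14, 20), 0.8),
        (" ", (14, 0, 20, 20), 0.99),
        ("y", (20, 5, 30, 25), 0.7),
        ("o", (30, 5, 40, 20), 0.95),
    ]
    for word in words_from_chars(line):
        print(word)
```

Because the word boxes are just unions of the character boxes, any imprecision in the character cuts carries over directly to the word level.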