fix API usage examples #260
api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
ocrResult = api.GetUTF8Text()
Note: it's not entirely wrong to use the API this way, because `GetComponentImages` makes copies of the segment images and bboxes, so it does not hurt that `SetRectangle` invalidates the layout analysis results. But looping that way is still not useful if the ultimate goal is the text; you would rather query the iterator directly for the text. Also, in this formulation, you would have needed to at least set the PSM to line level for a decent OCR result.
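A minimal sketch of the iterator-based alternative mentioned above, assuming tesserocr is installed and that a single `Recognize()` pass over the whole page is acceptable. The function name `iter_text_lines` is hypothetical and not part of the PR; the import is deferred so the snippet loads even without tesserocr.

```python
def iter_text_lines(image_path):
    """Yield (text, confidence, bbox) for each text line of one recognition pass."""
    # Deferred import: tesserocr is only needed once the generator is consumed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()  # layout analysis and OCR in a single pass
        level = RIL.TEXTLINE
        for item in iterate_level(api.GetIterator(), level):
            # BoundingBox returns (left, top, right, bottom) in pixels
            yield (item.GetUTF8Text(level),
                   item.Confidence(level),
                   item.BoundingBox(level))
```

Because the text comes straight from the result iterator, there is no need to call `SetRectangle` per segment or to switch the PSM to a line-level mode.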
Orientation and script detection (OSD):
```````````````````````````````````````
Unfortunately, the differ from here on compares the wrong lines. I did not replace the OSD example, but inserted a full GetIterator example (i.e. the second half of the above loop), and conflated the two OSD variants below into one.
bbox = {'x': int(bbox[0]),
        'y': int(bbox[1]),
        'w': int(bbox[2])-int(bbox[0]),
        'h': int(bbox[3])-int(bbox[1])}
It's noteworthy that `PageIterator.BoundingBox` gives a completely different format than `GetComponentImages`/`GetRegions`, so it is better to be explicit here.
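To make the difference concrete: `PageIterator.BoundingBox` returns a `(left, top, right, bottom)` tuple, whereas the boxes from `GetComponentImages` are dicts with `x`, `y`, `w`, `h` keys. The sketch below shows the conversion in both directions; the helper names `bbox_to_box` and `box_to_bbox` are hypothetical and not part of tesserocr.

```python
def bbox_to_box(bbox):
    """Convert (left, top, right, bottom) into an x/y/w/h dict."""
    left, top, right, bottom = (int(v) for v in bbox)
    return {'x': left, 'y': top, 'w': right - left, 'h': bottom - top}


def box_to_bbox(box):
    """Convert an x/y/w/h dict back into (left, top, right, bottom)."""
    return (box['x'], box['y'],
            box['x'] + box['w'], box['y'] + box['h'])


box = bbox_to_box((10, 20, 110, 70))
print(box)               # {'x': 10, 'y': 20, 'w': 100, 'h': 50}
print(box_to_bbox(box))  # (10, 20, 110, 70)
```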
print("Deskew angle: {:.4f}".format(deskew_angle))

or more simply with ``OSD_ONLY`` page segmentation mode:
Layout analysis with orientation and deskewing:
It's important to understand that there are two distinct mechanisms providing orientation detection: the normal page layout analysis (which you can use with any model, including LSTMs) and the dedicated `osd` model (which is legacy-only). It was documented the other way round.
print("Orientation: {}".format(membername(Orientation, orientation)))
print("WritingDirection: {}".format(membername(WritingDirection, direction)))
print("TextlineOrder: {}".format(membername(TextlineOrder, order)))
print("Deskew angle: {:.1f}°".format(deskew_angle * 180 / math.pi))
Seems more informative to me to get `left-to-right` or `PAGE_UP` strings instead of just 0 as output.
Also converting the angle from radians to degrees is more illustrative.
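One way such a name lookup could work is sketched below. This `membername` is an illustrative guess at the helper, not necessarily the PR's implementation, and the `Orientation` class here is a stand-in that mirrors Tesseract's orientation values so the snippet runs without tesserocr. It assumes the enum-like class exposes its members as plain class attributes.

```python
import math


def membername(enum_class, value):
    """Return the name of the first public class attribute equal to value."""
    for name, member in vars(enum_class).items():
        if not name.startswith('_') and member == value:
            return name
    raise ValueError("{!r} is not a member of {}".format(value, enum_class.__name__))


class Orientation:
    """Stand-in for tesserocr.Orientation (assumption: same names and values)."""
    PAGE_UP = 0
    PAGE_RIGHT = 1
    PAGE_DOWN = 2
    PAGE_LEFT = 3


deskew_angle = math.pi / 180  # example value in radians, as Orientation() reports it
print("Orientation: {}".format(membername(Orientation, 0)))
print("Deskew angle: {:.1f}°".format(math.degrees(deskew_angle)))
```

`math.degrees` is equivalent to multiplying by `180 / math.pi` and reads slightly more clearly.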
with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:
with PyTessBaseAPI(psm=PSM.OSD_ONLY,
                   oem=OEM.TESSERACT_ONLY,
                   lang="osd") as api:
Believe me, it does not work with LSTMs. The standalone CLI even loads the `osd` model anyway if the user forgot to. On the API, it will look like it works without loading it, because the default model `eng` will get loaded. But there are only symbols from one script in that model, `Latin`, thus no actual script detection would happen. (The "signal" would always be strong, because no competing scripts are loaded, like in `osd`. `DetectOS` / `os_detect` is very special, because it does not use multiple models, but needs a single model with multiple scripts, for which `osd` is of course the most versatile, but also `frk` contains both `Latin` and `Fraktur`, `hin` contains some `Latin` as well etc.)
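A sketch of the legacy-engine OSD call described here, assuming tesserocr is installed and `osd.traineddata` is available in the tessdata directory. The function names are hypothetical; `describe_osd` only formats the dict that `DetectOrientationScript` is expected to return (keys `orient_deg`, `orient_conf`, `script_name`, `script_conf`).

```python
def detect_orientation_and_script(image_path):
    """Run OSD with the legacy engine and the multi-script osd model."""
    # Deferred import so the module loads without tesserocr present.
    from tesserocr import PyTessBaseAPI, PSM, OEM

    with PyTessBaseAPI(psm=PSM.OSD_ONLY,
                       oem=OEM.TESSERACT_ONLY,
                       lang="osd") as api:
        api.SetImageFile(image_path)
        return api.DetectOrientationScript()


def describe_osd(result):
    """Format an OSD result dict as one line; handles a missing result."""
    if not result:
        return "OSD failed"
    return ("Orientation: {orient_deg}° (confidence {orient_conf:.2f}), "
            "script: {script_name} (confidence {script_conf:.2f})").format(**result)
```

Loading `osd` explicitly matters because, with the default `eng` model, only Latin symbols are available and the script result carries no information.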
@@ -246,24 +270,25 @@ Iterator over the classifier choices for a single symbol:

with PyTessBaseAPI() as api:
    api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')
    api.SetVariable("save_blob_choices", "T")
That is long gone. I am not sure what is needed for legacy models today. But for LSTMs, it's `lstm_choice_mode`. Unfortunately, it usually does not yield any actual results. It's complicated...