fix API usage examples #260

Closed

@bertsky bertsky commented Jul 2, 2021

No description provided.

Comment on lines -186 to -187
api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
ocrResult = api.GetUTF8Text()

Note: it's not entirely wrong to use the API this way, because GetComponentImages makes copies of the segment images and bboxes, so it does not hurt that SetRectangle invalidates the layout analysis results. But it is still not useful to loop that way if the ultimate goal is the text – you would rather query the iterator directly for the text. Also, in this formulation, you would need to at least set the PSM to line level for a decent OCR result.
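
A minimal sketch of the iterator-based alternative, assuming tesserocr is installed and a default model can be loaded. The function name `iter_line_texts` is made up for illustration; `PyTessBaseAPI`, `RIL` and `iterate_level` are the names tesserocr exports.

```python
def iter_line_texts(image_path):
    """Yield (text, (left, top, right, bottom)) for each recognized text line.

    Hypothetical helper for illustration: it runs recognition once and then
    walks the result iterator instead of calling SetRectangle per segment.
    """
    # Imported lazily so the sketch can be loaded without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        result_iterator = api.GetIterator()
        for line in iterate_level(result_iterator, RIL.TEXTLINE):
            # Text and bounding box both come from the same iterator position.
            yield line.GetUTF8Text(RIL.TEXTLINE), line.BoundingBox(RIL.TEXTLINE)
```

Looping over `iter_line_texts('/usr/src/tesseract/testing/phototest.tif')` would then give one text/bbox pair per line without touching SetRectangle at all.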

Comment on lines -192 to -193
Orientation and script detection (OSD):
```````````````````````````````````````
@bertsky bertsky Jul 2, 2021

Unfortunately, the diff from here on compares the wrong lines. I did not replace the OSD example, but inserted a full GetIterator example (i.e. the second half of the above loop), and conflated the two OSD variants below into one.

Comment on lines +206 to +209
bbox = {'x': int(bbox[0]),
        'y': int(bbox[1]),
        'w': int(bbox[2])-int(bbox[0]),
        'h': int(bbox[3])-int(bbox[1])}

It's noteworthy that PageIterator.BoundingBox gives a completely different format (left, top, right, bottom coordinates) than GetComponentImages/GetRegions (x, y, w, h boxes) – better be explicit here.
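
To make the difference concrete, here is a small sketch of the conversion in both directions. The helper names `bbox_to_box` and `box_to_bbox` are hypothetical and not part of tesserocr; they only assume the two coordinate layouts described above.

```python
def bbox_to_box(bbox):
    """Convert a (left, top, right, bottom) tuple into an x/y/w/h dict."""
    left, top, right, bottom = (int(v) for v in bbox)
    return {'x': left, 'y': top, 'w': right - left, 'h': bottom - top}


def box_to_bbox(box):
    """Inverse: convert an x/y/w/h dict into (left, top, right, bottom)."""
    return (box['x'], box['y'], box['x'] + box['w'], box['y'] + box['h'])
```

The first helper is the same arithmetic as the lines commented on, just wrapped in a function so the direction of the conversion is named.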

print("Deskew angle: {:.4f}".format(deskew_angle))

-or more simply with ``OSD_ONLY`` page segmentation mode:
+Layout analysis with orientation and deskewing:

It's important to understand that there are two distinct mechanisms providing orientation detection: the normal page layout analysis (which you can use with any model, including LSTMs) and the dedicated osd model (which is legacy-only). It was documented the other way round.

Comment on lines +229 to +232
print("Orientation: {}".format(membername(Orientation, orientation)))
print("WritingDirection: {}".format(membername(WritingDirection, direction)))
print("TextlineOrder: {}".format(membername(TextlineOrder, order)))
print("Deskew angle: {:.1f}°".format(deskew_angle * 180 / math.pi))
@bertsky bertsky Jul 2, 2021

Seems more informative to me to get left-to-right or PAGE_UP strings instead of just 0 as output.

Also converting the angle from radians to degrees is more illustrative.
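
For readers without the full diff at hand, a rough sketch of what such a `membername` helper could look like. This is an assumption for illustration, not necessarily the helper from the patch, and `DemoOrientation` is a stand-in class so the snippet runs without tesserocr.

```python
import math


def membername(enum_cls, value):
    """Return the name of the class attribute in enum_cls equal to value.

    Hypothetical sketch; falls back to str(value) when nothing matches.
    """
    for name, member in vars(enum_cls).items():
        if not name.startswith('_') and member == value:
            return name
    return str(value)


class DemoOrientation:
    """Stand-in using the member names of tesserocr.Orientation."""
    PAGE_UP = 0
    PAGE_RIGHT = 1
    PAGE_DOWN = 2
    PAGE_LEFT = 3


# A name instead of a bare integer, and degrees instead of radians.
print("Orientation: {}".format(membername(DemoOrientation, 0)))
print("Deskew angle: {:.1f}°".format(math.degrees(0.0175)))
```

Because the helper returns a string, the format fields that print its result need `{}` rather than `{:d}`.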

Comment on lines -231 to +247
-with PyTessBaseAPI(psm=PSM.OSD_ONLY, oem=OEM.LSTM_ONLY) as api:
+with PyTessBaseAPI(psm=PSM.OSD_ONLY,
+                   oem=OEM.TESSERACT_ONLY,
+                   lang="osd") as api:
@bertsky bertsky Jul 2, 2021

Believe me, it does not work with LSTMs. The standalone CLI even loads the osd model anyway if the user forgot to. On the API, it will look like it works without loading it, because the default model eng will get loaded. But there are only symbols from one script in that model, Latin, thus no actual script detection would happen. (The "signal" would always be strong, because no competing scripts are loaded, like in osd. DetectOS / os_detect is very special, because it does not use multiple models, but needs a single model with multiple scripts – for which osd is of course the most versatile, but also frk contains both Latin and Fraktur, hin contains some Latin as well etc.)
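
A hedged sketch of the standalone OSD call with the legacy engine and the osd model loaded explicitly. It assumes tesserocr is installed with `osd.traineddata` available, and that `DetectOrientationScript()` returns a dict with the keys `orient_deg`, `orient_conf`, `script_name` and `script_conf` (or None on failure). Both function names are hypothetical.

```python
def detect_orientation_and_script(image_path):
    """Run standalone OSD with the legacy engine and the multi-script osd model.

    Returns the dict from DetectOrientationScript(), or None if detection fails.
    """
    # Imported lazily so the sketch can be loaded without tesserocr installed.
    from tesserocr import PyTessBaseAPI, PSM, OEM

    with PyTessBaseAPI(psm=PSM.OSD_ONLY,
                       oem=OEM.TESSERACT_ONLY,
                       lang="osd") as api:
        api.SetImageFile(image_path)
        return api.DetectOrientationScript()


def format_osd(result):
    """Render an OSD result dict as one line (hypothetical helper)."""
    if result is None:
        return "OSD failed"
    return ("{script_name} ({script_conf:.1f}), "
            "rotated {orient_deg} deg ({orient_conf:.1f})").format(**result)
```

Swapping `lang="osd"` for a single-script model such as eng would still return a result, which is exactly the misleading case described above.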

@@ -246,24 +270,25 @@ Iterator over the classifier choices for a single symbol:

with PyTessBaseAPI() as api:
    api.SetImageFile('/usr/src/tesseract/testing/phototest.tif')
    api.SetVariable("save_blob_choices", "T")

That is long gone. I am not sure what is needed for legacy models today. But for LSTMs, it's lstm_choice_mode. Unfortunately, it usually does not yield any actual results. It's complicated...

@bertsky bertsky closed this Jul 3, 2021
@bertsky bertsky deleted the patch-2 branch July 3, 2021 00:01
@bertsky bertsky mentioned this pull request Jul 3, 2021