
Add simple orientation detection #34

Merged
merged 4 commits into from Jun 5, 2022

Conversation

@robertknight (Owner) commented Jun 2, 2022

Add simple orientation detection using Leptonica's pixOrientDetect function. This was used instead of Tesseract's implementation because the latter requires the legacy (non-LSTM) engine, which is not compiled in. Leptonica's algorithm relies mostly on "the preponderance of ascenders over descenders in languages with roman characters", per this paper. Tesseract's approach, which is not used here, is described here.

TODO:

  • Investigate issues with same rotated image producing different results when loaded in different browsers (see notes in second commit)
  • Perhaps add a way for the getOrientation API to indicate uncertainty in the result or errors in the process. Currently it returns 0 in the event of any error and has no way to represent confidence in the result.
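One way to leave room for uncertainty would be to return a small result object rather than a bare number. This is only a sketch of a possible shape; the `Orientation` interface and `describeOrientation` helper here are hypothetical, not part of the current API:

```typescript
// Hypothetical result shape for getOrientation, assuming rotation is
// always one of the four axis-aligned orientations.
interface Orientation {
  rotation: 0 | 90 | 180 | 270;
  // 0 = detection failed or unknown, 1 = detection succeeded.
  confidence: number;
}

// Example consumer: treat a zero-confidence result as "unknown"
// instead of silently assuming the image is upright.
function describeOrientation(o: Orientation): string {
  if (o.confidence === 0) {
    return "orientation unknown";
  }
  return `rotate ${o.rotation} deg (confidence ${o.confidence})`;
}
```

A caller can then distinguish "the image is upright" from "detection failed", which the current `0`-on-error behaviour conflates.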

@robertknight (Owner, Author)
After some local tests I think an alternative approach might be to:

  1. Run layout analysis
  2. Sample a few words or lines and try running text recognition on them in each of the 4 orientations
  3. Pick the orientation which gives the highest mean confidence score
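The selection step above can be sketched as follows. The `recognize` callback is a stand-in assumption for "run text recognition on a few sampled words at a given rotation and collect per-word confidence scores"; it is not an existing function in this project:

```typescript
type Rotation = 0 | 90 | 180 | 270;

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Try recognition in each of the four orientations and pick the one
// whose sampled words recognize with the highest mean confidence.
function detectOrientation(
  recognize: (rotation: Rotation) => number[],
): Rotation {
  const rotations: Rotation[] = [0, 90, 180, 270];
  let best: Rotation = 0;
  let bestScore = -Infinity;
  for (const r of rotations) {
    const score = mean(recognize(r));
    if (score > bestScore) {
      bestScore = score;
      best = r;
    }
  }
  return best;
}
```

The mean score for the winning orientation could also double as the confidence value for the result.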

Tesseract has built-in script and orientation detection, but it is part of the
classic (pre-LSTM) engine, which has been compiled out to reduce binary size.
Hence this initial implementation uses Leptonica's simpler orientation
detection, which is based on counting ascenders and descenders, as
described on pages 12-14 of http://www.leptonica.org/papers/skew-measurement.pdf.
In adding this I encountered issues where the same rotated image dropped into
Safari, Chrome and Firefox could give different results. I believe this has to
do with how the EXIF rotation information is handled by the various browser APIs
used to load and draw images, but this still needs to be debugged.
The confidence value is currently 0 if an error occurred or 1 otherwise.
This at least creates a space in the API to include a confidence score
in the result.
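To illustrate the intuition behind the ascender/descender heuristic (not Leptonica's actual pixel-level implementation, which operates on connected components in the bitmap): in upright Roman text, letters with ascenders outnumber letters with descenders, so the sign of the difference gives a crude upright-vs-flipped signal. A toy character-level sketch:

```typescript
// Letters that extend above the x-height vs below the baseline.
const ASCENDERS = new Set(["b", "d", "f", "h", "k", "l", "t"]);
const DESCENDERS = new Set(["g", "j", "p", "q", "y"]);

// Positive score -> text is probably upright; negative -> probably
// upside down (ascenders would appear as descenders when flipped).
function upsideDownScore(text: string): number {
  let asc = 0;
  let desc = 0;
  for (const ch of text.toLowerCase()) {
    if (ASCENDERS.has(ch)) asc++;
    else if (DESCENDERS.has(ch)) desc++;
  }
  return asc - desc;
}
```

Because the signal is statistical, it degrades on short samples and on scripts without this asymmetry, which is consistent with the browser-dependent flakiness noted above being worth separate investigation.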