Couldn't import tesserocr, because locale check error #137

Open
atuyosi opened this Issue Aug 5, 2018 · 4 comments

atuyosi commented Aug 5, 2018

I get an error when importing tesserocr.

import tesserocr
!strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 203
Abort trap: 6

This error is caused by the locale check in Tesseract's baseapi.cpp.

Please see commit .

Here is a simple workaround:

import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr

I think this code needs to be added somewhere appropriate in tesserocr.

Environment:

  • macOS 10.13.6
  • Python 3.6.5
  • tesserocr 2.3.0
  • tesseract 4.0.0-beta.4-20-ge9b4e

In addition, I avoided the install error by using the workaround from #129.

Owner

sirfz commented Aug 28, 2018

tesseract 4 requires LC_ALL, LC_CTYPE and LC_NUMERIC to be set to C: https://github.com/tesseract-ocr/tesseract/blob/4.0.0-beta.4/src/api/baseapi.cpp#L203

In my local tests it seems to have no effect with Python 2.7 but crashes with Python 3.6 and 3.7.
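To see which of these categories is not set to C in a given interpreter, a small diagnostic along the following lines may help. It is only a sketch using the standard `locale` module; the helper name `non_c_categories` is made up for illustration and is not part of tesserocr. One possible explanation for the Python 2 / Python 3 difference is that Python 3 sets `LC_CTYPE` to the user's locale at startup, but that is an assumption here and has not been verified against this crash.

```python
import locale

# The three categories the Tesseract 4 check is described as requiring to be "C".
CHECKED = {
    "LC_ALL": locale.LC_ALL,
    "LC_CTYPE": locale.LC_CTYPE,
    "LC_NUMERIC": locale.LC_NUMERIC,
}


def non_c_categories():
    """Return {name: current setting} for every checked category that is not 'C'."""
    # Calling setlocale() with only a category queries the setting without changing it.
    current = {name: locale.setlocale(cat) for name, cat in CHECKED.items()}
    return {name: value for name, value in current.items() if value != "C"}


if __name__ == "__main__":
    # An empty dict means all three categories are already "C".
    print(non_c_categories())
```

Running this before `import tesserocr` shows whether the interpreter starts in a state that would trip the assertion.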

I'm reluctant to hard-code this into tesserocr because I'm not sure what the effect would be on other modules or Python's behavior. Maybe someone with more knowledge about this can chip in?

Author

atuyosi commented Aug 28, 2018

As you said, hard-coding this is undesirable.

IMO, we should ask the Cython community how to handle environment variables.

cf. 24.2. locale — Internationalization services — Python 3.7.0 documentation

FYI, here is another language binding's solution:

Various fixes for Tesseract 4 beta.3 · ropensci/tesseract@2784542

Owner

sirfz commented Aug 29, 2018

Changing the locale before and after calling Init seems reasonable, so I'll go with that (thanks for the links). Would resetting the locale to something other than C affect the results of other API methods, though?
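The "change before, restore after" idea can be sketched on the Python side as a context manager. This is a minimal illustration under stated assumptions, not the patch that tesserocr adopted: `c_locale` is a hypothetical name, and only standard-library calls are used.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_locale():
    """Temporarily set all locale categories to 'C', then restore the previous state."""
    # Query the current setting without changing it; the returned string can be
    # passed back to setlocale() later to restore every category.
    saved = locale.setlocale(locale.LC_ALL)
    locale.setlocale(locale.LC_ALL, "C")
    try:
        yield
    finally:
        locale.setlocale(locale.LC_ALL, saved)
```

Assuming the assertion fires during import, as in the original report, usage would look like `with c_locale(): import tesserocr`. Note that `setlocale` changes the locale for the whole process, so this is not safe if other threads depend on the locale at the same time, and it does not answer whether later API calls behave differently once the locale has been restored.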

Owner

sirfz commented Aug 29, 2018

According to tesseract-ocr/tesseract/issues/1670, this might only be temporary until they replace function calls which rely on locale settings. I'd rather wait and see how this plays out before pushing any patches.

Chilipp added a commit to Chilipp/straditize that referenced this issue Dec 9, 2018
