!strcmp(locale, "C"):Error:Assert failed:in file ../../../src/api/baseapi.cpp, line 191 #105

Closed
jxc928 opened this issue Jun 28, 2018 · 23 comments

Comments

@jxc928
commented Jun 28, 2018

Hi,
I am using tess4j in my demo project and get the error shown below:
image

Some information, in case you need it:
OS: Ubuntu 16.04
IDE: Intellij IDEA
Tesseract:
image

Tesseract works fine from the command line, but not from Java.

Other information:

  1. As mentioned in this link, the problem also happens in Python environments. I am submitting the issue here as well so that more people can help.
  2. As mentioned in this link, these checks were added in 4.0.0-beta.3, but I am using 4.0.0-beta.1.

So, how can I solve this problem?

@jxc928

Author

commented Jun 28, 2018

I'm going to reinstall an older version of Tesseract now; I hope it works.

Still, are there any suggestions for solving this problem with the new version of Tesseract? Thanks.

@alexanderAnexsys

commented Jun 28, 2018

What does it say if you type "locale"?

@alexanderAnexsys

commented Jun 28, 2018

Try: export LC_ALL=C

@jxc928

Author

commented Jun 28, 2018

@alexanderAnexsys thanks for replying!
When I type "locale" at the command line, it shows:
image

I haven't tried "export LC_ALL=C" yet; I'll try it later. I wonder whether this command has any side effects?

@alexanderAnexsys

commented Jun 28, 2018

Not sure, but it will solve your issue.
It can be reversed with export LC_ALL="en_US.UTF-8"

@jxc928

Author

commented Jun 28, 2018

@alexanderAnexsys thanks a lot~

Actually, I had already started reinstalling lower versions of tesseract and leptonica when you suggested the export command, so I didn't get a chance to try it at the time.
The versions I am using now are shown below:
image

And the problem no longer occurs.

So:

  1. Maybe the approach you suggested works fine with the latest version. (I may try it again in a few days, for several reasons; point 2 is one of them.)
    For now, I haven't found any useful information about the side effects of changing LC_ALL, but I really don't think that changing OS settings is the best way to solve this problem.

  2. The tessdata version should match the tesseract version:
    tessdata(4.00) <===> tesseract(>4.0.0)
    tessdata(3.04.00) <===> tesseract(<4.0.0)
    The results are definitely better with tessdata(4.00).

@jxc928

Author

commented Jun 29, 2018

@alexanderAnexsys
Hey there~
Um, bad news: it doesn't work for me.

And if I add export LC_ALL=C to /etc/profile, there are more side effects:

  1. After I reboot the system, I can't open my terminal anymore.

@6FA3T

commented Jul 13, 2018

@jxc928 which version of tess4j did you use before? I saw that your leptonica version is 1.77.0, but tess4j-4.0.2 depends on Lept4J 1.9.4 (which corresponds to the upgrade to Leptonica 1.74.4). My program with tess4j-4.0.2 works with tesseract 4.0.0-beta.1 and leptonica-1.76.0, with Lept4J set to 1.10.0.

@nguyenq

Owner

commented Jul 14, 2018

Executing export LC_ALL=C at the terminal helped my unit tests run to completion for tesseract 4.0.0-beta.3, whose recent commit seems to have caused the reported issue.

@Xunnozza-Xenx

commented Jul 16, 2018

Had the same problem.

Ubuntu 18, Intellij, Tesseract 4.0.0-beta.3 and leptonica-1.75.3
=> !strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 201

export LC_ALL=C

In IntelliJ, this can be set as an environment variable in the Run Configuration.

@IntraCherche

commented Aug 17, 2018

Same problem on Debian 9, Netbeans, Tess4j 4.2.1.
export LC_ALL=C (or setting the environment variables via Set Project Configuration) worked, but then every accented character (à, é, ...) was replaced by ? in the Netbeans console, leading to results like this:
Ils ont ?t? se promener ? l'oc?an instead of the expected Ils ont été se promener à l'océan.

Setting only LC_NUMERIC=C to keep the accents, as with Tess4j 4.0.2, is NOT sufficient anymore. It only works when LANG is set to C (i.e. export LANG=C), which messes up the accents.

However, the accents are stored correctly in the index that uses the output of tess4j, so in my case this is not blocking.

Please note: the issue has been reported on the Tesseract side.
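
Since the index receives the accents intact, the ? characters above may come from the JVM picking an ASCII default charset under the C locale when it writes to the console, rather than from the OCR itself. That is an assumption, not something confirmed in this thread. A minimal sketch of printing through an explicit UTF-8 stream (the class name Utf8Console and the sample string are hypothetical, and the console itself must also be able to display UTF-8):

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

    // Wraps a byte stream in a PrintStream that always encodes as UTF-8,
    // independent of the default charset the JVM derived from the locale.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical OCR result; in a real program this would come from Tess4J.
        // Unicode escapes keep the source file itself pure ASCII.
        String ocrText = "Ils ont \u00e9t\u00e9 se promener \u00e0 l'oc\u00e9an";
        PrintStream console = utf8(new FileOutputStream(FileDescriptor.out));
        console.println(ocrText);
    }
}
```

If the output still shows ? after this, the console encoding in the IDE is the more likely culprit.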

@silverspell

commented Aug 28, 2018

@Xunnozza-Xenx This also works on Eclipse / Run Configurations / Environment

@Raven98

commented Sep 9, 2018

Same problem on CentOS 7, Eclipse, Tesseract 4.0.0-beta.4-138-g2093

@qingtianyu2013

commented Nov 2, 2018

you can try
import locale
locale.setlocale(locale.LC_ALL, 'C')

@tomkan

commented Nov 6, 2018

I have a similar problem.
Tess4j-4.3.1-SNAPSHOT
Tesseract-4.0.0-17-g361f3
Fedora 25

When I run the app with LC_ALL=pl_PL.UTF-8, it crashes with the known error.
When run with LC_ALL=C, it works, but accented characters aren't recognized. For example, ó is recognized as 6. I use pol.traineddata downloaded from the tesseract-ocr GitHub.

Tesseract run from command-line works properly regardless of locale settings.
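
The two runs above differ only in the locale environment, so a small preflight check before loading the native library can make the failure easier to diagnose. The sketch below resolves each locale category from the environment using the usual POSIX precedence (LC_ALL, then the category variable, then LANG). It is a heuristic only: it assumes the JVM initialises its process locale from these variables, and the class and method names are hypothetical.

```java
import java.util.Map;

class LocalePreflight {

    // The six POSIX locale categories; platform-specific extras are ignored here.
    static final String[] CATEGORIES = {
        "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY", "LC_MESSAGES"
    };

    // POSIX lookup order for one category: LC_ALL, then the category variable,
    // then LANG. Empty values count as unset; with nothing set, "C" applies.
    static String effective(Map<String, String> env, String category) {
        for (String name : new String[] {"LC_ALL", category, "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return "C";
    }

    // True when every category resolves to "C", which is what the assert
    // in baseapi.cpp appears to require.
    static boolean allCategoriesAreC(Map<String, String> env) {
        for (String category : CATEGORIES) {
            if (!"C".equals(effective(env, category))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        for (String category : CATEGORIES) {
            System.out.println(category + "=" + effective(env, category));
        }
        System.out.println(allCategoriesAreC(env)
                ? "Locale environment resolves to C."
                : "Locale environment is not C; the Tesseract 4.0.0 assert may fire.");
    }
}
```

This also matches the earlier report that LC_NUMERIC=C alone is not enough: with LANG left at a non-C value, the other categories still resolve to that value.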

@jujiaocx

commented Dec 17, 2018

you can try
export LC_ALL=C

@nicodn

commented Jan 9, 2019

I'm on macOS 10.13.6, and I worked around this issue with Homebrew by installing tesseract version 3.05.02:

brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f46fe1e0d95a9b262408f6ea9c33e3620b44a9ee/Formula/tesseract.rb

Hope it gets resolved soon so we can upgrade to the latest release!
Thanks! 🙂

@de-ocampo

commented Jan 21, 2019

@nicodn Thanks for posting the link to the 3.05.02 install. Had a very similar error and was tearing my hair out trying to find the older formulae on brew.

@nguyenq

Owner

commented Jan 30, 2019

You can set the system locale within Java code using JNA as depicted in #106 (comment).

@arun0009

commented Feb 19, 2019

I had the same issue and resolved it by setting the environment variable in my Dockerfile. If anyone is interested in running it on Docker, see https://github.com/arun0009/ocr-tess4j-rest

@belkevglaz

commented Apr 13, 2019

Tesseract 4 requires LC_ALL, LC_CTYPE and LC_NUMERIC to be set to C: https://github.com/tesseract-ocr/tesseract/blob/4.0.0-beta.4/src/api/baseapi.cpp#L203

Setting this environment variable for the whole system is not a good idea. Afterwards you get an unusable terminal, and several other applications break.
There are some ways to solve this issue in specific cases:

  1. Use a shell script that sets this variable before calling java -jar <any.jar>.
  2. If you are using IDEA, set it in the "Environment variables" field of the run configuration.
  3. If you use systemd as the service manager, set it in the Environment line of the unit.
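
The first option can also be done from Java instead of a shell script: a small launcher starts the real application in a child JVM whose environment has LC_ALL=C, leaving the rest of the system untouched. A minimal sketch, assuming a hypothetical ocr-app.jar and a hypothetical class name CLocaleLauncher:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CLocaleLauncher {

    // Builds (but does not start) a child JVM whose environment has LC_ALL=C.
    static ProcessBuilder childJvm(String... javaArgs) {
        // Reuse the java binary of the JVM that runs this launcher.
        String javaBin = System.getProperty("java.home") + "/bin/java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.addAll(Arrays.asList(javaArgs));

        ProcessBuilder builder = new ProcessBuilder(command);
        // Only the child process sees this value; the parent and the OS are unchanged.
        builder.environment().put("LC_ALL", "C");
        // Forward the child's stdin, stdout and stderr to this process.
        builder.inheritIO();
        return builder;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "ocr-app.jar" is a placeholder for the application that uses Tess4J.
        int exitCode = childJvm("-jar", "ocr-app.jar").start().waitFor();
        System.exit(exitCode);
    }
}
```

Because the variable is scoped to one child process, this avoids the broken-terminal side effect reported earlier for /etc/profile.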

nuessgens added a commit to nuessgens/tess4j that referenced this issue Apr 16, 2019

@agnelvishal

commented Jul 24, 2019

After entering export LC_ALL=C in the terminal, run the Java code in the same terminal window. If you are using an IDE, launch the IDE from that same terminal.

@nguyenq

Owner

commented Jul 24, 2019

This restriction to the C locale was lifted in Tesseract 4.1.0, as of commit tesseract-ocr/tesseract@96f6fc2.
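
For code that has to run against several Tesseract installations, the workaround can be applied only where it is likely needed. The sketch below is a rough heuristic based on the versions mentioned in this thread: it flags the whole 4.0.x line, although the check reportedly appeared only in 4.0.0-beta.3, so earlier betas are flagged unnecessarily. The class and method names are hypothetical, and how the version string is obtained is left open.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TesseractVersionCheck {

    // Captures the leading "major.minor" of strings such as "4.0.0-beta.3".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    // Returns true when the version belongs to the 4.0.x line, where the
    // C-locale assert was reported. Unparseable input is treated as affected.
    static boolean mayNeedCLocale(String version) {
        Matcher matcher = MAJOR_MINOR.matcher(version.trim());
        if (!matcher.find()) {
            return true; // unknown version: stay on the safe side
        }
        int major = Integer.parseInt(matcher.group(1));
        int minor = Integer.parseInt(matcher.group(2));
        return major == 4 && minor == 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"3.05.02", "4.0.0-beta.3", "4.0.0-17-g361f3", "4.1.0"}) {
            System.out.println(v + " -> " + mayNeedCLocale(v));
        }
    }
}
```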

@nguyenq nguyenq closed this Jul 31, 2019
