Problem with using other languages on OS X with tesseract installed with brew #23

Open
p7r opened this Issue Oct 24, 2013 · 7 comments

3 participants

@p7r
p7r commented Oct 24, 2013

When trying to do tesseract.rb -l ara or when setting up an Engine as follows:

tesseract = Tesseract::Engine.new { |e|
  # Note: this fails for multiple values of e.path and for no value at all
  e.path = "/usr/local/Cellar/tesseract/3.02.02/share/"
  e.language = :ara
}

I'm getting this:

Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from ./img2txt.rb:14:in `new'
    from ./img2txt.rb:14:in `<main>'

Tesseract itself is installed correctly and using the compiled binary that comes in the package, I am able to load Arabic language files and get OCR output.

Any suggestions gratefully received.

@p7r
p7r commented Oct 25, 2013

Further to this, I removed all tesseract libraries on my machine and reinstalled them, along with the tesseract-ocr gem.

English seems to work fine, but it can't find the other language files:

$ tesseract.rb 
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:248:in `_setup': you have to set an image first (ArgumentError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:149:in `text_for'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:77:in `block in <top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `tap'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb --help
Usage: tesseract [options]
        --path PATH                  datapath to set
    -l, --language LANGUAGE          language to use
    -m, --mode MODE                  mode to use
    -p, --psm MODE                   page segmentation mode to use
    -u, --unlv                       output in UNLV format
    -c, --confidence                 output the mean confidence of the recognition
    -C, --config PATH...             config files to load
    -b, --blacklist LIST             blacklist the following chars
    -w, --whitelist LIST             whitelist the following chars
    -s, --scale VALUE                scale the image before analyzing it
    -r, --resize VALUE               resize the image before analyzing it

$ tesseract.rb -l ara image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l ara --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
Failed loading language 'ara'
Tesseract couldn't load any languages!
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/api.rb:104:in `init': the API did not Init correctly (RuntimeError)
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:234:in `_init'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/lib/tesseract/engine.rb:54:in `initialize'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/call-me-0.0.2.3/lib/call-me/named.rb:207:in `block in named'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `new'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/tesseract-ocr-0.1.5/bin/tesseract.rb:57:in `<top (required)>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `load'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/tesseract.rb:19:in `<main>'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `eval'
    from /usr/local/rvm/gems/ruby-1.9.2-p290/bin/ruby_noexec_wrapper:14:in `<main>'

$ tesseract.rb -l eng --path /usr/local/Cellar/tesseract/3.02.02/share/tessdata image.png
V L: _ i _ if __ r
., - 7-; f"::"'=:,  ’
‘HQ.’ .9 9 " x_. ‘
' .' ”- « >3)’   »
'5--4 war; -11-!  2.! u-r‘J:“fi-&“‘->s’9":‘;’,,‘,’ .4» ma

The garbage output is expected: the only text in that image is Arabic.
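
One way to narrow this down is to check whether ara.traineddata is actually present under each of the paths tried above. What follows is a minimal sketch in plain Ruby, not a fix. It assumes the language files are named <lang>.traineddata and sit either directly in the datapath or in a tessdata subdirectory of it. The helper names are hypothetical and are not part of the tesseract-ocr gem.

```ruby
# Hypothetical helpers for diagnosis only; not part of the tesseract-ocr gem.

# Assumption: a language file is named "<lang>.traineddata" and lives either
# directly in the datapath or in a "tessdata" subdirectory of it.
def traineddata_candidates(datapath, lang)
  file = "#{lang}.traineddata"
  [File.join(datapath, file), File.join(datapath, "tessdata", file)]
end

# Returns the first candidate that exists as a regular file, or nil.
def find_traineddata(datapath, lang)
  traineddata_candidates(datapath, lang).find { |f| File.file?(f) }
end

if __FILE__ == $PROGRAM_NAME
  lang = ARGV.shift || "ara"
  # Default to the two paths tried in this thread when none are given.
  paths = ARGV.empty? ? [
    "/usr/local/Cellar/tesseract/3.02.02/share",
    "/usr/local/Cellar/tesseract/3.02.02/share/tessdata"
  ] : ARGV

  paths.each do |path|
    found = find_traineddata(path, lang)
    puts "#{path}: #{found || "no #{lang}.traineddata found"}"
  end
end
```

If neither path reports a file for ara but one does for eng, the Arabic data is simply missing from that directory. If the file is found and the gem still fails, the problem is more likely in how the datapath reaches the library.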

@meh
Owner
meh commented Oct 25, 2013

I'll have a look very soon (likely toward the end of the weekend).

@meh
Owner
meh commented Nov 24, 2013

I'm very sorry I haven't looked into this yet. I've been very busy, but I promise I will as soon as I have time.

@juniorjp

I have the same problem trying the Nerdz example in this repo. The :lol language is not loaded.

@meh
Owner
meh commented Feb 24, 2014

I think this is an OS X specific issue, and I don't have such a machine to fix this problem.

@juniorjp

In my case the problem occurs on Ubuntu. If I change the :lol language (in the Nerdz example) to the default :en, everything works fine.

And it's the same error: "the API did not Init correctly (RuntimeError)".

@meh
Owner
meh commented Feb 24, 2014

@juniorjp1989 oh, that's almost good to know then. I guess it's a problem with non-Arch Linux systems.
