Ubuntu 14.04 docker image error: Failed to init API, possible an invalid tessdata path: /usr/local/share/ #101

Closed
vatsal28 opened this issue Apr 9, 2018 · 10 comments

Comments

@vatsal28

vatsal28 commented Apr 9, 2018

I'm using tesseract-ocr (version 4.00 beta) with tesserocr (2.2.2), and the location of the tessdata folder is: /usr/local/share/

I'm still getting the invalid tessdata path error. I've tried the following to fix it:

  1. Set the environment variable TESSDATA_PREFIX='/usr/local/share/'
  2. Passed path='/usr/local/share/' to PyTessBaseAPI()

The location of the tessdata folder is correct, but I still can't initialize the API.
Note: I'm running this in Docker with an Ubuntu 14.04 image.

How do I resolve this issue?
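
Before changing any settings, it can help to confirm which directory actually contains the language files. The following is a minimal sketch using only the Python standard library. `find_traineddata` is a hypothetical helper, not part of tesserocr, and the two candidate directories are simply the ones mentioned in this thread.

```python
from pathlib import Path


def find_traineddata(candidates, lang="eng"):
    """Return the candidate directories that directly contain <lang>.traineddata."""
    hits = []
    for candidate in candidates:
        model = Path(candidate) / f"{lang}.traineddata"
        if model.is_file():
            hits.append(str(Path(candidate)))
    return hits


if __name__ == "__main__":
    # Directories taken from this thread; adjust them for your own image.
    for hit in find_traineddata(["/usr/local/share/", "/usr/local/share/tessdata/"]):
        print("eng.traineddata found in:", hit)
```

If nothing is printed, the model file is in neither directory, and no path setting will help until it is installed.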

vatsal28 changed the title from "Ubuntu 14.04 docker image error:" to "Ubuntu 14.04 docker image error: Failed to init API, possible an invalid tessdata path: /usr/local/share/" on Apr 9, 2018
@simonflueckiger
Contributor

Can you give PyTessBaseAPI(path='/usr/local/share/tessdata/') a try?
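
One way to test this suggestion alongside the original setting is to try each candidate path in turn and report which one initializes. This is a sketch under stated assumptions: `first_working_path` and `_init_with_tesserocr` are hypothetical helper names, and the code relies on tesserocr raising `RuntimeError` when initialization fails, which is how the error in this issue's title surfaces.

```python
def _init_with_tesserocr(path):
    # Imported lazily so the helper below can be exercised without tesserocr.
    from tesserocr import PyTessBaseAPI

    # Opening and closing the API is enough to test initialization.
    with PyTessBaseAPI(path=path, lang="eng"):
        pass


def first_working_path(candidates, try_init=None):
    """Return the first path for which try_init(path) succeeds, else None."""
    if try_init is None:
        try_init = _init_with_tesserocr
    for path in candidates:
        try:
            try_init(path)
        except RuntimeError as exc:
            print(f"{path!r} rejected: {exc}")
        else:
            return path
    return None
```

Calling `first_working_path(["/usr/local/share/", "/usr/local/share/tessdata/"])` would then show whether either form of the path is accepted. If both are rejected, the problem probably lies somewhere other than the path string.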

@vatsal28
Author

Tried that, but that also did not work :/

@simonflueckiger
Contributor

OK, there might be an issue with how the tessdata path is handled if you build tesserocr against a tesseract version past commit dba13db. I'm currently working on a pull request that should fix that. Which build of tesseract are you using? Did you build it yourself, or did you download the binaries from somewhere?

@vatsal28
Author

I've used the following commands to build Tesseract.


git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-debug
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install
sudo ldconfig

For building Tesserocr, I'm using the following commands:

git clone https://github.com/sirfz/tesserocr.git
cd tesserocr
pip install .

Since I'm not specifying a particular version, I guess it's building the latest one.
Is this issue caused by the latest build?

@simonflueckiger
Contributor

Exactly: git clone https://github.com/tesseract-ocr/tesseract.git pulls the most recent commit, and that might be the issue here, though I'm not entirely sure.

Can you try the following:

pip uninstall tesserocr
git clone https://github.com/simonflueckiger/tesserocr.git
cd tesserocr
pip install .

Now set TESSDATA_PREFIX to '/usr/local/share/tessdata/' and initialize PyTessBaseAPI() without the path argument.
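
From Python, that step might look roughly like the sketch below. `set_tessdata_prefix` and `init_from_env` are hypothetical helper names. Setting the variable before importing tesserocr is a cautious assumption, in case the binding reads it at import time.

```python
import os


def set_tessdata_prefix(prefix):
    """Set TESSDATA_PREFIX for this process and return the previous value (or None)."""
    previous = os.environ.get("TESSDATA_PREFIX")
    os.environ["TESSDATA_PREFIX"] = prefix
    return previous


def init_from_env(prefix="/usr/local/share/tessdata/"):
    """Initialize tesserocr without a path argument, relying on the environment."""
    set_tessdata_prefix(prefix)
    # Imported only after the variable is set (assumption: the order may matter).
    import tesserocr

    with tesserocr.PyTessBaseAPI():
        # get_languages() reports the tessdata path and the languages found there.
        return tesserocr.get_languages()
```

`init_from_env()` either returns the detected path and language list or raises the same "Failed to init API" error, which makes it easy to see whether the environment variable had any effect.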

@vatsal28
Author

Tried this, it's not working :/ . Any other solution?

@simonflueckiger
Contributor

simonflueckiger commented Apr 10, 2018

let's chat [tlk.io url expired]

@vatsal28
Author

Can we chat now?

@simonflueckiger
Contributor

simonflueckiger commented Apr 11, 2018

So it ended up (most likely) being a corrupted eng.traineddata file, which was fixed by redownloading it:

wget -O /usr/local/share/tessdata/eng.traineddata https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata

@sirfz this issue can be closed now
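
For anyone hitting the same error, a quick sanity check on the model file can save a lot of path debugging. The sketch below is a heuristic only: `looks_corrupted` and `redownload` are hypothetical helpers, the 1024-byte threshold is an arbitrary assumption (real traineddata files are far larger), and the check for an HTML header only covers the case where a web page was saved instead of the raw file. The URL is the one from the wget command above.

```python
from pathlib import Path
from urllib.request import urlretrieve

MODEL_URL = "https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata"


def looks_corrupted(path, min_bytes=1024):
    """Heuristic: missing, suspiciously small, or an HTML page saved by mistake."""
    model = Path(path)
    if not model.is_file() or model.stat().st_size < min_bytes:
        return True
    with model.open("rb") as handle:
        head = handle.read(64).lstrip().lower()
    return head.startswith((b"<!doctype", b"<html"))


def redownload(path, url=MODEL_URL):
    """Fetch the model again, overwriting whatever is at path."""
    urlretrieve(url, path)
```

A file that passes this check can still be damaged in other ways, so redownloading remains the most reliable fix when initialization keeps failing with a correct path.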

@sirfz sirfz closed this as completed Apr 11, 2018
@vatsal28
Author

Thanks a lot!!

flip111 added a commit to flip111/tesserocr that referenced this issue Jul 12, 2018
Add information about tessdata. There are a lot of issues about this and nothing in the readme yet. The information is just what i gathered from these issues and get from my own experience. Related issues:

sirfz#101
sirfz#28
sirfz#60
sirfz#100
sirfz added a commit that referenced this issue Jun 19, 2021
* Update README.rst

Add information about tessdata. There are a lot of issues about this and nothing in the readme yet. The information is just what i gathered from these issues and get from my own experience. Related issues:

#101
#28
#60
#100

Co-authored-by: Fayez <iamfayez@gmail.com>