Sorry, I am new to Docker. I just pulled the latest image and want to use the chi_sim language in tesseract, but it seems this language is not installed by default, as it complains:
~/work/tmp$ docker run -v "$(pwd):/home/docker" ocrmypdf 31.pdf 31-ocr.pdf -l chi_sim
The installed version of tesseract does not have language data for the following requested languages:
chi_sim
It seems the tesseract used by the Docker image is separate from the system's tesseract-ocr package, for which I installed the language pack with "apt-get install tesseract-ocr-chi-sim".
How can I update the Docker image to include the desired language support? And how can I check which languages are supported (like "tesseract --list-langs" on the system)?
Thanks a lot.
I'll add more languages next time I update ocrmypdf.
The Dockerfile specifies how the container was built. It provides its own copy of tesseract and will not use the one on your machine, or anything else about your machine. It's like a lightweight virtual machine.
You can jump inside an ocrmypdf container, modify it, and save the changes as your own private image. (A container is a running instance of an image.)
In your case it would go something like this (not tested, made up on the spot):
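$ docker run -t -i ocrmypdf /bin/bash
root@container:/# apt-get update
root@container:/# apt-get install -y tesseract-ocr-chi-sim
root@container:/# exit
$ docker commit -m "Added Chinese simplified" -a "Your Name" <container-id> <new-image-name>
Here <container-id> is the ID of the container you just exited (shown by "docker ps -a"), and <new-image-name> is whatever name you choose for the modified image. Use that name in place of ocrmypdf when you run OCR.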
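As for checking which languages an image supports, one possible approach (a sketch, not tested against this image) is to run the container's own tesseract with --list-langs. The helper name list_image_langs is made up for illustration, and it assumes the tesseract binary is on the PATH inside the image:

```shell
# Hypothetical helper: list the languages known to the tesseract inside an image.
# Assumption: the image has a `tesseract` binary on its PATH.
list_image_langs() {
    # --rm removes the temporary container once the command finishes.
    # --entrypoint runs tesseract directly instead of the image's default program.
    docker run --rm --entrypoint tesseract "$1" --list-langs
}
```

Calling "list_image_langs ocrmypdf" should print the languages in the stock image, and passing the name of your committed image should show whether chi_sim was picked up.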