Add support for OCR'ing more languages #942
Comments
Hi @wallace11, for adding more languages, I could indeed write a little guide. The only difficulty is that I try to recognize dates using the date format patterns of each language's locale. If you can give me the language(s) you are missing and their date patterns (like in #679), I could add them with only a little work.
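To illustrate why each language needs its own date patterns, here is a minimal Python sketch. It is not docspell's code (docspell is written in Scala); the `DATE_ORDER` table and the `find_dates` helper are hypothetical names, and the field orders listed are only illustrative assumptions for three Tesseract language codes.

```python
import re
from datetime import date

# Hypothetical table: the order of day/month/year in numeric dates per language.
# These entries are assumptions for illustration, not docspell's real configuration.
DATE_ORDER = {
    "eng": "mdy",  # e.g. 03/04/2021 read as March 4
    "deu": "dmy",  # e.g. 03.04.2021 read as April 3
    "jpn": "ymd",  # e.g. 2021-04-03
}

# Three numeric fields separated by '.', '/' or '-'.
_NUMERIC_DATE = re.compile(r"\b(\d{1,4})[./-](\d{1,2})[./-](\d{1,4})\b")


def find_dates(text, lang):
    """Return the numeric dates found in `text`, read in the field order of `lang`."""
    order = DATE_ORDER[lang]
    found = []
    for match in _NUMERIC_DATE.finditer(text):
        # Map each captured number to 'd', 'm' or 'y' according to the language.
        parts = dict(zip(order, (int(group) for group in match.groups())))
        try:
            found.append(date(parts["y"], parts["m"], parts["d"]))
        except ValueError:
            # The numbers do not form a valid date in this order; skip the match.
            continue
    return found
```

The same text yields different dates depending on the language: `find_dates("Invoice 03/04/2021", "eng")` gives March 4, 2021, while the `"deu"` order gives April 3, 2021. A real implementation would also need month names and two-digit years, which this sketch leaves out.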
What about the idea of installing additional languages on demand? This might reduce the size of the basic image dramatically (probably by 700-800MB).
That would be really nice, but I don't think I understand how it would work. Do you mean that when the container starts, it installs additional packages and then starts the app? It would be really nice to reduce the image size! OTOH, I myself have plenty of space 😃 … and so wouldn't spend much time on it. The downside is that users could select languages in the UI that are not supported. The languages are currently hard-coded, but a subset of them could be put into the config file so that the UI hides the unsupported ones. It would then still be possible to mess it up, but I think that's ok. You would, however, need to configure more stuff in the docker-compose.yml.
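The idea of restricting the hard-coded language list through a config value could be sketched roughly as follows. This is an assumption-laden Python illustration, not docspell's implementation: `ALL_LANGUAGES`, `selectable_languages`, and the shape of the configured list are all hypothetical.

```python
# Hypothetical hard-coded list of every language the application knows about.
ALL_LANGUAGES = {
    "deu": "German",
    "eng": "English",
    "fra": "French",
    "ita": "Italian",
}


def selectable_languages(installed):
    """Return the languages the UI should offer, given the configured codes.

    `installed` stands in for a list read from the config file (an assumption).
    Unknown codes raise an error so that a typo in the config is noticed early.
    """
    unknown = sorted(set(installed) - set(ALL_LANGUAGES))
    if unknown:
        raise ValueError(f"Unknown language codes in config: {unknown}")
    # Keep the order of the hard-coded list so the UI stays stable.
    return {code: name for code, name in ALL_LANGUAGES.items() if code in installed}
```

With `installed = ["eng", "deu"]`, only German and English would be offered. This checks that a configured code is known to the application, but it cannot detect a language that is listed in the config while its OCR data is missing from the image, which is the "still possible to mess it up" case mentioned above.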
Added a little guide to https://docspell.org/docs/dev/add-language/ (will be published with 0.36.0)
Hi there,
I see that even though tesseract supports over 100 languages already, only a handful are available in docspell.
I was wondering if it was possible to add more languages to the OCR.
It looks like most of the work lies in configuring the language objects in https://github.com/eikek/docspell/blob/master/modules/common/src/main/scala/docspell/common/Language.scala, after which the UI needs to be adjusted accordingly.
If adding a new language is a more complicated task, would it be possible to add a contribution guide so that the community can take care of it via PRs?
Cheers.