Training kraken and RTL support? #36
Since @mittagessen is using kraken for Arabic*, I believe it supports RTL out-of-the-box, so OCR and training for an RTL script like Hebrew are basically the same as for LTR scripts.
This would indeed be fantastic, and the published results are promising (@amitdo thanks for the link). I will have a look at
A list of Hebrew fonts from the Open Siddur Project
Thanks again @amitdo! Regarding the training process: did anyone stumble upon the following error?
There is BiDi (and consequently RTL) support for training and recognition; the label sequence is always trained/recognized left-to-right and then reordered using the (reverse) Unicode BiDi algorithm. There has also been top-to-bottom support as of a few days ago.

The training subcommand is only partially working because the CLSTM Python bindings are incomplete and won't allow creating a model from scratch. On the other hand, it is possible to train a BiDi model with the clstmocrtrain tool by reordering the transcription to display order and then just using the resulting model with kraken. To make this a little bit easier I just added a reordering switch to the linegen subcommand. ketos train, on the other hand, assumes everything is in correct (logical) order and converts to display order automatically.

PS: display order is the code point ordering in which the n-th code point is the n-th leftmost grapheme. For pure RTL texts it is just the reverse of logical order; for mixed texts there are interleaved RTL and LTR sections.
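To make the logical-vs-display-order distinction concrete, here is a minimal standalone sketch (not kraken's actual code path) covering just the pure-RTL case mentioned above, where display order is simply the reverse of logical order. The helper name `to_display_order` is hypothetical; mixed-direction text would need a full Unicode BiDi implementation such as the python-bidi package.

```python
import unicodedata

def to_display_order(line: str) -> str:
    """Reorder a *pure RTL* line from logical (reading) order to display order.

    Hypothetical helper for illustration only: for text containing only
    right-to-left characters (plus neutrals), display order is just the
    reverse of logical order. Mixed RTL/LTR text needs the full Unicode
    BiDi algorithm (e.g. the python-bidi package).
    """
    # Plain reversal is wrong for strongly left-to-right characters,
    # so refuse mixed-direction input instead of silently mangling it.
    if any(unicodedata.bidirectional(ch) == "L" for ch in line):
        raise ValueError("mixed-direction text needs the full BiDi algorithm")
    return line[::-1]

# Hebrew "shalom" in logical order; the result is the reverse of the
# input's code point sequence.
print(to_display_order("שלום"))
```

A transcription preprocessed this way could then be fed to a tool that assumes left-to-right label sequences, which is the trick described above for clstmocrtrain.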
My answer was somewhat misleading. Sorry about that.
Recognition is exactly the same; training will be eventually. If you just create the model file using the clstm tools directly and train solely with
@wrznr: There was a bug that should be fixed now. Apparently, it is a good idea to not just write code but even commit and push it sometimes.
What's the reason for using clstm for training instead of doing it in Python like ocropy does?
The main reasons are speed, flexibility, and potentially GPU acceleration. Recognition and training are around twice as fast, models are smaller and serialize faster, and CLSTM throws away the peephole connections, which are just unneeded parameter ballast. Real flexibility in net architectures would of course require a generic machine learning framework like TensorFlow, but those have a bunch of drawbacks, such as exorbitant model compile times and unusable serializations.
Let's see if I got that correctly: 1. I initialize a model file using CLSTM. 2. I run
Yes, that is basically it. The only part that isn't possible through the Python bindings is initializing the codec.
@mittagessen Is it necessary that I use exactly the same files (i.e. the same alphabet) for steps 1 (initialization) and 2 (ketos training)? The term "codec" somehow suggests it.
Yes, that is the whole point.
@mittagessen I have seen the training documentation for kraken and it's still not clear to me.
@mittagessen your replies regarding the training process are still confusing to me. Could you please provide a video showing how to train a new model and how to use the kraken training interface?
@amitdo commented here on the specific RTL support in kraken. Since I am unsuccessfully training OCR models for Hebrew with ocropy, I wonder if kraken could do the job. Can anyone introduce me to the details of kraken's RTL support? I could not find the related information in the documentation. Many thanks in advance!