Tesseract OCR training data for Danish written in fraktur script and a few other languages
Latest commit aba6e6b Aug 28, 2014 @paalberti Merge pull request #2 from zdenop/master
rename lang name to tesseract traindata pattern

README

Various training data files for Tesseract OCR (version 3.02)

* dan_frak/: Danish written in fraktur script (orthography prior to ca. 1867).
* deu_frak/: German written in fraktur script.
* swe_frak/: Swedish written in fraktur script. The wordlists for Swedish are from 
Projekt Runeberg, http://runeberg.org/words/
* dan/: a slightly modified version of the Danish .traineddata shipped with upstream Tesseract,
changed so that it does not constantly output fi and fl ligatures. As of Tesseract 3.02 this
is outdated; if you have this problem, upgrading Tesseract is the better solution.

The *_frak/ directories each contain a primitive script that compiles the data files; it only
works on Unix-like machines. If you aren't interested in training Tesseract yourself, just
find the *.traineddata file that is relevant for your language, save it to your Tesseract
installation's data directory, and you should be ready for OCR.
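
The install step described above could be sketched as a small shell helper. This is only an illustration, not part of the repository: the function name install_traineddata is hypothetical, and the file name and tessdata path in the comments are assumptions that vary between systems and Tesseract packages.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this repository): copy a
# .traineddata file into a Tesseract data directory.
# Usage: install_traineddata FILE TESSDATA_DIR
install_traineddata() {
    src=$1
    dest=$2
    # Only accept files that follow the *.traineddata naming pattern.
    case "$src" in
        *.traineddata) ;;
        *) echo "not a .traineddata file: $src" >&2; return 1 ;;
    esac
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    cp "$src" "$dest/"
}

# Example use. Both paths are assumptions; adjust them to your system:
#   install_traineddata dan_frak/dan_frak.traineddata /usr/share/tesseract-ocr/tessdata
#   tesseract page.png out -l dan_frak    # writes the recognized text to out.txt
```

The language code passed to -l is the file name without the .traineddata suffix, so a file named dan_frak.traineddata would be selected with -l dan_frak.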