Tesseract ocr training data for Danish written in fraktur script and a few other languages
paalberti/tesseract-dan-fraktur
Various training data files for Tesseract OCR (version 3.02)

* dan_frak/: Danish written in fraktur script (orthography prior to ca. 1867).
* deu_frak/: German written in fraktur script.
* swe_frak/: Swedish written in fraktur script. The wordlists for Swedish are from Projekt Runeberg, http://runeberg.org/words/
* dan/: slightly manipulated version of the Danish .traineddata shipped with upstream Tesseract, changed so that it does not output annoying fi- and fl-ligatures all the time. Since Tesseract version 3.02, this has become outdated. If you have this problem, upgrading Tesseract is the better solution.

The *_frak/ directories have a primitive script to compile the data files that only works on Unix-like machines.

If you aren't interested in working on training Tesseract yourself, just find the *.traineddata that is relevant for your language, save it to your Tesseract installation's data directory and you should be ready for OCR.
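The install step described above can be sketched in Python. This is only an illustration: the helper name `install_traineddata` is hypothetical, the location of the data directory depends on your installation, and the sketch assumes the file is named after its language (for example `dan_frak.traineddata`), since Tesseract selects a language by the file name without the extension.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata: Path, tessdata_dir: Path) -> str:
    """Copy a .traineddata file into tessdata_dir and return its language code.

    Hypothetical helper: both paths are supplied by the caller because the
    data directory differs between installations.
    """
    if traineddata.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {traineddata}")
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {tessdata_dir}")
    # Keep the original file name; Tesseract looks the language up by name.
    shutil.copy2(traineddata, tessdata_dir / traineddata.name)
    # The language code passed to `tesseract -l` is the name without extension.
    return traineddata.stem
```

The returned code is what you pass to the `-l` option, for example `tesseract page.png out -l dan_frak`. Writing to a system-wide data directory may require administrator rights.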