Tesseract OCR training data for Danish written in fraktur script, plus a few other languages

paalberti/tesseract-dan-fraktur

Various training data files for Tesseract OCR (version 3.02)

* dan_frak/: Danish written in fraktur script (orthography prior to ca. 1867).
* deu_frak/: German written in fraktur script.
* swe_frak/: Swedish written in fraktur script. The wordlists for Swedish are from 
Projekt Runeberg, http://runeberg.org/words/
* dan/: a slightly modified version of the Danish .traineddata shipped with upstream Tesseract,
changed so that it does not output fi and fl ligatures all the time. As of Tesseract 3.02 this
is outdated; if you have this problem, upgrading Tesseract is the better solution.
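
If upgrading is not possible, the ligature problem can also be worked around after
recognition. The following is a minimal sketch and not part of this repository; it assumes
the unwanted characters are the Unicode ligatures U+FB01 and U+FB02, and the function name
`expand_ligatures` is made up for illustration.

```python
# Hypothetical post-processing helper (not part of this repository).
# Assumption: the OCR output contains the Unicode ligature characters
# U+FB01 (fi) and U+FB02 (fl), which should become plain letter pairs.
LIGATURES = str.maketrans({
    "\ufb01": "fi",
    "\ufb02": "fl",
})


def expand_ligatures(text: str) -> str:
    """Replace fi/fl ligature characters with their two-letter spellings."""
    return text.translate(LIGATURES)
```

Running the OCR output through `expand_ligatures` leaves all other characters untouched.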

Each *_frak/ directory contains a primitive script that compiles the data files; it only works
on Unix-like machines. If you aren't interested in training Tesseract yourself, just find the
*.traineddata file that is relevant for your language, save it to your Tesseract
installation's data directory, and you should be ready for OCR.
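
The installation step above could be scripted roughly as follows. This is a sketch under
several assumptions: the file name `dan_frak.traineddata`, the data directory path, and the
helper names are placeholders, and the location of the data directory (usually called
`tessdata`) varies between installations. The only Tesseract interface used is the command
line form `tesseract <image> <outputbase> -l <lang>`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def install_traineddata(traineddata: Path, tessdata_dir: Path) -> Path:
    """Copy a .traineddata file into the given Tesseract data directory."""
    if traineddata.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {traineddata}")
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    target = tessdata_dir / traineddata.name
    shutil.copyfile(traineddata, target)
    return target


def ocr_command(image: Path, output_base: Path, lang: str) -> List[str]:
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def run_ocr(image: Path, output_base: Path, lang: str) -> None:
    """Run Tesseract; it writes the recognized text to <output_base>.txt."""
    subprocess.run(ocr_command(image, output_base, lang), check=True)


if __name__ == "__main__":
    # Placeholder paths; adjust them to your own checkout and installation.
    install_traineddata(
        Path("dan_frak/dan_frak.traineddata"),
        Path("/usr/share/tesseract-ocr/tessdata"),
    )
    run_ocr(Path("page.png"), Path("page"), "dan_frak")
```

The language code passed with `-l` has to match the file name without its `.traineddata`
extension, so check the actual name of the file you copied.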
