Latin OCR Training for Tesseract

Produces: lat.traineddata

To build it, you need wget, unzip, and the Tesseract training tools.
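
As a quick sanity check before building, a sketch like the following can report which prerequisites are missing from your PATH. wget and unzip are the tools named above; the Tesseract training tool names listed here (text2image, unicharset_extractor, wordlist2dawg, combine_tessdata) are an assumption about which binaries the build invokes, and missing_tools is a hypothetical helper, so adjust the list to match what the build actually calls.

```shell
# Hypothetical helper: print each given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# wget and unzip come from this README; the Tesseract training tool
# names below are assumed, not taken from the build itself.
missing=$(missing_tools wget unzip \
    text2image unicharset_extractor wordlist2dawg combine_tessdata)

if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

If any names are reported, install them through your system's package manager or build the Tesseract training tools from source before continuing.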

The following files were generated automatically using the tools in
the lattraining git repository:

- training_text.txt
- lat.word.txt
- lat.freq.txt
- lat.unicharambigs

You can see the exact process for generating them in the lattraining

The Latin.unicharset file has been copied from Tesseract's
tesseract-ocr.langdata git repository.