GitHub - ryanfb/latinocr-lat: 'lat' repository, forked from https://github.com/ryanfb/ancientgreekocr-grc. The final training process for lat.traineddata

Repository contents:

- latinocr-lattraining @ 24a812b
- tools
- unicharambigs
- .gitignore
- .gitmodules
- LICENSE
- Makefile
- README
- font_properties
- lat.config
- lat.numbers.txt
- lat.punc.txt
- training_text.txt

Latin OCR Training for Tesseract
================================

Produces: lat.traineddata

You need wget, unzip, and the Tesseract training tools to build this
training data.
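
As a rough sketch, the prerequisites could be checked before building
with something like the script below. The helper name missing_tools is
hypothetical, and the Tesseract tool names are an assumption based on
the Tesseract 3.x training tools; this README names only wget and unzip
explicitly, so adjust the list to whatever the Makefile actually calls.

```shell
#!/bin/sh
# Print each named command that is not found on PATH, one per line.
# (missing_tools is a hypothetical helper, not part of this repository.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# wget and unzip come from this README. The remaining names are assumed
# from the Tesseract 3.x training tools and may differ on your system.
missing=$(missing_tools wget unzip unicharset_extractor mftraining cntraining combine_tessdata)

if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on a single line.
    echo "Missing tools:" $missing
else
    echo "All checked prerequisites were found."
fi
```

The script only reports what is missing; it does not install anything
or start the build.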

The following files have been automatically generated using the
tools in the lattraining git repository located at
  https://github.com/ryanfb/latinocr-lattraining

- training_text.txt
- lat.word.txt
- lat.freq.txt
- lat.unicharambigs

You can see the exact process for generating them in the lattraining
Makefile.

The Latin.unicharset file has been copied from Tesseract's
tesseract-ocr.langdata git repository.