Rules and tools to deterministically generate all prerequisites for the final training process. Adapted from https://github.com/ryanfb/ancientgreekocr-grctraining/
tools
LICENSE
Makefile
Makefile.pleiades
README
allchars.txt
charsforambigs.txt
lat.pleiades.word.txt
seed

README

Source files for some automatically generated parts of the
Latin (lat) training for Tesseract OCR. Specifically, this repository
contains the Makefile and the prerequisites it uses to build the
following files needed for the lat training:

- training_text.txt
- lat.word.txt
- lat.freq.txt
- lat.unicharambigs
- lat.wordlist


# Dependencies

On a Mac with Homebrew, install the coreutils and gnu-sed packages,
which provide gsed, gmktemp and gshuf (the GNU versions of sed,
mktemp and shuf) used by the build.
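After installing the packages (for example with `brew install coreutils gnu-sed`), you can confirm that the g-prefixed commands are on your PATH with a small shell check. This is a minimal sketch: `check_gnu_tools` is a hypothetical helper and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): report which of the
# given commands cannot be found on PATH.
check_gnu_tools() {
  missing=""
  for tool in "$@"; do
    # command -v prints nothing and fails when the command is not found
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all tools found"
  return 0
}
```

Run it as `check_gnu_tools gsed gmktemp gshuf`. It prints the names of any commands that are absent and returns a non-zero status in that case.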

# To build the training parts

Note that the build starts by downloading and unpacking a text
corpus from which to generate the wordlists.
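To give a rough idea of what deriving a wordlist from a corpus involves, here is a minimal shell sketch that turns plain text on standard input into a word frequency list using `tr`, `sort` and `uniq`. It is an illustration only: `word_freqs` is a hypothetical function, it is not the rule the Makefile actually uses, and its "count word" output is not claimed to match the format of lat.freq.txt. Note that GNU `tr` works byte by byte, so non-ASCII letters (for example vowels with macrons) are treated as separators.

```shell
#!/bin/sh
# Illustrative sketch only, NOT this repository's actual Makefile rule.
# word_freqs is a hypothetical helper: it reads plain text on stdin and
# prints "count word" lines, most frequent word first.
word_freqs() {
  # 1. tr: turn every run of non-letters into one newline (one word per line)
  # 2. grep: drop the empty line produced by leading separators
  # 3. sort | uniq -c: count occurrences of each distinct word
  # 4. sort: order by count (descending), then by word
  tr -cs '[:alpha:]' '\n' | grep -v '^$' | sort | uniq -c | sort -k1,1nr -k2,2
}
```

For example, `printf 'arma virumque cano\n' | word_freqs` prints one line per distinct word, each preceded by its count.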

Make all of the parts with the command:
  make
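Once `make` finishes, a quick sanity check is to confirm that each of the files listed at the top of this README exists and is non-empty. The sketch below assumes the outputs are written to the directory in which `make` was run; `check_outputs` is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical post-build check (not part of this repository's Makefile).
# Assumes the outputs land in the current directory; reports any expected
# lat training file that is missing or empty.
check_outputs() {
  status=0
  for f in training_text.txt lat.word.txt lat.freq.txt \
           lat.unicharambigs lat.wordlist; do
    # [ -s FILE ] is true only if FILE exists and is larger than zero bytes
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `make && check_outputs` builds everything and then lists any output that did not get produced.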