Source files for some of the automatically generated parts of the
Latin (lat) training data for Tesseract OCR. Specifically, this repository
contains the Makefile and its prerequisites, which build the following files
needed for the lat training:

- training_text.txt
- lat.word.txt
- lat.freq.txt
- lat.unicharambigs
- lat.wordlist


# Dependencies

On a Mac with Homebrew, install coreutils (which provides gmktemp and
gshuf) and gnu-sed (which provides gsed).
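
As a rough sketch, the snippet below checks whether the three GNU tools named above are on the PATH before a build is attempted. The helper name `missing_tools` is made up for illustration and is not part of this repository; the Homebrew command in the comment assumes the standard `coreutils` and `gnu-sed` formulae.

```shell
# Hypothetical helper (not part of this repository): print each of the
# given command names that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# To install the tools on a Mac with Homebrew:
#   brew install coreutils gnu-sed

# Warn if any of the tools used by the build are missing.
missing=$(missing_tools gsed gmktemp gshuf)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all three tools were found.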

# To build the training parts

Note that the build starts by downloading and unpacking a text
corpus, from which the wordlists are generated.

Make all of the parts with the command:
  make

About

Rules and tools to deterministically generate all prerequisites for the final training process. Adapted from https://github.com/ryanfb/ancientgreekocr-grctraining/
