Rules and tools to deterministically generate all prerequisites for the final training process. Adapted from https://github.com/ryanfb/ancientgreekocr-grctraining/
Source files for some automatically generated parts of the Latin (lat) training for Tesseract OCR. Specifically, this contains the Makefile and its prerequisites to build the following files needed for the lat training:

- training_text.txt
- lat.word.txt
- lat.freq.txt
- lat.unicharambigs
- lat.wordlist

# Dependencies

On a Mac with Homebrew, install coreutils and gnu-sed (needed for gsed, gmktemp, gshuf).

# To build the training parts

Note that the build starts by downloading and unpacking a text corpus from which to generate the wordlists.

Make all of the parts with the command:

    make
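
Since the build relies on the g-prefixed GNU tools named under Dependencies, a quick pre-flight check before running `make` may save a failed build. The following is a minimal sketch, not part of this repository: `missing_tools` is a hypothetical helper name, and it only assumes a POSIX shell. On a Mac with Homebrew, the missing tools would typically be supplied by `brew install coreutils gnu-sed`.

```shell
# Hypothetical helper (not part of this repository): print one line for
# each command name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
}

# Check the GNU tools that the README lists as dependencies.
missing_tools gsed gmktemp gshuf
```

If the last command prints nothing, all three tools were found and the build can proceed.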