A collection of page scans of Latin text, each paired with a corresponding text file.
These files are intended for testing OCR quality with the tools from https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools, in particular the
The files follow a simple naming convention:
<name>.png - the page scan
<name>.txt - the correct UTF-8 encoded text corresponding to the page scan
<name>.src - a text file describing the provenance of the page scan
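Because each test item is a set of three files sharing one base name, a quick consistency check can group files by name and flag any item that is missing a scan, its ground-truth text, or its provenance note. The following is a minimal sketch, assuming all files sit together in one flat directory; the function names (collect_items, incomplete_items, load_ground_truth) are hypothetical and are not part of the OCR evaluation tools mentioned above.

```python
from pathlib import Path

# Hypothetical helper: the three extensions that make up one complete test item,
# following the <name>.png / <name>.txt / <name>.src convention.
REQUIRED_EXTS = {".png", ".txt", ".src"}


def collect_items(directory):
    """Group files in a flat directory by base name: {name: set_of_extensions}."""
    items = {}
    for path in Path(directory).iterdir():
        if path.is_file() and path.suffix in REQUIRED_EXTS:
            items.setdefault(path.stem, set()).add(path.suffix)
    return items


def incomplete_items(directory):
    """Return {name: missing_extensions} for every base name lacking one of the three files."""
    return {
        name: REQUIRED_EXTS - exts
        for name, exts in collect_items(directory).items()
        if exts != REQUIRED_EXTS
    }


def load_ground_truth(directory, name):
    """Read <name>.txt explicitly as UTF-8, the encoding the ground-truth files use."""
    return (Path(directory) / f"{name}.txt").read_text(encoding="utf-8")
```

Decoding the .txt files explicitly as UTF-8, rather than relying on the platform default encoding, avoids mangling characters that are not plain ASCII when the ground truth is compared against OCR output.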