A collection of Latin page scans and their corresponding text files.
These files are designed for testing OCR quality with the tools from
https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools,
in particular the tessaccsummary script.
The naming of the files is quite straightforward:
<name>.png
 - the page scan
<name>.txt
 - the correct UTF-8 encoded text corresponding to the page scan
<name>.src
 - a text file describing the provenance of the page scan
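
The naming scheme above lends itself to a quick completeness check before
running an evaluation. The following is a minimal sketch in Python, not part
of the ocr-evaluation-tools; the function name find_incomplete_sets is
hypothetical, and it assumes all three files for a page sit in the same
directory.

```python
from pathlib import Path

# The three suffixes described in the naming scheme above.
EXPECTED_SUFFIXES = (".png", ".txt", ".src")


def find_incomplete_sets(directory):
    """Return {name: [missing suffixes]} for every <name> lacking a file.

    Hypothetical helper for illustration; assumes a flat directory layout.
    """
    found = {}
    for path in Path(directory).iterdir():
        if path.is_file() and path.suffix in EXPECTED_SUFFIXES:
            # Group the suffixes seen so far under the shared <name>.
            found.setdefault(path.stem, set()).add(path.suffix)
    return {
        name: [s for s in EXPECTED_SUFFIXES if s not in suffixes]
        for name, suffixes in sorted(found.items())
        if len(suffixes) < len(EXPECTED_SUFFIXES)
    }
```

Calling find_incomplete_sets(".") in the directory holding the scans returns
an empty dictionary when every page has its scan, ground-truth text, and
provenance file.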