ryanfb/latinocr-lattestfodder

A collection of Latin page scans and their corresponding text files.

These files are intended for testing OCR quality with the tools from https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools, in particular the tessaccsummary script.

Each page is represented by three files that share a base name:

  • <name>.png - the page scan
  • <name>.txt - the correct UTF-8 encoded text corresponding to the page scan
  • <name>.src - a text file describing the provenance of the page scan
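The naming convention above lends itself to a quick consistency check before running an evaluation. The following is a minimal sketch, not part of this repository or of the OCR evaluation tools: the function names `collect_pages` and `incomplete_pages` are hypothetical, and it assumes all files sit in a single directory.

```python
from pathlib import Path

# The three extensions described in the naming convention above.
EXTENSIONS = (".png", ".txt", ".src")


def collect_pages(directory):
    """Map each base name to the set of known extensions found for it."""
    pages = {}
    for path in Path(directory).iterdir():
        if path.is_file() and path.suffix in EXTENSIONS:
            pages.setdefault(path.stem, set()).add(path.suffix)
    return pages


def incomplete_pages(directory):
    """Return {base name: sorted list of missing extensions} for incomplete pages."""
    missing = {}
    for name, found in collect_pages(directory).items():
        absent = sorted(set(EXTENSIONS) - found)
        if absent:
            missing[name] = absent
    return missing
```

Calling `incomplete_pages(".")` from the repository root should return an empty dictionary if every scan has both its text file and its provenance file.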
