README.md

A collection of Latin page scans and their corresponding ground truth text files.

These files are designed for use in testing OCR quality, using the tools from https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools, in particular the tessaccsummary script.

Files that belong to the same page share a base name and differ only in extension:

  • <name>.png - the page scan
  • <name>.txt - the correct UTF-8 encoded text corresponding to the page scan
  • <name>.src - a text file describing the provenance of the page scan
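
The naming convention above makes it easy to pair each scan with its ground truth. The following is a minimal sketch, not part of this repository or of the ocr-evaluation-tools: the function names `collect_samples` and `read_ground_truth` are hypothetical, and it assumes all three files for a page sit in the same directory.

```python
from pathlib import Path

# The three extensions described in the naming convention.
EXTENSIONS = (".png", ".txt", ".src")


def collect_samples(root):
    """Map each base name to its .png/.txt/.src paths.

    Only pages for which all three files exist are returned
    (an assumption of this sketch, not a rule of the repository).
    """
    root = Path(root)
    samples = {}
    for png in sorted(root.glob("*.png")):
        paths = {ext: png.with_suffix(ext) for ext in EXTENSIONS}
        if all(p.is_file() for p in paths.values()):
            samples[png.stem] = paths
    return samples


def read_ground_truth(sample):
    """Return the UTF-8 ground truth text for one sample."""
    return sample[".txt"].read_text(encoding="utf-8")
```

Calling `collect_samples(".")` from the repository root would return one entry per complete page, which can then be passed to whatever OCR comparison you run.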

About

Latin page scans and ground truth text for testing OCR accuracy.
