ocrevalUAtion

This set of classes provides basic support for comparing two text files: a reference file (a ground-truth document) and the output of an OCR engine (a text file).

Options for specific behavior include ignoring case, diacritics, punctuation and stop-words, as well as applying Unicode and user-defined equivalences between characters.
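
To illustrate what options such as "ignore case", "ignore diacritics" and "ignore punctuation" mean in practice, here is a minimal Python sketch of text normalization applied before comparison. It is not the tool's own code: the function name `normalize` and its parameters are hypothetical, and ocrevalUAtion may implement these options differently.

```python
import unicodedata


def normalize(text, ignore_case=False, ignore_diacritics=False,
              ignore_punctuation=False):
    """Hypothetical helper: normalize text before comparing it with OCR output."""
    if ignore_case:
        # casefold() is a more aggressive, Unicode-aware lower().
        text = text.casefold()
    if ignore_diacritics:
        # Decompose characters, drop combining marks, then recompose.
        decomposed = unicodedata.normalize("NFD", text)
        stripped = "".join(ch for ch in decomposed
                           if not unicodedata.combining(ch))
        text = unicodedata.normalize("NFC", stripped)
    if ignore_punctuation:
        # Unicode general categories starting with "P" are punctuation.
        text = "".join(ch for ch in text
                       if not unicodedata.category(ch).startswith("P"))
    return text


if __name__ == "__main__":
    print(normalize("Café, OLÉ!", ignore_case=True,
                    ignore_diacritics=True, ignore_punctuation=True))
```

Applying the same normalization to both the reference and the OCR output keeps the comparison symmetric, so that only the differences the user cares about are counted as errors.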

It can be used through the provided graphical user interface (GUI) or from the command line.

Supported input formats include: plain text, FineReader 10 XML, PAGE XML, ALTO XML and hOCR HTML.

The tool generates a report with statistics (including the character error rate, CER, and the word error rate, WER) and a table showing the two input texts in parallel, with the differences highlighted.
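
As a rough illustration of the CER and WER figures, the following Python sketch uses a common definition: the Levenshtein edit distance between reference and OCR output, divided by the length of the reference, counted in characters for CER and in whitespace-separated words for WER. This is an assumption made for the example; the exact formulas and tokenization used by ocrevalUAtion may differ, and the function names below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def cer(reference, ocr_output):
    """Character error rate: units are individual characters."""
    return error_rate(list(reference), list(ocr_output))


def wer(reference, ocr_output):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), ocr_output.split())


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the qu1ck brown f0x"
    print(f"CER = {cer(truth, ocr):.3f}")
    print(f"WER = {wer(truth, ocr):.3f}")
```

In this example two misrecognized characters affect two of the four words, so the WER is much higher than the CER. This is why the two rates are usually reported together.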

A gentle introduction to OCR evaluation and to this tool can be found at https://sites.google.com/site/textdigitisation/

You can download the latest release from here.

Instructions on how to use ocrevalUAtion can be found in the wiki.

About

OCR evaluation brought to you by University of Alicante
