How to use ocrevalUAtion

Mike Gerber edited this page Jul 4, 2019 · 14 revisions

Once you have installed a version of ocrevaluation.jar, you can run it as follows:

java -cp ocrevaluation.jar eu.digitisation.Main \
    -gt {ground_truth_file} [{encoding}] \
    -ocr {ocr_file} [{encoding}] \
    -d {output_directory} [-r {equivalences_file}] 

Where:

  • {ground_truth_file} = the full path to a ground truth file. Supported formats: plain text, PAGE XML.

  • {ocr_file} = the full path to an OCR result file. Supported formats: plain text, PAGE XML, FineReader10 XML, hOCR HTML.

  • {output_directory} = the folder where the report (HTML format) will be generated.

  • {encoding} = the character encoding of the file named immediately before it (optional), for example utf8.

  • {equivalences_file} = an optional text file that defines equivalences between Unicode characters. Each line holds two sequences of hexadecimal code points, separated by a comma.
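To make the equivalences file format more concrete, here is a minimal sketch that prints a two-line sample file. It assumes each sequence is a single code point written as bare hexadecimal digits (no `U+` or `0x` prefix); how several code points within one sequence are delimited is not stated above, so the sketch avoids that case. The function name `write_equivalences` is hypothetical and not part of ocrevalUAtion.

```shell
# Hypothetical helper: print a sample equivalences file to standard output.
# Assumption: one code point per sequence, bare hexadecimal, comma-separated.
# Line 1 pairs U+017F (long s) with U+0073 (s).
# Line 2 pairs U+00A0 (no-break space) with U+0020 (space).
write_equivalences() {
    cat <<'EOF'
017F,0073
00A0,0020
EOF
}
```

Redirecting the output, for example `write_equivalences > equivalences.csv`, would produce a file that can be passed with `-r`.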

Example:

java -cp ocrevaluation.jar eu.digitisation.Main \
    -gt groundtruth.xml -ocr ocr.txt utf8 \
    -d output -r equivalences.csv
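
When many pages need evaluating, the command above can be wrapped in a small script. The following is only a sketch under stated assumptions: the function names `run_ocreval` and `run_ocreval_batch`, the variables `JAVA` and `OCREVAL_JAR`, and the layout of matching `NAME.txt` files in two directories are all hypothetical. Each pair gets its own output directory, since this page does not say whether several reports can share one.

```shell
# Hypothetical helper: evaluate one ground-truth/OCR pair.
# $1 = ground truth file, $2 = OCR file, $3 = output directory.
# JAVA and OCREVAL_JAR are assumed override variables, not options of the tool.
run_ocreval() {
    "${JAVA:-java}" -cp "${OCREVAL_JAR:-ocrevaluation.jar}" eu.digitisation.Main \
        -gt "$1" -ocr "$2" -d "$3"
}

# Hypothetical batch loop: pairs GT_DIR/NAME.txt with OCR_DIR/NAME.txt
# and writes each report to REPORT_ROOT/NAME.
# $1 = GT_DIR, $2 = OCR_DIR, $3 = REPORT_ROOT.
run_ocreval_batch() {
    for gt_file in "$1"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$gt_file" ] || continue
        name=$(basename "$gt_file" .txt)
        if [ ! -e "$2/$name.txt" ]; then
            echo "skipping $name: no matching OCR file" >&2
            continue
        fi
        mkdir -p "$3/$name"
        run_ocreval "$gt_file" "$2/$name.txt" "$3/$name"
    done
}
```

A call such as `run_ocreval_batch gt ocr reports` would then run one evaluation per page. Setting `JAVA=echo` beforehand prints the arguments instead of launching Java, which is a convenient dry run.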