WordSequenceAligner is a Java class that aligns two string sequences
and calculates metrics such as word error rate (WER). Pretty-printing
enables human-readable logging of alignments and metrics.
This class is intended to reproduce the main functionality of the
NIST sclite tool. The Sphinx 4 source for the class
edu.cmu.sphinx.util.NISTAlign was referenced when writing this class.
Feedback and bug fixes are welcome.
This code is licensed under one of the BSD variants; see
LICENSE.txt for full details.
Example usage:

    WordSequenceAligner werEval = new WordSequenceAligner();
    String[] ref = "the quick brown cow jumped over the moon".split(" ");
    String[] hyp = "quick brown cows jumped way over the moon dude".split(" ");

    Alignment a = werEval.align(ref, hyp);
    System.out.println(a);
Produces the output:

           # seq   # ref   # hyp   # cor   # sub   # ins   # del   acc     WER     # seq cor
    STATS: 1       8       9       6       1       2       1       0.75    0.5     0
    -----  -----   -----   -----   -----   -----   -----   -----   -----   -----   -----
    REF:   THE     quick   brown   COW     jumped  ***     over    the     moon    ****
    HYP:   ***     quick   brown   COWS    jumped  WAY     over    the     moon    DUDE

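The top portion of the output shows the statistics for the given
pair of reference/hypothesis sentences, and the lower portion
displays the alignment visually. Words that differ are printed in
upper case, and asterisks mark the gap left by an inserted or
deleted word.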
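
The WER value in the STATS line matches the standard definition: the number of substitutions, insertions and deletions divided by the number of reference words, here (1 + 2 + 1) / 8 = 0.5. The acc value is likewise consistent with correct words divided by reference words, 6 / 8 = 0.75. The following is a minimal standalone sketch of that calculation using a word-level edit distance. It is not the code of WordSequenceAligner; the class name WerSketch and its methods are hypothetical and exist only for illustration.

```java
// Hypothetical illustration only: not part of WordSequenceAligner.
class WerSketch {

    /**
     * Word-level Levenshtein distance: the minimum number of
     * substitutions, insertions and deletions needed to turn ref into hyp.
     */
    public static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];

        // Turning a prefix of ref into an empty hypothesis deletes every word.
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i;
        }
        // Turning an empty reference into a prefix of hyp inserts every word.
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j;
        }

        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitutionCost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                int viaSubstitution = d[i - 1][j - 1] + substitutionCost;
                int viaDeletion = d[i - 1][j] + 1;
                int viaInsertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(viaSubstitution, Math.min(viaDeletion, viaInsertion));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** Word error rate: total errors divided by the number of reference words. */
    public static double wer(String[] ref, String[] hyp) {
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        String[] ref = "the quick brown cow jumped over the moon".split(" ");
        String[] hyp = "quick brown cows jumped way over the moon dude".split(" ");

        System.out.println("errors = " + editDistance(ref, hyp)); // prints "errors = 4"
        System.out.println("WER = " + wer(ref, hyp));             // prints "WER = 0.5"
    }
}
```

This sketch reports only the total error count. Splitting that total into separate substitution, insertion and deletion counts, as the STATS line does, would additionally require tracing back through the distance table.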