Fix word boundaries in ctm dumps of lattices with subsampling #72

SimBe195 · 2024-03-20T14:22:46Z

During recognition with time-synchronous decoding, the traceback stores word boundaries as integers that index output time frames of the model. These word boundaries are then written into the lattice.

Later, when the lattice is dumped to a .ctm file, these boundaries are divided by 100 to obtain the start and end times as well as the durations of words. This assumes that the frame shift on the output time axis is 1/100 seconds (i.e. 10 ms), which is wrong when the model performs subsampling (or if one were to use feature extraction with a different frame shift). The result is wrong word boundaries in the .ctm file.

This PR replaces the division by 100.0 with a multiplication by a configurable frame shift.
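
For illustration, here is a minimal Python sketch of the conversion. It is not the actual change in this PR: the names `Word`, `ctm_times`, `format_ctm_line` and the `frame_shift` parameter are hypothetical, and the CTM line layout (recording, channel, start, duration, word) is assumed to follow the common NIST convention. The example assumes a model with 4x subsampling on 10 ms features, so one output frame spans 40 ms.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word with boundaries given as output frame indices (hypothetical type)."""
    token: str
    start_frame: int
    end_frame: int


def ctm_times(word: Word, frame_shift: float = 0.01) -> tuple[float, float]:
    """Convert frame-index boundaries to (start, duration) in seconds.

    frame_shift is the duration of one output frame in seconds. The default
    of 0.01 reproduces the old hard-coded "/ 100.0" behaviour.
    """
    start = word.start_frame * frame_shift
    duration = (word.end_frame - word.start_frame) * frame_shift
    return start, duration


def format_ctm_line(recording: str, channel: int, word: Word,
                    frame_shift: float = 0.01) -> str:
    """Format one CTM-style line: recording, channel, start, duration, word."""
    start, duration = ctm_times(word, frame_shift)
    return f"{recording} {channel} {start:.3f} {duration:.3f} {word.token}"


if __name__ == "__main__":
    word = Word("hello", start_frame=50, end_frame=75)
    # Old behaviour: assumes 10 ms per output frame.
    print(format_ctm_line("rec1", 1, word))
    # Assumed 4x subsampling on 10 ms features: 40 ms per output frame.
    print(format_ctm_line("rec1", 1, word, frame_shift=0.04))
```

With the default frame shift, the word would be placed at 0.5 s, while under the assumed 40 ms output frame shift it starts at 2.0 s. This is the kind of discrepancy the PR addresses.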

Simon Berger added 2 commits March 20, 2024 15:07

Option to configure frame shift in traceback ctm dump

e9bd0ac

Remove leftover cmake definition

907107c

SimBe195 requested review from curufinwe, mmz33 and AtanasGruev March 20, 2024 14:23

curufinwe approved these changes Mar 22, 2024

SimBe195 merged commit a942e39 into master Mar 27, 2024

SimBe195 deleted the subsampled_wordboundaries_ctm_fix branch March 27, 2024 10:13

SimBe195 added a commit that referenced this pull request Jul 31, 2024

Fix word boundaries in ctm dumps of lattices with subsampling (#72)

c8efd80

Marvin84 pushed a commit that referenced this pull request Oct 9, 2024

Fix word boundaries in ctm dumps of lattices with subsampling (#72)

e4e0bba

Marvin84 pushed a commit that referenced this pull request Oct 10, 2024

Fix word boundaries in ctm dumps of lattices with subsampling (#72)

f76977c
