Nnet3 online decoder endpointing doesn't use frame subsampling rate #1184

alumae · 2016-11-12T12:13:55Z

I believe the endpointing code, originally developed for nnet2, doesn't take into account the frame subsampling rate used in chain models. Thus, when using chain models, the silence needs to be 3x as long as it should be before it is identified as an endpoint.
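To illustrate the effect described above, here is a minimal sketch, not Kaldi's actual endpointing code. The function names and the single trailing-silence rule are hypothetical simplifications. It assumes a 10 ms feature frame shift and a frame subsampling factor of 3, so each decoded frame covers 30 ms of audio.

```python
def trailing_silence_seconds(num_silence_frames, frame_shift=0.01,
                             frame_subsampling_factor=1):
    """Convert a count of trailing silence frames (as seen by the decoder)
    into seconds of audio. Hypothetical helper, for illustration only."""
    return num_silence_frames * frame_shift * frame_subsampling_factor


def is_endpoint(num_silence_frames, min_trailing_silence,
                frame_shift=0.01, frame_subsampling_factor=1):
    """Simplified single-rule endpoint check: fire once the trailing
    silence, measured in seconds, reaches the threshold."""
    silence = trailing_silence_seconds(
        num_silence_frames, frame_shift, frame_subsampling_factor)
    return silence >= min_trailing_silence


# 20 decoded frames of silence from a chain model cover about 0.6 s of audio.
# Ignoring the subsampling factor treats them as only about 0.2 s, so a
# 0.5 s threshold is not reached until three times as much silence has passed.
buggy = is_endpoint(20, min_trailing_silence=0.5)
fixed = is_endpoint(20, min_trailing_silence=0.5, frame_subsampling_factor=3)
```

With these assumed values, `buggy` is `False` and `fixed` is `True`, which matches the 3x delay reported here.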

ognjentodic · 2016-11-12T16:11:48Z

Yes, same for the timings (e.g. word start times); they all need to be adjusted by frame subsampling factor. I "solved" this at the higher level (outside Kaldi); didn't see an easy way to access this config param as is.

danpovey · 2016-11-12T19:20:28Z

Tanel, do you have time to work on a fix for the endpointing issue? It
does seem closer to a bug than a mis-feature, because those times are
expressed in seconds, not in frames.

Ognjen: regarding things like word start times, I'm not quite sure what
tools you are referring to, but things like lattice-to-ctm-conf take a
--frame-shift parameter that should be set to 0.03 for chain systems.
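As a rough sketch of where the 0.03 value comes from: it assumes the usual 10 ms feature frame shift and a frame subsampling factor of 3, so one decoder output frame spans 30 ms. The helper names below are hypothetical and are not part of Kaldi.

```python
def effective_frame_shift(feature_frame_shift=0.01, frame_subsampling_factor=3):
    """Seconds of audio covered by one decoder output frame."""
    return feature_frame_shift * frame_subsampling_factor


def frame_to_seconds(frame_index, frame_shift):
    """Time of a decoder frame index under a given frame shift."""
    return frame_index * frame_shift


# A word that starts at decoder frame 100 of a chain model:
wrong_start = frame_to_seconds(100, 0.01)                     # default shift
right_start = frame_to_seconds(100, effective_frame_shift())  # chain shift
```

Under these assumptions, the default frame shift places the word at roughly 1 s, while the chain-aware shift places it at roughly 3 s.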

alumae · 2016-11-13T08:35:02Z

Yes, I can work on this.

Ognjen probably refers to kaldi-gstreamer-server that can output timing information. I'll fix this too.

ognjentodic · 2016-11-13T17:45:55Z

I was actually referring to methods that return timing information via number of frames (for example, CompactLatticeToWordAlignment); but, those "issues" are not of the same nature as the endpointing thresholds issue since the latter are specified in seconds (vs frames), so my comment is really a false alarm for this issue.

Perhaps just an explanation/note in various places in a method description on what the frame (rate) really means would have been useful. (and it's quite possible that's already nicely described somewhere, but I missed it)

vince62s · 2016-11-13T19:37:44Z

I quite often use the get_ctm.sh script and a modified version with lattice-to-ctm-conf.
@alumae I also use a modified version of kaldi-offline-transcriber which calls get_ctm.sh (just a reminder)

I didn't know about this --frame-shift parameter, so I still have the default 0.01 with my chain models. Why don't I get garbage, then? What is the exact impact in lattice-to-ctm-conf, for instance?

danpovey · 2016-11-13T19:41:14Z

The only impact is that the times in the ctm would be wrong. This might
affect some NIST scoring scripts, for instance.
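For a CTM that was already written with the default 0.01 frame shift from a chain model, the times could in principle be rescaled afterwards. The sketch below is a hypothetical helper, not a Kaldi tool. It assumes the standard CTM field order (utterance, channel, start, duration, word, optional confidence) and a subsampling factor of 3. Regenerating the CTM with the correct --frame-shift is likely the cleaner option.

```python
def rescale_ctm_line(line, factor=3.0):
    """Multiply the start time and duration of one CTM line by `factor`.

    Assumed field order: utterance, channel, start, duration, word,
    and optionally a confidence score, which is left unchanged.
    """
    fields = line.split()
    fields[2] = f"{float(fields[2]) * factor:.2f}"  # start time in seconds
    fields[3] = f"{float(fields[3]) * factor:.2f}"  # duration in seconds
    return " ".join(fields)


fixed_line = rescale_ctm_line("utt1 1 1.00 0.25 hello 0.98")
```

Here `fixed_line` comes out as `utt1 1 3.00 0.75 hello 0.98`: only the two time fields change.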

alumae pushed a commit to alumae/kaldi that referenced this issue Nov 15, 2016

Fix nnet3 endpointing to correctly use frame subsampling factor (kald…

465b73c

…i-asr#1184)

alumae mentioned this issue Nov 15, 2016

Fix nnet3 endpointing to correctly use frame subsampling factor (#1184) #1196

Merged

danpovey added a commit that referenced this issue Nov 16, 2016

Merge pull request #1196 from alumae/chain-online-endpointing-fix

f66e83e

Fix nnet3 endpointing to correctly use frame subsampling factor (#1184)

alumae closed this as completed Nov 17, 2016

russlevy mentioned this issue Dec 18, 2016

segment-length and word timings off in nnet3 alumae/kaldi-gstreamer-server#62

Closed
