Nnet3 online decoder endpointing doesn't use frame subsampling rate #1184
Comments
Yes, the same goes for the timings (e.g. word start times); they all need to be adjusted by the frame subsampling factor. I "solved" this at a higher level (outside Kaldi); I didn't see an easy way to access this config parameter as-is.
Tanel, do you have time to work on a fix for the endpointing issue? It …
Ognjen: regarding things like word start times, I'm not quite sure what …
(In reply to ognjentodic, Sat, Nov 12, 2016 at 11:11 AM)
Yes, I can work on this. Ognjen is probably referring to kaldi-gstreamer-server, which can output timing information. I'll fix this too.
I was actually referring to methods that return timing information as frame counts (for example, CompactLatticeToWordAlignment). But those "issues" are not of the same nature as the endpointing-thresholds issue, since the latter are specified in seconds (not frames), so my comment is really a false alarm for this issue. Perhaps a note in the relevant method descriptions explaining what a frame (and the frame rate) really means would have been useful (and it's quite possible that's already nicely described somewhere, but I missed it).
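To make the frames-versus-seconds point concrete, here is a minimal sketch of converting frame-based word timings to seconds. It assumes the frame indices are counted at the decoder's output (subsampled) rate, so one output frame spans `frame_subsampling_factor` input feature frames. The `WordSpan`, `TimedWord` and `SpansToSeconds` names are hypothetical illustrations, not Kaldi API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical frame-based word timing, counted at the OUTPUT frame rate
// (e.g. one (word, begin_time, length) triple from a word alignment).
struct WordSpan {
  int32_t word_id;
  int32_t begin_frame;
  int32_t num_frames;
};

// Hypothetical word timing in seconds.
struct TimedWord {
  int32_t word_id;
  double start_sec;
  double duration_sec;
};

// Converts frame-based spans to seconds. With frame subsampling, the
// effective frame shift is the input (feature) shift times the factor.
std::vector<TimedWord> SpansToSeconds(const std::vector<WordSpan>& spans,
                                      double input_frame_shift_sec,
                                      int32_t frame_subsampling_factor) {
  assert(frame_subsampling_factor > 0);
  const double output_shift = input_frame_shift_sec * frame_subsampling_factor;
  std::vector<TimedWord> out;
  out.reserve(spans.size());
  for (const WordSpan& s : spans) {
    out.push_back({s.word_id, s.begin_frame * output_shift,
                   s.num_frames * output_shift});
  }
  return out;
}
```

With a 10 ms feature shift and a subsampling factor of 3, a word starting at output frame 10 begins at about 0.30 s, not 0.10 s.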
I quite often use the get_ctm.sh script, and a modified version of it with lattice-to-ctm-conf. I didn't know about this --frame-shift parameter, so I still use the default of 0.01 with my chain models. Why don't I get garbage, then? What exactly is the impact on lattice-to-ctm-conf, for instance?
The only impact is that the times in the CTM would be wrong. This might …
(In reply to vince62s, Sun, Nov 13, 2016 at 2:37 PM)
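A small sketch of why the output is not "garbage": the frame shift only scales the start time and duration columns of each CTM line, while the words themselves are unaffected. The `FormatCtmLine` helper is hypothetical (it is not how lattice-to-ctm-conf is implemented) and assumes the common `<utt> <channel> <start> <duration> <word>` layout with channel `1`.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats one CTM line from frame-based timing, using
// whatever frame shift the caller supplies (the role of --frame-shift).
std::string FormatCtmLine(const std::string& utt, const std::string& word,
                          int begin_frame, int num_frames,
                          double frame_shift_sec) {
  assert(frame_shift_sec > 0.0);
  std::ostringstream os;
  os << utt << " 1 " << std::fixed << std::setprecision(2)
     << begin_frame * frame_shift_sec << ' '
     << num_frames * frame_shift_sec << ' ' << word;
  return os.str();
}
```

For a chain model with a subsampling factor of 3, the same alignment formatted with 0.01 instead of 0.03 gives start times and durations that are 3x too small; the word sequence stays identical.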
Fix nnet3 endpointing to correctly use frame subsampling factor (#1184)
I believe the endpointing code, originally developed for nnet2, doesn't take into account the frame subsampling rate used in chain models. Thus, when using chain models (which typically use a frame subsampling factor of 3), the silence needs to be 3x as long as it should be before it is identified as an endpoint.
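The 3x effect can be reproduced with a simplified trailing-silence check. This is a hypothetical sketch in the spirit of the `min_trailing_silence` setting of Kaldi's endpoint rules, not Kaldi's endpointing code: it ignores the other rule conditions (relative cost, utterance length, whether nonsilence was seen) and assumes trailing silence is counted in output (subsampled) frames. `SimpleEndpointRule`, `RuleFires` and `OutputFrameShift` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified rule: fire once trailing silence (in seconds)
// reaches the configured threshold.
struct SimpleEndpointRule {
  double min_trailing_silence_sec;
};

// trailing_silence_frames is counted at the decoder's output frame rate, so
// frame_shift_sec must be the shift of THOSE frames.
bool RuleFires(const SimpleEndpointRule& rule, int32_t trailing_silence_frames,
               double frame_shift_sec) {
  return trailing_silence_frames * frame_shift_sec >=
         rule.min_trailing_silence_sec;
}

// Output frame shift for a model that decodes at a subsampled rate.
double OutputFrameShift(double input_frame_shift_sec,
                        int32_t frame_subsampling_factor) {
  assert(frame_subsampling_factor > 0);
  return input_frame_shift_sec * frame_subsampling_factor;
}
```

With a 0.5 s threshold, 17 output frames of a chain model correspond to roughly 0.51 s of real silence. Using the correct shift (0.01 × 3) fires the rule, while the uncorrected 0.01 s shift treats the same silence as 0.17 s and waits for at least 50 output frames (about 1.5 s).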