Use a modified ctc_topo. #209

csukuangfj · 2021-06-08T08:02:05Z

This implements the topo mentioned in
k2-fsa/k2#746 (comment)

Some examples (assuming there are three phones: a, b, c)

CAUTION: There are no mandatory blanks between the two consecutive symbols aa in aabc.
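
To make the caution concrete, here is a minimal pure-Python sketch of one way such a topology can be laid out. The arc layout, the IDs (blank = 0, a = 1, b = 2, c = 3), and the helper names `build_modified_ctc_arcs` and `decode_all` are assumptions for illustration, not code taken from this PR. It enumerates every token sequence that a frame-level label sequence can decode to.

```python
from typing import List, Set, Tuple

BLANK = 0  # assumed ID of the blank symbol
Arc = Tuple[int, int, int, int]  # (src, dst, input label, output label); output 0 = epsilon


def build_modified_ctc_arcs(num_phones: int) -> List[Arc]:
    """Assumed layout: state 0 is both start and end; state p means 'inside phone p'."""
    arcs: List[Arc] = [(0, 0, BLANK, 0)]  # blank self-loop on the shared state
    for p in range(1, num_phones + 1):
        arcs.append((0, 0, p, p))  # phone p lasting exactly one frame
        arcs.append((0, p, p, p))  # first frame of a longer occurrence of p
        arcs.append((p, p, p, 0))  # further frames of p
        arcs.append((p, 0, p, 0))  # last frame of p, back to the shared state
    return arcs


def decode_all(frames: List[int], arcs: List[Arc]) -> Set[Tuple[int, ...]]:
    """Return every output sequence accepted for the given frame labels."""
    hyps = {(0, ())}  # set of (state, output so far)
    for label in frames:
        nxt = set()
        for state, out in hyps:
            for src, dst, ilabel, olabel in arcs:
                if src == state and ilabel == label:
                    nxt.add((dst, out + ((olabel,) if olabel != 0 else ())))
        hyps = nxt
    # Only paths that end in the shared state 0 are accepted.
    return {out for state, out in hyps if state == 0}


arcs = build_modified_ctc_arcs(3)
# Frames "a a b c" with no blank decode to both "aabc" and "abc".
print(decode_all([1, 1, 2, 3], arcs))
```

Under these assumptions the frame sequence `a a b c` is accepted as both `aabc` and `abc`, which is exactly the ambiguity the caution describes.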

csukuangfj · 2021-06-08T08:07:09Z

The following shows the same example using the existing ctc_topo:
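
For comparison, the standard CTC rule can be sketched as "merge repeated labels, then drop blanks". The IDs (blank = 0, a = 1, b = 2, c = 3) and the name `standard_ctc_collapse` are assumptions for illustration; this is the textbook rule, not code from this PR.

```python
from itertools import groupby
from typing import List, Tuple

BLANK = 0  # assumed ID of the blank symbol


def standard_ctc_collapse(frames: List[int]) -> Tuple[int, ...]:
    """Merge consecutive repeats, then remove blanks (standard CTC rule)."""
    merged = [label for label, _ in groupby(frames)]
    return tuple(label for label in merged if label != BLANK)


# Without a blank, the two a's merge into one: "a a b c" gives "abc".
print(standard_ctc_collapse([1, 1, 2, 3]))
# A blank between the a's is required to get "aabc".
print(standard_ctc_collapse([1, 0, 1, 2, 3]))
```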

danpovey · 2021-06-08T08:17:14Z

Wow that was fast!
Merge when you think it makes sense, looks good to me!
(BTW at some point we should change the args of those functions to be just an integer saying the number of phones; the list input is not good because we require the list to be contiguous.)
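
A minimal sketch of the contiguity requirement mentioned above, assuming IDs start at 0 for blank. The helper name `max_token_from_list` is hypothetical and not part of any existing API; it only shows why a single integer carries the same information as a valid list.

```python
from typing import List


def max_token_from_list(tokens: List[int]) -> int:
    """Hypothetical helper: check that `tokens` is exactly 0..N and return N."""
    if sorted(tokens) != list(range(len(tokens))):
        raise ValueError("token IDs must be contiguous and start at 0 (blank)")
    return len(tokens) - 1


# A valid list [0, 1, 2, 3] is fully described by the integer 3.
print(max_token_from_list([0, 1, 2, 3]))
```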

pzelasko · 2021-06-08T11:21:22Z

Maybe I’m missing sth but if we allow no blank between repeated phones, then isn’t the blank redundant? Can we simply use a 1 state pure self loop phone topo (+ final state) instead with the same result?

danpovey · 2021-06-08T14:52:44Z

I think empirically the shared blank helps. If the nnet doesn't want to use it, it can just make it very improbable. (in LF-MMI).

xiaohui-zhang · 2021-06-10T04:39:35Z

Nice to see this non-standard topo implemented finally (as we discussed before @danpovey , Fig 1a in http://oa.ee.tsinghua.edu.cn/~ouzhijian/pdf/ctc-crf.pdf). This doesn't matter much during training (because even with the standard topo, we can use rule-based numerator FST construction rather than composing topo FST with tokenized transcripts, to lower the computation cost of numerator construction). But this significantly improves decoding speed (much smaller HLG). The WER degradation is minimal. The main issue is that words like "met" and "meet" will be more confusable during both training and decoding.

@pzelasko Yeah, having a shared blank is important regarding training performance, especially when we use specAug (we can achieve similar effects by allowing skippable silence phones within a word, which is hacky for HMM). But silence/HMM still has its advantage when we need a model to produce accurate alignments/decoding time-stamps.
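
One rough way to see where the smaller HLG comes from is to count arcs in the two topologies. The sketch below assumes a fully connected layout for the standard topology (one state per label, an arc between every pair of states) and a shared-state layout for the modified one. Both layouts and both function names are assumptions for illustration, and arcs to the final state are left out.

```python
from typing import List, Tuple

Arc = Tuple[int, int, int, int]  # (src, dst, input label, output label); output 0 = epsilon


def standard_topo_arcs(num_phones: int) -> List[Arc]:
    """Assumed standard layout: state i means 'the last label was i' (0 = blank)."""
    n = num_phones + 1
    # Staying in the same state repeats the label and outputs nothing.
    return [(i, j, j, 0 if i == j else j) for i in range(n) for j in range(n)]


def modified_topo_arcs(num_phones: int) -> List[Arc]:
    """Assumed modified layout: one shared state 0 plus one state per phone."""
    arcs: List[Arc] = [(0, 0, 0, 0)]  # blank self-loop
    for p in range(1, num_phones + 1):
        arcs += [(0, 0, p, p), (0, p, p, p), (p, p, p, 0), (p, 0, p, 0)]
    return arcs


for n in (3, 100):
    print(n, len(standard_topo_arcs(n)), len(modified_topo_arcs(n)))
```

Under these assumptions the standard layout grows quadratically with the number of phones, (N+1)^2 arcs, while the modified one grows linearly, 4N+1 arcs.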

pzelasko · 2021-06-10T11:25:31Z

interesting, thanks for the explanation.

Use a modified ctc_topo.

4b6c4b3

Apply arc_sort to ctc_topo.

1fd3d43

csukuangfj merged commit bce7330 into k2-fsa:master Jun 8, 2021

csukuangfj deleted the ctc_topo branch June 8, 2021 08:22

yuekaizhang mentioned this pull request Jun 28, 2021

CTC training speed question #220

Open
