This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Use a modified ctc_topo. #209

Merged
merged 2 commits into from Jun 8, 2021

csukuangfj (Collaborator) commented Jun 8, 2021

This implements the topo mentioned in
k2-fsa/k2#746 (comment)

Some examples (assuming there are three phones: a, b, c)

[Five screenshots illustrating the examples; images not preserved.]

CAUTION: With this topo, no blank is required between the two consecutive symbols aa in aabc.
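
To make the caution concrete, here is a minimal pure-Python sketch (not code from this PR). It assumes one reading of the modified topo: a run of n identical non-blank frames may emit between 1 and n copies of that symbol, whereas the standard CTC rule always merges the run into one. The names `collapse_standard`, `outputs_modified`, and the blank marker `-` are hypothetical and chosen only for illustration.

```python
from itertools import groupby, product

BLANK = "-"  # hypothetical blank marker, used only in this sketch


def collapse_standard(frames):
    """Standard CTC rule: merge repeated frames, then drop blanks."""
    return "".join(sym for sym, _ in groupby(frames) if sym != BLANK)


def outputs_modified(frames):
    """All outputs under the assumed modified rule.

    Assumption: a run of n identical non-blank frames may emit
    anywhere from 1 to n copies of the symbol.
    """
    runs = [(sym, len(list(grp))) for sym, grp in groupby(frames) if sym != BLANK]
    results = set()
    for counts in product(*(range(1, n + 1) for _, n in runs)):
        results.add("".join(sym * k for (sym, _), k in zip(runs, counts)))
    return results


print(collapse_standard("aabc"))         # abc
print(collapse_standard("a-abc"))        # aabc
print(sorted(outputs_modified("aabc")))  # ['aabc', 'abc']
```

Under this assumption the frame sequence a a b c is compatible with both abc and aabc, which is why the blank between the two a's is no longer mandatory.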

csukuangfj (Collaborator, Author) commented

The following shows the same example using the existing ctc_topo:

[Five screenshots of the same examples with the existing ctc_topo; images not preserved.]

danpovey (Contributor) commented Jun 8, 2021

Wow that was fast!
Merge when you think it makes sense, looks good to me!
(BTW at some point we should change the args of those functions to be just an integer giving the number of phones; the list input is not good because we require the list to be contiguous.)
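
A rough sketch of what an integer-argument builder could look like, written in plain Python so it runs without k2. Everything here is an assumption for illustration: the function names `build_modified_topo_arcs` and `arcs_to_text` are hypothetical, the arc layout is one plausible construction of a topo with no mandatory blank (blank is ID 0, phones are 1..num_phones), and the text layout "src dst label aux_label score" followed by a final-state line is only assumed to match what the FSA library expects.

```python
def build_modified_topo_arcs(num_phones):
    """Return (arcs, final_state) for a modified CTC-like topo.

    Assumptions of this sketch: ID 0 is the blank, phones are the
    contiguous IDs 1..num_phones, and each arc is the tuple
    (src, dst, input_label, output_label). Output label 0 means
    "emit nothing"; label -1 marks the arc into the final state.
    """
    assert num_phones > 0, f"num_phones: {num_phones}"
    start, final = 0, num_phones + 1
    arcs = [
        (start, start, 0, 0),     # blank self-loop, emits nothing
        (start, final, -1, -1),   # end of sequence
    ]
    for p in range(1, num_phones + 1):
        arcs.append((start, start, p, p))  # emit p, allow an immediate repeat
        arcs.append((start, p, p, p))      # emit p, then absorb repeated frames
        arcs.append((p, p, p, 0))          # repeated frame of p, emits nothing
        arcs.append((p, start, p, 0))      # last repeated frame, back to start
    return sorted(arcs), final


def arcs_to_text(arcs, final):
    """Format arcs as 'src dst label aux_label score' lines plus a final-state line."""
    lines = [f"{s} {d} {i} {o} 0" for s, d, i, o in arcs]
    lines.append(str(final))
    return "\n".join(lines)


arcs, final = build_modified_topo_arcs(3)  # three phones: a, b, c
print(arcs_to_text(arcs, final))
```

Because the caller passes only a count, the contiguity requirement is satisfied by construction rather than checked on a user-supplied list.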

@csukuangfj csukuangfj merged commit bce7330 into k2-fsa:master Jun 8, 2021
@csukuangfj csukuangfj deleted the ctc_topo branch June 8, 2021 08:22
pzelasko (Collaborator) commented Jun 8, 2021

Maybe I’m missing something, but if we allow no blank between repeated phones, isn’t the blank redundant? Could we instead use a one-state, pure self-loop phone topo (plus a final state) and get the same result?

danpovey (Contributor) commented Jun 8, 2021 via email

xiaohui-zhang commented

Nice to see this non-standard topo finally implemented (as we discussed before, @danpovey; see Fig. 1a in http://oa.ee.tsinghua.edu.cn/~ouzhijian/pdf/ctc-crf.pdf). It doesn't matter much during training, because even with the standard topo we can build the numerator FST with rules rather than composing the topo FST with the tokenized transcripts, which lowers the cost of numerator construction. But it significantly improves decoding speed (the HLG is much smaller), and the WER degradation is minimal. The main issue is that words like "met" and "meet" become more confusable during both training and decoding.

@pzelasko Yeah, having a shared blank is important for training performance, especially when we use SpecAugment (we can achieve a similar effect by allowing skippable silence phones within a word, which is hacky for HMMs). But silence/HMM still has its advantage when we need a model to produce accurate alignments or decoding timestamps.

pzelasko (Collaborator) commented

Interesting, thanks for the explanation.
