Language model at word level #3
Comments
I have done 1.A together with a long-words penalty, but there is no word-level bigram prior knowledge in this method, which means it is only an autocorrect. Example: "milk the cous" are all words in the dictionary, but it does not make sense, whereas the true label we want is "milk the cows". For 2, I have tried adding bigram scores whenever I see a space label, but that pushes those beams out of the beam width, and a lot of the time what I get is one long single word. Currently I am reading up on WFSTs and trying to implement a CTC decoder using a WFST so that I can include word-level bigrams. Have you tried these methods?
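The beam-width problem described above can be illustrated with a small sketch (all probabilities and names here are hypothetical, not from any real decoder): if a bigram score is added only when a space is emitted, beams that have completed more words accumulate more LM penalties than a beam that never emits a space, so the space-free beam can outrank them.

```python
import math

# Toy bigram LM with hypothetical log-probabilities; unseen pairs get a floor.
BIGRAM_LOGP = {
    ("milk", "the"): math.log(0.2),
    ("the", "cows"): math.log(0.1),
}
FLOOR = math.log(1e-6)

def rescore_on_space(beam_text, acoustic_logp):
    """When a beam emits a space, add the bigram log-prob of the word
    just completed, conditioned on the previous word."""
    words = beam_text.strip().split()
    if len(words) < 2:
        return acoustic_logp  # nothing to condition on yet
    lm = BIGRAM_LOGP.get((words[-2], words[-1]), FLOOR)
    return acoustic_logp + lm
```

Since every LM log-prob is negative, a beam like `"milkthecows"` that never triggers the rescoring keeps its raw acoustic score, while `"milk the cows "` pays the bigram cost twice; without normalizing by the number of words, the single-word beam wins inside the beam width.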
No, I haven't tried WFST yet.
I've implemented an algorithm which uses beam search at the word level (dictionary, unigrams/bigrams) and which runs faster than token passing: https://github.com/githubharald/CTCWordBeamSearch
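The core idea of constraining beams to dictionary words can be sketched with a prefix tree (trie): at each step a beam is only extended with characters that keep its current word a prefix of some dictionary word. This is a simplified illustration of the constraint, not the actual CTCWordBeamSearch implementation.

```python
class TrieNode:
    """One node of a character trie over the dictionary."""
    def __init__(self):
        self.children = {}
        self.is_word = False

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root

def next_chars(trie, prefix):
    """Characters that can legally extend `prefix` toward a dictionary word."""
    node = trie
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return set()  # prefix already left the dictionary
    return set(node.children)

trie = build_trie(["milk", "the", "cow", "cows"])
```

During decoding, the beam search would only consider characters returned by `next_chars` (plus blank/space when `is_word` holds), so beams like "cous" are never created and no separate autocorrect pass is needed.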
Hi, did you add a word-level language model to beam search?
Currently it's easy to add a character-level bigram, but I find it much harder to add a word-level one. I tried the CTC token passing algorithm, but it's just way too slow compared to beam search.
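The "easy" character-level case can be sketched as follows: each time a beam is extended by one character, a weighted character-bigram log-prob is added to the acoustic score. The table values, floor, and weight `alpha` are all hypothetical illustrations, not taken from any real model.

```python
import math

# Hypothetical character-bigram log-probs; unseen pairs get a floor value.
CHAR_BIGRAM = {("c", "o"): math.log(0.3), ("o", "w"): math.log(0.2)}
FLOOR = math.log(0.01)

def extend_score(prefix, ch, acoustic_logp, alpha=0.5):
    """Score of extending `prefix` with `ch`: acoustic log-prob plus a
    weighted character-bigram LM term (alpha is a free tuning knob)."""
    if not prefix:
        return acoustic_logp  # no previous character to condition on
    lm = CHAR_BIGRAM.get((prefix[-1], ch), FLOOR)
    return acoustic_logp + alpha * lm
```

Because this score is applied uniformly at every extension, it does not distort the ranking between beams the way a word-boundary-only score does, which is one reason the character-level case is easier to get working.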