Bug report: Ngram training in SLU #5312

Closed
rxpwang opened this issue Jul 20, 2023 · 3 comments · Fixed by #5364
Labels
Bug bug should be fixed SLU Spoken language understanding

Comments

@rxpwang

rxpwang commented Jul 20, 2023

Describe the bug
The ngram training stage in espnet/egs2/TEMPLATE/slu1/slu.sh selects the wrong fields from the training text.

Current command in the script:
cut -f 2 -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa

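This takes only the first token of each transcript for ngram training, because -f 2 selects just the second space-delimited field (the first field is the utterance ID).

It seems the command should be the following, which uses -f 2- to take all the tokens in each transcript: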
cut -f 2- -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa
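
To make the difference concrete, here is a minimal sketch of the two field selections on a single line. The sample line and the variable names are made up for illustration, and it assumes lm_train.txt uses the "<utt_id> <token> <token> ..." layout described above:

```shell
# Hypothetical sample line: utterance ID followed by the transcript tokens
line="utt001 turn on the lights"

# -f 2 selects only the second space-delimited field (the first transcript token)
first_only=$(printf '%s\n' "$line" | cut -f 2 -d " ")

# -f 2- selects the second field through the end of the line (the whole transcript)
all_tokens=$(printf '%s\n' "$line" | cut -f 2- -d " ")

echo "$first_only"   # prints: turn
echo "$all_tokens"   # prints: turn on the lights
```

With -f 2, lmplz would only ever see one word per line, so it could not learn any word-to-word context from the transcripts.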

@rxpwang rxpwang added the Bug bug should be fixed label Jul 20, 2023
@sw005320 sw005320 added the SLU Spoken language understanding label Jul 20, 2023
@sw005320
Contributor

Thanks for pointing it out, @rxpwang.
This is critical.
@siddhu001, can you check it?

@siddhu001
Collaborator

@rxpwang Thank you for bringing this to my attention. I am currently reviewing the issue and will open a PR if necessary to address it as soon as possible.

@siddhu001
Collaborator

Thanks @rxpwang for reporting this bug. I have created a PR to fix it: https://github.com/espnet/espnet/pull/5364/files

@sw005320 sw005320 linked a pull request Jul 23, 2023 that will close this issue
@mergify mergify bot closed this as completed in #5364 Jul 23, 2023