Bug report: Ngram training in SLU #5312

rxpwang · 2023-07-20T14:59:39Z

Describe the bug
The bug is in the ngram training stage of espnet/egs2/TEMPLATE/slu1/slu.sh.

Current command in the script:
cut -f 2 -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa

This only takes the first token of each transcript for ngram training.

It seems the command should be the following, which takes all the tokens in each transcript:
cut -f 2- -d " " ${data_feats}/lm_train.txt | lmplz -S "20%" --discount_fallback -o ${ngram_num} - >${ngram_exp}/${ngram_num}gram.arpa
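The difference between the two field selectors can be checked on a single line. This is only a minimal sketch: the sample line is made up, and it assumes lm_train.txt stores one utterance per line as an ID followed by space-separated tokens, which is what the behavior described above suggests.

```shell
# Hypothetical sample line, assuming the layout "<utt_id> <token> <token> ..."
line="utt_0001 turn on the lights"

# "-f 2" selects field 2 only: just the first token after the utterance ID
first_only=$(printf '%s\n' "$line" | cut -f 2 -d " ")

# "-f 2-" selects field 2 through the end of the line: the whole transcript
all_tokens=$(printf '%s\n' "$line" | cut -f 2- -d " ")

echo "$first_only"   # turn
echo "$all_tokens"   # turn on the lights
```

With `-f 2`, lmplz would see one token per line, so the resulting model would have no real context beyond that single token.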


sw005320 · 2023-07-20T15:01:48Z

Thanks for pointing it out, @rxpwang.
This is critical.
@siddhu001, can you check it?

siddhu001 · 2023-07-21T00:57:33Z

@rxpwang Thank you for bringing this to my attention. I am currently reviewing the issue and will open a PR if necessary to address it as soon as possible.

siddhu001 · 2023-07-23T20:26:29Z

Thanks @rxpwang for reporting this bug. I have created a PR https://github.com/espnet/espnet/pull/5364/files to fix this.

rxpwang added the "Bug" label (bug should be fixed) Jul 20, 2023

sw005320 added the "SLU" label (Spoken language understanding) Jul 20, 2023

siddhu001 mentioned this issue Jul 23, 2023

Fix bug in ngram training in slu.sh #5364

Merged

sw005320 linked a pull request Jul 23, 2023 that will close this issue

Fix bug in ngram training in slu.sh #5364

Merged

mergify bot closed this as completed in #5364 Jul 23, 2023
