Implement Ngram scorer #1946

qmpzzpmq · 2020-05-20T10:40:23Z

Hi,

This is Haoyu Tang from BIGO speech. I implemented an n-gram scorer with KenLM and tested it on the aishell test set, since it works badly with BPE.

It improves on the plain e2e model, but is still not as good as the RNNLM. That may be because I didn't tune the decoding parameters.
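For readers unfamiliar with how an n-gram scorer plugs into beam search, here is a minimal sketch. It is not the code in this PR: `ToyBigramScorer` is a hypothetical pure-Python stand-in (add-one smoothed bigrams) for a KenLM-backed model, and its method names are illustrative assumptions. The point is only the shape of the interface: a per-hypothesis state, and a log-probability for each candidate next token.

```python
import math
from collections import Counter


class ToyBigramScorer:
    """Hypothetical stand-in for a KenLM-backed scorer (add-one smoothed bigrams)."""

    def __init__(self, sentences, vocab):
        self.vocab = list(vocab)
        self.bigrams = Counter()
        self.contexts = Counter()
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def init_state(self):
        # Here the state is just the previous token; a real n-gram
        # toolkit keeps a richer state object per hypothesis.
        return "<s>"

    def score(self, state, token):
        """Return (log P(token | state), next_state)."""
        num = self.bigrams[(state, token)] + 1
        den = self.contexts[state] + len(self.vocab)
        return math.log(num / den), token

    def score_all(self, state):
        """Log-probabilities for every vocabulary entry, as beam search needs."""
        return {tok: self.score(state, tok)[0] for tok in self.vocab}


# Tiny made-up training text, for illustration only.
vocab = ["a", "b", "</s>"]
scorer = ToyBigramScorer([["a", "b"], ["a", "a", "b"]], vocab)
scores = scorer.score_all(scorer.init_state())
```

In decoding, each partial hypothesis would carry its own state, and the values from `score_all` would be scaled by an n-gram weight and added to the e2e model's scores.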

mergify · 2020-05-20T10:41:02Z

This pull request is now in conflict :(

espnet/nets/pytorch_backend/lm/ngram.py

egs/aishell/asr1/path.sh

ShigekiKarita · 2020-05-20T12:40:22Z

egs/aishell/asr1/RESULTS.md

@@ -36,3 +36,21 @@ exp/train_sp_pytorch_no_patience/decode_test_beam20_emodel.acc.best_p0.0_len0.0-
 |    SPKR       |     # Snt         # Wrd     |     Corr           Sub            Del           Ins            Err         S.Err     |
 |    Sum/Avg    |     7176         104765     |     92.2           7.6            0.2           0.2            8.0          50.2     |
 ```
+
+# Ngram related 
+   - there is no RNN not ngram 


This result looks promising. Have you ever tried ngram-RNNLM interpolation (joint decoding)?

That's my recommendation, too, and @qmpzzpmq is trying now.

I tried, but no result was better than the current one.
I guess it takes time to tune the decoding weight parameters.
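As a rough illustration of what tuning these weights involves, the sketch below treats joint decoding as a weighted sum of log-scores from the e2e model, the RNNLM, and the n-gram model. This is one common reading of "interpolation", not necessarily what the PR does. The function names, the parameter names `lm_weight` and `ngram_weight`, and all numbers are assumptions made up for the example.

```python
def joint_score(am_logp, rnnlm_logp, ngram_logp, lm_weight, ngram_weight):
    """Weighted sum of log-scores (log-linear combination)."""
    return am_logp + lm_weight * rnnlm_logp + ngram_weight * ngram_logp


def best_token(candidates, lm_weight, ngram_weight):
    """Pick the candidate with the highest joint score."""
    return max(
        candidates,
        key=lambda tok: joint_score(*candidates[tok], lm_weight, ngram_weight),
    )


# token -> (e2e log-prob, RNNLM log-prob, n-gram log-prob); made-up numbers
candidates = {"x": (-1.0, -2.0, -0.5), "y": (-1.2, -0.5, -3.0)}

# Small grid over the two weights, as one might sweep on a dev set.
grid = {
    (lw, nw): best_token(candidates, lw, nw)
    for lw in (0.0, 0.5, 1.0)
    for nw in (0.0, 0.5, 1.0)
}
```

Even in this toy case the winning token flips as the weights change, which is why a sweep over both weights on held-out data is usually needed before comparing against the RNNLM-only result.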

qmpzzpmq · 2020-05-21T01:41:44Z

@ShigekiKarita
Hi, I am not sure I totally understand what you meant about test_ngram.py, but I tried my best to finish it. The test checks whether the score is the same as the one I got.
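One way to read the check described above is as a consistency test: the score accumulated token by token during decoding should equal the score of the whole sentence computed in one go. The sketch below shows that idea with a made-up unigram table; it is not the contents of test/test_ngram.py, and every name in it is hypothetical.

```python
import math

# Made-up log10 probabilities standing in for a trained n-gram model.
LOG10_PROBS = {
    "a": math.log10(0.5),
    "b": math.log10(0.3),
    "</s>": math.log10(0.2),
}


def sentence_score(tokens):
    """Reference: score the whole sentence, including end-of-sentence."""
    return sum(LOG10_PROBS[t] for t in list(tokens) + ["</s>"])


def incremental_score(tokens):
    """What a decoder does: add one token at a time, then end-of-sentence."""
    total = 0.0
    for tok in tokens:
        total += LOG10_PROBS[tok]
    return total + LOG10_PROBS["</s>"]


def test_incremental_matches_sentence_score():
    for tokens in (["a"], ["a", "b"], ["b", "b", "a"]):
        assert math.isclose(incremental_score(tokens), sentence_score(tokens))


test_incremental_matches_sentence_score()
```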

test/test_ngram.py

tools/Makefile

test/test_ngram.py

espnet/asr/pytorch_backend/recog.py

qmpzzpmq · 2020-05-21T12:56:57Z

@ShigekiKarita is everything alright?

tools/Makefile

Co-authored-by: b-flo <41155456+b-flo@users.noreply.github.com>

kamo-naoyuki · 2020-05-25T13:07:44Z

egs/tedlium2/asr1/conf/tuning/decode_ngram.yaml

@@ -0,0 +1,6 @@
+ngram-weight: 1.0
+beam-size: 20


Didn't you modify run.sh for tedlium2?
Is this a garbage file?

kamo-naoyuki · 2020-05-25T13:08:01Z

egs/tedlium2/asr1/path.sh

@@ -2,10 +2,11 @@ MAIN_ROOT=$PWD/../../..
 KALDI_ROOT=$MAIN_ROOT/tools/kaldi



kamo-naoyuki · 2020-05-25T13:12:18Z

@ShigekiKarita What do you think about the path of ngram scorer? I think espnet/nets/scorers is better.

sw005320

LGTM.

egs/aishell/asr1/run.sh

sw005320 · 2020-05-25T13:49:50Z

egs/aishell/asr1/run.sh

@@ -178,6 +182,26 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
        --dict ${dict}
 fi

+ngramexpname=train_ngram


Can you include this ngram LM training in stage 3?
I want to keep the role of the stage.

sw005320 · 2020-05-25T13:53:37Z

> @ShigekiKarita What do you think about the path of ngram scorer? I think espnet/nets/scorers is better.

I agree.

Co-authored-by: Shinji Watanabe <sw005320@gmail.com>

sw005320 · 2020-05-25T14:35:18Z

egs/aishell/asr1/run.sh

@@ -176,8 +187,20 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
        --valid-label ${lmdatadir}/valid.txt \
        --resume ${lm_resume} \
        --dict ${dict}
+
+    echo "stage 4: Ngram Preparation"


Suggested change

-    echo "stage 4: Ngram Preparation"
+    echo "stage 3: Ngram Preparation"

sw005320 · 2020-05-25T14:38:17Z

Very cool!
One more request.
Since this is a very cool feature, please add this n-gram extension to the main README.md (https://github.com/espnet/espnet#key-features)

qmpzzpmq · 2020-05-25T14:42:06Z

> Very cool!
> One more request.
> Since this is a very cool feature, please add this n-gram extension to the main README.md (https://github.com/espnet/espnet#key-features)

@sw005320 done

sw005320 · 2020-05-25T14:43:32Z

I think this PR is almost ready. I just want to make sure that @ShigekiKarita would agree with putting this in espnet/nets/scorers.

b-flo · 2020-05-25T14:59:20Z

Great work!
On another note, for your next PR you should try to group your commits a bit more (e.g. consecutive commits with the same target/purpose but small changes in each). I wanted to follow the changes, but I had to unsubscribe from the PR because an email is sent for each commit you push (50+ in a few days!)...

qmpzzpmq · 2020-05-25T15:05:04Z

@b-flo
Yes, I will. This is my first time adding a totally new feature to espnet, so I was not familiar with your test process, but now I know something about it.

ShigekiKarita

LGTM. Thanks a lot for resolving our many requests!

CharlieTang and others added 7 commits February 23, 2020 15:33

0223

43c26bc

first edition of ngram

d5fb482

ngram bug fix

8d25e68

ngram bug fix

aca479c

ngram bug fix

3fb2b60

RESULT add

4e37eae

RESULT add

eb97466

mergify bot added the conflicts label May 20, 2020

Merge branch 'develop' into ngramdev

75ad12e

mergify bot removed the conflicts label May 20, 2020

consistency

a45ad5d

sw005320 requested a review from ShigekiKarita May 20, 2020 11:13

sw005320 added the ASR (Automatic speech recognition) and New Features labels May 20, 2020

sw005320 added this to the v.0.8.0 milestone May 20, 2020

ShigekiKarita reviewed May 20, 2020

espnet/nets/pytorch_backend/lm/ngram.py (outdated)

ShigekiKarita reviewed May 20, 2020

egs/aishell/asr1/path.sh

ShigekiKarita reviewed May 20, 2020

test and reboust

2952b54

ShigekiKarita reviewed May 21, 2020

test/test_ngram.py (outdated)

ShigekiKarita reviewed May 21, 2020

tools/Makefile (outdated)

ShigekiKarita reviewed May 21, 2020

test/test_ngram.py (outdated)

ShigekiKarita reviewed May 21, 2020

espnet/asr/pytorch_backend/recog.py

tanghaoyu added 2 commits May 21, 2020 15:39

test and reboust

7beb37b

test and reboust

316c9ae

b-flo reviewed May 22, 2020

tools/Makefile (outdated)

smart make

025d4a1

Co-authored-by: b-flo <41155456+b-flo@users.noreply.github.com>

qmpzzpmq and others added 9 commits May 23, 2020 23:00

path update

2e93121

bug fix

75abbb4

Merge remote-tracking branch 'mygithub/ngramdev' into ngramdev

5f0db8d

install bug fix

dc60733

format

b3ce79d

format

3d991ae

test bug fix

141c585

test bug fix

d3b8ada

Update espnet/nets/pytorch_backend/lm/ngram.py

a8c67fe

kamo-naoyuki self-requested a review May 25, 2020 13:04

kamo-naoyuki reviewed May 25, 2020

sw005320 approved these changes May 25, 2020

qmpzzpmq and others added 4 commits May 25, 2020 22:20

Update egs/aishell/asr1/run.sh

9ee8a35

Co-authored-by: Shinji Watanabe <sw005320@gmail.com>

Update path.sh

96d5611

Delete decode_ngram.yaml

56f4587

as advise

10d632f

sw005320 reviewed May 25, 2020

Update README.md

e6f818c

mergify bot added the README label May 25, 2020

ShigekiKarita approved these changes May 25, 2020

sw005320 merged commit 263e3c6 into espnet:develop May 25, 2020

This was referenced May 27, 2020

ngram scorer issues #1974

Closed

Toward fast decoding #1384

Closed
