
OPGRU and PGRU PR [exclude special fast component] #1950

Merged: 16 commits into kaldi-asr:master, Dec 12, 2017

Conversation

GaofengCheng (Contributor)

Related to #1799. The reference could be added later.

dir=exp/$mic/chain${nnet3_affix}/tdnn_pgru${tgru_affix}_sp_bi
fi

if [ $label_delay -gt 0 ]; then dir=${dir}_ld$label_delay; fi
Contributor

I know this was a pre-existing issue in the scripts, but I want to start addressing it now: these directory names are too decorated. Can you please remove the _ld5 and the _bi from the directory names in these new scripts? Let's try to do the same for all new AMI scripts.

Contributor Author

@danpovey will it be OK to keep only 'sp' and deprecate 'ld5' and 'bi' in the directory naming?

Contributor

Yes, you can keep _sp, but remove _ld5 and _bi.

danpovey (Contributor) left a comment

I have a few small comments.
But a bigger thing that concerns me is that from this PR, I don't see clarity on which topologies you recommend. I don't want to have to decide for each database whether PGRU or OPGRU is recommended-- I think we should be recommending one or the other, but probably not both. Don't do anything about this yet though-- just respond and let me know.

And did OPGRU not work for the bidirectional case? Just wondering why it's BPGRU and not BOPGRU.

@@ -0,0 +1,269 @@
#!/bin/bash
# ./local/chain/compare_wer_general.sh tdnn_opgru_1a_sp
# System tdnn_opgru_1a_sp
Contributor

there is no comparison here with other topologies. Can't you at least fake it based on the info that's checked into other scripts?
Please reorganize this directory to create a tuning directory, move the scripts into tuning/, and make soft links.

Contributor Author

Ok will do

fi

dir=$dir${affix:+_$affix}
if [ $label_delay -gt 0 ]; then dir=${dir}_ld$label_delay; fi
Contributor

Please remove this _ld$label_delay decoration.
In fact, I'm a little confused about where you got this script, because a lot of aspects of it don't match the other scripts in the fisher_swbd directory.
The lang directory is differently named and you are using a different size tree.
I think something went wrong here.
Of course, if there is a good reason for these changes, or the results are better, I'll think about that.

Contributor Author

The experiments are OK under Fisher+Switchboard; all models use the same tree, "tri6_tree". The misleading information in the TDNN-OPGRU script is there because I used the Switchboard scripts directly (the same goes for TDNN-PGRU and TDNN-LSTMP), but during training I only used the model-training part, so this does not influence the results.

I will clean up the wrong scripts on Fisher+SWBD and re-upload.

fi

dir=$dir${affix:+_$affix}
if [ $label_delay -gt 0 ]; then dir=${dir}_ld$label_delay; fi
Contributor

Again the label_delay affix.

Contributor Author

ok will do

configs.append("component name={0}.y type=NoOpComponent dim={1}".format(name, cell_dim))

configs.append("# Defining fixed scale/bias component for (1 - z_t)")
configs.append("component name={0}.fixed_scale_minus_one type=FixedScaleComponent scales={1}".format(name, self.config['vars_path']+"/minus_one"))
Contributor

there is no need for 'vars_path' here. FixedScaleComponent defaults to a scale of 1.0, you just have to supply the dim; and FixedBiasComponent can have a constant bias specified by "bias=1.0 dim=xxx".

As soon as I merge this to the RNNLM branch, though, I'll ask you to simplify these config-generation scripts to use the offset and scale capabilities I added there.
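A minimal sketch of the simplification being suggested here, in the same config-building style as the gru.py code above; the FixedBiasComponent form (bias=1.0 dim=...) is taken from the comment, while the explicit scale option for FixedScaleComponent and the node/variable names are assumptions for illustration only.

configs = []
name, cell_dim = "opgru1", 1024   # illustrative values

# FixedBiasComponent with a constant bias, as described in the comment above.
configs.append("component name={0}.fixed_bias_one type=FixedBiasComponent "
               "bias=1.0 dim={1}".format(name, cell_dim))

# FixedScaleComponent with only the dim defaults to a scale of 1.0; whether it also
# accepts an explicit constant scale (scale=-1.0) like this is an assumption.
configs.append("component name={0}.fixed_scale_minus_one type=FixedScaleComponent "
               "scale=-1.0 dim={1}".format(name, cell_dim))

print("\n".join(configs))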

danpovey (Contributor) left a comment

Some comments on the recipes.
There are some things that could be improved- but I don't know if you have time, and now may not be the right time to add these things.
These comments are applicable in general, not just to the specific files I added the comments on.


# check steps/libs/nnet3/xconfig/gru.py for the other options and defaults
pgru-layer name=pgru1 cell-dim=1024 recurrent-projection-dim=256 non-recurrent-projection-dim=256 delay=-3 vars_path="$dir/configs"
relu-renorm-layer name=tdnn4 input=Append(-3,0,3) dim=1024
Contributor

I notice these scripts are using relu-renorm-layer. In general we prefer relu-batchnorm-layer which tends to be better. It would be good to have a TDNN+LSTM example checked in here too, since it is one of the baselines in your paper. That also should have relu-batchnorm-layer.

nnet3_affix=_cleaned # cleanup affix for nnet3 and chain dirs, e.g. _cleaned
num_epochs=4

chunk_width=150
Contributor

It would be better to make this 150,140,90 instead of 150.

fixed-affine-layer name=lda input=Append(-2,-1,0,1,2, ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat

# the first splicing is moved before the lda layer, so no splicing here
relu-renorm-layer name=tdnn1 dim=1024
Contributor

I know this is kind of not relevant to this PR, and you might want to look into it later, but I think you'd probably get improvements by adding l2 regularization to this setup. (it's not really about regularization, it's about helping it to learn faster). You'd probably want l2 constants that were about 10 times smaller than the ones in mini_librispeech-- maybe even smaller than that.
You would also probably get improvements in all these setups by adding non-splicing TDNN layers at the beginning after layers tdnn2 and tdnn3. Again, not closely related but it would be great if you could try it at some point.
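A rough sketch of both suggestions, written as the xconfig lines these recipes use; the l2-regularize option name follows the mini_librispeech-style l2-regularize recipes referenced above, and the constant, splicing indices, and extra layer names (tdnn2b, tdnn3b) are placeholders, not tuned values.

# Hedged sketch, not part of the PR: l2 regularization plus extra non-splicing
# TDNN layers after tdnn2 and tdnn3.  All values here are illustrative.
tdnn_opts = "l2-regularize=0.001"   # placeholder constant only; tune as suggested above
xconfig_lines = [
    "relu-batchnorm-layer name=tdnn2 dim=1024 input=Append(-1,0,1) " + tdnn_opts,
    "relu-batchnorm-layer name=tdnn2b dim=1024 " + tdnn_opts,   # extra non-splicing layer
    "relu-batchnorm-layer name=tdnn3 dim=1024 input=Append(-1,0,1) " + tdnn_opts,
    "relu-batchnorm-layer name=tdnn3b dim=1024 " + tdnn_opts,   # extra non-splicing layer
]
print("\n".join(xconfig_lines))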


configs.append("# Defining fixed scale/bias component for (1 - z_t)")
configs.append("component name={0}.fixed_scale_minus_one type=FixedScaleComponent scales={1}".format(name, self.config['vars_path']+"/minus_one"))
configs.append("component name={0}.fixed_bias_one type=FixedBiasComponent bias={1}".format(name, self.config['vars_path']+"/bias_one"))
Contributor

It should not be necessary to use fixed scale and bias now that I have merged those upgrades to Descriptors. Can you upgrade these scripts?

Also, do you have any plan to implement a fast version of the GRU component?
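A minimal, hedged sketch of what the descriptor-based rewrite mentioned above could look like, again in the gru.py config-building style; the Const()/Scale() descriptor syntax is assumed from the descriptor upgrades referred to here, and the node names ({0}.z_t, the NoOp node) are illustrative.

# Hedged sketch, not the PR's actual code: express (1 - z_t) directly in an input
# descriptor via Const() and Scale(), removing the need for FixedScaleComponent /
# FixedBiasComponent and the vars_path files.
configs = []
name, cell_dim = "opgru1", 1024   # illustrative values

# A NoOp node whose input descriptor computes (1 - z_t) elementwise:
# Const(1.0, dim) supplies the constant 1, Scale(-1.0, ...) negates the update gate.
configs.append("component name={0}.one_minus_z type=NoOpComponent dim={1}"
               .format(name, cell_dim))
configs.append("component-node name={0}.one_minus_z component={0}.one_minus_z "
               "input=Sum(Const(1.0, {1}), Scale(-1.0, {0}.z_t))"
               .format(name, cell_dim))

print("\n".join(configs))

The fix-scale-bias vs. descriptor comparison posted later in this thread suggests the two forms behave essentially identically.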

Contributor Author

Will it be OK to change the configs to Descriptors but keep the experiments that were run with the scale-bias version? It's time-consuming to rerun all the scripts.
I'm not sure when I will be able to add a fast version of OPGRU (it may need some tuning); I'm a little busy this month. I will ask Huang Lu whether he can take it on.

Contributor

Yes, that's OK, it's totally equivalent-- just run something small to make sure it doesn't crash and that the objectives aren't wildly different from before on the 1st iteration.

ngoel17 (Contributor) left a comment

Gaofeng - if we wanted to run the GRU recipe on Fisher+SWBD, what would be the best starting parameter setup?

GaofengCheng (Contributor Author)

@ngoel17 search my latest commits under egs/fisher_swbd for tdnn_opgru. The results are shown there. I have not double-checked the scripts; if there are some small bugs in them, I believe they will not be difficult for you to fix yourself.

GaofengCheng (Contributor Author)

Comparison between descriptor and fix-scale-bias (SWBD):
The difference between fix-scale-bias and descriptor is very small.
@danpovey
left: fix-scale-bias; right: descriptor
1. TDNN-PGRU:

# ./local/chain/compare_wer_general.sh tdnn_gru_1a_like_1e_adding_initial0_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_descriptor_ld5_sp
# WER on train_dev(tg)      12.70     12.61
# WER on train_dev(fg)      11.77     11.61
# WER on eval2000(tg)        14.9      14.8
# WER on eval2000(fg)        13.4      13.4
# Final train prob         -0.077    -0.075
# Final valid prob         -0.092    -0.093
# Final train prob (xent)        -0.929    -0.918
# Final valid prob (xent)       -0.9934   -0.9905

2. TDNN-NormPGRU:

# ./local/chain/compare_wer_general.sh tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_descriptor_ld5_sp
# WER on train_dev(tg)      12.75     12.92
# WER on train_dev(fg)      11.90     11.95
# WER on eval2000(tg)        15.1      14.9
# WER on eval2000(fg)        13.5      13.5
# Final train prob         -0.062    -0.062
# Final valid prob         -0.082    -0.083
# Final train prob (xent)        -0.837    -0.830
# Final valid prob (xent)       -0.9326   -0.9351
3. TDNN-OPGRU:
# ./local/chain/compare_wer_general.sh tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_descriptor_ld5_sp
# WER on train_dev(tg)      12.61     12.62
# WER on train_dev(fg)      11.55     11.65
# WER on eval2000(tg)        14.9      14.9
# WER on eval2000(fg)        13.4      13.6
# Final train prob         -0.065    -0.067
# Final valid prob         -0.086    -0.087
# Final train prob (xent)        -0.871    -0.877
# Final valid prob (xent)       -0.9628   -0.9783
4. BatchTDNN-NormOPGRU (recommended):
# ./local/chain/compare_wer_general.sh --looped tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_descriptor_ld5_sp
# WER on train_dev(tg)      12.39     12.29
#           [looped:]       12.32     12.27
# WER on train_dev(fg)      11.39     11.32
#           [looped:]       11.35     11.34
# WER on eval2000(tg)        15.1      14.8
#           [looped:]        15.1      14.9
# WER on eval2000(fg)        13.6      13.5
#           [looped:]        13.5      13.5
# Final train prob         -0.066    -0.065
# Final valid prob         -0.085    -0.084
# Final train prob (xent)        -0.889    -0.888
# Final valid prob (xent)       -0.9837   -0.9824

danpovey (Contributor) commented Dec 7, 2017 via email

danpovey merged commit 4e3c183 into kaldi-asr:master on Dec 12, 2017
--egs.chunk-right-context $chunk_right_context \
--trainer.dropout-schedule $dropout_schedule \
--trainer.optimization.backstitch-training-scale 1 \
--trainer.optimization.backstitch-training-interval 4 \
Contributor

Should it be scale=0.3 and interval=1, as said in the comment above?

Contributor Author

Yes, I will make a fix commit.
