
OPGRU and PGRU PR [exclude special fast component] #1950

Merged: 16 commits into kaldi-asr:master, Dec 12, 2017

Conversation

GaofengCheng (Contributor)

Related to #1799. The reference could be added later.

dir=exp/$mic/chain${nnet3_affix}/tdnn_pgru${tgru_affix}_sp_bi
fi

if [ $label_delay -gt 0 ]; then dir=${dir}_ld$label_delay; fi
Contributor

I know this was a pre-existing issue in the scripts, but I want to start addressing it now: these directory names are too decorated. Can you please remove the _ld5 and the _bi from the directory names in these new scripts? Let's try to do the same for all new AMI scripts.

Contributor Author

@danpovey will it be OK to keep only 'sp' and deprecate 'ld5' and 'bi' in the directory naming?

Contributor

Yes, you can keep _sp, but remove _ld5 and _bi.

danpovey (Contributor) left a comment

I have a few small comments.
But a bigger thing that concerns me is that from this PR, I don't see clarity on which topologies you recommend. I don't want to have to decide for each database whether PGRU or OPGRU is recommended-- I think we should be recommending one or the other, but probably not both. Don't do anything about this yet though-- just respond and let me know.

And did OPGRU not work for the bidirectional case? Just wondering why it's BPGRU and not BOPGRU.

@@ -0,0 +1,269 @@
#!/bin/bash
# ./local/chain/compare_wer_general.sh tdnn_opgru_1a_sp
# System tdnn_opgru_1a_sp
Contributor

there is no comparison here with other topologies. Can't you at least fake it based on the info that's checked into other scripts?
Please reorganize this directory to create a tuning directory, move the scripts into tuning/, and make soft links.

Contributor Author

Ok will do

fi

dir=$dir${affix:+_$affix}
if [ $label_delay -gt 0 ]; then dir=${dir}_ld$label_delay; fi
Contributor

Please remove this _ld$label_delay decoration.
In fact, I'm a little confused about where you got this script, because a lot of aspects of it don't match the other scripts in the fisher_swbd directory.
The lang directory is differently named and you are using a different size tree.
I think something went wrong here.
Of course, if there is a good reason for these changes, or the results are better, I'll think about that.

Contributor Author

The experiments are OK under Fisher+Switchboard; all models use the same tree, "tri6_tree". The misleading information in the TDNN-OPGRU script is there because I used the Switchboard scripts directly (the same goes for TDNN-PGRU and TDNN-LSTMP), but during training I only used the model-training part, so this does not influence the results.

I will clean up the wrong scripts on Fisher+SWBD and re-upload.

fi

dir=$dir${affix:+_$affix}
if [ $label_delay -gt 0 ]; then dir=${dir}_ld$label_delay; fi
Contributor

Again the label_delay affix.

Contributor Author

ok will do

configs.append("component name={0}.y type=NoOpComponent dim={1}".format(name, cell_dim))

configs.append("# Defining fixed scale/bias component for (1 - z_t)")
configs.append("component name={0}.fixed_scale_minus_one type=FixedScaleComponent scales={1}".format(name, self.config['vars_path']+"/minus_one"))
Contributor

there is no need for 'vars_path' here. FixedScaleComponent defaults to a scale of 1.0, you just have to supply the dim; and FixedBiasComponent can have a constant bias specified by "bias=1.0 dim=xxx".

As soon as I merge this to the RNNLM branch, though, I'll ask you to simplify these config-generation scripts to use the offset and scale capabilities I added there.
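A minimal sketch of the simplification being suggested here, in the same config-building style as the gru.py code above; the FixedBiasComponent form (bias=1.0 dim=...) is taken from the comment, while the explicit scale option for FixedScaleComponent and the node/variable names are assumptions for illustration only.

configs = []
name, cell_dim = "opgru1", 1024   # illustrative values

# FixedBiasComponent with a constant bias, as described in the comment above.
configs.append("component name={0}.fixed_bias_one type=FixedBiasComponent "
               "bias=1.0 dim={1}".format(name, cell_dim))

# FixedScaleComponent with only the dim defaults to a scale of 1.0; whether it also
# accepts an explicit constant scale (scale=-1.0) like this is an assumption.
configs.append("component name={0}.fixed_scale_minus_one type=FixedScaleComponent "
               "scale=-1.0 dim={1}".format(name, cell_dim))

print("\n".join(configs))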

danpovey (Contributor) left a comment

Some comments on the recipes.
There are some things that could be improved- but I don't know if you have time, and now may not be the right time to add these things.
These comments are applicable in general, not just to the specific files I added the comments on.


# check steps/libs/nnet3/xconfig/gru.py for the other options and defaults
pgru-layer name=pgru1 cell-dim=1024 recurrent-projection-dim=256 non-recurrent-projection-dim=256 delay=-3 vars_path="$dir/configs"
relu-renorm-layer name=tdnn4 input=Append(-3,0,3) dim=1024
Contributor

I notice these scripts are using relu-renorm-layer. In general we prefer relu-batchnorm-layer which tends to be better. It would be good to have a TDNN+LSTM example checked in here too, since it is one of the baselines in your paper. That also should have relu-batchnorm-layer.

nnet3_affix=_cleaned # cleanup affix for nnet3 and chain dirs, e.g. _cleaned
num_epochs=4

chunk_width=150
Contributor

It would be better to make this 150,140,90 instead of 150.

fixed-affine-layer name=lda input=Append(-2,-1,0,1,2, ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat

# the first splicing is moved before the lda layer, so no splicing here
relu-renorm-layer name=tdnn1 dim=1024
Contributor

I know this is kind of not relevant to this PR, and you might want to look into it later, but I think you'd probably get improvements by adding l2 regularization to this setup. (it's not really about regularization, it's about helping it to learn faster). You'd probably want l2 constants that were about 10 times smaller than the ones in mini_librispeech-- maybe even smaller than that.
You would also probably get improvements in all these setups by adding non-splicing TDNN layers at the beginning after layers tdnn2 and tdnn3. Again, not closely related but it would be great if you could try it at some point.
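A rough sketch of both suggestions, written as the xconfig lines these recipes use; the l2-regularize option name follows the mini_librispeech-style l2-regularize recipes referenced above, and the constant, splicing indices, and extra layer names (tdnn2b, tdnn3b) are placeholders, not tuned values.

# Hedged sketch, not part of the PR: l2 regularization plus extra non-splicing
# TDNN layers after tdnn2 and tdnn3.  All values here are illustrative.
tdnn_opts = "l2-regularize=0.001"   # placeholder constant only; tune as suggested above
xconfig_lines = [
    "relu-batchnorm-layer name=tdnn2 dim=1024 input=Append(-1,0,1) " + tdnn_opts,
    "relu-batchnorm-layer name=tdnn2b dim=1024 " + tdnn_opts,   # extra non-splicing layer
    "relu-batchnorm-layer name=tdnn3 dim=1024 input=Append(-1,0,1) " + tdnn_opts,
    "relu-batchnorm-layer name=tdnn3b dim=1024 " + tdnn_opts,   # extra non-splicing layer
]
print("\n".join(xconfig_lines))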


configs.append("# Defining fixed scale/bias component for (1 - z_t)")
configs.append("component name={0}.fixed_scale_minus_one type=FixedScaleComponent scales={1}".format(name, self.config['vars_path']+"/minus_one"))
configs.append("component name={0}.fixed_bias_one type=FixedBiasComponent bias={1}".format(name, self.config['vars_path']+"/bias_one"))
Contributor

It should not be necessary to use fixed scale and bias now that I have merged those upgrades to Descriptors. Can you upgrade these scripts?

Also, do you have any plan to implement a fast version of the GRU component?
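A minimal, hedged sketch of what the descriptor-based rewrite mentioned above could look like, again in the gru.py config-building style; the Const()/Scale() descriptor syntax is assumed from the descriptor upgrades referred to here, and the node names ({0}.z_t, the NoOp node) are illustrative.

# Hedged sketch, not the PR's actual code: express (1 - z_t) directly in an input
# descriptor via Const() and Scale(), removing the need for FixedScaleComponent /
# FixedBiasComponent and the vars_path files.
configs = []
name, cell_dim = "opgru1", 1024   # illustrative values

# A NoOp node whose input descriptor computes (1 - z_t) elementwise:
# Const(1.0, dim) supplies the constant 1, Scale(-1.0, ...) negates the update gate.
configs.append("component name={0}.one_minus_z type=NoOpComponent dim={1}"
               .format(name, cell_dim))
configs.append("component-node name={0}.one_minus_z component={0}.one_minus_z "
               "input=Sum(Const(1.0, {1}), Scale(-1.0, {0}.z_t))"
               .format(name, cell_dim))

print("\n".join(configs))

The fix-scale-bias vs. descriptor comparison posted later in this thread suggests the two forms behave essentially identically.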

Contributor Author

Will it be OK to change the configs to Descriptors but keep the experiments that were run with the scale-bias version? It's time-consuming to rerun all the scripts.
I'm not sure when I will be able to add a fast version of OPGRU (it may need some tuning); I'm a little busy this month. I will ask Huang Lu whether he can take it on.

Contributor

Yes, that's OK, it's totally equivalent-- just run something small to make sure it doesn't crash and that the objectives aren't wildly different from before on the 1st iteration.

ngoel17 (Contributor) left a comment

Gaofeng - if we wanted to run the GRU recipe on Fisher+SWBD, what would be the best starting parameter setup?

GaofengCheng (Contributor Author)

@ngoel17 search my latest commits under egs/fisher_swbd for tdnn_opgru. The results are shown there. I have not double-checked the scripts; if there are some small bugs in them, I believe they will not be difficult for you to fix yourself.

GaofengCheng (Contributor Author)

Comparison between descriptor and fix-scale-bias (SWBD):
The difference between fix-scale-bias and descriptor is very small.
@danpovey
left: fix-scale-bias; right: descriptor
1. TDNN-PGRU:

# ./local/chain/compare_wer_general.sh tdnn_gru_1a_like_1e_adding_initial0_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_descriptor_ld5_sp
# WER on train_dev(tg)      12.70     12.61
# WER on train_dev(fg)      11.77     11.61
# WER on eval2000(tg)        14.9      14.8
# WER on eval2000(fg)        13.4      13.4
# Final train prob         -0.077    -0.075
# Final valid prob         -0.092    -0.093
# Final train prob (xent)        -0.929    -0.918
# Final valid prob (xent)       -0.9934   -0.9905

2. TDNN-NormPGRU:

# ./local/chain/compare_wer_general.sh tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_batchnorm_vertical_renorm_descriptor_ld5_sp
# WER on train_dev(tg)      12.75     12.92
# WER on train_dev(fg)      11.90     11.95
# WER on eval2000(tg)        15.1      14.9
# WER on eval2000(fg)        13.5      13.5
# Final train prob         -0.062    -0.062
# Final valid prob         -0.082    -0.083
# Final train prob (xent)        -0.837    -0.830
# Final valid prob (xent)       -0.9326   -0.9351
3. TDNN-OPGRU:
# ./local/chain/compare_wer_general.sh tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_descriptor_ld5_sp
# WER on train_dev(tg)      12.61     12.62
# WER on train_dev(fg)      11.55     11.65
# WER on eval2000(tg)        14.9      14.9
# WER on eval2000(fg)        13.4      13.6
# Final train prob         -0.065    -0.067
# Final valid prob         -0.086    -0.087
# Final train prob (xent)        -0.871    -0.877
# Final valid prob (xent)       -0.9628   -0.9783
4. BatchTDNN-NormOPGRU (recommended):
# ./local/chain/compare_wer_general.sh --looped tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_descriptor_ld5_sp
# System                tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_ld5_sp tdnn_gru_1a_like_1e_adding_initial0_ogate4_notrunc_batch_renorm_tdnnbatch_perframe0.2_descriptor_ld5_sp
# WER on train_dev(tg)      12.39     12.29
#           [looped:]       12.32     12.27
# WER on train_dev(fg)      11.39     11.32
#           [looped:]       11.35     11.34
# WER on eval2000(tg)        15.1      14.8
#           [looped:]        15.1      14.9
# WER on eval2000(fg)        13.6      13.5
#           [looped:]        13.5      13.5
# Final train prob         -0.066    -0.065
# Final valid prob         -0.085    -0.084
# Final train prob (xent)        -0.889    -0.888
# Final valid prob (xent)       -0.9837   -0.9824

danpovey (Contributor) commented Dec 7, 2017 via email

danpovey merged commit 4e3c183 into kaldi-asr:master on Dec 12, 2017
--egs.chunk-right-context $chunk_right_context \
--trainer.dropout-schedule $dropout_schedule \
--trainer.optimization.backstitch-training-scale 1 \
--trainer.optimization.backstitch-training-interval 4 \
Contributor

Should it be scale=0.3 and interval=1, as said in the comment above?

Contributor Author

Yes, I will make a fix commit.
