[egs,scripts] Misc script fixes; refactor wsj/s5 examples; update tedlium/s5_r2 (#1456)
danpovey committed Feb 27, 2017
1 parent 25b1299 commit d60e3cc
Showing 39 changed files with 3,310 additions and 1,083 deletions.
198 changes: 0 additions & 198 deletions egs/tedlium/s5_r2/local/chain/run_tdnn_d.sh

This file was deleted.

4 changes: 2 additions & 2 deletions egs/tedlium/s5_r2/local/chain/tuning/run_tdnn_lstm_1e.sh
@@ -259,14 +259,14 @@ fi
if [ $stage -le 18 ]; then
if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $dir/egs/storage ]; then
utils/create_split_dir.pl \
- /export/b0{5,6,7,8}/$USER/kaldi-data/egs/ami-$(date +'%m_%d_%H_%M')/s5/$dir/egs/storage $dir/egs/storage
+ /export/b0{5,6,7,8}/$USER/kaldi-data/egs/tedlium-$(date +'%m_%d_%H_%M')/s5_r2/$dir/egs/storage $dir/egs/storage
fi

steps/nnet3/chain/train.py --stage $train_stage \
--cmd "$decode_cmd" \
--feat.online-ivector-dir $train_ivector_dir \
--feat.cmvn-opts "--norm-means=false --norm-vars=false" \
- --chain.xent-regularize 0.1 \
+ --chain.xent-regularize $xent_regularize \
--chain.leaky-hmm-coefficient 0.1 \
--chain.l2-regularize 0.00005 \
--chain.apply-deriv-weights false \
25 changes: 24 additions & 1 deletion egs/tedlium/s5_r2/local/nnet3/compare_wer.sh
@@ -1,19 +1,32 @@
#!/bin/bash

# this script is used for comparing decoding results between systems.
- # e.g. local/nnet3/compare_wer_general.sh exp/nnet3_cleaned/tdnn_{c,d}_sp
+ # e.g. local/nnet3/compare_wer.sh exp/nnet3_cleaned/tdnn_{c,d}_sp
# For use with discriminatively trained systems you specify the epochs after a colon:
# for instance,
# local/nnet3/compare_wer.sh exp/nnet3_cleaned/tdnn_c_sp exp/nnet3_cleaned/tdnn_c_sp_smbr:{1,2,3}


if [ $# == 0 ]; then
echo "Usage: $0: [--looped] [--online] <dir1> [<dir2> ... ]"
echo "e.g.: $0 exp/nnet3_cleaned/tdnn_{b,c}_sp"
echo "or (with epoch numbers for discriminative training):"
echo "$0 exp/nnet3_cleaned/tdnn_b_sp_disc:{1,2,3}"
exit 1
fi

echo "# $0 $*"

include_looped=false
if [ "$1" == "--looped" ]; then
include_looped=true
shift
fi
include_online=false
if [ "$1" == "--online" ]; then
include_online=true
shift
fi



@@ -71,6 +84,16 @@ for n in 0 1 2 3; do
done
echo
fi
if $include_online; then
echo -n "# [online:] "
for x in $*; do
set_names $x # sets $dirname and $epoch_infix
decode_names=(dev${epoch_infix} dev${epoch_infix}_rescore test${epoch_infix} test${epoch_infix}_rescore)
wer=$(grep Sum ${dirname}_online/decode_${decode_names[$n]}/score*/*ys | utils/best_wer.sh | awk '{print $2}')
printf "% 10s" $wer
done
echo
fi
done


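The hunk above calls set_names, whose body lies outside the displayed context. As a rough sketch only, a helper of that kind could split a "<dir>[:<epoch>]" argument as shown below. The "_epoch<N>" infix format and the function body are assumptions made for illustration, not the code from this commit.

```shell
#!/bin/sh
# Hypothetical stand-in for set_names: splits "<dir>[:<epoch>]" into
# $dirname and $epoch_infix. The "_epoch<N>" infix is an assumed convention.
set_names() {
  dirname=${1%%:*}                # everything before the first colon
  if [ "$dirname" = "$1" ]; then
    epoch_infix=""                # no colon, so no epoch was requested
  else
    epoch_infix="_epoch${1#*:}"   # text after the first colon
  fi
}

# Show which decode directory name each argument would map to.
for x in exp/nnet3_cleaned/tdnn_c_sp \
         exp/nnet3_cleaned/tdnn_c_sp_smbr:1 \
         exp/nnet3_cleaned/tdnn_c_sp_smbr:2; do
  set_names "$x"
  echo "$dirname -> decode_dev${epoch_infix}"
done
```

With this shape, a plain directory argument and a "dir:epoch" argument go through the same loop, which is what lets the script mix ordinary and discriminatively trained systems on one command line.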
4 changes: 2 additions & 2 deletions egs/tedlium/s5_r2/local/nnet3/run_ivector_common.sh
@@ -21,9 +21,9 @@ num_threads_ubm=32
nnet3_affix=_cleaned # affix for exp/nnet3 directory to put iVector stuff in, so it
# becomes exp/nnet3_cleaned or whatever.

- . cmd.sh
+ . ./cmd.sh
. ./path.sh
- . ./utils/parse_options.sh
+ . utils/parse_options.sh


gmm_dir=exp/${gmm}
1 change: 1 addition & 0 deletions egs/tedlium/s5_r2/local/nnet3/run_tdnn.sh
1 change: 1 addition & 0 deletions egs/tedlium/s5_r2/local/nnet3/run_tdnn_lstm_lfr.sh
3 changes: 3 additions & 0 deletions egs/tedlium/s5_r2/local/nnet3/tuning/run_tdnn_1b.sh
@@ -1,5 +1,8 @@
#!/bin/bash


# 1b is as 1a but uses xconfigs.

# This is the standard "tdnn" system, built in nnet3; this script
# is the version that's meant to run with data-cleanup, that doesn't
# support parallel alignments.

5 comments on commit d60e3cc

@mikenewman1

Is there any reason that the -1 option (for the end time) in the segments file was dropped here? It looks to me like extract-segments still supports this feature.

@danpovey
Contributor Author

It's still supported at the code level but no longer supported at the script level. Having that supported created complications when doing things like merging data dirs and working out utt2dur files.
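For anyone whose segments file still uses -1 as an end time, one possible workaround is to write explicit end times before passing the data directory to the scripts. The following is a minimal sketch under stated assumptions: it expects a two-column file mapping each recording ID to its duration in seconds (called reco2dur here), and the fill_end_times helper, the temporary paths, and the sample values are all hypothetical rather than part of Kaldi.

```shell
#!/bin/sh
# Sketch: replace "-1" end times in a Kaldi-style segments file with the
# duration of the recording. fill_end_times and the sample data are
# illustrative only.
#   segments lines: <utt-id> <reco-id> <start> <end>
#   reco2dur lines: <reco-id> <duration-in-seconds>

fill_end_times() {
  # $1 = reco2dur file (assumed non-empty), $2 = segments file
  awk '
    NR == FNR { dur[$1] = $2; next }   # first file: load the durations
    $4 == -1 {
      if (!($2 in dur)) {
        print "no duration for recording " $2 > "/dev/stderr"
        exit 1
      }
      $4 = dur[$2]                     # use the recording length as end time
    }
    { print }
  ' "$1" "$2"
}

# Small self-contained demonstration with made-up values.
tmp=$(mktemp -d)
printf 'recA 12.50\nrecB 3.75\n' > "$tmp/reco2dur"
printf 'utt1 recA 0.00 4.20\nutt2 recA 4.20 -1\nutt3 recB 0.00 -1\n' > "$tmp/segments"
fill_end_times "$tmp/reco2dur" "$tmp/segments" > "$tmp/segments.fixed"
cat "$tmp/segments.fixed"
```

Once every segment carries an explicit end time, operations such as merging data directories or computing per-utterance durations no longer need a special case for -1.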

@mikenewman1

Shame. But I see your point.
Does Kaldi already have a utility that can simply compute the lengths of arbitrary audio files (e.g. mp3) for use in the segments file? I don't see one. (It's easy to write one using sox -n stats, but there is no point if one already exists.)

@danpovey
Contributor Author

@danpovey danpovey commented on d60e3cc Dec 13, 2017 via email

@mikenewman1

Thanks. I knew it had to exist. And yes, there are pipes (of course)