Issue training using the Aspire recipe #4041
Did you change the line in path.sh that says
export LC_ALL=C?
Use utils/validate_data_dir.sh to validate the data dir it's using.
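Some context for the locale question: Kaldi's path.sh normally sets export LC_ALL=C so that sort orders keys byte by byte, and that byte order is what tables opened with the "s" (sorted) option expect. Under another locale, sort may order keys that differ in punctuation such as - and _ differently. Below is a minimal sketch of a sorted-and-unique check, similar in spirit to one of the checks utils/validate_data_dir.sh performs but not a replacement for it. It assumes GNU sort and that the key is the first whitespace-separated field. check_table_sorted is a hypothetical helper, not a Kaldi script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): succeed only if a table file
# (scp, utt2spk, ...) is sorted by its first field in C-locale (byte) order
# and has no duplicate keys.
check_table_sorted() {
  local f=$1
  # -c: only check, do not sort; -k1,1: compare the key column only;
  # -u: make the check strict, so equal (duplicate) keys also fail.
  if LC_ALL=C sort -c -u -k1,1 "$f" 2>/dev/null; then
    return 0
  fi
  echo "$f: not sorted in C-locale order, or has duplicate keys" >&2
  return 1
}

# Example usage on the data dir from the log (paths taken from the thread):
# for f in utt2spk feats.scp cmvn.scp utt2uniq; do
#   check_table_sorted data/train_rvb_hires/$f
# done
```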
On Wed, Apr 15, 2020 at 12:32 AM Andre Natal wrote:
Hello,
I'm trying to train a model with the Aspire recipe, using the latest code from the master branch, but I'm encountering the following error when running local/chain/run_tdnn_lstm.sh. When I trained with local/chain/run_tdnn.sh, it worked fine.
steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --generate-egs-scp true --cmd "run.pl" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "exp/nnet3/ivectors_train_rvb" --left-context 58 --right-context 28 --left-context-initial 18 --right-context-final 28 --left-tolerance '5' --right-tolerance '5' --frame-subsampling-factor 3 --alignment-subsampling-factor 3 --stage -10 --frames-per-iter 1500000 --frames-per-eg 160,140,110,80 --srand 0 data/train_rvb_hires exp/chain/tdnn_lstm_1a exp/chain/tri5a_train_rvb_lats exp/chain/tdnn_lstm_1a/egs
steps/nnet3/chain/get_egs.sh --frames-overlap-per-eg 0 --generate-egs-scp true --cmd run.pl --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir exp/nnet3/ivectors_train_rvb --left-context 58 --right-context 28 --left-context-initial 18 --right-context-final 28 --left-tolerance 5 --right-tolerance 5 --frame-subsampling-factor 3 --alignment-subsampling-factor 3 --stage -10 --frames-per-iter 1500000 --frames-per-eg 160,140,110,80 --srand 0 data/train_rvb_hires exp/chain/tdnn_lstm_1a exp/chain/tri5a_train_rvb_lats exp/chain/tdnn_lstm_1a/egs
steps/nnet3/chain/get_egs.sh: File data/train_rvb_hires/utt2uniq exists, so ensuring the hold-out set includes all perturbed versions of the same source utterance.
steps/nnet3/chain/get_egs.sh: Holding out 300 utterances in validation set and 300 in training diagnostic set, out of total 5614836.
steps/nnet3/chain/get_egs.sh: creating egs. To ensure they are not deleted later you can do: touch exp/chain/tdnn_lstm_1a/egs/.nodelete
steps/nnet3/chain/get_egs.sh: feature type is raw, with 'apply-cmvn'
tree-info exp/chain/tdnn_lstm_1a/tree
feat-to-dim scp:exp/nnet3/ivectors_train_rvb/ivector_online.scp -
steps/nnet3/chain/get_egs.sh: working out number of frames of training data
steps/nnet3/chain/get_egs.sh: working out feature dim
steps/nnet3/chain/get_egs.sh: creating 1374 archives, each with 18749 egs, with
steps/nnet3/chain/get_egs.sh: 160,140,110,80 labels per example, and (left,right) context = (58,28)
steps/nnet3/chain/get_egs.sh: ... and (left-context-initial,right-context-final) = (18,28)
steps/nnet3/chain/get_egs.sh: Getting validation and training subset examples in background.
steps/nnet3/chain/get_egs.sh: Generating training examples on disk
run.pl: job failed, log is in exp/chain/tdnn_lstm_1a/egs/log/create_valid_subset.log
When I inspect the aforementioned log file, I see this:
# utils/filter_scp.pl exp/chain/tdnn_lstm_1a/egs/valid_uttlist exp/chain/tdnn_lstm_1a/egs/lat_special.scp | lattice-align-phones --replace-output-symbols=true exp/chain/tri5a_train_rvb_lats/final.mdl scp:- ark:- | chain-get-supervision --lattice-input=true --frame-subsampling-factor=3 --right-tolerance=5 --left-tolerance=5 exp/chain/tdnn_lstm_1a/tree exp/chain/tdnn_lstm_1a/0.trans_mdl ark:- ark:- | nnet3-chain-get-egs --online-ivectors=scp:exp/nnet3/ivectors_train_rvb/ivector_online.scp --online-ivector-period=10 --srand=0 --left-context=58 --right-context=28 --num-frames=160,140,110,80 --frame-subsampling-factor=3 --compress=true --left-context-initial=18 --right-context-final=28 --normalization-fst-scale=1.0 exp/chain/tdnn_lstm_1a/normalization.fst "ark,s,cs:utils/filter_scp.pl exp/chain/tdnn_lstm_1a/egs/valid_uttlist data/train_rvb_hires/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_rvb_hires/utt2spk scp:data/train_rvb_hires/cmvn.scp scp:- ark:- |" ark,s,cs:- ark:exp/chain/tdnn_lstm_1a/egs/valid_all.cegs
# Started at Mon Apr 13 22:01:36 PDT 2020
#
chain-get-supervision --lattice-input=true --frame-subsampling-factor=3 --right-tolerance=5 --left-tolerance=5 exp/chain/tdnn_lstm_1a/tree exp/chain/tdnn_lstm_1a/0.trans_mdl ark:- ark:-
nnet3-chain-get-egs --online-ivectors=scp:exp/nnet3/ivectors_train_rvb/ivector_online.scp --online-ivector-period=10 --srand=0 --left-context=58 --right-context=28 --num-frames=160,140,110,80 --frame-subsampling-factor=3 --compress=true --left-context-initial=18 --right-context-final=28 --normalization-fst-scale=1.0 exp/chain/tdnn_lstm_1a/normalization.fst 'ark,s,cs:utils/filter_scp.pl exp/chain/tdnn_lstm_1a/egs/valid_uttlist data/train_rvb_hires/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_rvb_hires/utt2spk scp:data/train_rvb_hires/cmvn.scp scp:- ark:- |' ark,s,cs:- ark:exp/chain/tdnn_lstm_1a/egs/valid_all.cegs
LOG (nnet3-chain-get-egs[5.5.569~1-6f329]:ComputeDerived():nnet-example-utils.cc:335) Rounding up --num-frames=160,140,110,80 to multiples of --frame-subsampling-factor=3, to: 162,141,111,81
lattice-align-phones --replace-output-symbols=true exp/chain/tri5a_train_rvb_lats/final.mdl scp:- ark:-
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_rvb_hires/utt2spk scp:data/train_rvb_hires/cmvn.scp scp:- ark:-
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev1-fe_03_00123-A-041128-041178 because it is too short: 48 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev1-fe_03_00325-A-034285-034360 because it is too short: 73 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev1-fe_03_04633-B-000179-000259 because it is too short: 78 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev1-fe_03_05509-B-049806-049881 because it is too short: 73 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev1-fe_03_11038-B-022852-022926 because it is too short: 72 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev1-fe_03_11661-A-052912-052941 because it is too short: 27 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev2-fe_03_00123-A-041128-041178 because it is too short: 48 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:ProcessFile():nnet3-chain-get-egs.cc:134) Not producing egs for utterance rev2-fe_03_00325-A-034285-034360 because it is too short: 73 frames.
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:main():nnet3-chain-get-egs.cc:386) No pdf-level posterior for key rev2-fe_03_03392-A-000991-001172
ERROR (nnet3-chain-get-egs[5.5.569~1-6f329]:FindKeyInternal():util/kaldi-table-inl.h:2149) You provided the "s" option (sorted order), but keys are out of order or duplicated: rev2-fe_03_03635-B-013708-013832 is followed by rev2-fe_03_03392-A-000991-001172: rspecifier is ark,s,cs:-
[ Stack-Trace: ]
nnet3-chain-get-egs(kaldi::MessageLogger::LogMessage() const+0xb42) [0x56104d2e1960]
nnet3-chain-get-egs(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x56104cf7a447]
nnet3-chain-get-egs(kaldi::RandomAccessTableReaderDSortedArchiveImpl<kaldi::KaldiObjectHolder<kaldi::chain::Supervision> >::FindKeyInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x469) [0x56104cf8d909]
nnet3-chain-get-egs(kaldi::RandomAccessTableReaderDSortedArchiveImpl<kaldi::KaldiObjectHolder<kaldi::chain::Supervision> >::HasKey(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x9) [0x56104cf8db61]
nnet3-chain-get-egs(kaldi::RandomAccessTableReader<kaldi::KaldiObjectHolder<kaldi::chain::Supervision> >::HasKey(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x40) [0x56104cf81a62]
nnet3-chain-get-egs(main+0xe24) [0x56104cf76bfe]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f9ad2216b97]
nnet3-chain-get-egs(_start+0x2a) [0x56104cf75cfa]
WARNING (nnet3-chain-get-egs[5.5.569~1-6f329]:Close():kaldi-io.cc:515) Pipe utils/filter_scp.pl exp/chain/tdnn_lstm_1a/egs/valid_uttlist data/train_rvb_hires/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_rvb_hires/utt2spk scp:data/train_rvb_hires/cmvn.scp scp:- ark:- | had nonzero return status 36096
LOG (nnet3-chain-get-egs[5.5.569~1-6f329]:~UtteranceSplitter():nnet-example-utils.cc:357) Split 127 utts, with total length 46459 frames (0.129053 hours assuming 100 frames per second)
LOG (nnet3-chain-get-egs[5.5.569~1-6f329]:~UtteranceSplitter():nnet-example-utils.cc:366) Average chunk length was 132.473 frames; overlap between adjacent chunks was 1.12357% of input length; length of output was 99.5135% of input length (minus overlap = 98.39%).
LOG (nnet3-chain-get-egs[5.5.569~1-6f329]:~UtteranceSplitter():nnet-example-utils.cc:382) Output frames are distributed among chunk-sizes as follows: 81 = 14.89%, 111 = 12.24%, 141 = 11.89%, 162 = 60.97%
kaldi::KaldiFatalError
# Accounting: time=10 threads=1
# Ended (code 255) at Mon Apr 13 22:01:46 PDT 2020, elapsed time 10 seconds
So is the recipe up to date and currently working with master, or should I just use fisher_english?
Thanks
Hi @danpovey, thanks for the response. Yes, I'll run … Thanks.
Looks to me like that file lat_special.scp may not be in sorted order. You'll have to trace back how it was created and figure out why.
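To locate where the order breaks, here is a sketch that reports the first adjacent pair of keys that is out of C-locale order or repeated, in the same "X is followed by Y" form as the ERROR line in the log above. It assumes the key is the first field, and first_unsorted_key is a hypothetical helper name. Because utils/filter_scp.pl passes lines through in the order of the file it filters, an out-of-order lat_special.scp would be expected to surface in the supervision stream that nnet3-chain-get-egs reads as ark,s,cs:-.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): print the first pair of adjacent
# keys that breaks C-locale sorted order (or repeats), and return nonzero if
# one is found. Returns 0 when the file is strictly sorted.
first_unsorted_key() {
  LC_ALL=C awk '
    { key = $1 "" }            # concatenation forces string comparison
    NR > 1 && key <= prev {    # <= also catches duplicate keys
      printf "line %d: %s is followed by %s\n", NR, prev, key
      found = 1
      exit                     # END still runs and sets the exit status
    }
    { prev = key }
    END { exit found }
  ' "$1"
}

# Example usage (path taken from the log above):
# first_unsorted_key exp/chain/tdnn_lstm_1a/egs/lat_special.scp
```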
OK, thanks, I'll try to see what happened to this file. I just ran …
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I just saw that this is the same issue I recently posted.
lat_special.scp is generated by lattice-copy. Should the ",s" option be added to the wspecifier there?
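The thread doesn't settle the wspecifier question. As an untested stopgap (not a fix confirmed here), one could re-sort lat_special.scp in C-locale order and then re-run get_egs.sh with a --stage after the one that writes that file; which stage that is depends on the script version, and the file will be overwritten if it is regenerated. Reordering scp lines is safe because each line independently maps a key to an archive offset. resort_table is a hypothetical helper, and the sketch assumes GNU or POSIX sort, whose -o option may name its own input file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): re-sort a table file in place by
# its first field in C-locale order, keeping a copy of the original.
resort_table() {
  local f=$1
  # Keep the original so the change can be inspected or undone.
  cp -- "$f" "$f.orig" || return 1
  # -k1,1: order by key only. -s: keep lines that share a key in their
  # original relative order, so duplicates are kept (and will still be
  # reported by the Kaldi reader) instead of being silently dropped.
  LC_ALL=C sort -s -k1,1 -o "$f" "$f"
}

# Example usage (path taken from the log above):
# resort_table exp/chain/tdnn_lstm_1a/egs/lat_special.scp
```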
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.