WIP: add DFSMN related codes and example scripts #2437
Conversation
Thanks for the PR.
I don't think there is a good chance that I will want to commit this, as it's a lot of extra code, it uses the nnet1 framework which is kind of deprecated, and it's not really clear that DFSMN has an advantage over the current TDNN-F scripts of the 'chain' models (which are presumably much faster to train and test).
But it's definitely interesting work; @GaofengCheng has been experimenting with things based on this.
@@ -0,0 +1,231 @@
// featbin/append-ivector-to-feats.cc
I think this file is probably not necessary; the nnet2 scripts already do this task based on existing binaries.
@@ -345,16 +345,7 @@ template<typename Real> void CuVectorUnitTestSum() {
A.SetRandn();
ones.Set(1.0);

Real x = VecVec(A, ones);
Real y = A.Sum();
This and a couple of other files seem to have some changes that are reversions of recent changes to master. Possibly a sign of some weirdness in your use of git?
//// FSMN kernel functions ///////
////////////////////////////////////////////////////

// forward operation in memory block
If we were to commit this stuff I'd probably want these to be non-member functions as they are pretty specialized-- maybe in cu-math.h. But see my overall review.
Thanks a lot for the review and suggestions. I agree that adding these DFSMN-specialized kernel functions conflicts with the code style of Kaldi. We will try to think of some alternative methods, although they may not be as efficient as the CUDA kernel functions.
I was just saying they should be non-member functions, they would still be CUDA kernels.
But that was a minor point, and fixing it would likely not make me want to merge this.
My problem is a little more fundamental-- it's that there is a lot of code, it's based on a setup that is basically deprecated, and it provides no clear advantage versus our current best models based on TDNN-F.
Hi, sorry for not reacting sooner. I like the idea of having a new 'nnet1' model that will be better than the current one... For the moment, I am busy with another project, but I'll put a note to return to this later... Thanks, K.
linearity_.Resize(OutputDim(), InputDim());
RandGauss(0.0, param_stddev, &linearity_);
// if xavier_flag == 1, use the "Xavier" initialization
if (xavier_flag) {
Why do you need this 'special' initialization hard-coded?
- Xavier Glorot's initialization was for the networks only.
- It is possible to set the mean/variance of the initialization externally.
Was it helpful in some way?
- Xavier Glorot's initialization is suitable for networks with ReLU activations. In our previous work (https://pdfs.semanticscholar.org/d39f/468f285fd4034bceaa37b9f3b7e2285f99fe.pdf), we found that ReLU DNNs are more sensitive to the dynamic range of the initial weights, and that a scaled version of Xavier Glorot's initialization (scaled by beta=0.5) provides a good starting point.
- Setting the mean/variance of the initialization externally is OK, but for networks with different hidden-layer sizes we would then need to calculate the dynamic range and set the Gaussian variance by hand for each layer.
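To make the discussion above concrete, here is a minimal standalone sketch (my own illustration, not the PR's code; the function name and defaults are made up) of the scaled "Xavier" initialization being described: Glorot's stddev sqrt(2 / (fan_in + fan_out)), multiplied by a scale factor beta = 0.5, so the dynamic range does not need to be set externally per layer size.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Hedged sketch (not Kaldi code): draw a fan_out x fan_in weight matrix
// (stored flat, row-major) from a Gaussian whose stddev is the Glorot
// value sqrt(2 / (fan_in + fan_out)) scaled by beta, as described above.
// Name, beta default, and seed handling are illustrative assumptions.
std::vector<float> ScaledXavierInit(int fan_in, int fan_out,
                                    float beta = 0.5f,
                                    unsigned seed = 777) {
  const float stddev = beta * std::sqrt(2.0f / (fan_in + fan_out));
  std::mt19937 rng(seed);
  std::normal_distribution<float> gauss(0.0f, stddev);
  std::vector<float> w(static_cast<size_t>(fan_in) * fan_out);
  for (float &x : w) x = gauss(rng);  // i.i.d. Gaussian entries
  return w;
}
```

With this scheme a 2048x2048 hidden layer gets stddev = 0.5 * sqrt(2/4096), roughly 0.011, without any per-layer external tuning, which is the convenience being argued for.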
then -> done fi
Modify the ivector folder name
revise proto dir
Guys, I just wanted to mention that @GaofengCheng has been working on various DFSMN-related ideas (also incorporating some ideas from our latest TDNN-F models with resnet-style skip connections), and he has now been able to match our latest TDNN-F numbers with a DFSMN-style system. I hope that with more tuning he will get even better results. If it seems to outperform TDNN-F more generally, we will definitely switch to more DFSMN-type models.
"ERROR (nnet-train-fsmn-streams[5.4.164~1-43822]:PosteriorToMatrix():posterior.cc:484) Out-of-bound Posterior element with index 5799, higher than number of columns 5777"
You need to modify the output layer of the net.proto to match the number of modeling units.
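In other words, the final layer's output dimension in the prototype must equal the pdf count of the alignment model (an out-of-bound index of 5799 means the net's 5777 outputs are too few). A hypothetical sketch of the last lines of an nnet1 net.proto follows; all dimensions here are placeholders I made up for illustration, not values from this setup — the real output dimension should be taken from the pdf count of your alignment model:

```
<NnetProto>
...
<AffineTransform> <InputDim> 2048 <OutputDim> 5800 <BiasMean> 0.0 <BiasRange> 0.0
<Softmax> <InputDim> 5800 <OutputDim> 5800
</NnetProto>
```

Both occurrences of the output dimension (5800 above is a placeholder) must match the number of pdfs of the tree used to produce the alignments.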
------------------------------------------------------------------
From: zhuanaa <notifications@github.com>
Sent: Tuesday, February 25, 2020, 10:03
To: kaldi-asr/kaldi <kaldi@noreply.github.com>
Cc: Zhang Shiliang (Zhanliang) <sly.zsl@alibaba-inc.com>; Mention <mention@noreply.github.com>
Subject: Re: [kaldi-asr/kaldi] WIP: add DFSMN related codes and example scripts (#2437)
Hi @tramphero. After changing the filename and relative path, I got the error log below:
nnet-train-fsmn-streams --cross-validate=true --randomize=false --verbose=0 --minibatch-size=4096 --feature-transform=exp/tri7b_DFSMN_M/final.feature_transform "ark:copy-feats scp:exp/tri7b_DFSMN_M/cv.scp ark:- | apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data_fbank/dev_clean/utt2spk scp:data_fbank/dev_clean/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | append-ivector-to-feats --online-ivector-period=10 ark:- 'scp:exp/nnet3_cleaned/ivectors_train_960_dev_hires/ivector_online.scp' ark:- |" 'ark:ali-to-pdf exp/tri6b_cleaned_ali_train_960_cleaned/final.mdl "ark:gunzip -c exp/tri6b_cleaned_ali_dev_clean/ali.*.gz |" ark:- | ali-to-post ark:- ark:- |' exp/tri7b_DFSMN_M/nnet.init
LOG (nnet-train-fsmn-streams[5.4.164~1-43822]:SelectGpuId():cu-device.cc:191) CUDA setup operating under Compute Exclusive Mode.
LOG (nnet-train-fsmn-streams[5.4.164~1-43822]:FinalizeActiveGpu():cu-device.cc:247) The active GPU is [0]: Tesla P40 free:22688M, used:231M, total:22919M, free/total:0.989921 version 6.1
copy-feats scp:exp/tri7b_DFSMN_M/cv.scp ark:-
add-deltas --delta-order=2 ark:- ark:-
append-ivector-to-feats --online-ivector-period=10 ark:- scp:exp/nnet3_cleaned/ivectors_train_960_dev_hires/ivector_online.scp ark:-
apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data_fbank/dev_clean/utt2spk scp:data_fbank/dev_clean/cmvn.scp ark:- ark:-
LOG (nnet-train-fsmn-streams[5.4.164~1-43822]:Init():nnet-randomizer.cc:32) Seeding by srand with : 777
LOG (nnet-train-fsmn-streams[5.4.164~1-43822]:main():nnet-train-fsmn-streams.cc:163) CROSS-VALIDATION STARTED
ali-to-pdf exp/tri6b_cleaned_ali_train_960_cleaned/final.mdl 'ark:gunzip -c exp/tri6b_cleaned_ali_dev_clean/ali.*.gz |' ark:-
ali-to-post ark:- ark:-
ERROR (nnet-train-fsmn-streams[5.4.164~1-43822]:PosteriorToMatrix():posterior.cc:484) Out-of-bound Posterior element with index 5799, higher than number of columns 5777
[ Stack-Trace: ]
nnet-train-fsmn-streams() [0x767010]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
void kaldi::PosteriorToMatrix(std::vector<std::vector<std::pair<int, float>, std::allocator<std::pair<int, float> > >, std::allocator<std::vector<std::pair<int, float>, std::allocator<std::pair<int, float> > > > > const&, int, kaldi::Matrix)
void kaldi::nnet1::PosteriorToMatrix(std::vector<std::vector<std::pair<int, float>, std::allocator<std::pair<int, float> > >, std::allocator<std::vector<std::pair<int, float>, std::allocator<std::pair<int, float> > > > > const&, int, kaldi::CuMatrix)
kaldi::nnet1::Xent::Eval(kaldi::VectorBase const&, kaldi::CuMatrixBase const&, std::vector<std::vector<std::pair<int, float>, std::allocator<std::pair<int, float> > >, std::allocator<std::vector<std::pair<int, float>, std::allocator<std::pair<int, float> > > > > const&, kaldi::CuMatrix*)
main
__libc_start_main
nnet-train-fsmn-streams() [0x580009]
WARNING (nnet-train-fsmn-streams[5.4.164~1-43822]:Close():kaldi-io.cc:515) Pipe ali-to-pdf exp/tri6b_cleaned_ali_train_960_cleaned/final.mdl "ark:gunzip -c exp/tri6b_cleaned_ali_dev_clean/ali.*.gz |" ark:- | ali-to-post ark:- ark:- | had nonzero return status 36096
WARNING (nnet-train-fsmn-streams[5.4.164~1-43822]:Close():kaldi-io.cc:515) Pipe copy-feats scp:exp/tri7b_DFSMN_M/cv.scp ark:- | apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data_fbank/dev_clean/utt2spk scp:data_fbank/dev_clean/cmvn.scp ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | append-ivector-to-feats --online-ivector-period=10 ark:- 'scp:exp/nnet3_cleaned/ivectors_train_960_dev_hires/ivector_online.scp' ark:- | had nonzero return status 36096
Have you come across this? Thanks for your help.
I have fixed it according to #2530.
This PR is an implementation of DFSMN (https://arxiv.org/pdf/1803.05030.pdf) in nnet1.
I have evaluated three configurations of DFSMN (DFSMN_S, DFSMN_M, DFSMN_L) with different model sizes on the Librispeech task. Detailed experimental results are shown in the RESULTS file.