
WIP: add TDNNF to pytorch. #3892

Merged (5 commits) · Feb 11, 2020

Conversation

csukuangfj
Contributor

We are trying to replace TDNN with TDNNF in kaldi pybind training with PyTorch.

@danpovey
Contributor

Cool!
The orthonormalization is fairly important. It should probably be implemented as some kind of post-update-hook, not sure what those are called? Or as some kind of modification to the trainer, but I think post-update-hook would be easier.

@csukuangfj
Contributor Author

I found that Kaldi invokes

// The following will only do something if we have a LinearComponent
// or AffineComponent with orthonormal-constraint set to a nonzero value.
ConstrainOrthonormal(nnet_);

right after the update of the parameters.

I am going to invoke it after calling optimizer.step().
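For reference, the update that ConstrainOrthonormal applies can be sketched in numpy as follows. The function name and the numpy port are mine; the math follows Kaldi's ConstrainOrthonormalInternal (nnet-utils.cc) with the "floating" orthonormal constraint:

```python
import numpy as np

def constrain_orthonormal_step(M, update_speed=0.125):
    """One step of the Kaldi-style 'floating' orthonormal constraint.

    Nudges M (num_rows <= num_cols) towards M M^T = scale^2 * I,
    following ConstrainOrthonormalInternal in Kaldi's nnet-utils.cc.
    """
    num_rows, num_cols = M.shape
    assert num_rows <= num_cols
    P = M @ M.T
    trace_P = np.trace(P)
    trace_P_P = (P * P).sum()        # trace(P P^T); P is symmetric
    scale2 = trace_P_P / trace_P     # the 'floating' scale^2
    ratio = trace_P_P * num_rows / (trace_P * trace_P)
    # ratio >= 1 mathematically (Cauchy-Schwarz); the pybind code asserts
    # ratio > 0.99, which can only fail if M has blown up numerically.
    assert ratio > 0.99
    if ratio > 1.02:
        update_speed *= 0.5
        if ratio > 1.1:
            update_speed *= 0.5
    P = P - scale2 * np.eye(num_rows)
    return M - (4.0 * update_speed / scale2) * (P @ M)

# In the training loop this would run right after optimizer.step().
# Applied repeatedly, the rows of M become orthogonal (up to a scale):
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 10))
for _ in range(100):
    M = constrain_orthonormal_step(M)
P = M @ M.T
scale2 = (P * P).sum() / np.trace(P)
print(np.abs(P - scale2 * np.eye(4)).max())
```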

@danpovey
Contributor

danpovey commented Jan 30, 2020 via email

@csukuangfj
Contributor Author

csukuangfj commented Jan 31, 2020

The decoding results for TDNNF are as follows:

            TDNN (PyTorch)   TDNNF (PyTorch)   TDNNF (kaldi, tdnn_1c)
dev_cer          8.22              7.26                 5.71
dev_wer         16.66             15.49                13.49
test_cer         9.98              9.21                 6.65
test_wer        18.89             17.98                15.18

The first column is from
https://github.com/kaldi-asr/kaldi/blob/pybind11/egs/aishell/s10/local/run_chain.sh#L236

The second column is the result from this pull request.

The third column comes from #3868

The second column has a greater number of layers and a larger hidden dim than the first column.
I am not sure whether the improvement in CER/WER is due to the factorized TDNN or to the adoption of a larger network.

The second column has almost the same topology as the third column. The differences are

  • we use high-resolution MFCC (40 dim) + pitch (3 dim) = 43 dim features
  • we do not use GeneralDropoutComponent
  • we use [-1, 0, 1] for the orthonormalization layer

I am not sure whether the above differences cause the inferior results for PyTorch.


Another difference is the alignment information:

@csukuangfj csukuangfj changed the title [WIP]: add TDNNF to pytorch. WIP: add TDNNF to pytorch. Jan 31, 2020
@csukuangfj
Contributor Author

@danpovey

I have removed pitch and am running the training again. Now the feature part
in PyTorch is the same as kaldi's.

Regarding the [-1, 0, 1]: TDNN networks use input=Append(-1,0,1), and I call that [-1, 0, 1].

As for the weight matrix M of a TDNN layer, the paper https://www.danielpovey.com/files/2018_interspeech_tdnnf.pdf factorizes
M into two parts, M = A B, i.e., it splits one TDNN layer into two layers:

  • the first layer is a linear layer with weight matrix B, where B B^T == Identity. The input of this layer is [-1, 0]

  • the second layer is an affine layer with weight matrix A; the input of this layer is [0, 1]

In the PyTorch implementation, we use [-1, 0, 1] for the first linear layer and there is no splicing
in the second affine layer.
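To make the factorization concrete, here is a quick parameter count for one hidden layer with hidden dim 1024, bottleneck dim 128, and splice width 3, following the paper's [-1, 0] / [0, 1] splicing (plain Python; the numbers are illustrative):

```python
# Parameter counts for one hidden layer: hidden dim 1024, bottleneck dim 128,
# splice width 3 (e.g. Append(-1,0,1)) -- the sizes discussed in this thread.
hidden, bottleneck, splice = 1024, 128, 3

# Plain TDNN layer: one weight matrix M of shape (hidden, hidden * splice).
full = hidden * hidden * splice

# TDNN-F factorization M = A B (per the paper: B sees [-1, 0], A sees [0, 1]):
#   B: semi-orthonormal linear, shape (bottleneck, hidden * 2), no bias
#   A: affine, shape (hidden, bottleneck * 2), plus a bias of size hidden
factored = bottleneck * hidden * 2 + hidden * bottleneck * 2 + hidden

print(full, factored, round(full / factored, 1))  # -> 3145728 525312 6.0
```

So the factorized layer has roughly 6x fewer parameters than the plain TDNN layer at these sizes.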


The above paper also proposes 3-stage splicing, i.e., inserting a 2x1 conv layer between
the first and the second layer. But I find that kaldi has not implemented 3-stage splicing.
I guess kaldi does not implement it for computational-efficiency reasons: if both 3-stage splicing and a frame subsampling factor of 3 are used, you have to perform computation at every layer for every frame.

@csukuangfj
Contributor Author

csukuangfj commented Jan 31, 2020

After removing pitch, the result becomes a little worse:

            with pitch (PyTorch)   without pitch (PyTorch)
dev_cer            7.26                    7.52
dev_wer           15.49                   15.70
test_cer           9.21                    9.27
test_wer          17.98                   18.04

The earlier table is copied here for easier comparison:

            TDNN (PyTorch)   TDNNF (PyTorch)   TDNNF (kaldi, tdnn_1c)
dev_cer          8.22              7.26                 5.71
dev_wer         16.66             15.49                13.49
test_cer         9.98              9.21                 6.65
test_wer        18.89             17.98                15.18

@jtrmal
Contributor

jtrmal commented Jan 31, 2020

@csukuangfj -- did you look at the likelihoods? I wonder if it is overtraining or undertraining?

@csukuangfj
Contributor Author

@jtrmal
Did you mean the objective function value?

Part of the training log is as follows:

2020-01-31 16:17:23,226 INFO [train.py:185] epoch 0, learning rate 0.001
2020-01-31 16:17:23,536 INFO [train.py:102] Process 0/3161(0.000000%) global average objf: -1.195890 over 6400.0 frames, current batch average objf: -1.195890 over 6400 frames, epoch 0
2020-01-31 16:17:44,072 INFO [train.py:102] Process 100/3161(3.163556%) global average objf: -0.687263 over 573696.0 frames, current batch average objf: -0.457086 over 6400 frames, epoch 0
2020-01-31 16:18:04,479 INFO [train.py:102] Process 200/3161(6.327112%) global average objf: -0.535040 over 1138432.0 frames, current batch average objf: -0.338968 over 6400 frames, epoch 0
2020-01-31 16:18:24,999 INFO [train.py:102] Process 300/3161(9.490668%) global average objf: -0.453431 over 1704064.0 frames, current batch average objf: -0.261345 over 6400 frames, epoch 0
2020-01-31 16:18:45,192 INFO [train.py:102] Process 400/3161(12.654223%) global average objf: -0.402034 over 2267136.0 frames, current batch average objf: -0.242083 over 6400 frames, epoch 0
....
2020-01-31 17:21:32,249 INFO [train.py:102] Process 2800/3161(88.579563%) global average objf: -0.060549 over 15840896.0 frames, current batch average objf: -0.064717 over 3840 frames, epoch 5
2020-01-31 17:21:53,120 INFO [train.py:102] Process 2900/3161(91.743119%) global average objf: -0.060385 over 16406528.0 frames, current batch average objf: -0.066644 over 3840 frames, epoch 5
2020-01-31 17:22:14,151 INFO [train.py:102] Process 3000/3161(94.906675%) global average objf: -0.060270 over 16973824.0 frames, current batch average objf: -0.047593 over 6400 frames, epoch 5
2020-01-31 17:22:34,801 INFO [train.py:102] Process 3100/3161(98.070231%) global average objf: -0.060135 over 17539456.0 frames, current batch average objf: -0.050985 over 6400 frames, epoch 5

The TensorBoard screenshot is attached (Screen Shot 2020-01-31 at 17 55 07).

How can you tell whether it is underfitting or overfitting from the objective function value?

@jtrmal
Contributor

jtrmal commented Jan 31, 2020 via email

@RuABraun
Contributor

Could the difference be explained by the different optimizers (Adam vs. NSGD)?

@danpovey
Contributor

Let's keep the features the same for now while we work out the other differences.
There are likely quite a few differences and I want to add more diagnostics to the PyTorch setup to help track it down in more detail.

@jtrmal
Contributor

jtrmal commented Jan 31, 2020 via email

@qindazhu
Contributor

qindazhu commented Feb 1, 2020

I ran tdnn_1c with the DropoutComponent removed; the corresponding neural-net config (replacing relu-batchnorm-dropout-layer with relu-batchnorm-layer and removing GeneralDropoutComponent in tdnnf) is below.

  num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
  learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
  affine_opts="l2-regularize=0.008"
  tdnnf_opts="l2-regularize=0.008  bypass-scale=0.66"
  linear_opts="l2-regularize=0.008 orthonormal-constraint=-1.0"
  prefinal_opts="l2-regularize=0.008"
  output_opts="l2-regularize=0.002"

  input dim=40 name=input
  fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
  relu-batchnorm-layer name=tdnn1 $affine_opts dim=1024
  tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf3 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf4 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf5 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=0
  tdnnf-layer name=tdnnf6 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf7 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf8 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf9 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf10 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf11 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf12 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf13 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  linear-component name=prefinal-l dim=256 $linear_opts

  prefinal-layer name=prefinal-chain input=prefinal-l $prefinal_opts big-dim=1024 small-dim=256
  output-layer name=output include-log-softmax=false dim=$num_targets $output_opts

  prefinal-layer name=prefinal-xent input=prefinal-l $prefinal_opts big-dim=1024 small-dim=256
  output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts

Result

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/scoring_kaldi/best_cer <==
%WER 6.62 [ 6935 / 104765, 149 ins, 264 del, 6522 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/cer_12_1.0

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/scoring_kaldi/best_wer <==
%WER 15.18 [ 9782 / 64428, 1010 ins, 1290 del, 7482 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/wer_14_0.0

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/scoring_kaldi/best_cer <==
%WER 5.66 [ 11626 / 205341, 236 ins, 370 del, 11020 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/cer_11_0.5

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/scoring_kaldi/best_wer <==
%WER 13.41 [ 17120 / 127698, 1542 ins, 2463 del, 13115 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/wer_12_0.0

All results so far

TDNN

            TDNN (PyTorch)   tdnn_1b (Kaldi)
dev_cer          8.22              7.06
dev_wer         16.66             15.11
test_cer         9.98              8.63
test_wer        18.89             17.40

Both of them use the same config:

  input dim=$feat_dim name=input
  fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
  relu-batchnorm-layer name=tdnn1 dim=625
  relu-batchnorm-layer name=tdnn2 input=Append(-1,0,1) dim=625
  relu-batchnorm-layer name=tdnn3 input=Append(-1,0,1) dim=625
  relu-batchnorm-layer name=tdnn4 input=Append(-3,0,3) dim=625
  relu-batchnorm-layer name=tdnn5 input=Append(-3,0,3) dim=625
  relu-batchnorm-layer name=tdnn6 input=Append(-3,0,3) dim=625
  relu-batchnorm-layer name=prefinal-chain input=tdnn6 dim=625 target-rms=0.5
  output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5

TDNN-F

            TDNN-F (PyTorch)   tdnn_1c_r_d (Kaldi)   tdnn_1c (Kaldi)   tdnn_1d (Kaldi)
dev_cer           7.26                 5.66                5.71              5.51
dev_wer          15.49                13.41               13.49             13.19
test_cer          9.21                 6.62                6.65              6.46
test_wer         17.98                15.18               15.18             14.91

They share the same TDNN-F config at the top; the differences are:

  • TDNN-F(Pytorch)
    without i-vector, without dropout, with pitch (a little better than the version without pitch), using [-1, 0, 1] for the orthonormalization layer. See the comments from @csukuangfj above for more details.
  • tdnn_1c_r_d
    without i-vector, without dropout. (BTW, the r in the name means "removed" and the d means "dropout".)
  • tdnn_1c
    without i-vector, with dropout
  • tdnn_1d
    with i-vector, with dropout.

It seems that dropout does not make a difference on this dataset (aishell).

@csukuangfj
Contributor Author

@qindazhu thanks.
tdnnf-layer is expanded to contain GeneralDropoutComponent.
You can refer to exp/chain_cleaned_1c/tdnn1c_sp/configs/ref.config.

@qindazhu
Contributor

qindazhu commented Feb 1, 2020

@qindazhu thanks.
tdnnf-layer is expanded to contain GeneralDropoutComponent.
You can refer to exp/chain_cleaned_1c/tdnn1c_sp/configs/ref.config.

No, it will not. If you leave the parameter dropout-proportion at -1 (the default value), the resulting config will not include Dropout.

ref.config of tdnn_1c

component name=tdnnf2.noop type=NoOpComponent dim=1024
component-node name=tdnnf2.noop component=tdnnf2.noop input=Sum(Scale(0.66, tdnn1.dropout), tdnnf2.dropout)
component name=tdnnf3.linear type=TdnnComponent input-dim=1024 output-dim=128 l2-regularize=0.008 max-change=0.75 use-bias=false time-offsets=-1,0 orthonormal-constraint=-1.0
component-node name=tdnnf3.linear component=tdnnf3.linear input=tdnnf2.noop
component name=tdnnf3.affine type=TdnnComponent input-dim=128 output-dim=1024 l2-regularize=0.008 max-change=0.75 time-offsets=0,1
component-node name=tdnnf3.affine component=tdnnf3.affine input=tdnnf3.linear
component name=tdnnf3.relu type=RectifiedLinearComponent dim=1024 self-repair-scale=1e-05
component-node name=tdnnf3.relu component=tdnnf3.relu input=tdnnf3.affine
component name=tdnnf3.batchnorm type=BatchNormComponent dim=1024
component-node name=tdnnf3.batchnorm component=tdnnf3.batchnorm input=tdnnf3.relu
component name=tdnnf3.dropout type=GeneralDropoutComponent dim=1024 dropout-proportion=0.0 continuous=true
component-node name=tdnnf3.dropout component=tdnnf3.dropout input=tdnnf3.batchnorm
component name=tdnnf3.noop type=NoOpComponent dim=1024

ref.config of tdnn_1c_r_d

component-node name=tdnnf2.noop component=tdnnf2.noop input=Sum(Scale(0.66, tdnn1.batchnorm), tdnnf2.batchnorm)
component name=tdnnf3.linear type=TdnnComponent input-dim=1024 output-dim=128 l2-regularize=0.008 max-change=0.75 use-bias=false time-offsets=-1,0 orthonormal-constraint=-1.0
component-node name=tdnnf3.linear component=tdnnf3.linear input=tdnnf2.noop
component name=tdnnf3.affine type=TdnnComponent input-dim=128 output-dim=1024 l2-regularize=0.008 max-change=0.75 time-offsets=0,1
component-node name=tdnnf3.affine component=tdnnf3.affine input=tdnnf3.linear
component name=tdnnf3.relu type=RectifiedLinearComponent dim=1024 self-repair-scale=1e-05
component-node name=tdnnf3.relu component=tdnnf3.relu input=tdnnf3.affine
component name=tdnnf3.batchnorm type=BatchNormComponent dim=1024
component-node name=tdnnf3.batchnorm component=tdnnf3.batchnorm input=tdnnf3.relu
component name=tdnnf3.noop type=NoOpComponent dim=1024
component-node name=tdnnf3.noop component=tdnnf3.noop input=Sum(Scale(0.66, tdnnf2.noop), tdnnf3.batchnorm)
component name=tdnnf4.linear type=TdnnComponent input-dim=1024 output-dim=128 l2-regularize=0.008 max-change=0

@csukuangfj
Contributor Author

I see.

@danpovey
Contributor

danpovey commented Feb 1, 2020 via email

@csukuangfj
Contributor Author

@qindazhu
could you turn off max-change and the natural gradient optimizer?

@csukuangfj
Contributor Author

By the way, PyTorch is significantly faster than Kaldi.

It took about 1 hour in total for 6 epochs in the current pull request.

@fanlu reported in this pull request that Kaldi took about 4 hours in total for 6 epochs.

@danpovey
Contributor

danpovey commented Feb 4, 2020 via email

@csukuangfj
Contributor Author

thanks a lot.

@danpovey
Contributor

danpovey commented Feb 4, 2020 via email

@csukuangfj
Contributor Author

Sure, I will draw the L2 norm of all the weight matrices with tensorboard.

@fanlu

fanlu commented Feb 10, 2020

I have changed the model structure and the forward function, and the result is:

            TDNN (PyTorch)
dev_cer          6.67
dev_wer         14.72
test_cer         8.38
test_wer        17.08

But it is slower than before: it takes about 4 hours 20 minutes.
The parameters' shapes now match Kaldi's:

2020-02-10 19:41:18,662 INFO [train4.py:201] name: module.tdnn1_affine.weight, shape: torch.Size([1024, 129])
2020-02-10 19:41:18,662 INFO [train4.py:201] name: module.tdnn1_affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnn1_batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnn1_batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnnfs.0.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnnfs.0.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.0.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.0.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.0.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.1.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.2.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.3.linear.conv.weight, shape: torch.Size([128, 1024, 1])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.3.affine.weight, shape: torch.Size([1024, 128, 1])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.3.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.3.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.3.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.4.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.4.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.4.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.4.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.4.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.5.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.5.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.5.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.5.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.5.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.6.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,671 INFO [train4.py:201] name: module.tdnnfs.6.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,671 INFO [train4.py:201] name: module.tdnnfs.6.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,671 INFO [train4.py:201] name: module.tdnnfs.6.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,672 INFO [train4.py:201] name: module.tdnnfs.6.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,672 INFO [train4.py:201] name: module.tdnnfs.7.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,672 INFO [train4.py:201] name: module.tdnnfs.7.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.7.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.7.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.7.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.8.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.9.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.10.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.11.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.11.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.11.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.tdnnfs.11.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.tdnnfs.11.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.prefinal_l.conv.weight, shape: torch.Size([256, 1024, 1])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.prefinal_chain.affine.weight, shape: torch.Size([1024, 256])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.batchnorm1.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.batchnorm1.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.linear.conv.weight, shape: torch.Size([256, 1024, 1])
2020-02-10 19:41:18,680 INFO [train4.py:201] name: module.prefinal_chain.batchnorm2.weight, shape: torch.Size([256])
2020-02-10 19:41:18,680 INFO [train4.py:201] name: module.prefinal_chain.batchnorm2.bias, shape: torch.Size([256])
2020-02-10 19:41:18,680 INFO [train4.py:201] name: module.output_affine.weight, shape: torch.Size([4336, 256])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.output_affine.bias, shape: torch.Size([4336])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.prefinal_xent.affine.weight, shape: torch.Size([1024, 256])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.prefinal_xent.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.prefinal_xent.batchnorm1.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.batchnorm1.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.linear.conv.weight, shape: torch.Size([256, 1024, 1])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.batchnorm2.weight, shape: torch.Size([256])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.batchnorm2.bias, shape: torch.Size([256])
2020-02-10 19:41:18,683 INFO [train4.py:201] name: module.output_xent_affine.weight, shape: torch.Size([4336, 256])
2020-02-10 19:41:18,683 INFO [train4.py:201] name: module.output_xent_affine.bias, shape: torch.Size([4336])
2020-02-10 19:41:18,683 INFO [train4.py:201] name: module.input_batch_norm.weight, shape: torch.Size([129])
2020-02-10 19:41:18,684 INFO [train4.py:201] name: module.input_batch_norm.bias, shape: torch.Size([129])

@danpovey
Contributor

Cool! So getting closer. The l2 norm of the parameter matrices, compared with Kaldi's, may tell us what's going on with the optimization and help tune learning rates etc.

@fanlu

fanlu commented Feb 10, 2020

I have drawn the distribution and histogram of the parameters, e.g.:

(three screenshots of parameter distributions/histograms)

Which layer should I focus on? And is there a tool to get the l2 norm of Kaldi's parameters?

@fanlu

fanlu commented Feb 10, 2020

Hi, @csukuangfj
When I use DataParallel to train the TDNNF model on multiple GPUs

model = torch.nn.DataParallel(model.cuda(), device_ids=list(range(args.ngpu)))

I get an error when the criterion is called

nnet_output = kaldi.PytorchToCuSubMatrix(to_dlpack(nnet_output_tensor))

and the error message is below:

ASSERTION_FAILED ([5.5.717~1-e05890d]:ConsumeDLManagedTensor():dlpack/dlpack_pybind.cc:129) Assertion failed: (ctx->device_id == device_id)

Should we specify a fixed device_id in this function?

@csukuangfj
Contributor Author

csukuangfj commented Feb 11, 2020 via email

@csukuangfj
Contributor Author

csukuangfj commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

Ok, I'll try.
There is another error when training with speed-perturbed data and mfcc_hires features,
or when I change the weight_decay of the Adam optimizer from 5e-4 to 8e-3 (Kaldi's default config).
The error message is below.

Traceback (most recent call last):
  File "./chain/train3.py", line 339, in <module>
    main()
  File "./chain/train3.py", line 268, in main
    tf_writer=tf_writer)
  File "./chain/train3.py", line 109, in train_one_epoch
    model.constraint_orthonormal()
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/model_tdnnf3.py", line 202, in constraint_orthonormal
    self.tdnnfs[i].constraint_orthonormal()
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/tdnnf_layer2.py", line 221, in constraint_orthonormal
    self.linear.constraint_orthonormal()
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/tdnnf_layer2.py", line 93, in constraint_orthonormal
    w = _constraint_orthonormal_internal(w)
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/tdnnf_layer2.py", line 37, in _constraint_orthonormal_internal
    assert ratio > 0.99
AssertionError

@danpovey
Contributor

which layer should I focus on? And is there a tool to get the l2 norm of Kaldi's parameters?

Regarding the weights: I don't really understand those plots, but I just wanted the 2-norm, which would be torch.sqrt((some_tensor ** 2).sum()), for each parameter. You might have to write a little code to get it.
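That per-parameter 2-norm can be collected with a few lines of PyTorch; the model below is a hypothetical stand-in for the real network:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the TDNN-F network.
model = nn.Sequential(nn.Linear(40, 1024), nn.ReLU(), nn.Linear(1024, 256))

norms = {}
for name, p in model.named_parameters():
    # 2-norm of the whole parameter tensor, as suggested above.
    norms[name] = torch.sqrt((p ** 2).sum()).item()

for name, value in sorted(norms.items()):
    print(f"{name}: {value:.4f}")
```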

@danpovey
Contributor

there is another error when training with speed-perturbed data and mfcc_hires features, or when I change the weight_decay of the Adam optimizer from 5e-4 to 8e-3, which is Kaldi's default config (full traceback quoted above)

Regarding the weight decay: I would advise tuning those separately. The constants are defined in quite different ways, and probably wouldn't even be comparable between Adam and SGD.

@csukuangfj
Contributor Author

csukuangfj commented Feb 11, 2020 via email

@JiayiFu

JiayiFu commented Feb 11, 2020

which layer should I focus on ? And Is there a tool to get l2 norm of kaldi's parameter?

Maybe this is a simple way: use the Kaldi tool "nnet3-copy --binary=false final.mdl" to convert the mdl file to text mode, and then write a script to compute the 2-norm of the weights.
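A minimal sketch of such a script, assuming you have already dumped the model with nnet3-copy --binary=false. This toy parser just takes the 2-norm of every bracketed number block, ignoring the surrounding component structure (a real nnet3 text file has more structure than this handles):

```python
import math
import re

def l2_norms(nnet3_text):
    """Rough per-matrix 2-norms from an nnet3 text-format model dump.

    Collects the numbers between each '[' and ']' pair and returns
    sqrt(sum of squares) for each block, in order of appearance.
    Sketch only: it does not track which component each block belongs to.
    """
    norms = []
    for block in re.findall(r"\[([^\]]*)\]", nnet3_text):
        values = [float(tok) for tok in block.split()]
        if values:
            norms.append(math.sqrt(sum(v * v for v in values)))
    return norms

# Tiny synthetic example in the same spirit as nnet3 text output:
text = """
<LinearParams>  [ 3.0 4.0
  0.0 0.0 ]
<BiasParams>  [ 0.6 0.8 ]
"""
print(l2_norms(text))  # norms of the two blocks: 5.0 and ~1.0
```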

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

I have used https://github.com/XiaoMi/kaldi-onnx.git together with torch.norm / np.linalg.norm to calculate the l2 norm.
@danpovey please have a look and point out what you want next, thanks.
This is the l2-norm log of Kaldi's tdnn_1c:

tdnn_1c l2-norm
2020-02-11 14:44:31,355 __main__ INFO {'dim': '40', 'name': 'input', 'node_type': 'input-node', 'type': 'Input', 'id': 1}
2020-02-11 14:44:31,356 __main__ INFO {'id': 4, 'type': 'Splice', 'name': 'splice_4', 'input': ['input'], 'context': [-1, 0, 1]}
2020-02-11 14:44:31,372 __main__ INFO {'input': ['splice_4'], 'component': 'lda', 'name': 'lda', 'node_type': 'component-node', 'id': 5, 'params': (120, 120), 'bias': (120,), 'type': 'Gemm', 'raw-type': 'FixedAffine', 'params-l2-norm': 0.3751292, 'bias-l2-norm': 0.02903783}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['lda'], 'component': 'tdnn1.affine', 'name': 'tdnn1.affine', 'node_type': 'component-node', 'id': 6, 'max_change': 0.75, 'params': (1024, 120), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 17.994331, 'bias-l2-norm': 2.1129487}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['tdnn1.affine'], 'component': 'tdnn1.relu', 'name': 'tdnn1.relu', 'node_type': 'component-node', 'id': 7, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 71424.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 1.8933748, 'deriv_avg-l2-norm': 17.662828, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['tdnn1.relu'], 'component': 'tdnn1.batchnorm', 'name': 'tdnn1.batchnorm', 'node_type': 'component-node', 'id': 8, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 179712.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 1.9120138, 'stats_var-l2-norm': 0.18313415}
2020-02-11 14:44:31,376 __main__ INFO {'input': ['tdnn1.batchnorm'], 'component': 'tdnn1.dropout', 'name': 'tdnn1.dropout', 'node_type': 'component-node', 'id': 9, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,376 __main__ INFO {'input': ['tdnn1.dropout'], 'component': 'tdnnf2.linear', 'name': 'tdnnf2.linear', 'node_type': 'component-node', 'id': 10, 'time_offsets': array([-1,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 14.65368}
2020-02-11 14:44:31,379 __main__ INFO {'input': ['tdnnf2.linear'], 'component': 'tdnnf2.affine', 'name': 'tdnnf2.affine', 'node_type': 'component-node', 'id': 11, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 12.9894, 'bias-l2-norm': 2.456446}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.affine'], 'component': 'tdnnf2.relu', 'name': 'tdnnf2.relu', 'node_type': 'component-node', 'id': 12, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 60928.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 12.335568, 'deriv_avg-l2-norm': 15.625699, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.relu'], 'component': 'tdnnf2.batchnorm', 'name': 'tdnnf2.batchnorm', 'node_type': 'component-node', 'id': 13, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 177792.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 12.284391, 'stats_var-l2-norm': 12.800878}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.batchnorm'], 'component': 'tdnnf2.dropout', 'name': 'tdnnf2.dropout', 'node_type': 'component-node', 'id': 14, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,380 __main__ INFO {'id': 15, 'type': 'Scale', 'name': 'tdnn1.dropout.Scale.0.66', 'input': ['tdnn1.dropout'], 'scale': 0.66}
2020-02-11 14:44:31,380 __main__ INFO {'id': 16, 'type': 'Sum', 'name': 'tdnn1.dropout.Scale.0.66.Sum.tdnnf2.dropout', 'input': ['tdnn1.dropout.Scale.0.66', 'tdnnf2.dropout']}
2020-02-11 14:44:31,381 __main__ INFO {'input': ['tdnn1.dropout.Scale.0.66.Sum.tdnnf2.dropout'], 'component': 'tdnnf2.noop', 'name': 'tdnnf2.noop', 'node_type': 'component-node', 'id': 17, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
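Each `tdnnfN` group above factors one TDNN layer into a semi-orthogonal `linear` with no bias (e.g. params `(128, 2048)` over spliced offsets `[-1, 0]`), an `affine` that restores the hidden dim (e.g. `(1024, 256)` over offsets `[0, 1]`), relu, batchnorm, dropout, and then a `Scale`/`Sum` pair that adds the previous block's output scaled by 0.66. A minimal numpy sketch of that data flow, just to make the shapes and the bypass connection concrete (`splice` and `tdnnf_block` are illustrative names, not kaldi's API; batchnorm and dropout are omitted):

```python
import numpy as np

def splice(x, offsets):
    # x: (T, D). Concatenate frames at the given time offsets,
    # keeping only frames where all offsets are valid -> (T', len(offsets)*D).
    lo, hi = min(offsets), max(offsets)
    length = x.shape[0] - (hi - lo)
    return np.concatenate([x[(o - lo):(o - lo) + length] for o in offsets], axis=1)

def tdnnf_block(x, M, A, b, offsets1, offsets2, bypass_scale=0.66):
    # One factorized block, as in the log above:
    #   tdnnfN.linear: semi-orthogonal, no bias, M of shape (bottleneck, len(offsets1)*hidden)
    #   tdnnfN.affine: A of shape (hidden, len(offsets2)*bottleneck), bias b
    # followed by relu and a 0.66-scaled residual (batchnorm/dropout omitted).
    h = splice(x, offsets1) @ M.T        # tdnnfN.linear
    h = splice(h, offsets2) @ A.T + b    # tdnnfN.affine
    h = np.maximum(h, 0.0)               # tdnnfN.relu
    # align the bypass input with the frames that survive both splices
    start = -(min(offsets1) + min(offsets2))
    return bypass_scale * x[start:start + h.shape[0]] + h
```

With zero weights the block reduces to `0.66 * x` on the surviving frames, which makes the residual path easy to check in isolation.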
2020-02-11 14:44:31,381 __main__ INFO {'input': ['tdnnf2.noop'], 'component': 'tdnnf3.linear', 'name': 'tdnnf3.linear', 'node_type': 'component-node', 'id': 18, 'time_offsets': array([-1,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 12.562767}
2020-02-11 14:44:31,382 __main__ INFO {'input': ['tdnnf3.linear'], 'component': 'tdnnf3.affine', 'name': 'tdnnf3.affine', 'node_type': 'component-node', 'id': 19, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.749477, 'bias-l2-norm': 1.5780896}
2020-02-11 14:44:31,382 __main__ INFO {'input': ['tdnnf3.affine'], 'component': 'tdnnf3.relu', 'name': 'tdnnf3.relu', 'node_type': 'component-node', 'id': 20, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 105408.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 13.213889, 'deriv_avg-l2-norm': 15.905553, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,383 __main__ INFO {'input': ['tdnnf3.relu'], 'component': 'tdnnf3.batchnorm', 'name': 'tdnnf3.batchnorm', 'node_type': 'component-node', 'id': 21, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 175872.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 13.222022, 'stats_var-l2-norm': 13.43212}
2020-02-11 14:44:31,383 __main__ INFO {'input': ['tdnnf3.batchnorm'], 'component': 'tdnnf3.dropout', 'name': 'tdnnf3.dropout', 'node_type': 'component-node', 'id': 22, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,383 __main__ INFO {'id': 23, 'type': 'Scale', 'name': 'tdnnf2.noop.Scale.0.66', 'input': ['tdnnf2.noop'], 'scale': 0.66}
2020-02-11 14:44:31,383 __main__ INFO {'id': 24, 'type': 'Sum', 'name': 'tdnnf2.noop.Scale.0.66.Sum.tdnnf3.dropout', 'input': ['tdnnf2.noop.Scale.0.66', 'tdnnf3.dropout']}
2020-02-11 14:44:31,384 __main__ INFO {'input': ['tdnnf2.noop.Scale.0.66.Sum.tdnnf3.dropout'], 'component': 'tdnnf3.noop', 'name': 'tdnnf3.noop', 'node_type': 'component-node', 'id': 25, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,384 __main__ INFO {'input': ['tdnnf3.noop'], 'component': 'tdnnf4.linear', 'name': 'tdnnf4.linear', 'node_type': 'component-node', 'id': 26, 'time_offsets': array([-1,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.342851}
2020-02-11 14:44:31,385 __main__ INFO {'input': ['tdnnf4.linear'], 'component': 'tdnnf4.affine', 'name': 'tdnnf4.affine', 'node_type': 'component-node', 'id': 27, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.7689705, 'bias-l2-norm': 1.2836239}
2020-02-11 14:44:31,385 __main__ INFO {'input': ['tdnnf4.affine'], 'component': 'tdnnf4.relu', 'name': 'tdnnf4.relu', 'node_type': 'component-node', 'id': 28, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 26880.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 12.693966, 'deriv_avg-l2-norm': 15.441769, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf4.relu'], 'component': 'tdnnf4.batchnorm', 'name': 'tdnnf4.batchnorm', 'node_type': 'component-node', 'id': 29, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 58624.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 12.727512, 'stats_var-l2-norm': 13.328432}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf4.batchnorm'], 'component': 'tdnnf4.dropout', 'name': 'tdnnf4.dropout', 'node_type': 'component-node', 'id': 30, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,386 __main__ INFO {'id': 31, 'type': 'Scale', 'name': 'tdnnf3.noop.Scale.0.66', 'input': ['tdnnf3.noop'], 'scale': 0.66}
2020-02-11 14:44:31,386 __main__ INFO {'id': 32, 'type': 'Sum', 'name': 'tdnnf3.noop.Scale.0.66.Sum.tdnnf4.dropout', 'input': ['tdnnf3.noop.Scale.0.66', 'tdnnf4.dropout']}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf3.noop.Scale.0.66.Sum.tdnnf4.dropout'], 'component': 'tdnnf4.noop', 'name': 'tdnnf4.noop', 'node_type': 'component-node', 'id': 33, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,387 __main__ INFO {'input': ['tdnnf4.noop'], 'component': 'tdnnf5.linear', 'name': 'tdnnf5.linear', 'node_type': 'component-node', 'id': 34, 'time_offsets': array([0]), 'params': (128, 1024), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.509507}
2020-02-11 14:44:31,387 __main__ INFO {'input': ['tdnnf5.linear'], 'component': 'tdnnf5.affine', 'name': 'tdnnf5.affine', 'node_type': 'component-node', 'id': 35, 'time_offsets': array([0]), 'params': (1024, 128), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 7.895306, 'bias-l2-norm': 1.8074945}
2020-02-11 14:44:31,388 __main__ INFO {'input': ['tdnnf5.affine'], 'component': 'tdnnf5.relu', 'name': 'tdnnf5.relu', 'node_type': 'component-node', 'id': 36, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 35328.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.929157, 'deriv_avg-l2-norm': 13.615622, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,388 __main__ INFO {'input': ['tdnnf5.relu'], 'component': 'tdnnf5.batchnorm', 'name': 'tdnnf5.batchnorm', 'node_type': 'component-node', 'id': 37, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 58624.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.9155283, 'stats_var-l2-norm': 4.8078284}
2020-02-11 14:44:31,389 __main__ INFO {'input': ['tdnnf5.batchnorm'], 'component': 'tdnnf5.dropout', 'name': 'tdnnf5.dropout', 'node_type': 'component-node', 'id': 38, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,389 __main__ INFO {'id': 39, 'type': 'Scale', 'name': 'tdnnf4.noop.Scale.0.66', 'input': ['tdnnf4.noop'], 'scale': 0.66}
2020-02-11 14:44:31,389 __main__ INFO {'id': 40, 'type': 'Sum', 'name': 'tdnnf4.noop.Scale.0.66.Sum.tdnnf5.dropout', 'input': ['tdnnf4.noop.Scale.0.66', 'tdnnf5.dropout']}
2020-02-11 14:44:31,389 __main__ INFO {'input': ['tdnnf4.noop.Scale.0.66.Sum.tdnnf5.dropout'], 'component': 'tdnnf5.noop', 'name': 'tdnnf5.noop', 'node_type': 'component-node', 'id': 41, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,390 __main__ INFO {'input': ['tdnnf5.noop'], 'component': 'tdnnf6.linear', 'name': 'tdnnf6.linear', 'node_type': 'component-node', 'id': 42, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.609792}
2020-02-11 14:44:31,390 __main__ INFO {'input': ['tdnnf6.linear'], 'component': 'tdnnf6.affine', 'name': 'tdnnf6.affine', 'node_type': 'component-node', 'id': 43, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.078222, 'bias-l2-norm': 1.6047498}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.affine'], 'component': 'tdnnf6.relu', 'name': 'tdnnf6.relu', 'node_type': 'component-node', 'id': 44, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 18624.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 11.288168, 'deriv_avg-l2-norm': 15.204925, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.relu'], 'component': 'tdnnf6.batchnorm', 'name': 'tdnnf6.batchnorm', 'node_type': 'component-node', 'id': 45, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 56704.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 11.27561, 'stats_var-l2-norm': 10.5281}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.batchnorm'], 'component': 'tdnnf6.dropout', 'name': 'tdnnf6.dropout', 'node_type': 'component-node', 'id': 46, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,392 __main__ INFO {'id': 47, 'type': 'Scale', 'name': 'tdnnf5.noop.Scale.0.66', 'input': ['tdnnf5.noop'], 'scale': 0.66}
2020-02-11 14:44:31,392 __main__ INFO {'id': 48, 'type': 'Sum', 'name': 'tdnnf5.noop.Scale.0.66.Sum.tdnnf6.dropout', 'input': ['tdnnf5.noop.Scale.0.66', 'tdnnf6.dropout']}
2020-02-11 14:44:31,392 __main__ INFO {'input': ['tdnnf5.noop.Scale.0.66.Sum.tdnnf6.dropout'], 'component': 'tdnnf6.noop', 'name': 'tdnnf6.noop', 'node_type': 'component-node', 'id': 49, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,392 __main__ INFO {'input': ['tdnnf6.noop'], 'component': 'tdnnf7.linear', 'name': 'tdnnf7.linear', 'node_type': 'component-node', 'id': 50, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.21937}
2020-02-11 14:44:31,393 __main__ INFO {'input': ['tdnnf7.linear'], 'component': 'tdnnf7.affine', 'name': 'tdnnf7.affine', 'node_type': 'component-node', 'id': 51, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.733917, 'bias-l2-norm': 1.7020972}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.affine'], 'component': 'tdnnf7.relu', 'name': 'tdnnf7.relu', 'node_type': 'component-node', 'id': 52, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 25920.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.964567, 'deriv_avg-l2-norm': 14.876908, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.relu'], 'component': 'tdnnf7.batchnorm', 'name': 'tdnnf7.batchnorm', 'node_type': 'component-node', 'id': 53, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 54784.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.983768, 'stats_var-l2-norm': 8.700165}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.batchnorm'], 'component': 'tdnnf7.dropout', 'name': 'tdnnf7.dropout', 'node_type': 'component-node', 'id': 54, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,394 __main__ INFO {'id': 55, 'type': 'Scale', 'name': 'tdnnf6.noop.Scale.0.66', 'input': ['tdnnf6.noop'], 'scale': 0.66}
2020-02-11 14:44:31,395 __main__ INFO {'id': 56, 'type': 'Sum', 'name': 'tdnnf6.noop.Scale.0.66.Sum.tdnnf7.dropout', 'input': ['tdnnf6.noop.Scale.0.66', 'tdnnf7.dropout']}
2020-02-11 14:44:31,395 __main__ INFO {'input': ['tdnnf6.noop.Scale.0.66.Sum.tdnnf7.dropout'], 'component': 'tdnnf7.noop', 'name': 'tdnnf7.noop', 'node_type': 'component-node', 'id': 57, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,395 __main__ INFO {'input': ['tdnnf7.noop'], 'component': 'tdnnf8.linear', 'name': 'tdnnf8.linear', 'node_type': 'component-node', 'id': 58, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.982727}
2020-02-11 14:44:31,396 __main__ INFO {'input': ['tdnnf8.linear'], 'component': 'tdnnf8.affine', 'name': 'tdnnf8.affine', 'node_type': 'component-node', 'id': 59, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.5415945, 'bias-l2-norm': 1.716922}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.affine'], 'component': 'tdnnf8.relu', 'name': 'tdnnf8.relu', 'node_type': 'component-node', 'id': 60, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 22208.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.486838, 'deriv_avg-l2-norm': 14.476424, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.relu'], 'component': 'tdnnf8.batchnorm', 'name': 'tdnnf8.batchnorm', 'node_type': 'component-node', 'id': 61, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 52864.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.471879, 'stats_var-l2-norm': 8.468856}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.batchnorm'], 'component': 'tdnnf8.dropout', 'name': 'tdnnf8.dropout', 'node_type': 'component-node', 'id': 62, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,397 __main__ INFO {'id': 63, 'type': 'Scale', 'name': 'tdnnf7.noop.Scale.0.66', 'input': ['tdnnf7.noop'], 'scale': 0.66}
2020-02-11 14:44:31,398 __main__ INFO {'id': 64, 'type': 'Sum', 'name': 'tdnnf7.noop.Scale.0.66.Sum.tdnnf8.dropout', 'input': ['tdnnf7.noop.Scale.0.66', 'tdnnf8.dropout']}
2020-02-11 14:44:31,398 __main__ INFO {'input': ['tdnnf7.noop.Scale.0.66.Sum.tdnnf8.dropout'], 'component': 'tdnnf8.noop', 'name': 'tdnnf8.noop', 'node_type': 'component-node', 'id': 65, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,398 __main__ INFO {'input': ['tdnnf8.noop'], 'component': 'tdnnf9.linear', 'name': 'tdnnf9.linear', 'node_type': 'component-node', 'id': 66, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.8021345}
2020-02-11 14:44:31,399 __main__ INFO {'input': ['tdnnf9.linear'], 'component': 'tdnnf9.affine', 'name': 'tdnnf9.affine', 'node_type': 'component-node', 'id': 67, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.299649, 'bias-l2-norm': 1.5847737}
2020-02-11 14:44:31,399 __main__ INFO {'input': ['tdnnf9.affine'], 'component': 'tdnnf9.relu', 'name': 'tdnnf9.relu', 'node_type': 'component-node', 'id': 68, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23296.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.041475, 'deriv_avg-l2-norm': 14.187531, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,400 __main__ INFO {'input': ['tdnnf9.relu'], 'component': 'tdnnf9.batchnorm', 'name': 'tdnnf9.batchnorm', 'node_type': 'component-node', 'id': 69, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 50944.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.050355, 'stats_var-l2-norm': 8.135726}
2020-02-11 14:44:31,400 __main__ INFO {'input': ['tdnnf9.batchnorm'], 'component': 'tdnnf9.dropout', 'name': 'tdnnf9.dropout', 'node_type': 'component-node', 'id': 70, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,400 __main__ INFO {'id': 71, 'type': 'Scale', 'name': 'tdnnf8.noop.Scale.0.66', 'input': ['tdnnf8.noop'], 'scale': 0.66}
2020-02-11 14:44:31,400 __main__ INFO {'id': 72, 'type': 'Sum', 'name': 'tdnnf8.noop.Scale.0.66.Sum.tdnnf9.dropout', 'input': ['tdnnf8.noop.Scale.0.66', 'tdnnf9.dropout']}
2020-02-11 14:44:31,401 __main__ INFO {'input': ['tdnnf8.noop.Scale.0.66.Sum.tdnnf9.dropout'], 'component': 'tdnnf9.noop', 'name': 'tdnnf9.noop', 'node_type': 'component-node', 'id': 73, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,401 __main__ INFO {'input': ['tdnnf9.noop'], 'component': 'tdnnf10.linear', 'name': 'tdnnf10.linear', 'node_type': 'component-node', 'id': 74, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.755399}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.linear'], 'component': 'tdnnf10.affine', 'name': 'tdnnf10.affine', 'node_type': 'component-node', 'id': 75, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.086278, 'bias-l2-norm': 1.3546987}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.affine'], 'component': 'tdnnf10.relu', 'name': 'tdnnf10.relu', 'node_type': 'component-node', 'id': 76, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23232.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.134641, 'deriv_avg-l2-norm': 13.997164, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.relu'], 'component': 'tdnnf10.batchnorm', 'name': 'tdnnf10.batchnorm', 'node_type': 'component-node', 'id': 77, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 49024.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.114295, 'stats_var-l2-norm': 8.443671}
2020-02-11 14:44:31,403 __main__ INFO {'input': ['tdnnf10.batchnorm'], 'component': 'tdnnf10.dropout', 'name': 'tdnnf10.dropout', 'node_type': 'component-node', 'id': 78, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,403 __main__ INFO {'id': 79, 'type': 'Scale', 'name': 'tdnnf9.noop.Scale.0.66', 'input': ['tdnnf9.noop'], 'scale': 0.66}
2020-02-11 14:44:31,403 __main__ INFO {'id': 80, 'type': 'Sum', 'name': 'tdnnf9.noop.Scale.0.66.Sum.tdnnf10.dropout', 'input': ['tdnnf9.noop.Scale.0.66', 'tdnnf10.dropout']}
2020-02-11 14:44:31,403 __main__ INFO {'input': ['tdnnf9.noop.Scale.0.66.Sum.tdnnf10.dropout'], 'component': 'tdnnf10.noop', 'name': 'tdnnf10.noop', 'node_type': 'component-node', 'id': 81, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,404 __main__ INFO {'input': ['tdnnf10.noop'], 'component': 'tdnnf11.linear', 'name': 'tdnnf11.linear', 'node_type': 'component-node', 'id': 82, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.697865}
2020-02-11 14:44:31,404 __main__ INFO {'input': ['tdnnf11.linear'], 'component': 'tdnnf11.affine', 'name': 'tdnnf11.affine', 'node_type': 'component-node', 'id': 83, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.883663, 'bias-l2-norm': 1.2464023}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.affine'], 'component': 'tdnnf11.relu', 'name': 'tdnnf11.relu', 'node_type': 'component-node', 'id': 84, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 31680.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 8.557134, 'deriv_avg-l2-norm': 13.096737, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.relu'], 'component': 'tdnnf11.batchnorm', 'name': 'tdnnf11.batchnorm', 'node_type': 'component-node', 'id': 85, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 47104.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 8.5475025, 'stats_var-l2-norm': 8.175571}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.batchnorm'], 'component': 'tdnnf11.dropout', 'name': 'tdnnf11.dropout', 'node_type': 'component-node', 'id': 86, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,406 __main__ INFO {'id': 87, 'type': 'Scale', 'name': 'tdnnf10.noop.Scale.0.66', 'input': ['tdnnf10.noop'], 'scale': 0.66}
2020-02-11 14:44:31,406 __main__ INFO {'id': 88, 'type': 'Sum', 'name': 'tdnnf10.noop.Scale.0.66.Sum.tdnnf11.dropout', 'input': ['tdnnf10.noop.Scale.0.66', 'tdnnf11.dropout']}
2020-02-11 14:44:31,406 __main__ INFO {'input': ['tdnnf10.noop.Scale.0.66.Sum.tdnnf11.dropout'], 'component': 'tdnnf11.noop', 'name': 'tdnnf11.noop', 'node_type': 'component-node', 'id': 89, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,406 __main__ INFO {'input': ['tdnnf11.noop'], 'component': 'tdnnf12.linear', 'name': 'tdnnf12.linear', 'node_type': 'component-node', 'id': 90, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.470395}
2020-02-11 14:44:31,407 __main__ INFO {'input': ['tdnnf12.linear'], 'component': 'tdnnf12.affine', 'name': 'tdnnf12.affine', 'node_type': 'component-node', 'id': 91, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.823896, 'bias-l2-norm': 1.1193717}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.affine'], 'component': 'tdnnf12.relu', 'name': 'tdnnf12.relu', 'node_type': 'component-node', 'id': 92, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 16640.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 7.735138, 'deriv_avg-l2-norm': 12.613586, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.relu'], 'component': 'tdnnf12.batchnorm', 'name': 'tdnnf12.batchnorm', 'node_type': 'component-node', 'id': 93, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 45184.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 7.646413, 'stats_var-l2-norm': 7.3074822}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.batchnorm'], 'component': 'tdnnf12.dropout', 'name': 'tdnnf12.dropout', 'node_type': 'component-node', 'id': 94, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,408 __main__ INFO {'id': 95, 'type': 'Scale', 'name': 'tdnnf11.noop.Scale.0.66', 'input': ['tdnnf11.noop'], 'scale': 0.66}
2020-02-11 14:44:31,409 __main__ INFO {'id': 96, 'type': 'Sum', 'name': 'tdnnf11.noop.Scale.0.66.Sum.tdnnf12.dropout', 'input': ['tdnnf11.noop.Scale.0.66', 'tdnnf12.dropout']}
2020-02-11 14:44:31,409 __main__ INFO {'input': ['tdnnf11.noop.Scale.0.66.Sum.tdnnf12.dropout'], 'component': 'tdnnf12.noop', 'name': 'tdnnf12.noop', 'node_type': 'component-node', 'id': 97, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,409 __main__ INFO {'input': ['tdnnf12.noop'], 'component': 'tdnnf13.linear', 'name': 'tdnnf13.linear', 'node_type': 'component-node', 'id': 98, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.259589}
2020-02-11 14:44:31,410 __main__ INFO {'input': ['tdnnf13.linear'], 'component': 'tdnnf13.affine', 'name': 'tdnnf13.affine', 'node_type': 'component-node', 'id': 99, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.941648, 'bias-l2-norm': 0.99510527}
2020-02-11 14:44:31,410 __main__ INFO {'input': ['tdnnf13.affine'], 'component': 'tdnnf13.relu', 'name': 'tdnnf13.relu', 'node_type': 'component-node', 'id': 100, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 32512.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 5.9976406, 'deriv_avg-l2-norm': 11.490221, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf13.relu'], 'component': 'tdnnf13.batchnorm', 'name': 'tdnnf13.batchnorm', 'node_type': 'component-node', 'id': 101, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 5.9866805, 'stats_var-l2-norm': 5.459227}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf13.batchnorm'], 'component': 'tdnnf13.dropout', 'name': 'tdnnf13.dropout', 'node_type': 'component-node', 'id': 102, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,411 __main__ INFO {'id': 103, 'type': 'Scale', 'name': 'tdnnf12.noop.Scale.0.66', 'input': ['tdnnf12.noop'], 'scale': 0.66}
2020-02-11 14:44:31,411 __main__ INFO {'id': 104, 'type': 'Sum', 'name': 'tdnnf12.noop.Scale.0.66.Sum.tdnnf13.dropout', 'input': ['tdnnf12.noop.Scale.0.66', 'tdnnf13.dropout']}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf12.noop.Scale.0.66.Sum.tdnnf13.dropout'], 'component': 'tdnnf13.noop', 'name': 'tdnnf13.noop', 'node_type': 'component-node', 'id': 105, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,412 __main__ INFO {'input': ['tdnnf13.noop'], 'component': 'prefinal-l', 'name': 'prefinal-l', 'node_type': 'component-node', 'id': 106, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 14.927766}
2020-02-11 14:44:31,412 __main__ INFO {'input': ['prefinal-l'], 'component': 'prefinal-chain.affine', 'name': 'prefinal-chain.affine', 'node_type': 'component-node', 'id': 107, 'max_change': 0.75, 'params': (1024, 256), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 9.80533, 'bias-l2-norm': 1.433072}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.affine'], 'component': 'prefinal-chain.relu', 'name': 'prefinal-chain.relu', 'node_type': 'component-node', 'id': 108, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 19712.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.8817225, 'deriv_avg-l2-norm': 12.461684, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.relu'], 'component': 'prefinal-chain.batchnorm1', 'name': 'prefinal-chain.batchnorm1', 'node_type': 'component-node', 'id': 109, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.939498, 'stats_var-l2-norm': 6.1382785}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.batchnorm1'], 'component': 'prefinal-chain.linear', 'name': 'prefinal-chain.linear', 'node_type': 'component-node', 'id': 110, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 14.669719}
2020-02-11 14:44:31,414 __main__ INFO {'input': ['prefinal-chain.linear'], 'component': 'prefinal-chain.batchnorm2', 'name': 'prefinal-chain.batchnorm2', 'node_type': 'component-node', 'id': 111, 'dim': 256, 'block_dim': 256, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (256,), 'stats_var': (256,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 3.6812432e-07, 'stats_var-l2-norm': 21.627495}
2020-02-11 14:44:31,415 __main__ INFO {'input': ['prefinal-chain.batchnorm2'], 'component': 'output.affine', 'name': 'output.affine', 'node_type': 'component-node', 'id': 112, 'max_change': 1.5, 'params': (3448, 256), 'bias': (3448,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 33.547924, 'bias-l2-norm': 6.109993}
2020-02-11 14:44:31,415 __main__ INFO {'objective': 'linear', 'input': ['output.affine'], 'name': 'output', 'node_type': 'output-node', 'type': 'Output', 'id': 113}
2020-02-11 14:44:31,415 __main__ INFO {'input': ['prefinal-l'], 'component': 'prefinal-xent.affine', 'name': 'prefinal-xent.affine', 'node_type': 'component-node', 'id': 114, 'max_change': 0.75, 'params': (1024, 256), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 8.215993, 'bias-l2-norm': 2.358821}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.affine'], 'component': 'prefinal-xent.relu', 'name': 'prefinal-xent.relu', 'node_type': 'component-node', 'id': 115, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23936.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.315324, 'deriv_avg-l2-norm': 12.559063, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.relu'], 'component': 'prefinal-xent.batchnorm1', 'name': 'prefinal-xent.batchnorm1', 'node_type': 'component-node', 'id': 116, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.244672, 'stats_var-l2-norm': 3.9114242}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.batchnorm1'], 'component': 'prefinal-xent.linear', 'name': 'prefinal-xent.linear', 'node_type': 'component-node', 'id': 117, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 10.986344}
2020-02-11 14:44:31,417 __main__ INFO {'input': ['prefinal-xent.linear'], 'component': 'prefinal-xent.batchnorm2', 'name': 'prefinal-xent.batchnorm2', 'node_type': 'component-node', 'id': 118, 'dim': 256, 'block_dim': 256, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (256,), 'stats_var': (256,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 2.989858e-07, 'stats_var-l2-norm': 6.3436804}
2020-02-11 14:44:31,418 __main__ INFO {'input': ['prefinal-xent.batchnorm2'], 'component': 'output-xent.affine', 'name': 'output-xent.affine', 'node_type': 'component-node', 'id': 119, 'max_change': 1.5, 'params': (3448, 256), 'bias': (3448,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 54.274277, 'bias-l2-norm': 2.9230652}
2020-02-11 14:44:31,418 __main__ INFO {'input': ['output-xent.affine'], 'component': 'output-xent.log-softmax', 'name': 'output-xent.log-softmax', 'node_type': 'component-node', 'id': 120, 'dim': 3448, 'value_avg': array([], dtype=float32), 'deriv_avg': array([], dtype=float32), 'count': 0.0, 'oderiv_rms': (3448,), 'oderiv_count': 0.0, 'type': 'LogSoftmax', 'raw-type': 'LogSoftmax', 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,418 __main__ INFO {'objective': 'linear', 'input': ['output-xent.log-softmax'], 'name': 'output-xent', 'node_type': 'output-node', 'type': 'Output', 'id': 121}

And these are the l2-norms of the PyTorch model's parameters:

2020-02-11 14:59:18,264 (common:38) INFO: load checkpoint from /mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/exp/chain/train_q2_orthogonal_modelmodel_tdnnf3_def_init_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn1024_fpr1500000_ms1_2_3_4_5_kernel1_1_1_0_3_3_3_3_3_3_3_3_stride1_1_1_3_1_1_1_1_1_1_1_1_l2r5e-4/best_model.pt
2020-02-11 14:59:19,863 (model_tdnnf3:224) INFO: name: tdnn1_affine.weight, shape: torch.Size([1024, 129]), l2-norm: 13.578791618347168, np-l2-norm: 13.578801155090332
2020-02-11 14:59:28,992 (model_tdnnf3:224) INFO: name: tdnn1_affine.bias, shape: torch.Size([1024]), l2-norm: 2.2701199054718018, np-l2-norm: 2.270120859146118
2020-02-11 14:59:28,992 (model_tdnnf3:224) INFO: name: tdnn1_batchnorm.weight, shape: torch.Size([1024]), l2-norm: 10.799092292785645, np-l2-norm: 10.799091339111328
2020-02-11 14:59:28,993 (model_tdnnf3:224) INFO: name: tdnn1_batchnorm.bias, shape: torch.Size([1024]), l2-norm: 2.189619302749634, np-l2-norm: 2.1896190643310547
2020-02-11 14:59:28,994 (model_tdnnf3:224) INFO: name: tdnnfs.0.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 11.698036193847656, np-l2-norm: 11.698058128356934
2020-02-11 14:59:28,994 (model_tdnnf3:224) INFO: name: tdnnfs.0.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 12.456302642822266, np-l2-norm: 12.45630931854248
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.affine.bias, shape: torch.Size([1024]), l2-norm: 0.8999515175819397, np-l2-norm: 0.8999518156051636
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 8.536678314208984, np-l2-norm: 8.536681175231934
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.9667662382125854, np-l2-norm: 0.9667660593986511
2020-02-11 14:59:28,996 (model_tdnnf3:224) INFO: name: tdnnfs.1.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 11.157291412353516, np-l2-norm: 11.15730094909668
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.972347259521484, np-l2-norm: 10.972354888916016
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6986019611358643, np-l2-norm: 0.6986021399497986
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 7.437180995941162, np-l2-norm: 7.437183856964111
2020-02-11 14:59:28,998 (model_tdnnf3:224) INFO: name: tdnnfs.1.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6932032704353333, np-l2-norm: 0.6932030916213989
2020-02-11 14:59:28,998 (model_tdnnf3:224) INFO: name: tdnnfs.2.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.548410415649414, np-l2-norm: 10.548415184020996
2020-02-11 14:59:28,999 (model_tdnnf3:224) INFO: name: tdnnfs.2.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.08105754852295, np-l2-norm: 10.081061363220215
2020-02-11 14:59:28,999 (model_tdnnf3:224) INFO: name: tdnnfs.2.affine.bias, shape: torch.Size([1024]), l2-norm: 0.5530431866645813, np-l2-norm: 0.5530433654785156
2020-02-11 14:59:29,000 (model_tdnnf3:224) INFO: name: tdnnfs.2.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 7.182321548461914, np-l2-norm: 7.182323932647705
2020-02-11 14:59:29,000 (model_tdnnf3:224) INFO: name: tdnnfs.2.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7540080547332764, np-l2-norm: 0.7540078163146973
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.linear.conv.weight, shape: torch.Size([128, 1024, 1]), l2-norm: 7.231468677520752, np-l2-norm: 7.23146915435791
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.affine.weight, shape: torch.Size([1024, 128, 1]), l2-norm: 7.3756842613220215, np-l2-norm: 7.37568473815918
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.affine.bias, shape: torch.Size([1024]), l2-norm: 0.8493649363517761, np-l2-norm: 0.8493649959564209
2020-02-11 14:59:29,002 (model_tdnnf3:224) INFO: name: tdnnfs.3.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 4.721061706542969, np-l2-norm: 4.721061706542969
2020-02-11 14:59:29,002 (model_tdnnf3:224) INFO: name: tdnnfs.3.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.8471177816390991, np-l2-norm: 0.8471177220344543
2020-02-11 14:59:29,003 (model_tdnnf3:224) INFO: name: tdnnfs.4.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.269883155822754, np-l2-norm: 10.26988697052002
2020-02-11 14:59:29,003 (model_tdnnf3:224) INFO: name: tdnnfs.4.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 11.01193904876709, np-l2-norm: 11.011943817138672
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6541978120803833, np-l2-norm: 0.6541979312896729
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.021320343017578, np-l2-norm: 6.021320343017578
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7195524573326111, np-l2-norm: 0.7195526957511902
2020-02-11 14:59:29,005 (model_tdnnf3:224) INFO: name: tdnnfs.5.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.503177642822266, np-l2-norm: 10.503182411193848
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 11.230865478515625, np-l2-norm: 11.23087215423584
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6815171837806702, np-l2-norm: 0.6815172433853149
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.027979373931885, np-l2-norm: 6.027980327606201
2020-02-11 14:59:29,007 (model_tdnnf3:224) INFO: name: tdnnfs.5.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.66847163438797, np-l2-norm: 0.6684714555740356
2020-02-11 14:59:29,007 (model_tdnnf3:224) INFO: name: tdnnfs.6.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.178318977355957, np-l2-norm: 10.178319931030273
2020-02-11 14:59:29,008 (model_tdnnf3:224) INFO: name: tdnnfs.6.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.798919677734375, np-l2-norm: 10.798924446105957
2020-02-11 14:59:29,008 (model_tdnnf3:224) INFO: name: tdnnfs.6.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7030293345451355, np-l2-norm: 0.703029453754425
2020-02-11 14:59:29,009 (model_tdnnf3:224) INFO: name: tdnnfs.6.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 5.839771747589111, np-l2-norm: 5.839775085449219
2020-02-11 14:59:29,009 (model_tdnnf3:224) INFO: name: tdnnfs.6.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6625904440879822, np-l2-norm: 0.6625903248786926
2020-02-11 14:59:29,010 (model_tdnnf3:224) INFO: name: tdnnfs.7.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.609781265258789, np-l2-norm: 10.609784126281738
2020-02-11 14:59:29,010 (model_tdnnf3:224) INFO: name: tdnnfs.7.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.92614459991455, np-l2-norm: 10.926151275634766
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7368164658546448, np-l2-norm: 0.7368165850639343
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.065881252288818, np-l2-norm: 6.065880298614502
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6848169565200806, np-l2-norm: 0.6848171353340149
2020-02-11 14:59:29,012 (model_tdnnf3:224) INFO: name: tdnnfs.8.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.567127227783203, np-l2-norm: 10.567130088806152
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.671625137329102, np-l2-norm: 10.671629905700684
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7605870962142944, np-l2-norm: 0.7605868577957153
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.370297908782959, np-l2-norm: 6.37030029296875
2020-02-11 14:59:29,014 (model_tdnnf3:224) INFO: name: tdnnfs.8.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.673714280128479, np-l2-norm: 0.6737140417098999
2020-02-11 14:59:29,014 (model_tdnnf3:224) INFO: name: tdnnfs.9.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.171960830688477, np-l2-norm: 10.17195987701416
2020-02-11 14:59:29,015 (model_tdnnf3:224) INFO: name: tdnnfs.9.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.168111801147461, np-l2-norm: 10.168119430541992
2020-02-11 14:59:29,015 (model_tdnnf3:224) INFO: name: tdnnfs.9.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7516912817955017, np-l2-norm: 0.7516909241676331
2020-02-11 14:59:29,016 (model_tdnnf3:224) INFO: name: tdnnfs.9.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.299006938934326, np-l2-norm: 6.299007892608643
2020-02-11 14:59:29,016 (model_tdnnf3:224) INFO: name: tdnnfs.9.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6358782649040222, np-l2-norm: 0.6358781456947327
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.095071792602539, np-l2-norm: 10.095071792602539
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.038125038146973, np-l2-norm: 10.038130760192871
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7603970766067505, np-l2-norm: 0.7603970170021057
2020-02-11 14:59:29,018 (model_tdnnf3:224) INFO: name: tdnnfs.10.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.457774639129639, np-l2-norm: 6.457772731781006
2020-02-11 14:59:29,018 (model_tdnnf3:224) INFO: name: tdnnfs.10.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7173940539360046, np-l2-norm: 0.7173939347267151
2020-02-11 14:59:29,019 (model_tdnnf3:224) INFO: name: tdnnfs.11.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 9.877976417541504, np-l2-norm: 9.877982139587402
2020-02-11 14:59:29,019 (model_tdnnf3:224) INFO: name: tdnnfs.11.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 9.827290534973145, np-l2-norm: 9.827296257019043
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7950196266174316, np-l2-norm: 0.7950197458267212
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.03067684173584, np-l2-norm: 6.03067684173584
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 1.1897754669189453, np-l2-norm: 1.1897754669189453
2020-02-11 14:59:29,021 (model_tdnnf3:224) INFO: name: prefinal_l.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 12.372564315795898, np-l2-norm: 12.372568130493164
2020-02-11 14:59:29,022 (model_tdnnf3:224) INFO: name: prefinal_chain.affine.weight, shape: torch.Size([1024, 256]), l2-norm: 10.835281372070312, np-l2-norm: 10.835285186767578
2020-02-11 14:59:29,022 (model_tdnnf3:224) INFO: name: prefinal_chain.affine.bias, shape: torch.Size([1024]), l2-norm: 1.2527509927749634, np-l2-norm: 1.2527514696121216
2020-02-11 14:59:29,023 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm1.weight, shape: torch.Size([1024]), l2-norm: 10.376221656799316, np-l2-norm: 10.37622356414795
2020-02-11 14:59:29,024 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm1.bias, shape: torch.Size([1024]), l2-norm: 1.386192707286682e-05, np-l2-norm: 1.3861925253877416e-05
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.linear.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 10.254746437072754, np-l2-norm: 10.254751205444336
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm2.weight, shape: torch.Size([256]), l2-norm: 7.542169570922852, np-l2-norm: 7.542168140411377
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm2.bias, shape: torch.Size([256]), l2-norm: 0.5150323510169983, np-l2-norm: 0.5150324702262878
2020-02-11 14:59:29,027 (model_tdnnf3:224) INFO: name: output_affine.weight, shape: torch.Size([4336, 256]), l2-norm: 16.972375869750977, np-l2-norm: 16.97245216369629
2020-02-11 14:59:29,027 (model_tdnnf3:224) INFO: name: output_affine.bias, shape: torch.Size([4336]), l2-norm: 0.8673517107963562, np-l2-norm: 0.8673520088195801
2020-02-11 14:59:29,028 (model_tdnnf3:224) INFO: name: prefinal_xent.affine.weight, shape: torch.Size([1024, 256]), l2-norm: 8.239212989807129, np-l2-norm: 8.239215850830078
2020-02-11 14:59:29,028 (model_tdnnf3:224) INFO: name: prefinal_xent.affine.bias, shape: torch.Size([1024]), l2-norm: 0.741665780544281, np-l2-norm: 0.7416657209396362
2020-02-11 14:59:29,029 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm1.weight, shape: torch.Size([1024]), l2-norm: 6.000321865081787, np-l2-norm: 6.00032377243042
2020-02-11 14:59:29,029 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm1.bias, shape: torch.Size([1024]), l2-norm: 2.1985473722452298e-05, np-l2-norm: 2.198547008447349e-05
2020-02-11 14:59:29,030 (model_tdnnf3:224) INFO: name: prefinal_xent.linear.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 9.000222206115723, np-l2-norm: 9.000225067138672
2020-02-11 14:59:29,030 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm2.weight, shape: torch.Size([256]), l2-norm: 29.329822540283203, np-l2-norm: 29.329822540283203
2020-02-11 14:59:29,031 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm2.bias, shape: torch.Size([256]), l2-norm: 2.5147366523742676, np-l2-norm: 2.5147361755371094
2020-02-11 14:59:29,032 (model_tdnnf3:224) INFO: name: output_xent_affine.weight, shape: torch.Size([4336, 256]), l2-norm: 30.42934799194336, np-l2-norm: 30.429412841796875
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: output_xent_affine.bias, shape: torch.Size([4336]), l2-norm: 0.7537439465522766, np-l2-norm: 0.7537445425987244
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: input_batch_norm.weight, shape: torch.Size([129]), l2-norm: 8.521322250366211, np-l2-norm: 8.521322250366211
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: input_batch_norm.bias, shape: torch.Size([129]), l2-norm: 1.5148330926895142, np-l2-norm: 1.5148330926895142
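The per-parameter norms above can be reproduced with a short loop over `named_parameters()`. A minimal sketch (the stand-in `nn.Linear` model and the helper name are illustrative; the real model is defined in `model_tdnnf3.py`):

```python
import torch
import torch.nn as nn

def log_param_norms(model: nn.Module) -> dict:
    # Mirrors the log format above: name, shape, and l2-norm per parameter.
    norms = {}
    for name, param in model.named_parameters():
        norms[name] = param.detach().norm(2).item()
        print(f'name: {name}, shape: {tuple(param.shape)}, '
              f'l2-norm: {norms[name]}')
    return norms

# hypothetical stand-in model, just to exercise the loop
model = nn.Linear(4, 3)
norms = log_param_norms(model)
```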

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

Oh, sorry; for Kaldi's model it is printed in progress.N.log. Search for "Norm".

On Tue, Feb 11, 2020 at 2:22 PM 付嘉懿 wrote: which layer should I focus on? And is there a tool to get the l2-norm of Kaldi's parameters? Maybe a simple way is to use the Kaldi tool "nnet-am-copy --binary=false final.mdl" to convert the mdl file to text mode and then write a script to compute the 2-norm of the weights.

The log of the last iteration or of all iterations? This is the Norms log from Kaldi's tdnn_1c.

./log/progress.74.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.4505 tdnnf2.linear:14.7518 tdnnf2.affine:13.3762 tdnnf3.linear:12.3981 tdnnf3.affine:11.0532 tdnnf4.linear:11.7087 tdnnf4.affine:10.0316 tdnnf5.linear:8.64949 tdnnf5.affine:8.20888 tdnnf6.linear:11.7272 tdnnf6.affine:10.3102 tdnnf7.linear:11.3739 tdnnf7.affine:10.0447 tdnnf8.linear:11.1174 tdnnf8.affine:9.76855 tdnnf9.linear:10.9489 tdnnf9.affine:9.57144 tdnnf10.linear:10.8642 tdnnf10.affine:9.30931 tdnnf11.linear:10.8142 tdnnf11.affine:9.08426 tdnnf12.linear:10.6601 tdnnf12.affine:9.06823 tdnnf13.linear:10.4231 tdnnf13.affine:9.131 prefinal-l:15.0584 prefinal-chain.affine:10.0323 prefinal-chain.linear:15.0632 output.affine:34.4237 prefinal-xent.affine:8.71721 prefinal-xent.linear:11.205 output-xent.affine:54.2259 ]
./log/progress.75.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.3429 tdnnf2.linear:14.6932 tdnnf2.affine:13.3021 tdnnf3.linear:12.3406 tdnnf3.affine:11.0014 tdnnf4.linear:11.6603 tdnnf4.affine:9.9918 tdnnf5.linear:8.60848 tdnnf5.affine:8.17265 tdnnf6.linear:11.6833 tdnnf6.affine:10.2691 tdnnf7.linear:11.3308 tdnnf7.affine:10.006 tdnnf8.linear:11.0748 tdnnf8.affine:9.73264 tdnnf9.linear:10.9096 tdnnf9.affine:9.53797 tdnnf10.linear:10.8255 tdnnf10.affine:9.27463 tdnnf11.linear:10.7736 tdnnf11.affine:9.04792 tdnnf12.linear:10.6172 tdnnf12.affine:9.03221 tdnnf13.linear:10.3811 tdnnf13.affine:9.0969 prefinal-l:14.9907 prefinal-chain.affine:9.98558 prefinal-chain.linear:14.9604 output.affine:34.3603 prefinal-xent.affine:8.66909 prefinal-xent.linear:11.1402 output-xent.affine:54.1951 ]
./log/progress.76.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.2866 tdnnf2.linear:14.6814 tdnnf2.affine:13.2642 tdnnf3.linear:12.3177 tdnnf3.affine:10.9801 tdnnf4.linear:11.6426 tdnnf4.affine:9.98039 tdnnf5.linear:8.59164 tdnnf5.affine:8.16007 tdnnf6.linear:11.6703 tdnnf6.affine:10.2559 tdnnf7.linear:11.3178 tdnnf7.affine:9.99525 tdnnf8.linear:11.0611 tdnnf8.affine:9.72302 tdnnf9.linear:10.899 tdnnf9.affine:9.53062 tdnnf10.linear:10.8144 tdnnf10.affine:9.26636 tdnnf11.linear:10.7627 tdnnf11.affine:9.03746 tdnnf12.linear:10.6021 tdnnf12.affine:9.02006 tdnnf13.linear:10.3647 tdnnf13.affine:9.08546 prefinal-l:14.9618 prefinal-chain.affine:9.96465 prefinal-chain.linear:14.8922 output.affine:34.3072 prefinal-xent.affine:8.64409 prefinal-xent.linear:11.1031 output-xent.affine:54.2753 ]
./log/progress.77.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.2199 tdnnf2.linear:14.6626 tdnnf2.affine:13.2239 tdnnf3.linear:12.2911 tdnnf3.affine:10.9535 tdnnf4.linear:11.6206 tdnnf4.affine:9.96432 tdnnf5.linear:8.57113 tdnnf5.affine:8.14308 tdnnf6.linear:11.6539 tdnnf6.affine:10.2378 tdnnf7.linear:11.2999 tdnnf7.affine:9.97964 tdnnf8.linear:11.0445 tdnnf8.affine:9.70736 tdnnf9.linear:10.8822 tdnnf9.affine:9.51659 tdnnf10.linear:10.7979 tdnnf10.affine:9.25141 tdnnf11.linear:10.7436 tdnnf11.affine:9.02005 tdnnf12.linear:10.5817 tdnnf12.affine:9.00242 tdnnf13.linear:10.3444 tdnnf13.affine:9.07074 prefinal-l:14.9253 prefinal-chain.affine:9.93753 prefinal-chain.linear:14.8181 output.affine:34.2569 prefinal-xent.affine:8.61303 prefinal-xent.linear:11.0593 output-xent.affine:54.323 ]
./log/progress.78.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.1474 tdnnf2.linear:14.6329 tdnnf2.affine:13.1756 tdnnf3.linear:12.2561 tdnnf3.affine:10.9222 tdnnf4.linear:11.5922 tdnnf4.affine:9.94353 tdnnf5.linear:8.5489 tdnnf5.affine:8.12412 tdnnf6.linear:11.6324 tdnnf6.affine:10.2153 tdnnf7.linear:11.2786 tdnnf7.affine:9.96117 tdnnf8.linear:11.0244 tdnnf8.affine:9.69122 tdnnf9.linear:10.8637 tdnnf9.affine:9.50082 tdnnf10.linear:10.7798 tdnnf10.affine:9.23452 tdnnf11.linear:10.7234 tdnnf11.affine:9.00197 tdnnf12.linear:10.5592 tdnnf12.affine:8.98375 tdnnf13.linear:10.3206 tdnnf13.affine:9.05327 prefinal-l:14.8827 prefinal-chain.affine:9.90754 prefinal-chain.linear:14.7399 output.affine:34.2076 prefinal-xent.affine:8.58043 prefinal-xent.linear:11.0129 output-xent.affine:54.3543 ]

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

I am computing the corresponding norms of the PyTorch model; please wait a while.

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

I will have to rerun this experiment to log the norms at every iteration, since I only have the final norms of the PyTorch model.

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@danpovey
Contributor

Let me merge this now, so we don't get too far out of sync.

@danpovey danpovey merged commit be0842f into kaldi-asr:pybind11 Feb 11, 2020
@fanlu

fanlu commented Feb 11, 2020

Here are the differences.

kaldi   [ tdnn1.affine:18.1474 tdnnf2.linear:14.6329 tdnnf2.affine:13.1756 tdnnf3.linear:12.2561 tdnnf3.affine:10.9222 tdnnf4.linear:11.5922 tdnnf4.affine:9.94353 tdnnf5.linear:8.5489 tdnnf5.affine:8.12412 tdnnf6.linear:11.6324 tdnnf6.affine:10.2153 tdnnf7.linear:11.2786 tdnnf7.affine:9.96117 tdnnf8.linear:11.0244 tdnnf8.affine:9.69122 tdnnf9.linear:10.8637 tdnnf9.affine:9.50082 tdnnf10.linear:10.7798 tdnnf10.affine:9.23452 tdnnf11.linear:10.7234 tdnnf11.affine:9.00197 tdnnf12.linear:10.5592 tdnnf12.affine:8.98375 tdnnf13.linear:10.3206 tdnnf13.affine:9.05327 prefinal-l:14.8827 prefinal-chain.affine:9.90754 prefinal-chain.linear:14.7399 output.affine:34.2076 prefinal-xent.affine:8.58043 prefinal-xent.linear:11.0129 output-xent.affine:54.3543 ]
pytorch [ tdnn1.affine:13.5788 tdnnf2.linear:11.6980 tdnnf2.affine:12.4563 tdnnf3.linear:11.1573 tdnnf3.affine:10.9723 tdnnf4.linear:10.5484 tdnnf4.affine:10.0810 tdnnf5.linear:7.2315 tdnnf5.affine:7.3757 tdnnf6.linear:10.2699 tdnnf6.affine:11.0119 tdnnf7.linear:10.5032 tdnnf7.affine:11.2309 tdnnf8.linear:10.1783 tdnnf8.affine:10.7989 tdnnf9.linear:10.6098 tdnnf9.affine:10.9261 tdnnf10.linear:10.5671 tdnnf10.affine:10.6716 tdnnf11.linear:10.1720 tdnnf11.affine:10.1681 tdnnf12.linear:10.0951 tdnnf12.affine:10.0381 tdnnf13.linear:9.8780 tdnnf13.affine:9.8273 prefinal-l:12.3726 prefinal-chain.affine:10.83528 prefinal-chain.linear:10.2547 output.affine:16.9724 prefinal-xent.affine:8.2392 prefinal-xent.linear:9.0002 output-xent.affine:30.4293 ]

@danpovey
Contributor

OK, interesting. They are very close. What were the final learning rates in each case, and what was the minibatch size in PyTorch?

@fanlu

fanlu commented Feb 11, 2020

[learning-rate curve]
The final learning rate is 3.125e-5, and the batch size is 128.

# TODO(fangjun): implement GeneralDropoutComponent in PyTorch

if self.linear.kernel_size == 3:
    x = self.bypass_scale * input_x[:, :, 1:-1:self.conv_stride] + x
Contributor

shouldn't this be c:-c:c rather than 1:-1:c, where c is self.conv_stride?

Contributor Author

Suppose the time_stride is 1 and the conv_stride is 1.

If the input time index is

0 1 2 3 4 5 6

After self.linear, the time index will be

1 2 3 4 5

since the kernel shape is [-1, 0, 1] (time_stride == 1)

After self.affine, the time index is still

1 2 3 4 5

The time indices of input_x[:, :, 1:-1:self.conv_stride] are [1, 2, 3, 4, 5], which match
the output of self.affine.


It is assumed that

  • time_stride == 1, conv_stride == 1

or

  • time_stride == 0, conv_stride == 3

So c:-c:c is equivalent to 1:-1:c when time_stride==1 and conv_stride == 1.
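The alignment argument above can be checked with a few lines of plain Python (a sketch; the sequence length and offsets are illustrative):

```python
# Time indices of a length-7 input, t = 0..6.
T = 7
t_in = list(range(T))

# A kernel over offsets [-1, 0, 1] (time_stride == 1) with conv_stride == 1
# keeps only the centres where the whole kernel fits inside the input: t = 1..5.
offsets = [-1, 0, 1]
conv_stride = 1
t_out = [t for t in t_in if all(0 <= t + o < T for o in offsets)][::conv_stride]

# The bypass must select the same frames from the input, i.e. 1:-1:conv_stride.
print(t_out, t_in[1:-1:conv_stride])  # both are [1, 2, 3, 4, 5]
```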

Contributor

I don't think it should be called time_stride here. Perhaps in the original Kaldi code it wasn't super clear but when implemented as convolution it gets very confusing. Better to make (stride, kernel_size) the parameters and have them be (1, 3), (1, 3), ... (3, 3), (1, 1), (1, 3), (1, 3) ...
In any case, please revert other aspects of the implementation to more similar to the way it was before and start doing experiments with that. I don't see much point starting from such a strange starting point. (i.e. the way the code is right now).

Contributor Author

I agree.

I also find them confusing, but I wrote it this way to follow the naming style in Kaldi.

I'll change them now.
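For reference, the logged weight shapes above (linear.conv.weight: [128, 1024, 2], affine.weight: [1024, 128, 2]) correspond to a 3-frame context split across the two convolutions. A minimal sketch of that factorized pair (class name and dims are illustrative, not the PR's code):

```python
import torch
import torch.nn as nn

class FactorizedPair(nn.Module):
    # Sketch: a kernel-2 "linear" conv (offsets [-1, 0]) feeding a kernel-2
    # "affine" conv (offsets [0, 1]); together they cover 3 frames of context.
    def __init__(self, dim=1024, bottleneck=128):
        super().__init__()
        self.linear = nn.Conv1d(dim, bottleneck, kernel_size=2, bias=False)
        self.affine = nn.Conv1d(bottleneck, dim, kernel_size=2)

    def forward(self, x):  # x: [N, C, T]
        return self.affine(self.linear(x))

x = torch.randn(2, 1024, 7)
y = FactorizedPair()(x)  # two kernel-2 convs shrink T from 7 to 5
```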

stride=conv_stride)

# batchnorm requires [N, C, T]
self.batchnorm = nn.BatchNorm1d(num_features=dim)
Contributor

It would be a closer match to what Kaldi's system is doing if you were to add affine=False wherever you use batchnorm.
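A sketch of the suggested change; Kaldi's BatchNorm component carries no learned per-channel scale/offset, so affine=False is the closer match (dim is illustrative):

```python
import torch
import torch.nn as nn

dim = 1024
# affine=False drops the learnable per-channel weight and bias,
# matching Kaldi's BatchNorm component more closely.
batchnorm = nn.BatchNorm1d(num_features=dim, affine=False)

x = torch.randn(4, dim, 10)  # batchnorm requires [N, C, T]
y = batchnorm(x)
```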

Contributor Author

Will be addressed in the next pull request.
