
WIP: add TDNNF to pytorch. #3892

Merged (5 commits) · Feb 11, 2020

Conversation

csukuangfj
Contributor

We are trying to replace TDNN with TDNNF in kaldi pybind training with PyTorch.

@danpovey
Contributor

Cool!
The orthonormalization is fairly important. It should probably be implemented as some kind of post-update-hook, not sure what those are called? Or as some kind of modification to the trainer, but I think post-update-hook would be easier.

@csukuangfj
Contributor Author

I found that Kaldi invokes

// The following will only do something if we have a LinearComponent
// or AffineComponent with orthonormal-constraint set to a nonzero value.
ConstrainOrthonormal(nnet_);

right after the update of the parameters.

I am going to invoke it after calling optimizer.step().
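For reference, the update that ConstrainOrthonormal applies can be sketched in numpy as follows. The function name and the numpy port are mine; the math follows Kaldi's ConstrainOrthonormalInternal (nnet-utils.cc) with the "floating" orthonormal constraint:

```python
import numpy as np

def constrain_orthonormal_step(M, update_speed=0.125):
    """One step of the Kaldi-style 'floating' orthonormal constraint.

    Nudges M (num_rows <= num_cols) towards M M^T = scale^2 * I,
    following ConstrainOrthonormalInternal in Kaldi's nnet-utils.cc.
    """
    num_rows, num_cols = M.shape
    assert num_rows <= num_cols
    P = M @ M.T
    trace_P = np.trace(P)
    trace_P_P = (P * P).sum()        # trace(P P^T); P is symmetric
    scale2 = trace_P_P / trace_P     # the 'floating' scale^2
    ratio = trace_P_P * num_rows / (trace_P * trace_P)
    # ratio >= 1 mathematically (Cauchy-Schwarz); the pybind code asserts
    # ratio > 0.99, which can only fail if M has blown up numerically.
    assert ratio > 0.99
    if ratio > 1.02:
        update_speed *= 0.5
        if ratio > 1.1:
            update_speed *= 0.5
    P = P - scale2 * np.eye(num_rows)
    return M - (4.0 * update_speed / scale2) * (P @ M)

# In the training loop this would run right after optimizer.step().
# Applied repeatedly, the rows of M become orthogonal (up to a scale):
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 10))
for _ in range(100):
    M = constrain_orthonormal_step(M)
P = M @ M.T
scale2 = (P * P).sum() / np.trace(P)
print(np.abs(P - scale2 * np.eye(4)).max())
```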

@danpovey
Contributor

danpovey commented Jan 30, 2020 via email

@csukuangfj
Contributor Author

csukuangfj commented Jan 31, 2020

The decoding results for TDNNF are as follows:

            TDNN (PyTorch)   TDNNF (PyTorch)   TDNNF (kaldi, tdnn_1c)
dev_cer          8.22              7.26                 5.71
dev_wer         16.66             15.49                13.49
test_cer         9.98              9.21                 6.65
test_wer        18.89             17.98                15.18

The first column is from
https://github.com/kaldi-asr/kaldi/blob/pybind11/egs/aishell/s10/local/run_chain.sh#L236

The second column is the result from this pull request.

The third column comes from #3868

The second column has a greater number of layers and a larger hidden dim than the first column.
I am not sure whether the improvement in CER/WER is due to the factorized TDNN or to the adoption of a larger network.

The second column has almost the same topology as the third column. The differences are

  • we use high-resolution MFCC (40 dim) + pitch (3 dim) = 43 dim features
  • we do not use GeneralDropoutComponent
  • we use [-1, 0, 1] for the orthonormalization layer

I am not sure whether the above differences cause the inferior results for PyTorch.


Another difference is the alignment information:

@csukuangfj csukuangfj changed the title [WIP]: add TDNNF to pytorch. WIP: add TDNNF to pytorch. Jan 31, 2020
@csukuangfj
Contributor Author

@danpovey

I have removed pitch and am running the training again. Now the feature part
in PyTorch is the same as kaldi's.

Regarding the [-1, 0, 1]: TDNN networks use input=Append(-1,0,1), and I call that [-1, 0, 1].

As for the weight matrix M of a TDNN layer, the paper https://www.danielpovey.com/files/2018_interspeech_tdnnf.pdf factorizes
M into two parts, M = A B, i.e., it splits one TDNN layer into two layers:

  • the first layer is a linear layer with weight matrix B, where B B^T == Identity. The input of this layer is [-1, 0]

  • the second layer is an affine layer with weight matrix A; the input of this layer is [0, 1]

In the PyTorch implementation, we use [-1, 0, 1] for the first linear layer and there is no splicing
in the second affine layer.
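To make the factorization concrete, here is a quick parameter count for one hidden layer with hidden dim 1024, bottleneck dim 128, and splice width 3, following the paper's [-1, 0] / [0, 1] splicing (plain Python; the numbers are illustrative):

```python
# Parameter counts for one hidden layer: hidden dim 1024, bottleneck dim 128,
# splice width 3 (e.g. Append(-1,0,1)) -- the sizes discussed in this thread.
hidden, bottleneck, splice = 1024, 128, 3

# Plain TDNN layer: one weight matrix M of shape (hidden, hidden * splice).
full = hidden * hidden * splice

# TDNN-F factorization M = A B (per the paper: B sees [-1, 0], A sees [0, 1]):
#   B: semi-orthonormal linear, shape (bottleneck, hidden * 2), no bias
#   A: affine, shape (hidden, bottleneck * 2), plus a bias of size hidden
factored = bottleneck * hidden * 2 + hidden * bottleneck * 2 + hidden

print(full, factored, round(full / factored, 1))  # -> 3145728 525312 6.0
```

So the factorized layer has roughly 6x fewer parameters than the plain TDNN layer at these sizes.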


The above paper also proposes 3-stage splicing, i.e., inserting a 2x1 conv layer between
the first and the second layer. But I find that kaldi has not implemented 3-stage splicing.
I guess kaldi does not implement it for computational-efficiency reasons: if both 3-stage splicing and a frame subsampling factor of 3 are used, you have to perform computation at every layer for every frame.

@csukuangfj
Contributor Author

csukuangfj commented Jan 31, 2020

After removing pitch, the result becomes a little worse:

            with pitch (PyTorch)   without pitch (PyTorch)
dev_cer            7.26                    7.52
dev_wer           15.49                   15.70
test_cer           9.21                    9.27
test_wer          17.98                   18.04

The earlier table is copied here for easier comparison:

            TDNN (PyTorch)   TDNNF (PyTorch)   TDNNF (kaldi, tdnn_1c)
dev_cer          8.22              7.26                 5.71
dev_wer         16.66             15.49                13.49
test_cer         9.98              9.21                 6.65
test_wer        18.89             17.98                15.18

@jtrmal
Contributor

jtrmal commented Jan 31, 2020

@csukuangfj -- did you look at the likelihoods? I wonder if it is overtraining or undertraining?

@csukuangfj
Contributor Author

@jtrmal
Did you mean the objective function value?

Part of the training log is as follows:

2020-01-31 16:17:23,226 INFO [train.py:185] epoch 0, learning rate 0.001
2020-01-31 16:17:23,536 INFO [train.py:102] Process 0/3161(0.000000%) global average objf: -1.195890 over 6400.0 frames, current batch average objf: -1.195890 over 6400 frames, epoch 0
2020-01-31 16:17:44,072 INFO [train.py:102] Process 100/3161(3.163556%) global average objf: -0.687263 over 573696.0 frames, current batch average objf: -0.457086 over 6400 frames, epoch 0
2020-01-31 16:18:04,479 INFO [train.py:102] Process 200/3161(6.327112%) global average objf: -0.535040 over 1138432.0 frames, current batch average objf: -0.338968 over 6400 frames, epoch 0
2020-01-31 16:18:24,999 INFO [train.py:102] Process 300/3161(9.490668%) global average objf: -0.453431 over 1704064.0 frames, current batch average objf: -0.261345 over 6400 frames, epoch 0
2020-01-31 16:18:45,192 INFO [train.py:102] Process 400/3161(12.654223%) global average objf: -0.402034 over 2267136.0 frames, current batch average objf: -0.242083 over 6400 frames, epoch 0
....
2020-01-31 17:21:32,249 INFO [train.py:102] Process 2800/3161(88.579563%) global average objf: -0.060549 over 15840896.0 frames, current batch average objf: -0.064717 over 3840 frames, epoch 5
2020-01-31 17:21:53,120 INFO [train.py:102] Process 2900/3161(91.743119%) global average objf: -0.060385 over 16406528.0 frames, current batch average objf: -0.066644 over 3840 frames, epoch 5
2020-01-31 17:22:14,151 INFO [train.py:102] Process 3000/3161(94.906675%) global average objf: -0.060270 over 16973824.0 frames, current batch average objf: -0.047593 over 6400 frames, epoch 5
2020-01-31 17:22:34,801 INFO [train.py:102] Process 3100/3161(98.070231%) global average objf: -0.060135 over 17539456.0 frames, current batch average objf: -0.050985 over 6400 frames, epoch 5

The TensorBoard screenshot is attached (Screen Shot 2020-01-31 at 17 55 07).

How can you tell whether it is underfitting or overfitting from the objective function value?

@jtrmal
Contributor

jtrmal commented Jan 31, 2020 via email

@RuABraun
Contributor

Could the difference be explained by the different optimizers (Adam vs. NSGD)?

@danpovey
Contributor

Let's keep the features the same for now while we work out the other differences.
There are likely quite a few differences and I want to add more diagnostics to the PyTorch setup to help track it down in more detail.

@jtrmal
Contributor

jtrmal commented Jan 31, 2020 via email

@qindazhu
Contributor

qindazhu commented Feb 1, 2020

I ran tdnn_1c with the DropoutComponent removed; the corresponding neural-net config (replacing relu-batchnorm-dropout-layer with relu-batchnorm-layer and removing GeneralDropoutComponent in tdnnf) is below.

  num_targets=$(tree-info $tree_dir/tree |grep num-pdfs|awk '{print $2}')
  learning_rate_factor=$(echo "print 0.5/$xent_regularize" | python)
  affine_opts="l2-regularize=0.008"
  tdnnf_opts="l2-regularize=0.008  bypass-scale=0.66"
  linear_opts="l2-regularize=0.008 orthonormal-constraint=-1.0"
  prefinal_opts="l2-regularize=0.008"
  output_opts="l2-regularize=0.002"

  input dim=40 name=input
  fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
  relu-batchnorm-layer name=tdnn1 $affine_opts dim=1024
  tdnnf-layer name=tdnnf2 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf3 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf4 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=1
  tdnnf-layer name=tdnnf5 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=0
  tdnnf-layer name=tdnnf6 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf7 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf8 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf9 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf10 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf11 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf12 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  tdnnf-layer name=tdnnf13 $tdnnf_opts dim=1024 bottleneck-dim=128 time-stride=3
  linear-component name=prefinal-l dim=256 $linear_opts

  prefinal-layer name=prefinal-chain input=prefinal-l $prefinal_opts big-dim=1024 small-dim=256
  output-layer name=output include-log-softmax=false dim=$num_targets $output_opts

  prefinal-layer name=prefinal-xent input=prefinal-l $prefinal_opts big-dim=1024 small-dim=256
  output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor $output_opts

Result

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/scoring_kaldi/best_cer <==
%WER 6.62 [ 6935 / 104765, 149 ins, 264 del, 6522 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/cer_12_1.0

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/scoring_kaldi/best_wer <==
%WER 15.18 [ 9782 / 64428, 1010 ins, 1290 del, 7482 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_test/wer_14_0.0

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/scoring_kaldi/best_cer <==
%WER 5.66 [ 11626 / 205341, 236 ins, 370 del, 11020 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/cer_11_0.5

==> exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/scoring_kaldi/best_wer <==
%WER 13.41 [ 17120 / 127698, 1542 ins, 2463 del, 13115 sub ] exp/chain_cleaned_1c_without_dropout/tdnn1c_sp/decode_dev/wer_12_0.0

All results so far

TDNN

            TDNN (PyTorch)   tdnn_1b (Kaldi)
dev_cer          8.22              7.06
dev_wer         16.66             15.11
test_cer         9.98              8.63
test_wer        18.89             17.40

Both of them use the same config:

  input dim=$feat_dim name=input
  fixed-affine-layer name=lda input=Append(-1,0,1) affine-transform-file=$dir/configs/lda.mat
  relu-batchnorm-layer name=tdnn1 dim=625
  relu-batchnorm-layer name=tdnn2 input=Append(-1,0,1) dim=625
  relu-batchnorm-layer name=tdnn3 input=Append(-1,0,1) dim=625
  relu-batchnorm-layer name=tdnn4 input=Append(-3,0,3) dim=625
  relu-batchnorm-layer name=tdnn5 input=Append(-3,0,3) dim=625
  relu-batchnorm-layer name=tdnn6 input=Append(-3,0,3) dim=625
  relu-batchnorm-layer name=prefinal-chain input=tdnn6 dim=625 target-rms=0.5
  output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5

TDNN-F

            TDNN-F (PyTorch)   tdnn_1c_r_d (Kaldi)   tdnn_1c (Kaldi)   tdnn_1d (Kaldi)
dev_cer           7.26                 5.66                5.71              5.51
dev_wer          15.49                13.41               13.49             13.19
test_cer          9.21                 6.62                6.65              6.46
test_wer         17.98                15.18               15.18             14.91

They share the same TDNN-F config at the top; the differences are:

  • TDNN-F(Pytorch)
    without i-vector, without dropout, with pitch (a little better than the version without pitch), using [-1, 0, 1] for the orthonormalization layer. See the comments from @csukuangfj above for more details.
  • tdnn_1c_r_d
    without i-vector, without dropout. (BTW, the r in the name means "removed" and the d means "dropout".)
  • tdnn_1c
    without i-vector, with dropout
  • tdnn_1d
    with i-vector, with dropout.

It seems that dropout does not make a difference on this dataset (aishell).

@csukuangfj
Contributor Author

@qindazhu thanks.
tdnnf-layer is expanded to contain GeneralDropoutComponent.
You can refer to exp/chain_cleaned_1c/tdnn1c_sp/configs/ref.config.

@qindazhu
Contributor

qindazhu commented Feb 1, 2020

@qindazhu thanks.
tdnnf-layer is expanded to contain GeneralDropoutComponent.
You can refer to exp/chain_cleaned_1c/tdnn1c_sp/configs/ref.config.

No, it will not. If you leave the parameter dropout-proportion at -1 (the default value), the resulting config will not include Dropout.

ref.config of tdnn_1c

component name=tdnnf2.noop type=NoOpComponent dim=1024
component-node name=tdnnf2.noop component=tdnnf2.noop input=Sum(Scale(0.66, tdnn1.dropout), tdnnf2.dropout)
component name=tdnnf3.linear type=TdnnComponent input-dim=1024 output-dim=128 l2-regularize=0.008 max-change=0.75 use-bias=false time-offsets=-1,0 orthonormal-constraint=-1.0
component-node name=tdnnf3.linear component=tdnnf3.linear input=tdnnf2.noop
component name=tdnnf3.affine type=TdnnComponent input-dim=128 output-dim=1024 l2-regularize=0.008 max-change=0.75 time-offsets=0,1
component-node name=tdnnf3.affine component=tdnnf3.affine input=tdnnf3.linear
component name=tdnnf3.relu type=RectifiedLinearComponent dim=1024 self-repair-scale=1e-05
component-node name=tdnnf3.relu component=tdnnf3.relu input=tdnnf3.affine
component name=tdnnf3.batchnorm type=BatchNormComponent dim=1024
component-node name=tdnnf3.batchnorm component=tdnnf3.batchnorm input=tdnnf3.relu
component name=tdnnf3.dropout type=GeneralDropoutComponent dim=1024 dropout-proportion=0.0 continuous=true
component-node name=tdnnf3.dropout component=tdnnf3.dropout input=tdnnf3.batchnorm
component name=tdnnf3.noop type=NoOpComponent dim=1024

ref.config of tdnn_1c_r_d

component-node name=tdnnf2.noop component=tdnnf2.noop input=Sum(Scale(0.66, tdnn1.batchnorm), tdnnf2.batchnorm)
component name=tdnnf3.linear type=TdnnComponent input-dim=1024 output-dim=128 l2-regularize=0.008 max-change=0.75 use-bias=false time-offsets=-1,0 orthonormal-constraint=-1.0
component-node name=tdnnf3.linear component=tdnnf3.linear input=tdnnf2.noop
component name=tdnnf3.affine type=TdnnComponent input-dim=128 output-dim=1024 l2-regularize=0.008 max-change=0.75 time-offsets=0,1
component-node name=tdnnf3.affine component=tdnnf3.affine input=tdnnf3.linear
component name=tdnnf3.relu type=RectifiedLinearComponent dim=1024 self-repair-scale=1e-05
component-node name=tdnnf3.relu component=tdnnf3.relu input=tdnnf3.affine
component name=tdnnf3.batchnorm type=BatchNormComponent dim=1024
component-node name=tdnnf3.batchnorm component=tdnnf3.batchnorm input=tdnnf3.relu
component name=tdnnf3.noop type=NoOpComponent dim=1024
component-node name=tdnnf3.noop component=tdnnf3.noop input=Sum(Scale(0.66, tdnnf2.noop), tdnnf3.batchnorm)
component name=tdnnf4.linear type=TdnnComponent input-dim=1024 output-dim=128 l2-regularize=0.008 max-change=0

@csukuangfj
Contributor Author

I see.

@danpovey
Contributor

danpovey commented Feb 1, 2020 via email

@csukuangfj
Contributor Author

@qindazhu
could you turn off max-change and the natural gradient optimizer?

@csukuangfj
Contributor Author

By the way, PyTorch is significantly faster than Kaldi.

It took about 1 hour in total for 6 epochs in the current pull request.

@fanlu reported in this pull request that Kaldi took about 4 hours in total for 6 epochs.

@danpovey
Contributor

danpovey commented Feb 4, 2020 via email

@csukuangfj
Contributor Author

thanks a lot.

@danpovey
Contributor

danpovey commented Feb 4, 2020 via email

@csukuangfj
Contributor Author

Sure, I will draw the L2 norm of all the weight matrices with tensorboard.

@fanlu

fanlu commented Feb 10, 2020

I have changed the model structure and the forward function, and the result is:

            TDNN (PyTorch)
dev_cer          6.67
dev_wer         14.72
test_cer         8.38
test_wer        17.08

But it is slower than before: it takes about 4 hours 20 minutes.
The parameters' shapes now match Kaldi's:

2020-02-10 19:41:18,662 INFO [train4.py:201] name: module.tdnn1_affine.weight, shape: torch.Size([1024, 129])
2020-02-10 19:41:18,662 INFO [train4.py:201] name: module.tdnn1_affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnn1_batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnn1_batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnnfs.0.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,663 INFO [train4.py:201] name: module.tdnnfs.0.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.0.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.0.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.0.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,664 INFO [train4.py:201] name: module.tdnnfs.1.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,665 INFO [train4.py:201] name: module.tdnnfs.1.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,666 INFO [train4.py:201] name: module.tdnnfs.2.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.2.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.3.linear.conv.weight, shape: torch.Size([128, 1024, 1])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.3.affine.weight, shape: torch.Size([1024, 128, 1])
2020-02-10 19:41:18,667 INFO [train4.py:201] name: module.tdnnfs.3.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.3.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.3.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.4.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.4.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,668 INFO [train4.py:201] name: module.tdnnfs.4.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.4.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.4.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.5.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,669 INFO [train4.py:201] name: module.tdnnfs.5.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.5.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.5.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.5.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,670 INFO [train4.py:201] name: module.tdnnfs.6.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,671 INFO [train4.py:201] name: module.tdnnfs.6.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,671 INFO [train4.py:201] name: module.tdnnfs.6.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,671 INFO [train4.py:201] name: module.tdnnfs.6.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,672 INFO [train4.py:201] name: module.tdnnfs.6.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,672 INFO [train4.py:201] name: module.tdnnfs.7.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,672 INFO [train4.py:201] name: module.tdnnfs.7.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.7.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.7.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.7.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,673 INFO [train4.py:201] name: module.tdnnfs.8.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.8.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,674 INFO [train4.py:201] name: module.tdnnfs.9.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,675 INFO [train4.py:201] name: module.tdnnfs.9.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,676 INFO [train4.py:201] name: module.tdnnfs.10.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.10.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.11.linear.conv.weight, shape: torch.Size([128, 1024, 2])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.11.affine.weight, shape: torch.Size([1024, 128, 2])
2020-02-10 19:41:18,677 INFO [train4.py:201] name: module.tdnnfs.11.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.tdnnfs.11.batchnorm.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.tdnnfs.11.batchnorm.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.prefinal_l.conv.weight, shape: torch.Size([256, 1024, 1])
2020-02-10 19:41:18,678 INFO [train4.py:201] name: module.prefinal_chain.affine.weight, shape: torch.Size([1024, 256])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.batchnorm1.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.batchnorm1.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,679 INFO [train4.py:201] name: module.prefinal_chain.linear.conv.weight, shape: torch.Size([256, 1024, 1])
2020-02-10 19:41:18,680 INFO [train4.py:201] name: module.prefinal_chain.batchnorm2.weight, shape: torch.Size([256])
2020-02-10 19:41:18,680 INFO [train4.py:201] name: module.prefinal_chain.batchnorm2.bias, shape: torch.Size([256])
2020-02-10 19:41:18,680 INFO [train4.py:201] name: module.output_affine.weight, shape: torch.Size([4336, 256])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.output_affine.bias, shape: torch.Size([4336])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.prefinal_xent.affine.weight, shape: torch.Size([1024, 256])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.prefinal_xent.affine.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,681 INFO [train4.py:201] name: module.prefinal_xent.batchnorm1.weight, shape: torch.Size([1024])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.batchnorm1.bias, shape: torch.Size([1024])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.linear.conv.weight, shape: torch.Size([256, 1024, 1])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.batchnorm2.weight, shape: torch.Size([256])
2020-02-10 19:41:18,682 INFO [train4.py:201] name: module.prefinal_xent.batchnorm2.bias, shape: torch.Size([256])
2020-02-10 19:41:18,683 INFO [train4.py:201] name: module.output_xent_affine.weight, shape: torch.Size([4336, 256])
2020-02-10 19:41:18,683 INFO [train4.py:201] name: module.output_xent_affine.bias, shape: torch.Size([4336])
2020-02-10 19:41:18,683 INFO [train4.py:201] name: module.input_batch_norm.weight, shape: torch.Size([129])
2020-02-10 19:41:18,684 INFO [train4.py:201] name: module.input_batch_norm.bias, shape: torch.Size([129])

@danpovey
Contributor

Cool! So getting closer. The l2 norm of the parameter matrices, compared with Kaldi's, may tell us what's going on with the optimization and help tune learning rates etc.

@fanlu

fanlu commented Feb 10, 2020

I have drawn the distribution and histogram of the parameters, e.g.:

(three screenshots of parameter distributions/histograms)

Which layer should I focus on? And is there a tool to get the l2 norm of Kaldi's parameters?

@fanlu

fanlu commented Feb 10, 2020

Hi, @csukuangfj
When I use DataParallel to train the TDNNF model on multiple GPUs

model = torch.nn.DataParallel(model.cuda(), device_ids=list(range(args.ngpu)))

I get an error when the criterion is called

nnet_output = kaldi.PytorchToCuSubMatrix(to_dlpack(nnet_output_tensor))

and the error message is below:

ASSERTION_FAILED ([5.5.717~1-e05890d]:ConsumeDLManagedTensor():dlpack/dlpack_pybind.cc:129) Assertion failed: (ctx->device_id == device_id)

Should we specify a fixed device_id in this function?

@csukuangfj
Contributor Author

csukuangfj commented Feb 11, 2020 via email

@csukuangfj
Contributor Author

csukuangfj commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

Ok, I'll try.
There is another error when training with speed-perturbed data and mfcc_hires features,
or when I change the weight_decay of the Adam optimizer from 5e-4 to 8e-3 (Kaldi's default config).
The error message is below.

Traceback (most recent call last):
  File "./chain/train3.py", line 339, in <module>
    main()
  File "./chain/train3.py", line 268, in main
    tf_writer=tf_writer)
  File "./chain/train3.py", line 109, in train_one_epoch
    model.constraint_orthonormal()
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/model_tdnnf3.py", line 202, in constraint_orthonormal
    self.tdnnfs[i].constraint_orthonormal()
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/tdnnf_layer2.py", line 221, in constraint_orthonormal
    self.linear.constraint_orthonormal()
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/tdnnf_layer2.py", line 93, in constraint_orthonormal
    w = _constraint_orthonormal_internal(w)
  File "/mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/chain/tdnnf_layer2.py", line 37, in _constraint_orthonormal_internal
    assert ratio > 0.99
AssertionError

@danpovey
Contributor

which layer should I focus on? And is there a tool to get the l2 norm of Kaldi's parameters?

Regarding the weights: I don't really understand those plots, but I just wanted the 2-norm, which would be torch.sqrt((some_tensor ** 2).sum()), for each parameter. You might have to write a little code to get it.
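That per-parameter 2-norm can be collected with a few lines of PyTorch; the model below is a hypothetical stand-in for the real network:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the TDNN-F network.
model = nn.Sequential(nn.Linear(40, 1024), nn.ReLU(), nn.Linear(1024, 256))

norms = {}
for name, p in model.named_parameters():
    # 2-norm of the whole parameter tensor, as suggested above.
    norms[name] = torch.sqrt((p ** 2).sum()).item()

for name, value in sorted(norms.items()):
    print(f"{name}: {value:.4f}")
```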

@danpovey
Contributor

there is another error when training with speed-perturbed data and mfcc_hires features, or when I change the weight_decay of the Adam optimizer from 5e-4 to 8e-3, which is Kaldi's default config (full traceback quoted above)

Regarding the weight decay: I would advise tuning those separately. The constants are defined in quite different ways, and probably wouldn't even be comparable between Adam and SGD.

@csukuangfj
Contributor Author

csukuangfj commented Feb 11, 2020 via email

@JiayiFu

JiayiFu commented Feb 11, 2020

which layer should I focus on ? And Is there a tool to get l2 norm of kaldi's parameter?

Maybe this is a simple way: use the Kaldi tool "nnet3-copy --binary=false final.mdl" to convert the mdl file to text mode, and then write a script to compute the 2-norm of the weights.
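A minimal sketch of such a script, assuming you have already dumped the model with nnet3-copy --binary=false. This toy parser just takes the 2-norm of every bracketed number block, ignoring the surrounding component structure (a real nnet3 text file has more structure than this handles):

```python
import math
import re

def l2_norms(nnet3_text):
    """Rough per-matrix 2-norms from an nnet3 text-format model dump.

    Collects the numbers between each '[' and ']' pair and returns
    sqrt(sum of squares) for each block, in order of appearance.
    Sketch only: it does not track which component each block belongs to.
    """
    norms = []
    for block in re.findall(r"\[([^\]]*)\]", nnet3_text):
        values = [float(tok) for tok in block.split()]
        if values:
            norms.append(math.sqrt(sum(v * v for v in values)))
    return norms

# Tiny synthetic example in the same spirit as nnet3 text output:
text = """
<LinearParams>  [ 3.0 4.0
  0.0 0.0 ]
<BiasParams>  [ 0.6 0.8 ]
"""
print(l2_norms(text))  # norms of the two blocks: 5.0 and ~1.0
```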

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

I have used https://github.com/XiaoMi/kaldi-onnx.git together with torch.norm / np.linalg.norm to calculate the l2 norm.
@danpovey please have a look and point out what you want next, thanks.
This is the l2-norm log of Kaldi's tdnn_1c:

tdnn_1c l2-norm
2020-02-11 14:44:31,355 __main__ INFO {'dim': '40', 'name': 'input', 'node_type': 'input-node', 'type': 'Input', 'id': 1}
2020-02-11 14:44:31,356 __main__ INFO {'id': 4, 'type': 'Splice', 'name': 'splice_4', 'input': ['input'], 'context': [-1, 0, 1]}
2020-02-11 14:44:31,372 __main__ INFO {'input': ['splice_4'], 'component': 'lda', 'name': 'lda', 'node_type': 'component-node', 'id': 5, 'params': (120, 120), 'bias': (120,), 'type': 'Gemm', 'raw-type': 'FixedAffine', 'params-l2-norm': 0.3751292, 'bias-l2-norm': 0.02903783}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['lda'], 'component': 'tdnn1.affine', 'name': 'tdnn1.affine', 'node_type': 'component-node', 'id': 6, 'max_change': 0.75, 'params': (1024, 120), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 17.994331, 'bias-l2-norm': 2.1129487}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['tdnn1.affine'], 'component': 'tdnn1.relu', 'name': 'tdnn1.relu', 'node_type': 'component-node', 'id': 7, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 71424.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 1.8933748, 'deriv_avg-l2-norm': 17.662828, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,375 __main__ INFO {'input': ['tdnn1.relu'], 'component': 'tdnn1.batchnorm', 'name': 'tdnn1.batchnorm', 'node_type': 'component-node', 'id': 8, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 179712.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 1.9120138, 'stats_var-l2-norm': 0.18313415}
2020-02-11 14:44:31,376 __main__ INFO {'input': ['tdnn1.batchnorm'], 'component': 'tdnn1.dropout', 'name': 'tdnn1.dropout', 'node_type': 'component-node', 'id': 9, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,376 __main__ INFO {'input': ['tdnn1.dropout'], 'component': 'tdnnf2.linear', 'name': 'tdnnf2.linear', 'node_type': 'component-node', 'id': 10, 'time_offsets': array([-1,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 14.65368}
2020-02-11 14:44:31,379 __main__ INFO {'input': ['tdnnf2.linear'], 'component': 'tdnnf2.affine', 'name': 'tdnnf2.affine', 'node_type': 'component-node', 'id': 11, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 12.9894, 'bias-l2-norm': 2.456446}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.affine'], 'component': 'tdnnf2.relu', 'name': 'tdnnf2.relu', 'node_type': 'component-node', 'id': 12, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 60928.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 12.335568, 'deriv_avg-l2-norm': 15.625699, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.relu'], 'component': 'tdnnf2.batchnorm', 'name': 'tdnnf2.batchnorm', 'node_type': 'component-node', 'id': 13, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 177792.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 12.284391, 'stats_var-l2-norm': 12.800878}
2020-02-11 14:44:31,380 __main__ INFO {'input': ['tdnnf2.batchnorm'], 'component': 'tdnnf2.dropout', 'name': 'tdnnf2.dropout', 'node_type': 'component-node', 'id': 14, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,380 __main__ INFO {'id': 15, 'type': 'Scale', 'name': 'tdnn1.dropout.Scale.0.66', 'input': ['tdnn1.dropout'], 'scale': 0.66}
2020-02-11 14:44:31,380 __main__ INFO {'id': 16, 'type': 'Sum', 'name': 'tdnn1.dropout.Scale.0.66.Sum.tdnnf2.dropout', 'input': ['tdnn1.dropout.Scale.0.66', 'tdnnf2.dropout']}
2020-02-11 14:44:31,381 __main__ INFO {'input': ['tdnn1.dropout.Scale.0.66.Sum.tdnnf2.dropout'], 'component': 'tdnnf2.noop', 'name': 'tdnnf2.noop', 'node_type': 'component-node', 'id': 17, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
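Each `tdnnfN` group above factors one TDNN layer into a semi-orthogonal `linear` with no bias (e.g. params `(128, 2048)` over spliced offsets `[-1, 0]`), an `affine` that restores the hidden dim (e.g. `(1024, 256)` over offsets `[0, 1]`), relu, batchnorm, dropout, and then a `Scale`/`Sum` pair that adds the previous block's output scaled by 0.66. A minimal numpy sketch of that data flow, just to make the shapes and the bypass connection concrete (`splice` and `tdnnf_block` are illustrative names, not kaldi's API; batchnorm and dropout are omitted):

```python
import numpy as np

def splice(x, offsets):
    # x: (T, D). Concatenate frames at the given time offsets,
    # keeping only frames where all offsets are valid -> (T', len(offsets)*D).
    lo, hi = min(offsets), max(offsets)
    length = x.shape[0] - (hi - lo)
    return np.concatenate([x[(o - lo):(o - lo) + length] for o in offsets], axis=1)

def tdnnf_block(x, M, A, b, offsets1, offsets2, bypass_scale=0.66):
    # One factorized block, as in the log above:
    #   tdnnfN.linear: semi-orthogonal, no bias, M of shape (bottleneck, len(offsets1)*hidden)
    #   tdnnfN.affine: A of shape (hidden, len(offsets2)*bottleneck), bias b
    # followed by relu and a 0.66-scaled residual (batchnorm/dropout omitted).
    h = splice(x, offsets1) @ M.T        # tdnnfN.linear
    h = splice(h, offsets2) @ A.T + b    # tdnnfN.affine
    h = np.maximum(h, 0.0)               # tdnnfN.relu
    # align the bypass input with the frames that survive both splices
    start = -(min(offsets1) + min(offsets2))
    return bypass_scale * x[start:start + h.shape[0]] + h
```

With zero weights the block reduces to `0.66 * x` on the surviving frames, which makes the residual path easy to check in isolation.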
2020-02-11 14:44:31,381 __main__ INFO {'input': ['tdnnf2.noop'], 'component': 'tdnnf3.linear', 'name': 'tdnnf3.linear', 'node_type': 'component-node', 'id': 18, 'time_offsets': array([-1,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 12.562767}
2020-02-11 14:44:31,382 __main__ INFO {'input': ['tdnnf3.linear'], 'component': 'tdnnf3.affine', 'name': 'tdnnf3.affine', 'node_type': 'component-node', 'id': 19, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.749477, 'bias-l2-norm': 1.5780896}
2020-02-11 14:44:31,382 __main__ INFO {'input': ['tdnnf3.affine'], 'component': 'tdnnf3.relu', 'name': 'tdnnf3.relu', 'node_type': 'component-node', 'id': 20, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 105408.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 13.213889, 'deriv_avg-l2-norm': 15.905553, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,383 __main__ INFO {'input': ['tdnnf3.relu'], 'component': 'tdnnf3.batchnorm', 'name': 'tdnnf3.batchnorm', 'node_type': 'component-node', 'id': 21, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 175872.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 13.222022, 'stats_var-l2-norm': 13.43212}
2020-02-11 14:44:31,383 __main__ INFO {'input': ['tdnnf3.batchnorm'], 'component': 'tdnnf3.dropout', 'name': 'tdnnf3.dropout', 'node_type': 'component-node', 'id': 22, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,383 __main__ INFO {'id': 23, 'type': 'Scale', 'name': 'tdnnf2.noop.Scale.0.66', 'input': ['tdnnf2.noop'], 'scale': 0.66}
2020-02-11 14:44:31,383 __main__ INFO {'id': 24, 'type': 'Sum', 'name': 'tdnnf2.noop.Scale.0.66.Sum.tdnnf3.dropout', 'input': ['tdnnf2.noop.Scale.0.66', 'tdnnf3.dropout']}
2020-02-11 14:44:31,384 __main__ INFO {'input': ['tdnnf2.noop.Scale.0.66.Sum.tdnnf3.dropout'], 'component': 'tdnnf3.noop', 'name': 'tdnnf3.noop', 'node_type': 'component-node', 'id': 25, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,384 __main__ INFO {'input': ['tdnnf3.noop'], 'component': 'tdnnf4.linear', 'name': 'tdnnf4.linear', 'node_type': 'component-node', 'id': 26, 'time_offsets': array([-1,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.342851}
2020-02-11 14:44:31,385 __main__ INFO {'input': ['tdnnf4.linear'], 'component': 'tdnnf4.affine', 'name': 'tdnnf4.affine', 'node_type': 'component-node', 'id': 27, 'time_offsets': array([0, 1]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.7689705, 'bias-l2-norm': 1.2836239}
2020-02-11 14:44:31,385 __main__ INFO {'input': ['tdnnf4.affine'], 'component': 'tdnnf4.relu', 'name': 'tdnnf4.relu', 'node_type': 'component-node', 'id': 28, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 26880.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 12.693966, 'deriv_avg-l2-norm': 15.441769, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf4.relu'], 'component': 'tdnnf4.batchnorm', 'name': 'tdnnf4.batchnorm', 'node_type': 'component-node', 'id': 29, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 58624.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 12.727512, 'stats_var-l2-norm': 13.328432}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf4.batchnorm'], 'component': 'tdnnf4.dropout', 'name': 'tdnnf4.dropout', 'node_type': 'component-node', 'id': 30, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,386 __main__ INFO {'id': 31, 'type': 'Scale', 'name': 'tdnnf3.noop.Scale.0.66', 'input': ['tdnnf3.noop'], 'scale': 0.66}
2020-02-11 14:44:31,386 __main__ INFO {'id': 32, 'type': 'Sum', 'name': 'tdnnf3.noop.Scale.0.66.Sum.tdnnf4.dropout', 'input': ['tdnnf3.noop.Scale.0.66', 'tdnnf4.dropout']}
2020-02-11 14:44:31,386 __main__ INFO {'input': ['tdnnf3.noop.Scale.0.66.Sum.tdnnf4.dropout'], 'component': 'tdnnf4.noop', 'name': 'tdnnf4.noop', 'node_type': 'component-node', 'id': 33, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,387 __main__ INFO {'input': ['tdnnf4.noop'], 'component': 'tdnnf5.linear', 'name': 'tdnnf5.linear', 'node_type': 'component-node', 'id': 34, 'time_offsets': array([0]), 'params': (128, 1024), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.509507}
2020-02-11 14:44:31,387 __main__ INFO {'input': ['tdnnf5.linear'], 'component': 'tdnnf5.affine', 'name': 'tdnnf5.affine', 'node_type': 'component-node', 'id': 35, 'time_offsets': array([0]), 'params': (1024, 128), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 7.895306, 'bias-l2-norm': 1.8074945}
2020-02-11 14:44:31,388 __main__ INFO {'input': ['tdnnf5.affine'], 'component': 'tdnnf5.relu', 'name': 'tdnnf5.relu', 'node_type': 'component-node', 'id': 36, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 35328.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.929157, 'deriv_avg-l2-norm': 13.615622, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,388 __main__ INFO {'input': ['tdnnf5.relu'], 'component': 'tdnnf5.batchnorm', 'name': 'tdnnf5.batchnorm', 'node_type': 'component-node', 'id': 37, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 58624.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.9155283, 'stats_var-l2-norm': 4.8078284}
2020-02-11 14:44:31,389 __main__ INFO {'input': ['tdnnf5.batchnorm'], 'component': 'tdnnf5.dropout', 'name': 'tdnnf5.dropout', 'node_type': 'component-node', 'id': 38, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,389 __main__ INFO {'id': 39, 'type': 'Scale', 'name': 'tdnnf4.noop.Scale.0.66', 'input': ['tdnnf4.noop'], 'scale': 0.66}
2020-02-11 14:44:31,389 __main__ INFO {'id': 40, 'type': 'Sum', 'name': 'tdnnf4.noop.Scale.0.66.Sum.tdnnf5.dropout', 'input': ['tdnnf4.noop.Scale.0.66', 'tdnnf5.dropout']}
2020-02-11 14:44:31,389 __main__ INFO {'input': ['tdnnf4.noop.Scale.0.66.Sum.tdnnf5.dropout'], 'component': 'tdnnf5.noop', 'name': 'tdnnf5.noop', 'node_type': 'component-node', 'id': 41, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,390 __main__ INFO {'input': ['tdnnf5.noop'], 'component': 'tdnnf6.linear', 'name': 'tdnnf6.linear', 'node_type': 'component-node', 'id': 42, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.609792}
2020-02-11 14:44:31,390 __main__ INFO {'input': ['tdnnf6.linear'], 'component': 'tdnnf6.affine', 'name': 'tdnnf6.affine', 'node_type': 'component-node', 'id': 43, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.078222, 'bias-l2-norm': 1.6047498}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.affine'], 'component': 'tdnnf6.relu', 'name': 'tdnnf6.relu', 'node_type': 'component-node', 'id': 44, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 18624.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 11.288168, 'deriv_avg-l2-norm': 15.204925, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.relu'], 'component': 'tdnnf6.batchnorm', 'name': 'tdnnf6.batchnorm', 'node_type': 'component-node', 'id': 45, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 56704.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 11.27561, 'stats_var-l2-norm': 10.5281}
2020-02-11 14:44:31,391 __main__ INFO {'input': ['tdnnf6.batchnorm'], 'component': 'tdnnf6.dropout', 'name': 'tdnnf6.dropout', 'node_type': 'component-node', 'id': 46, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,392 __main__ INFO {'id': 47, 'type': 'Scale', 'name': 'tdnnf5.noop.Scale.0.66', 'input': ['tdnnf5.noop'], 'scale': 0.66}
2020-02-11 14:44:31,392 __main__ INFO {'id': 48, 'type': 'Sum', 'name': 'tdnnf5.noop.Scale.0.66.Sum.tdnnf6.dropout', 'input': ['tdnnf5.noop.Scale.0.66', 'tdnnf6.dropout']}
2020-02-11 14:44:31,392 __main__ INFO {'input': ['tdnnf5.noop.Scale.0.66.Sum.tdnnf6.dropout'], 'component': 'tdnnf6.noop', 'name': 'tdnnf6.noop', 'node_type': 'component-node', 'id': 49, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,392 __main__ INFO {'input': ['tdnnf6.noop'], 'component': 'tdnnf7.linear', 'name': 'tdnnf7.linear', 'node_type': 'component-node', 'id': 50, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 11.21937}
2020-02-11 14:44:31,393 __main__ INFO {'input': ['tdnnf7.linear'], 'component': 'tdnnf7.affine', 'name': 'tdnnf7.affine', 'node_type': 'component-node', 'id': 51, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.733917, 'bias-l2-norm': 1.7020972}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.affine'], 'component': 'tdnnf7.relu', 'name': 'tdnnf7.relu', 'node_type': 'component-node', 'id': 52, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 25920.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.964567, 'deriv_avg-l2-norm': 14.876908, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.relu'], 'component': 'tdnnf7.batchnorm', 'name': 'tdnnf7.batchnorm', 'node_type': 'component-node', 'id': 53, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 54784.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.983768, 'stats_var-l2-norm': 8.700165}
2020-02-11 14:44:31,394 __main__ INFO {'input': ['tdnnf7.batchnorm'], 'component': 'tdnnf7.dropout', 'name': 'tdnnf7.dropout', 'node_type': 'component-node', 'id': 54, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,394 __main__ INFO {'id': 55, 'type': 'Scale', 'name': 'tdnnf6.noop.Scale.0.66', 'input': ['tdnnf6.noop'], 'scale': 0.66}
2020-02-11 14:44:31,395 __main__ INFO {'id': 56, 'type': 'Sum', 'name': 'tdnnf6.noop.Scale.0.66.Sum.tdnnf7.dropout', 'input': ['tdnnf6.noop.Scale.0.66', 'tdnnf7.dropout']}
2020-02-11 14:44:31,395 __main__ INFO {'input': ['tdnnf6.noop.Scale.0.66.Sum.tdnnf7.dropout'], 'component': 'tdnnf7.noop', 'name': 'tdnnf7.noop', 'node_type': 'component-node', 'id': 57, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,395 __main__ INFO {'input': ['tdnnf7.noop'], 'component': 'tdnnf8.linear', 'name': 'tdnnf8.linear', 'node_type': 'component-node', 'id': 58, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.982727}
2020-02-11 14:44:31,396 __main__ INFO {'input': ['tdnnf8.linear'], 'component': 'tdnnf8.affine', 'name': 'tdnnf8.affine', 'node_type': 'component-node', 'id': 59, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.5415945, 'bias-l2-norm': 1.716922}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.affine'], 'component': 'tdnnf8.relu', 'name': 'tdnnf8.relu', 'node_type': 'component-node', 'id': 60, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 22208.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.486838, 'deriv_avg-l2-norm': 14.476424, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.relu'], 'component': 'tdnnf8.batchnorm', 'name': 'tdnnf8.batchnorm', 'node_type': 'component-node', 'id': 61, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 52864.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.471879, 'stats_var-l2-norm': 8.468856}
2020-02-11 14:44:31,397 __main__ INFO {'input': ['tdnnf8.batchnorm'], 'component': 'tdnnf8.dropout', 'name': 'tdnnf8.dropout', 'node_type': 'component-node', 'id': 62, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,397 __main__ INFO {'id': 63, 'type': 'Scale', 'name': 'tdnnf7.noop.Scale.0.66', 'input': ['tdnnf7.noop'], 'scale': 0.66}
2020-02-11 14:44:31,398 __main__ INFO {'id': 64, 'type': 'Sum', 'name': 'tdnnf7.noop.Scale.0.66.Sum.tdnnf8.dropout', 'input': ['tdnnf7.noop.Scale.0.66', 'tdnnf8.dropout']}
2020-02-11 14:44:31,398 __main__ INFO {'input': ['tdnnf7.noop.Scale.0.66.Sum.tdnnf8.dropout'], 'component': 'tdnnf8.noop', 'name': 'tdnnf8.noop', 'node_type': 'component-node', 'id': 65, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,398 __main__ INFO {'input': ['tdnnf8.noop'], 'component': 'tdnnf9.linear', 'name': 'tdnnf9.linear', 'node_type': 'component-node', 'id': 66, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.8021345}
2020-02-11 14:44:31,399 __main__ INFO {'input': ['tdnnf9.linear'], 'component': 'tdnnf9.affine', 'name': 'tdnnf9.affine', 'node_type': 'component-node', 'id': 67, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.299649, 'bias-l2-norm': 1.5847737}
2020-02-11 14:44:31,399 __main__ INFO {'input': ['tdnnf9.affine'], 'component': 'tdnnf9.relu', 'name': 'tdnnf9.relu', 'node_type': 'component-node', 'id': 68, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23296.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.041475, 'deriv_avg-l2-norm': 14.187531, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,400 __main__ INFO {'input': ['tdnnf9.relu'], 'component': 'tdnnf9.batchnorm', 'name': 'tdnnf9.batchnorm', 'node_type': 'component-node', 'id': 69, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 50944.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.050355, 'stats_var-l2-norm': 8.135726}
2020-02-11 14:44:31,400 __main__ INFO {'input': ['tdnnf9.batchnorm'], 'component': 'tdnnf9.dropout', 'name': 'tdnnf9.dropout', 'node_type': 'component-node', 'id': 70, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,400 __main__ INFO {'id': 71, 'type': 'Scale', 'name': 'tdnnf8.noop.Scale.0.66', 'input': ['tdnnf8.noop'], 'scale': 0.66}
2020-02-11 14:44:31,400 __main__ INFO {'id': 72, 'type': 'Sum', 'name': 'tdnnf8.noop.Scale.0.66.Sum.tdnnf9.dropout', 'input': ['tdnnf8.noop.Scale.0.66', 'tdnnf9.dropout']}
2020-02-11 14:44:31,401 __main__ INFO {'input': ['tdnnf8.noop.Scale.0.66.Sum.tdnnf9.dropout'], 'component': 'tdnnf9.noop', 'name': 'tdnnf9.noop', 'node_type': 'component-node', 'id': 73, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,401 __main__ INFO {'input': ['tdnnf9.noop'], 'component': 'tdnnf10.linear', 'name': 'tdnnf10.linear', 'node_type': 'component-node', 'id': 74, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.755399}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.linear'], 'component': 'tdnnf10.affine', 'name': 'tdnnf10.affine', 'node_type': 'component-node', 'id': 75, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 9.086278, 'bias-l2-norm': 1.3546987}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.affine'], 'component': 'tdnnf10.relu', 'name': 'tdnnf10.relu', 'node_type': 'component-node', 'id': 76, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23232.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 9.134641, 'deriv_avg-l2-norm': 13.997164, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,402 __main__ INFO {'input': ['tdnnf10.relu'], 'component': 'tdnnf10.batchnorm', 'name': 'tdnnf10.batchnorm', 'node_type': 'component-node', 'id': 77, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 49024.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 9.114295, 'stats_var-l2-norm': 8.443671}
2020-02-11 14:44:31,403 __main__ INFO {'input': ['tdnnf10.batchnorm'], 'component': 'tdnnf10.dropout', 'name': 'tdnnf10.dropout', 'node_type': 'component-node', 'id': 78, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,403 __main__ INFO {'id': 79, 'type': 'Scale', 'name': 'tdnnf9.noop.Scale.0.66', 'input': ['tdnnf9.noop'], 'scale': 0.66}
2020-02-11 14:44:31,403 __main__ INFO {'id': 80, 'type': 'Sum', 'name': 'tdnnf9.noop.Scale.0.66.Sum.tdnnf10.dropout', 'input': ['tdnnf9.noop.Scale.0.66', 'tdnnf10.dropout']}
2020-02-11 14:44:31,403 __main__ INFO {'input': ['tdnnf9.noop.Scale.0.66.Sum.tdnnf10.dropout'], 'component': 'tdnnf10.noop', 'name': 'tdnnf10.noop', 'node_type': 'component-node', 'id': 81, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,404 __main__ INFO {'input': ['tdnnf10.noop'], 'component': 'tdnnf11.linear', 'name': 'tdnnf11.linear', 'node_type': 'component-node', 'id': 82, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.697865}
2020-02-11 14:44:31,404 __main__ INFO {'input': ['tdnnf11.linear'], 'component': 'tdnnf11.affine', 'name': 'tdnnf11.affine', 'node_type': 'component-node', 'id': 83, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.883663, 'bias-l2-norm': 1.2464023}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.affine'], 'component': 'tdnnf11.relu', 'name': 'tdnnf11.relu', 'node_type': 'component-node', 'id': 84, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 31680.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 8.557134, 'deriv_avg-l2-norm': 13.096737, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.relu'], 'component': 'tdnnf11.batchnorm', 'name': 'tdnnf11.batchnorm', 'node_type': 'component-node', 'id': 85, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 47104.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 8.5475025, 'stats_var-l2-norm': 8.175571}
2020-02-11 14:44:31,405 __main__ INFO {'input': ['tdnnf11.batchnorm'], 'component': 'tdnnf11.dropout', 'name': 'tdnnf11.dropout', 'node_type': 'component-node', 'id': 86, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,406 __main__ INFO {'id': 87, 'type': 'Scale', 'name': 'tdnnf10.noop.Scale.0.66', 'input': ['tdnnf10.noop'], 'scale': 0.66}
2020-02-11 14:44:31,406 __main__ INFO {'id': 88, 'type': 'Sum', 'name': 'tdnnf10.noop.Scale.0.66.Sum.tdnnf11.dropout', 'input': ['tdnnf10.noop.Scale.0.66', 'tdnnf11.dropout']}
2020-02-11 14:44:31,406 __main__ INFO {'input': ['tdnnf10.noop.Scale.0.66.Sum.tdnnf11.dropout'], 'component': 'tdnnf11.noop', 'name': 'tdnnf11.noop', 'node_type': 'component-node', 'id': 89, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,406 __main__ INFO {'input': ['tdnnf11.noop'], 'component': 'tdnnf12.linear', 'name': 'tdnnf12.linear', 'node_type': 'component-node', 'id': 90, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.470395}
2020-02-11 14:44:31,407 __main__ INFO {'input': ['tdnnf12.linear'], 'component': 'tdnnf12.affine', 'name': 'tdnnf12.affine', 'node_type': 'component-node', 'id': 91, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.823896, 'bias-l2-norm': 1.1193717}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.affine'], 'component': 'tdnnf12.relu', 'name': 'tdnnf12.relu', 'node_type': 'component-node', 'id': 92, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 16640.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 7.735138, 'deriv_avg-l2-norm': 12.613586, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.relu'], 'component': 'tdnnf12.batchnorm', 'name': 'tdnnf12.batchnorm', 'node_type': 'component-node', 'id': 93, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 45184.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 7.646413, 'stats_var-l2-norm': 7.3074822}
2020-02-11 14:44:31,408 __main__ INFO {'input': ['tdnnf12.batchnorm'], 'component': 'tdnnf12.dropout', 'name': 'tdnnf12.dropout', 'node_type': 'component-node', 'id': 94, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,408 __main__ INFO {'id': 95, 'type': 'Scale', 'name': 'tdnnf11.noop.Scale.0.66', 'input': ['tdnnf11.noop'], 'scale': 0.66}
2020-02-11 14:44:31,409 __main__ INFO {'id': 96, 'type': 'Sum', 'name': 'tdnnf11.noop.Scale.0.66.Sum.tdnnf12.dropout', 'input': ['tdnnf11.noop.Scale.0.66', 'tdnnf12.dropout']}
2020-02-11 14:44:31,409 __main__ INFO {'input': ['tdnnf11.noop.Scale.0.66.Sum.tdnnf12.dropout'], 'component': 'tdnnf12.noop', 'name': 'tdnnf12.noop', 'node_type': 'component-node', 'id': 97, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,409 __main__ INFO {'input': ['tdnnf12.noop'], 'component': 'tdnnf13.linear', 'name': 'tdnnf13.linear', 'node_type': 'component-node', 'id': 98, 'time_offsets': array([-3,  0]), 'params': (128, 2048), 'bias': array([], dtype=float32), 'orthonormal_constraint': -1.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 10.259589}
2020-02-11 14:44:31,410 __main__ INFO {'input': ['tdnnf13.linear'], 'component': 'tdnnf13.affine', 'name': 'tdnnf13.affine', 'node_type': 'component-node', 'id': 99, 'time_offsets': array([0, 3]), 'params': (1024, 256), 'bias': (1024,), 'orthonormal_constraint': 0.0, 'use_natrual_gradient': True, 'num_samples_history': 2000.0, 'alpha_inout': 4.0, 'rank_inout': 20, 'type': 'Tdnn', 'raw-type': 'Tdnn', 'params-l2-norm': 8.941648, 'bias-l2-norm': 0.99510527}
2020-02-11 14:44:31,410 __main__ INFO {'input': ['tdnnf13.affine'], 'component': 'tdnnf13.relu', 'name': 'tdnnf13.relu', 'node_type': 'component-node', 'id': 100, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 32512.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 5.9976406, 'deriv_avg-l2-norm': 11.490221, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf13.relu'], 'component': 'tdnnf13.batchnorm', 'name': 'tdnnf13.batchnorm', 'node_type': 'component-node', 'id': 101, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 5.9866805, 'stats_var-l2-norm': 5.459227}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf13.batchnorm'], 'component': 'tdnnf13.dropout', 'name': 'tdnnf13.dropout', 'node_type': 'component-node', 'id': 102, 'dim': 1024, 'type': 'Dropout', 'raw-type': 'GeneralDropout'}
2020-02-11 14:44:31,411 __main__ INFO {'id': 103, 'type': 'Scale', 'name': 'tdnnf12.noop.Scale.0.66', 'input': ['tdnnf12.noop'], 'scale': 0.66}
2020-02-11 14:44:31,411 __main__ INFO {'id': 104, 'type': 'Sum', 'name': 'tdnnf12.noop.Scale.0.66.Sum.tdnnf13.dropout', 'input': ['tdnnf12.noop.Scale.0.66', 'tdnnf13.dropout']}
2020-02-11 14:44:31,411 __main__ INFO {'input': ['tdnnf12.noop.Scale.0.66.Sum.tdnnf13.dropout'], 'component': 'tdnnf13.noop', 'name': 'tdnnf13.noop', 'node_type': 'component-node', 'id': 105, 'dim': 1024, 'type': 'NoOp', 'raw-type': 'NoOp'}
2020-02-11 14:44:31,412 __main__ INFO {'input': ['tdnnf13.noop'], 'component': 'prefinal-l', 'name': 'prefinal-l', 'node_type': 'component-node', 'id': 106, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 14.927766}
2020-02-11 14:44:31,412 __main__ INFO {'input': ['prefinal-l'], 'component': 'prefinal-chain.affine', 'name': 'prefinal-chain.affine', 'node_type': 'component-node', 'id': 107, 'max_change': 0.75, 'params': (1024, 256), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 9.80533, 'bias-l2-norm': 1.433072}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.affine'], 'component': 'prefinal-chain.relu', 'name': 'prefinal-chain.relu', 'node_type': 'component-node', 'id': 108, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 19712.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.8817225, 'deriv_avg-l2-norm': 12.461684, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.relu'], 'component': 'prefinal-chain.batchnorm1', 'name': 'prefinal-chain.batchnorm1', 'node_type': 'component-node', 'id': 109, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.939498, 'stats_var-l2-norm': 6.1382785}
2020-02-11 14:44:31,413 __main__ INFO {'input': ['prefinal-chain.batchnorm1'], 'component': 'prefinal-chain.linear', 'name': 'prefinal-chain.linear', 'node_type': 'component-node', 'id': 110, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 14.669719}
2020-02-11 14:44:31,414 __main__ INFO {'input': ['prefinal-chain.linear'], 'component': 'prefinal-chain.batchnorm2', 'name': 'prefinal-chain.batchnorm2', 'node_type': 'component-node', 'id': 111, 'dim': 256, 'block_dim': 256, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (256,), 'stats_var': (256,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 3.6812432e-07, 'stats_var-l2-norm': 21.627495}
2020-02-11 14:44:31,415 __main__ INFO {'input': ['prefinal-chain.batchnorm2'], 'component': 'output.affine', 'name': 'output.affine', 'node_type': 'component-node', 'id': 112, 'max_change': 1.5, 'params': (3448, 256), 'bias': (3448,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 33.547924, 'bias-l2-norm': 6.109993}
2020-02-11 14:44:31,415 __main__ INFO {'objective': 'linear', 'input': ['output.affine'], 'name': 'output', 'node_type': 'output-node', 'type': 'Output', 'id': 113}
2020-02-11 14:44:31,415 __main__ INFO {'input': ['prefinal-l'], 'component': 'prefinal-xent.affine', 'name': 'prefinal-xent.affine', 'node_type': 'component-node', 'id': 114, 'max_change': 0.75, 'params': (1024, 256), 'bias': (1024,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 8.215993, 'bias-l2-norm': 2.358821}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.affine'], 'component': 'prefinal-xent.relu', 'name': 'prefinal-xent.relu', 'node_type': 'component-node', 'id': 115, 'dim': 1024, 'value_avg': (1024,), 'deriv_avg': (1024,), 'count': 23936.0, 'oderiv_rms': (1024,), 'oderiv_count': 0.0, 'type': 'Relu', 'raw-type': 'RectifiedLinear', 'value_avg-l2-norm': 6.315324, 'deriv_avg-l2-norm': 12.559063, 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.relu'], 'component': 'prefinal-xent.batchnorm1', 'name': 'prefinal-xent.batchnorm1', 'node_type': 'component-node', 'id': 116, 'dim': 1024, 'block_dim': 1024, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (1024,), 'stats_var': (1024,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 6.244672, 'stats_var-l2-norm': 3.9114242}
2020-02-11 14:44:31,416 __main__ INFO {'input': ['prefinal-xent.batchnorm1'], 'component': 'prefinal-xent.linear', 'name': 'prefinal-xent.linear', 'node_type': 'component-node', 'id': 117, 'params': (256, 1024), 'rank_inout': 20, 'alpha': 4.0, 'num_samples_history': 2000.0, 'type': 'Linear', 'raw-type': 'Linear', 'params-l2-norm': 10.986344}
2020-02-11 14:44:31,417 __main__ INFO {'input': ['prefinal-xent.linear'], 'component': 'prefinal-xent.batchnorm2', 'name': 'prefinal-xent.batchnorm2', 'node_type': 'component-node', 'id': 118, 'dim': 256, 'block_dim': 256, 'epsilon': 0.001, 'target_rms': 1.0, 'test_mode': False, 'count': 43264.0, 'stats_mean': (256,), 'stats_var': (256,), 'type': 'BatchNorm', 'raw-type': 'BatchNorm', 'stats_mean-l2-norm': 2.989858e-07, 'stats_var-l2-norm': 6.3436804}
2020-02-11 14:44:31,418 __main__ INFO {'input': ['prefinal-xent.batchnorm2'], 'component': 'output-xent.affine', 'name': 'output-xent.affine', 'node_type': 'component-node', 'id': 119, 'max_change': 1.5, 'params': (3448, 256), 'bias': (3448,), 'rank_in': 20, 'rank_out': 80, 'num_samples_history': 2000.0, 'alpha': 4.0, 'type': 'Gemm', 'raw-type': 'NaturalGradientAffine', 'params-l2-norm': 54.274277, 'bias-l2-norm': 2.9230652}
2020-02-11 14:44:31,418 __main__ INFO {'input': ['output-xent.affine'], 'component': 'output-xent.log-softmax', 'name': 'output-xent.log-softmax', 'node_type': 'component-node', 'id': 120, 'dim': 3448, 'value_avg': array([], dtype=float32), 'deriv_avg': array([], dtype=float32), 'count': 0.0, 'oderiv_rms': (3448,), 'oderiv_count': 0.0, 'type': 'LogSoftmax', 'raw-type': 'LogSoftmax', 'oderiv_rms-l2-norm': 0.0}
2020-02-11 14:44:31,418 __main__ INFO {'objective': 'linear', 'input': ['output-xent.log-softmax'], 'name': 'output-xent', 'node_type': 'output-node', 'type': 'Output', 'id': 121}

And these are the l2-norms of the PyTorch model's parameters:

2020-02-11 14:59:18,264 (common:38) INFO: load checkpoint from /mnt/cfs1_alias1/asr/users/fanlu/task/kaldi_recipe/pybind/s10/exp/chain/train_q2_orthogonal_modelmodel_tdnnf3_def_init_opadam_bs128_ep6_lr1e-3_fpe150_110_90_hn1024_fpr1500000_ms1_2_3_4_5_kernel1_1_1_0_3_3_3_3_3_3_3_3_stride1_1_1_3_1_1_1_1_1_1_1_1_l2r5e-4/best_model.pt
2020-02-11 14:59:19,863 (model_tdnnf3:224) INFO: name: tdnn1_affine.weight, shape: torch.Size([1024, 129]), l2-norm: 13.578791618347168, np-l2-norm: 13.578801155090332
2020-02-11 14:59:28,992 (model_tdnnf3:224) INFO: name: tdnn1_affine.bias, shape: torch.Size([1024]), l2-norm: 2.2701199054718018, np-l2-norm: 2.270120859146118
2020-02-11 14:59:28,992 (model_tdnnf3:224) INFO: name: tdnn1_batchnorm.weight, shape: torch.Size([1024]), l2-norm: 10.799092292785645, np-l2-norm: 10.799091339111328
2020-02-11 14:59:28,993 (model_tdnnf3:224) INFO: name: tdnn1_batchnorm.bias, shape: torch.Size([1024]), l2-norm: 2.189619302749634, np-l2-norm: 2.1896190643310547
2020-02-11 14:59:28,994 (model_tdnnf3:224) INFO: name: tdnnfs.0.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 11.698036193847656, np-l2-norm: 11.698058128356934
2020-02-11 14:59:28,994 (model_tdnnf3:224) INFO: name: tdnnfs.0.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 12.456302642822266, np-l2-norm: 12.45630931854248
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.affine.bias, shape: torch.Size([1024]), l2-norm: 0.8999515175819397, np-l2-norm: 0.8999518156051636
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 8.536678314208984, np-l2-norm: 8.536681175231934
2020-02-11 14:59:28,995 (model_tdnnf3:224) INFO: name: tdnnfs.0.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.9667662382125854, np-l2-norm: 0.9667660593986511
2020-02-11 14:59:28,996 (model_tdnnf3:224) INFO: name: tdnnfs.1.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 11.157291412353516, np-l2-norm: 11.15730094909668
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.972347259521484, np-l2-norm: 10.972354888916016
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6986019611358643, np-l2-norm: 0.6986021399497986
2020-02-11 14:59:28,997 (model_tdnnf3:224) INFO: name: tdnnfs.1.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 7.437180995941162, np-l2-norm: 7.437183856964111
2020-02-11 14:59:28,998 (model_tdnnf3:224) INFO: name: tdnnfs.1.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6932032704353333, np-l2-norm: 0.6932030916213989
2020-02-11 14:59:28,998 (model_tdnnf3:224) INFO: name: tdnnfs.2.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.548410415649414, np-l2-norm: 10.548415184020996
2020-02-11 14:59:28,999 (model_tdnnf3:224) INFO: name: tdnnfs.2.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.08105754852295, np-l2-norm: 10.081061363220215
2020-02-11 14:59:28,999 (model_tdnnf3:224) INFO: name: tdnnfs.2.affine.bias, shape: torch.Size([1024]), l2-norm: 0.5530431866645813, np-l2-norm: 0.5530433654785156
2020-02-11 14:59:29,000 (model_tdnnf3:224) INFO: name: tdnnfs.2.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 7.182321548461914, np-l2-norm: 7.182323932647705
2020-02-11 14:59:29,000 (model_tdnnf3:224) INFO: name: tdnnfs.2.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7540080547332764, np-l2-norm: 0.7540078163146973
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.linear.conv.weight, shape: torch.Size([128, 1024, 1]), l2-norm: 7.231468677520752, np-l2-norm: 7.23146915435791
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.affine.weight, shape: torch.Size([1024, 128, 1]), l2-norm: 7.3756842613220215, np-l2-norm: 7.37568473815918
2020-02-11 14:59:29,001 (model_tdnnf3:224) INFO: name: tdnnfs.3.affine.bias, shape: torch.Size([1024]), l2-norm: 0.8493649363517761, np-l2-norm: 0.8493649959564209
2020-02-11 14:59:29,002 (model_tdnnf3:224) INFO: name: tdnnfs.3.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 4.721061706542969, np-l2-norm: 4.721061706542969
2020-02-11 14:59:29,002 (model_tdnnf3:224) INFO: name: tdnnfs.3.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.8471177816390991, np-l2-norm: 0.8471177220344543
2020-02-11 14:59:29,003 (model_tdnnf3:224) INFO: name: tdnnfs.4.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.269883155822754, np-l2-norm: 10.26988697052002
2020-02-11 14:59:29,003 (model_tdnnf3:224) INFO: name: tdnnfs.4.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 11.01193904876709, np-l2-norm: 11.011943817138672
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6541978120803833, np-l2-norm: 0.6541979312896729
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.021320343017578, np-l2-norm: 6.021320343017578
2020-02-11 14:59:29,004 (model_tdnnf3:224) INFO: name: tdnnfs.4.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7195524573326111, np-l2-norm: 0.7195526957511902
2020-02-11 14:59:29,005 (model_tdnnf3:224) INFO: name: tdnnfs.5.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.503177642822266, np-l2-norm: 10.503182411193848
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 11.230865478515625, np-l2-norm: 11.23087215423584
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.affine.bias, shape: torch.Size([1024]), l2-norm: 0.6815171837806702, np-l2-norm: 0.6815172433853149
2020-02-11 14:59:29,006 (model_tdnnf3:224) INFO: name: tdnnfs.5.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.027979373931885, np-l2-norm: 6.027980327606201
2020-02-11 14:59:29,007 (model_tdnnf3:224) INFO: name: tdnnfs.5.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.66847163438797, np-l2-norm: 0.6684714555740356
2020-02-11 14:59:29,007 (model_tdnnf3:224) INFO: name: tdnnfs.6.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.178318977355957, np-l2-norm: 10.178319931030273
2020-02-11 14:59:29,008 (model_tdnnf3:224) INFO: name: tdnnfs.6.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.798919677734375, np-l2-norm: 10.798924446105957
2020-02-11 14:59:29,008 (model_tdnnf3:224) INFO: name: tdnnfs.6.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7030293345451355, np-l2-norm: 0.703029453754425
2020-02-11 14:59:29,009 (model_tdnnf3:224) INFO: name: tdnnfs.6.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 5.839771747589111, np-l2-norm: 5.839775085449219
2020-02-11 14:59:29,009 (model_tdnnf3:224) INFO: name: tdnnfs.6.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6625904440879822, np-l2-norm: 0.6625903248786926
2020-02-11 14:59:29,010 (model_tdnnf3:224) INFO: name: tdnnfs.7.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.609781265258789, np-l2-norm: 10.609784126281738
2020-02-11 14:59:29,010 (model_tdnnf3:224) INFO: name: tdnnfs.7.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.92614459991455, np-l2-norm: 10.926151275634766
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7368164658546448, np-l2-norm: 0.7368165850639343
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.065881252288818, np-l2-norm: 6.065880298614502
2020-02-11 14:59:29,011 (model_tdnnf3:224) INFO: name: tdnnfs.7.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6848169565200806, np-l2-norm: 0.6848171353340149
2020-02-11 14:59:29,012 (model_tdnnf3:224) INFO: name: tdnnfs.8.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.567127227783203, np-l2-norm: 10.567130088806152
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.671625137329102, np-l2-norm: 10.671629905700684
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7605870962142944, np-l2-norm: 0.7605868577957153
2020-02-11 14:59:29,013 (model_tdnnf3:224) INFO: name: tdnnfs.8.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.370297908782959, np-l2-norm: 6.37030029296875
2020-02-11 14:59:29,014 (model_tdnnf3:224) INFO: name: tdnnfs.8.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.673714280128479, np-l2-norm: 0.6737140417098999
2020-02-11 14:59:29,014 (model_tdnnf3:224) INFO: name: tdnnfs.9.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.171960830688477, np-l2-norm: 10.17195987701416
2020-02-11 14:59:29,015 (model_tdnnf3:224) INFO: name: tdnnfs.9.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.168111801147461, np-l2-norm: 10.168119430541992
2020-02-11 14:59:29,015 (model_tdnnf3:224) INFO: name: tdnnfs.9.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7516912817955017, np-l2-norm: 0.7516909241676331
2020-02-11 14:59:29,016 (model_tdnnf3:224) INFO: name: tdnnfs.9.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.299006938934326, np-l2-norm: 6.299007892608643
2020-02-11 14:59:29,016 (model_tdnnf3:224) INFO: name: tdnnfs.9.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.6358782649040222, np-l2-norm: 0.6358781456947327
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 10.095071792602539, np-l2-norm: 10.095071792602539
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 10.038125038146973, np-l2-norm: 10.038130760192871
2020-02-11 14:59:29,017 (model_tdnnf3:224) INFO: name: tdnnfs.10.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7603970766067505, np-l2-norm: 0.7603970170021057
2020-02-11 14:59:29,018 (model_tdnnf3:224) INFO: name: tdnnfs.10.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.457774639129639, np-l2-norm: 6.457772731781006
2020-02-11 14:59:29,018 (model_tdnnf3:224) INFO: name: tdnnfs.10.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 0.7173940539360046, np-l2-norm: 0.7173939347267151
2020-02-11 14:59:29,019 (model_tdnnf3:224) INFO: name: tdnnfs.11.linear.conv.weight, shape: torch.Size([128, 1024, 2]), l2-norm: 9.877976417541504, np-l2-norm: 9.877982139587402
2020-02-11 14:59:29,019 (model_tdnnf3:224) INFO: name: tdnnfs.11.affine.weight, shape: torch.Size([1024, 128, 2]), l2-norm: 9.827290534973145, np-l2-norm: 9.827296257019043
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.affine.bias, shape: torch.Size([1024]), l2-norm: 0.7950196266174316, np-l2-norm: 0.7950197458267212
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.batchnorm.weight, shape: torch.Size([1024]), l2-norm: 6.03067684173584, np-l2-norm: 6.03067684173584
2020-02-11 14:59:29,020 (model_tdnnf3:224) INFO: name: tdnnfs.11.batchnorm.bias, shape: torch.Size([1024]), l2-norm: 1.1897754669189453, np-l2-norm: 1.1897754669189453
2020-02-11 14:59:29,021 (model_tdnnf3:224) INFO: name: prefinal_l.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 12.372564315795898, np-l2-norm: 12.372568130493164
2020-02-11 14:59:29,022 (model_tdnnf3:224) INFO: name: prefinal_chain.affine.weight, shape: torch.Size([1024, 256]), l2-norm: 10.835281372070312, np-l2-norm: 10.835285186767578
2020-02-11 14:59:29,022 (model_tdnnf3:224) INFO: name: prefinal_chain.affine.bias, shape: torch.Size([1024]), l2-norm: 1.2527509927749634, np-l2-norm: 1.2527514696121216
2020-02-11 14:59:29,023 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm1.weight, shape: torch.Size([1024]), l2-norm: 10.376221656799316, np-l2-norm: 10.37622356414795
2020-02-11 14:59:29,024 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm1.bias, shape: torch.Size([1024]), l2-norm: 1.386192707286682e-05, np-l2-norm: 1.3861925253877416e-05
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.linear.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 10.254746437072754, np-l2-norm: 10.254751205444336
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm2.weight, shape: torch.Size([256]), l2-norm: 7.542169570922852, np-l2-norm: 7.542168140411377
2020-02-11 14:59:29,025 (model_tdnnf3:224) INFO: name: prefinal_chain.batchnorm2.bias, shape: torch.Size([256]), l2-norm: 0.5150323510169983, np-l2-norm: 0.5150324702262878
2020-02-11 14:59:29,027 (model_tdnnf3:224) INFO: name: output_affine.weight, shape: torch.Size([4336, 256]), l2-norm: 16.972375869750977, np-l2-norm: 16.97245216369629
2020-02-11 14:59:29,027 (model_tdnnf3:224) INFO: name: output_affine.bias, shape: torch.Size([4336]), l2-norm: 0.8673517107963562, np-l2-norm: 0.8673520088195801
2020-02-11 14:59:29,028 (model_tdnnf3:224) INFO: name: prefinal_xent.affine.weight, shape: torch.Size([1024, 256]), l2-norm: 8.239212989807129, np-l2-norm: 8.239215850830078
2020-02-11 14:59:29,028 (model_tdnnf3:224) INFO: name: prefinal_xent.affine.bias, shape: torch.Size([1024]), l2-norm: 0.741665780544281, np-l2-norm: 0.7416657209396362
2020-02-11 14:59:29,029 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm1.weight, shape: torch.Size([1024]), l2-norm: 6.000321865081787, np-l2-norm: 6.00032377243042
2020-02-11 14:59:29,029 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm1.bias, shape: torch.Size([1024]), l2-norm: 2.1985473722452298e-05, np-l2-norm: 2.198547008447349e-05
2020-02-11 14:59:29,030 (model_tdnnf3:224) INFO: name: prefinal_xent.linear.conv.weight, shape: torch.Size([256, 1024, 1]), l2-norm: 9.000222206115723, np-l2-norm: 9.000225067138672
2020-02-11 14:59:29,030 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm2.weight, shape: torch.Size([256]), l2-norm: 29.329822540283203, np-l2-norm: 29.329822540283203
2020-02-11 14:59:29,031 (model_tdnnf3:224) INFO: name: prefinal_xent.batchnorm2.bias, shape: torch.Size([256]), l2-norm: 2.5147366523742676, np-l2-norm: 2.5147361755371094
2020-02-11 14:59:29,032 (model_tdnnf3:224) INFO: name: output_xent_affine.weight, shape: torch.Size([4336, 256]), l2-norm: 30.42934799194336, np-l2-norm: 30.429412841796875
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: output_xent_affine.bias, shape: torch.Size([4336]), l2-norm: 0.7537439465522766, np-l2-norm: 0.7537445425987244
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: input_batch_norm.weight, shape: torch.Size([129]), l2-norm: 8.521322250366211, np-l2-norm: 8.521322250366211
2020-02-11 14:59:29,033 (model_tdnnf3:224) INFO: name: input_batch_norm.bias, shape: torch.Size([129]), l2-norm: 1.5148330926895142, np-l2-norm: 1.5148330926895142
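The per-parameter norms above can be reproduced with a short loop over `named_parameters()`. A minimal sketch (the stand-in `nn.Linear` model and the helper name are illustrative; the real model is defined in `model_tdnnf3.py`):

```python
import torch
import torch.nn as nn

def log_param_norms(model: nn.Module) -> dict:
    # Mirrors the log format above: name, shape, and l2-norm per parameter.
    norms = {}
    for name, param in model.named_parameters():
        norms[name] = param.detach().norm(2).item()
        print(f'name: {name}, shape: {tuple(param.shape)}, '
              f'l2-norm: {norms[name]}')
    return norms

# hypothetical stand-in model, just to exercise the loop
model = nn.Linear(4, 3)
norms = log_param_norms(model)
```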

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

Oh, sorry; for Kaldi's model it is printed in progress.N.log. Search for "Norm".

On Tue, Feb 11, 2020 at 2:22 PM 付嘉懿 wrote: which layer should I focus on? And is there a tool to get the l2-norm of Kaldi's parameters? Maybe a simple way is to use the Kaldi tool "nnet-am-copy --binary=false final.mdl" to convert the mdl file to text mode and then write a script to compute the 2-norm of the weights.

The log of the last iteration or of all iterations? This is the Norms log from Kaldi's tdnn_1c.

./log/progress.74.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.4505 tdnnf2.linear:14.7518 tdnnf2.affine:13.3762 tdnnf3.linear:12.3981 tdnnf3.affine:11.0532 tdnnf4.linear:11.7087 tdnnf4.affine:10.0316 tdnnf5.linear:8.64949 tdnnf5.affine:8.20888 tdnnf6.linear:11.7272 tdnnf6.affine:10.3102 tdnnf7.linear:11.3739 tdnnf7.affine:10.0447 tdnnf8.linear:11.1174 tdnnf8.affine:9.76855 tdnnf9.linear:10.9489 tdnnf9.affine:9.57144 tdnnf10.linear:10.8642 tdnnf10.affine:9.30931 tdnnf11.linear:10.8142 tdnnf11.affine:9.08426 tdnnf12.linear:10.6601 tdnnf12.affine:9.06823 tdnnf13.linear:10.4231 tdnnf13.affine:9.131 prefinal-l:15.0584 prefinal-chain.affine:10.0323 prefinal-chain.linear:15.0632 output.affine:34.4237 prefinal-xent.affine:8.71721 prefinal-xent.linear:11.205 output-xent.affine:54.2259 ]
./log/progress.75.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.3429 tdnnf2.linear:14.6932 tdnnf2.affine:13.3021 tdnnf3.linear:12.3406 tdnnf3.affine:11.0014 tdnnf4.linear:11.6603 tdnnf4.affine:9.9918 tdnnf5.linear:8.60848 tdnnf5.affine:8.17265 tdnnf6.linear:11.6833 tdnnf6.affine:10.2691 tdnnf7.linear:11.3308 tdnnf7.affine:10.006 tdnnf8.linear:11.0748 tdnnf8.affine:9.73264 tdnnf9.linear:10.9096 tdnnf9.affine:9.53797 tdnnf10.linear:10.8255 tdnnf10.affine:9.27463 tdnnf11.linear:10.7736 tdnnf11.affine:9.04792 tdnnf12.linear:10.6172 tdnnf12.affine:9.03221 tdnnf13.linear:10.3811 tdnnf13.affine:9.0969 prefinal-l:14.9907 prefinal-chain.affine:9.98558 prefinal-chain.linear:14.9604 output.affine:34.3603 prefinal-xent.affine:8.66909 prefinal-xent.linear:11.1402 output-xent.affine:54.1951 ]
./log/progress.76.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.2866 tdnnf2.linear:14.6814 tdnnf2.affine:13.2642 tdnnf3.linear:12.3177 tdnnf3.affine:10.9801 tdnnf4.linear:11.6426 tdnnf4.affine:9.98039 tdnnf5.linear:8.59164 tdnnf5.affine:8.16007 tdnnf6.linear:11.6703 tdnnf6.affine:10.2559 tdnnf7.linear:11.3178 tdnnf7.affine:9.99525 tdnnf8.linear:11.0611 tdnnf8.affine:9.72302 tdnnf9.linear:10.899 tdnnf9.affine:9.53062 tdnnf10.linear:10.8144 tdnnf10.affine:9.26636 tdnnf11.linear:10.7627 tdnnf11.affine:9.03746 tdnnf12.linear:10.6021 tdnnf12.affine:9.02006 tdnnf13.linear:10.3647 tdnnf13.affine:9.08546 prefinal-l:14.9618 prefinal-chain.affine:9.96465 prefinal-chain.linear:14.8922 output.affine:34.3072 prefinal-xent.affine:8.64409 prefinal-xent.linear:11.1031 output-xent.affine:54.2753 ]
./log/progress.77.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.2199 tdnnf2.linear:14.6626 tdnnf2.affine:13.2239 tdnnf3.linear:12.2911 tdnnf3.affine:10.9535 tdnnf4.linear:11.6206 tdnnf4.affine:9.96432 tdnnf5.linear:8.57113 tdnnf5.affine:8.14308 tdnnf6.linear:11.6539 tdnnf6.affine:10.2378 tdnnf7.linear:11.2999 tdnnf7.affine:9.97964 tdnnf8.linear:11.0445 tdnnf8.affine:9.70736 tdnnf9.linear:10.8822 tdnnf9.affine:9.51659 tdnnf10.linear:10.7979 tdnnf10.affine:9.25141 tdnnf11.linear:10.7436 tdnnf11.affine:9.02005 tdnnf12.linear:10.5817 tdnnf12.affine:9.00242 tdnnf13.linear:10.3444 tdnnf13.affine:9.07074 prefinal-l:14.9253 prefinal-chain.affine:9.93753 prefinal-chain.linear:14.8181 output.affine:34.2569 prefinal-xent.affine:8.61303 prefinal-xent.linear:11.0593 output-xent.affine:54.323 ]
./log/progress.78.log:LOG (nnet3-show-progress[5.5.717~1-e05890d]:main():nnet3-show-progress.cc:153) Norms of parameter matrices from <new-nnet-in> are [ tdnn1.affine:18.1474 tdnnf2.linear:14.6329 tdnnf2.affine:13.1756 tdnnf3.linear:12.2561 tdnnf3.affine:10.9222 tdnnf4.linear:11.5922 tdnnf4.affine:9.94353 tdnnf5.linear:8.5489 tdnnf5.affine:8.12412 tdnnf6.linear:11.6324 tdnnf6.affine:10.2153 tdnnf7.linear:11.2786 tdnnf7.affine:9.96117 tdnnf8.linear:11.0244 tdnnf8.affine:9.69122 tdnnf9.linear:10.8637 tdnnf9.affine:9.50082 tdnnf10.linear:10.7798 tdnnf10.affine:9.23452 tdnnf11.linear:10.7234 tdnnf11.affine:9.00197 tdnnf12.linear:10.5592 tdnnf12.affine:8.98375 tdnnf13.linear:10.3206 tdnnf13.affine:9.05327 prefinal-l:14.8827 prefinal-chain.affine:9.90754 prefinal-chain.linear:14.7399 output.affine:34.2076 prefinal-xent.affine:8.58043 prefinal-xent.linear:11.0129 output-xent.affine:54.3543 ]

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

I am computing the corresponding norms of the PyTorch model; please wait a while.

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@fanlu

fanlu commented Feb 11, 2020

I will have to rerun this experiment to log the norms at every iteration, since I only have the final norms of the PyTorch model.

@danpovey
Contributor

danpovey commented Feb 11, 2020 via email

@danpovey
Contributor

Let me merge this now, so we don't get too far out of sync.

@danpovey danpovey merged commit be0842f into kaldi-asr:pybind11 Feb 11, 2020
@fanlu

fanlu commented Feb 11, 2020

Here are the differences.

kaldi   [ tdnn1.affine:18.1474 tdnnf2.linear:14.6329 tdnnf2.affine:13.1756 tdnnf3.linear:12.2561 tdnnf3.affine:10.9222 tdnnf4.linear:11.5922 tdnnf4.affine:9.94353 tdnnf5.linear:8.5489 tdnnf5.affine:8.12412 tdnnf6.linear:11.6324 tdnnf6.affine:10.2153 tdnnf7.linear:11.2786 tdnnf7.affine:9.96117 tdnnf8.linear:11.0244 tdnnf8.affine:9.69122 tdnnf9.linear:10.8637 tdnnf9.affine:9.50082 tdnnf10.linear:10.7798 tdnnf10.affine:9.23452 tdnnf11.linear:10.7234 tdnnf11.affine:9.00197 tdnnf12.linear:10.5592 tdnnf12.affine:8.98375 tdnnf13.linear:10.3206 tdnnf13.affine:9.05327 prefinal-l:14.8827 prefinal-chain.affine:9.90754 prefinal-chain.linear:14.7399 output.affine:34.2076 prefinal-xent.affine:8.58043 prefinal-xent.linear:11.0129 output-xent.affine:54.3543 ]
pytorch [ tdnn1.affine:13.5788 tdnnf2.linear:11.6980 tdnnf2.affine:12.4563 tdnnf3.linear:11.1573 tdnnf3.affine:10.9723 tdnnf4.linear:10.5484 tdnnf4.affine:10.0810 tdnnf5.linear:7.2315 tdnnf5.affine:7.3757 tdnnf6.linear:10.2699 tdnnf6.affine:11.0119 tdnnf7.linear:10.5032 tdnnf7.affine:11.2309 tdnnf8.linear:10.1783 tdnnf8.affine:10.7989 tdnnf9.linear:10.6098 tdnnf9.affine:10.9261 tdnnf10.linear:10.5671 tdnnf10.affine:10.6716 tdnnf11.linear:10.1720 tdnnf11.affine:10.1681 tdnnf12.linear:10.0951 tdnnf12.affine:10.0381 tdnnf13.linear:9.8780 tdnnf13.affine:9.8273 prefinal-l:12.3726 prefinal-chain.affine:10.83528 prefinal-chain.linear:10.2547 output.affine:16.9724 prefinal-xent.affine:8.2392 prefinal-xent.linear:9.0002 output-xent.affine:30.4293 ]

@danpovey
Contributor

OK, interesting. They are very close. What were the final learning rates in each case, and what was the minibatch size in PyTorch?

@fanlu

fanlu commented Feb 11, 2020

[learning-rate curve]
The final learning rate is 3.125e-5, and the batch size is 128.

# TODO(fangjun): implement GeneralDropoutComponent in PyTorch

if self.linear.kernel_size == 3:
    x = self.bypass_scale * input_x[:, :, 1:-1:self.conv_stride] + x
Contributor

shouldn't this be c:-c:c rather than 1:-1:c, where c is self.conv_stride?

Contributor Author

Suppose the time_stride is 1 and the conv_stride is 1.

If the input time index is

0 1 2 3 4 5 6

After self.linear, the time index will be

1 2 3 4 5

since the kernel shape is [-1, 0, 1] (time_stride == 1)

After self.affine, the time index is still

1 2 3 4 5

The time indices of input_x[:, :, 1:-1:self.conv_stride] are [1, 2, 3, 4, 5], which match
the output of self.affine.


It is assumed that

  • time_stride == 1, conv_stride == 1

or

  • time_stride == 0, conv_stride == 3

So c:-c:c is equivalent to 1:-1:c when time_stride==1 and conv_stride == 1.
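The alignment argument above can be checked with a few lines of plain Python (a sketch; the sequence length and offsets are illustrative):

```python
# Time indices of a length-7 input, t = 0..6.
T = 7
t_in = list(range(T))

# A kernel over offsets [-1, 0, 1] (time_stride == 1) with conv_stride == 1
# keeps only the centres where the whole kernel fits inside the input: t = 1..5.
offsets = [-1, 0, 1]
conv_stride = 1
t_out = [t for t in t_in if all(0 <= t + o < T for o in offsets)][::conv_stride]

# The bypass must select the same frames from the input, i.e. 1:-1:conv_stride.
print(t_out, t_in[1:-1:conv_stride])  # both are [1, 2, 3, 4, 5]
```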

Contributor

I don't think it should be called time_stride here. Perhaps in the original Kaldi code it wasn't super clear but when implemented as convolution it gets very confusing. Better to make (stride, kernel_size) the parameters and have them be (1, 3), (1, 3), ... (3, 3), (1, 1), (1, 3), (1, 3) ...
In any case, please revert other aspects of the implementation to more similar to the way it was before and start doing experiments with that. I don't see much point starting from such a strange starting point. (i.e. the way the code is right now).

Contributor Author

I agree.

I also find them confusing, but I wrote it this way to follow the naming style in Kaldi.

I'll change them now.
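For reference, the logged weight shapes above (linear.conv.weight: [128, 1024, 2], affine.weight: [1024, 128, 2]) correspond to a 3-frame context split across the two convolutions. A minimal sketch of that factorized pair (class name and dims are illustrative, not the PR's code):

```python
import torch
import torch.nn as nn

class FactorizedPair(nn.Module):
    # Sketch: a kernel-2 "linear" conv (offsets [-1, 0]) feeding a kernel-2
    # "affine" conv (offsets [0, 1]); together they cover 3 frames of context.
    def __init__(self, dim=1024, bottleneck=128):
        super().__init__()
        self.linear = nn.Conv1d(dim, bottleneck, kernel_size=2, bias=False)
        self.affine = nn.Conv1d(bottleneck, dim, kernel_size=2)

    def forward(self, x):  # x: [N, C, T]
        return self.affine(self.linear(x))

x = torch.randn(2, 1024, 7)
y = FactorizedPair()(x)  # two kernel-2 convs shrink T from 7 to 5
```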

stride=conv_stride)

# batchnorm requires [N, C, T]
self.batchnorm = nn.BatchNorm1d(num_features=dim)
Contributor

It would be a closer match to what Kaldi's system is doing if you were to add affine=False wherever you use batchnorm.
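A sketch of the suggested change; Kaldi's BatchNorm component carries no learned per-channel scale/offset, so affine=False is the closer match (dim is illustrative):

```python
import torch
import torch.nn as nn

dim = 1024
# affine=False drops the learnable per-channel weight and bias,
# matching Kaldi's BatchNorm component more closely.
batchnorm = nn.BatchNorm1d(num_features=dim, affine=False)

x = torch.randn(4, dim, 10)  # batchnorm requires [N, C, T]
y = batchnorm(x)
```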

Contributor Author

Will be addressed in the next pull request.
