[not-for-merge] Replacing the statistic pooling layer with self-attention layer in sre16 sv task #2223
Conversation
class XconfigSelfLayer(XconfigLayerBase):
    """This class is for parsing lines like
    self-layer name=attention config=mean+stddev(-99:3:9:99) input=tdnn1
Better to use a different name like weighted-stats-layer instead of a generic one like self-layer.
'dim': -1,
'config': '',
'affine-dim': 300,
'num-heads': 0}
Can num-heads be 0? Is it checked anywhere that it should not be zero?
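For illustration, a minimal sketch (my suggestion, not code from this PR) of the kind of guard the layer's check_configs() could add, following the usual xconfig pattern of raising RuntimeError on invalid options:

```python
# Hypothetical addition inside XconfigSelfLayer:
def check_configs(self):
    if self.config['num-heads'] <= 0:
        raise RuntimeError("num-heads has invalid value {0}; it must be a "
                           "positive integer.".format(self.config['num-heads']))
```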
# then transpose the weights back to column vector.

configs and their defaults: input-dim=-1, input-period=1, left-context=-1, right-context=-1,
num-heads=1, num-log-count-features=0, output-stddevs=true
num-heads=0 seems to be the default.
# The second affine node.
affine_options = 'param-stddev=0.04472135955 bias-stddev=1.0 bias-mean=0.0 max-change=0.75 l2-regularize=0.0'
configs.append(
    'component name={0}.second_affine type=NaturalGradientAffineComponent'
Is this needed if num-heads=0?
@@ -1 +0,0 @@
tuning/run_xvector_1a.sh |
Is dropout used?
''.format(self.name))

# The second affine node.
affine_options = 'param-stddev=0.04472135955 bias-stddev=1.0 bias-mean=0.0 max-change=0.75 l2-regularize=0.0'
Explain what the stddev value is.
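For context, my guess (not stated in the diff): 0.04472135955 is 1/sqrt(500), i.e. param-stddev = 1/sqrt(input-dim) with a 500-dimensional input, the usual scale-preserving initialization:

```python
import math

input_dim = 500  # assumed value; the diff does not say where the constant comes from
print(1.0 / math.sqrt(input_dim))  # 0.044721359549995794, matching param-stddev
```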
@@ -1 +0,0 @@
tuning/run_xvector_1a.sh
\ No newline at end of file
sd -> stddev
name=self.name, lc=self._left_context, rc=self._right_context,
dim=input_dim, input_period=self._input_period,
output_period=self._stats_period,
var='true' if self._output_stddev else 'false'))
@david-ryan-snyder, you should probably review this too.
This uses StatisticsExtractionComponent but without the StatisticsPoolingComponent, which is odd;
it's, in effect, pooling short groups of frames before the self-attention.
Also, this layer seems to be a bit against the spirit of a layer; it has a couple of Affine+Relu+Batchnorm blocks before the pooling and attention, so it should probably be split into two layers.
@vimalmanohar, if you have specific ideas to improve this, please comment too.
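To illustrate the point about pooling short groups of frames (a rough numpy sketch of my own, not the Kaldi C++ code):

```python
import numpy as np

# With input-period=1 and stats-period=3, the extraction component in effect
# replaces every 3 consecutive frames with their mean (plus a count column,
# omitted here), so the attention weights are then computed over group-level
# statistics rather than over individual frames.
frames = np.random.randn(9, 4)    # 9 frames of 4-dim features
groups = frames.reshape(3, 3, 4)  # 3 non-overlapping groups of 3 frames
pooled = groups.mean(axis=1)      # one mean vector per group
print(pooled.shape)               # (3, 4): the attention now sees 3 "frames"
```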
@tomkocse can correct me if I'm wrong, but I don't think this PR is meant to be merged in its current state. It's related to some experiments we're doing.
I think there's more work to be done on this, both to establish its value as a layer and to improve the code (again, @tomkocse, correct me if I'm wrong).
@david-ryan-snyder Yes, I agree with what you said; let's wait for more results from further experiments.
Also, I think this is closely related to the weighted pooling that we experimented with in the past. It appears to perform similarly. We should revisit that before committing to this implementation, as the other weighted pooling was simpler to implement. Anyway, @tomkocse is looking into this stuff more; let's see where this goes first.
@@ -59,6 +59,8 @@ ComponentPrecomputedIndexes* ComponentPrecomputedIndexes::NewComponentPrecompute
    ans = new StatisticsExtractionComponentPrecomputedIndexes();
  } else if (cpi_type == "StatisticsPoolingComponentPrecomputedIndexes") {
    ans = new StatisticsPoolingComponentPrecomputedIndexes();
  } else if (cpi_type == "SelfAttentionComponentPrecomputedIndexes") {
@tomkocse, this seems to be a bug fix, can you please make a separate PR for it?
@danpovey I don't see the bug here in the original code; can you point it out?
oh sorry, I didn't realize that SelfAttentionComponent was new.
a linear combination of a series of input frames

# In SelfAttentionComponent, the first n columns of the input matrix are interpreted
# as the weight vectors in multi-head attentions. If only a single-head attention is used,
Mention that n is the number of heads in the attention layer.
# In SelfAttentionComponent, the first n columns of the input matrix are interpreted
# as the weight vectors in multi-head attentions. If only a single-head attention is used,
# then the first column of the input matrix is the weight vector. The n + 1 th column of
# the input matrix is the count from the extraction component, it will not be used in
Does it mean that you always expect the previous layer to be a StatisticsExtraction component?
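To make the column layout concrete, here is a rough numpy sketch of my own reading of the comment above (not the actual C++ implementation; that the weights go through a softmax over frames is an assumption):

```python
import numpy as np

def self_attention_pool(inp, num_heads):
    # Columns 0..num_heads-1: per-frame attention scores, one column per head.
    # Column num_heads: the count from the extraction component (skipped here).
    # Remaining columns: the feature vectors to be pooled.
    scores = inp[:, :num_heads]
    feats = inp[:, num_heads + 1:]
    outputs = []
    for h in range(num_heads):
        w = np.exp(scores[:, h] - scores[:, h].max())
        w /= w.sum()                    # softmax over frames (assumed)
        outputs.append(w @ feats)       # weighted average of the frames
    return np.concatenate(outputs)

pooled = self_attention_pool(np.random.randn(50, 1 + 1 + 128), num_heads=1)
print(pooled.shape)  # (128,)
```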
# The first affine node.
configs.append(
    'component name={0}.first_affine type=NaturalGradientAffineComponent'
    ' input-dim={1} output-dim={2} {3}'
I am not sure why you need a separate part for the affine+nonlin+batchnorm within this component, when you could add that part with a relu-batchnorm-layer at the script level, as in the example below.
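For example (a hypothetical xconfig fragment; the layer names are made up), the pre-pooling stack could instead be written at the script level as:

```
relu-batchnorm-layer name=attn_pre dim=512 input=tdnn5
self-layer name=attention config=mean+stddev(-99:3:9:99) input=attn_pre
```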
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.