mask for decoder #6

Closed
XiaoLiuAI opened this issue Aug 18, 2018 · 6 comments
@XiaoLiuAI

Hello, I suspect that the mask you use for the decoder is not correct.
In the decoder, the mask you use is a matrix whose upper right triangle (including the diagonal) is filled with ones:

mask = K.cumsum(tf.eye(len_s, batch_shape=bs), 1)

In [4]: np.cumsum(np.eye(5), 1)
Out[4]:
array([[1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1.],
       [0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1.]])

That means that when you compute self attention, the first word takes the entire output sequence into account through W_mask · V (the masked attention weights applied to V). That is not correct during training, and the problem could also affect prediction.
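
A minimal numpy sketch of my concern (my own toy code, assuming the additive -1e10 masking used in the attention block quoted below): with this upper triangular mask, the first position keeps nonzero weights on every later position after the softmax.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

len_s = 5
logits = np.zeros((len_s, len_s))                  # stand-in for the q·k^T scores
upper = np.cumsum(np.eye(len_s), 1)                # the mask I describe above
weights = softmax(logits + (-1e10) * (1 - upper))  # additive masking as in the code

print(weights[0])            # the first word attends to every position: [0.2 0.2 0.2 0.2 0.2]
print(weights[-1].round(2))  # the last word only sees itself: [0. 0. 0. 0. 1.]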

@XiaoLiuAI
Author

Hi, thank you for your response, but I still want to make sure that I understand correctly. Let me paste the attention block below:

attn = Lambda(lambda x: K.batch_dot(x[0], x[1], axes=[2, 2]) / self.temper)([q, k])
if mask is not None:
    mmask = Lambda(lambda x: (-1e+10) * (1 - x))(mask)
    attn = Add()([attn, mmask])
attn = Activation('softmax')(attn)
attn = self.dropout(attn)
output = Lambda(lambda x: K.batch_dot(x[0], x[1]))([attn, v])

Does the Activation layer ensure that the mask is "column-based"? It applies softmax over the last dimension, which is the column axis of the attention matrix, right?
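
For my own understanding I checked it like this (a quick sketch with tf.keras, not code from this repo): Activation('softmax') indeed normalizes over the last axis, i.e. over the keys for each query row.

import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Activation

# (batch, len_q, len_k) attention scores with arbitrary values
attn = K.constant(np.arange(12, dtype='float32').reshape(1, 3, 4))
out = K.eval(Activation('softmax')(attn))

print(out.sum(axis=-1))  # each query row sums to (approximately) 1, so softmax runs over the key axis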

@XiaoLiuAI
Author

What if I build a matrix whose upper right part is zero and multiply it element-wise with the attention matrix? For example:

mask = tf.matrix_band_part(tf.ones((q.shape[1], k.shape[1])), -1, 0)
...
attn = Multiply()([attn, mask])

Would that have an equivalent effect?
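
For reference, here is what that matrix_band_part call keeps (my own numpy check; np.tril is the equivalent of band_part with num_lower=-1, num_upper=0):

import numpy as np

len_q = len_k = 5
# keep everything on and below the diagonal: a lower triangular matrix of ones
mask = np.tril(np.ones((len_q, len_k)))
print(mask)
# [[1. 0. 0. 0. 0.]
#  [1. 1. 0. 0. 0.]
#  [1. 1. 1. 0. 0.]
#  [1. 1. 1. 1. 0.]
#  [1. 1. 1. 1. 1.]]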

@lsdefine
Owner

Sorry, my previous answer was wrong; I have now found the right answer.
An experiment with 1 - mask + eye (i.e. the upper triangular version of the mask) shows that the training and dev accuracies quickly go to near 100%, but the model cannot process any user input. That means such a model is using future information.
The source of the confusion is that axis 1 is not the column axis here, because there is a leading "Batch" axis.

>>> K.eval(GetSubMask(q))   # mask = K.cumsum(tf.eye(len_s, batch_shape=bs), 1)
array([[[1., 0., 0., 0., 0., 0., 0.],
        [1., 1., 0., 0., 0., 0., 0.],
        [1., 1., 1., 0., 0., 0., 0.],
        [1., 1., 1., 1., 0., 0., 0.],
        [1., 1., 1., 1., 1., 0., 0.],
        [1., 1., 1., 1., 1., 1., 0.],
        [1., 1., 1., 1., 1., 1., 1.]]], dtype=float32)
>>> np.cumsum(np.eye(5), 1)   # Your question
array([[1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1.],
       [0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1.]])
>>> np.cumsum(np.eye(5), 0)   # If no "Batch" axis, the cum axis is 0
array([[1., 0., 0., 0., 0.],
       [1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0.],
       [1., 1., 1., 1., 0.],
       [1., 1., 1., 1., 1.]])
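
A compact numpy check of the same point (my own sketch): with a leading batch axis, axis 1 is the query/row axis, so the cumsum of the identity along axis 1 gives exactly the lower triangular mask we want.

import numpy as np

with_batch = np.cumsum(np.eye(5)[None, ...], axis=1)  # shape (1, 5, 5), like GetSubMask(q)
no_batch   = np.cumsum(np.eye(5), axis=1)             # shape (5, 5), as in your question

print(np.array_equal(with_batch[0], np.tril(np.ones((5, 5)))))  # True  (lower triangular)
print(np.array_equal(no_batch,      np.triu(np.ones((5, 5)))))  # True  (upper triangular)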

@lsdefine
Owner

We do indeed need a lower (left) triangular mask, as expected.

@XiaoLiuAI
Author

Thanks, that is clear.

@lsdefine
Owner

lsdefine commented Aug 21, 2018 via email
