mask for decoder #6
Hi, thank you for your response, but I still want to make sure that I understand correctly. Let me put the attention block below:
If I make a matrix whose upper-right part is zero and multiply it element-wise with the attention matrix, for example, would that have an equivalent effect?
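A quick numpy sketch (an illustration, not this repo's code) of why these two options differ: multiplying the attention weights element-wise by a zeroed matrix *after* the softmax breaks the normalization, while pushing the masked logits toward minus infinity *before* the softmax keeps each row summing to 1.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
scores = rng.standard_normal((5, 5))   # raw attention logits
tril = np.tril(np.ones((5, 5)))        # ones on and below the diagonal

# (a) zero out future positions AFTER the softmax (element-wise multiply):
mult = softmax(scores) * tril          # rows no longer sum to 1

# (b) push future logits to -inf BEFORE the softmax (the usual approach):
add = softmax(np.where(tril == 1, scores, -1e9))  # rows still sum to 1

print(mult.sum(axis=1))  # generally below 1 except the last row
print(add.sum(axis=1))   # all 1.0
```

So the element-wise multiply does hide future positions, but the surviving weights would need to be renormalized to behave like a proper attention distribution.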
Sorry, my previous answer was wrong; I have now found the right answer.
We do indeed need a lower-left triangular mask, as expected.
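For illustration (a numpy sketch, not this repo's code): taking the cumulative sum of the identity along axis 0 instead of axis 1 produces exactly that lower-left triangular mask, identical to `np.tril`.

```python
import numpy as np

len_s = 5

# cumsum along axis 1 (as in the issue) -> upper-right triangle of ones: wrong
wrong = np.cumsum(np.eye(len_s), axis=1)

# cumsum along axis 0 -> lower-left triangle of ones: what the decoder needs
right = np.cumsum(np.eye(len_s), axis=0)

assert (right == np.tril(np.ones((len_s, len_s)))).all()
print(right)
```

In the Keras code quoted below, the analogous change would presumably be summing along axis 0 rather than axis 1, assuming the rest of the attention code applies the mask row-by-row to the queries.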
Thanks, that is clear.
Hi,
The training corpus is too large; the corpus in the repo is only used to show the format. You may generate a large corpus for a good result.
Xiao.Liu <notifications@github.com> wrote on Tue, Aug 21, 2018 at 3:51 PM:
… Hello, I tried the pinyin example with all the configuration untouched.
But the final test result is very bad. The result for 'ji zhi hu die zai
yang guang xia fei wu 。' is '斯对管道的资不仅能加中考。'
Hello, I suspect that the mask you use for the decoder is not correct.
In the decoder, the mask you use is a matrix whose upper-right triangular elements are one:
mask = K.cumsum(tf.eye(len_s, batch_shape=bs), 1)
In [4]: np.cumsum(np.eye(5), 1)
Out[4]:
array([[1., 1., 1., 1., 1.],
       [0., 1., 1., 1., 1.],
       [0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1.],
       [0., 0., 0., 0., 1.]])
That means that when you compute self-attention, the first word takes the entire output sequence into account through the masked product with V. That is not correct during training, and the problem can also hurt prediction.
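To make the leak concrete, here is a small numpy check (illustration only): row i of the mask lists which positions query i may attend to, and with the cumsum-along-axis-1 mask the first query sees every future position.

```python
import numpy as np

len_s = 5
wrong = np.cumsum(np.eye(len_s), axis=1)   # upper-right triangle of ones
right = np.tril(np.ones((len_s, len_s)))   # causal (lower-left) mask

# Which positions each query is allowed to see under either mask:
print(wrong[0])  # [1. 1. 1. 1. 1.] -> the first word sees the whole sequence
print(right[0])  # [1. 0. 0. 0. 0.] -> the first word sees only itself
```

Under the wrong mask the model can copy future targets during teacher-forced training, so the loss looks fine while the learned attention is useless at inference time, when no future tokens exist.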