does the key masking work? #33
Comments
I have the same question; I don't think the mask works!
I tested the model and you are right, none of the masking works! @Kyubyong
I found that there are two ways to implement the positional embedding. The key mask works with a positional embedding whose parameters are trained along with the model, but not with the positional encoding the paper describes, because the padding embeddings are then no longer 0. Who can fix it? @Kyubyong
I also ran into the same issue. If the raw code is used, the mask doesn't work at all. I verified it with a check like the one below, where length_batch is the length of each sentence in the batch.
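(The original snippet did not survive in this thread. A minimal sketch of such a check, assuming TensorFlow 1.x and a hypothetical length_batch tensor holding the true length of each sentence, could look like this:)

import tensorflow as tf

# Toy setup: N sentences padded to T_k tokens, num_units-dimensional embeddings.
N, T_k, num_units = 2, 5, 8
length_batch = tf.constant([3, 5])  # hypothetical: true length of each sentence

# The mask we expect, derived directly from the lengths.
expected_mask = tf.sequence_mask(length_batch, T_k, dtype=tf.float32)  # (N, T_k)

# Zero-padded word embeddings plus a non-zero positional embedding, as in the model.
word_emb = tf.random_normal([N, T_k, num_units]) * tf.expand_dims(expected_mask, -1)
pos_emb = tf.random_normal([1, T_k, num_units])
keys = word_emb + pos_emb

# The mask the attention code actually computes from keys.
computed_mask = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))  # (N, T_k)

with tf.Session() as sess:
    exp, got = sess.run([expected_mask, computed_mask])
    print(exp)  # zeros at the padded positions
    print(got)  # all ones: the positional term makes every row non-zero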
I recently ran this model and found that it does not work as expected.
In the length_embedding, shouldn't the second embedding of ones have shape [1, num_units], not max_len - 1?
Hi @Kyubyong,
the key masking code is as follows:
# Key Masking
key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)
key_masks = tf.tile(key_masks, [num_heads, 1]) # (h*N, T_k)
key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1]) # (h*N, T_q, T_k)
The parameter keys is the sum of the word embedding and the positional embedding. That means that even when a word in a sentence is padding (embedded as an all-zero vector), adding the positional embedding to the word embedding leaves no zero vectors in the final embedding. As a result, key_masks comes out as all ones with no zeros, so I'm confused about whether the masking code actually works.
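If that reading is correct, one way around it (only a sketch of the idea, not code from this repo; the helper below is hypothetical) is to build the key mask from the integer token ids, where id 0 is the padding symbol, before the positional embedding is added, and pass that mask into the attention instead of re-deriving it from keys:

import tensorflow as tf

def key_mask_from_ids(x, num_queries, num_heads):
    # x: (N, T_k) integer token ids, with 0 used for padding.
    # Returns an (h*N, T_q, T_k) mask that is 0 at padded key positions.
    key_masks = tf.to_float(tf.not_equal(x, 0))                             # (N, T_k)
    key_masks = tf.tile(key_masks, [num_heads, 1])                          # (h*N, T_k)
    key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, num_queries, 1])  # (h*N, T_q, T_k)
    return key_masks

Because the mask is computed from the ids rather than from the summed embeddings, it stays correct even after the positional embedding makes every row of keys non-zero.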