does the key masking work? #33
Comments
I have the same question; I don't think the mask works!
I tested the model and you are right, none of the masking works! @Kyubyong
I found that there are two ways to implement the positional embedding. The key mask works with a positional embedding whose parameters are trained along with the model, but not with the positional encoding the paper describes, because the padding embeddings are then no longer 0. Who can fix it? @Kyubyong
I also ran into the same issue. If the raw code is used, the mask doesn't work at all. I verified it with a check like the one below, where length_batch is the length of each sentence in the batch.
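(The original snippet did not survive in this thread. A minimal sketch of such a check, assuming TensorFlow 1.x and a hypothetical length_batch tensor holding the true length of each sentence, could look like this:)

import tensorflow as tf

# Toy setup: N sentences padded to T_k tokens, num_units-dimensional embeddings.
N, T_k, num_units = 2, 5, 8
length_batch = tf.constant([3, 5])  # hypothetical: true length of each sentence

# The mask we expect, derived directly from the lengths.
expected_mask = tf.sequence_mask(length_batch, T_k, dtype=tf.float32)  # (N, T_k)

# Zero-padded word embeddings plus a non-zero positional embedding, as in the model.
word_emb = tf.random_normal([N, T_k, num_units]) * tf.expand_dims(expected_mask, -1)
pos_emb = tf.random_normal([1, T_k, num_units])
keys = word_emb + pos_emb

# The mask the attention code actually computes from keys.
computed_mask = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))  # (N, T_k)

with tf.Session() as sess:
    exp, got = sess.run([expected_mask, computed_mask])
    print(exp)  # zeros at the padded positions
    print(got)  # all ones: the positional term makes every row non-zero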
I recently ran this model and found that it does not work as expected.
In the length_embedding, shouldn't the second embedding of ones have shape [1, num_units], not max_len - 1?
Hi @Kyubyong,
the key masking code is as follows:
# Key Masking
key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)
key_masks = tf.tile(key_masks, [num_heads, 1]) # (h*N, T_k)
key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, tf.shape(queries)[1], 1]) # (h*N, T_q, T_k)
The parameter keys is the sum of the word embedding and the positional embedding. That means that even when a word in a sentence is padding (embedded as an all-zero vector), adding the positional embedding to the word embedding leaves no zero vectors in the final embedding. As a result, key_masks comes out as all ones with no zeros, so I'm confused about whether the masking code actually works.
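If that reading is correct, one way around it (only a sketch of the idea, not code from this repo; the helper below is hypothetical) is to build the key mask from the integer token ids, where id 0 is the padding symbol, before the positional embedding is added, and pass that mask into the attention instead of re-deriving it from keys:

import tensorflow as tf

def key_mask_from_ids(x, num_queries, num_heads):
    # x: (N, T_k) integer token ids, with 0 used for padding.
    # Returns an (h*N, T_q, T_k) mask that is 0 at padded key positions.
    key_masks = tf.to_float(tf.not_equal(x, 0))                             # (N, T_k)
    key_masks = tf.tile(key_masks, [num_heads, 1])                          # (h*N, T_k)
    key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, num_queries, 1])  # (h*N, T_q, T_k)
    return key_masks

Because the mask is computed from the ids rather than from the summed embeddings, it stays correct even after the positional embedding makes every row of keys non-zero.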