
tf.sign(tf.abs(tf.reduce_sum vs tf.sign(tf.reduce_sum(tf.abs( for generating masks? #14

Closed
pmixer opened this issue Sep 15, 2020 · 4 comments


@pmixer
Contributor

pmixer commented Sep 15, 2020

key_masks = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1))) # (N, T_k)

Hi Guys,
I'm reading the code to port the implementation to PyTorch for personal use. The code is well written and documented, thanks for the great work :)

Moreover, since the self-attention module is borrowed from another project, some details may not be 100% right according to my observation (aside from magic numbers like -2^32+1 used to force softmax to output ~0 for masked entries, which hurts readability). As an example, for query and key mask generation the code uses a tf.sign + tf.abs + tf.reduce_sum combination, but the order looks slightly wrong. Since we want to mask queries/keys that are all zeros along the channel/embedding dimension, the right way should be to first apply abs, then reduce_sum, and finally sign; the current implementation instead applies reduce_sum first, then abs, then sign. The two orders produce the same result in most cases, because a high-dimensional fp32 vector rarely sums to exactly zero, but the current order is still wrong and can produce incorrect masks in corner cases.
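As a quick illustration of that corner case, here is a minimal sketch (the toy values below are made up by me; only the tf.sign/tf.abs/tf.reduce_sum combination is from the original code):

import tensorflow as tf

# Toy batch of key embeddings (made-up values): the second key is a padding
# vector of all zeros, the third is a real key whose entries sum to zero.
keys = tf.constant([[[1.0, 2.0, 3.0],
                     [0.0, 0.0, 0.0],
                     [1.0, -1.0, 0.0]]])  # (N=1, T_k=3, C=3)

# Current order: reduce_sum -> abs -> sign.
mask_current = tf.sign(tf.abs(tf.reduce_sum(keys, axis=-1)))  # [[1., 0., 0.]]

# Proposed order: abs -> reduce_sum -> sign.
mask_fixed = tf.sign(tf.reduce_sum(tf.abs(keys), axis=-1))    # [[1., 0., 1.]]

# The current order wrongly masks the third key because its entries happen
# to cancel out, even though it is not a padding vector.
print(mask_current.numpy(), mask_fixed.numpy())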

I just want to check my assumption as stated above; please respond if you happen to have time @kang205 @JiachengLi1995, thanks!

Regards,
Zan

@kang205
Owner

kang205 commented Sep 15, 2020

Thanks for looking into the details of the implementation. It seems the self-attention modules in the original repo (https://github.com/Kyubyong/transformer) have been updated, and the code looks much simpler and cleaner now. I remember using that version for SASRec in my own experiments (e.g. replacing the code in modules.py), and it seemed to work as well. I'm not sure if they fixed this issue; if so, we may update the GitHub code to use the newer self-attention implementation.

@pmixer
Contributor Author

pmixer commented Sep 15, 2020


Thank you Wang-Cheng! It's just a small issue. And yes, they have updated the transformer repo and commented out this part (they realized our life would be easier if the masks are passed in as arguments and obtained at the sequence level via src_masks = tf.math.equal(x, 0), rather than at the sequence-embedding level, lol). I'll try to get it confirmed by cc'ing this thread to the transformer repo you mentioned and will keep you updated on any feedback.
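For context, a minimal sketch of that ID-level masking style (the padding id 0 and the toy sequences below are assumptions on my side; only src_masks = tf.math.equal(x, 0) comes from the upstream repo):

import tensorflow as tf

# Made-up token-id sequences, assuming 0 is the padding id.
x = tf.constant([[5, 3, 0, 0],
                 [7, 0, 0, 0]])   # (N, T)

# Newer Kyubyong/transformer style: build the mask from the ids directly,
# before the embedding lookup, instead of from the embedded vectors.
src_masks = tf.math.equal(x, 0)   # True where the position is padding
# [[False, False,  True,  True],
#  [False,  True,  True,  True]]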

I created this issue just FYI; there's no need to update the repo for experiment purposes, but it's better to let peers know about this detail.

Have a good day :)

@pmixer
Contributor Author

pmixer commented Sep 15, 2020


Oh, I just found that they did correct it before commenting out this part, please see:

https://github.com/Kyubyong/transformer/blob/fb023bb097e08d53baf25b46a9da490beba51a21/modules.py#L134

As I expected, abs should be applied before reduce_sum. No need to ping the folks in that repo now :)
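For completeness, a minimal sketch of how the corrected mask feeds into the -2^32+1 trick mentioned above (shapes and variable names here are illustrative, not copied from modules.py):

import tensorflow as tf

scores = tf.random.normal([1, 3, 3])      # (N, T_q, T_k) attention logits
keys = tf.constant([[[1.0, 2.0, 3.0],
                     [0.0, 0.0, 0.0],
                     [1.0, -1.0, 0.0]]])  # (N, T_k, C), second key is padding

# Corrected order: abs before reduce_sum.
key_masks = tf.sign(tf.reduce_sum(tf.abs(keys), axis=-1))    # (N, T_k)
key_masks = tf.tile(key_masks[:, tf.newaxis, :], [1, 3, 1])  # broadcast over T_q

# Masked positions get a huge negative logit so softmax assigns them ~0 weight.
paddings = tf.ones_like(scores) * (-2 ** 32 + 1)
scores = tf.where(tf.equal(key_masks, 0), paddings, scores)
attn = tf.nn.softmax(scores, axis=-1)     # padded keys receive ~0 attention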

@kang205
Owner

kang205 commented Sep 15, 2020

Thanks for checking this!
