First of all, nice work! And thank you for sharing the code.
I noticed that the code uses ALiBi in encoder-decoder attention but not in the transformer's self-attention. Have you tried ALiBi in the transformer's self-attention? Is there a reason you didn't use it for the self-attention layer?
Thanks!
We've currently implemented ALiBi for self-attention (causally masked) only. We have not implemented it for encoder-decoder attention or non-masked encoder attention yet.
What is your question?
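For context, a minimal sketch of what the causally masked ALiBi bias looks like, following the construction described in the ALiBi paper (this is illustrative NumPy, not the repo's actual implementation; the function names are mine, and the slope formula assumes the number of heads is a power of 2):

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes from the ALiBi paper,
    # assuming n_heads is a power of 2: slope_h = 2^(-8h / n_heads).
    start = 2 ** (-8 / n_heads)
    return np.array([start ** (h + 1) for h in range(n_heads)])

def causal_alibi_bias(n_heads, seq_len):
    # bias[h, i, j] = slope_h * (j - i) for j <= i (zero on the diagonal,
    # increasingly negative for more distant past positions), and -inf
    # above the diagonal so future positions are masked out.
    slopes = alibi_slopes(n_heads)
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]            # rel[i, j] = j - i
    bias = slopes[:, None, None] * rel[None, :, :]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    return np.where(future, -np.inf, bias)       # shape (n_heads, L, L)
```

The resulting tensor would be added to the attention scores before the softmax; because the causal mask and the linear penalty live in the same additive bias, no separate masking step is needed.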