@20171130 That was my first impression as well, but then there is an inconsistency with the "groups" definition (used to replicate the "attention heads") throughout the paper and the code.
In Equation 2 of the paper, the query and the key are combined with an inner product, not an element-wise multiplication. So the following line

Stand-Alone-Self-Attention/attention.py
Line 48 in e0a168e

should be

```python
out = (q_out * k_out).sum(dim=2)
```
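To illustrate the difference, here is a minimal NumPy sketch (NumPy rather than PyTorch just to keep it self-contained; the tensor shapes are assumptions for illustration, not the exact 5-D shapes used in attention.py). The element-wise product keeps the channel axis, while the inner product of Equation 2 sums over it, yielding one scalar similarity per position:

```python
import numpy as np

# Assumed illustrative shapes: batch=1, groups=2, channels=4,
# positions=9 (e.g. a 3x3 attention window flattened).
rng = np.random.default_rng(0)
q_out = rng.standard_normal((1, 2, 4, 9))
k_out = rng.standard_normal((1, 2, 4, 9))

# Element-wise multiplication: channel axis survives,
# giving a per-channel score.
elementwise = q_out * k_out             # shape (1, 2, 4, 9)

# Inner product as in Equation 2: sum over the channel
# axis, giving one scalar similarity per position.
inner = (q_out * k_out).sum(axis=2)     # shape (1, 2, 9)

print(elementwise.shape, inner.shape)
```

The channel axis here (axis 2) plays the role of `dim=2` in the proposed PyTorch fix above.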