What does the variable “A^k” mean in the transformer structure? Does it mean the attention matrix in different views?
In Section 3.2: "Then the multi-head attention is performed over conversation tokens h^k_{i:j} from different views k and form A^k separately."
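For context on my reading of the quoted sentence: it sounds like A^k is the multi-head attention output computed independently over each view k's token states h^k_{i:j}. Below is a minimal NumPy sketch of that interpretation (all names and shapes are my own assumptions, not the paper's code), where attention is run separately per view and the results are collected as A^k:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(h, num_heads=2, seed=0):
    """Self-attention over one view's token states h of shape (n, d).

    Returns the attended representation of the same shape (n, d).
    Weights are random here purely for illustration.
    """
    n, d = h.shape
    assert d % num_heads == 0
    dk = d // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(x):
        # (n, d) -> (num_heads, n, dk): one slice per head.
        return x.reshape(n, num_heads, dk).transpose(1, 0, 2)

    Q, K, V = split(h @ Wq), split(h @ Wk), split(h @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dk)     # (heads, n, n)
    attn = softmax(scores, axis=-1)                     # attention weights
    out = (attn @ V).transpose(1, 0, 2).reshape(n, d)   # concatenate heads
    return out @ Wo

# One token sequence per view k; attention is applied to each view separately,
# giving a per-view output A[k] -- my guess at what "form A^k separately" means.
views = {k: np.random.default_rng(k).standard_normal((6, 8)) for k in range(3)}
A = {k: multi_head_attention(h_k) for k, h_k in views.items()}
```

Under this reading, A^k would be the view-k attention output (or, in the intermediate step, the view-k attention weight matrix `attn`), with no mixing across views at this stage. Happy to be corrected by the authors if A^k means something else.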