
A variable in the paper #14

Closed
elevenofji opened this issue Mar 28, 2021 · 1 comment

Comments

@elevenofji

What does the variable A^k mean in the transformer structure? Does it mean the attention matrix for the different views?
From Section 3.2: "Then the multi-head attention is performed over conversation tokens h^k_{i:j} from different views k and form A^k separately."

@jiaaoc
Member

jiaaoc commented Apr 29, 2021

A^k is the attended result of the cross-attention for view k.
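To make this concrete, here is a minimal, hypothetical sketch (not the authors' code) of what "attended result per view" means: for each view k, scaled dot-product cross-attention is run over that view's token representations h^k_{i:j}, and the weighted sum it produces is A^k. The view names, vectors, and single-head simplification below are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention: A = softmax(q K^T / sqrt(d)) V."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors -> the attended result for this view
    return [sum(w * v[t] for w, v in zip(weights, values))
            for t in range(len(values[0]))]

# Two hypothetical "views" of the same conversation, each a list of
# token vectors h^k_{i:j} (toy 2-d vectors for illustration only)
views = {
    "topic": [[1.0, 0.0], [0.0, 1.0]],
    "stage": [[0.5, 0.5], [1.0, 1.0]],
}
query = [1.0, 0.0]  # e.g. a decoder state attending over the conversation

# A^k is computed separately for each view k
A = {k: cross_attention(query, h, h) for k, h in views.items()}
```

Each entry `A["topic"]`, `A["stage"]` is one A^k: the same query attends over a different view's token representations, so each view yields its own attended vector.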

@jiaaoc closed this as completed Apr 29, 2021