What does the variable “A^k” mean in the transformer structure? Does it mean the attention matrix in different views?
In Section 3.2: "Then the multi-head attention is performed over conversation tokens h^k_{i:j} from different views k and form A^k separately."
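For context on my reading of the quoted sentence: it sounds like A^k is the multi-head attention output computed independently over each view k's token states h^k_{i:j}. Below is a minimal NumPy sketch of that interpretation (all names and shapes are my own assumptions, not the paper's code), where attention is run separately per view and the results are collected as A^k:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(h, num_heads=2, seed=0):
    """Self-attention over one view's token states h of shape (n, d).

    Returns the attended representation of the same shape (n, d).
    Weights are random here purely for illustration.
    """
    n, d = h.shape
    assert d % num_heads == 0
    dk = d // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(x):
        # (n, d) -> (num_heads, n, dk): one slice per head.
        return x.reshape(n, num_heads, dk).transpose(1, 0, 2)

    Q, K, V = split(h @ Wq), split(h @ Wk), split(h @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dk)     # (heads, n, n)
    attn = softmax(scores, axis=-1)                     # attention weights
    out = (attn @ V).transpose(1, 0, 2).reshape(n, d)   # concatenate heads
    return out @ Wo

# One token sequence per view k; attention is applied to each view separately,
# giving a per-view output A[k] -- my guess at what "form A^k separately" means.
views = {k: np.random.default_rng(k).standard_normal((6, 8)) for k in range(3)}
A = {k: multi_head_attention(h_k) for k, h_k in views.items()}
```

Under this reading, A^k would be the view-k attention output (or, in the intermediate step, the view-k attention weight matrix `attn`), with no mixing across views at this stage. Happy to be corrected by the authors if A^k means something else.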