It seems that your ATRank implementation differs from the one described in the paper. For example, the paper uses bilinear attention, but the code here uses scaled dot-product attention; the paper uses vanilla attention, but the code here uses multi-head attention.
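For comparison, the two scoring functions mentioned above differ only in how a query/key pair is turned into a score. This is a minimal sketch (not taken from the ATRank code; the function names and the random test values are my own) of bilinear scoring, `q^T W k`, versus scaled dot-product scoring, `(q . k) / sqrt(d)`:

```python
import numpy as np

def bilinear_score(q, k, W):
    # Bilinear attention score: q^T W k, with a learned weight matrix W
    return q @ W @ k

def scaled_dot_score(q, k):
    # Scaled dot-product score: (q . k) / sqrt(d), no extra parameters
    return (q @ k) / np.sqrt(len(q))

rng = np.random.default_rng(0)
d = 4
q, k = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d))

print(bilinear_score(q, k, W))
print(scaled_dot_score(q, k))
```

Note that with `W` set to the identity, the bilinear score reduces to an unscaled dot product, so the two forms are closely related; the bilinear form just adds learnable parameters.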
When the query comes from the decoding layer and the key and value come from the encoding layer, it is called vanilla attention (a term the paper itself does not use) — the most basic form of attention. When the query, key, and value all come from the encoding layer, it is called self-attention.
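The distinction above is only about where Q, K, and V come from; the attention computation itself is identical. A minimal sketch (variable names and shapes are illustrative, not from the ATRank code):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))  # 5 encoder outputs, dimension 8
dec = rng.normal(size=(3, 8))  # 3 decoder states, dimension 8

# "Vanilla" (cross) attention: Q from the decoder, K and V from the encoder
cross_out = attention(dec, enc, enc)  # shape (3, 8)

# Self-attention: Q, K, and V all from the encoder
self_out = attention(enc, enc, enc)   # shape (5, 8)
```

The output always has one row per query, so cross attention produces one context vector per decoder state, while self-attention produces one per encoder position.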