If the self-attention contains some bias, e.g. (Q*K + bias_1 + bias_2)*V, can we still apply YOSO or Nystromformer to speed up the computation? If so, can you give me a hint on how to deal with this case?
I assume that you meant softmax(Q*K + bias_1 + bias_2)*V; otherwise you could compute the expression directly using the distributive law. One approach is to construct a new set of queries and keys, Q' = [Q, bias_1, bias_2] and K' = [K, 1, 1], such that Q'*K' = Q*K + bias_1 + bias_2. The regular attention approximation can then be applied to the new Q' and K'.
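Below is a minimal sketch (not from the YOSO or Nystromformer code) of this augmentation trick in PyTorch. It assumes bias_1 and bias_2 are per-query bias columns of shape (n, 1) that broadcast over the key dimension; other bias shapes would need a symmetric construction (e.g. appending a column of ones to Q and the bias column to K instead).

```python
import torch

# Hypothetical shapes: n queries, m keys, head dimension d.
n, m, d = 8, 16, 32
Q = torch.randn(n, d)
K = torch.randn(m, d)
bias_1 = torch.randn(n, 1)  # assumed per-query bias, broadcast over keys
bias_2 = torch.randn(n, 1)

# Augment the queries with the bias columns and the keys with matching ones,
# so the extra inner-product terms reproduce the additive biases.
Q_prime = torch.cat([Q, bias_1, bias_2], dim=-1)   # (n, d + 2)
ones = torch.ones(m, 1)
K_prime = torch.cat([K, ones, ones], dim=-1)       # (m, d + 2)

# Check: Q' K'^T == Q K^T + bias_1 + bias_2 (biases broadcast over keys).
scores_biased = Q @ K.T + bias_1 + bias_2
scores_augmented = Q_prime @ K_prime.T
assert torch.allclose(scores_biased, scores_augmented, atol=1e-5)

# Any attention approximation (e.g. Nystromformer or YOSO) can now be
# applied to Q_prime and K_prime in place of Q and K.
```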