Parameter-wise, I think projecting to a lower dimension multiple times is equivalent to "first project, then split". E.g., 100d -> 100d -> split into 4 × 25d uses 100 × 100 parameters, while four separate 100d -> 25d projections use 100 × 25 × 4 parameters — the same count. And "split and concat" looks more elegant. By the way, I also tried both methods in my own experiments and found no difference in final performance. @ty5491003
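To make the equivalence concrete, here is a minimal NumPy sketch (not the repo's code; all names here are made up for illustration). It shows that one big `d_model -> n_head * d_k` projection followed by a split produces exactly the same per-head outputs as `n_head` separate `d_model -> d_k` projections built from the same weight rows:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_head, d_k = 100, 4, 25

# One big projection matrix: (n_head * d_k) x d_model = 100 x 100 parameters.
W_big = rng.standard_normal((n_head * d_k, d_model))
x = rng.standard_normal((8, d_model))  # a batch of 8 token vectors

# "First project, then split": project to 100d, split into 4 heads of 25d.
big_out = x @ W_big.T                                   # shape (8, 100)
heads_from_split = np.split(big_out, n_head, axis=-1)   # 4 arrays of (8, 25)

# "Multiple projections": 4 separate 25 x 100 matrices (100*25*4 parameters),
# here taken as row-blocks of W_big so the weights match exactly.
per_head_W = np.split(W_big, n_head, axis=0)            # 4 arrays of (25, 100)
heads_separate = [x @ W.T for W in per_head_W]

# The two formulations give identical per-head outputs.
for a, b in zip(heads_from_split, heads_separate):
    assert np.allclose(a, b)
```

So the only difference is bookkeeping: whether the same 10,000 parameters live in one matrix or in four.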
As the paper describes, and as some other implementations do:

```python
self.w_qs = nn.Linear(d_model, n_head * d_k)
```

so the projected data is larger. But in this project it is different: each head is formed from only a part of Q/K/V. Can anyone help explain why it uses "split" and "concat" to get the multiple heads?

Thanks!
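For reference, the "split and concat" pattern the question is asking about can be sketched as follows. This is a simplified NumPy illustration under my own assumptions (single sequence, no output projection, no masking), not the project's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attn(x, Wq, Wk, Wv, n_head):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_model)."""
    # Full-width projections, then SPLIT each into n_head slices:
    # every head sees only a d_model // n_head slice of Q/K/V.
    q_heads = np.split(x @ Wq, n_head, axis=-1)
    k_heads = np.split(x @ Wk, n_head, axis=-1)
    v_heads = np.split(x @ Wv, n_head, axis=-1)

    outs = []
    for qi, ki, vi in zip(q_heads, k_heads, v_heads):
        d_k = qi.shape[-1]
        attn = softmax(qi @ ki.T / np.sqrt(d_k))  # scaled dot-product attention
        outs.append(attn @ vi)

    # CONCAT the per-head outputs back to (seq_len, d_model).
    return np.concatenate(outs, axis=-1)
```

Because each head's query/key/value is just a slice of the full projection, this is the same computation as giving each head its own smaller projection matrix — only the weight layout differs.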