@20171130 That was my first impression as well, but then there is an inconsistency with the "groups" definition (used to replicate the "attention heads") throughout the paper and the code.
In Equation 2 of the paper, the query and the key are combined with an inner product, not an element-wise multiplication. So the following line

Stand-Alone-Self-Attention/attention.py
Line 48 in e0a168e

should be

```python
out = (q_out * k_out).sum(dim=2)
```
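To illustrate the difference, here is a minimal NumPy sketch (NumPy rather than PyTorch just to keep it self-contained; the tensor shapes are assumptions for illustration, not the exact 5-D shapes used in attention.py). The element-wise product keeps the channel axis, while the inner product of Equation 2 sums over it, yielding one scalar similarity per position:

```python
import numpy as np

# Assumed illustrative shapes: batch=1, groups=2, channels=4,
# positions=9 (e.g. a 3x3 attention window flattened).
rng = np.random.default_rng(0)
q_out = rng.standard_normal((1, 2, 4, 9))
k_out = rng.standard_normal((1, 2, 4, 9))

# Element-wise multiplication: channel axis survives,
# giving a per-channel score.
elementwise = q_out * k_out             # shape (1, 2, 4, 9)

# Inner product as in Equation 2: sum over the channel
# axis, giving one scalar similarity per position.
inner = (q_out * k_out).sum(axis=2)     # shape (1, 2, 9)

print(elementwise.shape, inner.shape)
```

The channel axis here (axis 2) plays the role of `dim=2` in the proposed PyTorch fix above.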