The wrong implementation of the inner-product operation #10

XiaLiPKU opened this issue Oct 9, 2019 · 3 comments

XiaLiPKU commented Oct 9, 2019

In Equation 2 of the paper, the query and the key are fed into an inner-product operation, not an element-wise (pointwise) multiplication.

So the following line

out = q_out * k_out

should be
out = (q_out * k_out).sum(dim=2)
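
For concreteness, here is a minimal sketch of the two variants in PyTorch. The tensor names and shapes are assumed for illustration only and may not match the code in this repo exactly:

import torch
import torch.nn.functional as F

# Hypothetical shapes, chosen only to illustrate the point:
# q_out: (batch, groups, c_per_group, H, W, 1)
# k_out, v_out: (batch, groups, c_per_group, H, W, K), with K = positions in the local window
batch, groups, c_per_group, H, W, K = 2, 4, 8, 5, 5, 9
q_out = torch.randn(batch, groups, c_per_group, H, W, 1)
k_out = torch.randn(batch, groups, c_per_group, H, W, K)
v_out = torch.randn(batch, groups, c_per_group, H, W, K)

# Current code: element-wise product, so the softmax over the window produces a
# separate attention distribution for every channel.
attn_pointwise = F.softmax(q_out * k_out, dim=-1)        # (B, G, C/G, H, W, K)

# Proposed fix: sum over the channel dimension first, i.e. a true inner product
# between query and key, giving one attention weight per window position per group.
scores = (q_out * k_out).sum(dim=2, keepdim=True)        # (B, G, 1, H, W, K)
attn_dot = F.softmax(scores, dim=-1)

# In both cases the values are then aggregated over the window dimension.
out = (attn_dot * v_out).sum(dim=-1)                     # (B, G, C/G, H, W)
out = out.reshape(batch, groups * c_per_group, H, W)     # (B, C, H, W)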

@20171130

I found the same problem. It seems the implementation in the code is equivalent to setting the number of attention heads equal to the number of embedding dimensions.
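
A toy check of that equivalence (shapes and names are assumed for illustration, not taken from the repo): with element-wise scores, each channel's softmax over the window depends only on that channel's query and key values, i.e. one single-dimensional head per channel.

import torch
import torch.nn.functional as F

C, K = 4, 9                   # channels, positions in the local window
q = torch.randn(C, 1)
k = torch.randn(C, K)

# Element-wise scores followed by softmax over the window, all channels at once.
attn_all = F.softmax(q * k, dim=-1)

# The same computation done one channel at a time, i.e. one 1-dimensional "head" per channel.
attn_per_channel = torch.stack([F.softmax(q[c] * k[c], dim=-1) for c in range(C)])

print(torch.allclose(attn_all, attn_per_channel))   # True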

ifeherva commented Jan 3, 2020

@XiaLiPKU How would that modify lines 49 and 50?

canaltin commented Mar 19, 2020

@20171130 That was my first impression as well, but then there is an inconsistency in the "groups" definition (used to replicate the "attention heads") across the paper and the code.

Anyway, your alternative implementation helped me understand the general concepts:
https://github.com/20171130/AttentionLite/blob/master/model.py
