self-attention model is missing the "hops" feature #173

silky opened this issue Dec 28, 2018 · 2 comments


silky commented Dec 28, 2018

In the original paper, there are two equations for attention, (5) and (6), on page 3.
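For reference, these appear to be the equations from Lin et al. 2017, "A Structured Self-Attentive Sentence Embedding", which matches these equation numbers:

$$a = \operatorname{softmax}\left(w_{s2} \tanh\left(W_{s1} H^{\top}\right)\right) \tag{5}$$

$$A = \operatorname{softmax}\left(W_{s2} \tanh\left(W_{s1} H^{\top}\right)\right) \tag{6}$$

Here $H \in \mathbb{R}^{n \times 2u}$ stacks the BiLSTM hidden states and $W_{s1} \in \mathbb{R}^{d_a \times 2u}$; the only difference between the two is that the vector $w_{s2} \in \mathbb{R}^{d_a}$ in (5) becomes the matrix $W_{s2} \in \mathbb{R}^{r \times d_a}$ in (6), giving $r$ attention hops instead of one.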

As near as we can tell, the `ws2` tensor in the code has the wrong dimension; i.e., in the language of the paper, it is only a vector of dimension d_a instead of a matrix of r x d_a.

Is this intentional?



commented Jan 3, 2019

This is correct. The current implementation doesn't account for multi-hop attention; in effect, it implements equation (5) of the paper but not equation (6). Thanks for pointing it out. We will follow up with a fix, which requires introducing a new parameter `r` that controls the number of attention hops.
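For anyone picking this up, here is a minimal sketch of what the multi-hop module could look like, assuming a PyTorch implementation (the class name, argument names, and shapes follow the paper's notation and are illustrative only, not this project's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructuredSelfAttention(nn.Module):
    """Hypothetical sketch of multi-hop self-attention, equation (6):
    A = softmax(Ws2 tanh(Ws1 H^T)).
    """

    def __init__(self, hidden_dim, d_a, r):
        super().__init__()
        # Ws1 is d_a x hidden_dim; Ws2 is r x d_a -- the matrix that the
        # issue reports is currently only a d_a-dimensional vector (r = 1).
        self.ws1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.ws2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H):
        # H: (batch, n, hidden_dim) -- e.g. BiLSTM outputs over n tokens.
        scores = self.ws2(torch.tanh(self.ws1(H)))  # (batch, n, r)
        A = F.softmax(scores, dim=1)                # softmax over the n tokens
        A = A.transpose(1, 2)                       # (batch, r, n)
        M = torch.bmm(A, H)                         # (batch, r, hidden_dim)
        return M, A
```

With `r = 1` this degenerates to the single-hop attention of equation (5), so the current behaviour falls out as a special case.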


commented Jan 8, 2019

@silky We had implemented single-hop attention to get a simpler version working. Please feel free to send us a PR adding multi-hop attention.
