This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

self-attention model is missing the "hops" feature #173

Open
silky opened this issue Dec 28, 2018 · 2 comments

Comments

@silky
Contributor

silky commented Dec 28, 2018

In the original paper there are two equations for attention, (5) and (6) (on page 3).
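For reference, assuming the paper in question is Lin et al., "A Structured Self-Attentive Sentence Embedding" (2017), the two equations read (as we understand them):

```latex
a = \mathrm{softmax}\left( \mathbf{w}_{s2} \tanh\left( W_{s1} H^{\top} \right) \right)  % (5), single hop
A = \mathrm{softmax}\left( W_{s2} \tanh\left( W_{s1} H^{\top} \right) \right)           % (6), r hops
```

where W_{s1} has size d_a x 2u, w_{s2} is a vector of size d_a, and W_{s2} is a matrix of size r x d_a.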

As near as we can tell, in the code - https://github.com/facebookresearch/pytext/blob/master/pytext/models/representations/pooling.py#L22 - the ws2 tensor has the wrong dimension; i.e., in the language of the paper, it is only a vector of size d_a instead of a matrix of size r x d_a.

Is this intentional?

Thanks!

@hikushalhere
Contributor

You are correct. The current implementation doesn't account for multi-hop attention: in effect it implements equation (5) but stops before equation (6). Thanks for pointing it out. We will follow up with a fix, which requires introducing a new parameter r that controls the number of attention hops.
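A rough sketch of what an r-hop version of the pooling module could look like (module and parameter names here are illustrative placeholders, not the final PyText API, and the Frobenius penalization term from the paper is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHopSelfAttention(nn.Module):
    """Sketch of equation (6): A = softmax(W_s2 tanh(W_s1 H^T)), with r hops."""

    def __init__(self, input_dim, attn_dim, hops):
        super().__init__()
        # W_s1: attn_dim x input_dim (d_a x 2u in the paper)
        self.ws1 = nn.Linear(input_dim, attn_dim, bias=False)
        # W_s2: hops x attn_dim (r x d_a); the current code effectively has r = 1
        self.ws2 = nn.Linear(attn_dim, hops, bias=False)
        self.hops = hops

    def forward(self, inputs):
        # inputs: (batch, seq_len, input_dim) -- the H matrix of encoder states
        scores = self.ws2(torch.tanh(self.ws1(inputs)))       # (batch, seq_len, hops)
        attention = F.softmax(scores, dim=1)                   # normalize over tokens per hop
        # A^T H: one weighted sum of encoder states per hop
        pooled = torch.bmm(attention.transpose(1, 2), inputs)  # (batch, hops, input_dim)
        return pooled.view(inputs.size(0), -1)                 # (batch, hops * input_dim)


# Usage on dummy data: batch of 4 sequences of length 10 with 256-dim states.
encoder_out = torch.randn(4, 10, 256)
pooling = MultiHopSelfAttention(input_dim=256, attn_dim=64, hops=5)
print(pooling(encoder_out).shape)  # torch.Size([4, 1280])
```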

@hikushalhere
Contributor

@silky We implemented single-hop attention first to get a simpler version working. Please feel free to send us a PR adding multi-hop attention.
