self-attention model is missing the "hops" feature #173
In the original paper, there are two equations for attention, (5) and (6), on page 3.
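For reference, the two equations appear to be the following (assuming the paper in question is Lin et al. 2017, "A Structured Self-Attentive Sentence Embedding", which this module follows; H is the matrix of hidden states, d_a the attention dimension, r the number of hops):

```latex
% Equation (5): a single attention vector; w_{s2} is a d_a-dimensional vector
a = \mathrm{softmax}\left( w_{s2} \tanh\left( W_{s1} H^{\top} \right) \right)

% Equation (6): r attention "hops"; W_{s2} is an r x d_a matrix
A = \mathrm{softmax}\left( W_{s2} \tanh\left( W_{s1} H^{\top} \right) \right)
```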
As near as we can tell, in the code - https://github.com/facebookresearch/pytext/blob/master/pytext/models/representations/pooling.py#L22 - the `ws2` tensor has the wrong dimension; i.e., in the language of the paper, it is only a d_a-dimensional vector instead of an r x d_a matrix.

Is this intentional?
This is correct. The current implementation doesn't account for multi-hop attention: in effect, it implements equation (5) but stops short of equation (6). Thanks for pointing it out. We will follow up with a fix, which requires introducing a new parameter.
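A minimal sketch of what the multi-hop version could look like (the module and parameter names here, e.g. `nhops`, are illustrative placeholders, not PyText's actual API; the current code effectively fixes r = 1):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHopSelfAttention(nn.Module):
    """Sketch of equation (6): A = softmax(W_s2 * tanh(W_s1 * H^T)).

    Names are illustrative, not PyText's actual API. `nhops` is the
    paper's r; setting nhops=1 recovers equation (5).
    """

    def __init__(self, n_input: int, attn_dim: int, nhops: int) -> None:
        super().__init__()
        self.ws1 = nn.Linear(n_input, attn_dim, bias=False)  # W_s1: d_a x n_input
        # The multi-hop change: ws2 maps d_a -> r instead of d_a -> 1,
        # i.e. W_s2 is an r x d_a matrix rather than a d_a-dimensional vector.
        self.ws2 = nn.Linear(attn_dim, nhops, bias=False)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs is the paper's H: (batch, seq_len, n_input)
        scores = self.ws2(torch.tanh(self.ws1(inputs)))  # (batch, seq_len, r)
        # Normalize over seq_len: one attention distribution per hop
        attn = F.softmax(scores, dim=1)
        # M = A * H: r weighted sums of the hidden states
        m = torch.bmm(attn.transpose(1, 2), inputs)  # (batch, r, n_input)
        return m.reshape(inputs.size(0), -1)  # flatten to (batch, r * n_input)
```

If I recall correctly, the paper also adds a Frobenius-norm penalty on A A^T - I to encourage the hops to attend to different parts of the sentence, which a full fix would presumably want as well.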