self-attention model is missing the "hops" feature #173

silky opened this issue Dec 28, 2018 · 2 comments


silky commented Dec 28, 2018

In the original paper, there are two equations for attention, (5) and (6), on page 3.
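For reference, these appear to be the equations from Lin et al. 2017, "A Structured Self-Attentive Sentence Embedding", which matches these equation numbers:

$$a = \operatorname{softmax}\left(w_{s2} \tanh\left(W_{s1} H^{\top}\right)\right) \tag{5}$$

$$A = \operatorname{softmax}\left(W_{s2} \tanh\left(W_{s1} H^{\top}\right)\right) \tag{6}$$

Here $H \in \mathbb{R}^{n \times 2u}$ stacks the BiLSTM hidden states and $W_{s1} \in \mathbb{R}^{d_a \times 2u}$; the only difference between the two is that the vector $w_{s2} \in \mathbb{R}^{d_a}$ in (5) becomes the matrix $W_{s2} \in \mathbb{R}^{r \times d_a}$ in (6), giving $r$ attention hops instead of one.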

As near as we can tell, the `ws2` tensor in the code has the wrong dimension; i.e., in the language of the paper, it is only a vector of dimension d_a instead of a matrix of r x d_a.

Is this intentional?



commented Jan 3, 2019

This is correct. The current implementation doesn't account for multi-hop attention; in effect, it implements equation (5) of the paper but not equation (6). Thanks for pointing it out. We will follow up with a fix, which requires introducing a new parameter `r` that controls the number of attention hops.
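For anyone picking this up, here is a minimal sketch of what the multi-hop module could look like, assuming a PyTorch implementation (the class name, argument names, and shapes follow the paper's notation and are illustrative only, not this project's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StructuredSelfAttention(nn.Module):
    """Hypothetical sketch of multi-hop self-attention, equation (6):
    A = softmax(Ws2 tanh(Ws1 H^T)).
    """

    def __init__(self, hidden_dim, d_a, r):
        super().__init__()
        # Ws1 is d_a x hidden_dim; Ws2 is r x d_a -- the matrix that the
        # issue reports is currently only a d_a-dimensional vector (r = 1).
        self.ws1 = nn.Linear(hidden_dim, d_a, bias=False)
        self.ws2 = nn.Linear(d_a, r, bias=False)

    def forward(self, H):
        # H: (batch, n, hidden_dim) -- e.g. BiLSTM outputs over n tokens.
        scores = self.ws2(torch.tanh(self.ws1(H)))  # (batch, n, r)
        A = F.softmax(scores, dim=1)                # softmax over the n tokens
        A = A.transpose(1, 2)                       # (batch, r, n)
        M = torch.bmm(A, H)                         # (batch, r, hidden_dim)
        return M, A
```

With `r = 1` this degenerates to the single-hop attention of equation (5), so the current behaviour falls out as a special case.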


commented Jan 8, 2019

@silky We had implemented single-hop attention to get a simpler version working. Please feel free to send us a PR adding multi-hop attention.
