Attention Model Available! #2067

Closed
shyamupa opened this Issue Mar 24, 2016 · 14 comments

shyamupa commented Mar 24, 2016

Hi,

I implemented an attention model for textual entailment. Here is the code. It performs a bit worse than the paper, but works decently well. Hope this comes in handy for Keras beginners like me.

Comments are welcome!

Shout-outs to @farizrahman4u, @fchollet, and @pasky for their help and patience in answering queries on GitHub.

ymcui commented Mar 25, 2016

Awesome job!
A minor comment on L139:
https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L139
The TimeDistributedDense layer produces a 3D tensor of shape (batch_size, L, 1), so applying the softmax activation there does not give what you want: the last dimension has only one unit, so the softmax outputs a constant 1 and the attention weights lose their meaning.
I think you can use TimeDistributedDense with a linear activation, then Flatten it (to get a 2D tensor), and apply the softmax afterwards.
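To illustrate the point numerically (a plain-NumPy sketch, not code from the repo): a softmax taken over an axis of size 1 always returns 1.0, while flattening to (batch_size, L) first gives a proper distribution over the L time steps.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

scores = np.random.randn(2, 5, 1)            # (batch_size, L, 1), as from a one-unit time-distributed dense layer
print(softmax(scores, axis=-1).squeeze(-1))  # every entry is 1.0: softmax over a single unit is degenerate
print(softmax(scores.squeeze(-1), axis=-1))  # flatten to (batch_size, L) first: valid weights that sum to 1 over L
```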

shyamupa commented Mar 25, 2016

@ymcui Good catch! I tried your modification and am noticing some improvements. Thanks!

pasky commented Mar 25, 2016

That's nice work!

What I got stuck on, however, when thinking about this is that in the paper they use two different RNNs in series, whereas you use a single shared RNN for both premise and hypothesis. I think supporting that will probably require some small Keras modifications to allow "initialize from node".
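To make the "two RNNs in series" idea concrete, here is a rough sketch (illustrative only, written against the later Keras functional API rather than the Graph API discussed in this thread; layer sizes and shapes are placeholders): the hypothesis encoder starts from the final state of the premise encoder.

```python
from tensorflow.keras import layers

premise = layers.Input(shape=(None, 300))     # (time, embedding_dim), placeholder sizes
hypothesis = layers.Input(shape=(None, 300))

# The premise encoder also returns its final hidden state...
premise_seq, premise_state = layers.GRU(64, return_sequences=True, return_state=True)(premise)
# ...which is used to initialize a separate hypothesis encoder.
hypothesis_seq = layers.GRU(64, return_sequences=True)(hypothesis, initial_state=premise_state)
```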

shyamupa commented Mar 25, 2016

True. I implemented what they called shared encoding; the difference between the two models is about 2 points in their experiments. Lasagne has a feature for initializing the hidden state, but writing this model there would lead to code bloat. Maybe something can be done for the Keras RNNs too? :) @fchollet

pasky commented Mar 25, 2016

Oh, I somehow missed that experiment. So this isn't that important. Nice!

shyamupa commented Mar 25, 2016

Depends on what you mean by important (2% on that dataset is about 200 questions). Also note that I train the embeddings along with the model, while they keep theirs fixed to word2vec/GloVe vectors.

DingKe commented Mar 26, 2016

@shyamupa The Bi-GRU implementation seems problematic (https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L128). This is an old issue: #2074, #1725, #1703, #1674, #1432, #1282. Any plan to fix it officially? @fchollet

shyamupa commented Mar 26, 2016

I see, I was not aware of this. I was using LSTMs earlier but switched to GRUs because they are supposed to train faster. Hope LSTMs don't have the same issue.

shyamupa closed this Mar 27, 2016

pasky commented Mar 27, 2016

The LSTMs have the same issue.

dbonadiman commented Mar 29, 2016

@DingKe I don't think there are plans to fix the go_backwards behaviour, because it is consistent with Theano's go_backwards; at least it won't be solved at the backend level. Something needs to be done in the Recurrent class, however. I originally added go_backwards to the Recurrent class by simply wrapping the Theano scan keyword, but we need to fix this issue at least in the examples.
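For readers landing here later, a minimal sketch of the usual workaround (illustrative only, written against the later tf.keras functional API rather than the 2016 Graph API used in the repo): with go_backwards=True and return_sequences=True the backward RNN emits its outputs in reversed time order, so they have to be flipped back before being merged with the forward pass.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(None, 300))  # (time, features), placeholder sizes

fwd = layers.GRU(64, return_sequences=True)(inputs)
bwd = layers.GRU(64, return_sequences=True, go_backwards=True)(inputs)
# go_backwards=True returns the sequence in reversed time order,
# so re-align it before concatenating with the forward outputs.
bwd = layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(bwd)
bigru = layers.Concatenate(axis=-1)([fwd, bwd])
```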

pasky commented Mar 29, 2016

akshaykgupta commented Apr 7, 2016

Hi,

I'm trying to implement a similar attention model in Keras. Does the go_backwards bug still exist? If so, can someone give a small example of how to fix it?

Thanks.

dongfangyixi commented Sep 18, 2016

Hello,
Does anyone have an idea on how to implement the attention mechanism with masking?

philipperemy commented May 29, 2017

I've just started a project to collect all the possible information about attention with Keras:

https://github.com/philipperemy/keras-attention-mechanism

Check this out! It's still at an early stage. I'm currently working on it!
