
Port decomposable attention to Keras v2 #1445

Closed
hadifar opened this issue Oct 21, 2017 · 7 comments
Labels
examples (Code examples in /examples) · help wanted (Contributions welcome!) · third-party (Third-party packages and services)

Comments


hadifar commented Oct 21, 2017

First of all, thanks for your nice and clean implementation. I want to port the decomposable attention example to Keras v2, but I've run into a problem. I asked the question on Stack Overflow but didn't get an answer; would you please help me fix this issue?
SO thread:
https://stackoverflow.com/questions/46843131/use-merge-layer-lambda-function-on-keras-2-0

ines added the examples label on Oct 21, 2017
f11r (Contributor) commented Oct 23, 2017

I haven't actually run this code, but I think something like the following should work:

    from keras import backend as K
    from keras.layers import Lambda

    def __call__(self, sent1, sent2):
        def _outer(AB):
            # AB[0], AB[1]: encoded sentences, shape (batch, max_length, nr_hidden)
            # batch_dot of sent2 against sent1 transposed gives att_ji;
            # permute the last two axes to get att_ij back
            att_ji = K.batch_dot(AB[1], K.permute_dimensions(AB[0], (0, 2, 1)))
            return K.permute_dimensions(att_ji, (0, 2, 1))

        return Lambda(_outer, output_shape=(self.max_length, self.max_length))(
            [self.model(sent1), self.model(sent2)])
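
For anyone hitting the same merge-to-Keras-2 question: the functional `merge(..., mode=fn)` helper from Keras 1 is gone, and a `Lambda` applied to a list of tensors is the usual replacement. Below is a minimal, self-contained sketch of the same pattern with plain `Input` tensors; the sizes are placeholders, not the example's real settings:

    import numpy as np
    from keras import backend as K
    from keras.layers import Input, Lambda
    from keras.models import Model

    max_length, nr_hidden = 5, 8  # placeholder sizes

    def _outer(AB):
        # (batch, L, H) x (batch, H, L) -> (batch, L, L) pairwise dot products
        att_ji = K.batch_dot(AB[1], K.permute_dimensions(AB[0], (0, 2, 1)))
        return K.permute_dimensions(att_ji, (0, 2, 1))

    a1 = Input(shape=(max_length, nr_hidden))
    a2 = Input(shape=(max_length, nr_hidden))
    att = Lambda(_outer, output_shape=(max_length, max_length))([a1, a2])
    model = Model(inputs=[a1, a2], outputs=att)

    x1 = np.random.rand(2, max_length, nr_hidden).astype("float32")
    x2 = np.random.rand(2, max_length, nr_hidden).astype("float32")
    print(model.predict([x1, x2]).shape)  # -> (2, 5, 5)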

honnibal (Member) commented

I think the _outer function probably needs to be moved out to module scope as well; otherwise it might have a problem deserialising the model?

I hope you can submit a pull request with your fixes once you get it working. You'll probably want to have a look at the other issues about that example. I'm still not 100% sure the code implements Parikh's model correctly; I think the way the attention averages over the padded sentences might be wrong? There's good discussion of this in other issues.
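
For reference, one common fix for the padded-averaging concern is to bias padded positions toward a large negative value before the softmax, so they receive roughly zero attention weight. This is only a sketch assuming a 0/1 token mask for the second sentence; the function and argument names are hypothetical, not from the example:

    from keras import backend as K

    def masked_softmax(scores, mask):
        # scores: (batch, len1, len2) raw attention scores
        # mask:   (batch, len2), 1.0 for real tokens, 0.0 for padding
        # Padded columns get a ~-1e9 bias, so they get ~0 attention weight
        scores = scores + (1.0 - K.expand_dims(mask, 1)) * -1e9
        return K.softmax(scores)  # softmax over the last axis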

honnibal (Member) commented Nov 5, 2017

I had a go at this, and saved my work on a branch here: https://github.com/explosion/spaCy/tree/example/keras-parikh-entailment

This is failing, though. I can't be sure, but I think the problem is in the masking. The masking was always a problem, and now I don't see how it's even supposed to work. I tried to find an answer on the issue tracker: https://github.com/fchollet/keras/issues?utf8=%E2%9C%93&q=masking . However, Keras auto-closes issues after 30 days, so people just open new ones, and there are now almost 300 issues about masking. This makes it tough to understand the current recommendations :(


chiragjn commented Dec 15, 2017

@honnibal On https://github.com/explosion/spaCy/tree/example/keras-parikh-entailment, one mistake I can spot is that the Entailment module expects nr_hidden * 2 input units but is actually being fed nr_hidden * 4: Concat[AvgPool(sent1), MaxPool(sent1), AvgPool(sent2), MaxPool(sent2)] (see the sketch below).
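
In other words (a sketch only; the value of nr_hidden here is hypothetical, not necessarily the example's setting), the first layer of the Entailment module has to be declared with the full concatenated width:

    from keras.layers import Input, Concatenate, Dense

    nr_hidden = 200  # hypothetical value

    # stand-ins for the four pooled vectors, each nr_hidden wide
    avg1, max1, avg2, max2 = [Input(shape=(nr_hidden,)) for _ in range(4)]

    # Concat[AvgPool(sent1), MaxPool(sent1), AvgPool(sent2), MaxPool(sent2)]
    # is 4 * nr_hidden wide, so the first Dense must accept that many inputs:
    pooled = Concatenate()([avg1, max1, avg2, max2])      # (batch, 4 * nr_hidden)
    hidden = Dense(nr_hidden, activation="relu")(pooled)  # matches the real width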

Fixing this will make the model compile, but masking will still be required to deal with variable-length sequences in the same batch. If I understand correctly, careful masking is needed in both the Align module (right before computing the softmax) and the Compare module (while taking the pools); a sketch of the masked pooling follows at the end of this comment.

I might give it a try using a Lambda layer, as in https://gist.github.com/braingineer/b64ca35223c7782667984d34ddb7a7fa, assuming the zero embedding vector is reserved for padding. Will submit a PR if I can make it work.
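
For what it's worth, here is a sketch of what that masking could look like for the pools, assuming (as above) that the all-zero embedding vector marks padding; the helper names are hypothetical and none of this is from the example code:

    from keras import backend as K

    def padding_mask(emb):
        # emb: (batch, len, dim); a timestep is padding iff its embedding is all zeros
        return K.cast(K.any(K.not_equal(emb, 0.0), axis=-1), "float32")  # (batch, len)

    def masked_pools(x, mask):
        # x: (batch, len, hidden); mask: (batch, len) with 1.0 for real tokens
        m = K.expand_dims(mask, -1)                      # (batch, len, 1)
        avg = K.sum(x * m, axis=1) / (K.sum(m, axis=1) + K.epsilon())
        mx = K.max(x * m + (1.0 - m) * -1e9, axis=1)     # padded steps never win the max
        return avg, mx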

martbert commented

@honnibal I really liked the method proposed by Parikh et al. and wanted an implementation in Keras. But like many, I struggled to get an algorithm that performed on par with theirs. I had the same hunch about the masking issues, so I went ahead and wrote a few layers that take care of this. It worked! You can find the implementation in the following Git repo: https://github.com/martbert/decomp_attn_keras. The implementation can certainly be improved, but it's a good start.

ines (Member) commented Sep 12, 2018

Merging this with #2758.

lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
