
Port decomposable attention to Keras v2 #1445

Closed
hadifar opened this issue Oct 21, 2017 · 7 comments
Labels
examples (Code examples in /examples) · help wanted (Contributions welcome!) · third-party (Third-party packages and services)

Comments


hadifar commented Oct 21, 2017

First of all, thanks for your nice and clean implementation. I want to port the decomposable attention example to Keras v2, but I've run into a problem. I asked the question on Stack Overflow but didn't get an answer; would you please help me fix this issue?
SO thread:
https://stackoverflow.com/questions/46843131/use-merge-layer-lambda-function-on-keras-2-0

ines added the examples label on Oct 21, 2017
f11r (Contributor) commented Oct 23, 2017

I haven't actually run this code, but I think something like the following should work:

    from keras import backend as K
    from keras.layers import Lambda

    def __call__(self, sent1, sent2):
        def _outer(AB):
            # AB[0], AB[1]: encoded sentences, shape (batch, max_length, nr_hidden)
            # batch_dot of sent2 against sent1 transposed gives att_ji;
            # permute the last two axes to get att_ij back
            att_ji = K.batch_dot(AB[1], K.permute_dimensions(AB[0], (0, 2, 1)))
            return K.permute_dimensions(att_ji, (0, 2, 1))

        return Lambda(_outer, output_shape=(self.max_length, self.max_length))(
            [self.model(sent1), self.model(sent2)])
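
For anyone hitting the same merge-to-Keras-2 question: the functional `merge(..., mode=fn)` helper from Keras 1 is gone, and a `Lambda` applied to a list of tensors is the usual replacement. Below is a minimal, self-contained sketch of the same pattern with plain `Input` tensors; the sizes are placeholders, not the example's real settings:

    import numpy as np
    from keras import backend as K
    from keras.layers import Input, Lambda
    from keras.models import Model

    max_length, nr_hidden = 5, 8  # placeholder sizes

    def _outer(AB):
        # (batch, L, H) x (batch, H, L) -> (batch, L, L) pairwise dot products
        att_ji = K.batch_dot(AB[1], K.permute_dimensions(AB[0], (0, 2, 1)))
        return K.permute_dimensions(att_ji, (0, 2, 1))

    a1 = Input(shape=(max_length, nr_hidden))
    a2 = Input(shape=(max_length, nr_hidden))
    att = Lambda(_outer, output_shape=(max_length, max_length))([a1, a2])
    model = Model(inputs=[a1, a2], outputs=att)

    x1 = np.random.rand(2, max_length, nr_hidden).astype("float32")
    x2 = np.random.rand(2, max_length, nr_hidden).astype("float32")
    print(model.predict([x1, x2]).shape)  # -> (2, 5, 5)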

honnibal (Member) commented

I think the _outer function probably needs to be moved out to module scope as well; otherwise it might have a problem deserialising the model?

I hope you can submit a pull request with your fixes once you get it working. You'll probably want to have a look at the other issues about that example. I'm still not 100% sure the code implements Parikh's model correctly; I think the way the attention averages over the padded sentences might be wrong? There's good discussion of this in other issues.
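
For reference, one common fix for the padded-averaging concern is to bias padded positions toward a large negative value before the softmax, so they receive roughly zero attention weight. This is only a sketch assuming a 0/1 token mask for the second sentence; the function and argument names are hypothetical, not from the example:

    from keras import backend as K

    def masked_softmax(scores, mask):
        # scores: (batch, len1, len2) raw attention scores
        # mask:   (batch, len2), 1.0 for real tokens, 0.0 for padding
        # Padded columns get a ~-1e9 bias, so they get ~0 attention weight
        scores = scores + (1.0 - K.expand_dims(mask, 1)) * -1e9
        return K.softmax(scores)  # softmax over the last axis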

honnibal (Member) commented Nov 5, 2017

I had a go at this, and saved my work on a branch here: https://github.com/explosion/spaCy/tree/example/keras-parikh-entailment

This is failing, though. I can't be sure, but I think the problem is in the masking. The masking was always a problem, and now I don't see how it's even supposed to work. I tried to find an answer on the issue tracker: https://github.com/fchollet/keras/issues?utf8=%E2%9C%93&q=masking . However, Keras auto-closes issues after 30 days, so people just open new ones, and there are now almost 300 issues about masking. This makes it tough to understand the current recommendations :(


chiragjn commented Dec 15, 2017

@honnibal On https://github.com/explosion/spaCy/tree/example/keras-parikh-entailment, one mistake I can spot is that the Entailment module expects nr_hidden * 2 input units but is actually being fed nr_hidden * 4: Concat[AvgPool(sent1), MaxPool(sent1), AvgPool(sent2), MaxPool(sent2)] (see the sketch below).
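
In other words (a sketch only; the value of nr_hidden here is hypothetical, not necessarily the example's setting), the first layer of the Entailment module has to be declared with the full concatenated width:

    from keras.layers import Input, Concatenate, Dense

    nr_hidden = 200  # hypothetical value

    # stand-ins for the four pooled vectors, each nr_hidden wide
    avg1, max1, avg2, max2 = [Input(shape=(nr_hidden,)) for _ in range(4)]

    # Concat[AvgPool(sent1), MaxPool(sent1), AvgPool(sent2), MaxPool(sent2)]
    # is 4 * nr_hidden wide, so the first Dense must accept that many inputs:
    pooled = Concatenate()([avg1, max1, avg2, max2])      # (batch, 4 * nr_hidden)
    hidden = Dense(nr_hidden, activation="relu")(pooled)  # matches the real width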

Fixing this will make the model compile, but masking will still be required to deal with variable-length sequences in the same batch. If I understand correctly, careful masking is needed in both the Align module (right before computing the softmax) and the Compare module (while taking the pools); a sketch of the masked pooling follows at the end of this comment.

I might give it a try using a Lambda layer, as in https://gist.github.com/braingineer/b64ca35223c7782667984d34ddb7a7fa, assuming the zero embedding vector is reserved for padding. Will submit a PR if I can make it work.
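
For what it's worth, here is a sketch of what that masking could look like for the pools, assuming (as above) that the all-zero embedding vector marks padding; the helper names are hypothetical and none of this is from the example code:

    from keras import backend as K

    def padding_mask(emb):
        # emb: (batch, len, dim); a timestep is padding iff its embedding is all zeros
        return K.cast(K.any(K.not_equal(emb, 0.0), axis=-1), "float32")  # (batch, len)

    def masked_pools(x, mask):
        # x: (batch, len, hidden); mask: (batch, len) with 1.0 for real tokens
        m = K.expand_dims(mask, -1)                      # (batch, len, 1)
        avg = K.sum(x * m, axis=1) / (K.sum(m, axis=1) + K.epsilon())
        mx = K.max(x * m + (1.0 - m) * -1e9, axis=1)     # padded steps never win the max
        return avg, mx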

martbert commented

@honnibal I really liked the method proposed by Parikh et al. and wanted an implementation in Keras. But like many, I struggled to get an algorithm that performed on par with theirs. I had the same hunch about the masking issues, so I went ahead and wrote a few layers that take care of this. It worked! You can find the implementation in the following Git repo: https://github.com/martbert/decomp_attn_keras. The implementation can certainly be improved, but it's a good start.

ines (Member) commented Sep 12, 2018

Merging this with #2758.

lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
