Port decomposable attention to Keras v2 #1445
Comments
I haven't actually run this code, but I think something like the following should work:
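A minimal sketch of the kind of change involved, assuming the goal is to replace the Keras 1 `merge([...], mode=...)` call with the Keras 2 functional API; the function name `build_attention` and the tensor shapes are illustrative, not taken from the spaCy example:

```python
from keras import backend as K
from keras.layers import Lambda

def build_attention(a, b):
    # a, b: tensors of shape (batch, max_length, nr_hidden)
    # Keras 1 allowed: merge([a, b], mode=lambda x: K.batch_dot(x[0], x[1], axes=[2, 2]))
    # In Keras 2 the same computation can be wrapped in a Lambda layer
    # (keras.layers.dot([a, b], axes=-1) would be an equivalent one-liner).
    return Lambda(
        lambda x: K.batch_dot(x[0], x[1], axes=[2, 2]),
        output_shape=lambda shapes: (shapes[0][0], shapes[0][1], shapes[1][1]),
    )([a, b])
```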
I hope you can submit a pull request with your fixes once you get it working. You'll probably want to have a look at the other issues about that example. I'm still not 100% sure the code implements Parikh's model correctly; I think the way the attention averages over the padded sentences might be wrong. There's good discussion of this in other issues.
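To make the padding concern concrete, here is a rough backend-level sketch of the soft-alignment step with masking applied before the softmax; without the mask, padded positions receive non-zero weight and leak into the averaged representation. The names `attention`, `sentence_b`, and `mask_b` are made up for illustration:

```python
from keras import backend as K

def soft_align(attention, sentence_b, mask_b):
    # attention:  (batch, len_a, len_b) raw alignment scores
    # sentence_b: (batch, len_b, nr_hidden)
    # mask_b:     (batch, len_b) with 1.0 for real tokens, 0.0 for padding
    #
    # Push padded positions to a large negative value so the softmax
    # assigns them (almost) zero weight before the weighted average.
    mask = K.expand_dims(mask_b, axis=1)                 # (batch, 1, len_b)
    masked_scores = attention + (1.0 - mask) * -1e9
    weights = K.softmax(masked_scores)                   # rows sum to 1 over real tokens
    return K.batch_dot(weights, sentence_b, axes=[2, 1]) # (batch, len_a, nr_hidden)
```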
I had a go at this, and saved my work on a branch here: https://github.com/explosion/spaCy/tree/example/keras-parikh-entailment This is failing though. I can't be sure, but I think the problem is in the masking stuff? The masking was always a problem, and now I don't see how it's even supposed to work. I tried to find an answer on the issue tracker: https://github.com/fchollet/keras/issues?utf8=%E2%9C%93&q=masking . However, Keras auto-closes issues after 30 days, so people just open new ones, and there are now almost 300 issues about masking. This makes it tough to understand the current recommendations :(
@honnibal On https://github.com/explosion/spaCy/tree/example/keras-parikh-entailment one mistake I can spot is that the Entailment module is expecting nr_hidden * 2 but is actually being fed something else. Fixing this will make the model compile, but masking will still be required to deal with variable-length sequences in the same batch. If I understand correctly, careful masking is needed in both the Align module (right before computing the softmax) and the Compare module (while taking the pools). I might give it a try using a Lambda layer, as in https://gist.github.com/braingineer/b64ca35223c7782667984d34ddb7a7fa
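A possible shape for the Compare-side masking with a `Lambda` layer, purely as an illustration of the idea rather than code from the example (the names `masked_sum_pool`, `compared`, and `mask` are hypothetical):

```python
from keras import backend as K
from keras.layers import Lambda

def masked_sum_pool(compared, mask):
    # compared: (batch, max_length, nr_hidden) output of the Compare step
    # mask:     (batch, max_length) with 1.0 for real tokens, 0.0 for padding
    # Zero out padded time steps before summing over the sentence,
    # so padding does not contribute to the pooled vector.
    def pool(inputs):
        feats, m = inputs
        m = K.expand_dims(m, axis=-1)      # (batch, max_length, 1)
        return K.sum(feats * m, axis=1)    # (batch, nr_hidden)

    return Lambda(
        pool,
        output_shape=lambda shapes: (shapes[0][0], shapes[0][2]),
    )([compared, mask])
```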
@honnibal I really liked the method proposed by Parikh et al. and wanted an implementation in Keras. But like many, I struggled to get an implementation that performed on par with theirs. I had the same hunch about the masking issues and decided to go ahead and write a few layers that take care of this. It worked! You can find the implementation in the following Git repo: https://github.com/martbert/decomp_attn_keras. The implementation can certainly be improved, but it's a good start.
Merging this with #2758. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
First of all, thanks for your nice and clean implementation. I want to port decomposable attention to Keras v2, but I have a problem. I asked the question on Stack Overflow but didn't get any answer; could you please help me fix this issue?
SO thread:
https://stackoverflow.com/questions/46843131/use-merge-layer-lambda-function-on-keras-2-0