There is no one-layer MLP in the attention layer #5
I used a Keras TimeDistributed(Dense()) layer to implement this vertical per-word interaction on the LSTM output before feeding it to the attention layer. However, I really doubt whether it's useful.
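For concreteness, this is roughly the wiring I mean (a sketch; the sizes and variable names are illustrative, not taken from the repo):

```python
from keras.layers import Input, Embedding, Bidirectional, GRU, Dense, TimeDistributed

# Illustrative constants, not the repo's actual values
MAX_WORDS, VOCAB_SIZE, EMB_DIM = 100, 20000, 100

words = Input(shape=(MAX_WORDS,), dtype='int32')
x = Embedding(VOCAB_SIZE, EMB_DIM)(words)              # -> (MAX_WORDS, 100)
x = Bidirectional(GRU(100, return_sequences=True))(x)  # -> (MAX_WORDS, 200)
# The same Dense(200) is applied independently at every word position,
# i.e. the per-word ("vertical") transformation before the attention layer
x = TimeDistributed(Dense(200, activation='tanh'))(x)  # -> (MAX_WORDS, 200)
```

Note that in Keras 2 a plain Dense applied to a 3-D tensor already acts on the last axis per timestep, so TimeDistributed(Dense(...)) and Dense(...) should behave the same here.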
Isn't line 174
I have problems at the same point. I can see that the "one-layer MLP" is split in your code into the dense layer and line 174 in the attention layer.

The MLP in the article you mention in your blog consists of one input layer (the 200 nodes for the 2×100 values of each per-word output of the bidirectional GRUs, with 100 nodes per direction), one hidden layer (the 200 nodes of the dense layer), and one final layer. The final layer has ONE node, and I think that is the one value per word computed at the beginning of the attention layer in the code, isn't it? The attention layer of course receives all MAX_WORDS=100 words at once, each as a 200-length vector, i.e. a 100×200 matrix. So self.W in line 174 would be a 200-length weight vector that is applied to all words equally. So there are independent weights in the MLP from input to hidden (by use of TimeDistributed) and a common weight vector from hidden to output. Now the questions are:
I just looked up the TimeDistributed layer wrapper again and realized that it means the same weights are also shared across the input-to-hidden connections of the MLP. I think I can also remember reading something about "shared weights" in the articles, so questions 1 and 2 are answered for me. Only the third one is left: why use the hidden layer of the MLP as the word context and not the output of l_lstm?
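To make the split concrete, here is a stripped-down attention layer of the kind being discussed (a sketch of my reading of the code, not the repo's exact AttLayer; the dot with self.W is the part I mean by "line 174"):

```python
import keras.backend as K
from keras.layers import Layer

class AttLayerSketch(Layer):
    """Sketch of the attention layer under discussion (not the exact repo code)."""
    def build(self, input_shape):
        # One weight vector shared across all word positions,
        # i.e. the hidden->output connection of the "one-layer MLP"
        self.W = self.add_weight(name='att_W',
                                 shape=(input_shape[-1],),
                                 initializer='glorot_uniform',
                                 trainable=True)
        super(AttLayerSketch, self).build(input_shape)

    def call(self, x):
        # x: (batch, MAX_WORDS, 200)
        # One scalar score per word: dot each 200-dim word vector with self.W
        eij = K.tanh(K.dot(x, K.expand_dims(self.W)))  # (batch, MAX_WORDS, 1)
        ai = K.softmax(K.squeeze(eij, axis=-1))        # (batch, MAX_WORDS)
        # Attention-weighted sum of word vectors -> sentence vector
        return K.sum(x * K.expand_dims(ai), axis=1)    # (batch, 200)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```

Because self.W is a single vector, the hidden-to-output weights are necessarily shared across all 100 word positions, which matches the weight sharing discussed above.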
Hello, thank you for the excellent code for CNN, LSTM and HAN; I have learnt a lot from it.
But one question:
In the paper I find a one-layer MLP in the attention layer, but I cannot see it in your implemented AttLayer; in the class AttLayer there is only one context vector. Could you explain it for me?
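For reference, the word-level attention in the paper (Yang et al., 2016) is defined as:

```latex
u_{it}      = \tanh(W_w h_{it} + b_w)  % the one-layer MLP applied to each hidden state h_{it}
\alpha_{it} = \frac{\exp(u_{it}^{\top} u_w)}{\sum_{t} \exp(u_{it}^{\top} u_w)}  % similarity with the word context vector u_w
s_i         = \sum_{t} \alpha_{it} h_{it}  % attention-weighted sentence vector
```

So the "one-layer MLP" is the tanh(W_w h_{it} + b_w) step, and u_w is the single trainable context vector; my question is where W_w and b_w live in your AttLayer.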