glove_50_300_2.txt is downloaded as head_embeddings. What exactly does the 50 refer to?
How are these used in coref_model.py (e.g. around line 379 at commit a24d107)? It seems that context_outputs is computed by an LSTM over the ELMo embeddings. Then, for each span, the context_outputs give a distribution over the span's tokens, and this distribution is used to take a weighted sum over the head_embeddings.
What I want to confirm is that the head_embeddings themselves are not used to calculate the distribution over the head_embeddings, right?
Also, from the paper's "Experimental Setup", I didn't follow the meaning of "window size" for the word embeddings and the LSTM: "using GloVe word embeddings (Pennington et al., 2014) with a window size of 2 for the head word embeddings and a window size of 10 for the LSTM inputs."
The 50 refers to the minimum-frequency threshold for the vocabulary.
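Concretely, that presumably means words occurring fewer than 50 times were dropped when the vocabulary was built. A toy sketch of that kind of frequency cutoff (the function and variable names here are illustrative, not the actual preprocessing code):

```python
from collections import Counter

def build_vocab(tokenized_sentences, min_frequency=50):
    """Keep only words that occur at least `min_frequency` times in the corpus."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    return {word for word, count in counts.items() if count >= min_frequency}
```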
Yes, your interpretation is exactly right. You can think of it as separating the keys and values of the attention mechanism: the head_embeddings are the values, and the keys are determined by the context_embeddings. In hindsight, I probably should have removed this separation for simplicity in the final version, since the improvement wasn't very large.
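To make that separation concrete, here is a minimal NumPy sketch for a single span, with hypothetical names (the actual code in coref_model.py is batched TensorFlow and more involved; this only shows where the keys and values come from):

```python
import numpy as np

def attended_head_emb(context_outputs, head_emb, span_start, span_end, w_score):
    """Head attention for a single span.

    context_outputs: [num_words, dim]  LSTM outputs; only used to score tokens (the "keys").
    head_emb:        [num_words, dim]  head word embeddings; only used in the weighted sum (the "values").
    w_score:         [dim]             projection vector turning an LSTM output into a scalar score.
    """
    # Scores for the span's tokens come from context_outputs alone.
    scores = context_outputs[span_start:span_end + 1] @ w_score   # [span_width]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                       # softmax over the span's tokens
    # The distribution is then applied to head_emb, so head_emb never affects the weights.
    return weights @ head_emb[span_start:span_end + 1]             # [dim]
```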
The window size refers to the hyperparameter on the x-axis of Figure 2b in the GloVe paper (https://nlp.stanford.edu/pubs/glove.pdf). The wording in our paper is a bit confusing; we were just trying to say that the context_embeddings use GloVe vectors trained with a window size of 10 and the head_embeddings use vectors trained with a window size of 2.
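For concreteness, the window size is how many neighboring words on each side count as co-occurring when the GloVe co-occurrence statistics are collected. A toy sketch (symmetric window and, unlike actual GloVe, no distance weighting of the counts):

```python
from collections import Counter

def cooccurrence_counts(tokens, window_size):
    """Count (word, context_word) pairs within `window_size` positions on either side."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window_size), min(len(tokens), i + window_size + 1)):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts
```

With window_size=2 a word only co-occurs with its immediate neighbors; with window_size=10 its co-occurrences come from most of the sentence.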
For "3." above, I see how using a smaller window size for the head_embeddings compared to the context_embeddings makes sense. Because the head_embeddings are used to represent a span which is typically a few tokens vs context_embeddings which are used to represent entire sentences.
Nice idea.