How exactly does head embedding get used? #30

Closed
kalpitdixit opened this issue Aug 3, 2018 · 2 comments

Comments

@kalpitdixit

  1. glove_50_300_2.txt is downloaded as head_embeddings. What exactly does the 50 refer to?
  2. How are these used in coref_model.py? It seems that context_outputs is computed by an LSTM over the ELMo embeddings. Then, for each span, the context_outputs give a distribution over the tokens, and this distribution is used to get a weighted sum over the head_embeddings (see the sketch after this list):
    span_head_emb = tf.reduce_sum(span_attention * span_text_emb, 1) # [k, emb]

    What I want to confirm is that the head_embeddings are not used to calculate the distribution over the head_embeddings, correct?
  3. Also, from the paper's "Experimental Setup", I didn't follow the meaning of "window size" for the word embeddings and the LSTM:
    using GloVe word embeddings (Pennington et al., 2014) with a window size of 2 for the head word embeddings and a window size of 10 for the LSTM inputs.
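
To make sure I'm reading the code right, here is a minimal NumPy sketch of what I think happens for a single span. All names, shapes, and random values below are made up for illustration; only the split between the scores (from context_outputs) and the values (head_embeddings) is meant to mirror coref_model.py:

    import numpy as np

    rng = np.random.default_rng(0)
    num_words, lstm_dim, head_dim, span_width = 7, 4, 3, 3

    context_outputs = rng.normal(size=(num_words, lstm_dim))  # LSTM outputs over the ELMo inputs
    head_embeddings = rng.normal(size=(num_words, head_dim))  # GloVe head-word embeddings

    # One span covering tokens [2, 3, 4]; the model does this for all k spans at once.
    span_token_idx = np.arange(2, 2 + span_width)

    # The per-token scores come from a projection of context_outputs only.
    w = rng.normal(size=(lstm_dim,))
    span_scores = context_outputs[span_token_idx] @ w                 # [span_width]

    # Softmax over the tokens inside the span.
    span_attention = np.exp(span_scores - span_scores.max())
    span_attention /= span_attention.sum()

    # The distribution then weights the head_embeddings, not the context_outputs.
    span_head_emb = span_attention @ head_embeddings[span_token_idx]  # [head_dim]
    print(span_head_emb.shape)  # (3,)

If that matches the real code, then the head_embeddings never enter the computation of span_attention, which is what question 2 is asking about.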
@kentonl
Owner

kentonl commented Aug 3, 2018

  1. The 50 refers to the minimum-frequency threshold used when building the vocabulary.
  2. Yes, your interpretation is exactly right. You can think of it as separating the keys and values of the attention mechanism: the head_embeddings are the values, and the keys are determined by the context_embeddings (see the sketch after this list). In hindsight, I probably should have removed this separation for simplicity in the final version, since the improvement wasn't very large.
  3. The window size is referring to the hyperparameter on the x-axis in Figure 2b of the GloVe paper (https://nlp.stanford.edu/pubs/glove.pdf). The wording in the paper is a bit confusing. We were just trying to say that the context_embeddings have a window size of 10 and the head_embeddings have a window size of 2.
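
Concretely, here is a toy contrast with made-up numbers, just to illustrate the roles of the two embedding sets (not the actual implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    span_attention = np.array([0.2, 0.5, 0.3])   # softmax of scores computed from context_outputs (the keys)
    head_embeddings = rng.normal(size=(3, 3))    # GloVe head-word vectors for the span tokens (the values)
    context_outputs = rng.normal(size=(3, 4))    # LSTM outputs for the same tokens

    # Separated keys/values, as in the released code: the values are the head_embeddings.
    span_head_emb = span_attention @ head_embeddings          # shape (3,)

    # The simpler, non-separated variant mentioned above would reuse the
    # context_outputs as the values as well.
    span_head_emb_shared = span_attention @ context_outputs   # shape (4,)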

Hope that helps!

@kalpitdixit
Author

Thanks for the fast and complete answers!

For "3." above, I see how using a smaller window size for the head_embeddings compared to the context_embeddings makes sense. Because the head_embeddings are used to represent a span which is typically a few tokens vs context_embeddings which are used to represent entire sentences.
Nice idea.
