How exactly does head embedding get used? #30

Closed
kalpitdixit opened this issue Aug 3, 2018 · 2 comments

Comments

@kalpitdixit

  1. glove_50_300_2.txt is downloaded as head_embeddings. What exactly does the 50 refer to?
  2. How are these used in coref_model.py? It seems that context_outputs is computed by an LSTM over the ELMo embeddings. Then, for each span, the context_outputs give a distribution over the tokens, and this distribution is used to get a weighted sum over the head_embeddings (see the sketch after this list):
    span_head_emb = tf.reduce_sum(span_attention * span_text_emb, 1) # [k, emb]

    What I want to confirm is that the head_embeddings are not used to calculate the distribution over the head_embeddings, correct?
  3. Also, from the paper's "Experimental Setup", I didn't follow the meaning of "window size" for the word embeddings and the LSTM:
    using GloVe word embeddings (Pennington et al., 2014) with a window size of 2 for the head word embeddings and a window size of 10 for the LSTM inputs.
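
To make sure I'm reading the code right, here is a minimal NumPy sketch of what I think happens for a single span. All names, shapes, and random values below are made up for illustration; only the split between the scores (from context_outputs) and the values (head_embeddings) is meant to mirror coref_model.py:

    import numpy as np

    rng = np.random.default_rng(0)
    num_words, lstm_dim, head_dim, span_width = 7, 4, 3, 3

    context_outputs = rng.normal(size=(num_words, lstm_dim))  # LSTM outputs over the ELMo inputs
    head_embeddings = rng.normal(size=(num_words, head_dim))  # GloVe head-word embeddings

    # One span covering tokens [2, 3, 4]; the model does this for all k spans at once.
    span_token_idx = np.arange(2, 2 + span_width)

    # The per-token scores come from a projection of context_outputs only.
    w = rng.normal(size=(lstm_dim,))
    span_scores = context_outputs[span_token_idx] @ w                 # [span_width]

    # Softmax over the tokens inside the span.
    span_attention = np.exp(span_scores - span_scores.max())
    span_attention /= span_attention.sum()

    # The distribution then weights the head_embeddings, not the context_outputs.
    span_head_emb = span_attention @ head_embeddings[span_token_idx]  # [head_dim]
    print(span_head_emb.shape)  # (3,)

If that matches the real code, then the head_embeddings never enter the computation of span_attention, which is what question 2 is asking about.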
@kentonl
Owner

kentonl commented Aug 3, 2018

  1. The 50 refers to the minimum-frequency threshold used when building the vocabulary.
  2. Yes, your interpretation is exactly right. You can think of it as separating the keys and values of the attention mechanism: the head_embeddings are the values, and the keys are determined by the context_embeddings (see the sketch after this list). In hindsight, I probably should have removed this separation for simplicity in the final version, since the improvement wasn't very large.
  3. The window size is referring to the hyperparameter on the x-axis in Figure 2b of the GloVe paper (https://nlp.stanford.edu/pubs/glove.pdf). The wording in the paper is a bit confusing. We were just trying to say that the context_embeddings have a window size of 10 and the head_embeddings have a window size of 2.
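
Concretely, here is a toy contrast with made-up numbers, just to illustrate the roles of the two embedding sets (not the actual implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    span_attention = np.array([0.2, 0.5, 0.3])   # softmax of scores computed from context_outputs (the keys)
    head_embeddings = rng.normal(size=(3, 3))    # GloVe head-word vectors for the span tokens (the values)
    context_outputs = rng.normal(size=(3, 4))    # LSTM outputs for the same tokens

    # Separated keys/values, as in the released code: the values are the head_embeddings.
    span_head_emb = span_attention @ head_embeddings          # shape (3,)

    # The simpler, non-separated variant mentioned above would reuse the
    # context_outputs as the values as well.
    span_head_emb_shared = span_attention @ context_outputs   # shape (4,)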

Hope that helps!

@kalpitdixit
Author

Thanks for the fast and complete answers!

For "3." above, I see how using a smaller window size for the head_embeddings compared to the context_embeddings makes sense. Because the head_embeddings are used to represent a span which is typically a few tokens vs context_embeddings which are used to represent entire sentences.
Nice idea.
