Backward hidden state used to initialize decoder instead of Forward #2

Open
eridgd opened this issue Jun 3, 2019 · 0 comments
eridgd commented Jun 3, 2019

Hi,

I came across your paper while searching for seq2seq examples with aligned one-to-one mapping from inputs to labels. I appreciate the clarity of the code and am grateful that you made it available.

I haven't been able to run and step through the code yet, but one thing that seems to differ from the paper is the line where the decoder's initial hidden state is extracted from the encoder (https://github.com/liah-chan/sequence2sequenceNER/blob/master/scripts/train.py#L92):

last_hidden = (encoder_hidden[0][1].unsqueeze(0), encoder_hidden[1][1].unsqueeze(0))

encoder_hidden is the tuple (hidden, cell), each with shape (num_layers * num_directions, batch, hidden_size). It seems that encoder_hidden[0][1] would therefore be the hidden state of the backward direction (i.e., its state after reading back to word 0), whereas §4.3 of the paper states that the forward hidden state at the last word is used as the initial decoder hidden state:

On the decoder side, we use a single layer LSTM that generates label predictions step by step from the start to the end of the sentence. The last hidden state of the forward encoder RNN (−→ht) is used as the initial decoder hidden state.
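
To illustrate what I mean, here is a minimal sketch, assuming a single-layer bidirectional nn.LSTM in PyTorch (matching the repo's setup as I understand it); the variable names below are my own and only meant to show which slice of the hidden state corresponds to each direction:

import torch
import torch.nn as nn

# Hypothetical single-layer bidirectional encoder, just for illustration.
encoder = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, bidirectional=True)

seq_len, batch = 5, 2
inputs = torch.randn(seq_len, batch, 8)
outputs, (hidden, cell) = encoder(inputs)

# hidden has shape (num_layers * num_directions, batch, hidden_size) = (2, 2, 16).
# With one layer, index 0 is the forward direction and index 1 is the backward direction.
forward_last = hidden[0]   # forward state after reading the last word
backward_last = hidden[1]  # backward state after reading back to word 0

# What the paper describes (forward last hidden state as decoder init):
last_hidden_fwd = (hidden[0].unsqueeze(0), cell[0].unsqueeze(0))

# What train.py#L92 currently does (backward direction):
last_hidden_bwd = (hidden[1].unsqueeze(0), cell[1].unsqueeze(0))

If that reading of the indexing is right, the paper's description would correspond to indexing with [0] rather than [1].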

Was your intent to use the forward or backward state for initialization? And do you think it would make a practical difference, given that the decoder also receives the aligned forward/backward encoder outputs at each step?

Thanks!
