Hi,
I came across your paper while searching for seq2seq examples with an aligned one-to-one mapping from inputs to labels. I appreciate the clarity of the code and am grateful that you made it available.
I haven't been able to run and step through the code yet, but one thing I noticed that seems to differ from the paper is this line, where the initial hidden state for the decoder is extracted from the encoder (https://github.com/liah-chan/sequence2sequenceNER/blob/master/scripts/train.py#L92):
last_hidden = (encoder_hidden[0][1].unsqueeze(0), encoder_hidden[1][1].unsqueeze(0))
encoder_hidden is the tuple (hidden, cell), each with shape (num_layers * num_directions, batch, hidden_size). It seems that encoder_hidden[0][1] would therefore be the hidden state of the backward direction at word 0, whereas the paper states in §4.3 that the forward state at word = seq_length is the initial hidden state:

> On the decoder side, we use a single layer LSTM that generates label predictions step by step from the start to the end of the sentence. The last hidden state of the forward encoder RNN (→h_t) is used as the initial decoder hidden state.

Was your intent to use the forward or backward state for initialization? And do you think it would make a real difference, given that the decoder also receives the aligned forward/backward output states at each step?
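For concreteness, here is a minimal, runnable sketch of the indexing I mean (the model and tensor sizes are made up for illustration; only PyTorch's documented state layout for a bidirectional LSTM matters):

```python
import torch
import torch.nn as nn

# Toy 1-layer bidirectional LSTM encoder (sizes invented for illustration).
batch, seq_len, input_size, hidden_size = 4, 7, 16, 32
encoder = nn.LSTM(input_size, hidden_size, num_layers=1,
                  bidirectional=True, batch_first=True)

x = torch.randn(batch, seq_len, input_size)
outputs, (h_n, c_n) = encoder(x)

# h_n: (num_layers * num_directions, batch, hidden_size) = (2, 4, 32).
# Per the PyTorch docs, index 0 is the forward direction (its state after
# the last word) and index 1 is the backward direction (its state after
# word 0, since it reads the sentence right to left).
forward_last = h_n[0]   # the paper's →h_t
backward_last = h_n[1]  # what encoder_hidden[0][1] selects in train.py#L92

# Initializing the unidirectional decoder from the forward state would be:
last_hidden = (h_n[0].unsqueeze(0), c_n[0].unsqueeze(0))
```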
Thanks!