Initialization of decoder's states #15

Open
tma15 opened this issue Apr 1, 2019 · 0 comments

tma15 commented Apr 1, 2019

Thank you for the great work. NMTKit is easy to follow and has helped me understand how to use DyNet.

I have a question about how the decoder's states are initialized from the final states of the encoder.

https://github.com/odashi/nmtkit/blob/master/nmtkit/luong_decoder.cc#L51

Decoder::State LuongDecoder::prepare(
    const vector<DE::Expression> & seed,
    dynet::ComputationGraph * cg,
    const bool is_training) {
  NMTKIT_CHECK_EQ(2 * num_layers_, seed.size(), "Invalid number of initial states.");
  vector<DE::Expression> states;
  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }
  rnn_.set_dropout(is_training ? dropout_rate_ : 0.0f);
  rnn_.new_graph(*cg);
  rnn_.start_new_sequence(states);
  dec2out_.prepare(cg);
  // Zero vector for the initial feeding value.
  const DE::Expression init_feed = DE::input(
      *cg, {out_embed_size_}, vector<float>(out_embed_size_, 0.0f));
  return {{rnn_.state()}, {init_feed}};
}

As I understand it, seed holds the encoder's final states (cell states followed by hidden states), obtained from the encoder's getStates(). The function above builds 2 * num_layers_ elements of states, but the last num_layers_ elements are derived from the first num_layers_ elements of states rather than from seed, because of the following loops:

  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }
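
To make the data flow concrete, here is a minimal sketch that mirrors the two loops with plain scalars instead of DE::Expression. The function names project and init_states are hypothetical, project is only a stand-in for enc2dec_[i].compute(), and the layout seed = {c_1..c_n, h_1..h_n} is an assumption taken from the description above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for enc2dec_[i].compute(); the real layer
// in NMTKit operates on DyNet expressions.
double project(double x) { return 0.5 * x + 0.1; }

// Mirrors the two loops with scalars. seed is assumed to be laid out as
// {c_1..c_n, h_1..h_n}, where n == num_layers.
std::vector<double> init_states(const std::vector<double>& seed,
                                std::size_t num_layers) {
  assert(seed.size() == 2 * num_layers);
  std::vector<double> states;
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(project(seed[i]));      // reads seed[0 .. n-1] only
  }
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(std::tanh(states[i]));  // reads states[i], never seed[n + i]
  }
  return states;
}
```

In this toy version, changing any of seed[n .. 2n-1] leaves the result unchanged, which is the behavior being asked about.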

So I think every element of states is computed only from the encoder's cell states, not from both the cell and hidden states. Ignoring the enc2dec_ transformation, states = {c_1, c_2, ..., c_n, tanh(c_1), tanh(c_2), ..., tanh(c_n)}.
Is there a reason for doing this instead of something like the following?

  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(seed[num_layers_ + i])); // dimension of seed[num_layers_ + i] will need to be reduced if encoder is bidirectional
  }
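
The dimension reduction mentioned in the comment could look roughly like the sketch below, written with plain vectors rather than DyNet expressions. Everything here is an assumption for illustration: the names Vec, reduce_bidirectional, tanh_all and init_states_from_hidden are hypothetical, the bidirectional state is assumed to be the concatenation [forward; backward], and averaging the two halves is just one possible reduction (a learned projection would be another).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical reduction: average the forward and backward halves of a
// concatenated bidirectional state, halving its dimension.
Vec reduce_bidirectional(const Vec& h) {
  assert(h.size() % 2 == 0);
  const std::size_t half = h.size() / 2;
  Vec out(half);
  for (std::size_t j = 0; j < half; ++j) {
    out[j] = 0.5 * (h[j] + h[half + j]);
  }
  return out;
}

// Element-wise tanh.
Vec tanh_all(const Vec& v) {
  Vec out(v.size());
  for (std::size_t j = 0; j < v.size(); ++j) out[j] = std::tanh(v[j]);
  return out;
}

// Proposed variant: the second half of states comes from the encoder's
// hidden states. seed is assumed to be {c_1..c_n, h_1..h_n}.
std::vector<Vec> init_states_from_hidden(const std::vector<Vec>& seed,
                                         std::size_t num_layers) {
  assert(seed.size() == 2 * num_layers);
  std::vector<Vec> states;
  for (std::size_t i = 0; i < num_layers; ++i) {
    // Stand-in for enc2dec_[i].compute(seed[i]).
    states.push_back(reduce_bidirectional(seed[i]));
  }
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(tanh_all(reduce_bidirectional(seed[num_layers + i])));
  }
  return states;
}
```

With this variant, both halves of seed contribute to the decoder's initial states, and every element of states has the reduced dimension.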