Decoder::State LuongDecoder::prepare(
    const vector<DE::Expression> & seed,
    dynet::ComputationGraph * cg,
    const bool is_training) {
  NMTKIT_CHECK_EQ(2 * num_layers_, seed.size(), "Invalid number of initial states.");
  vector<DE::Expression> states;
  for (unsigned i = 0; i < num_layers_; ++i) {
    enc2dec_[i].prepare(cg);
    states.emplace_back(enc2dec_[i].compute(seed[i]));
  }
  for (unsigned i = 0; i < num_layers_; ++i) {
    states.emplace_back(DE::tanh(states[i]));
  }
  rnn_.set_dropout(is_training ? dropout_rate_ : 0.0f);
  rnn_.new_graph(*cg);
  rnn_.start_new_sequence(states);
  dec2out_.prepare(cg);
  // Zero vector for the initial feeding value.
  const DE::Expression init_feed = DE::input(
      *cg, {out_embed_size_}, vector<float>(out_embed_size_, 0.0f));
  return {{rnn_.state()}, {init_feed}};
}
In my understanding, seed is the final state (cell and hidden states) of the encoder, obtained from the encoder's getStates(). Of the 2 * num_layers_ elements of states computed in the function above, the last num_layers_ elements are derived from the first num_layers_ elements, because of the following:
for (unsigned i = 0; i < num_layers_; ++i) {
  enc2dec_[i].prepare(cg);
  states.emplace_back(enc2dec_[i].compute(seed[i]));
}
for (unsigned i = 0; i < num_layers_; ++i) {
  states.emplace_back(DE::tanh(states[i]));
}
So I guess every element of states is computed from the encoder's cell states only, not from both the cell and hidden states; for example, states = {c_1, c_2, ..., c_n, tanh(c_1), tanh(c_2), ..., tanh(c_n)}.
Is there a reason for doing it this way rather than also using the encoder's hidden states, as in the following example?
for (unsigned i = 0; i < num_layers_; ++i) {
  enc2dec_[i].prepare(cg);
  states.emplace_back(enc2dec_[i].compute(seed[i]));
}
for (unsigned i = 0; i < num_layers_; ++i) {
  states.emplace_back(DE::tanh(seed[num_layers_ + i])); // dimension of seed[num_layers_ + i] will need to be reduced if encoder is bidirectional
}
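To make the difference between the two variants concrete, here is a minimal, self-contained sketch that uses plain std::vector<float> in place of DyNet expressions. It is an illustration only, not NMTKit code: Vec, tanh_vec, init_from_cells_only and init_from_cells_and_hiddens are hypothetical names, the enc2dec_ projection is replaced by the identity, and the dimension reduction needed for a bidirectional encoder is ignored. It also assumes, as in the question above, that seed holds the num_layers cell states first, followed by the num_layers hidden states.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;  // hypothetical stand-in for DE::Expression

// Element-wise tanh, standing in for DE::tanh.
Vec tanh_vec(const Vec & x) {
  Vec y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = std::tanh(x[i]);
  return y;
}

// Current behaviour: the second half of states is tanh of the first half,
// so only seed[0 .. num_layers - 1] (the cell states) is ever read.
std::vector<Vec> init_from_cells_only(
    const std::vector<Vec> & seed, std::size_t num_layers) {
  std::vector<Vec> states;
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(seed[i]);  // assumption: identity instead of enc2dec_
  }
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(tanh_vec(states[i]));
  }
  return states;
}

// Proposed alternative: the second half of states is computed from
// seed[num_layers + i] (the hidden states) instead.
std::vector<Vec> init_from_cells_and_hiddens(
    const std::vector<Vec> & seed, std::size_t num_layers) {
  std::vector<Vec> states;
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(seed[i]);  // assumption: identity instead of enc2dec_
  }
  for (std::size_t i = 0; i < num_layers; ++i) {
    states.push_back(tanh_vec(seed[num_layers + i]));
  }
  return states;
}
```

With num_layers = 2 and seed = {c_1, c_2, h_1, h_2}, the first function returns {c_1, c_2, tanh(c_1), tanh(c_2)} and never reads h_1 or h_2, while the second returns {c_1, c_2, tanh(h_1), tanh(h_2)}.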
Thank you for the great work. NMTKit is easy to follow and helps me understand how to use DyNet.
I have a question about how the decoder's states are initialized from the final states of the encoder.
https://github.com/odashi/nmtkit/blob/master/nmtkit/luong_decoder.cc#L51