Normally, a transformer encoder-decoder model passes the encoder's last hidden state to the decoder as input. In UniXcoder, however, the hidden state returned by the encoder does not seem to be passed to the decoder when generating output. The only value I can see being used is the variable past_hidden_state. So I'm not sure whether the implementation is correct.
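To make the question concrete, here is a toy sketch of the one mechanism I can think of that would make this work. It is an assumption on my part, not something taken from the UniXcoder code: if the cached variable holds the attention keys and values computed over the source tokens, and the same layers serve as both encoder and decoder, then decoding against that cache should give the same result as attending over source and target together. All names here (full_pass, cached_pass, the weight matrices) are hypothetical, and the example uses a single attention layer in plain Python.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def full_pass(tokens, n_src, Wq, Wk, Wv):
    """One pass over source + target with a prefix mask: source positions
    see the whole source, target positions see the source plus earlier
    (and current) target positions."""
    keys = [matvec(Wk, t) for t in tokens]
    values = [matvec(Wv, t) for t in tokens]
    outputs = []
    for i, t in enumerate(tokens):
        visible = n_src if i < n_src else i + 1
        outputs.append(attend(matvec(Wq, t), keys[:visible], values[:visible]))
    return outputs


def cached_pass(target_tokens, cached_keys, cached_values, Wq, Wk, Wv):
    """Decode target tokens using only the cached source keys/values;
    the source hidden states themselves are never passed in."""
    keys, values = list(cached_keys), list(cached_values)
    outputs = []
    for t in target_tokens:
        keys.append(matvec(Wk, t))
        values.append(matvec(Wv, t))
        outputs.append(attend(matvec(Wq, t), keys, values))
    return outputs


random.seed(0)
dim = 4


def rand_matrix():
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


def rand_tokens(n):
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]


Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
source, target = rand_tokens(3), rand_tokens(2)

# Reference: attend over source and target together, keep the target outputs.
full = full_pass(source + target, len(source), Wq, Wk, Wv)[len(source):]

# "Encoder" step: cache the source keys and values only.
cached_keys = [matvec(Wk, t) for t in source]
cached_values = [matvec(Wv, t) for t in source]
cached = cached_pass(target, cached_keys, cached_values, Wq, Wk, Wv)

max_diff = max(abs(a - b)
               for full_row, cached_row in zip(full, cached)
               for a, b in zip(full_row, cached_row))
print(max_diff < 1e-9)
```

If UniXcoder works this way, the encoder output would still reach the decoder, just through the cached keys and values rather than as a separate hidden-state argument. I have not confirmed that this is what the code does.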