Good work there! I've read the Decoder code and the pre-training data preprocessing code, and I'm a bit confused. It seems that neither of them adds an EOS (end-of-sequence) token to the target id list. Is this intentional? I thought it would be necessary for correct training.
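
To make the question concrete, here is a minimal sketch of what I expected the preprocessing to do, assuming the token in question is an end-of-sequence marker and the decoder is trained with teacher forcing. The function name `build_decoder_pair` and the ids `BOS_ID` and `EOS_ID` are hypothetical and not taken from this repo.

```python
from typing import List, Tuple

# Hypothetical special-token ids; the real values depend on the vocabulary.
BOS_ID = 1
EOS_ID = 2


def build_decoder_pair(
    token_ids: List[int], bos_id: int = BOS_ID, eos_id: int = EOS_ID
) -> Tuple[List[int], List[int]]:
    """Return (decoder_input, target) for teacher forcing.

    The decoder input is the sequence shifted right by a start token,
    and the target is the same sequence with an end token appended,
    so the model is also trained to predict where the sequence stops.
    """
    decoder_input = [bos_id] + token_ids
    target = token_ids + [eos_id]
    return decoder_input, target


decoder_input, target = build_decoder_pair([11, 12, 13])
print(decoder_input)
print(target)
```

Without the appended end token, I would expect the model never to see a training signal for stopping, which is why I am asking whether it is handled somewhere else.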