-
|
To fine-tune whisper's decoder, I should make a batch of training samples. Each sample has an audio_features tensor, which is the output of encoder, and a tokens tensor. @jongwook How did you do it while training, please? Similar problem is discussed here. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
|
You can pad the tensor (to the right) with any token, and the choice does not affect the training because of the autoregressive attention mask used in the decoder. It is also important that the loss is masked accordingly. In the code, the mask is implemented as the following: Line 175 in 28769fc |
Beta Was this translation helpful? Give feedback.
You can pad the tensor (to the right) with any token, and the choice does not affect the training because of the autoregressive attention mask used in the decoder. It is also important that the loss is masked accordingly.
In the code, the mask is implemented as the following:
whisper/whisper/model.py
Line 175 in 28769fc