I am hoping to use Whisper for a research project, and I need to make sure that it hasn't been trained on the data I'll be using for evaluation. I'm using Switchboard NXT (because of the rich annotation set available), and will be evaluating on the conventional dev and test sections of the corpus. I couldn't find info in the paper about whether existing corpora like Switchboard were used, though I may have missed it. Is Switchboard included in Whisper's training data? If so, was all of it included, or was some portion of dev/test data omitted?
Replies: 1 comment 1 reply
Hi, we did not use Switchboard as part of our training mix. However, due to the noisy nature of the large training dataset, there might be a small number of samples of the audio from the dataset scattered into the mix.