Datasets: Removal of ASR transcriptions #384
Unanswered
richardburleigh asked this question in Q&A
Replies: 0 comments
From the paper:
In order to avoid learning “transcript-ese”, we developed many heuristics to detect and remove machine-generated transcripts from the training dataset.

Is this something you are willing to release? It would really benefit community efforts to fine-tune Whisper.