[ASR] Add official ASR CTC example to examples/pytorch/speech-recognition
#13620
…into add_asr_example
# 3. Next, we create the vocabulary of the model by extracting all unique characters from
# the training and evaluation datasets
# We need to make sure that only first rank saves vocabulary
if training_args.world_size == 1 or dist.get_rank() == 0:
This caused me a headache for 3 days: in distributed training, each process was creating a different ordering of characters in the vocabulary, which essentially meant that each process had different label ids.
By using sorted(...) and making sure that only the first process creates and saves the vocabulary, the problem is solved.
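For readers following along, here is a minimal sketch of the fix described above. It is not the script's actual code: `build_vocab` and `save_vocab_if_main_process` are hypothetical helpers, and the rank is passed in as a plain argument, whereas a real distributed run would obtain it from something like `torch.distributed.get_rank()`.

```python
import json
import os
import tempfile


def build_vocab(transcripts):
    """Map every unique character to an id, in a deterministic order."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    # sorted() removes any dependence on set iteration order, which can
    # differ between Python processes because of string hash randomization.
    return {char: idx for idx, char in enumerate(sorted(chars))}


def save_vocab_if_main_process(vocab, path, rank):
    """Write the vocabulary from rank 0 only (hypothetical helper)."""
    if rank != 0:
        return False
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    return True


transcripts = ["hello world", "speech"]
vocab = build_vocab(transcripts)

vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.json")
# Only the call with rank=0 touches the file; other ranks skip the write.
wrote_rank0 = save_vocab_if_main_process(vocab, vocab_path, rank=0)
wrote_rank1 = save_vocab_if_main_process(vocab, vocab_path, rank=1)
```

With the sorted ordering, every process that builds the vocabulary from the same data gets the same character-to-id mapping, and restricting the write to one rank avoids several processes overwriting the same file.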
Nice find!
Thanks a lot for adding this example and great job figuring out the problem in a distributed setup!
Looks really good! Thanks for adding this example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Looks good, thank you very much for figuring out the DDP problem!
The torchaudio loader seems to be the best fit for the example 🙂
Although I think Windows users will be out of luck when they try to load mp3s (soundfile is used as a backend there, and it specifically excludes mp3: http://www.mega-nerd.com/libsndfile/#Features)
P.S. So sorry for the typo spam 😅
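One way to surface the Windows/mp3 concern early is a small pre-flight check before any audio is loaded. The sketch below is only an illustration under the assumption stated in the comment above (mp3 on Windows is the problematic case); `may_fail_to_load` is a hypothetical helper, not part of the example script or of torchaudio.

```python
import os
import platform


def may_fail_to_load(audio_path, system=None):
    """Return True for the case flagged above: an mp3 file on Windows.

    `system` defaults to platform.system(); it is a parameter only so the
    check can be exercised on any machine.
    """
    system = system or platform.system()
    extension = os.path.splitext(audio_path)[1].lower()
    return system == "Windows" and extension == ".mp3"


# Example: decide whether to warn the user before starting preprocessing.
if may_fail_to_load("clip.mp3"):
    print("Warning: mp3 decoding may not be available on this platform.")
```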
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
…ition` (huggingface#13620)
* up
* rename
* add asr example
* add auto feature extractor
* some more fixes
* correct layerdrop
* correct for multi-gpu dist
* clean up
* refactor
* refactor
* more fixes
* more fixes
* clean-up
* finish
* up
* Apply suggestions from code review
* fix isort
* update
* up
* add note
* apply surajs suggestions
* Apply suggestions from code review
  Co-authored-by: Suraj Patil <surajp815@gmail.com>
* isort
* small change
* Apply suggestions from code review
  Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* Apply suggestions from code review
  Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* add hubert
* Update examples/pytorch/speech-recognition/run_speech_recognition_ctc.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
This PR adds a generic CTC speech recognition example. It has been tested for single-GPU and distributed training on Common Voice and is currently being tested on Librispeech.
Once datasets has https://github.com/huggingface/datasets/pull/2324/files merged and a new release is out, I will slightly adapt the script to leverage the new audio feature.
A couple of example runs with this script:
This example folder should have two additional scripts: one for Seq2Seq ASR and one for CTC + LM decoding, both of which are left for future work.