
[ASR] Add official ASR CTC example to examples/pytorch/speech-recognition #13620

Merged (33 commits) on Sep 24, 2021

patrickvonplaten (Contributor) commented Sep 17, 2021

This PR adds a generic CTC speech recognition example. It has been tested with single-GPU and distributed training on Common Voice and is currently being tested on LibriSpeech.

Once `datasets` has merged https://github.com/huggingface/datasets/pull/2324/files and made a new release, I will slightly adapt the script to use the new audio feature.

A couple of example runs with this script:

This example folder should eventually contain two additional scripts, one for Seq2Seq ASR and one for CTC + LM decoding; both are left for future work.

# 3. Next, we create the vocabulary of the model by extracting all unique characters from
# the training and evaluation datasets
# We need to make sure that only first rank saves vocabulary
if training_args.world_size == 1 or dist.get_rank() == 0:
Comment by the PR author (Contributor Author):

This caused me a headache for 3 days: in distributed training, each process was creating a different ordering of characters in the vocabulary, which meant that each process had different label ids.

By using sorted(...) and making sure that only the first process creates & saves the vocabulary, the problem is solved.
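A minimal standard-library sketch of the fix described above, assuming the transcripts are available as a plain list of strings. The names `build_vocab` and `save_vocab`, the `|` word delimiter and the `[UNK]`/`[PAD]` tokens are illustrative assumptions, not necessarily what the script does. One plausible root cause (not confirmed in this thread) is that Python randomizes string hashing per interpreter process, so iterating over a `set` of characters can yield a different order in each distributed worker; sorting removes that dependence.

```python
import json
import os


def build_vocab(texts, word_delimiter_token="|"):
    """Map every character in `texts` to an integer id, deterministically.

    Collecting characters in a set is fine, but iterating over that set
    directly is not: string hashing is randomized per interpreter
    (PYTHONHASHSEED), so each worker process may see a different order.
    Sorting fixes the order, so every process gets the same label ids.
    """
    chars = set()
    for text in texts:
        chars.update(text)
    vocab = {ch: idx for idx, ch in enumerate(sorted(chars))}

    # Assumption: the space character is replaced by a visible word
    # delimiter that reuses the space's id.
    if " " in vocab:
        vocab[word_delimiter_token] = vocab.pop(" ")

    # Hypothetical special tokens appended at the end of the vocabulary.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def save_vocab(vocab, output_dir, is_main_process):
    """Write vocab.json on the main process only and return its path.

    In real distributed code, the other ranks should wait (for example with
    torch.distributed.barrier()) before reading the file.
    """
    path = os.path.join(output_dir, "vocab.json")
    if is_main_process:
        os.makedirs(output_dir, exist_ok=True)
        with open(path, "w") as f:
            json.dump(vocab, f)
    return path
```

With `sorted(...)`, `build_vocab(["ab c", "ca"])` and `build_vocab(["ca", "ab c"])` return the same mapping in every process; the main-process guard then only prevents several ranks from writing the same file concurrently.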

Reply from a Member:

Nice find!

patrickvonplaten changed the title [WIP][ASR] Add official ASR CTC example to examples/pytorch/speech-recognition to [ASR] Add official ASR CTC example to examples/pytorch/speech-recognition on Sep 22, 2021
sgugger (Collaborator) left a comment:

Thanks a lot for adding this example and great job figuring out the problem in a distributed setup!

Review comment on examples/pytorch/speech-recognition/README.md (outdated, resolved)
patil-suraj (Contributor) left a comment:

Looks really good! Thanks for adding this example

patrickvonplaten and others added 3 commits September 23, 2021 11:19
Co-authored-by: Suraj Patil <surajp815@gmail.com>
anton-l (Member) left a comment:

Looks good, thank you very much for figuring out the DDP problem!

The torchaudio loader seems to be the best fit for the example 🙂
However, I think Windows users will be out of luck when they try to load MP3 files (soundfile is used as the backend there, and it specifically excludes MP3: http://www.mega-nerd.com/libsndfile/#Features).

P.S. So sorry for the typo spam 😅

Review comment on examples/pytorch/speech-recognition/README.md (resolved)
Review comment on src/transformers/models/hubert/configuration_hubert.py (outdated, resolved)
Review comment on src/transformers/models/hubert/configuration_hubert.py (outdated, resolved)
patrickvonplaten and others added 4 commits September 23, 2021 16:51
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
patrickvonplaten merged commit 4a320f6 into huggingface:master on Sep 24, 2021
patrickvonplaten deleted the add_asr_example branch on September 24, 2021, 05:01
stas00 pushed a commit to stas00/transformers that referenced this pull request Oct 12, 2021
…ition` (huggingface#13620)

* up

* rename

* add asr example

* add auto feature extractor

* some more fixes

* correct layerdrop

* correct for multi-gpu dist

* clean up

* refactor

* refactor

* more fixes

* more fixes

* clean-up

* finish

* up

* Apply suggestions from code review

* fix isort

* update

* up

* add note

* apply surajs suggestions

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* isort

* small change

* Apply suggestions from code review

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Apply suggestions from code review

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* add hubert

* Update examples/pytorch/speech-recognition/run_speech_recognition_ctc.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 13, 2022
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022

5 participants