Add a new EGS2 recipe 'reazonspeech' #4885
Hi @fujimotos, please fix the CI, then I can merge it. Thanks!
This pull request is now in conflict :(
ReazonSpeech is a >19000h Japanese corpus collected from TV programs. This adds an ASR recipe that trains a Conformer model on it.

* The dataset is automatically downloaded from Hugging Face using `datasets.load_dataset()`
* The training recipe is almost the same as CSJ, except for minor parameter tweaks (to optimize it for our machine spec)

The pre-trained model is available on Hugging Face: https://huggingface.co/reazon-research/reazonspeech-espnet-v1

Signed-off-by: Fujimoto Seiji <fujimoto@clear-code.com>
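As a rough illustration of the download and data-preparation step described above, the following sketch fetches the corpus with `datasets.load_dataset()` and writes Kaldi-style data files. It is not the recipe's actual `local/data.py`: the dataset id `reazon-research/reazonspeech`, the `"tiny"` config name, and the helper names `load_reazonspeech` and `write_kaldi_files` are assumptions made for this example, and the absence of speaker labels is assumed as well.

```python
import os


def load_reazonspeech(config="tiny", cache_dir=None):
    """Download ReazonSpeech via Hugging Face `datasets` (needs network access)."""
    # Imported lazily so write_kaldi_files() works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the dataset id and the "tiny" config name; check the dataset card.
    return load_dataset("reazon-research/reazonspeech", config, cache_dir=cache_dir)


def write_kaldi_files(records, outdir):
    """Write wav.scp, text and utt2spk from (utt_id, wav_path, transcript) tuples."""
    os.makedirs(outdir, exist_ok=True)
    rows = sorted(records)  # Kaldi-style tools expect entries sorted by utterance id
    paths = [os.path.join(outdir, name) for name in ("wav.scp", "text", "utt2spk")]
    with open(paths[0], "w", encoding="utf-8") as wavscp, \
            open(paths[1], "w", encoding="utf-8") as text, \
            open(paths[2], "w", encoding="utf-8") as utt2spk:
        for utt_id, wav_path, transcript in rows:
            print(utt_id, wav_path, file=wavscp)
            print(utt_id, transcript, file=text)
            # No speaker labels are assumed, so each utterance is its own speaker.
            print(utt_id, utt_id, file=utt2spk)
    return len(rows)
```

`write_kaldi_files` takes plain tuples, so it can be tried on a handful of hand-written records before pointing it at the downloaded corpus.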
Based on feedback on PR#4885. Instead of hard-coding the data path, make it adjustable via db.sh.

Signed-off-by: Fujimoto Seiji <fujimoto@clear-code.com>
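The same idea on the Python side can be sketched as follows: rather than embedding a path in the script, the data directory is taken as an argument that the calling shell script supplies (for instance, from a variable defined in db.sh). The `--datadir` flag and its `downloads` default are hypothetical names chosen for this example, not necessarily what the recipe uses.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a data-preparation script (illustrative)."""
    parser = argparse.ArgumentParser(description="Prepare corpus data")
    # Hypothetical flag: the caller passes the corpus location instead of
    # the script hard-coding it.
    parser.add_argument(
        "--datadir",
        default="downloads",
        help="directory where the corpus is stored or downloaded",
    )
    return parser.parse_args(argv)
```

A caller would then run something like `python local/data.py --datadir "$SOME_PATH"`, where the variable name is whatever the configuration file defines.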
Based on feedback on PR#4885. Instead of letting users find out how to install the required modules by themselves, let's automate those steps.

Signed-off-by: Fujimoto Seiji <fujimoto@clear-code.com>
This resolves coding-style warnings emitted by linters:

* Quote the `test_sets` variable to resolve a shellcheck warning.
* Apply isort and black to reformat `local/data.py`.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
tools/installers/install_datasets.sh seems to be empty.
@ftshijt The CI error should be fixed by a92301e:
Also, I fixed the merge conflict by rebasing onto the master HEAD. Feel free to ask me if anything is not clear!
@sw005320 It must be OK. Basically @wanchichen added
So I did
Got it. Thanks!
Codecov Report
@@            Coverage Diff             @@
##           master    #4885      +/-   ##
==========================================
+ Coverage   67.06%   76.58%   +9.52%
==========================================
  Files         603      603
  Lines       53737    53737
==========================================
+ Hits        36039    41155    +5116
+ Misses      17698    12582    -5116
Thanks for the fix! Not sure how that happened.