single language track setups #4895

Merged
merged 5 commits into from Jan 31, 2023

DanBerrebbi
Contributor

I changed some of the languages in the single-language track:
I removed pol because it is similar to rus, and nob because it is similar to swe.
I added French because there was no Romance language (French, Spanish, Italian, Portuguese, ...) and Swahili because there was no African language.
The dataset selections are summarized on the last page of https://docs.google.com/document/d/1sb8SyDjcMf7FDiZHH8wVcZ0EADtXdNBF3LpA9Cu0I1k/edit

[Screenshot, 2023-01-30 at 11:51:17]

For the test set format, I think it is good to keep only one test set per language, containing all the datasets of that language. This keeps decoding simple, and we can then split the decoded file to get scores per dataset and compute metrics for domain shift. So we keep flexibility for scoring and the user has a simple process.
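A minimal sketch of that post-decoding split, not the recipe's actual scoring code. It assumes Kaldi-style `utt_id transcript` lines and a hypothetical ID convention where the dataset name is the prefix before the first underscore; `dataset_of`, `split_by_dataset`, and `cer_per_dataset` are illustrative names, and the character error rate here is a plain edit-distance ratio rather than the toolkit's scorer.

```python
from collections import defaultdict


def dataset_of(utt_id: str) -> str:
    """Assumed (hypothetical) ID convention: '<dataset>_<rest>'."""
    return utt_id.split("_", 1)[0]


def split_by_dataset(lines):
    """Group 'utt_id transcript' lines into {dataset: {utt_id: text}}."""
    groups = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, _, text = line.partition(" ")
        groups[dataset_of(utt_id)][utt_id] = text
    return dict(groups)


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def cer_per_dataset(ref_lines, hyp_lines):
    """Character error rate per dataset; a missing hypothesis counts as empty."""
    refs = split_by_dataset(ref_lines)
    hyps = split_by_dataset(hyp_lines)
    scores = {}
    for name, ref_utts in refs.items():
        hyp_utts = hyps.get(name, {})
        errors = sum(
            edit_distance(text, hyp_utts.get(utt_id, ""))
            for utt_id, text in ref_utts.items()
        )
        chars = sum(len(text) for text in ref_utts.values())
        scores[name] = errors / chars if chars else 0.0
    return scores
```

For example, with references `["dsA_utt1 hello", "dsB_utt1 merci"]` and hypotheses `["dsA_utt1 hello", "dsB_utt1 merca"]`, `cer_per_dataset` returns `{"dsA": 0.0, "dsB": 0.2}`. Since the user only ever decodes one file per language, any per-dataset breakdown like this stays on the scoring side.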

Points to discuss:

  • language choices
  • VoxPopuli is not working
  • Should we use speed perturbation? In my opinion, no.

@mergify mergify bot added the ESPnet2 label Jan 30, 2023
Collaborator

@ftshijt ftshijt left a comment

Looks good to me. Please follow our discussion and fix the CI issues. Let's accelerate the process~ Thanks @DanBerrebbi

egs2/msuperb/asr1/local/single_lang_data_prep.py (outdated, resolved)

codecov bot commented Jan 31, 2023

Codecov Report

Merging #4895 (2bf7dd2) into master (e37ee27) will increase coverage by 3.48%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4895      +/-   ##
==========================================
+ Coverage   73.10%   76.58%   +3.48%     
==========================================
  Files         603      603              
  Lines       53709    53737      +28     
==========================================
+ Hits        39264    41155    +1891     
+ Misses      14445    12582    -1863     
Flag Coverage Δ
test_integration_espnet1 66.33% <ø> (ø)
test_integration_espnet2 47.60% <ø> (ø)
test_python 66.45% <ø> (+3.55%) ⬆️
test_utils 23.35% <ø> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
espnet2/uasr/espnet_model.py 0.00% <0.00%> (ø)
espnet/nets/pytorch_backend/e2e_vc_transformer.py 86.72% <0.00%> (+0.11%) ⬆️
espnet/nets/pytorch_backend/rnn/attentions.py 98.12% <0.00%> (+0.13%) ⬆️
espnet/nets/pytorch_backend/e2e_vc_tacotron2.py 80.48% <0.00%> (+0.15%) ⬆️
espnet/nets/chainer_backend/e2e_asr_transformer.py 69.59% <0.00%> (+0.20%) ⬆️
espnet/nets/pytorch_backend/lm/seq_rnn.py 86.88% <0.00%> (+0.21%) ⬆️
espnet2/bin/asr_transducer_inference.py 94.04% <0.00%> (+0.39%) ⬆️
espnet2/svs/espnet_model.py 6.25% <0.00%> (+0.52%) ⬆️
espnet2/enh/layers/dnn_beamformer.py 97.74% <0.00%> (+0.56%) ⬆️
espnet2/diar/espnet_model.py 96.29% <0.00%> (+0.61%) ⬆️
... and 49 more


@sw005320 sw005320 added the ASR (Automatic speech recognition) and Recipe labels Jan 31, 2023
@sw005320 sw005320 added this to the v.202301 milestone Jan 31, 2023
Collaborator

ftshijt commented Jan 31, 2023

Many thanks! Looks great to me.

@ftshijt ftshijt merged commit 6a83c97 into espnet:master Jan 31, 2023
Labels
ASR (Automatic speech recognition), ESPnet2, Recipe

3 participants