[Recipe] Add iwslt22 low resource speech translation task for egs2 #4994
2b15d04 to 39f5490
I have rebased it onto the latest master branch.
Thanks a lot!
# train_full comprises a 19 hour version of this corpus,
# including 2 additional hours of data that was labeled by annotators as potentially noisy
mkdir -p data/train/org
mkdir -p data/train_full/org
- mkdir -p data/train_full/org
+ mkdir -p data/train_full/org
Many thanks! Looks perfect to me. I have just left some notes on minor formatting issues.
egs2/iwslt22_low_resource/st1/run.sh
tgt_case=tc

./st.sh \
--st_tag wav2vec-transformer-warmup-15k \
--st_tag wav2vec-transformer-warmup-15k \
egs2/iwslt22_low_resource/st1/run.sh

./st.sh \
--st_tag wav2vec-transformer-warmup-15k \
--ignore_init_mismatch true \
--ignore_init_mismatch true \
This is not necessary, since you do not use any pre-trained model with mismatched keys.
egs2/iwslt22_low_resource/st1/run.sh
--tgt_nbpe $tgt_nbpe \
--tgt_case ${tgt_case} \
--feats_type "raw" \
--feats_normalize uttmvn \
- --feats_normalize uttmvn \
+ --feats_normalize utterance_mvn \
egs2/iwslt22_low_resource/st1/run.sh
--nj 16 \
--inference_nj 16 \
--nj 16 \
--inference_nj 16 \
egs2/iwslt22_low_resource/st1/run.sh
--nj 16 \
--inference_nj 16 \
--src_lang ${src_lang} \
--use_src_lang false \
- --use_src_lang false \
+ --use_src_lang false \
Hi @ftshijt, thank you for reviewing my PR!
It is fine to make it a separate PR. But if it is ready, you are welcome to add the link in this PR as well.
There are also some CI issues. Please fix those as well (see https://github.com/espnet/espnet/actions/runs/4391542488/jobs/7714797586).
Thank you! I have fixed the CI errors.
Codecov Report
@@ Coverage Diff @@
## master #4994 +/- ##
==========================================
- Coverage 76.99% 76.08% -0.91%
==========================================
Files 606 606
Lines 53748 53713 -35
==========================================
- Hits 41381 40870 -511
- Misses 12367 12843 +476
Flags with carried forward coverage won't be shown. See 49 files with indirect coverage changes.
31cf90f to eb6782f
|dataset|score|verbose_score|
|---|---|---|
|decode_pen2_st_model_valid.acc.ave/test|2.6|22.5/4.5/1.8/0.8 (BP = 0.736 ratio = 0.765 hyp_len = 17223 ref_len = 22504)|
The length ratio looks too far from 1 to me (the hypotheses are too short).
Can you tune the length penalty (perhaps in a later PR)?
Of course. No problem.
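For context, the BP and ratio values in the score line above can be reproduced from the two reported lengths. The following is a minimal Python sketch, assuming the standard BLEU brevity penalty (1 when the hypothesis is at least as long as the reference, otherwise exp(1 - ref_len / hyp_len)). The helper name `brevity_penalty` is illustrative and is not part of the recipe or of ESPnet.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty (illustrative helper, not ESPnet code)."""
    if hyp_len <= 0:
        # An empty hypothesis gets the maximum penalty.
        return 0.0
    if hyp_len >= ref_len:
        # No penalty when the hypothesis is at least as long as the reference.
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


# Lengths taken from the reported test-set score line.
hyp_len, ref_len = 17223, 22504
print(f"ratio = {hyp_len / ref_len:.3f}")
print(f"BP = {brevity_penalty(hyp_len, ref_len):.3f}")
```

With these lengths the sketch gives ratio = 0.765 and BP = 0.736, matching the table. Because the penalty multiplies the n-gram precision score, the short hypotheses alone scale BLEU by about 0.74 here, which is why tuning the length penalty at decoding time is worth trying.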
Summary
This PR creates an espnet2 recipe for the iwslt22 low-resource speech translation task.
The dataset comprises two different sets (see more information here):
Todo
My approach leverages existing Tamasheq wav2vec2 features, with a transformer as the model architecture.
Here are the results:
st_wav2vec-transformer-warmup-15k
BLEU
st_full_wav2vec-transformer-warmup-15k
BLEU