[Recipe] Add iwslt22 low resource speech translation task for egs2 #4994

Merged
merged 8 commits into from Mar 14, 2023

freddy5566
Contributor

Summary

This PR creates an espnet2 recipe for the iwslt22 low-resource speech translation task.
The dataset comprises two different sets (see more information here):

  • 17 hours of clean speech in Tamasheq, translated to the French language (taq_fra_clean)
  • A 19-hour version of this corpus, including 2 additional hours of data that annotators labeled as potentially noisy (taq_fra_full)

Todo

  • update egs/README.md or egs2/README.md with corresponding recipes
  • add corresponding entry in egs2/TEMPLATE/db.sh for a new corpus
  • create confs and other related scripts
  • run experiments for clean and full sets of data
  • upload trained models to huggingface

My approach uses existing Tamasheq wav2vec2 features with a transformer as the model architecture.

Here are the results:

st_wav2vec-transformer-warmup-15k

BLEU

|dataset|score|verbose_score|
|---|---|---|
|decode_pen2_st_model_valid.acc.ave/test|2.6|22.5/4.5/1.8/0.8 (BP = 0.736 ratio = 0.765 hyp_len = 17223 ref_len = 22504)|

st_full_wav2vec-transformer-warmup-15k

BLEU

|dataset|score|verbose_score|
|---|---|---|
|decode_pen2_st_model_valid.acc.ave/test|3.6|24.7/5.4/2.1/1.0 (BP = 0.894 ratio = 0.899 hyp_len = 20241 ref_len = 22504)|

@freddy5566
Contributor Author

freddy5566 commented Mar 11, 2023

I have rebased it onto the latest master branch.

@sw005320 sw005320 requested a review from ftshijt March 13, 2023 12:14
@sw005320 sw005320 added Recipe ST Speech translation labels Mar 13, 2023
@sw005320 sw005320 added this to the v.202303 milestone Mar 13, 2023
@sw005320
Contributor

Thanks a lot!
@ftshijt, can you check this PR?

# train_full comprises a 19 hour version of this corpus,
# including 2 additional hours of data that was labeled by annotators as potentially noisy
mkdir -p data/train/org
mkdir -p data/train_full/org
Collaborator

Suggested change
mkdir -p data/train_full/org
mkdir -p data/train_full/org

Collaborator

@ftshijt ftshijt left a comment

Many thanks! Looks perfect to me. I have just left some notes on minor formatting issues.

tgt_case=tc

./st.sh \
--st_tag wav2vec-transformer-warmup-15k \
Collaborator

Suggested change
--st_tag wav2vec-transformer-warmup-15k \


./st.sh \
--st_tag wav2vec-transformer-warmup-15k \
--ignore_init_mismatch true \
Collaborator

Suggested change
--ignore_init_mismatch true \

Collaborator

This is not necessary, since you do not use any pre-trained model with mismatched keys.

--tgt_nbpe $tgt_nbpe \
--tgt_case ${tgt_case} \
--feats_type "raw" \
--feats_normalize uttmvn \
Collaborator

Suggested change
--feats_normalize uttmvn \
--feats_normalize utterance_mvn \
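
Taken together, the two substantive review notes so far (dropping `--ignore_init_mismatch` and spelling the normalization as `utterance_mvn`) would leave an invocation roughly like the one sketched here. This is only an illustration: the option names are the ones visible in the diff excerpts of this review, but the values of `tgt_nbpe` and `src_lang` are not shown in this conversation, so the ones used here are placeholders. The command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only: collect the st.sh options quoted in this review, with
# --ignore_init_mismatch removed and feats_normalize set to utterance_mvn.

src_lang=taq      # placeholder: actual value not shown in this conversation
tgt_case=tc       # taken from the diff excerpt
tgt_nbpe=1000     # placeholder: actual value not shown in this conversation

st_args=(
    --st_tag wav2vec-transformer-warmup-15k
    --tgt_nbpe "${tgt_nbpe}"
    --tgt_case "${tgt_case}"
    --feats_type raw
    --feats_normalize utterance_mvn
    --nj 16
    --inference_nj 16
    --src_lang "${src_lang}"
    --use_src_lang false
)

# Dry run: print the command instead of executing ./st.sh.
echo ./st.sh "${st_args[@]}"
```

Keeping the options in an array makes it easy to inspect or diff the final command line before launching a long training run.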

Comment on lines 37 to 38
--nj 16 \
--inference_nj 16 \
Collaborator

Suggested change
--nj 16 \
--inference_nj 16 \

--nj 16 \
--inference_nj 16 \
--src_lang ${src_lang} \
--use_src_lang false \
Collaborator

Suggested change
--use_src_lang false \
--use_src_lang false \

@freddy5566
Contributor Author

Hi @ftshijt,

Thank you for reviewing my PR!
I have fixed the formatting and linter issues. Sorry, I didn't notice that my tab settings were a bit messy on the server.
BTW, should I upload the models to huggingface now?

@ftshijt
Collaborator

ftshijt commented Mar 13, 2023

> Hi @ftshijt,
>
> Thank you for reviewing my PR! I have fixed the formatting and linter issues. Sorry, I didn't notice that my tab settings were a bit messy on the server. BTW, should I upload the models to huggingface now?

It is fine to do that in a separate PR. But if the models are ready, you are welcome to add the link in this PR as well.

@ftshijt
Collaborator

ftshijt commented Mar 13, 2023

There are also some CI issues. Please fix those as well (see https://github.com/espnet/espnet/actions/runs/4391542488/jobs/7714797586)

@freddy5566
Contributor Author

> There are also some CI issues. Please fix those as well (see https://github.com/espnet/espnet/actions/runs/4391542488/jobs/7714797586)

Thank you! I have fixed the CI errors.

@codecov

codecov bot commented Mar 13, 2023

Codecov Report

Merging #4994 (eb6782f) into master (611a291) will decrease coverage by 0.91%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4994      +/-   ##
==========================================
- Coverage   76.99%   76.08%   -0.91%     
==========================================
  Files         606      606              
  Lines       53748    53713      -35     
==========================================
- Hits        41381    40870     -511     
- Misses      12367    12843     +476     
| Flag | Coverage Δ | |
|---|---|---|
| test_integration_espnet1 | 66.28% <ø> (+0.13%) | ⬆️ |
| test_integration_espnet2 | 55.57% <ø> (+7.80%) | ⬆️ |
| test_python | 65.39% <ø> (-1.45%) | ⬇️ |
| test_utils | 23.28% <ø> (ø) | |

Flags with carried forward coverage won't be shown. Click here to find out more.

see 49 files with indirect coverage changes


|dataset|score|verbose_score|
|---|---|---|
|decode_pen2_st_model_valid.acc.ave/test|2.6|22.5/4.5/1.8/0.8 (BP = 0.736 ratio = 0.765 hyp_len = 17223 ref_len = 22504)|
Contributor

The length ratio looks too far from 1 to me (the hypotheses are too short).
Can you tune the length penalty (in a later PR)?
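
The brevity penalty (BP) in the verbose score makes this point concrete. As a rough sketch, assuming the standard BLEU definition (BP = exp(1 - ref_len / hyp_len) when the hypotheses are shorter than the references, and 1 otherwise), the reported BP values can be reproduced from `hyp_len` and `ref_len` alone. The function name `brevity_penalty` is hypothetical and not part of ESPnet or any scoring tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: BLEU brevity penalty from total hypothesis and
# reference lengths, assuming the standard BLEU definition.
brevity_penalty() {
    local hyp_len=$1 ref_len=$2
    awk -v h="${hyp_len}" -v r="${ref_len}" 'BEGIN {
        if (h == 0)      bp = 0.0                  # empty output: no credit
        else if (h >= r) bp = 1.0                  # long enough: no penalty
        else             bp = exp(1.0 - r / h)     # too short: penalized
        printf "%.3f\n", bp
    }'
}

# Lengths taken from the two result tables in this PR.
bp_clean=$(brevity_penalty 17223 22504)   # model trained on the clean set
bp_full=$(brevity_penalty 20241 22504)    # model trained on the full set
echo "clean: BP=${bp_clean}  full: BP=${bp_full}"
# prints "clean: BP=0.736  full: BP=0.894"
```

Because BP multiplies the whole score, a length ratio of 0.765 removes roughly a quarter of the BLEU the n-gram precisions would otherwise give, which is why tuning the length penalty at decoding time is worth trying.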

Contributor Author

Of course. No problem.

@sw005320 sw005320 added the auto-merge Enable auto-merge label Mar 14, 2023
@mergify mergify bot merged commit 4bd37a2 into espnet:master Mar 14, 2023
@freddy5566 freddy5566 deleted the feature/iwslt22-low-resource branch March 14, 2023 17:47
Labels
auto-merge Enable auto-merge ESPnet2 README Recipe ST Speech translation