
enh_s2t joint model #4226

Merged: 3 commits, Apr 19, 2022

simpleoier
Collaborator

simpleoier commented Mar 31, 2022

Supports speech enhancement jointly with the downstream ASR, SLU, and ST tasks.

Future PRs (TODO):

  • Pass the category info instead of using utterance IDs
  • Upload pre-trained models
  • MixIT

mergify bot added the ESPnet2 label Mar 31, 2022
sw005320 requested a review from Emrys365 March 31, 2022 17:41
sw005320 added the ASR (Automatic speech recognition) and SE (Speech enhancement) labels Mar 31, 2022
sw005320 added this to the v.0.10.7 milestone Mar 31, 2022
sw005320 added the Refactoring label Mar 31, 2022
simpleoier force-pushed the enh_s2t branch 5 times, most recently from 5c87878 to dec5023, April 1, 2022 03:34
mergify bot added the Installation label Apr 1, 2022
@mergify
Contributor

mergify bot commented Apr 1, 2022

This pull request is now in conflict :(

mergify bot added the conflicts label Apr 1, 2022
mergify bot removed the conflicts label Apr 1, 2022
simpleoier added the ST (Speech translation) and SLU (Spoken language understanding) labels Apr 1, 2022
simpleoier force-pushed the enh_s2t branch 6 times, most recently from 8fc5f5f to 2c79166, April 3, 2022 12:34
simpleoier changed the title from "[WIP] enh_s2t joint model" to "enh_s2t joint model" Apr 3, 2022
simpleoier force-pushed the enh_s2t branch 3 times, most recently from eef0e10 to 45a202e, April 3, 2022 22:48
@mergify
Contributor

mergify bot commented Apr 12, 2022

This pull request is now in conflict :(

Collaborator

Emrys365 left a comment

I finished my second-round review, except for two s3prl scripts as I am not familiar with them.

use_k2=false # Whether to use k2 based decoder
batch_size=1
inference_tag= # Suffix to the result dir for decoding.
inference_config= # Config for decoding.
Collaborator

Emrys365 Apr 12, 2022

After #4251 is merged, would you consider also applying the updates here?

But it might take additional effort to merge the updated espnet2/bin/enh_inference.py script into espnet2/bin/enh_s2t_inference.py.

The diff in #4251:
diff --git a/egs2/TEMPLATE/enh1/enh.sh b/egs2/TEMPLATE/enh1/enh.sh
index fcb4f324f..0afc31881 100755
--- a/egs2/TEMPLATE/enh1/enh.sh
+++ b/egs2/TEMPLATE/enh1/enh.sh
@@ -78,6 +78,8 @@ download_model=
 # Evaluation related
 scoring_protocol="STOI SDR SAR SIR SI_SNR"
 ref_channel=0
+inference_tag=  # Prefix to the result dir for ENH inference.
+inference_enh_config= # Config for enhancement.
 score_with_asr=false
 asr_exp=""       # asr model for scoring WER
 lm_exp=""       # lm model for scoring WER
@@ -151,8 +153,9 @@ Options:
     --init_param    # pretrained model path and module name (default="${init_param}")
 
     # Enhancement related
-    --inference_args   # Arguments for enhancement in the inference stage (default="${inference_args}")
-    --inference_model  # Enhancement model path for inference (default="${inference_model}").
+    --inference_args       # Arguments for enhancement in the inference stage (default="${inference_args}")
+    --inference_model      # Enhancement model path for inference (default="${inference_model}").
+    --inference_enh_config # Configuration file for overwriting some model attributes during SE inference. (default="${inference_enh_config}")
 
     # Evaluation related
     --scoring_protocol    # Metrics to be used for scoring (default="${scoring_protocol}")
@@ -247,6 +250,14 @@ if [ -n "${speed_perturb_factors}" ]; then
   enh_exp="${enh_exp}_sp"
 fi
 
+if [ -z "${inference_tag}" ]; then
+    if [ -n "${inference_enh_config}" ]; then
+        inference_tag="$(basename "${inference_enh_config}" .yaml)"
+    else
+        inference_tag=enhanced
+    fi
+fi
+
 # ========================== Main stages start from here. ==========================
 
 if ! "${skip_data_prep}"; then
@@ -614,7 +625,7 @@ if ! "${skip_eval}"; then
 
         for dset in "${valid_set}" ${test_sets}; do
             _data="${data_feats}/${dset}"
-            _dir="${enh_exp}/enhanced_${dset}"
+            _dir="${enh_exp}/${inference_tag}_${dset}"
             _logdir="${_dir}/logdir"
             mkdir -p "${_logdir}"
 
@@ -646,6 +657,7 @@ if ! "${skip_eval}"; then
                     --data_path_and_name_and_type "${_data}/${_scp},speech_mix,${_type}" \
                     --key_file "${_logdir}"/keys.JOB.scp \
                     --train_config "${enh_exp}"/config.yaml \
+                    ${inference_enh_config:+--inference_config "$inference_enh_config"} \
                     --model_file "${enh_exp}"/"${inference_model}" \
                     --output_dir "${_logdir}"/output.JOB \
                     ${_opts} ${inference_args}
@@ -686,7 +698,7 @@ if ! "${skip_eval}"; then
                 if "${score_obs}"; then
                     _dir="${data_feats}/${dset}/scoring"
                 else
-                    _dir="${enh_exp}/enhanced_${dset}/scoring"
+                    _dir="${enh_exp}/${inference_tag}_${dset}/scoring"
                 fi
 
                 _logdir="${_dir}/logdir"
@@ -713,7 +725,7 @@ if ! "${skip_eval}"; then
                         # To compute the score of observation, input original wav.scp
                         _inf_scp+="--inf_scp ${data_feats}/${dset}/wav.scp "
                     else
-                        _inf_scp+="--inf_scp ${enh_exp}/enhanced_${dset}/spk${spk}.scp "
+                        _inf_scp+="--inf_scp ${enh_exp}/${inference_tag}_${dset}/spk${spk}.scp "
                     fi
                 done
 
@@ -749,7 +761,7 @@ if ! "${skip_eval}"; then
             ./scripts/utils/show_enh_score.sh "${_dir}/../.." > "${_dir}/../../RESULTS.md"
         done
         log "Evaluation result for observation: ${data_feats}/RESULTS.md"
-        log "Evaluation result for enhancement: ${enh_exp}/enhanced/RESULTS.md"
+        log "Evaluation result for enhancement: ${enh_exp}/RESULTS.md"
 
     fi
 else
@@ -808,7 +820,7 @@ if "${score_with_asr}"; then
                         # Using same wav.scp for all speakers
                         cp "${_data}/wav.scp" "${_ddir}/wav.scp"
                     else
-                        cp "${enh_exp}/enhanced_${dset}/scoring/wav_spk${spk}" "${_ddir}/wav.scp"
+                        cp "${enh_exp}/${inference_tag}_${dset}/scoring/wav_spk${spk}" "${_ddir}/wav.scp"
                     fi
                     cp data/${dset}/text_spk${spk} ${_ddir}/text
                     cp ${_data}/{spk2utt,utt2spk,utt2num_samples,feats_type} ${_ddir}
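The two shell idioms this diff relies on can be exercised in isolation. The sketch below wraps them in hypothetical helpers (`resolve_inference_tag` and `build_inference_opts` are illustrative names, not functions in enh.sh): the result-dir prefix falls back from an explicit `inference_tag` to the basename of `inference_enh_config` without `.yaml`, and finally to `enhanced`; `${var:+word}` adds `--inference_config` only when a config was given.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) mirroring two idioms from the #4251 diff.

# Pick the result-dir prefix: an explicit tag wins; otherwise use the
# inference config's basename without ".yaml"; otherwise fall back to "enhanced".
resolve_inference_tag() {
    local inference_tag="$1"
    local inference_enh_config="$2"
    if [ -z "${inference_tag}" ]; then
        if [ -n "${inference_enh_config}" ]; then
            inference_tag="$(basename "${inference_enh_config}" .yaml)"
        else
            inference_tag=enhanced
        fi
    fi
    printf '%s\n' "${inference_tag}"
}

# ${var:+word} expands to "word" only when var is set and non-empty, so the
# --inference_config option (and its value) disappears when no config is given.
build_inference_opts() {
    local inference_enh_config="$1"
    local -a opts=(${inference_enh_config:+--inference_config "$inference_enh_config"})
    printf '%s\n' "${opts[*]}"
}

resolve_inference_tag "" ""                      # -> enhanced
resolve_inference_tag "" "conf/inference_a.yaml" # -> inference_a
build_inference_opts "conf/inference_a.yaml"     # -> --inference_config conf/inference_a.yaml
```

With this fallback, a run that sets neither variable still writes to `${enh_exp}/enhanced_${dset}`, so existing recipes keep their output paths.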

Collaborator

Emrys365 Apr 12, 2022

  • BTW, it would be better to add an additional argument, pretrained_model, as in egs2/TEMPLATE/asr1/asr.sh#L93, which allows initializing from a pretrained model.
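A minimal sketch of the suggested option, assuming the recipe forwards a non-empty `pretrained_model` to the trainer's `--init_param` argument (how asr.sh actually wires it may differ, and `build_init_opts` is a hypothetical helper). It follows the `_opts+=` string-building style the recipe scripts already use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an optional pretrained_model path into trainer options.
# Assumption: the value is passed to the trainer's --init_param argument.
build_init_opts() {
    local pretrained_model="$1"
    local _opts=""
    if [ -n "${pretrained_model}" ]; then
        # Only add the option when a model was actually given,
        # so training from scratch remains the default.
        _opts+="--init_param ${pretrained_model} "
    fi
    printf '%s' "${_opts}"
}

pretrained_model=""                              # default: train from scratch
_opts="$(build_init_opts "${pretrained_model}")" # -> empty string
```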

Collaborator

Similarly, this also applies to egs2/TEMPLATE/enh_st1/enh_st.sh.

Collaborator Author

OK. I've added it.

egs2/TEMPLATE/enh_asr1/enh_asr.sh (outdated, resolved)
egs2/TEMPLATE/enh_asr1/enh_asr.sh (outdated, resolved)
egs2/TEMPLATE/enh_asr1/enh_asr.sh (outdated, resolved)
egs2/TEMPLATE/enh_asr1/enh_asr.sh (outdated, resolved)
espnet2/tasks/enh_s2t.py (outdated, resolved)
egs2/TEMPLATE/enh_asr1/enh_asr.sh (resolved)
egs2/TEMPLATE/enh_st1/enh_st.sh (resolved)
espnet2/tasks/enh_s2t.py (outdated, resolved)
espnet2/tasks/enh_s2t.py (resolved)
@simpleoier
Collaborator Author

> I finished my second-round review, except for two s3prl scripts as I am not familiar with them.

Thanks. I think those s3prl files are fine.

Collaborator

ftshijt left a comment

It looks good to me on the ST side. You have my approval for that part.

simpleoier mentioned this pull request Apr 18, 2022
@sw005320
Contributor

OK for me as well.
@simpleoier, if you think this PR is ready, I'll merge it.

@simpleoier
Collaborator Author

Hi @sw005320, it is ready. Thanks!

sw005320 merged commit 42eb310 into espnet:master Apr 19, 2022
@Emrys365
Collaborator

@simpleoier, I think we may also need to include the newly added tasks in the integration test: ci/test_integration_espnet2.sh#L78

To do that, some toy recipes need to be created in egs2/mini_an4.
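A hypothetical dry-run sketch of such test entries: for each toy recipe directory under egs2/mini_an4, print the command that would run it. The recipe directory names (`enh_asr1`, `enh_st1`) mirror the TEMPLATE names in this PR but are assumptions, as is calling `./run.sh` with no extra flags; this is not the contents of ci/test_integration_espnet2.sh.

```shell
#!/usr/bin/env bash
# Dry run: print one command per toy recipe instead of executing it.
# Assumed layout: <root>/<recipe>/run.sh (e.g. egs2/mini_an4/enh_asr1/run.sh).
integration_test_cmds() {
    local root="$1"
    shift
    local recipe
    for recipe in "$@"; do
        # The printed command wraps `cd` in a subshell so that, if executed,
        # one recipe's working directory would not leak into the next.
        printf '(cd %s/%s && ./run.sh)\n' "${root}" "${recipe}"
    done
}

integration_test_cmds egs2/mini_an4 enh_asr1 enh_st1
```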

@sw005320
Copy link
Contributor

> @simpleoier, I think we may also need to include the newly added tasks in the integration test: ci/test_integration_espnet2.sh#L78
>
> To do that, some toy recipes need to be created in egs2/mini_an4.

Good idea!

@simpleoier
Collaborator Author

I'll work on the integration test. Thanks for the suggestion.

Labels
ASR (Automatic speech recognition), ESPnet1, ESPnet2, Installation, README, Refactoring, SE (Speech enhancement), SLU (Spoken language understanding), ST (Speech translation)

6 participants