Add dynamic mixing in the speech separation task. #4387
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #4387      +/-   ##
==========================================
+ Coverage   82.40%   82.46%   +0.06%
==========================================
  Files         481      487       +6
  Lines       41238    42112     +874
==========================================
+ Hits        33982    34729     +747
- Misses       7256     7383     +127
egs2/wsj0_2mix/enh1/conf/tuning/train_enh_skim_tasnet_noncausal_dm.yaml
This pull request is now in conflict :(
LGTM in general.
BTW, could you add an integration test for this new preprocessor? You can refer to https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L87
done

else
    # prepare train and valid data parameters
Maybe it is better to echo some message here to warn the user that dynamic mixing is being used.
I agree. Can you add a comment, @LiChenda?
Done.
LGTM
Co-authored-by: Wangyou Zhang <C0me_On@163.com>
LGTM.
I have some minor comments.
layer: 1
unit: 128
dropout: 0.2
Can you add a comment about this configuration?
done.
scheduler_conf:
    step_size: 2
    gamma: 0.97
ditto
Done.
@LiChenda, can you add some comments?
Oh, I forgot to handle it. I'll do it now.
Thanks!
This PR adds a simple dynamic mixing (DM) option for the speech separation task.
The current version supports mixing a variable number of speakers.
To enable DM, set the related options in the training config file and pass dynamic_mixing=true to enh.sh.
dynamic_mixing_gain_db is the maximum random gain (in dB) applied to each source before mixing. The gain (in dB) of each source is uniformly sampled from [-dynamic_mixing_gain_db, dynamic_mixing_gain_db]. The default value is 0.0.
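The gain sampling described above amounts to drawing a per-source gain in dB, converting it to a linear amplitude factor, and summing the scaled sources. A minimal sketch of the idea (not the actual espnet preprocessor; the function name and signature are invented for illustration):

```python
import numpy as np

def dynamic_mix(sources, gain_db_max=2.0, rng=None):
    """Mix sources after applying a random per-source gain.

    Illustrative sketch of dynamic mixing, not espnet code.
    `sources` is a list of equal-length 1-D numpy arrays.
    """
    rng = rng or np.random.default_rng()
    scaled = []
    for src in sources:
        # Gain in dB is uniform in [-gain_db_max, gain_db_max];
        # 10**(dB/20) converts it to a linear amplitude factor.
        gain_db = rng.uniform(-gain_db_max, gain_db_max)
        scaled.append(src * (10.0 ** (gain_db / 20.0)))
    # The mixture is the sum of the gain-adjusted sources.
    return np.sum(scaled, axis=0), scaled
```

With gain_db_max=0.0 (the default mentioned above) every source is mixed at its original level.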
When applying DM, we no longer need wav.scp and spk{2~N}.scp in the training set as in regular training (only the source files are needed). So far I haven't figured out how to unify the data preparation stages (stages 1~5) for datasets with and without DM. Also, even when DM is used for training, the validation and test sets should still be prepared without DM.
So, in the current version of DM, users need to manually collect all the source files of the training set in spk1.scp.
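Collecting all training sources into a single spk1.scp can be scripted. The helper below is a hypothetical sketch (not part of this PR), assuming Kaldi-style scp lines of the form "utt_id /path/to/wav":

```python
from pathlib import Path

def collect_sources(scp_paths, out_path):
    """Concatenate several .scp files into one source list.

    Hypothetical helper for gathering all sources into spk1.scp;
    utterance IDs get a source-index suffix to keep them unique.
    """
    lines = []
    for i, scp in enumerate(scp_paths, start=1):
        for line in Path(scp).read_text().splitlines():
            if not line.strip():
                continue
            utt_id, wav = line.split(maxsplit=1)
            lines.append(f"{utt_id}_src{i} {wav}")
    Path(out_path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, merging the original spk1.scp and spk2.scp of a 2-speaker set yields one list with every source utterance as a separate entry.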