Add dynamic mixing in the speech separation task. #4387
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #4387      +/-   ##
==========================================
+ Coverage   82.40%   82.46%   +0.06%
==========================================
  Files         481      487       +6
  Lines       41238    42112     +874
==========================================
+ Hits        33982    34729     +747
- Misses       7256     7383     +127
egs2/wsj0_2mix/enh1/conf/tuning/train_enh_skim_tasnet_noncausal_dm.yaml
This pull request is now in conflict :(
LGTM in general.
BTW, could you add an integration test for this new preprocessor? You can refer to https://github.com/espnet/espnet/blob/master/ci/test_integration_espnet2.sh#L87
done

else
    # prepare train and valid data parameters
Maybe it is better to echo some message here to warn the user that dynamic mixing is being used.
I agree. Can you add a comment, @LiChenda?
Done.
LGTM
Co-authored-by: Wangyou Zhang <C0me_On@163.com>
LGTM.
I have some minor comments.
layer: 1
unit: 128
dropout: 0.2
Can you add a comment about this configuration?
done.
scheduler_conf:
    step_size: 2
    gamma: 0.97
ditto
Done.
@LiChenda, can you add some comments?
Oh, I forgot to handle it. I'll do it now.
Thanks!
This PR adds a simple dynamic mixing (DM) option for the speech separation task.
The current version supports mixing a variable number of speakers.
To enable DM, set the related options in the training config file and pass dynamic_mixing=true to enh.sh.
dynamic_mixing_gain_db is the maximum random gain (in dB) applied to each source before mixing. The gain (in dB) of each source is uniformly sampled from [-dynamic_mixing_gain_db, dynamic_mixing_gain_db]. The default value is 0.0.
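The gain sampling described above amounts to drawing a per-source gain in dB, converting it to a linear amplitude factor, and summing the scaled sources. A minimal sketch of the idea (not the actual espnet preprocessor; the function name and signature are invented for illustration):

```python
import numpy as np

def dynamic_mix(sources, gain_db_max=2.0, rng=None):
    """Mix sources after applying a random per-source gain.

    Illustrative sketch of dynamic mixing, not espnet code.
    `sources` is a list of equal-length 1-D numpy arrays.
    """
    rng = rng or np.random.default_rng()
    scaled = []
    for src in sources:
        # Gain in dB is uniform in [-gain_db_max, gain_db_max];
        # 10**(dB/20) converts it to a linear amplitude factor.
        gain_db = rng.uniform(-gain_db_max, gain_db_max)
        scaled.append(src * (10.0 ** (gain_db / 20.0)))
    # The mixture is the sum of the gain-adjusted sources.
    return np.sum(scaled, axis=0), scaled
```

With gain_db_max=0.0 (the default mentioned above) every source is mixed at its original level.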
When applying DM, we no longer need wav.scp and spk{2~N}.scp in the training set as in regular training (only the source files are needed). So far I haven't figured out how to unify the data preparation stages (stages 1~5) for datasets with and without DM. Also, even when DM is used for training, the validation and test sets should still be prepared without DM.
So, in the current version of DM, users need to manually collect all the source files of the training set in spk1.scp.
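Collecting all training sources into a single spk1.scp can be scripted. The helper below is a hypothetical sketch (not part of this PR), assuming Kaldi-style scp lines of the form "utt_id /path/to/wav":

```python
from pathlib import Path

def collect_sources(scp_paths, out_path):
    """Concatenate several .scp files into one source list.

    Hypothetical helper for gathering all sources into spk1.scp;
    utterance IDs get a source-index suffix to keep them unique.
    """
    lines = []
    for i, scp in enumerate(scp_paths, start=1):
        for line in Path(scp).read_text().splitlines():
            if not line.strip():
                continue
            utt_id, wav = line.split(maxsplit=1)
            lines.append(f"{utt_id}_src{i} {wav}")
    Path(out_path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, merging the original spk1.scp and spk2.scp of a 2-speaker set yields one list with every source utterance as a separate entry.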