Add EnhPreprocessor for Speech Enhancement #4321
Codecov Report
@@            Coverage Diff             @@
##           master    #4321    +/-   ##
==========================================
- Coverage   82.58%   82.43%   -0.15%
==========================================
  Files         469      469
  Lines       40196    40358     +162
==========================================
+ Hits        33194    33270      +76
- Misses       7002     7088      +86
I modified the DataLoader initialization to set an epoch-dependent random seed for each worker. With the default DataLoader, worker seeds change from epoch to epoch and cannot be controlled directly (see https://github.com/pytorch/pytorch/blob/master/torch/utils/data/_utils/worker.py#L216-L235 and https://github.com/pytorch/pytorch/blob/master/torch/utils/data/dataloader.py#L535).
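A minimal sketch of epoch-dependent worker seeding, for illustration only. It assumes the seed is derived as base_seed + epoch * num_workers + worker_id; the helper name seed_worker and this formula are hypothetical and not necessarily what the PR implements. It seeds only Python's random module and NumPy; torch's own per-worker RNG is still seeded by the DataLoader, so a real implementation would likely also need to handle that.

```python
import random
from functools import partial

import numpy as np


def seed_worker(worker_id: int, base_seed: int, epoch: int, num_workers: int) -> None:
    """Seed Python's and NumPy's global RNGs inside one DataLoader worker.

    The seed depends only on (base_seed, epoch, worker_id), so each epoch
    gets reproducible streams that differ across workers and epochs.
    """
    # Hypothetical scheme: distinct for every (epoch, worker_id) pair when
    # num_workers is fixed; NumPy requires a seed in [0, 2**32).
    seed = (base_seed + epoch * num_workers + worker_id) % (2**32)
    random.seed(seed)
    np.random.seed(seed)


# Hypothetical usage, assuming the loader is rebuilt at the start of each epoch:
#   DataLoader(dataset, num_workers=4,
#              worker_init_fn=partial(seed_worker, base_seed=0, epoch=epoch, num_workers=4))
init_fn = partial(seed_worker, base_seed=0, epoch=3, num_workers=4)
init_fn(1)  # what worker 1 would run at startup in epoch 3
print(np.random.randint(0, 100))
```

Because worker_init_fn is fixed when the DataLoader is constructed, the epoch value has to reach the workers somehow; rebuilding the loader every epoch is the simplest assumption here.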
@popcornell, could I ask you to review this PR?
@Emrys365, is it possible to add a unit test to cover this?
I was thinking about it, but I could not find an example unit test for the preprocessor.
I think it is fine to upload such files for the integration test (or to generate some random files and treat them as RIRs or noise).
OK. I added two noise samples (from MUSAN) and two RIR samples (from https://www.openslr.org/28/) to egs/mini_an4/asr1/downloads.tar.gz:
If that is fine with both of you, I'll do it this way then!
    help="Whether to apply preprocessing to data or not",
)
group.add_argument(
    "--speech_volume_normalize",
Wouldn't it be useful to also add an option to dynamically vary the speech level within a range?
In dynamic mixing you often want to do that too.
Do you mean randomly sampling from a range rather than using a fixed value? That sounds reasonable to me.
We can do it the same way as for noise_db_range. @kamo-naoyuki, what do you think?
One concern is that we need to take care of the behavior when self.train is False: the scale should be deterministic in that case.
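A minimal sketch of sampling the speech scale from a range during training while keeping it deterministic otherwise. It assumes the option mirrors a "lo_hi" string format like the one discussed for noise_db_range, that a single value means a fixed scale, and that evaluation uses the midpoint of the range; parse_range, speech_scale, and normalize_volume are hypothetical names, and the midpoint choice is only one possible deterministic rule.

```python
from typing import Tuple

import numpy as np


def parse_range(spec: str) -> Tuple[float, float]:
    """Parse "lo_hi" (e.g. "0.5_1.0") or a single value (e.g. "0.9")."""
    parts = [float(x) for x in spec.split("_")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) != 2 or parts[0] > parts[1]:
        raise ValueError(f"invalid range: {spec!r}")
    return parts[0], parts[1]


def speech_scale(spec: str, train: bool, rng: np.random.Generator) -> float:
    lo, hi = parse_range(spec)
    if train:
        # Random target level per utterance during training.
        return float(rng.uniform(lo, hi))
    # Deterministic when not training (hypothetical choice: the midpoint).
    return (lo + hi) / 2.0


def normalize_volume(speech: np.ndarray, target_peak: float) -> np.ndarray:
    """Rescale speech so that its absolute peak equals target_peak."""
    peak = np.max(np.abs(speech))
    return speech if peak == 0 else speech * (target_peak / peak)


rng = np.random.default_rng(0)
speech = np.array([0.1, -0.4, 0.2])
print(normalize_volume(speech, speech_scale("0.5_1.0", train=True, rng=rng)))
```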
Yes, that would be good. There are many cases where you want to reduce, for example, the overall level of the speech utterances, e.g., to simulate far-field speech (strictly speaking, you rescale after reverberation, but sometimes you want to simulate the low gain of speech captured by distant mics).
I just added support for it (only for EnhPreprocessor).
Wouldn't it be better to scale the energy in dB instead of the peak value?
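To illustrate the difference the reviewer raises: peak normalization fixes the largest sample, while energy normalization fixes the mean power, which tracks loudness more closely. A minimal sketch, assuming levels are measured as RMS in dB relative to a full scale of 1.0; rms_db and scale_to_db are hypothetical helpers, not the preprocessor's API.

```python
import numpy as np


def rms_db(x: np.ndarray, eps: float = 1e-12) -> float:
    """Mean power of x in dB, relative to a full-scale value of 1.0."""
    return float(10.0 * np.log10(np.mean(x ** 2) + eps))


def scale_to_db(x: np.ndarray, target_db: float, eps: float = 1e-12) -> np.ndarray:
    """Scale x so that its RMS level equals target_db."""
    gain_db = target_db - rms_db(x, eps)
    # A power change of gain_db corresponds to an amplitude factor 10**(gain_db / 20).
    return x * 10.0 ** (gain_db / 20.0)


t = np.arange(16000) / 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
clicks = np.zeros(16000)
clicks[::4000] = 0.5
# Both signals peak at 0.5, yet their energies differ by tens of dB,
# so peak normalization would leave them at very different loudness.
print(rms_db(tone), rms_db(clicks))
print(rms_db(scale_to_db(clicks, -20.0)))
```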
This pull request is now in conflict :(
Hi @popcornell, could you review again? I made some changes to address all the comments.
noise = np.pad(
    noise,
    [(offset, nsamples - f.frames - offset), (0, 0)],
    mode="wrap",
This is nice, but it could produce odd-sounding noise clips if the noise is very short. Maybe raise a warning? (Though that could be too verbose, since this runs in the data loader.)
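A minimal sketch of wrap-padding a (frames, channels) noise array with a warning for very short noise, following the np.pad(..., mode="wrap") call in the diff. The 0.5 length ratio and the helper name wrap_pad_noise are hypothetical. Using a constant warning message lets Python's default warning filter show it only once per code location and process, which keeps data-loader logs from being flooded.

```python
import warnings

import numpy as np


def wrap_pad_noise(
    noise: np.ndarray, nsamples: int, offset: int, min_ratio: float = 0.5
) -> np.ndarray:
    """Tile (frames, channels) noise to nsamples frames by wrap padding.

    offset is the number of frames placed before the original noise start;
    it must satisfy 0 <= offset <= nsamples - noise.shape[0].
    """
    nframes = noise.shape[0]
    if nframes >= nsamples:
        raise ValueError("noise is already long enough; crop it instead")
    if not 0 <= offset <= nsamples - nframes:
        raise ValueError("offset out of range")
    if nframes < min_ratio * nsamples:
        # Constant text so the default filter de-duplicates repeated warnings.
        warnings.warn(
            "noise is much shorter than the target length and will be repeated",
            stacklevel=2,
        )
    return np.pad(noise, [(offset, nsamples - nframes - offset), (0, 0)], mode="wrap")


rng = np.random.default_rng(0)
noise = rng.standard_normal((4000, 1))
nsamples = 16000
offset = int(rng.integers(0, nsamples - noise.shape[0] + 1))
print(wrap_pad_noise(noise, nsamples, offset).shape)
```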
Thanks for the comment. I have added support for it.
Thanks, @Emrys365!
This PR adds a general preprocessor for adding noise and reverberation on the fly.
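The on-the-fly simulation described here can be sketched as convolving clean speech with an RIR and then adding a randomly cropped noise segment at a target SNR. This is a minimal sketch, not the EnhPreprocessor implementation: add_reverb_and_noise is a hypothetical name, and measuring the SNR against the reverberant (rather than dry) speech is an assumed design choice.

```python
import numpy as np


def add_reverb_and_noise(
    speech: np.ndarray,
    rir: np.ndarray,
    noise: np.ndarray,
    snr_db: float,
    rng: np.random.Generator,
):
    """Return (noisy, reverberant) for 1-D speech, RIR, and noise arrays."""
    n = speech.shape[0]
    if noise.shape[0] < n:
        raise ValueError("noise must be at least as long as speech")
    # Reverberation: full convolution truncated to the original length.
    reverberant = np.convolve(speech, rir)[:n]
    # Random crop of the noise to the speech length.
    start = int(rng.integers(0, noise.shape[0] - n + 1))
    noise = noise[start : start + n]
    # Scale the noise so that 10 * log10(P_speech / P_noise) equals snr_db.
    p_speech = np.mean(reverberant ** 2)
    p_noise = max(np.mean(noise ** 2), 1e-12)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return reverberant + scale * noise, reverberant


rng = np.random.default_rng(0)
speech = rng.standard_normal(1600)
rir = np.zeros(200)
rir[0], rir[50] = 1.0, 0.5  # direct path plus one reflection
noise = rng.standard_normal(4000)
noisy, reverberant = add_reverb_and_noise(speech, rir, noise, snr_db=5.0, rng=rng)
print(noisy.shape)
```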
Below is an example usage of the newly added EnhPreprocessor in egs2/wsj0_2mix/enh1:
run.sh
conf/tuning/train_enh_dprnn_tasnet_with_preprocessor.yaml