Add EnhPreprocessor for Speech Enhancement #4321

Merged: 16 commits merged into espnet:master on Jun 6, 2022

Conversation

@Emrys365 (Collaborator) commented Apr 26, 2022

This PR adds a general preprocessor for adding noise and reverberation on the fly.

Below is an example usage of the newly added EnhPreprocessor in egs2/wsj0_2mix/enh1:

  • run.sh

    The added --extra_wav_list argument specifies the names of additional audio scp files to dump in stage 3, which will later be used by EnhPreprocessor.
    See egs2/TEMPLATE/enh1/enh.sh#L359-L368 for more details.

    #!/usr/bin/env bash
    # Set bash to 'debug' mode, it will exit on :
    # -e 'error', -u 'undefined variable', -o ... 'error in pipeline', -x 'print commands',
    set -e
    set -u
    set -o pipefail
    
    min_or_max=min # "min" or "max". This is to determine how the mixtures are generated in local/data.sh.
    sample_rate=8k
    
    
    train_set="tr_${min_or_max}_${sample_rate}"
    valid_set="cv_${min_or_max}_${sample_rate}"
    test_sets="tt_${min_or_max}_${sample_rate} "
    
    ./enh.sh \
        --audio_format wav \
        --train_set "${train_set}" \
        --valid_set "${valid_set}" \
        --test_sets "${test_sets}" \
        --fs "${sample_rate}" \
        --lang en \
        --ngpu 1 \
        --use_preprocessor true \
        --extra_wav_list "rirs.scp noises.scp" \
        --local_data_opts "--sample_rate ${sample_rate} --min_or_max ${min_or_max}" \
        --enh_config conf/tuning/train_enh_dprnn_tasnet_with_preprocessor.yaml \
        "$@"
  • conf/tuning/train_enh_dprnn_tasnet_with_preprocessor.yaml

    optim: adam
    init: xavier_uniform
    max_epoch: 150
    batch_type: folded
    batch_size: 4
    iterator_type: chunk
    chunk_length: 32000
    num_workers: 4
    optim_conf:
        lr: 1.0e-03
        eps: 1.0e-08
        weight_decay: 0
    patience: 4
    val_scheduler_criterion:
    - valid
    - loss
    best_model_criterion:
    -   - valid
        - si_snr
        - max
    -   - valid
        - loss
        - min
    keep_nbest_models: 1
    scheduler: reducelronplateau
    scheduler_conf:
        mode: min
        factor: 0.7
        patience: 1
    
    # preprocessor config
    rir_scp: dump/raw/tr_min_8k/rirs.scp
    rir_apply_prob: 1.0
    noise_scp: dump/raw/tr_min_8k/noises.scp
    noise_apply_prob: 1.0
    noise_db_range: "0_30"
    use_reverberant_ref: true
    num_spk: 2
    num_noise_type: 1
    sample_rate: 8000
    force_single_channel: true
    
    encoder: conv
    encoder_conf:
        channel: 64
        kernel_size: 2
        stride: 1
    decoder: conv
    decoder_conf:
        channel: 64
        kernel_size: 2
        stride: 1
    separator: dprnn
    separator_conf:
        num_spk: 2
        layer: 6
        rnn_type: lstm
        bidirectional: True  # this is for the inter-block rnn
        nonlinear: relu
        unit: 128
        segment_size: 250
        dropout: 0.1
    
    # A list of criterions
    # The overall loss in the multi-task learning will be:
    # loss = weight_1 * loss_1 + ... + weight_N * loss_N
    # The default `weight` for each sub-loss is 1.0
    criterions:
      # The first criterion
      - name: si_snr
        conf:
          eps: 1.0e-7
        wrapper: pit
        wrapper_conf:
          weight: 1.0
          independent_perm: True

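For intuition, below is a minimal, self-contained sketch of what such on-the-fly augmentation does conceptually. The function and variable names are illustrative only, not the actual EnhPreprocessor implementation:

    import numpy as np
    from scipy.signal import fftconvolve

    def augment(speech, rir, noise, snr_db):
        """Reverberate speech with an RIR, then add noise at snr_db dB SNR."""
        # Convolve with the RIR and truncate to the original length
        reverberant = fftconvolve(speech, rir)[: len(speech)]
        # Tile the noise if it is shorter than the speech
        reps = int(np.ceil(len(reverberant) / len(noise)))
        noise = np.tile(noise, reps)[: len(reverberant)]
        # Scale the noise to reach the target SNR
        speech_power = np.mean(reverberant**2)
        noise_power = np.mean(noise**2) + 1e-10
        scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
        return reverberant + scale * noise

With the config above (noise_db_range: "0_30", rir_apply_prob and noise_apply_prob both 1.0), an SNR would be sampled from the 0 to 30 dB range and both RIR and noise would be applied to every training example.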
@Emrys365 Emrys365 added ESPnet2 SE Speech enhancement labels Apr 26, 2022
@codecov (bot) commented Apr 26, 2022

Codecov Report

Merging #4321 (afe5131) into master (5fa6dcc) will decrease coverage by 0.14%.
The diff coverage is 38.72%.

@@            Coverage Diff             @@
##           master    #4321      +/-   ##
==========================================
- Coverage   82.58%   82.43%   -0.15%     
==========================================
  Files         469      469              
  Lines       40196    40358     +162     
==========================================
+ Hits        33194    33270      +76     
- Misses       7002     7088      +86     
Flag                       Coverage Δ
test_integration_espnet1   66.58% <ø> (ø)
test_integration_espnet2   49.51% <37.25%> (-0.08%) ⬇️
test_python                69.19% <30.39%> (-0.18%) ⬇️
test_utils                 23.45% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                               Coverage Δ
espnet2/train/preprocessor.py                36.71% <16.10%> (-0.85%) ⬇️
espnet2/iterators/sequence_iter_factory.py   97.43% <100.00%> (+0.21%) ⬆️
espnet2/tasks/asr.py                         91.71% <100.00%> (+0.04%) ⬆️
espnet2/tasks/enh.py                         99.18% <100.00%> (+0.12%) ⬆️
espnet2/tasks/enh_s2t.py                     96.64% <100.00%> (ø)
espnet2/tasks/hubert.py                      88.37% <100.00%> (ø)
espnet2/tasks/st.py                          90.55% <100.00%> (+0.05%) ⬆️
espnet2/enh/loss/criterions/abs_loss.py      80.00% <0.00%> (-5.72%) ⬇️
espnet2/text/phoneme_tokenizer.py            83.39% <0.00%> (-3.19%) ⬇️
espnet2/bin/asr_inference.py                 84.83% <0.00%> (-1.78%) ⬇️
... and 6 more


@sw005320 sw005320 added this to the v.202205 milestone Apr 26, 2022
@Emrys365 (Collaborator, Author) commented Apr 28, 2022

I modified the DataLoader initialization to set an epoch-dependent random seed for each worker; the previous default DataLoader used varying, uncontrollable random seeds at each epoch (see https://github.com/pytorch/pytorch/blob/master/torch/utils/data/_utils/worker.py#L216-L235 and https://github.com/pytorch/pytorch/blob/master/torch/utils/data/dataloader.py#L535).
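A minimal sketch of the idea (the helper name is illustrative; the actual change lives in espnet2/iterators/sequence_iter_factory.py):

    import numpy as np
    from torch.utils.data import DataLoader

    def build_loader(dataset, epoch, seed=0, num_workers=4):
        # Derive a deterministic, epoch-dependent seed for every worker so
        # that on-the-fly augmentation is reproducible across runs and resumes.
        def worker_init_fn(worker_id):
            np.random.seed(seed + epoch * num_workers + worker_id)

        return DataLoader(dataset, num_workers=num_workers,
                          worker_init_fn=worker_init_fn)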

@sw005320 (Contributor) commented:

@popcornell, could I ask you to review this PR?

@sw005320 (Contributor) commented:

@Emrys365, is it possible to make a unit test to cover this?
If it is difficult, we may consider having an integration test.

@Emrys365 (Collaborator, Author) commented:

I was thinking about it, but I could not find an example unit test for the preprocessor.
Maybe an integration test is easier, but it will need some noise and RIR signals to load.

@sw005320 (Contributor) commented:

I think it is no problem to upload such files for the integration test (or to randomly generate some files and treat them as RIRs or noises).
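A sketch of how such placeholder files could be generated (assuming the soundfile package; filenames and parameters are hypothetical):

    import numpy as np
    import soundfile as sf

    rng = np.random.default_rng(0)
    fs = 8000
    # White noise as a stand-in "noise" clip
    sf.write("noise1.wav", rng.standard_normal(4 * fs).astype(np.float32), fs)
    # An exponentially decaying noise burst as a stand-in "RIR"
    t = np.arange(fs // 2)
    rir = rng.standard_normal(fs // 2) * np.exp(-t / (0.05 * fs))
    sf.write("rir1.wav", rir.astype(np.float32), fs)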

@Emrys365 (Collaborator, Author) commented Apr 28, 2022

OK. I added two noise samples (from MUSAN) and two RIR samples (from https://www.openslr.org/28/) to egs/mini_an4/asr1/downloads.tar.gz:

downloads
├── an4
│   ├── etc
│   │   ├── an4.dic
│   │   ├── an4.filler
│   │   ├── an4.phone
│   │   ├── an4_test.fileids
│   │   ├── an4_test.transcription
│   │   ├── an4_train.fileids
│   │   ├── an4_train.transcription
│   │   ├── an4.ug.lm
│   │   └── an4.ug.lm.DMP
│   ├── LICENSE
│   ├── README
│   └── wav
│       ├── an4_clstk
│       │   ├── fash
│       │   │   ├── an251-fash-b.sph
│       │   │   ├── an253-fash-b.sph
│       │   │   └── cen7-fash-b.sph
│       │   ├── fbbh
│       │   │   └── cen8-fbbh-b.sph
│       │   └── mwhw
│       │       ├── an152-mwhw-b.sph
│       │       └── cen8-mwhw-b.sph
│       └── an4test_clstk
│           ├── fcaw
│           │   └── cen8-fcaw-b.sph
│           └── mmxg
│               └── cen8-mmxg-b.sph
├── noise
│   ├── noise-free-sound-0043.wav
│   └── noise-sound-bible-0001.wav
└── rirs
    ├── rir1.wav
    └── rir2.wav

@mergify mergify bot added ESPnet1 README CI Travis, Circle CI, etc labels Apr 28, 2022
@popcornell (Contributor) commented:

@sw005320 @Emrys365 Looks like one of you needs to invite me to do the review.

@sw005320 (Contributor) commented:

> @sw005320 @Emrys365 Looks like one of you needs to invite me to do the review.

I think you can just go through the code and leave comments.
Please let me know if that does not work.

@popcornell (Contributor) commented:

If that is fine with you two, I'll do it this way then!

espnet2/tasks/enh.py (resolved review thread)
help="Whether to apply preprocessing to data or not",
)
group.add_argument(
"--speech_volume_normalize",
@popcornell (Contributor) commented:

Wouldn't it be useful to also add an option for dynamically varying the speech level?
In dynamic mixing you often want to do that too.

@Emrys365 (Collaborator, Author) commented Apr 29, 2022:

Do you mean randomly sampling from a range rather than using a fixed value? That sounds reasonable to me.
We can do it just like noise_db_range. @kamo-naoyuki What do you think?

One concern is that we need to take care of the behavior when self.train is False: the scale should be deterministic in that case.

@popcornell (Contributor) commented:

Yes, that would be good. There are many instances where you want to reduce the overall level of the speech utterances, e.g., to simulate far-field speech (you correctly rescale back after reverberation, but sometimes you want to simulate the low gain of speech captured by distant mics).

@Emrys365 (Collaborator, Author) commented:

I just added support for it (only for EnhPreprocessor).

@popcornell (Contributor) commented:

Isn't it better to scale the energy in dB instead of by the peak value?
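A sketch of the range-based behavior under discussion (hypothetical helper; the actual option and its defaults live in EnhPreprocessor):

    import numpy as np

    def sample_volume_scale(low, high, train, rng):
        # Randomly vary the speech level during training; keep it
        # deterministic when self.train is False (validation/inference).
        if train:
            return rng.uniform(low, high)
        return (low + high) / 2  # fixed, reproducible scale

    # Peak-based scaling, as with speech_volume_normalize:
    #   speech = scale * speech / np.max(np.abs(speech))
    # An energy-based (dB) variant would normalize by RMS instead of the peak.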

espnet2/train/preprocessor.py (resolved review thread)
@mergify (bot) commented May 18, 2022:

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label May 18, 2022
@kamo-naoyuki (Collaborator) commented:

I changed the CI tests to check the import order with isort in #4372, so please also apply isort after resolving the conflicts. Sorry for bothering you.

@Emrys365 (Collaborator, Author) commented:

> I changed the CI tests to check the import order with isort in #4372, so please also apply isort after resolving the conflicts. Sorry for bothering you.

Sure, no problem.

@mergify mergify bot removed the conflicts label May 20, 2022
@Emrys365 (Collaborator, Author) commented:

Hi @popcornell, could you review again? I made some changes to address all comments.

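# Wrap-pad: if the noise clip is shorter than the speech segment, np.pad
# with mode="wrap" repeats it cyclically until it covers all nsamples.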
noise = np.pad(
    noise,
    [(offset, nsamples - f.frames - offset), (0, 0)],
    mode="wrap",
)
@popcornell (Contributor) commented:

This is nice, but it could lead to odd-sounding noise clips if the noise is very short. Maybe raise a warning? (Though that could be too verbose, since this runs in the dataloader.)

@Emrys365 (Collaborator, Author) commented:

Thanks for the comment. I have added support for it.
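A minimal sketch of such a guard (hypothetical names; the actual check lives in espnet2/train/preprocessor.py):

    import warnings

    def maybe_warn_short_noise(noise_frames, nsamples, max_wraps=2):
        # Warn when a noise clip must be repeated many times to cover
        # the speech segment, which can sound audibly periodic.
        if nsamples > max_wraps * noise_frames:
            warnings.warn(
                f"Noise clip ({noise_frames} samples) is much shorter than "
                f"the speech ({nsamples} samples); wrap-padding repeats it."
            )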

@kan-bayashi kan-bayashi modified the milestones: v.202205, v.202206 May 26, 2022
@sw005320 (Contributor) commented Jun 6, 2022:

Thanks, @Emrys365!

@sw005320 sw005320 merged commit 4806c23 into espnet:master Jun 6, 2022
Labels: CI Travis, Circle CI, etc · ESPnet1 · ESPnet2 · New Features · README · SE Speech enhancement

5 participants