Add diffusion-base SE model to ESPnet-SE #5572

LiChenda · 2023-11-28T17:10:55Z

What?

Implement DCUNET as described in "Welker S, Richter J, Gerkmann T. Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain".
New Python files:

├── decoder
├── encoder
├── separator
├── layers
│   └── dcunet.py
├── diffusion
│   ├── __init__.py
│   ├── abs_diffusion.py
│   ├── sampling
│   │   ├── __init__.py
│   │   ├── correctors.py
│   │   └── predictors.py
│   ├── score_based_diffusion.py
│   └── sdes.py
└── diffusion_enh.py

Add an enhancement recipe for the WSJ dataset.
Update the STFT/iSTFT encoder/decoder with spectrum transform functions (exponent and log transforms).
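
The log transform mentioned above can be sketched as follows. This is a minimal illustration that assumes the log(1 + |X|) form used in the sgmse reference code; the function names are hypothetical and are not the ESPnet API. It works on single complex STFT bins with the standard library, whereas the real encoder would apply the same operation elementwise on tensors.

```python
import cmath
import math


def log_transform(spec: complex) -> complex:
    """Compress the magnitude with log(1 + |X|) and keep the phase (assumed form)."""
    return math.log1p(abs(spec)) * cmath.exp(1j * cmath.phase(spec))


def log_transform_inverse(spec: complex) -> complex:
    """Undo log_transform: expand the magnitude with exp(|Y|) - 1, keep the phase."""
    return math.expm1(abs(spec)) * cmath.exp(1j * cmath.phase(spec))


# A few example STFT bins, including a silent one.
bins = [3.0 + 4.0j, -0.5 + 0.1j, 0.0 + 0.0j]
restored = [log_transform_inverse(log_transform(x)) for x in bins]
```

Because the decoder applies the inverse, the pair should round-trip each bin up to floating-point error.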

Why?

Extend ESPnet-SE to support diffusion-based generative enhancement models.

Others

Work in progress: still debugging and tuning the models.

for more information, see https://pre-commit.ci

egs2/wsj/derever1/conf/tuning/train_enh_blstm_tf.yaml

egs2/wsj/derever1/conf/tuning/train_enh_sgmse_ncsnpp.yaml

egs2/wsj/derever1/local/convert2wav.sh

egs2/wsj/derever1/local/create_wsj0_reverb.py

egs2/wsj/derever1/run.sh

espnet2/enh/diffusion_enh.py

espnet2/enh/espnet_model.py

Emrys365 · 2024-01-02T20:41:29Z

espnet2/layers/stft.py

 from torch_complex.tensor import ComplexTensor
 from typeguard import check_argument_types

 from espnet2.enh.layers.complex_utils import to_complex
 from espnet2.layers.inversible_interface import InversibleInterface
 from espnet.nets.pytorch_backend.nets_utils import make_pad_mask

+is_torch_1_10_plus = V(torch.__version__) >= V("1.10.0")


You don't need to add it because current ESPnet already dropped support for PyTorch versions before 1.12.1.

See line 93 of espnet2/layers/stft.py. The check is added so that native support for FFT and STFT is used on all CPU targets, including ARM.
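
To illustrate the reviewer's point, here is a small sketch of a version gate. `version_tuple` is a hypothetical stdlib-only helper written for this example; it is not the `V` helper used in the diff, whose import is not shown in the excerpt. The floor of 1.12.1 is taken from the review comment above.

```python
import re


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: '2.1.0+cu118' -> (2, 1, 0); a missing patch becomes 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(g) if g is not None else 0 for g in match.groups())


# Minimum PyTorch version still supported, according to the review comment.
MIN_SUPPORTED = version_tuple("1.12.1")


def is_torch_1_10_plus(torch_version: str) -> bool:
    """Same comparison as the flag in the diff, on a plain version string."""
    return version_tuple(torch_version) >= (1, 10, 0)


# Any version at or above the supported floor passes the 1.10 gate,
# so within the supported range the flag can never be False.
supported_examples = ["1.12.1", "1.13.1", "2.1.0+cu118"]
flags = [is_torch_1_10_plus(v) for v in supported_examples]
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.9.1" would sort after "1.10.0".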

test/espnet2/enh/diffusion/test_score_based_diffusion.py

LiChenda · 2024-01-02T20:46:10Z

Maybe it is better to rename the recipe to egs2/wsj0_reverb/enh1 or egs2/wsj0_chime3/enh1 instead of using wsj?

Done.

egs2/wsj0_chime3/enh1/local/data.sh

egs2/wsj0_reverb/enh1/local/data.sh

Co-authored-by: Wangyou Zhang <C0me_On@163.com>

LiChenda · 2024-01-03T20:01:13Z

Hi @sw005320, I also asked @popcornell to help review this PR.

sw005320

I have some high-level comments.

espnet2/tasks/enh.py

espnet2/enh/layers/ncsnpp_utils/upfirdn2d.py

sw005320 · 2024-01-03T22:08:20Z

espnet2/enh/layers/ncsnpp_utils/up_or_down_sampling.py

It seems that a lot of lines are not tested, according to Codecov. Can you double-check it?

It may be OK, as this code is taken mostly as is from sgmse.

The files in ncsnpp_utils are taken from the sgmse repo. Some of them are unused and therefore untested. @sw005320, should I add tests for the unused code or just remove it?

I see.
The question is whether we should keep the unused functions.

If we will use them in the future, we can add tests.

If not, we can either keep them as they are or remove them.

I added unit tests for the unused NCSNpp functions as much as I could.

sw005320 · 2024-01-03T22:08:58Z

espnet2/enh/layers/ncsnpp_utils/normalization.py

It seems that a lot of lines are not tested, according to Codecov. Can you double-check it?

sw005320 · 2024-01-03T22:09:27Z

espnet2/enh/layers/ncsnpp_utils/layerspp.py

It seems that a lot of lines are not tested, according to Codecov. Can you double-check it?

espnet2/enh/encoder/stft_encoder.py

sw005320 · 2024-01-03T22:12:59Z

espnet2/enh/encoder/stft_encoder.py

+        spec_factor: float = 0.15,
+        spec_abs_exponent: float = 0.5,


Where do these values come from?
Can you explain them?

I think they come largely from SGMSE (https://arxiv.org/pdf/2203.17004.pdf), but there they used 1/3 for the spec factor.

If the waveform is normalized to a standard deviation of 1, a spec factor of 1/3 bounds the maximum STFT value to ±1.5.
I think 0.15 bounds it to ±0.75 instead, which may actually be better.
@LiChenda, do you normalize the waveform std?

Diffusion is very sensitive to the min/max range of the input.

@sw005320, these numbers come from Section V.D in [1]. I added comments to the code. @popcornell, in their journal paper [1] they use 0.15 and 0.5. I did not normalize the waveform std.

[1] J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, “Speech Enhancement and Dereverberation With Diffusion-Based Generative Models,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351–2364, 2023.
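
As a worked example of what these two values do, the sketch below applies the amplitude transform from [1] to a single complex STFT bin: the magnitude is raised to the power `spec_abs_exponent`, the phase is kept, and the result is scaled by `spec_factor`. The function names are hypothetical, and the stdlib-only scalar form is an assumption made for illustration; the actual encoder operates on tensors.

```python
import cmath

SPEC_FACTOR = 0.15        # default from the diff above
SPEC_ABS_EXPONENT = 0.5   # default from the diff above


def spec_forward(spec: complex, factor: float = SPEC_FACTOR,
                 exponent: float = SPEC_ABS_EXPONENT) -> complex:
    """Compress the magnitude (|X| ** exponent), keep the phase, then scale."""
    magnitude = abs(spec) ** exponent
    return factor * magnitude * cmath.exp(1j * cmath.phase(spec))


def spec_backward(spec: complex, factor: float = SPEC_FACTOR,
                  exponent: float = SPEC_ABS_EXPONENT) -> complex:
    """Invert spec_forward: undo the scaling, then expand the magnitude."""
    spec = spec / factor
    magnitude = abs(spec) ** (1.0 / exponent)
    return magnitude * cmath.exp(1j * cmath.phase(spec))


# A bin with magnitude 16 and phase pi: 0.15 * 16 ** 0.5 = 0.6.
x = -16.0 + 0.0j
y = spec_forward(x)
x_restored = spec_backward(y)
```

With an exponent of 0.5, a 16-fold difference in magnitude shrinks to a 4-fold difference before scaling, which is how the transform narrows the dynamic range that the diffusion model sees.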

Do you normalize by .abs().amax() so that the waveform always stays within -1 and 1?
https://github.com/sp-uhh/sgmse/blob/c6e3291ee56b07792c9d8c7d7d49487b3042e01b/sgmse/data_module.py#L72

I added the normalize option but do not use it by default in my model training, because I feel it is not reasonable for causal models.

espnet/espnet2/enh/diffusion_enh.py

Line 113 in 12c0eb2

normfac = speech_mix.abs().max() * 1.1 + 1e-5
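
The line quoted above can be read as peak normalization with a small headroom margin. A minimal sketch on a plain list of samples follows; the helper names are hypothetical, and only the `normfac` formula is taken from the quoted line.

```python
def peak_normalize(noisy, margin=1.1, eps=1e-5):
    """Scale a waveform so its peak stays strictly inside (-1, 1).

    Mirrors normfac = speech_mix.abs().max() * 1.1 + 1e-5 from the quoted
    line; eps keeps the division safe for an all-zero input.
    """
    normfac = max(abs(s) for s in noisy) * margin + eps
    return [s / normfac for s in noisy], normfac


def denormalize(enhanced, normfac):
    """Restore the original scale after enhancement (hypothetical helper)."""
    return [s * normfac for s in enhanced]


noisy = [0.2, -2.0, 0.5]
normalized, normfac = peak_normalize(noisy)
restored = denormalize(normalized, normfac)
```

Note that `normfac` depends on the maximum over the whole utterance, so it can only be computed once the full signal is available. That is one possible reading of the concern about causal models raised above.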

test/espnet2/enh/decoder/test_stft_decoder.py

espnet2/enh/layers/ncsnpp_utils/up_or_down_sampling.py

sw005320 · 2024-01-16T16:47:46Z

LGTM.
If it is ready, please let me know.
I’ll merge this PR.

LiChenda · 2024-01-16T23:07:09Z

LGTM. If it is ready, please let me know. I’ll merge this PR.

Hi @sw005320, I added unit tests for the unused NCSNpp functions as much as I could. Please merge it once the CI tests pass.

LiChenda and others added 8 commits November 28, 2023 04:57

add spec transform function to stft enc/dec

9ee47bf

WIP

7bc6e91

sgmse training done

7c200c1

update inference script

394e29f

update inference script

af5c350

fix a bug

5259d09

merge

7700d01

fix score func

d10d0fc

mergify bot added the ESPnet2 label Nov 28, 2023

pre-commit-ci bot and others added 3 commits November 28, 2023 17:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

ce54932

for more information, see https://pre-commit.ci

fix an issue

c652235

merge remote

c7c8ba8

sw005320 added SE Speech enhancement New Features labels Nov 28, 2023

sw005320 added this to the v.202312 milestone Nov 28, 2023

sw005320 requested a review from Emrys365 November 28, 2023 19:42

LiChenda and others added 7 commits November 29, 2023 14:47

add normlization

879eff7

[pre-commit.ci] auto fixes from pre-commit.com hooks

b7c660b

for more information, see https://pre-commit.ci

add normlization

e11c838

merge

957d0ae

update norm

7847779

add ncsnpp model

6d970e8

update ncsnpp model

7ec7746

mergify bot added the Installation label Nov 30, 2023

pre-commit-ci bot and others added 6 commits November 30, 2023 02:21

[pre-commit.ci] auto fixes from pre-commit.com hooks

e9b2c37

for more information, see https://pre-commit.ci

update ncsnpp config file

d5cb89c

update denoising and dereverb recipe

2e95e5a

[pre-commit.ci] auto fixes from pre-commit.com hooks

9fddc92

for more information, see https://pre-commit.ci

add lazy import for NCSNpp

5301fc7

Merge remote-tracking branch 'origin/sgmse' into sgmse

e3e9a32

LiChenda added 3 commits January 3, 2024 04:43

update test

3ee811a

Merge remote-tracking branch 'upstream/master' into sgmse

765b781

solve conficts

cd7b4b2

Emrys365 reviewed Jan 2, 2024

pre-commit-ci bot and others added 6 commits January 2, 2024 20:46

[pre-commit.ci] auto fixes from pre-commit.com hooks

f09484b

for more information, see https://pre-commit.ci

update for review

6daab39

update 2 for review

eaa0ccc

update for test

749b256

merge

64787a9

[pre-commit.ci] auto fixes from pre-commit.com hooks

258b697

for more information, see https://pre-commit.ci

LiChenda changed the title from "[WIP] Add diffusion-base SE model to ESPnet-SE" to "Add diffusion-base SE model to ESPnet-SE" Jan 3, 2024

Emrys365 reviewed Jan 3, 2024

egs2/wsj0_chime3/enh1/local/data.sh (outdated, resolved)

egs2/wsj0_reverb/enh1/local/data.sh (outdated, resolved)

LiChenda and others added 2 commits January 2, 2024 23:43

Update egs2/wsj0_chime3/enh1/local/data.sh

d4ed763

Co-authored-by: Wangyou Zhang <C0me_On@163.com>

Update egs2/wsj0_reverb/enh1/local/data.sh

94939f4

Co-authored-by: Wangyou Zhang <C0me_On@163.com>

sw005320 reviewed Jan 3, 2024

popcornell reviewed Jan 7, 2024

test/espnet2/enh/decoder/test_stft_decoder.py (resolved)

popcornell reviewed Jan 7, 2024

espnet2/enh/layers/ncsnpp_utils/up_or_down_sampling.py (outdated, resolved)

LiChenda and others added 2 commits January 16, 2024 05:43

reflect to comments

1e29489

[pre-commit.ci] auto fixes from pre-commit.com hooks

12c0eb2

for more information, see https://pre-commit.ci

LiChenda and others added 5 commits January 17, 2024 06:57

add test

b7da277

update

662e2d5

Merge remote-tracking branch 'origin/sgmse' into sgmse

e44c956

Merge remote-tracking branch 'upstream/master' into sgmse

c23c0bc

[pre-commit.ci] auto fixes from pre-commit.com hooks

d3382bd

for more information, see https://pre-commit.ci

sw005320 added the auto-merge Enable auto-merge label Jan 16, 2024

mergify bot merged commit 0dc18d6 into espnet:master Jan 17, 2024
27 checks passed

LiChenda commented Nov 28, 2023

Emrys365 Jan 2, 2024

LiChenda Jan 2, 2024

LiChenda commented Jan 2, 2024

LiChenda commented Jan 3, 2024

sw005320 left a comment

sw005320 Jan 3, 2024

popcornell Jan 7, 2024

LiChenda Jan 15, 2024

sw005320 Jan 16, 2024

LiChenda Jan 16, 2024

sw005320 Jan 3, 2024

sw005320 Jan 3, 2024

sw005320 Jan 3, 2024

popcornell Jan 7, 2024 •

edited

popcornell Jan 7, 2024

popcornell Jan 7, 2024

LiChenda Jan 15, 2024

popcornell Jan 15, 2024

LiChenda Jan 16, 2024

popcornell Jan 16, 2024

sw005320 commented Jan 16, 2024

LiChenda commented Jan 16, 2024
