SE function updates: new models and support for handling various sampling frequencies #5800

Emrys365 · 2024-06-05T08:18:55Z

What?

This PR updates the speech enhancement (SE) functions in the following ways:

- Added three new models: BSRNN, TF-GridNetV3 (a modification of V2 that supports various sampling rates; the name is unofficial), and a mapping-based TCN.
- Improved espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates.
- Improved espnet2/iterators/chunk_iter_factory.py to support:
  - keeping short samples (to avoid wasting many short training samples)
  - truncating samples to a maximum length (to avoid OOM) when handling different sampling rates
- Added a new augmentation type, bandwidth_limitation, in espnet2/layers/augmentation.py.
- Added a new argument, always_forward_in_48k, in espnet2/enh/espnet_model.py to support handling different sampling rates in a single SE model.
- Improved l1_timedomain+magspec_loss in espnet2/enh/loss/criterions/time_domain.py to be more numerically stable.
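
The two chunk-iterator behaviours listed above (keeping short samples and truncating to a maximum length) can be illustrated with a minimal sketch. This is not the code in espnet2/iterators/chunk_iter_factory.py: the function name `make_chunks` and the parameters `keep_short` and `max_length` are hypothetical, truncation from the start of the sample is an assumption, and plain Python lists stand in for audio tensors.

```python
from typing import List, Optional, Sequence


def make_chunks(
    sample: Sequence[float],
    chunk_length: int,
    keep_short: bool = False,
    max_length: Optional[int] = None,
) -> List[List[float]]:
    """Split one sample into fixed-length chunks (illustrative only).

    keep_short: if True, a sample shorter than chunk_length is returned
        as a single short chunk instead of being discarded.
    max_length: if given, the sample is truncated to this many points
        before chunking, which bounds memory use for long inputs.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")

    # Assumption: truncation keeps the beginning of the sample.
    if max_length is not None and len(sample) > max_length:
        sample = sample[:max_length]

    if len(sample) < chunk_length:
        # Short sample: keep it whole or drop it, depending on the flag.
        return [list(sample)] if keep_short else []

    # Non-overlapping chunks; a trailing remainder is dropped.
    n_chunks = len(sample) // chunk_length
    return [
        list(sample[i * chunk_length : (i + 1) * chunk_length])
        for i in range(n_chunks)
    ]
```

With `keep_short=False` a two-point sample and a chunk length of four yields no chunks at all, which is the waste the first sub-item refers to; with `keep_short=True` the same sample survives as one short chunk.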

Why?

These updates provide important functions for the forthcoming URGENT Challenge.

Codecov Report

Attention: Patch coverage is 81.69935% with 84 lines in your changes missing coverage. Please review.

Project coverage is 52.59%. Comparing base (293c1cc) to head (e707989).

Files	Patch %	Lines
espnet2/enh/espnet_model.py	38.46%	16 Missing ⚠️
espnet2/enh/separator/tfgridnetv3_separator.py	91.53%	16 Missing ⚠️
espnet2/layers/augmentation.py	25.00%	15 Missing ⚠️
espnet2/bin/enh_inference.py	55.55%	12 Missing ⚠️
espnet2/bin/enh_scoring.py	9.09%	10 Missing ⚠️
espnet2/iterators/chunk_iter_factory.py	70.37%	8 Missing ⚠️
espnet2/enh/layers/bsrnn.py	98.21%	2 Missing ⚠️
espnet2/train/preprocessor.py	33.33%	2 Missing ⚠️
espnet2/enh/loss/criterions/time_domain.py	66.66%	1 Missing ⚠️
espnet2/enh/separator/bsrnn_separator.py	96.29%	1 Missing ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5800      +/-   ##
==========================================
- Coverage   54.59%   52.59%   -2.01%     
==========================================
  Files         771      775       +4     
  Lines       70732    71155     +423     
==========================================
- Hits        38616    37422    -1194     
- Misses      32116    33733    +1617

Flag	Coverage Δ
test_python_espnet2	`52.59% <81.69%> (-0.39%)`	⬇️
test_utils	`?`


LiChenda

Hi @Emrys365, I updated some of my comments.

LiChenda · 2024-06-11T15:22:22Z

espnet2/bin/enh_scoring.py

+                        ref_ = ref[i]
+                        inf_ = inf[int(perm[i])]
+                    elif sample_rate > 16000:
+                        mode = "wb"


We'd better add some log or warning here.

Emrys365 · 2024-06-12

It has been added in the lines below.
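
For context, the `mode = "wb"` branch in the hunk above appears to be the PESQ mode selection. PESQ (ITU-T P.862) is defined only for 8 kHz narrowband and 16 kHz wideband input, so any higher sampling rate needs special handling, and a warning makes that visible to the user. The sketch below is a minimal illustration of that logic. The function `select_pesq_mode` is hypothetical and is not part of enh_scoring.py, and the resample-to-16-kHz step is an assumption that is not shown in the excerpt.

```python
import warnings
from typing import Tuple


def select_pesq_mode(sample_rate: int) -> Tuple[str, int]:
    """Return (pesq_mode, scoring_rate) for an input sampling rate.

    Illustrative only. Assumes audio above 16 kHz would be resampled
    to 16 kHz by the caller before PESQ is computed.
    """
    if sample_rate == 8000:
        return "nb", 8000
    if sample_rate == 16000:
        return "wb", 16000
    if sample_rate > 16000:
        # Tell the user that the score is not computed at the native rate.
        warnings.warn(
            f"PESQ is not defined for {sample_rate} Hz; "
            "assuming resampling to 16000 Hz and scoring in 'wb' mode."
        )
        return "wb", 16000
    raise ValueError(f"Unsupported sampling rate for PESQ: {sample_rate}")
```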

LiChenda · 2024-06-11T15:23:19Z

espnet2/enh/decoder/stft_decoder.py

@@ -116,7 +116,6 @@ def _reconfig_for_fs(self, fs):
        Args:
            fs (int): new sampling rate
        """
-        assert fs % self.default_fs == 0 or self.default_fs % fs == 0


Why was this assertion removed?

Emrys365 · 2024-06-12

This allows processing speech whose sampling rate is an arbitrary fraction of the encoder's default sampling rate. The ratio does not have to be exactly 1/n or an integer multiple, so I removed this line.
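
To make the effect of the removed assertion concrete, here is a minimal sketch. It assumes that STFT sizes are rescaled in proportion to fs / default_fs using integer floor division. The helper names and the 48 kHz example configuration are hypothetical and are not taken from stft_decoder.py.

```python
from typing import Tuple


def passes_old_assertion(fs: int, default_fs: int) -> bool:
    """The condition that the removed assert line enforced."""
    return fs % default_fs == 0 or default_fs % fs == 0


def scale_stft_params(
    n_fft: int, hop_length: int, default_fs: int, fs: int
) -> Tuple[int, int]:
    """Rescale STFT sizes so the window covers the same duration at fs.

    Assumption: sizes scale linearly with the sampling rate and are
    floored to integers.
    """
    return n_fft * fs // default_fs, hop_length * fs // default_fs


# Hypothetical default: 20 ms window and 10 ms hop at 48 kHz.
DEFAULT_FS, N_FFT, HOP = 48000, 960, 480

for fs in (16000, 32000, 44100):
    ok = passes_old_assertion(fs, DEFAULT_FS)
    n_fft, hop = scale_stft_params(N_FFT, HOP, DEFAULT_FS, fs)
    print(f"fs={fs}: old assertion passes={ok}, n_fft={n_fft}, hop={hop}")
```

In this configuration 16 kHz satisfies the old condition because 48000 is a multiple of 16000, while 32 kHz and 44.1 kHz do not, even though the rescaled sizes are still well-defined integers.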

LiChenda · 2024-06-11T15:24:22Z

espnet2/enh/encoder/stft_encoder.py

@@ -120,7 +120,6 @@ def _reconfig_for_fs(self, fs):
        Args:
            fs (int): new sampling rate
        """  # noqa: H405
-        assert fs % self.default_fs == 0 or self.default_fs % fs == 0


Why was this assertion removed?

Emrys365 · 2024-06-12

This allows processing speech whose sampling rate is an arbitrary fraction of the encoder's default sampling rate. The ratio does not have to be exactly 1/n or an integer multiple, so I removed this line.

LiChenda · 2024-06-11T15:26:06Z

espnet2/enh/separator/tcn_separator2.py

What is the reason for adding a new tcn_separator2.py instead of updating the functions in tcn_separator.py? Is it necessary?

Emrys365 · 2024-06-12

I have merged them!

sw005320 · 2024-06-13T12:58:44Z

OK for me.
Once CI passes, I'll merge this PR.

sw005320 · 2024-06-13T13:43:00Z

@Emrys365, we have an issue in the CI https://github.com/espnet/espnet/actions/runs/9500003127/job/26182089849?pr=5800
Please fix it.

Emrys365 · 2024-06-13T14:57:23Z

OK! It should be fixed now.

kohei0209 · 2024-06-13T15:47:47Z

@Emrys365 Sorry for the late response. The code looks good to me!

Emrys365 added ESPnet2 SE Speech enhancement labels Jun 5, 2024

pre-commit-ci bot and others added 2 commits June 5, 2024 08:20

[pre-commit.ci] auto fixes from pre-commit.com hooks

54a2e42

for more information, see https://pre-commit.ci

Fix CI errors

4870579

mergify bot added the CI Travis, Circle CI, etc label Jun 5, 2024

pre-commit-ci bot and others added 2 commits June 5, 2024 11:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f10aed

for more information, see https://pre-commit.ci

Fix CI errors in espnet2/bin/enh_inference.py

6cd285a

Emrys365 added 2 commits June 8, 2024 00:10

Fix a bug in espnet2/bin/enh_inference.py

6bfb5a7

Update egs2/mini_an4/enh1/conf/train_with_chunk_iterator_debug.yaml

93db866

Update egs2/mini_an4/enh1/conf/train_with_chunk_iterator_debug.yaml

0c73791

Emrys365 and others added 3 commits June 9, 2024 22:56

Update tfgridnetv3 related code

347aac8

[pre-commit.ci] auto fixes from pre-commit.com hooks

3a07fd4

for more information, see https://pre-commit.ci

Fix code style in espnet2/enh/separator/tfgridnetv3_separator.py

b761de6

sw005320 added this to the v.202405 milestone Jun 10, 2024

LiChenda reviewed Jun 11, 2024


Emrys365 and others added 2 commits June 12, 2024 14:43

Merge TCN2 into TCN

17d977d

Merge branch 'master' into urgent_recipe

3b9519e

sw005320 added auto-merge Enable auto-merge labels Jun 13, 2024

Fix CI errors related to TCN

e707989

mergify bot merged commit 63c4c09 into espnet:master Jun 13, 2024
35 checks passed
