Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SE function updates: new models and support for handling various sampling frequencies #5800

Merged
merged 14 commits into from
Jun 13, 2024

Conversation

Emrys365
Copy link
Collaborator

@Emrys365 Emrys365 commented Jun 5, 2024

What?

This PR updates the speech enhancement functions from the following aspects:

  • Added three new models: BSRNN, TF-GridNetV3 (modified based on V2 to support various sampling rates, not officially named), and mapping-based TCN
  • Improved espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates
  • Improved espnet2/iterators/chunk_iter_factory.py to support
    • keeping short samples (for avoiding wasting many short training samples)
    • and truncating samples to a max length (for avoiding OOM) when handling different sampling rates
  • Added a new augmentation type bandwidth_limitation in espnet2/layers/augmentation.py
  • Added a new argument always_forward_in_48k in espnet2/enh/espnet_model.py to support handling different sampling rates in a single SE model
  • Improved l1_timedomain+magspec_loss in espnet2/enh/loss/criterions/time_domain.py to be more numerically stable.

Why?

These updates provide important functions for the forthcoming URGENT Challenge.

See also

…mprove espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates; improve ChunkIterator to support keeping short samples and truncating samples to a max length; Add bandwidth_limitation in espnet2/layers/augmentation.py; Update espnet2/enh/espnet_model.py to support handling different sampling rates in one model
@Emrys365 Emrys365 added ESPnet2 SE Speech enhancement labels Jun 5, 2024
@mergify mergify bot added the CI Travis, Circle CI, etc label Jun 5, 2024
@sw005320
Copy link
Contributor

sw005320 commented Jun 7, 2024

Cool!

@LiChenda and @kohei0209, can you review this PR?

@LiChenda
Copy link
Contributor

LiChenda commented Jun 8, 2024

Cool!

@LiChenda and @kohei0209, can you review this PR?

Sure!

Copy link

codecov bot commented Jun 8, 2024

Codecov Report

Attention: Patch coverage is 81.69935% with 84 lines in your changes missing coverage. Please review.

Project coverage is 52.59%. Comparing base (293c1cc) to head (e707989).

Files Patch % Lines
espnet2/enh/espnet_model.py 38.46% 16 Missing ⚠️
espnet2/enh/separator/tfgridnetv3_separator.py 91.53% 16 Missing ⚠️
espnet2/layers/augmentation.py 25.00% 15 Missing ⚠️
espnet2/bin/enh_inference.py 55.55% 12 Missing ⚠️
espnet2/bin/enh_scoring.py 9.09% 10 Missing ⚠️
espnet2/iterators/chunk_iter_factory.py 70.37% 8 Missing ⚠️
espnet2/enh/layers/bsrnn.py 98.21% 2 Missing ⚠️
espnet2/train/preprocessor.py 33.33% 2 Missing ⚠️
espnet2/enh/loss/criterions/time_domain.py 66.66% 1 Missing ⚠️
espnet2/enh/separator/bsrnn_separator.py 96.29% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5800      +/-   ##
==========================================
- Coverage   54.59%   52.59%   -2.01%     
==========================================
  Files         771      775       +4     
  Lines       70732    71155     +423     
==========================================
- Hits        38616    37422    -1194     
- Misses      32116    33733    +1617     
Flag Coverage Δ
test_python_espnet2 52.59% <81.69%> (-0.39%) ⬇️
test_utils ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sw005320 sw005320 added this to the v.202405 milestone Jun 10, 2024
Copy link
Contributor

@LiChenda LiChenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Emrys365 , I updated some of my comments.

ref_ = ref[i]
inf_ = inf[int(perm[i])]
elif sample_rate > 16000:
mode = "wb"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd better add some log or warning here.

Copy link
Collaborator Author

@Emrys365 Emrys365 Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been added in the lines below.

@@ -116,7 +116,6 @@ def _reconfig_for_fs(self, fs):
Args:
fs (int): new sampling rate
"""
assert fs % self.default_fs == 0 or self.default_fs % fs == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this assertion was removed?

Copy link
Collaborator Author

@Emrys365 Emrys365 Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to allow the process of speech with a sampling rate that is fractional of the encoder's default sampling rate. It doesn't have to be exactly 1/n or a multiple, so I removed this line.

@@ -120,7 +120,6 @@ def _reconfig_for_fs(self, fs):
Args:
fs (int): new sampling rate
""" # noqa: H405
assert fs % self.default_fs == 0 or self.default_fs % fs == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this assertion was removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to allow the process of speech with a sampling rate that is fractional of the encoder's default sampling rate. It doesn't have to be exactly 1/n or a multiple, so I removed this line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the consideration of adding a new tcn_separator2.py instead of updating functions in tcn_separator.py ? Is it necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged them!

@sw005320 sw005320 added auto-merge Enable auto-merge labels Jun 13, 2024
@sw005320
Copy link
Contributor

OK for me.
Once CI is passed, I'll merge this PR.

@sw005320
Copy link
Contributor

@Emrys365, we have an issue in the CI https://github.com/espnet/espnet/actions/runs/9500003127/job/26182089849?pr=5800
Please fix it.

@Emrys365
Copy link
Collaborator Author

OK! It should be fixed now.

@kohei0209
Copy link
Contributor

@Emrys365 Sorry for the late response. Codes look good to me!

@mergify mergify bot merged commit 63c4c09 into espnet:master Jun 13, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Enable auto-merge CI Travis, Circle CI, etc ESPnet2 SE Speech enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants