SE function updates: new models and support for handling various sampling frequencies #5800
…mprove espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates; improve ChunkIterator to support keeping short samples and truncating samples to a max length; Add bandwidth_limitation in espnet2/layers/augmentation.py; Update espnet2/enh/espnet_model.py to support handling different sampling rates in one model
for more information, see https://pre-commit.ci
Cool! @LiChenda and @kohei0209, can you review this PR?
Sure!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #5800 +/- ##
==========================================
- Coverage 54.59% 52.59% -2.01%
==========================================
Files 771 775 +4
Lines 70732 71155 +423
==========================================
- Hits 38616 37422 -1194
- Misses 32116 33733 +1617
Hi @Emrys365, I updated some of my comments.
ref_ = ref[i]
inf_ = inf[int(perm[i])]
elif sample_rate > 16000:
    mode = "wb"
We'd better add a log message or warning here.
It has been added in the lines below.
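For readers following this thread, here is a minimal sketch of the kind of branch being discussed: choosing a narrowband/wideband mode from the sampling rate and warning when the input is above 16 kHz. The helper name `select_metric_mode`, the fallback scoring rate of 16000 Hz, and the error raised for other rates are assumptions made for illustration. They are not taken from espnet2/bin/enh_scoring.py.

```python
import logging
from typing import Tuple

logger = logging.getLogger(__name__)


def select_metric_mode(sample_rate: int) -> Tuple[str, int]:
    """Return (mode, scoring_rate) for a narrowband/wideband metric.

    Hypothetical helper for illustration only; the real script may differ.
    """
    if sample_rate == 8000:
        return "nb", 8000
    if sample_rate == 16000:
        return "wb", 16000
    if sample_rate > 16000:
        # Assumption: higher rates are scored in wideband mode at 16 kHz,
        # and the user is told so instead of this happening silently.
        logger.warning(
            "Sampling rate %d Hz is above 16000 Hz; using 'wb' mode at 16000 Hz.",
            sample_rate,
        )
        return "wb", 16000
    raise ValueError(f"Unsupported sampling rate: {sample_rate}")
```

Emitting the warning inside the `sample_rate > 16000` branch keeps the message next to the decision it explains.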
@@ -116,7 +116,6 @@ def _reconfig_for_fs(self, fs):
Args:
    fs (int): new sampling rate
"""
assert fs % self.default_fs == 0 or self.default_fs % fs == 0
Why was this assertion removed?
This allows processing speech whose sampling rate is an arbitrary fraction of the encoder's default sampling rate. The ratio no longer has to be exactly 1/n or an integer multiple, so I removed this line.
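To make the point concrete, here is a small sketch of why the integer-ratio check can be dropped. It assumes the encoder keeps its window and hop durations fixed and only rescales their lengths in samples. The function names and the default values (16 kHz, 512-sample window, 128-sample hop) are hypothetical and are not read from the ESPnet encoders.

```python
from typing import Tuple


def passes_old_assertion(fs: int, default_fs: int) -> bool:
    """The removed check: fs must be an integer multiple or divisor."""
    return fs % default_fs == 0 or default_fs % fs == 0


def rescale_window_and_hop(
    default_fs: int, default_win: int, default_hop: int, fs: int
) -> Tuple[int, int]:
    """Scale window/hop sizes (in samples) by fs / default_fs, rounding down.

    Hypothetical stand-in for what a _reconfig_for_fs method might compute.
    """
    win = default_win * fs // default_fs
    hop = default_hop * fs // default_fs
    return win, hop


# 44.1 kHz is neither a multiple nor a divisor of 16 kHz, so the old
# assertion rejects it, yet the rescaling itself is well defined.
print(passes_old_assertion(44100, 16000))              # False
print(rescale_window_and_hop(16000, 512, 128, 44100))  # (1411, 352)
print(rescale_window_and_hop(16000, 512, 128, 48000))  # (1536, 384)
```

With a non-integer ratio the window and hop are rounded, so their durations are only approximately preserved. That is the trade-off accepted by removing the assertion.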
@@ -120,7 +120,6 @@ def _reconfig_for_fs(self, fs):
Args:
    fs (int): new sampling rate
""" # noqa: H405
assert fs % self.default_fs == 0 or self.default_fs % fs == 0
Why was this assertion removed?
This allows processing speech whose sampling rate is an arbitrary fraction of the encoder's default sampling rate. The ratio no longer has to be exactly 1/n or an integer multiple, so I removed this line.
What is the reasoning behind adding a new tcn_separator2.py instead of updating the functions in tcn_separator.py? Is it necessary?
I have merged them!
OK for me.
@Emrys365, we have an issue in the CI https://github.com/espnet/espnet/actions/runs/9500003127/job/26182089849?pr=5800
OK! It should be fixed now.
@Emrys365 Sorry for the late response. The code looks good to me!
What?
This PR updates the speech enhancement functions in the following aspects:
- Improve espnet2/bin/enh_inference.py and espnet2/bin/enh_scoring.py to support various sampling rates
- Improve espnet2/iterators/chunk_iter_factory.py to support keeping short samples and truncating samples to a max length
- Add bandwidth_limitation in espnet2/layers/augmentation.py
- Update espnet2/enh/espnet_model.py (always_forward_in_48k) to support handling different sampling rates in a single SE model
- Update l1_timedomain+magspec_loss in espnet2/enh/loss/criterions/time_domain.py to be more numerically stable
Why?
These updates provide important functions for the forthcoming URGENT Challenge.
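As a rough illustration of the chunk iterator change mentioned in this PR (keeping short samples and truncating samples to a max length), the sketch below plans non-overlapping chunk boundaries for a single utterance. The function `plan_chunks` and its parameters `keep_short` and `max_length` are hypothetical names chosen for this example. They are not the actual arguments of espnet2/iterators/chunk_iter_factory.py.

```python
from typing import List, Optional, Tuple


def plan_chunks(
    num_samples: int,
    chunk_length: int,
    keep_short: bool = True,
    max_length: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of the chunks for one utterance.

    Hypothetical sketch: truncate first, then either keep a too-short
    utterance whole or split a long one into fixed-length chunks.
    """
    if max_length is not None:
        # Truncate overly long utterances before chunking.
        num_samples = min(num_samples, max_length)
    if num_samples < chunk_length:
        # Short utterance: keep it as a single chunk, or drop it.
        return [(0, num_samples)] if keep_short else []
    # Non-overlapping chunks; a trailing remainder shorter than
    # chunk_length is dropped in this simplified version.
    last_start = num_samples - chunk_length
    return [(s, s + chunk_length) for s in range(0, last_start + 1, chunk_length)]


print(plan_chunks(10, 4))                   # [(0, 4), (4, 8)]
print(plan_chunks(3, 4))                    # [(0, 3)]
print(plan_chunks(3, 4, keep_short=False))  # []
print(plan_chunks(100, 4, max_length=9))    # [(0, 4), (4, 8)]
```

Keeping short utterances avoids silently discarding data when the chunk length is larger than some recordings, while the max length bounds memory use for very long ones.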
See also