
Add a new separator for speech enhancement/separation tasks #4062

Merged
merged 11 commits from LiChenda:skim into espnet:master on Feb 28, 2022

Conversation

LiChenda
Contributor

This PR is an implementation of the Skipping Memory (SkiM) network.
It is a low-computational-cost model for long-sequence modeling.
In the original paper, it is used for low-latency, real-time Continuous Speech Separation (CSS).
Currently, the CSS task is not ready in the main ESPnet branch, so this PR mainly implements the SkiM model itself.
To test the model, I also add two related configuration files (causal and non-causal) to the WSJ0-2mix recipe. (With a small encoder kernel size, even utterance-level input generates a long feature sequence.)
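
For intuition on why utterance-level input becomes a long-sequence problem with a small encoder kernel, here is a minimal back-of-the-envelope sketch in Python; the 8 kHz sample rate, 4-second utterance, 2-sample kernel, and 50% stride are illustrative assumptions rather than the exact recipe settings:

# Rough frame-count arithmetic for a TasNet-style learned encoder.
# All numbers below are illustrative assumptions, not the recipe's values.
sample_rate = 8000          # WSJ0-2mix is commonly used at 8 kHz
utterance_seconds = 4.0
kernel_size = 2             # small encoder kernel, as described above
stride = kernel_size // 2   # typical 50% overlap

num_samples = int(sample_rate * utterance_seconds)
num_frames = (num_samples - kernel_size) // stride + 1
print(num_frames)           # 31999 frames -> a very long sequence for the separator

Sequences of tens of thousands of frames per utterance are exactly the long-sequence regime that SkiM is designed to handle with low computational cost.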

@mergify mergify bot added the ESPnet2 label Feb 11, 2022
@codecov

codecov bot commented Feb 11, 2022

Codecov Report

Merging #4062 (ce2dee8) into master (637d8c3) will increase coverage by 0.07%.
The diff coverage is 98.35%.


@@            Coverage Diff             @@
##           master    #4062      +/-   ##
==========================================
+ Coverage   80.94%   81.02%   +0.07%     
==========================================
  Files         435      437       +2     
  Lines       37651    37822     +171     
==========================================
+ Hits        30477    30645     +168     
- Misses       7174     7177       +3     
Flag                       Coverage Δ
test_integration_espnet1   67.13% <ø> (ø)
test_integration_espnet2   51.81% <19.23%> (-0.27%) ⬇️
test_python                66.83% <98.35%> (+0.14%) ⬆️
test_utils                 24.45% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                            Coverage Δ
espnet2/enh/layers/dprnn.py               100.00% <ø> (ø)
espnet2/enh/layers/tcn.py                 95.90% <96.15%> (-0.37%) ⬇️
espnet2/enh/separator/skim_separator.py   97.05% <97.05%> (ø)
espnet2/enh/layers/skim.py                99.17% <99.17%> (ø)
espnet2/tasks/enh.py                      99.06% <100.00%> (+<0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 637d8c3...ce2dee8.

@Emrys365 Emrys365 added the SE Speech enhancement label Feb 11, 2022
@sw005320 sw005320 added this to the v.0.10.7 milestone Feb 11, 2022
Contributor

@sw005320 sw005320 left a comment


How about adding a result to wsj0_2mix/enh1/README.md with pre-trained models?
WSJ0_2mix is not a suitable example, but it is still useful.

espnet2/enh/layers/skim.py (resolved review thread)
@@ -253,20 +258,37 @@ def forward(self, y):
        Returns:
            cLN_y: [M, N, K]
        """
        dim = 3
        if y.dim() == 4:
Contributor

Can you add a test for this case?

Contributor Author

This if-branch is never used in the codebase, so I removed this case.
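
For context on the snippet under discussion: this forward appears to be the cumulative (causal) layer normalization in espnet2/enh/layers/tcn.py, which normalizes each frame with statistics accumulated over all frames seen so far. Below is a minimal sketch of the remaining 3-D case, not the exact ESPnet code; the gain/bias shapes and the epsilon value are assumptions:

import torch

def cumulative_layer_norm(y, gain, bias, eps=1e-8):
    # y: [M, N, K] (batch, channels, time); gain, bias: [1, N, 1] trainable parameters
    M, N, K = y.shape
    step_sum = y.sum(dim=1)                          # per-frame sum over channels, [M, K]
    step_pow_sum = y.pow(2).sum(dim=1)               # [M, K]
    cum_sum = torch.cumsum(step_sum, dim=1)          # running sums up to each frame
    cum_pow_sum = torch.cumsum(step_pow_sum, dim=1)
    # number of entries accumulated at each frame: N, 2N, ..., K*N
    entry_cnt = torch.arange(N, N * (K + 1), N, dtype=y.dtype, device=y.device)
    cum_mean = cum_sum / entry_cnt
    cum_var = cum_pow_sum / entry_cnt - cum_mean.pow(2)
    cum_std = (cum_var + eps).sqrt()
    cLN_y = (y - cum_mean.unsqueeze(1)) / cum_std.unsqueeze(1) * gain + bias
    return cLN_y                                     # [M, N, K]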

@sw005320
Contributor

LGTM.
@Emrys365, do you have any comments?
I think this PR is already in good shape.

@LiChenda
Contributor Author

LiChenda commented Feb 15, 2022

How about adding a result to wsj0_2mix/enh1/README.md with pre-trained models? WSJ0_2mix is not a suitable example, but it is still useful.

Yes, I have some pre-trained models on WSJ0_2mix. However, I rewrote the code for this open-source PR to avoid some potential IP issues, so the previously trained models can't be loaded directly because of the renaming. Modifying the parameter keys in the previous checkpoint file would be too hacky, so I am training a new model with this codebase. The progress looks good, and I'll upload a pre-trained model when the training is done.
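
For reference, the "too hacky" alternative mentioned above would amount to remapping the parameter keys of the old checkpoint to the new module names and saving it back. A rough sketch, assuming the checkpoint is a plain state_dict and using placeholder prefixes rather than the real module names:

import torch

# Hypothetical key remapping; "old_prefix." and "separator.skim." are placeholders,
# not the actual module names involved in this PR's renaming.
old_state = torch.load("old_skim_model.pth", map_location="cpu")
new_state = {
    key.replace("old_prefix.", "separator.skim."): value
    for key, value in old_state.items()
}
torch.save(new_state, "renamed_skim_model.pth")

Retraining with the new codebase, as done here, avoids maintaining such a mapping.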

Collaborator

@Emrys365 Emrys365 left a comment


LGTM. I just added some minor comments.

Outdated review threads (all resolved):
espnet2/enh/separator/skim_separator.py
espnet2/enh/layers/tcn.py (two threads)
egs2/wsj0_2mix/enh1/conf/tuning/train_enh_skim_tasnet.yaml (two threads)
espnet2/enh/layers/skim.py (two threads)
@sw005320
Contributor

Once @Emrys365's comments are addressed and a model is uploaded, we can merge this PR.

@mergify mergify bot added the README label Feb 23, 2022
@LiChenda
Contributor Author

The CI failure seems to be caused by a pytorch_complex-related issue. We can re-run the CI once that issue is resolved.

@Emrys365 Emrys365 merged commit 5edb478 into espnet:master Feb 28, 2022
@LiChenda LiChenda deleted the skim branch April 1, 2022 16:31