[GAN SVS] Add VISinger2, UHifiGAN, Avocodo #5123

Merged
merged 65 commits into from May 23, 2023


Contributor

@jerryuhoo jerryuhoo commented Apr 17, 2023

@ftshijt @A-Quarter-Mile
This is an update for VISinger that adds multiple modules.

  • Update VISinger 1 generator
    • Add an option to use phoneme predictor or not
    • Add an option to use flow or not
  • Add VISinger 2 generator
  • Add VISinger 2 vocoder generator (DDSP)
  • Add VISinger 2 vocoder discriminator (+MFD)
  • Add Avocodo
  • Add UHifiGAN
  • Add yin feature (For Pits)
    • Pits
  • Unit tests for all those changes
  • Fix F0 bug for SVS
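
For readers unfamiliar with the yin feature in the list above, here is a minimal sketch of the standard YIN pitch estimator that such a feature is presumably built on. This is an assumption: the thread does not show the code in `espnet2/tts/feats_extract/yin.py`, and the function names, default thresholds, and frequency bounds below are hypothetical.

```python
import math


def difference(frame, max_lag):
    """Squared difference between the frame and itself shifted by each lag."""
    window = len(frame) - max_lag
    return [
        sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
        for tau in range(max_lag + 1)
    ]


def cumulative_mean_normalized(diff):
    """Divide each lag's difference by the running mean of the differences so far."""
    out = [1.0]  # defined as 1 at lag 0
    running = 0.0
    for tau in range(1, len(diff)):
        running += diff[tau]
        out.append(diff[tau] * tau / running if running > 0 else 1.0)
    return out


def estimate_f0(frame, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.1):
    """Return an F0 estimate in Hz, or None if no lag falls below the threshold."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = int(sample_rate / f_min)
    cmnd = cumulative_mean_normalized(difference(frame, max_lag))
    for tau in range(min_lag, max_lag):
        if cmnd[tau] < threshold:
            # Walk down to the local minimum before converting lag to frequency.
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None  # treated as unvoiced


# Hypothetical input: a 200 Hz sine sampled at 8 kHz, 400 samples long.
sine = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
f0 = estimate_f0(sine, 8000)
silence_f0 = estimate_f0([0.0] * 400, 8000)
```

The frame must be longer than `sample_rate / f_min` samples so that every lag has a full comparison window.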

jerryuhoo and others added 30 commits March 18, 2023 17:30
Remove relu to avoid gradient vanishing.
Combine melody information in pitch predictor.
This is not a bug, but an improvement in VISinger2, I will add it later.
note that there's a bug when changing downsample parameters.
This change is for both gan_tts and gan_svs
@ftshijt
Collaborator

ftshijt commented May 11, 2023

Hi @jerryuhoo could you let me know when you have finished the development? Then I can also help to fix the CI issue in the import test for you.

@jerryuhoo
Contributor Author

> Hi @jerryuhoo could you let me know when you have finished the development? Then I can also help to fix the CI issue in the import test for you.

The listed models and functions are done, but I'm still investigating the performance gap. It is caused by either the posterior encoder or the vocoder, but I have not found the bug yet.

@ftshijt
Collaborator

ftshijt commented May 18, 2023

Sorry, I did not find time to fix the CI. Let's discuss the details in today's meeting.

@jerryuhoo
Contributor Author

Some code can still be improved, such as the DDSP module (part of the DDSP code is unused) and the modules in MFD. For example, in visinger2_vocoder.py we could use LogMelFbank instead of TorchSTFT. However, LogMelFbank has no equivalent of domain="double", which considers both linear and log fbanks.
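
To make the domain="double" remark concrete, here is a minimal sketch of what a feature that "considers both linear and log fbanks" could look like. It assumes that "double" means concatenating the linear magnitudes with their log-compressed version along the frequency axis. The thread does not spell out the exact behavior of TorchSTFT, so the function name, the floor value, and the choice of log10 are hypothetical.

```python
import math


def double_domain(mag_frames, floor=1e-7):
    """For each frame, return the linear magnitudes followed by their log10 values.

    mag_frames: list of frames, each a list of non-negative magnitudes.
    floor: lower bound applied before the logarithm to avoid log(0).
    """
    out = []
    for frame in mag_frames:
        linear = [max(v, floor) for v in frame]
        log = [math.log10(v) for v in linear]
        out.append(linear + log)  # feature size doubles: [linear | log]
    return out


# Hypothetical single frame with two frequency bins.
features = double_domain([[1.0, 10.0]])
```

Under this reading, a consumer of the feature sees twice as many bins per frame, which is one reason a drop-in replacement that only returns log features would not be equivalent.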

@ftshijt
Collaborator

ftshijt commented May 22, 2023

> Some code can still be improved, such as the DDSP module (part of the DDSP code is unused) and the modules in MFD. For example, in visinger2_vocoder.py we could use LogMelFbank instead of TorchSTFT. However, LogMelFbank has no equivalent of domain="double", which considers both linear and log fbanks.

You may add TODOs to the codebase. As this PR contains an important bug fix, I believe we can merge it quickly once the CI is fixed.

@ftshijt
Collaborator

ftshijt commented May 22, 2023

I've fixed the import test and a few comment errors for you.

@ftshijt
Collaborator

ftshijt commented May 22, 2023

Last request for this PR: can we fix the CI tests for the imported functions (https://github.com/espnet/espnet/actions/runs/5043122127/jobs/9044514638)? After that, I can merge it.

@codecov

codecov bot commented May 23, 2023

Codecov Report

Merging #5123 (912cfe9) into master (d1074ce) will increase coverage by 0.00%.
The diff coverage is 73.88%.

@@            Coverage Diff            @@
##           master    #5123     +/-   ##
=========================================
  Coverage   74.99%   75.00%             
=========================================
  Files         618      630     +12     
  Lines       55603    56816   +1213     
=========================================
+ Hits        41700    42614    +914     
- Misses      13903    14202    +299     
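
The percentages in the table above follow from hits divided by lines. A minimal sketch of that arithmetic is below. It assumes the report truncates to two decimals rather than rounding, which is an inference from the figures shown (rounding would give 75.00% for master as well); the function name is hypothetical.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return (hits * 10000 // lines) / 100


# Figures copied from the report: hits + misses equals lines in both columns.
master_consistent = 41700 + 13903 == 55603
pr_consistent = 42614 + 14202 == 56816

master = coverage_percent(41700, 55603)
pr = coverage_percent(42614, 56816)
```

The untruncated values are roughly 74.996% and 75.004%, which is why the reported change is 0.00% even though the displayed totals differ by 0.01.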
Flag Coverage Δ
test_integration_espnet1 66.28% <ø> (ø)
test_integration_espnet2 47.61% <29.41%> (+<0.01%) ⬆️
test_python 65.66% <73.88%> (+0.20%) ⬆️
test_utils 23.28% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
espnet2/bin/svs_inference.py 0.00% <0.00%> (ø)
espnet2/gan_svs/espnet_model.py 0.00% <0.00%> (ø)
espnet2/gan_svs/vits/phoneme_predictor.py 100.00% <ø> (ø)
espnet2/svs/espnet_model.py 7.01% <0.00%> (-0.56%) ⬇️
espnet2/tasks/gan_svs.py 0.00% <0.00%> (ø)
espnet2/train/preprocessor.py 29.16% <0.00%> (ø)
espnet2/tts/feats_extract/yin.py 0.00% <0.00%> (ø)
espnet2/tts/feats_extract/ying.py 0.00% <0.00%> (ø)
espnet2/gan_svs/visinger2/ddsp.py 28.44% <28.44%> (ø)
espnet2/gan_svs/uhifigan/uhifigan.py 68.53% <68.53%> (ø)
... and 18 more

... and 7 files with indirect coverage changes


@ftshijt ftshijt merged commit 09a7e49 into espnet:master May 23, 2023
23 of 25 checks passed
@@ -566,7 +566,7 @@ def apply_spectral_norm(self):
         """Apply spectral normalization module from all of the layers."""
 
         def _apply_spectral_norm(m: torch.nn.Module):
-            if isinstance(m, torch.nn.Conv2d):
+            if isinstance(m, torch.nn.Conv1d):
Collaborator

#5215 Hi @jerryuhoo , could you double check the issue?
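
The review comment concerns which layer type the `isinstance` check matches. Here is a minimal sketch, assuming PyTorch is installed, of why that choice matters: `torch.nn.utils.spectral_norm` is only applied to modules that pass the filter, so a filter for a convolution type the model does not contain leaves every layer unnormalized without raising an error. The helper names and the toy model are hypothetical and are not taken from the PR; the sketch does not say which type is correct for the discriminator in question.

```python
import torch


def apply_spectral_norm(model, layer_type):
    """Apply spectral normalization to every submodule of the given type."""

    def _apply(m):
        if isinstance(m, layer_type):
            torch.nn.utils.spectral_norm(m)

    model.apply(_apply)


def normalized_layers(model):
    """Names of submodules that carry the `weight_orig` parameter spectral_norm adds."""
    return [name for name, m in model.named_modules() if hasattr(m, "weight_orig")]


def toy_model():
    # Hypothetical stand-in for a discriminator built from 2D convolutions.
    return torch.nn.Sequential(
        torch.nn.Conv2d(1, 4, kernel_size=(3, 1)),
        torch.nn.LeakyReLU(0.1),
        torch.nn.Conv2d(4, 1, kernel_size=(3, 1)),
    )


mismatched = toy_model()
apply_spectral_norm(mismatched, torch.nn.Conv1d)  # filter matches nothing
unmatched = normalized_layers(mismatched)

matching = toy_model()
apply_spectral_norm(matching, torch.nn.Conv2d)  # filter matches both convolutions
matched = normalized_layers(matching)
```

Listing the normalized layers after the call, as `normalized_layers` does here, is one cheap way to catch a mismatched filter in a unit test.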


2 participants