[GAN SVS] Add VISinger2, UHifiGAN, Avocodo #5123

jerryuhoo · 2023-04-17T15:36:01Z

@ftshijt @A-Quarter-Mile
This is an update to VISinger that adds several modules.

ftshijt · 2023-05-11T05:45:52Z

Hi @jerryuhoo could you let me know when you have finished the development? Then I can also help to fix the CI issue in the import test for you.

jerryuhoo · 2023-05-11T11:24:31Z

> Hi @jerryuhoo could you let me know when you have finished the development? Then I can also help to fix the CI issue in the import test for you.

Those listed models and functions are done, but I'm still investigating the performance gap. It is caused by either the posterior encoder or the vocoder, but currently I cannot find the bug.

ftshijt · 2023-05-18T20:54:44Z

Sorry, I did not find time to fix the CI. Let's discuss the details in today's meeting.

jerryuhoo · 2023-05-22T03:27:01Z

Some of the code could be improved, such as the ddsp module (part of the ddsp code is unused) and the modules in MFD. For example, in visinger2_vocoder.py we could perhaps use LogMelFbank instead of TorchSTFT, but LogMelFbank has no domain="double" option, which uses both linear and log fbanks.
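
To make the missing feature concrete, here is a minimal sketch of what a "double" domain could look like, assuming it simply means concatenating the linear and log magnitudes of the same spectral features along the frequency axis. The helper name `double_domain`, the `eps` floor, and the NumPy implementation are assumptions for illustration; this is not the actual TorchSTFT or LogMelFbank code.

```python
import numpy as np


def double_domain(mag: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Stack linear and log magnitudes along the frequency axis.

    mag: non-negative magnitudes, shape (batch, n_bins, n_frames).
    Returns an array of shape (batch, 2 * n_bins, n_frames).
    """
    # Clip before the log so zero magnitudes do not produce -inf.
    log_mag = np.log(np.clip(mag, eps, None))
    return np.concatenate([mag, log_mag], axis=1)


# Toy input (hypothetical sizes): 1 item, 4 frequency bins, 3 frames.
mag = np.abs(np.random.default_rng(0).standard_normal((1, 4, 3)))
feats = double_domain(mag)
print(feats.shape)  # (1, 8, 3)
```

Under this assumption, a discriminator consuming the features would see twice as many input channels as with a single domain, which is one reason a drop-in swap to a log-only front end would not be equivalent.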

ftshijt · 2023-05-22T07:41:02Z

> Some of the code could be improved, such as the ddsp module (part of the ddsp code is unused) and the modules in MFD. For example, in visinger2_vocoder.py we could perhaps use LogMelFbank instead of TorchSTFT, but LogMelFbank has no domain="double" option, which uses both linear and log fbanks.

You can add TODOs to the codebase. As this PR includes important bug fixes, I believe we can merge it quickly once the CI is fixed.

ftshijt · 2023-05-22T07:47:06Z

I've fixed the import test and a few comment errors for you.

ftshijt · 2023-05-22T17:11:42Z

Last request for this PR: can we fix the CI tests for the imported functions (https://github.com/espnet/espnet/actions/runs/5043122127/jobs/9044514638)? After that, I can merge it.

codecov · 2023-05-23T07:00:05Z

Codecov Report

Merging #5123 (912cfe9) into master (d1074ce) will increase coverage by 0.00%.
The diff coverage is 73.88%.

@@            Coverage Diff            @@
##           master    #5123     +/-   ##
=========================================
  Coverage   74.99%   75.00%             
=========================================
  Files         618      630     +12     
  Lines       55603    56816   +1213     
=========================================
+ Hits        41700    42614    +914     
- Misses      13903    14202    +299
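
The headline percentages can be reproduced from the hit and line counts. The sketch below assumes coverage is hits divided by lines, truncated (not rounded) to two decimals. The truncation is inferred from the numbers in this report rather than taken from Codecov documentation, and `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits: int, lines: int) -> str:
    """Return hits/lines as a percentage truncated to two decimals."""
    # Integer arithmetic avoids floating-point rounding surprises.
    hundredths = hits * 10000 // lines
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Counts taken from the coverage diff above.
before = coverage_percent(41700, 55603)
after = coverage_percent(42614, 56816)
print(before, after)  # 74.99% 75.00%
```

With rounding instead of truncation, the "before" figure would come out as 75.00%, which is why truncation is the assumption that matches the report.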

| Flag | Coverage Δ | |
| --- | --- | --- |
| test_integration_espnet1 | `66.28% <ø> (ø)` | |
| test_integration_espnet2 | `47.61% <29.41%> (+<0.01%)` | ⬆️ |
| test_python | `65.66% <73.88%> (+0.20%)` | ⬆️ |
| test_utils | `23.28% <ø> (ø)` | |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| espnet2/bin/svs_inference.py | `0.00% <0.00%> (ø)` | |
| espnet2/gan_svs/espnet_model.py | `0.00% <0.00%> (ø)` | |
| espnet2/gan_svs/vits/phoneme_predictor.py | `100.00% <ø> (ø)` | |
| espnet2/svs/espnet_model.py | `7.01% <0.00%> (-0.56%)` | ⬇️ |
| espnet2/tasks/gan_svs.py | `0.00% <0.00%> (ø)` | |
| espnet2/train/preprocessor.py | `29.16% <0.00%> (ø)` | |
| espnet2/tts/feats_extract/yin.py | `0.00% <0.00%> (ø)` | |
| espnet2/tts/feats_extract/ying.py | `0.00% <0.00%> (ø)` | |
| espnet2/gan_svs/visinger2/ddsp.py | `28.44% <28.44%> (ø)` | |
| espnet2/gan_svs/uhifigan/uhifigan.py | `68.53% <68.53%> (ø)` | |
| ... and 18 more | | |

... and 7 files with indirect coverage changes

ftshijt · 2023-06-12T20:47:00Z

espnet2/gan_tts/hifigan/hifigan.py

@@ -566,7 +566,7 @@ def apply_spectral_norm(self):
        """Apply spectral normalization module from all of the layers."""

        def _apply_spectral_norm(m: torch.nn.Module):
-            if isinstance(m, torch.nn.Conv2d):
+            if isinstance(m, torch.nn.Conv1d):


#5215 Hi @jerryuhoo, could you double-check this issue?
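
The diff above switches the check from Conv2d to Conv1d, so only one layer type matches at a time. As a point of comparison only, a check that covers both layer types could pass a tuple to `isinstance`. This is a sketch, not the code in hifigan.py and not necessarily how #5215 was resolved; the standalone function and the toy `ModuleDict` are hypothetical.

```python
import torch


def apply_spectral_norm(model: torch.nn.Module) -> None:
    """Apply spectral normalization to every 1D and 2D convolution in `model`."""

    def _apply_spectral_norm(m: torch.nn.Module) -> None:
        # A tuple makes isinstance match either layer type.
        if isinstance(m, (torch.nn.Conv1d, torch.nn.Conv2d)):
            torch.nn.utils.spectral_norm(m)

    # Module.apply visits every submodule (and the module itself).
    model.apply(_apply_spectral_norm)


# Hypothetical toy model containing both layer types.
model = torch.nn.ModuleDict(
    {
        "conv1d": torch.nn.Conv1d(1, 2, kernel_size=3),
        "conv2d": torch.nn.Conv2d(1, 2, kernel_size=3),
    }
)
apply_spectral_norm(model)

# spectral_norm re-registers the original weight as `weight_orig`.
print([hasattr(m, "weight_orig") for m in model.values()])
```

Whether both types should be normalized depends on which discriminators a given model actually uses, so this is worth checking against the layers in the affected module before adopting it.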

jerryuhoo and others added 30 commits March 18, 2023 17:30

add uhifigan

09a6e59

update parameters and improve compatibility

0f4f19c

add avocodo and improve code structure

a948b13

fix avocodo inference

db7869a

fix compatibility of different vocoders

29e98bf

add visinger2 vocoder

e67d66e

fix visinger2 vocoder bug

7c0d234

Remove relu to avoid gradient vanishing.

add visinger2 vocoder discriminator

e9d4994

add teacher forcing inference in visinger

b11d74a

Fix teacher forcing SVS bug in last commit.

cfed0fe

Refactor length regulator and fix VISinger bug.

fd06cd1

Combine melody information in pitch predictor.

remove decoder_input_pitch in the last commit

3cf9852

This is not a bug, but an improvement in VISinger2, I will add it later.

visinger2 generator draft

e0a7118

add uhifigan avocodo mfd vocoder

f072f02

note that there's a bug when changing downsample parameters.

fix visinger2 inference

815bdf9

fix uhifigan-avocodo inference

308a129

add pisinger draft

bd4204d

update pisinger

ad7c2cd

fix pisinger inference

3f0b78d

Merge branch 'espnet:master' into uhifigan

1e8489b

Merge branch 'espnet:master' into uhifigan

05c8545

fix loading ying feature

5c1102b

update visinger

6e3fd96

fix visinger2 vocoder unit test

e06bb2f

Refactor test data into function

b2b0ea1

This change is for both gan_tts and gan_svs

add unit test for avocodo

39549aa

add unit test for ddsp and uhifigan

b283814

update VISinger 2

74dc72d

add unit test for flow and phoneme

9fea2ae

Sort imports using isort

f1c3552

[pre-commit.ci] auto fixes from pre-commit.com hooks

3c0c592

for more information, see https://pre-commit.ci

jerryuhoo and others added 5 commits May 15, 2023 02:38

fix visinger 2 mfd bug

63d3530

fix hifigan weight norm bug

1b45bdc

improve uhifigan sine signal expand option

264b67e

clean gan_svs code

627bdf4

[pre-commit.ci] auto fixes from pre-commit.com hooks

231f660

for more information, see https://pre-commit.ci

jerryuhoo added 4 commits May 20, 2023 10:37

fix svs f0 bug

2c86f85

code clean

2aee662

update gan_svs configs

e378a16

fix multi-frequency discriminator sample rate bug

fe8adad

ftshijt added 3 commits May 22, 2023 03:21

fix import

4e2b95a

fix comment

6fa5fba

fix comment

3e7164a

jerryuhoo and others added 3 commits May 22, 2023 10:31

Add TODOs

5899bcc

Unified segment size in avocodo config

4d5e56c

[pre-commit.ci] auto fixes from pre-commit.com hooks

154f1dc

for more information, see https://pre-commit.ci

jerryuhoo and others added 4 commits May 22, 2023 15:47

update gan_svs config

73f980c

update comments for gan_svs

989836f

Merge branch 'master' of https://github.com/ftshijt/espnet into uhifigan

bf3a5aa

Merge branch 'uhifigan' of https://github.com/jerryuhoo/espnet into u…

912cfe9

…hifigan

ftshijt merged commit 09a7e49 into espnet:master May 23, 2023
23 of 25 checks passed

ftshijt reviewed Jun 12, 2023
