
Add NaturalSpeech2 #35

Merged
merged 14 commits into open-mmlab:main on Dec 18, 2023

Conversation

@HeCheng0625 (Collaborator) commented Dec 16, 2023

Add NaturalSpeech2 models, training, and inference. NS2 predicts the latent of Encodec and uses the Encodec decoder to generate the waveform. We also offer a pretrained checkpoint (trained on LibriTTS) for users to run inference.

Paper: https://arxiv.org/abs/2304.09116
CKPT: https://huggingface.co/amphion/naturalspeech2_libritts
Demo: https://huggingface.co/spaces/amphion/NaturalSpeech2
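
As described above, NS2 predicts Encodec latents and then uses the Encodec decoder to produce the waveform. The sketch below illustrates that two-stage flow only; every name, dimension, and function here is a hypothetical stand-in, not the PR's actual API (the real code lives under the NS2 model directory in this PR):

```python
import numpy as np

# Illustrative sketch of the NS2 inference flow: a diffusion model predicts
# the continuous Encodec latent sequence, and the (frozen) Encodec decoder
# turns those latents into a waveform. All names and sizes are assumptions.

LATENT_DIM = 128     # assumed Encodec latent channels
HOP = 320            # assumed audio samples per latent frame

def predict_latents(phones: list, n_frames: int) -> np.ndarray:
    """Stand-in for the NS2 diffusion model: returns (n_frames, LATENT_DIM)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((n_frames, LATENT_DIM))

def encodec_decode(latents: np.ndarray) -> np.ndarray:
    """Stand-in for the Encodec decoder: upsamples frames to audio samples."""
    return np.repeat(latents.mean(axis=1), HOP)  # shape (n_frames * HOP,)

latents = predict_latents(["HH", "AH", "L", "OW"], n_frames=50)
wav = encodec_decode(latents)
print(latents.shape, wav.shape)  # (50, 128) (16000,)
```

The point of the split is that the waveform stage is handled entirely by the pretrained Encodec decoder, so NS2 itself only has to model the latent sequence.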

@RMSnow (Collaborator) commented Dec 17, 2023

Where is the pretrained checkpoint? And are there any generated samples?

Collaborator

Please merge run_inference.sh and run_train.sh into a single file, i.e., run.sh, and provide a recipe for NaturalSpeech2.

Collaborator

> Please merge run_inference.sh and run_train.sh into a single file, i.e., run.sh, and provide a recipe for NaturalSpeech2.

Same suggestions.
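
The requested consolidation could look roughly like the sketch below: a single run.sh that dispatches on a stage argument. The python entry points in the comments are assumptions for illustration, not the actual paths in this PR:

```shell
#!/bin/bash
# Hypothetical sketch of a unified run.sh for a NaturalSpeech2 recipe.
# The commented-out python commands are illustrative placeholders.
run() {
  case "$1" in
    train)
      echo "stage: train"
      # python bins/tts/train.py --config egs/tts/NaturalSpeech2/exp_config.json
      ;;
    inference)
      echo "stage: inference"
      # python bins/tts/inference.py --config egs/tts/NaturalSpeech2/exp_config.json
      ;;
    *)
      echo "usage: run.sh {train|inference}" >&2
      return 1
      ;;
  esac
}
run "${1:-train}"
```

A single dispatching script keeps the recipe self-documenting: users discover both stages from one usage message instead of hunting for separate scripts.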

Collaborator

What's the difference between this trainer and the TTS trainer in models/tts/base/tts_trainer.py?

Collaborator Author

Some of the initialization there is useless for NS2, so I don't want to inherit the TTS trainer.

Collaborator

Why doesn't the NS2 trainer directly inherit TTS trainer (defined in models/tts/base/tts_trainer.py) but instead inherit a newly defined trainer that is similar to the TTS trainer?

Collaborator

> Why doesn't the NS2 trainer directly inherit TTS trainer (defined in models/tts/base/tts_trainer.py) but instead inherit a newly defined trainer that is similar to the TTS trainer?

Same question
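
One common way to resolve this kind of disagreement is to inherit the shared trainer and override only the unwanted initialization, rather than duplicating the whole class. The sketch below is purely illustrative: this TTSTrainer is a stand-in, not the actual class in models/tts/base/tts_trainer.py:

```python
# Illustrative sketch: inherit a shared TTS trainer and override only the
# pieces NS2 does not need, instead of defining a near-identical new trainer.
# Both classes here are hypothetical stand-ins, not Amphion's real code.

class TTSTrainer:
    def __init__(self, cfg):
        self.cfg = cfg
        self.vocoder = self._build_vocoder()   # assume NS2 does not need this
        self.model = self._build_model()

    def _build_vocoder(self):
        return "heavy vocoder init"

    def _build_model(self):
        return "base tts model"

class NS2Trainer(TTSTrainer):
    def _build_vocoder(self):
        # NS2 decodes audio with Encodec, so skip the vocoder setup entirely.
        return None

    def _build_model(self):
        return "ns2 diffusion model"

trainer = NS2Trainer(cfg={})
print(trainer.vocoder, trainer.model)  # None ns2 diffusion model
```

With hook methods like these, "some initialization is useless" becomes an override returning None, and the rest of the base trainer is reused unchanged.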

@lmxue (Collaborator) commented Dec 18, 2023

It would be better to move this module to modules/encoder or modules/naturalspeech2.

Collaborator Author

Why? The prior encoder and diffusion are also parts of the NS2 model.

Collaborator

I'm not sure about your discussion. As general advice: if prior_encoder.py can be used by models other than NS2, then it should be moved into modules.

Collaborator Author

I think prior_encoder is designed specifically for NS2 (at least for now), so I will keep it under models/tts/ns2.

Collaborator

I think the models folder should contain only the models themselves (e.g., fastspeech2, vits, valle); the related modules should be placed in the modules folder, especially since you have already created modules/naturalspeech2.

Collaborator

Future improvement: merge this wavenet with Amphion wavenet vocoder (https://github.com/open-mmlab/Amphion/blob/main/models/vocoders/autoregressive/wavenet/wavenet.py)

Collaborator

> Future improvement: merge this wavenet with Amphion wavenet vocoder (https://github.com/open-mmlab/Amphion/blob/main/models/vocoders/autoregressive/wavenet/wavenet.py)

I approve of the "merge" idea. The name wavenet.py is somewhat confusing: it is not a vocoder. I think it is more like a diffusion wavenet, which already exists in Amphion:

class BiDilConv(nn.Module):

@HeCheng0625 You can merge this wavenet.py with the existing one.

Collaborator Author

I will do it in the future. For now, this wavenet is designed only for NS2, and its inputs differ considerably from BiDilConv's.
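
For context on this thread: both modules are WaveNet-style stacks of dilated 1-D convolutions, and the dilation schedule determines the receptive field, which is one of the properties a merged implementation would need to preserve. A small illustrative helper (not code from either file):

```python
# Receptive field of a stack of dilated 1-D convolutions, as used in
# WaveNet-style modules such as the NS2 wavenet or Amphion's BiDilConv.
# Formula: rf = 1 + sum((kernel_size - 1) * d) over each layer's dilation d.

def receptive_field(kernel_size: int, dilations: list) -> int:
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# e.g. kernel size 3 with dilations doubling over 4 layers:
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Exponentially growing dilations are what let such stacks cover long contexts with few layers, which is why both modules end up looking so similar despite different inputs.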

@RMSnow (Collaborator) left a review comment:

Add the copyright information for all the newly added files, including .py and .sh files.


from models.base.base_sampler import build_samplers


class TTSTrainer:
Collaborator

There is no inheritance here, although line 27 says "it inherits...".

Collaborator

Add copyright information for all the newly added files.

Collaborator Author

Will fix it.




@HeCheng0625 (Collaborator Author):

> Where is the pretrained checkpoint? And are there any generated samples?

Paper: https://arxiv.org/abs/2304.09116
CKPT: https://huggingface.co/amphion/naturalspeech2_libritts
Demo: https://huggingface.co/spaces/amphion/NaturalSpeech2

@RMSnow merged commit cc620a3 into open-mmlab:main on Dec 18, 2023
1 check passed