About TTS resume #94

arieszhang1994 · 2024-01-08T07:06:42Z

Hi, I found that the TTS resume code is at
https://github.com/open-mmlab/Amphion/blob/main/models/tts/base/tts_trainer.py#L140
and
https://github.com/open-mmlab/Amphion/blob/main/models/tts/base/tts_trainer.py#L302

However, _accelerator_prepare is only called afterwards, at
https://github.com/open-mmlab/Amphion/blob/main/models/tts/base/tts_trainer.py#L145

So when resume_type == "resume", self._check_resume does not seem to work.

Is there something I missed?

arieszhang1994 · 2024-01-12T14:20:34Z

On a separate issue: I am confused by phon_id_collator.get_phone_id_sequence. When I run inference with

sh egs/tts/VITS/run.sh --stage 3 --gpu "0" \
    --config ckpts/tts/vits_ljspeech/args.json \
    --infer_expt_dir ckpts/tts/vits_ljspeech/ \
    --infer_output_dir ckpts/tts/vits_ljspeech/result \
    --infer_mode "single" \
    --infer_text "This is a clip of generated speech with the given text from a TTS model."

at https://github.com/open-mmlab/Amphion/blob/main/models/tts/vits/vits_inference.py#L116

the text is 'This is a clip of generated speech with the given text from a TTS model.'
the phone_seq is '['DH', 'IH0', 'S', 'IH0', 'Z', 'AH0', 'K', 'L', 'IH1', 'P', 'AH0', 'V', 'JH', 'EH1', 'N', 'ER0', 'EY2', 'T', 'AH0', 'D', 'S', 'P', 'IY1', 'CH', 'W', 'IH0', 'DH', 'DH', 'AH0', 'G', 'IH1', 'V', 'AH0', 'N', 'T', 'EH1', 'K', 'S', 'T', 'F', 'ER0', 'M', 'AH0', 'T', 'IY1', 'EH1', 'N', 'IY1', 'S', 'M', 'AA1', 'D', 'AH0', 'L']'
however, the phone_id_seq is [41, 45, 11, 46, 45, 63, 42, 55, 52, 11, 56, 11, 46, 45, 63, 42, 55, 52, 11, 63, 11, 38, 45, 63, 42, 55, 52, 11, 48, 11, 49, 11, 46, 45, 52, 51, 42, 11, 53, 11, 38, 45, 63, 42, 55, 52, 11, 59, 11, 47, 45, 11, 42, 45, 52, 51, 42, 11, 51, 11, 42, 55, 63, 42, 55, 52, 11, 42, 62, 57, 60, 52, 11, 57, 11, 38, 45, 63, 42, 55, 52, 11, 41, 11, 56, 11, 53, 11, 46, 62, 52, 51, 42, 11, 40, 45, 11, 60, 11, 46, 45, 63, 42, 55, 52, 11, 41, 45, 11, 41, 45, 11, 38, 45, 63, 42, 55, 52, 11, 44, 11, 46, 45, 52, 51, 42, 11, 59, 11, 38, 45, 63, 42, 55, 52, 11, 51, 11, 57, 11, 42, 45, 52, 51, 42, 11, 48, 11, 56, 11, 57, 11, 43, 11, 42, 55, 63, 42, 55, 52, 11, 50, 11, 38, 45, 63, 42, 55, 52, 11, 57, 11, 46, 62, 52, 51, 42, 11, 42, 45, 52, 51, 42, 11, 51, 11, 46, 62, 52, 51, 42, 11, 56, 11, 50, 11, 38, 38, 52, 51, 42, 11, 41, 11, 38, 45, 63, 42, 55, 52, 11, 49]

when I run
text.sequence_to_text(phone_id_seq)
the result is
dh ihzero s ihzero z ahzero k l ihone p ahzero v jh ehone n erzero eytwo t ahzero d s p iyone ch w ihzero dh dh ahzero g ihone v ahzero n t ehone k s t f erzero m ahzero t iyone ehone n iyone s m aaone d ahzero l

Does Amphion do this on purpose?

lmxue · 2024-01-16T11:40:36Z

Thanks for your feedback. Please check PR #108.
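For context, the ordering problem reported in the first comment can be sketched with a toy model. This is only an illustration, not the change made in PR #108: `ToyAccelerator` is a hypothetical stand-in, and it assumes that the accelerator's state loading only restores objects that were registered earlier through a prepare step.

```python
class ToyAccelerator:
    """Hypothetical stand-in: load_state only touches objects registered by prepare()."""

    def __init__(self):
        self._registered = []

    def prepare(self, *objects):
        # Register every object so that a later load_state can restore it.
        self._registered.extend(objects)
        return objects if len(objects) > 1 else objects[0]

    def load_state(self, checkpoint):
        # Objects that were never registered are silently left unchanged.
        for obj in self._registered:
            obj.update(checkpoint.get(obj["name"], {}))


def run(resume_before_prepare):
    """Return the model's step counter after an attempted resume."""
    checkpoint = {"model": {"step": 1000}}
    model = {"name": "model", "step": 0}  # a dict standing in for a real model
    acc = ToyAccelerator()
    if resume_before_prepare:
        acc.load_state(checkpoint)   # nothing is registered yet, so this is a no-op
        model = acc.prepare(model)
    else:
        model = acc.prepare(model)
        acc.load_state(checkpoint)   # the model is registered and gets restored
    return model["step"]


print(run(resume_before_prepare=True))   # 0
print(run(resume_before_prepare=False))  # 1000
```

Under that assumption, resuming before the prepare step leaves training at step 0, which matches the symptom described in the report.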

arieszhang1994 · 2024-01-16T11:53:10Z

Thank you!
Besides, could you look at the second issue I mentioned? I tried adding
phones = " ".join(phone_seq)
phones = "{" + phones + "}"
phone_seq = phones.split(" ")
after this line (Amphion/models/tts/vits/vits_dataset.py, line 80 in a840088):

phones_seq = phones.split(" ")

and retrained a new VITS model.

I also made the same change in the inference code. However, the retrained model fails to synthesize intelligible English, even though the loss looks normal (it dropped to 37). The generated demo sounds as if the phone embeddings were never trained.
It is strange: I have been debugging for several days and still cannot find the cause.
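
As a quick check of what the three added lines produce, here is a minimal sketch using a shortened, hypothetical phone list (the real list comes from the dataset). It only shows the resulting tokens; how the downstream code consumes them is not shown in this thread.

```python
# Shortened, hypothetical phone list standing in for the real one.
phone_seq = ["DH", "IH0", "S"]

phones = " ".join(phone_seq)   # "DH IH0 S"
phones = "{" + phones + "}"    # "{DH IH0 S}"
tokens = phones.split(" ")     # split again on spaces

print(tokens)  # ['{DH', 'IH0', 'S}']
```

After the split, the braces stay attached to the first and last tokens, so those two entries are no longer plain phone labels.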


lmxue · 2024-02-15T12:36:11Z

When cfg.preprocess.phone_extractor == "lexicon", we convert text to a phone sequence based on the dictionary defined in https://raw.githubusercontent.com/open-mmlab/Amphion/main/text/lexicon/librispeech-lexicon.txt.
For the conversion from phone sequence to phone ID sequence, we currently use the phoneme set from https://github.com/HarryHe11/vc-dev/blob/main/text/symbols.py. However, it should use the phoneme set from librispeech-lexicon.txt. I'll refactor this part. Thanks for your feedback.
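
The mismatch can be reproduced in a few lines. The sketch below assumes a character-level symbol table laid out as pad, "-", punctuation, then letters. That layout is an assumption, chosen because it is consistent with the IDs reported earlier in this thread; `decode_chars` and `build_phone_vocab` are hypothetical helpers, not Amphion functions.

```python
# Assumed character-level symbol table (hypothetical reconstruction).
PAD = "_"
SPECIAL = "-"
PUNCTUATION = "!'(),.:;? "
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CHAR_SYMBOLS = [PAD] + list(SPECIAL) + list(PUNCTUATION) + list(LETTERS)


def decode_chars(ids):
    """Map each ID back to a single character of the assumed table."""
    return "".join(CHAR_SYMBOLS[i] for i in ids)


# First nine IDs of the phone_id_seq reported in this thread.
reported_ids = [41, 45, 11, 46, 45, 63, 42, 55, 52]
print(decode_chars(reported_ids))  # dh ihzero


def build_phone_vocab(phones):
    """Phone-level alternative: one ID per phone label."""
    return {p: i for i, p in enumerate(sorted(set(phones)))}


phone_seq = ["DH", "IH0", "S"]
vocab = build_phone_vocab(phone_seq)
phone_ids = [vocab[p] for p in phone_seq]
print(phone_ids)  # [0, 1, 2]
```

With the character table, the two phones "DH IH0" expand to nine IDs spelling "dh ihzero", whereas a vocabulary built from the lexicon's phone set would give one ID per phone.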

HarryHe11 · 2024-02-16T09:55:18Z

Hi @arieszhang1994, if you have any further questions about the TTS resume, feel free to re-open this issue. We are glad to follow up!

zhizhengwu assigned lmxue Jan 8, 2024

lmxue mentioned this issue Jan 15, 2024: Fix bug for VITS resuming training #108 (Merged)

lmxue added a commit that referenced this issue Jan 17, 2024: Fix bug for VITS resuming training (#108), 4125584
* Fix bug for VITS resuming training. Related issue #94

HarryHe11 mentioned this issue Feb 15, 2024: Regarding Resume in VALLE during training #129 (Closed)

HarryHe11 closed this as completed Feb 16, 2024
