
Modify SpeechSynthesisDataset class, make it return text #1205

Merged: 8 commits, Nov 30, 2023

Conversation

@yaozengwei (Contributor) commented Nov 6, 2023

This PR is required by the TTS recipe k2-fsa/icefall#1372, where we convert the transcript text to phonemes during training.
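(A minimal sketch of the intended usage, assuming the updated dataset exposes the raw transcript under a "text" key; the key name, the dataloader, and the choice of g2p_en are assumptions, not part of this PR:)

```python
from g2p_en import G2p  # hypothetical G2P choice; the recipe may use another

g2p = G2p()
# dataloader: an assumed torch DataLoader built over a lhotse sampler.
for batch in dataloader:
    # The dataset now returns raw transcripts; the recipe converts them
    # to phoneme sequences inside the training loop.
    phoneme_seqs = [g2p(text) for text in batch["text"]]
    ...
```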

@pzelasko (Collaborator) left a comment:

Looks good, but wouldn't you prefer to have G2P inside SpeechSynthesisDataset, to avoid spending time running it inside the training loop?

```python
    }
    """

    def __init__(
        self,
        cuts: CutSet,
        return_cuts: bool = False,
```
Collaborator commented on the diff above:
This change is breaking; can you move return_cuts to a position towards the end of the parameter list?

Contributor Author:
OK.
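(For context, a minimal sketch of the non-breaking ordering being requested; the middle parameter is an illustrative stand-in for the pre-existing arguments, not the actual signature:)

```python
def __init__(
    self,
    cuts: CutSet,
    cut_transforms: Optional[List[Callable]] = None,  # stand-in for the
                                                      # pre-existing parameters
    return_cuts: bool = False,  # new flag moved to the end, so existing
                                # positional call sites keep working
) -> None:
    ...
```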

@yaozengwei (Contributor Author) commented Nov 10, 2023

> Looks good, but wouldn't you prefer to have G2P inside SpeechSynthesisDataset, to avoid spending time running it inside the training loop?

Hi @pzelasko, I prefer to do the text normalization and tokenization in the separate training recipes, since they usually depend on different packages. @csukuangfj suggests doing this in the data preparation stage, with the converted phonemes saved to the manifests.
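(A sketch of that data-preparation approach; my_g2p and the manifest paths are hypothetical:)

```python
from lhotse import CutSet

# Run G2P once during data preparation and persist the tokens on each
# supervision's `custom` dict, so training never pays the G2P cost.
cuts = CutSet.from_file("data/cuts_train.jsonl.gz")

def attach_phonemes(cut):
    for sup in cut.supervisions:
        sup.custom = dict(sup.custom or {}, tokens=my_g2p(sup.text))
    return cut

cuts.map(attach_phonemes).to_file("data/cuts_train_phonemes.jsonl.gz")
```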

@pzelasko (Collaborator) commented:

> I prefer to do the text normalization and tokenization in the separate training recipes, since they usually depend on different packages. @csukuangfj suggests doing this in the data preparation stage, with the converted phonemes saved to the manifests.

OK cool. It looks like there are some conflicts after merging the other PR with multi-speaker support, could you resolve them?

@yaozengwei (Contributor Author) commented:

> OK cool. It looks like there are some conflicts after merging the other PR with multi-speaker support, could you resolve them?

Ok. Thanks. I have some local changes and will resolve the conflicts later.

In addition, the current implementation loads the whole cut set and generates a char-based vocabulary from the given texts, but I think TTS recipes usually use phoneme tokens instead. It might also cause a token-id-mapping mismatch between training and test when this class is given two separate cut sets. Shall we remove this and make the class return the raw text and, optionally, pre-converted (phoneme) tokens?

```python
self.token_collater = TokenCollater(cuts, add_eos=add_eos, add_bos=add_bos)
```
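(To illustrate the mismatch concern, a hedged sketch; train_cuts and test_cuts are hypothetical cut sets:)

```python
# Each collater derives its vocabulary from whatever cuts it is given, so
# the same character can receive different ids at train and test time.
train_collater = TokenCollater(train_cuts, add_eos=True, add_bos=True)
test_collater = TokenCollater(test_cuts, add_eos=True, add_bos=True)
# A model trained with the first mapping cannot be decoded reliably with
# the second, since the two vocabularies generally differ.
```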

@pzelasko (Collaborator) commented:

> The current implementation loads the whole cut set and generates a char-based vocabulary from the given texts, but I think TTS recipes usually use phoneme tokens instead. It might also cause a token-id-mapping mismatch between training and test when this class is given two separate cut sets. Shall we remove this and make the class return the raw text and, optionally, pre-converted (phoneme) tokens?

I see that this class accepts cuts in the constructor; it's an outdated design we don't use anymore, so thanks for updating it. Instead, it would make sense to pass a tokenizer object/callable that (optionally) converts raw text into tokens. What you're suggesting also works.
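(A minimal sketch of the tokenizer-callable design suggested here; the names are illustrative, not the final lhotse API:)

```python
from typing import Callable, List, Optional

from torch.utils.data import Dataset


class SpeechSynthesisDataset(Dataset):
    """Illustrative only: accept an optional text -> tokens callable
    instead of deriving a vocabulary from a CutSet in the constructor."""

    def __init__(self, tokenizer: Optional[Callable[[str], List[str]]] = None):
        self.tokenizer = tokenizer

    def __getitem__(self, cuts) -> dict:
        texts = [cut.supervisions[0].text for cut in cuts]
        batch = {"text": texts}
        if self.tokenizer is not None:
            batch["tokens"] = [self.tokenizer(t) for t in texts]
        return batch
```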

@yaozengwei (Contributor Author) commented:

@pzelasko Thanks. It is ready to be merged.

@yaozengwei (Contributor Author) commented Nov 29, 2023

@pzelasko @desh2608 Would you mind giving a review if you have time? I have fixed the test case.

@pzelasko (Collaborator) left a comment:

Thanks, LGTM!

@pzelasko merged commit b869488 into lhotse-speech:master on Nov 30, 2023 (9 of 10 checks passed).