Add a TTS recipe VITS on LJSpeech dataset #1372

Merged on Nov 29, 2023 (17 commits)

yaozengwei (Collaborator) commented Nov 6, 2023

This PR adds a TTS baseline in icefall.

The model-related code is mostly copied from espnet.

TODO:

  • Support model exporting.
  • Upload checkpoints and training logs.
  • Add documentation.

Will add a recipe on the VCTK dataset later.

@yaozengwei
Collaborator Author

This also requires some changes in lhotse. Will make a PR to lhotse soon.

with get_executor() as ex:  # Initialize the executor only once.
    cuts_filename = f"{prefix}_cuts_{partition}.{suffix}"
    if (output_dir / cuts_filename).is_file():
        logging.info(f"{partition} already exists - skipping.")
Collaborator

Suggested change
- logging.info(f"{partition} already exists - skipping.")
+ logging.info(f"{cuts_filename} already exists - skipping.")

This file computes fbank features of the LJSpeech dataset.
It looks for manifests in the directory data/manifests.

The generated fbank features are saved in data/spectrogram.
Collaborator

Suggested change
- The generated fbank features are saved in data/spectrogram.
+ The generated spectrogram features are saved in data/spectrogram.



"""
This file reads the texts in given manifest and generate the file that maps tokens to IDs.
Collaborator

Suggested change
- This file reads the texts in given manifest and generate the file that maps tokens to IDs.
+ This file reads the texts in given manifest and generates the file that maps tokens to IDs.

from pathlib import Path
from typing import Dict

import g2p_en
Collaborator

Please add code about how to install g2p_en.

from typing import Dict

import g2p_en
import tacotron_cleaner.cleaners
Collaborator

Please add code about how to install tacotron_cleaner?

Collaborator Author

Thanks a lot!
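
One possible way to act on the two install requests above is to guard the imports and raise an error that carries an install hint. This is a minimal sketch, not the code that was merged: `import_or_hint` and `REQUIREMENTS` are hypothetical names, and the PyPI package names (`g2p_en`, and `espnet_tts_frontend` as the distribution that ships `tacotron_cleaner`) are assumptions to verify in your own environment.

```python
import importlib
from types import ModuleType


def import_or_hint(module_name: str, pip_name: str) -> ModuleType:
    """Import a module, or fail with a message saying how to install it."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as ex:
        raise RuntimeError(f"{ex}\nPlease run\n  pip install {pip_name}") from ex


# Assumed mapping from importable module to PyPI package name.
REQUIREMENTS = {
    "g2p_en": "g2p_en",
    "tacotron_cleaner.cleaners": "espnet_tts_frontend",
}
```

A script could call `import_or_hint(name, pkg)` for each entry in `REQUIREMENTS` before doing any work, so a missing dependency fails early with a message the user can act on.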

# Sort by the number of occurrences in descending order
tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])

for token, idx in extra_tokens.items():
Collaborator

Items in a dict are iterated in an unknown order.
Please use a list for extra_tokens.

You can use

tokens_and_counts = extra_tokens + tokens_and_counts
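
The suggestion above could look roughly like the sketch below. The special-token names and the `build_token2id` wrapper are hypothetical, chosen only for illustration. Note that Python 3.7+ dicts do preserve insertion order, so the list is mainly a way to make the intended ID order explicit rather than a strict correctness requirement.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical special tokens; their position in this list fixes their IDs.
extra_tokens: List[Tuple[str, Optional[int]]] = [
    ("<blk>", None),
    ("<sos/eos>", None),
    ("<unk>", None),
]


def build_token2id(counter: Counter) -> Dict[str, int]:
    # Sort by the number of occurrences in descending order
    tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])
    # Prepend the special tokens so they take IDs 0, 1, 2, ...
    tokens_and_counts = extra_tokens + tokens_and_counts
    return {token: i for i, (token, _) in enumerate(tokens_and_counts)}
```

With this layout there is no need for `insert(idx, ...)` calls, and the IDs of the special tokens can be read directly off the list.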

counter[t] += 1

# Sort by the number of occurrences in descending order
tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])
Collaborator

Is there a reason to sort them by count?

Collaborator Author

It just makes it easy to cut off the vocabulary according to the counts if needed. But we don't need this now.
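
To illustrate why the sort helps: once tokens are ordered by count, cutting the vocabulary is a single slice. The sketch below is only an illustration of that idea; `truncate_vocab` and `max_size` are hypothetical names and the recipe does not do this.

```python
from collections import Counter
from typing import List, Tuple


def truncate_vocab(counter: Counter, max_size: int) -> List[Tuple[str, int]]:
    """Keep only the max_size most frequent tokens."""
    # Sort by the number of occurrences in descending order
    tokens_and_counts = sorted(counter.items(), key=lambda x: -x[1])
    return tokens_and_counts[:max_size]
```

`Counter.most_common(max_size)` from the standard library would give an equivalent result.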

for token, idx in extra_tokens.items():
    tokens_and_counts.insert(idx, (token, None))

token2id: Dict[str, int] = {token: i for i, (token, count) in enumerate(tokens_and_counts)}
Collaborator

Suggested change
- token2id: Dict[str, int] = {token: i for i, (token, count) in enumerate(tokens_and_counts)}
+ token2id: Dict[str, int] = {token: i for i, (token, _) in enumerate(tokens_and_counts)}


args = get_args()
manifest_file = Path(args.manifest_file)
out_file = Path(args.tokens)
Collaborator

Please check whether out_file exists and, if it does, return directly without any further computation.

Collaborator Author

We have checked this in prepare.sh
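
For readers who want the check inside the Python script as well, a minimal sketch of the early return might look like this. `generate_if_missing` and its `generate` callback are hypothetical; per the reply above, the recipe itself performs this check in prepare.sh.

```python
from pathlib import Path
from typing import Callable


def generate_if_missing(out_file: Path, generate: Callable[[Path], None]) -> bool:
    """Run generate(out_file) unless out_file already exists.

    Returns True if the file was generated, False if it was skipped.
    """
    if out_file.is_file():
        # Nothing to do: skip all further computation.
        return False
    generate(out_file)
    return True
```

Keeping the check in one place (either the shell script or the Python script) avoids the two drifting apart, which may be why the author left it in prepare.sh.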


assert manifest.is_file(), f"{manifest} does not exist"
cut_set = load_manifest_lazy(manifest)
assert isinstance(cut_set, CutSet)
Collaborator

Suggested change
- assert isinstance(cut_set, CutSet)
+ assert isinstance(cut_set, CutSet), type(cut_set)
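
The effect of the suggested second operand can be seen in a small standalone sketch. `check_type` is a hypothetical helper and uses built-in types instead of lhotse's `CutSet` so that it runs without extra dependencies.

```python
def check_type(obj: object, expected: type) -> None:
    # The second operand of assert becomes the AssertionError message,
    # so a failure reports the type that was actually received.
    assert isinstance(obj, expected), type(obj)
```

Calling `check_type("x", list)` raises an `AssertionError` whose message names the `str` class, which is usually enough to see what went wrong without re-running under a debugger.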

log "Stage 0: Download data"

# If you have pre-downloaded it to /path/to/LJSpeech,
# you can create a symlink
Collaborator

Please add a small description about what the directory LJSpeech contains.

# If you have pre-downloaded it to /path/to/LJSpeech,
# you can create a symlink
#
# ln -sfv /path/to/LJSpeech $dl_dir/LJSpeech
Collaborator

Is it LJSpeech or LJSpeech-1.1?

fi

if [ ! -e data/spectrogram/.ljspeech-validated.done ]; then
  log "Validating data/fbank for LJSpeech"
Collaborator

Suggested change
- log "Validating data/fbank for LJSpeech"
+ log "Validating data/spectrogram for LJSpeech"

@@ -0,0 +1,97 @@
#!/usr/bin/env bash
Collaborator

Could you replace it with a symlink?

Collaborator Author

Ok. Thanks.

@yaozengwei
Collaborator Author

This also requires some changes in lhotse. Will make a PR to lhotse soon.

See lhotse-speech/lhotse#1205

@yaozengwei
Collaborator Author

Training logs, Tensorboard logs, and checkpoints are uploaded to https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2023-11-29.

@yaozengwei yaozengwei merged commit 0622dea into k2-fsa:master Nov 29, 2023
37 checks passed