Question about transcriptions.csv when training new model #5

Answered by qiuqiao
vocatart asked this question in Q&A

Hi! In terms of the dataset format for training a new model, it's consistent with the one used by openvpi/DiffSinger. You can utilize openvpi/MakeDiffSinger to create a dataset for your needs.

More specifically, for full_label data, the CSV should contain three columns: name, ph_seq, and ph_dur.

  • name is the filename of the WAV file without the extension.
  • ph_seq is the phoneme annotation sequence for the WAV file, separated by spaces. Any phoneme listed under ignored_phonemes in configs/binarize_config.yaml is treated as SP.
  • ph_dur is the duration of each phoneme in ph_seq, in seconds, also separated by spaces. It should contain one duration per phoneme, in the same order.
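
As a rough illustration of the three-column layout described above, here is a minimal Python sketch that checks a CSV for the name, ph_seq, and ph_dur columns and for matching phoneme and duration counts. The function name check_transcriptions, the utterance names, and the phonemes and durations in SAMPLE are made up for this example; this is not part of the project's own tooling, and the real binarizer may apply different or stricter checks.

```python
import csv
import io

# Column names taken from the description above.
REQUIRED_COLUMNS = ("name", "ph_seq", "ph_dur")


def check_transcriptions(csv_text):
    """Return a list of (name, problem) tuples; an empty list means no problems were found.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [("<header>", "missing columns: " + ", ".join(missing))]

    problems = []
    for row in reader:
        phonemes = row["ph_seq"].split()  # space-separated phonemes
        try:
            durations = [float(d) for d in row["ph_dur"].split()]  # seconds
        except ValueError:
            problems.append((row["name"], "non-numeric duration"))
            continue
        if len(phonemes) != len(durations):
            problems.append(
                (row["name"], f"{len(phonemes)} phonemes but {len(durations)} durations")
            )
        elif any(d < 0 for d in durations):
            problems.append((row["name"], "negative duration"))
    return problems


# Made-up rows: the first is consistent, the second is missing one duration.
SAMPLE = """name,ph_seq,ph_dur
utt_0001,SP k a SP,0.20 0.08 0.31 0.15
utt_0002,SP t o,0.10 0.05
"""

problems = check_transcriptions(SAMPLE)
for name, problem in problems:
    print(f"{name}: {problem}")
```

To run the same check on a real file, you could pass in the text of your transcriptions.csv, for example the result of reading it with open(path, encoding="utf-8").read().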

Here's an example of w…
