Question about transcriptions.csv when training new model #5

vocatart · 2023-12-17T07:30:15Z

Hi! Very interested in using SOFA as a replacement for MFA. Had a quick question about training a new model. The readme refers to a transcriptions.csv being in the folder of the singer datasets, but it never specifies the format this file needs to be in and what information it contains. Thanks! :)

qiuqiao (Maintainer) · 2023-12-17T08:12:34Z

Hi! The dataset format for training a new model is the same as the one used by openvpi/DiffSinger. You can use openvpi/MakeDiffSinger to create a dataset in that format.

More specifically, for full_label data, the CSV should contain three columns: name, ph_seq, and ph_dur.

name is the filename of the WAV file, without the extension.
ph_seq is the phoneme annotation sequence for the WAV file, with phonemes separated by spaces. Any phoneme listed under ignored_phonemes in configs/binarize_config.yaml is treated as SP.
ph_dur is the sequence of durations, one per phoneme in ph_seq, measured in seconds and also separated by spaces.
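
To make the ignored_phonemes rule concrete, here is a minimal sketch of what "treated as SP" could look like for a single ph_seq string. This is not the binarizer's actual code: the function name `apply_ignored` and the example set `{"AP"}` are hypothetical, and the sketch does not say whether adjacent SP entries are merged afterwards.

```python
def apply_ignored(ph_seq, ignored_phonemes):
    """Replace every phoneme found in ignored_phonemes with SP.

    ph_seq is a space-separated phoneme string, as in the CSV column.
    ignored_phonemes is a hypothetical stand-in for the list configured
    in configs/binarize_config.yaml.
    """
    return " ".join(
        "SP" if phoneme in ignored_phonemes else phoneme
        for phoneme in ph_seq.split()
    )


# Illustrative only: treating AP as an ignored phoneme is an assumption.
print(apply_ignored("b ie AP SP", {"AP"}))  # b ie SP SP
```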

Here's an example of what the CSV content should look like:

name,ph_seq,ph_dur
myaudio_1,b ie r ong h ua l e y En l ei AP SP,0.07859 0.40293 0.09818 0.41515 0.11034 0.3453 0.15223 0.35642 0.0937 0.24446 0.09266 0.34733 0.24899 0.2778780726
myaudio_2,SP sh uo g an AP SP j iu g an AP z ai d ao x ia zh ir q ian zh ir q iu h uo g e t ong k uai AP SP,0.3567800454 0.18395 0.54211 0.09266 0.63328 0.24868 0.05809 0.11525 0.56118 0.08851 0.64217 0.22428 0.11764 0.18132 0.06359 0.16275 0.07434 0.13661 0.07883 0.19637 0.10148 0.26796 0.04816 0.15877 0.07283 0.18856 0.11828 0.21115 0.09586 0.13446 0.11893 0.35909 0.06789 0.41733 0.16352 0.2607256236
...
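
Before training, it can help to check that every row follows the rules above, in particular that ph_seq and ph_dur have the same number of entries. The following is a rough sketch using only the Python standard library; the function name `check_full_label_rows` and the sample rows are made up for illustration and are not part of SOFA.

```python
import csv
import io


def check_full_label_rows(csv_text):
    """Return (name, problem) tuples for rows that break the full_label rules."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"name", "ph_seq", "ph_dur"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    problems = []
    for row in reader:
        phonemes = (row["ph_seq"] or "").split()
        try:
            durations = [float(d) for d in (row["ph_dur"] or "").split()]
        except ValueError:
            problems.append((row["name"], "non-numeric duration"))
            continue
        if len(phonemes) != len(durations):
            problems.append(
                (row["name"], f"{len(phonemes)} phonemes but {len(durations)} durations")
            )
        elif any(d < 0 for d in durations):
            problems.append((row["name"], "negative duration"))
    return problems


# Hypothetical sample: the second row is one duration short.
sample = """name,ph_seq,ph_dur
ok_clip,b ie AP SP,0.07859 0.40293 0.24899 0.27788
bad_clip,b ie AP SP,0.07859 0.40293 0.24899
"""
print(check_full_label_rows(sample))
# [('bad_clip', '4 phonemes but 3 durations')]
```

To check a real file, read it with `open(path, encoding="utf-8", newline="").read()` and pass the text in.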

For weak_label data, the format is the same as for full_label, except that the ph_dur column is not required; the binarizer ignores this column even if it is present in the CSV file. Any phonemes listed under ignored_phonemes in configs/binarize_config.yaml are also ignored.
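
Since ph_dur is optional for weak_label data, a full_label CSV can be reused as is. If you would rather keep a file with only the two needed columns, a sketch like the following would produce one. The function name `to_weak_label` and the sample row are hypothetical, and dropping the column is purely a convenience, not something the binarizer needs.

```python
import csv
import io


def to_weak_label(csv_text):
    """Return CSV text that keeps only the name and ph_seq columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["name", "ph_seq"],
        extrasaction="ignore",  # silently drop ph_dur and any other extra columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical one-row input in the full_label layout.
full = "name,ph_seq,ph_dur\nclip_a,b ie AP SP,0.1 0.2 0.3 0.4\n"
print(to_weak_label(full), end="")
# name,ph_seq
# clip_a,b ie AP SP
```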

Feel free to reach out if you need any more help. ^_^
