We trained an English model for DiffSinger, but in the synthesized songs, wherever SP and AP occur in the middle of a song, the model produces strange voicing that sounds like the singer is humming a constant tone.
Below is an example, with arrows marking where the humming occurs.
Could you give us some advice on how to improve or retrain the model to eliminate this humming during breaks and silence?
This seems like a possible labeling issue. If the AP and SP regions are not labeled accurately, the model may produce sound on these two phonemes.
Do you mean that it relates to the quality of transcriptions.csv, that is, whether each labeled phoneme correctly corresponds to its part of the audio? Is there any way to improve this other than manually refining the phoneme time positions? Thanks.
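One way to narrow down which labels to review, short of checking everything by hand, is to measure how loud the audio is inside each interval labeled SP or AP and to flag the loud ones. The following is only a rough sketch, not part of DiffSinger: it assumes each row of transcriptions.csv provides space-separated `ph_seq` and `ph_dur` (durations in seconds) fields, that the audio is already loaded as a list of floats in [-1, 1], and that an RMS threshold of 0.02 is a sensible starting point. The function name `flag_loud_pauses` and the threshold are hypothetical. AP is a breath and naturally carries some energy, so flagged AP intervals are candidates for review, not confirmed errors.

```python
import math

# Phonemes treated as pauses in this sketch (assumption: SP = silence, AP = breath).
PAUSE_PHONEMES = {"SP", "AP"}


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def flag_loud_pauses(ph_seq, ph_dur, samples, sample_rate, rms_threshold=0.02):
    """Return (index, phoneme, start_sec, end_sec, rms) for loud SP/AP intervals.

    ph_seq and ph_dur are space-separated strings, as assumed for one row of
    transcriptions.csv. rms_threshold is a hypothetical default to tune per dataset.
    """
    phonemes = ph_seq.split()
    durations = [float(d) for d in ph_dur.split()]
    if len(phonemes) != len(durations):
        raise ValueError("ph_seq and ph_dur have different lengths")

    flagged = []
    start = 0.0
    for i, (ph, dur) in enumerate(zip(phonemes, durations)):
        end = start + dur
        if ph in PAUSE_PHONEMES:
            segment = samples[int(start * sample_rate):int(end * sample_rate)]
            level = rms(segment)
            if level > rms_threshold:
                flagged.append((i, ph, round(start, 3), round(end, 3), level))
        start = end
    return flagged
```

Running this over every row and listening only to the flagged intervals may shorten the manual pass. Loading the audio (for example with a WAV reader) is left out here, and the threshold would need tuning against a few intervals known to be correctly labeled.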
If you enabled some variance parameters then controlling them can be a workaround. But on the training side I cannot provide more advice without further information.