Strange humming sound during SP & AP #179

Closed
loct824 opened this issue Mar 17, 2024 · 3 comments

loct824 commented Mar 17, 2024

Hi,

We trained an English model for DiffSinger, but in the synthesized songs, wherever SP and AP occur in the middle of a song, the model produces a strange voiced sound, as if the singer were humming a constant tone.

Below is an example; the arrows mark where the strange humming sound occurs.

Could you give us some advice on how the model can be improved/trained to eliminate this strange humming sound during breaks/silence?

'phonemes': [{'name': 'SP', 'duration': 1.3062181},
  {'name': 'AP', 'duration': 0.255292},
  {'name': 'sh', 'duration': 0.1184899},
  {'name': 'uh', 'duration': 0.1555967},
  {'name': 'dx', 'duration': 0.0234931},
  {'name': 'ax', 'duration': 0.075178},
  {'name': 'b', 'duration': 0.0830091},
  {'name': 'ih', 'duration': 0.1427231},
  {'name': 'n', 'duration': 0.06},
  {'name': 's', 'duration': 0.12},
  {'name': 't', 'duration': 0.05},
  {'name': 'r', 'duration': 0.05},
  {'name': 'ao', 'duration': 0.26},
  {'name': 'ng', 'duration': 0.17},
  {'name': 'y', 'duration': 0.05},
  {'name': 'ae', 'duration': 0.23},
  {'name': 'q', 'duration': 0.1454828},
  {'name': 'ay', 'duration': 0.1745172},
  {'name': 'l', 'duration': 0.2},
  {'name': 'ay', 'duration': 0.5},
  {'name': 'd', 'duration': 0.09},
  {'name': 'AP', 'duration': 0.22},
  {'name': 'n', 'duration': 0.0799999},
  {'name': 'ow', 'duration': 0.1300001},
  {'name': 'b', 'duration': 0.04},
  {'name': 'ah', 'duration': 0.1566115},
  {'name': 'dx', 'duration': 0.0233885},
  {'name': 'iy', 'duration': 0.22},
  {'name': 'g', 'duration': 0.16},
  {'name': 'eh', 'duration': 0.2},
  {'name': 't', 'duration': 0.0699999},
  {'name': 's', 'duration': 0.2000001},
  {'name': 'm', 'duration': 0.08},
  {'name': 'iy', 'duration': 0.5209856},
  {'name': 'l', 'duration': 0.0690144},
  {'name': 'ay', 'duration': 0.57},
  {'name': 'k', 'duration': 0.1694275},
  {'name': 'AP', 'duration': 0.2605725},
  {'name': 'y', 'duration': 0.13},
  {'name': 'uw', 'duration': 0.3566036},
  {'name': 'uw', 'duration': 0.7538354},
  {'name': 'SP', 'duration': 1.3531745}, <---------------------
  {'name': 'AP', 'duration': 0.2956739}, <---------------------
  {'name': 'k', 'duration': 0.0907126},
  {'name': 'uh', 'duration': 0.1397},
  {'name': 'dx', 'duration': 0.0203},
  {'name': 'ax', 'duration': 0.09},
  {'name': 'ng', 'duration': 0.06},
  {'name': 'k', 'duration': 0.06},
@yqzhishen
Member

This seems like a possible labeling issue. If you didn't label the AP and SP areas accurately, the model may pronounce something on these two phonemes.
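
One way to test the labeling hypothesis before refining anything by hand is to measure how loud the recording actually is inside each labeled SP span. The sketch below is only an illustration under stated assumptions, not part of DiffSinger: the helper names (`pause_spans`, `rms_db`, `loud_sp_spans`) and the -40 dBFS threshold are hypothetical, the audio is assumed to be already loaded as mono floats in [-1, 1], and SP is assumed to mark silence while AP marks a breath. AP spans are listed but not thresholded, since a breath is expected to carry some energy.

```python
import math

PAUSE_PHONEMES = {"SP", "AP"}  # pause/breath labels as they appear in the issue


def pause_spans(phonemes):
    """Return (name, start_sec, end_sec) for every SP/AP entry."""
    spans = []
    t = 0.0
    for ph in phonemes:
        end = t + ph["duration"]
        if ph["name"] in PAUSE_PHONEMES:
            spans.append((ph["name"], t, end))
        t = end
    return spans


def rms_db(samples, sample_rate, start, end):
    """RMS level in dBFS of mono float samples within [start, end) seconds."""
    seg = samples[int(start * sample_rate):int(end * sample_rate)]
    if not seg:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in seg) / len(seg))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def loud_sp_spans(phonemes, samples, sample_rate, threshold_db=-40.0):
    """List SP spans whose audio is louder than threshold_db (assumed value)."""
    flagged = []
    for name, start, end in pause_spans(phonemes):
        if name != "SP":
            continue  # breaths (AP) are expected to contain some energy
        level = rms_db(samples, sample_rate, start, end)
        if level > threshold_db:
            flagged.append((start, end, level))
    return flagged
```

Any SP span this flags is a candidate for a misplaced boundary: sung or voiced audio that was labeled as silence, which could teach the model to voice something there.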

@loct824
Author

loct824 commented Mar 22, 2024

Do you mean that it relates to the quality of transcriptions.csv, i.e. whether each labeled phoneme correctly corresponds to its part of the audio? Is there any guidance on how we could improve this, other than manually refining the phoneme time positions? Thanks.

@yqzhishen
Member

If you enabled some variance parameters, controlling them can be a workaround. On the training side, though, I cannot provide more advice without further information.
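
As a rough illustration of that workaround, the sketch below forces a per-frame control curve down to a floor value during SP spans. Everything here is an assumption rather than DiffSinger's actual API: the function name `attenuate_during_pauses`, the idea that the enabled variance parameter is available as a list of per-frame values with a fixed `timestep` in seconds, and the floor value all have to be adapted to whichever parameter and format the model really uses.

```python
def attenuate_during_pauses(curve, timestep, phonemes, floor, pause_names=("SP",)):
    """Return a copy of `curve` with frames inside pause phonemes set to `floor`.

    curve    -- per-frame values of one control parameter (hypothetical format)
    timestep -- seconds per frame
    phonemes -- list of {'name': ..., 'duration': ...} dicts, as in the issue
    """
    out = list(curve)
    t = 0.0
    for ph in phonemes:
        end = t + ph["duration"]
        if ph["name"] in pause_names:
            first = int(round(t / timestep))
            last = min(int(round(end / timestep)), len(out))
            for i in range(first, last):
                out[i] = floor
        t = end
    return out
```

This only masks the symptom at inference time; if the hum comes from inaccurate SP/AP labels, the training data still needs to be corrected.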
