
DiffSinger infer problem #41

Closed · leon2milan opened this issue Mar 7, 2022 · 9 comments

@leon2milan

I want to test the Opencpop pretrained model on an unseen song, but I don't know how to generate the wav file.

  1. What data should I prepare for the model?
  2. How do I run inference? I saw test_step in FastSpeech2Task, but it seems to be for the TTS task. Do I need to override test_step in DiffSingerMIDITask? Or is there another way, without packing the data into a dataloader: just load the model and infer?
@newportchen

You should prepare: phoneme | pitch_midi | pitch_dur | is_slur, then write them to the 'test' dataset using IndexedDatasetBuilder, like process_data() does in base_binarizer.py.

You also need to work around some data-loading issues (__getitem__ and collater in fs2_utils.py): just set the missing fields to None. They are not needed at the synthesis stage.
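A minimal sketch of what that could look like, assuming the IndexedDatasetBuilder API in utils/indexed_datasets.py (add_item / finalize) and field names modeled on the MIDI binarizer; the item name, phonemes, values, and output path below are made up for illustration, so check process_data() in base_binarizer.py for the exact keys your config expects:

```python
# Hedged sketch: write a one-song "test" set for synthesis-only inference.
# Assumes utils/indexed_datasets.IndexedDatasetBuilder with add_item()/finalize();
# all field names and values here are illustrative placeholders.
from utils.indexed_datasets import IndexedDatasetBuilder

item = {
    'item_name': 'my_unseen_song_0001',          # hypothetical id
    'ph': 'AP n i h ao SP',                      # phoneme sequence (space separated)
    'pitch_midi': [0, 62, 62, 64, 64, 0],        # one MIDI note number per phoneme (0 = rest)
    'pitch_dur': [0.4, 0.5, 0.5, 0.6, 0.6, 0.3], # note duration in seconds; verify the exact key
                                                 # name in the binarizer (it may be midi_dur)
    'is_slur': [0, 0, 0, 0, 0, 0],               # 1 where a phoneme continues onto a new note
    # Training-only fields (mel, wav, f0, ...) can be left out / set to None.
}

builder = IndexedDatasetBuilder('data/binary/my_test_set/test')  # path prefix is an assumption
builder.add_item(item)
builder.finalize()
```

With the loaders in fs2_utils.py patched to tolerate the missing acoustic fields, the usual test run should then pick this set up.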

@leon2milan (Author) commented Mar 8, 2022

This is my first exposure to singing synthesis, so I have some questions about the terminology.
Do pitch_midi | pitch_dur mean the note and the note duration?
Should I derive is_slur from the staffs (sheet music)?
Also, I don't know how to set pitch_dur for an unseen song. Should I label it with Logic Pro, or can I get it from some model or a similar tool?

@newportchen

> Do pitch_midi | pitch_dur mean the note and the note duration? Should I derive is_slur from the staffs? How do I set pitch_dur for an unseen song?

Wait a minute. I'll find you a picture.

@newportchen

[Screenshot: a sample Opencpop transcription with the phoneme, pitch_midi, and pitch_dur columns highlighted in a yellow box]

We use the data marked by the yellow box: phoneme | pitch_midi | pitch_dur.

@newportchen

pitch_dur = 60 * NoteBeats / bpm

bpm: beats per minute (the tempo)
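For example, a one-beat note at 120 bpm lasts 60 * 1 / 120 = 0.5 s. A throwaway helper that just restates the formula above (the names are mine, not from the repo):

```python
def note_duration_sec(note_beats: float, bpm: float) -> float:
    """pitch_dur in seconds = 60 * NoteBeats / bpm."""
    return 60.0 * note_beats / bpm

print(note_duration_sec(1.0, 120))  # 1 beat at 120 bpm -> 0.5 s
print(note_duration_sec(0.5, 90))   # half a beat at 90 bpm -> 0.333... s
```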

@leon2milan (Author)

Thank you very much. Now I know how to do this, but I have another question.
There is silence in the music, so simply converting the text to pinyin won't work, will it?
Should I do singing-to-lyrics alignment?

@newportchen

2001000005|面对浩瀚的星海我们微小得像尘埃|m ian d ui h ao h an an d e x ing h ai ai ai AP w o m en w ei x iao d e x iang ch en ai ai ai SP|C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 E4 D#4/Eb4 D#4/Eb4 E4 E4 G#4/Ab4 G#4/Ab4 A4 G#4/Ab4 rest C#4/Db4 C#4/Db4 C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 C#4/Db4 C#4/Db4 D#4/Eb4 D#4/Eb4 E4 E4 E4 E4 G#4/Ab4 A4 G#4/Ab4 rest|0.196990 0.196990 0.102120 0.102120 0.304680 0.304680 0.096780 0.096780 0.100220 0.150010 0.150010 0.361460 0.361460 0.221070 0.221070 0.183240 0.478670 0.384620 0.106510 0.106510 0.143020 0.143020 0.169480 0.169480 0.224180 0.224180 0.089360 0.089360 0.414460 0.414460 0.378050 0.378050 0.162790 0.207380 0.317260 0.297040|0.02765 0.16934 0.01874 0.08338 0.0821 0.22258 0.0693 0.02748 0.10022 0.07137 0.07864 0.12471 0.23675 0.12356 0.09751 0.18324 0.47867 0.38462 0.0405 0.06601 0.08303 0.05999 0.04687 0.12261 0.09778 0.1264 0.02321 0.06615 0.11958 0.29488 0.06723 0.31082 0.16279 0.20738 0.31726 0.29704|0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0

You should learn the format from transcriptions.txt. Note how silence and breaths are marked with SP/AP in the phoneme sequence and with 'rest' in the note sequence.
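In case it helps, a hedged sketch of reading one such line into the fields discussed above; the column order (id | lyrics | phonemes | notes | note durations | phoneme durations | is_slur) is inferred from the sample line, so double-check it against your copy of the dataset:

```python
# Hypothetical parser for one transcriptions.txt line; column order inferred from the sample above.
def parse_transcription_line(line: str) -> dict:
    item_id, lyrics, phs, notes, note_durs, ph_durs, slurs = line.strip().split('|')
    return {
        'item_name': item_id,
        'txt': lyrics,
        'ph': phs.split(),                                  # includes AP/SP breath & silence marks
        'note': notes.split(),                              # e.g. 'C#4/Db4', or 'rest' for silence
        'note_dur': [float(x) for x in note_durs.split()],  # seconds, aligned to the phonemes
        'ph_dur': [float(x) for x in ph_durs.split()],      # seconds, aligned to the phonemes
        'is_slur': [int(x) for x in slurs.split()],         # 1 = phoneme slurred onto a new note
    }

with open('transcriptions.txt', encoding='utf-8') as f:
    first = parse_transcription_line(f.readline())
    print(first['item_name'], len(first['ph']), 'phonemes')
```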

@leon2milan (Author)

OK. Thank you so much. I'll try.

@imiskolee

@leon2milan did you succeed? Can you share some example code?
