Unconditional synthesis #34

berkeleymalagon · 2022-06-28T08:28:45Z

I'm running this command to generate unconditional samples:

python -m diffwave.inference --fast /path/to/model -o output.wav

I've trained for almost 4k epochs on 7k+ sounds. I seem to get the same sound (or a very similar one) regardless of training time.

I have not worked with diffwave before - any tips for debugging this?

Thanks

berkeleymalagon · 2022-06-28T09:29:29Z

For context, here are the params during inference in case there's anything obviously wrong with them:

model.params: {'batch_size': 16, 'learning_rate': 0.0002, 'max_grad_norm': None, 'sample_rate': 44100, 'n_mels': 80, 'n_fft': 1024, 'hop_samples': 256, 'crop_mel_frames': 62, 'residual_layers': 30, 'residual_channels': 64, 'dilation_cycle_length': 10, 'unconditional': True, 'noise_schedule': [0.0001, 0.0011183673469387756, 0.002136734693877551, 0.0031551020408163264, 0.004173469387755102, 0.005191836734693878, 0.006210204081632653, 0.007228571428571429, 0.008246938775510203, 0.009265306122448979, 0.010283673469387754, 0.01130204081632653, 0.012320408163265305, 0.013338775510204081, 0.014357142857142857, 0.015375510204081632, 0.016393877551020408, 0.017412244897959183, 0.01843061224489796, 0.019448979591836734, 0.02046734693877551, 0.021485714285714285, 0.02250408163265306, 0.023522448979591836, 0.02454081632653061, 0.025559183673469387, 0.026577551020408163, 0.027595918367346938, 0.028614285714285714, 0.02963265306122449, 0.030651020408163265, 0.031669387755102044, 0.03268775510204082, 0.033706122448979595, 0.03472448979591837, 0.035742857142857146, 0.03676122448979592, 0.0377795918367347, 0.03879795918367347, 0.03981632653061225, 0.04083469387755102, 0.0418530612244898, 0.042871428571428574, 0.04388979591836735, 0.044908163265306125, 0.0459265306122449, 0.046944897959183676, 0.04796326530612245, 0.04898163265306123, 0.05], 'inference_noise_schedule': [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5], 'audio_len': 22051}
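As a quick sanity check on the params above, here is a minimal sketch in plain Python. It assumes the 50-value training schedule is simply evenly spaced between 0.0001 and 0.05, which is what the posted numbers look like. The helper names (`linear_schedule`, `remaining_signal`) are made up for illustration and are not part of the diffwave package. It also converts `audio_len` to seconds and compares how much of the original signal scale each schedule leaves after its final step.

```python
# Hypothetical helpers for inspecting the posted params; not part of diffwave.

def linear_schedule(start, stop, steps):
    """Return `steps` evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def remaining_signal(betas):
    """Product of (1 - beta) over the schedule (often written alpha-bar)."""
    level = 1.0
    for beta in betas:
        level *= 1.0 - beta
    return level


# Assumption: the posted 'noise_schedule' is 50 evenly spaced betas.
train_betas = linear_schedule(1e-4, 0.05, 50)

# Copied from 'inference_noise_schedule' in the posted params.
fast_betas = [0.0001, 0.001, 0.01, 0.05, 0.2, 0.5]

# Copied from 'sample_rate' and 'audio_len' in the posted params.
SAMPLE_RATE = 44100
AUDIO_LEN = 22051
clip_seconds = AUDIO_LEN / SAMPLE_RATE

print(f"second training beta: {train_betas[1]:.10f}")
print(f"clip length: {clip_seconds:.3f} s")
print(f"alpha-bar after training schedule: {remaining_signal(train_betas):.4f}")
print(f"alpha-bar after fast schedule:     {remaining_signal(fast_betas):.4f}")
```

With these numbers each generated clip is about half a second long, so it may be worth confirming that the training sounds are meaningful at that length.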

albertfgu · 2022-07-03T17:31:43Z

I tried using this codebase in the past for SC09 unconditional generation and found that it does not work. An alternative implementation of DiffWave at philsyn/diffwave-unconditional did work. I've released an improved implementation of this at https://github.com/albertfgu/diffwave-sashimi

Rongjiehuang · 2022-07-19T07:26:46Z

@Andrechang Hi, using this repo I only get silent waveforms when training on the SC09 dataset. Have you succeeded in getting plausible sounds?

Andrechang · 2022-07-27T15:46:49Z

It shouldn't output silent waveforms. When I trained for only a short time, it generated noisy audio.

Rongjiehuang · 2022-07-27T16:19:11Z

It seems that the DiffWave paper uses res_channel = 256 for unconditional speech synthesis (this code uses 64), which is why we could not get reasonable sounds.
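To try the wider network, one option is to override `residual_channels` before training. The sketch below works on a plain dict copied from the params posted earlier in this thread. How the repo actually loads its params is not shown here, so treat the dict handling as an assumption. `conv_weight_growth` is a hypothetical helper that gives a rough estimate only: weights of channel-to-channel convolutions grow with the square of the channel count.

```python
# A subset of the params posted earlier in this thread.
posted_params = {
    "residual_layers": 30,
    "residual_channels": 64,
    "dilation_cycle_length": 10,
    "unconditional": True,
}

# Copy with the channel width suggested above; the original dict is unchanged.
wide_params = {**posted_params, "residual_channels": 256}


def conv_weight_growth(old_channels, new_channels):
    """Rough growth factor for channel-to-channel conv weights (hypothetical helper)."""
    return (new_channels / old_channels) ** 2


growth = conv_weight_growth(
    posted_params["residual_channels"], wide_params["residual_channels"]
)
print(f"residual_channels: {posted_params['residual_channels']} -> "
      f"{wide_params['residual_channels']}")
print(f"approximate conv weight growth: {growth:.0f}x")
```

Changing the channel count changes the weight shapes, so an existing checkpoint trained with 64 channels would not load into the wider model, and training would have to start over.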
