This is my own PyTorch implementation of WaveGAN, introduced in this paper. The main task was to synthesize raw audio of drum sounds and of human voices articulating the digits 0 through 9.
✿ Take a look at an example of synthesized audio HERE! ✿
While building the model, I used the hyperparameters suggested in the paper, EXCEPT for the model size (d):
I reduced the model size d from 64 (the value suggested in the paper) to 32, and obtained recognizable synthesis as early as epoch 30.
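To illustrate what the model size d controls, here is a minimal sketch of a WaveGAN-style generator in PyTorch: a dense layer followed by a stack of stride-4 transposed 1-D convolutions, where d scales the channel widths. The class name, padding arithmetic, and layer plumbing are my own illustrative choices, not the exact code in this repo.

```python
import torch
import torch.nn as nn

class WaveGANGenerator(nn.Module):
    """Sketch of a WaveGAN-style generator; `d` scales all channel widths."""
    def __init__(self, d=32, latent_dim=100):
        super().__init__()
        self.d = d
        # project latent vector to a (16*d channels, length 16) feature map
        self.fc = nn.Linear(latent_dim, 16 * 16 * d)

        def up(cin, cout):
            # kernel 25, stride 4; padding/output_padding chosen so each
            # layer upsamples the length by exactly 4x
            return nn.ConvTranspose1d(cin, cout, kernel_size=25,
                                      stride=4, padding=11, output_padding=1)

        self.net = nn.Sequential(
            up(16 * d, 8 * d), nn.ReLU(),
            up(8 * d, 4 * d), nn.ReLU(),
            up(4 * d, 2 * d), nn.ReLU(),
            up(2 * d, d), nn.ReLU(),
            up(d, 1), nn.Tanh(),  # waveform scaled to [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 16 * self.d, 16)
        return self.net(x)

# With d=32, a batch of latent vectors maps to 16384-sample waveforms
g = WaveGANGenerator(d=32)
audio = g(torch.randn(2, 100))
print(audio.shape)  # torch.Size([2, 1, 16384])
```

Halving d from 64 to 32 roughly quarters the parameter count of each conv layer, which is why training gets noticeably cheaper.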
During training, I did not use any quantitative stopping criterion (as in the paper); instead, I qualitatively checked the synthesized audio by listening to it.
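For this kind of listen-and-judge workflow, a small helper that dumps generated waveforms to .wav files after each epoch is handy. The sketch below uses only the standard-library wave module plus NumPy; the function name and file-naming scheme are illustrative, not taken from this repo.

```python
import wave
import numpy as np

def save_samples(audio_batch, epoch, sample_rate=16000):
    """Write each generated waveform (floats in [-1, 1]) to 16-bit mono WAV."""
    for i, samples in enumerate(audio_batch):
        pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
        with wave.open(f"epoch{epoch:03d}_sample{i}.wav", "wb") as f:
            f.setnchannels(1)        # mono
            f.setsampwidth(2)        # 16-bit PCM
            f.setframerate(sample_rate)
            f.writeframes(pcm.tobytes())

# demo: save a 0.1 s 440 Hz sine tone and verify it round-trips
t = np.linspace(0, 0.1, 1600, endpoint=False)
save_samples([0.5 * np.sin(2 * np.pi * 440 * t)], epoch=0)
with wave.open("epoch000_sample0.wav", "rb") as f:
    n_frames, rate = f.getnframes(), f.getframerate()
print(n_frames, rate)  # 1600 16000
```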
The WaveGAN model was trained on the following datasets (of .wav files):