add regularization, dropout and batch norm? #65
Comments
I've observed the same things. I looked at the code to see what might be hanging and didn't find any red flags. I thought the hang might be related to my setup: CUDA 8.0rc (required for Pascal support), cuDNN 5.1, and TensorFlow built from source (git master from 9/20).
The hanging is probably caused by the background audio processing crashing (especially if the CPU/GPU are idle once it stops). I've been trying to find a solution to the gradient jumping to large values at large step numbers, but don't have any amazing solutions at the moment.
@ibab I'm experiencing the stalling with the latest commit.
@lelayf I've used a learning rate of 0.01 to get the loss curve above. The train saver only stores the last 5 checkpoints, so I'm not able to go back and try lowering the learning rate right before the gradient implosion. @ibab I was indeed using an older commit; the latest one does not have the stalling problem. Here is the loss curve with L2 regularization added; orange - learning rate 0.01 (~20k steps), blue - 0.001 (~60k steps). The gradient implosion problem is gone, but it seems the network is not learning anymore after the first epoch. Will try to generate some audio later today.
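For reference, a minimal sketch of what adding L2 regularization to the training loss can look like in TensorFlow; the helper name, the coefficient value, and the bias filtering are assumptions, not the exact change used here:

```python
import tensorflow as tf

def add_l2_regularization(loss, l2_strength=1e-4):
    # Sum of squared weights over all trainable variables, skipping biases.
    # l2_strength is an assumed value; it needs tuning together with the learning rate.
    l2 = tf.add_n([tf.nn.l2_loss(v)
                   for v in tf.trainable_variables()
                   if 'bias' not in v.name])
    return loss + l2_strength * l2
```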
@r-zemblys are you training on GPU or CPU?
Here are the generated 80k samples, primed with 8k-sample audio from another database. The soundwave looks reasonably OK (green - generated audio). Notes:
@r-zemblys: Excellent, did you use the default |
Forgot to add. This is the configuration I've used:
But as I've mentioned in the beginning, there is no difference (at least in the loss curve) compared to using the default configuration.
@r-zemblys: Did you train on the entire dataset, or a specific speaker?
@ibab: entire VCTK corpus. And then primed generation with a recording from the LibriSpeech ASR corpus.
That's very cool. I think mixing together all different speakers explains the voice difference between your sample and mine.
I'm using Python 2.7, and as r-zemblys mentioned above ("...there was a bug in WaveNet.decode, which resulted in all-zeros output"), I obtained a generated.wav file that was all zeros. After fixing the last line of wavenet_ops.py as below, I am now getting speech-like waveform output:
magnitude = (1 / mu) * ((1 + mu)**abs(signal) - 1)
Hope someone reflects this in the code if necessary.
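For context, a NumPy sketch of the mu-law expansion that this line implements; the function and variable names, and the explicit float casts (added to stay safe under Python 2 integer division), are assumptions rather than the repository's exact code:

```python
import numpy as np

def mu_law_decode(output, quantization_channels=256):
    # Map the integer class labels back to the range [-1, 1].
    mu = float(quantization_channels - 1)
    signal = 2.0 * (output.astype(np.float32) / mu) - 1.0
    # Invert the mu-law companding transform to recover amplitudes.
    magnitude = (1.0 / mu) * ((1.0 + mu) ** np.abs(signal) - 1.0)
    return np.sign(signal) * magnitude
```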
@hoonyoung: This should be fixed on master now. I've also enabled Travis to run the tests with Python 2.
I commented out silence trimming and now training does not stall anymore, using 88e77bf.
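For reference, the kind of silence trimming being disabled here is roughly of this form (a sketch using librosa; the function name and threshold are assumptions, not the repository's exact code):

```python
import librosa

def load_trimmed(filename, sample_rate=16000, top_db=20):
    # Load mono audio and strip leading/trailing silence quieter than top_db.
    # top_db=20 is an assumed threshold, not necessarily the value used here.
    audio, _ = librosa.load(filename, sr=sample_rate, mono=True)
    trimmed, _ = librosa.effects.trim(audio, top_db=top_db)
    return trimmed
```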
Has anybody got a loss lower than ~2? I've tried a couple of configurations (default, and 3 and 4 stacks of 10 dilation layers), but the loss does not get lower, suggesting the network is not learning anymore.
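For readers unfamiliar with the shorthand, "N stacks of 10 dilation layers" conventionally means dilation factors doubling from 1 to 512, repeated N times; an illustrative snippet, not the exact configuration used here:

```python
# e.g. 3 stacks of 10 dilation layers
num_stacks = 3
dilations = [2 ** i for i in range(10)] * num_stacks
# -> [1, 2, 4, ..., 512, 1, 2, 4, ..., 512, 1, 2, 4, ..., 512]
```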
Also, here is what happened after ~30k steps:
![training](https://cloud.githubusercontent.com/assets/8914323/18709029/99d32c48-8006-11e6-8a33-ecd8388afde7.png)
I believe this is the same problem as reported in #30. Here is what happens with the weights:
![weights](https://cloud.githubusercontent.com/assets/8914323/18709214/abe1ffbc-8007-11e6-9454-fabca3cfc3e5.png)
Now running the same network with L2 regularization added.
And one more note: training just stops after 44256 steps (this has already happened twice) without any warnings or errors, despite num_steps=50000.