add regularization, dropout and batch norm? #65

Closed
r-zemblys opened this issue Sep 21, 2016 · 14 comments

@r-zemblys

Has anybody got a loss lower than ~2? I've tried a couple of configurations (the default, plus 3 and 4 stacks of 10 dilation layers), but the loss does not get any lower, suggesting the network is not learning anymore.

Also, here is what happened after ~30k steps:
[image: training]
I believe this is the same problem as reported in #30. Here is what happens with the weights:
[image: weights]

Now running the same network with l2 norm regularization added.
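
For reference, a minimal sketch of what adding an L2 penalty to the loss could look like in TensorFlow (an illustration with assumed names, not the exact change used here; the loss argument stands for the model's existing cross-entropy loss):

import tensorflow as tf

def add_l2_regularization(loss, coeff=1e-4):
    # Sum the L2 norms of all trainable weights (biases excluded)
    # and add them to the existing loss, scaled by a small coefficient.
    l2 = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()
                   if 'bias' not in v.name.lower()])
    return loss + coeff * l2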

And one more note: training just stops after 44256 steps (this has already happened twice) without any warnings or errors, despite num_steps=50000.

@dnuffer

dnuffer commented Sep 21, 2016

I've observed the same things. I looked at the code to see what might be hanging and didn't find any red flags. I thought the hang might be related to my setup: CUDA 8.0rc (required for Pascal support), cuDNN 5.1, and tensorflow built from source (git master from 9/20)

@ibab
Owner

ibab commented Sep 21, 2016

The hanging is probably caused by the background audio processing crashing (especially if the CPU/GPU are idle once it stops).
Usually there should be a backtrace that can help us find the reason it crashed.
Which commit did you observe the problem with?
There was a bug where we simply stopped processing audio once we'd seen every file once.
It might be that you're on an older commit that had this problem.

I've been trying to find a solution to the gradient jumping to large values at large step numbers, but don't have any amazing solutions at the moment.
It seems to be related to the ReLU activations in the last few layers of the network.
I've tried clipping the gradients, which didn't have an effect on this problem.
Replacing the ReLU activations with Tanh seems to fix it completely, but the network doesn't converge quite as quickly as with ReLU.
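
For anyone trying to reproduce these experiments, here is a rough sketch of the kind of gradient clipping being described (assumed names and values, not this repository's actual training code); swapping the activation simply means replacing tf.nn.relu with tf.tanh in the last few layers:

import tensorflow as tf

def build_clipped_train_op(loss, learning_rate=0.001, clip_norm=5.0):
    # Clip the global gradient norm before applying the updates.
    optimizer = tf.train.AdamOptimizer(learning_rate)
    grads, variables = zip(*optimizer.compute_gradients(loss))
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm)
    return optimizer.apply_gradients(list(zip(clipped, variables)))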

@lelayf

lelayf commented Sep 21, 2016

@ibab I'm experiencing the stalling with the latest commit.
@r-zemblys if you resume training from the checkpoint right before the gradient implosion with a lower learning rate, does it still behave the same?

@r-zemblys
Author

@lelayf I've used a learning rate of 0.01 to get the loss curve above. The train saver only stores the last 5 checkpoints, so I'm not able to try lowering the learning rate right before the gradient implosion.
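
(Side note: the number of retained checkpoints is just the saver's max_to_keep argument; a minimal sketch, where the checkpoint path is only an example:)

import tensorflow as tf

# Keep more checkpoints so one from before the implosion is still around.
saver = tf.train.Saver(max_to_keep=50)

# Later, restore a specific step (example path) and restart with a lower learning rate:
# saver.restore(sess, 'logdir/train/model.ckpt-30000')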

@ibab I was indeed using an older commit. The latest one does not have the stalling problem.

Here is the loss curve with L2 regularization added; orange: learning rate 0.01 (~20k steps), blue: 0.001 (~60k steps).
[image: l2norm]

The gradient implosion problem is gone, but it seems the network is not learning anymore after the first epoch. I will try to generate some audio later today.

@lelayf

lelayf commented Sep 22, 2016

@r-zemblys are you training on GPU or CPU?

@r-zemblys
Author

Here are 80k generated samples, primed with an 8k-sample audio clip from another database.
generated_l2_primed.wav.zip

The waveform looks reasonably OK (green: generated audio).
[image: soundwave]

Notes:

  • used af4c58e
  • trained for ~20k steps with a learning rate of 0.01, then continued for ~60k steps with 0.001
  • @lelayf I'm using a Titan X GPU
  • used L2 regularization
  • disabled silence trimming because of #59
  • there was a bug in WaveNet.decode which resulted in all-zeros output; I think the bug is still there in fc5417d

@ibab
Owner

ibab commented Sep 22, 2016

@r-zemblys: Excellent, did you use the default wavenet_params.json?
I've also linked some of my results in #47.

@r-zemblys
Author

Forgot to add: this is the configuration I've used:


{
    "filter_width": 2,
    "quantization_steps": 256,
    "sample_rate": 16000,
    "dilations": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
                  1, 2, 4, 8, 16, 32, 64, 128, 256, 512],
    "residual_channels": 32,
    "dilation_channels": 16,
    "use_biases": false
}

But as I mentioned at the beginning, there is no difference (at least in the loss curve) when using the default configuration.

@ibab
Owner

ibab commented Sep 22, 2016

@r-zemblys: Did you train on the entire dataset, or a specific speaker?

@r-zemblys
Author

@ibab: the entire VCTK corpus. I then primed generation with a recording from the LibriSpeech ASR corpus.

@ibab
Owner

ibab commented Sep 22, 2016

That's very cool. I think mixing together all the different speakers explains the voice difference between your sample and mine.
Would you be interested in contributing the l2 regularization in a pull request?

@hoonyoung

I'm using Python 2.7 and, as @r-zemblys mentioned above ("...there was a bug in WaveNet.decode, which resulted in all-zeros output"), I obtained a generated.wav file that was all zeros.

After fixing the last line of wavenet_ops.py as shown below, I am now getting speech-like waveform output.

magnitude = (1 / mu) * ((1 + mu)**abs(signal) - 1)
--> magnitude = (1. / mu) * ((1. + mu)**abs(signal) - 1)

I hope someone reflects this in the code if necessary.
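
The underlying cause is most likely Python 2 integer division: with an integer mu, 1 / mu evaluates to 0, which zeroes every magnitude. For clarity, here is a small self-contained sketch of mu-law expansion with the float fix (plain NumPy, not the exact code in wavenet_ops.py):

import numpy as np

def mu_law_expand(signal, mu=255):
    # Invert mu-law companding; `signal` is expected to lie in [-1, 1].
    signal = np.asarray(signal, dtype=np.float64)
    # Float literals avoid Python 2 integer division (1 / 255 == 0).
    magnitude = (1. / mu) * ((1. + mu) ** np.abs(signal) - 1.)
    return np.sign(signal) * magnitude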

@ibab
Owner

ibab commented Sep 22, 2016

@hoonyoung: This should be fixed on master now. I've also enabled Travis to run the tests with Python 2.

@lelayf

lelayf commented Sep 23, 2016

I commented out silence trimming and now training does not stall anymore, using 88e77bf.
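
For anyone wondering what this step removes: a generic amplitude-threshold sketch of silence trimming (an illustration only, not this repository's implementation):

import numpy as np

def trim_silence(audio, threshold=0.01):
    # Drop leading and trailing samples whose absolute amplitude is below the threshold.
    idx = np.where(np.abs(audio) > threshold)[0]
    return audio[idx[0]:idx[-1] + 1] if idx.size else audio[:0]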
