
Problems with Replacing ReLU with eLU #16

Closed
rkjones4 opened this issue Apr 25, 2017 · 11 comments

Comments

@rkjones4

Hi, I have been experimenting with the repo and lately tried switching out the ReLU activations in gan_cifar.py for ELU activations. However, even when varying the lambda value, I have not been able to get any convergence. I am wondering whether ELU activations pose theoretical issues that make them incompatible with WGAN-GP (e.g. more non-linearity and wider variance in slope values than ReLU or leaky ReLU), or whether ELU should be able to work with WGAN-GP (i.e. has your team gotten any models running that used ELU activations)? Thank you!
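For reference, here is a minimal sketch (not from the repo; the helper name and flag are hypothetical) of the kind of activation swap being described, assuming standard tf.nn activations:

```python
import tensorflow as tf

# Hypothetical helper illustrating the ReLU -> ELU swap described above.
def nonlinearity(x, use_elu=False):
    if use_elu:
        # ELU: x for x > 0, exp(x) - 1 for x <= 0 (continuous first
        # derivative, but its second derivative jumps at zero)
        return tf.nn.elu(x)
    # ReLU: max(x, 0)
    return tf.nn.relu(x)
```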

@rkjones4 rkjones4 changed the title Problems with Switching out ReLU for eLU Problems with Replacing ReLU with eLU Apr 25, 2017
@igul222
Owner

igul222 commented Apr 26, 2017 via email

@NickShahML

I have also experienced the same effect and ended up reducing the learning rate to compensate for it.

@hiwonjoon

I also experienced the same effect, and reducing the learning rate did not help in my case.
I observed that the Wasserstein estimate (W) oscillated and diverged even when I trained only the critic network. Any thoughts?

@NickShahML

@hiwonjoon, have you tried using weight norm in your conv1d? Also, have you tried decreasing beta1?

@LynnHo

LynnHo commented Jun 20, 2017

@NickShahML Can you explain why decreasing beta1 should help?

@igul222
Owner

igul222 commented Jun 21, 2017

My (very rough, hand-wavy) intuition: beta1 is a momentum term. If you think of momentum as using past gradients as an estimator for the current gradient, it follows that momentum might not be helpful on loss surfaces with sharp curvature. The gradient penalty introduces a lot of such curvature through multiplicative interactions between weights in the loss function, which sometimes makes optimization with momentum less stable. (ELUs seem to be tricky to optimize for similar reasons.) Note that none of this means you can't make it work -- you'd just need to drop the learning rate so much that it's probably not worth it.
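For concreteness, a minimal, self-contained sketch of what "decreasing beta1" means in practice with TF 1.x Adam; beta1=0 and beta2=0.9 are the defaults in the WGAN-GP paper's Algorithm 1, and the tiny quadratic loss below is only a stand-in so the snippet runs on its own:

```python
import tensorflow as tf

# Adam with a reduced beta1 (the momentum term discussed above).
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)  # stand-in for the critic's loss

train_op = tf.train.AdamOptimizer(
    learning_rate=1e-4,
    beta1=0.0,   # little/no momentum -> less sensitive to sharp curvature
    beta2=0.9,
).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
```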

@igul222 igul222 closed this as completed Jun 21, 2017
@NickShahML

Yeah, I've found that dropping the learning rate for ELU does work, though you have to drop it so much that it isn't worth it. You could try SELU instead, but I've experienced the same effect with it.

@Jiaming-Liu

Jiaming-Liu commented Jun 29, 2017

For curiosity's sake, would SELU eliminate the need for normalization in the discriminator? @NickShahML

@NickShahML

@Jiaming-Liu I don't know if SELU would necessarily eliminate the need to normalize, but in theory it should.
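For context, a minimal sketch (not from the repo; the layer below is hypothetical) of what "SELU instead of explicit normalization" would look like, assuming TF >= 1.4 where tf.nn.selu is available:

```python
import tensorflow as tf

# Hypothetical critic layer with SELU and no layer/batch normalization.
# SELU is designed to be self-normalizing when weights use LeCun-normal
# initialization (stddev = 1/sqrt(fan_in)).
def critic_layer(x, out_dim):
    in_dim = int(x.shape[-1])
    w = tf.Variable(tf.random_normal([in_dim, out_dim], stddev=in_dim ** -0.5))
    b = tf.Variable(tf.zeros([out_dim]))
    return tf.nn.selu(tf.matmul(x, w) + b)
```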

@jglombitza

jglombitza commented Jun 6, 2018

@rkjones4 There is a theoretical reason. Adding the gradient penalty to the objective during critic training means that the resulting gradient update contains terms involving second-order derivatives of the network's activation functions. If those second derivatives are discontinuous, training can collapse. Remember that ELU has a discontinuous second derivative; this discontinuity corrupts the objective by producing strange behaviour in the gradient penalty (see the sketch below).
Just have a look at the latest version of https://arxiv.org/pdf/1704.00028v1.pdf
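To see where the second derivatives enter, here is a self-contained sketch of the gradient penalty in the style of the repo's TF 1.x code; the one-layer critic and the tensor names here are stand-ins, not the repo's actual model:

```python
import tensorflow as tf

BATCH_SIZE = 64
LAMBDA = 10  # gradient penalty coefficient

# Stand-in critic: a single dense layer with an ELU, just to keep the
# snippet self-contained (the repo uses a convolutional critic).
w = tf.Variable(tf.random_normal([128, 1]))
def Discriminator(x):
    return tf.matmul(tf.nn.elu(x), w)

real_data = tf.random_normal([BATCH_SIZE, 128])
fake_data = tf.random_normal([BATCH_SIZE, 128])

# Random interpolation between real and fake samples.
alpha = tf.random_uniform([BATCH_SIZE, 1], minval=0., maxval=1.)
interpolates = real_data + alpha * (fake_data - real_data)

# First derivative: gradient of the critic's output w.r.t. its input.
grads = tf.gradients(Discriminator(interpolates), [interpolates])[0]
slopes = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=1))
gradient_penalty = LAMBDA * tf.reduce_mean((slopes - 1.) ** 2)

# Differentiating the penalty w.r.t. the critic's weights backpropagates
# through `grads`, i.e. it requires second derivatives of the activation.
# ELU's second derivative is discontinuous at zero, which is the issue
# described above.
penalty_grads = tf.gradients(gradient_penalty, [w])
```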

@igul222
Owner

igul222 commented Jun 6, 2018 via email
