Problems with Replacing ReLU with ELU #16

Hi, I have been messing around with the repo, and lately I have been experimenting with switching out the ReLU activations in gan_cifar.py for ELU activations. However, even with varying the lambda value, I have not been able to get any convergence. I am wondering whether ELU activations pose theoretical issues that make them incompatible with WGAN-GP (i.e. more non-linearity and wider variance in slope values than ReLU or leaky ReLU), or whether ELU should be able to work with WGAN-GP (i.e. has your team gotten any models running that use ELU activations?). Thank you!

Comments
There's no theoretical reason it shouldn't work that I'm aware of; decreasing the learning rate and/or setting beta1=0 should help.
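Concretely, something like this (a TF 1.x sketch, since that's what the repo uses; the cost below is a stand-in for the repo's actual generator/critic costs, and 5e-5 is just an example value):

```python
import tensorflow as tf  # TF 1.x, as used by this repo

# Stand-in variable/cost so the snippet runs on its own; in
# gan_cifar.py these would be the generator and critic costs.
w = tf.Variable(1.0)
cost = tf.square(w)

# The suggestion above: drop the learning rate and set beta1=0
# (no momentum) in Adam.
train_op = tf.train.AdamOptimizer(
    learning_rate=5e-5, beta1=0.0, beta2=0.9).minimize(cost)
```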
I have also experienced the same effect and ended up reducing the learning rate to compensate for it.
I also experienced the same effect, but reducing the learning rate did not have any effect on it.
@hiwonjoon, have you tried using weight norm in your conv1d layers? Have you also tried decreasing beta1?
@NickShahML Can you explain why decreasing beta1 should help?
My (very rough, hand-wavy) intuition: beta1 is a momentum term. If you think of momentum as using past gradients as an estimator for the current gradient, it follows that momentum might not be helpful on loss surfaces with sharp curvature. The gradient penalty introduces a lot of that curvature through multiplicative interactions between weights in the loss function, which sometimes makes optimization with momentum less stable. (ELUs seem to be tricky to optimize for similar reasons.) Note that none of this means you can't make it work -- you'd just need to drop the learning rate so much that it's probably not worth it. The sketch below illustrates the estimator point.
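Toy sketch of that estimator intuition (plain Python, arbitrary values):

```python
# With beta1 > 0, Adam's first moment is an exponential average of
# past gradients, so after a sharp sign flip the update direction
# lags behind the current gradient.
grads = [1.0, 1.0, 1.0, -1.0]  # gradient flips sign at the last step
beta1 = 0.9
m = 0.0
for t, g in enumerate(grads, start=1):
    m = beta1 * m + (1 - beta1) * g
    m_hat = m / (1 - beta1 ** t)  # bias-corrected momentum estimate
    print("step %d: grad=%+.1f, estimate=%+.3f" % (t, g, m_hat))
# At the last step the estimate is still about +0.42 even though the
# true gradient is -1.0; with beta1=0 it would track the gradient exactly.
```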
Yeah, I've found that dropping the learning rate does make ELU work, though you have to drop it so much that it isn't worth it. You could try SELU instead, but I've experienced the same effect with it.
For curiosity's sake, would SELU eliminate the need for normalization in the discriminator? @NickShahML
@Jiaming-Liu I don't know whether SELU would necessarily eliminate the need to normalize, but in theory it should.
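If you want to test it, the swap itself is tiny (sketch below; needs TF >= 1.4 for tf.nn.selu, and note that SELU's self-normalizing property assumes roughly lecun-normal initialization):

```python
import tensorflow as tf  # TF >= 1.4 for tf.nn.selu

# Sketch: SELU in place of ReLU + normalization in one critic layer.
# Whether this actually removes the need for normalization in a
# WGAN-GP critic is exactly the open question here.
pre_activation = tf.placeholder(tf.float32, [None, 256])
h = tf.nn.selu(pre_activation)  # instead of tf.nn.relu(normalize(...))
```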
@rkjones4 There is a theoretical reason. By adding the gradient penalty to the objective during critic training, the resulting gradient update contains terms involving second-order derivatives of the network's activation functions. If those second derivatives are discontinuous, training can collapse. Remember that ELU has a discontinuous second derivative; this discontinuity ruins the objective by producing strange behaviour in the gradient penalty.
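A quick numerical check of the discontinuity:

```python
import numpy as np

# ELU(x) = x for x > 0, exp(x) - 1 for x <= 0 (alpha = 1).
def elu(x):
    return np.where(x > 0, x, np.expm1(x))

# Central-difference estimate of the second derivative.
def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

print(second_derivative(elu, -1e-3))  # ~1.0: exp(x) just left of 0
print(second_derivative(elu, +1e-3))  # ~0.0: linear just right of 0
# The second derivative jumps from 1 to 0 at x = 0, so the
# double-backprop terms from the gradient penalty are discontinuous there.
```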
There's a note on this in appendix D of the paper. My suggestion is just to use ReLU and not worry about it, but if you really want something like ELU, softplus(2x + 2)/2 - 1 is very close to ELU but smooth, so it works :)
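Quick check of that substitute against ELU:

```python
import numpy as np

def softplus(y):
    return np.logaddexp(0.0, y)  # log(1 + exp(y)), numerically stable

def smooth_elu(x):
    return softplus(2 * x + 2) / 2 - 1  # the substitute suggested above

def elu(x):
    return np.where(x > 0, x, np.expm1(x))

xs = np.linspace(-5, 5, 1001)
print(np.max(np.abs(smooth_elu(xs) - elu(xs))))  # ~0.07, worst near x = -1.8
# Unlike ELU, this substitute is infinitely differentiable, so the
# second-order terms in the gradient penalty stay continuous.
```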