
Problems with Replacing ReLU with eLU #16

Closed
rkjones4 opened this issue Apr 25, 2017 · 11 comments

Comments

@rkjones4

Hi, I have been experimenting with the repo and lately tried switching out the ReLU activations in gan_cifar.py for ELU activations. However, even when varying the lambda value, I have not been able to get any convergence. I am wondering whether ELU activations pose theoretical issues that make them incompatible with WGAN-GP (e.g. more non-linearity and wider variance in slope values than ReLU or leaky ReLU), or whether ELU should be able to work with WGAN-GP (i.e. has your team gotten any models running that used ELU activations)? Thank you!
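For reference, here is a minimal sketch (not from the repo; the helper name and flag are hypothetical) of the kind of activation swap being described, assuming standard tf.nn activations:

```python
import tensorflow as tf

# Hypothetical helper illustrating the ReLU -> ELU swap described above.
def nonlinearity(x, use_elu=False):
    if use_elu:
        # ELU: x for x > 0, exp(x) - 1 for x <= 0 (continuous first
        # derivative, but its second derivative jumps at zero)
        return tf.nn.elu(x)
    # ReLU: max(x, 0)
    return tf.nn.relu(x)
```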

@rkjones4 rkjones4 changed the title Problems with Switching out ReLU for eLU Problems with Replacing ReLU with eLU Apr 25, 2017
@igul222
Owner

igul222 commented Apr 26, 2017 via email

@NickShahML

I have also experienced the same effect and ended up reducing the learning rate to compensate for it.

@hiwonjoon

I also experienced the same effect, and reducing the learning rate did not help in my case.
I observed that the Wasserstein estimate (W) oscillated and diverged even when I trained only the critic network. Any thoughts?

@NickShahML

@hiwonjoon, have you tried using weight norm in your conv1d? Also, have you tried decreasing beta1?

@LynnHo

LynnHo commented Jun 20, 2017

@NickShahML Can you explain why decreasing beta1 should help?

@igul222
Owner

igul222 commented Jun 21, 2017

My (very rough, hand-wavy) intuition: beta1 is a momentum term. If you think of momentum as using past gradients as an estimator for the current gradient, it follows that momentum might not be helpful on loss surfaces with sharp curvature. The gradient penalty introduces a lot of such curvature through multiplicative interactions between weights in the loss function, which sometimes makes optimization with momentum less stable. (ELUs seem to be tricky to optimize for similar reasons.) Note that none of this means you can't make it work -- you'd just need to drop the learning rate so much that it's probably not worth it.
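For concreteness, a minimal, self-contained sketch of what "decreasing beta1" means in practice with TF 1.x Adam; beta1=0 and beta2=0.9 are the defaults in the WGAN-GP paper's Algorithm 1, and the tiny quadratic loss below is only a stand-in so the snippet runs on its own:

```python
import tensorflow as tf

# Adam with a reduced beta1 (the momentum term discussed above).
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)  # stand-in for the critic's loss

train_op = tf.train.AdamOptimizer(
    learning_rate=1e-4,
    beta1=0.0,   # little/no momentum -> less sensitive to sharp curvature
    beta2=0.9,
).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
```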

@igul222 igul222 closed this as completed Jun 21, 2017
@NickShahML

Yeah, I've found that dropping the learning rate for ELU does work, though you have to drop it so much that it isn't worth it. You could try SELU instead, but I've experienced the same effect with it.

@Jiaming-Liu

Jiaming-Liu commented Jun 29, 2017

For curiosity's sake, would SELU eliminate the need for normalization in the discriminator? @NickShahML

@NickShahML

@Jiaming-Liu I don't know if SELU would necessarily eliminate the need to normalize, but in theory it should.
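For context, a minimal sketch (not from the repo; the layer below is hypothetical) of what "SELU instead of explicit normalization" would look like, assuming TF >= 1.4 where tf.nn.selu is available:

```python
import tensorflow as tf

# Hypothetical critic layer with SELU and no layer/batch normalization.
# SELU is designed to be self-normalizing when weights use LeCun-normal
# initialization (stddev = 1/sqrt(fan_in)).
def critic_layer(x, out_dim):
    in_dim = int(x.shape[-1])
    w = tf.Variable(tf.random_normal([in_dim, out_dim], stddev=in_dim ** -0.5))
    b = tf.Variable(tf.zeros([out_dim]))
    return tf.nn.selu(tf.matmul(x, w) + b)
```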

@jglombitza

jglombitza commented Jun 6, 2018

@rkjones4 There is a theoretical reason. Adding the gradient penalty to the objective during critic training means that the resulting gradient update contains terms involving second-order derivatives of the network's activation functions. If those second derivatives are discontinuous, training can collapse. Remember that ELU has a discontinuous second derivative; this discontinuity corrupts the objective by producing strange behaviour in the gradient penalty (see the sketch below).
Just have a look at the latest version of https://arxiv.org/pdf/1704.00028v1.pdf
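To see where the second derivatives enter, here is a self-contained sketch of the gradient penalty in the style of the repo's TF 1.x code; the one-layer critic and the tensor names here are stand-ins, not the repo's actual model:

```python
import tensorflow as tf

BATCH_SIZE = 64
LAMBDA = 10  # gradient penalty coefficient

# Stand-in critic: a single dense layer with an ELU, just to keep the
# snippet self-contained (the repo uses a convolutional critic).
w = tf.Variable(tf.random_normal([128, 1]))
def Discriminator(x):
    return tf.matmul(tf.nn.elu(x), w)

real_data = tf.random_normal([BATCH_SIZE, 128])
fake_data = tf.random_normal([BATCH_SIZE, 128])

# Random interpolation between real and fake samples.
alpha = tf.random_uniform([BATCH_SIZE, 1], minval=0., maxval=1.)
interpolates = real_data + alpha * (fake_data - real_data)

# First derivative: gradient of the critic's output w.r.t. its input.
grads = tf.gradients(Discriminator(interpolates), [interpolates])[0]
slopes = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=1))
gradient_penalty = LAMBDA * tf.reduce_mean((slopes - 1.) ** 2)

# Differentiating the penalty w.r.t. the critic's weights backpropagates
# through `grads`, i.e. it requires second derivatives of the activation.
# ELU's second derivative is discontinuous at zero, which is the issue
# described above.
penalty_grads = tf.gradients(gradient_penalty, [w])
```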

@igul222
Owner

igul222 commented Jun 6, 2018 via email
