
Elliott Activation Function #14

Closed
gnitr opened this issue Aug 28, 2011 · 9 comments
@gnitr

gnitr commented Aug 28, 2011

Here's the code for the Elliott activation function, in case someone is interested. It is not as popular as tanh and sigmoid, but I've seen it used in a few papers.

The implementation is based on this report:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.7204&rep=rep1&type=pdf

Since I discovered that something like 70% of the training time is spent in Math.tanh or Math.exp, I was looking for a cheap alternative. The main advantage of this activation function is that it is very fast to compute. It is bounded between -1 and 1 like tanh, but it reaches those values more slowly, so it might be more suitable for classification tasks.

I've had very mixed results with this implementation so far. Used with Rprop on an XOR problem, it seems to perform quite badly: it takes many iterations and often gets stuck in local minima or cannot get below a high MSE. That is quite unexpected, so I'm wondering if maybe there's a mistake somewhere in the derivative.

On the other hand, I've also observed excellent results with evolutionary algorithms like GA (and my version of PSO), often with very fast convergence compared to tanh and sigmoid. That's why I'm posting the code here, in case it is useful to someone else.
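For reference, the symmetric form from the report (the function the class below aims to compute) is

f(x) = \frac{x}{1 + |x|}, \qquad |f(x)| = \frac{|x|}{1 + |x|} < 1 \text{ for all finite } x.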

package org.encog.engine.network.activation;

/**
 * Computationally efficient alternative to ActivationTANH.
 * Its output is in the range [-1, 1], and it is differentiable.
 *
 * It approaches -1 and 1 more slowly than tanh, so it may be more suitable
 * for classification tasks than for prediction tasks.
 *
 * Elliott, D.L. "A better activation function for artificial neural networks", 1993
 * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.7204&rep=rep1&type=pdf
 */
public class ActivationElliott implements ActivationFunction {

    /**
     * Serial id for this class.
     */
    private static final long serialVersionUID = 1234L;

    /**
     * The parameters.
     */
    private final double[] params;

    /**
     * Construct a basic Elliott activation function (no parameters).
     */
    public ActivationElliott() {
        this.params = new double[0];
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public final void activationFunction(final double[] x, final int start,
            final int size) {
        for (int i = start; i < start + size; i++) {
            // Symmetric Elliott form from the report: x / (1 + |x|).
            x[i] = x[i] / (1.0 + Math.abs(x[i]));
        }
    }

    /**
     * @return The cloned object.
     */
    @Override
    public final ActivationFunction clone() {
        return new ActivationElliott();
    }

    /**
     * {@inheritDoc}
     *
     * Note: (1 - a)^2 equals the analytic derivative of x / (1 + |x|), which is
     * 1 / (1 + |b|)^2, only when the input b is non-negative.
     */
    @Override
    public final double derivativeFunction(final double b, final double a) {
        return (1.0 - a) * (1.0 - a);
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public final String[] getParamNames() {
        final String[] result = {};
        return result;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public final double[] getParams() {
        return this.params;
    }

    /**
     * @return Return true, Elliott activation has a derivative.
     */
    @Override
    public final boolean hasDerivative() {
        return true;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public final void setParam(final int index, final double value) {
        this.params[index] = value;
    }

}

@ghost ghost assigned seemasingh Aug 29, 2011
@seemasingh
Contributor

Thanks for the code, I will get this added in Encog 3.1.

@gnitr
Author

gnitr commented Aug 29, 2011

Hey cool, thanks for looking into this!

I'd be curious to know if you manage to make it work properly with backpropagation methods, as I've had no luck with it. But I don't see why not, as I've seen it used successfully in various research reports.

If it happens to work, it might become a useful alternative to tanh for intensive training tasks. I think it can take more iterations than tanh to meet a training condition, but each iteration should take much less CPU time.

Geoffroy

@gnitr
Author

gnitr commented Aug 30, 2011

Here's another page that mentions this function, along with a variant analogous to the logistic sigmoid:
http://www.dontveter.com/bpr/activate.html

I've done more tests with it and got some interesting results.

Due to the shape of these activation functions' curves, it is important to rescale the output targets to [0.1, 0.9] instead of [0, 1]. The best results I've had so far were with an evolutionary algorithm on the XOR dataset. Using a combination of the Elliott function and its sigmoid variant, the number of iterations was five times lower than with tanh+sigmoid, and fewer runs were trapped in local minima. Moreover, the CPU time per iteration was roughly halved.
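A minimal standalone sketch of the two points above (the class and method names here are just for illustration, not Encog API): a logistic-like Elliott variant with output in (0, 1), and a plain linear rescale of 0/1 targets into [0.1, 0.9].

public final class ElliottVariantSketch {

    /** Logistic-like Elliott variant: 0.5 * x / (1 + |x|) + 0.5, output in (0, 1). */
    static double elliottSigmoid(final double x) {
        return 0.5 * x / (1.0 + Math.abs(x)) + 0.5;
    }

    /** Linearly map a 0/1 target into [0.1, 0.9] so training never chases the asymptotes. */
    static double rescaleTarget(final double t) {
        return 0.1 + 0.8 * t;
    }

    public static void main(final String[] args) {
        System.out.println(elliottSigmoid(0.0));   // 0.5
        System.out.println(elliottSigmoid(10.0));  // ~0.95, still well short of 1.0
        System.out.println(rescaleTarget(1.0));    // 0.9
    }
}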

This difference was not visible on all the datasets I've tried, but it at least suggests that these functions are worth exploring when working with evolutionary algorithms such as GA or NEAT.

@seemasingh
Contributor

That is pretty interesting; I read the articles on the activation function. I added your code to Encog, and I also was not able to get any sort of propagation (derivative-based) training to work. I plugged the activation function from your code into R and came up with a different derivative, so I am THINKING there might be some disconnect there. I will take a look.
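A quick way to check this outside of R is a finite-difference comparison. The standalone sketch below (not Encog code) assumes the symmetric form f(x) = x / (1 + |x|) and compares the derivative posted above, (1 - a)^2, against a numerical derivative and the closed form 1 / (1 + |x|)^2; the three agree for x >= 0 but the posted form diverges for negative inputs.

public final class ElliottDerivativeCheck {

    static double elliott(final double x) {
        return x / (1.0 + Math.abs(x));
    }

    public static void main(final String[] args) {
        final double h = 1e-6;
        for (final double x : new double[] { -2.0, -0.5, 0.0, 0.5, 2.0 }) {
            final double a = elliott(x);
            // Central finite difference as a reference value.
            final double numeric = (elliott(x + h) - elliott(x - h)) / (2.0 * h);
            // Derivative as posted in the issue, expressed via the output a.
            final double posted = (1.0 - a) * (1.0 - a);
            // Closed-form derivative of x / (1 + |x|).
            final double analytic = 1.0 / ((1.0 + Math.abs(x)) * (1.0 + Math.abs(x)));
            System.out.printf("x=%5.2f  numeric=%.6f  posted=%.6f  analytic=%.6f%n",
                    x, numeric, posted, analytic);
        }
    }
}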

@seemasingh
Contributor

Okay, I added the class. I also got the derivative working; it now performs just as well as tanh in my initial testing. I added this to the workbench too, so you can easily toggle between the Elliott chart and the tanh chart and see the minor difference.

I actually split this into two files:

https://github.com/encog/encog-java-core/blob/master/src/main/java/org/encog/engine/network/activation/ActivationElliott.java

https://github.com/encog/encog-java-core/blob/master/src/main/java/org/encog/engine/network/activation/ActivationElliottSymmetric.java

ActivationElliott - similar to Encog's sigmoid function, with range [0, 1]

ActivationElliottSymmetric - similar to Encog's tanh function, with range [-1, 1]

I did not do any performance benchmarks; however, I bet it is much faster.
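For anyone who wants a rough number before an official benchmark, here is a crude timing sketch. It is not a rigorous benchmark (no JIT warm-up control; a harness such as JMH would be more reliable), and the class name is just for illustration, but it gives a feel for the relative cost of Math.tanh versus the Elliott form x / (1 + |x|).

import java.util.Random;

public final class ElliottTimingSketch {

    public static void main(final String[] args) {
        final int n = 10 * 1000 * 1000;
        final double[] data = new double[n];
        final Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            data[i] = rnd.nextDouble() * 8.0 - 4.0;  // typical pre-activation range
        }

        double sink = 0.0;  // keep results live so the loops are not optimized away

        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += Math.tanh(data[i]);
        }
        final long tanhNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += data[i] / (1.0 + Math.abs(data[i]));
        }
        final long elliottNanos = System.nanoTime() - start;

        System.out.printf("tanh: %d ms, elliott: %d ms (sink=%f)%n",
                tanhNanos / 1000000, elliottNanos / 1000000, sink);
    }
}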

@jeffheaton
Owner

It will be interesting to see this in a benchmark.

@gnitr
Author

gnitr commented Sep 9, 2011

Hey, thanks for integrating this into Encog! I see that you've also included a parameter to control the slope, good thinking!
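One common way to fold a slope parameter s into this family is shown below; the exact form Encog uses is in the source files linked above, so treat this as a sketch rather than the library's definition:

f_s(x) = \frac{s\,x}{1 + |s\,x|}, \qquad f_s'(x) = \frac{s}{(1 + |s\,x|)^2}.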

I'll do more tests once I get some spare time.

Geoffroy

@jeffheaton
Owner

These are quite fast! I will do an official benchmark soon. Thanks for adding them, very useful. I will close the issue once I add them to C# too.

@seemasingh
Contributor

Implemented in both Java and C#, closing issue.
