Elliott Activation Function #14
Comments
Thanks for the code, I will get this added in Encog 3.1.
Hey cool, thanks for looking into this! I'd be curious to know if you manage to make it work properly with backpropagation methods, as I've had no luck with it. I don't see why not, though, as I've seen it used successfully in various research reports. If it does work, it might be a useful alternative to tanh for intensive training tasks. I think it can take more iterations than tanh to meet a training condition, but each iteration should take much less CPU time. Geoffroy
Here's another page which mentions this function, as well as a variant that is analogous to the logistic sigmoid this time: I've done more tests with it and got some interesting results. Due to the shape of the curve of these activation functions, it is important to rescale the output to [0.1, 0.9] instead of [0, 1]. The best results I've had so far were with an evolutionary algorithm on the XOR dataset. Using a combination of the Elliott function and its sigmoid variant, the number of iterations was 5 times lower than with TANH+SIGMOID, and training was less prone to getting trapped in local minima. Moreover, the CPU time per iteration was cut in half. This difference was not visible on all the datasets I've tried, but it does suggest that these functions are worth exploring when working with evolutionary algorithms such as GA or NEAT.
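For illustration, the sigmoid-like variant and the [0.1, 0.9] rescaling mentioned above could be sketched as follows (a minimal sketch; the class and method names are mine, not from Encog or any paper):

```java
public final class ElliottSigmoid {

    // Sigmoid-like Elliott variant: maps any real input into (0, 1).
    // f(x) = 0.5 * x / (1 + |x|) + 0.5
    public static double activate(double x) {
        return 0.5 * x / (1.0 + Math.abs(x)) + 0.5;
    }

    // One way to do the rescaling mentioned above: map a value in [0, 1]
    // into [0.1, 0.9], since the curve approaches its asymptotes slowly.
    public static double rescale(double y) {
        return 0.1 + 0.8 * y;
    }

    public static void main(String[] args) {
        System.out.println(activate(0.0));   // 0.5, like the logistic sigmoid
        System.out.println(activate(100.0)); // ~0.995, approaches 1 slowly
    }
}
```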
That is pretty interesting; I read the articles on the activation function. I added your code to Encog, and I also was not able to get any sort of propagation (derivative-based) training to work. I plugged the activation function from your code into R and came up with a different derivative. So I am THINKING there might be some disconnect there. I will take a look.
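One way to check a suspected derivative disconnect like this is to compare the analytic derivative against a finite difference. A small sketch (names are mine; for the symmetric form f(x) = x / (1 + |x|), the derivative works out to 1 / (1 + |x|)^2):

```java
public final class ElliottDerivativeCheck {

    // Symmetric Elliott function.
    public static double f(double x) {
        return x / (1.0 + Math.abs(x));
    }

    // Analytic derivative: d/dx [ x / (1 + |x|) ] = 1 / (1 + |x|)^2
    public static double df(double x) {
        double d = 1.0 + Math.abs(x);
        return 1.0 / (d * d);
    }

    public static void main(String[] args) {
        // Compare against a central finite difference at a few points.
        double h = 1e-6;
        for (double x : new double[] {-2.0, -0.5, 0.0, 0.5, 2.0}) {
            double numeric = (f(x + h) - f(x - h)) / (2.0 * h);
            if (Math.abs(numeric - df(x)) > 1e-5) {
                throw new AssertionError("derivative mismatch at x=" + x);
            }
        }
        System.out.println("analytic derivative matches finite differences");
    }
}
```

If propagation training diverges while this check passes, the bug is more likely in how the derivative is wired into the trainer than in the formula itself.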
Okay, I added the class. I also got the derivative working; it now performs just as well as tanh in my initial testing. I also added this to the workbench so that you can easily toggle between the Elliott chart and the tanh chart and see the minor difference. I actually split this into two files:
ActivationElliott - similar to Encog's sigmoid function, with range [0, 1]
ActivationElliottSymmetric - similar to Encog's tanh function, with range [-1, 1]
I did not do any performance benchmarks; however, I bet it is much faster.
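The "minor difference" between the symmetric Elliott curve and tanh can be seen by tabulating both at a few points. A quick sketch (this is illustrative only, not the Encog workbench code):

```java
public final class ElliottVsTanh {

    // Symmetric Elliott: same range and sign behavior as tanh.
    public static double elliottSymmetric(double x) {
        return x / (1.0 + Math.abs(x));
    }

    public static void main(String[] args) {
        // Both curves are odd, bounded in (-1, 1), and have slope 1 at the
        // origin, but Elliott approaches its asymptotes much more slowly.
        for (double x = -3.0; x <= 3.0; x += 1.0) {
            System.out.printf("x=%5.1f  elliott=%7.4f  tanh=%7.4f%n",
                    x, elliottSymmetric(x), Math.tanh(x));
        }
    }
}
```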
It will be interesting to see this in a benchmark. |
Hey, thanks for integrating this to Encog! I see that you've also included a parameter to control the slope, good thinking! I'll do more tests once I get some spare time. Geoffroy |
These are quite fast! I will do an official benchmark soon. Thanks for adding them, very useful. I will close the issue once I add them to C# too. |
Implemented in both Java and C#, closing issue. |
Here's the code for the Elliott activation function, in case someone is interested. It is not as popular as tanh and sigmoid, but I've seen it used in a few papers.
The implementation is based on this report:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.7204&rep=rep1&type=pdf
Since I discovered that something like 70% of the training time is spent in Math.tanh or Math.exp, I was looking for a cheap alternative. The main advantage of this activation function is that it is very fast to compute. It is bounded between -1 and 1 like tanh, but it reaches those values more slowly, so it might be more suitable for classification tasks.
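The speed argument comes down to Elliott needing only an absolute value, an addition, and a division, while tanh involves exp-class math. A crude micro-benchmark sketch of this comparison (names are mine; actual timings vary widely by JVM and hardware, so treat any numbers as rough):

```java
public final class ActivationBenchmark {

    // Symmetric Elliott: cheap arithmetic only, no transcendental calls.
    public static double elliott(double x) {
        return x / (1.0 + Math.abs(x));
    }

    static double time(String label, java.util.function.DoubleUnaryOperator f) {
        double sink = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < 5_000_000; i++) {
            sink += f.applyAsDouble(i * 1e-6 - 2.5);
        }
        long elapsed = System.nanoTime() - start;
        // Print sink so the JIT cannot eliminate the loop entirely.
        System.out.println(label + ": " + elapsed / 1e6 + " ms (sink=" + sink + ")");
        return sink;
    }

    public static void main(String[] args) {
        time("elliott", ActivationBenchmark::elliott);
        time("tanh   ", Math::tanh);
    }
}
```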
I've had very mixed results with this implementation so far. Used with RPROP on an XOR problem, it seems to perform quite badly: it takes many iterations, gets stuck in local minima, or fails to go below high MSE values. This is quite unexpected, so I'm wondering if maybe there's a mistake somewhere in the derivative.
On the other hand, I've also observed excellent results with evolutionary algorithms like GA (and my version of PSO), often with very fast convergence compared to tanh and sigmoid. That's why I'm putting this code here, in case it might be useful to someone else.