Implementation - Padé Approximant in Log Softmax Layer #3685
Conversation
Would you use mlpack style for the variables?
@shrit, thank you for pointing that out. The new commit includes the fix :)
I approved this one too quickly; I did not see that the tests were not passing.
Just preventing mlpack-bot from auto-approving until we get the fixes worked out. I'm guessing that the level of approximation might be too high, and things are not converging? (Maybe a threshold like `x < 13` is needed?)
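For context, exp(-13) is about 2.3e-6, so clamping to zero beyond that point costs essentially nothing in a softmax. A minimal sketch of the cutoff-only idea, using the same rational function that appears later in this thread (illustrative only, not the PR's code):

```c++
// Sketch: Pade-style approximation of exp(-x) with a hard cutoff.
// Note: without argument scaling, the ratio below turns negative around
// x ~ 2.8, so a cutoff alone is not enough; that is presumably what
// motivates the scaled variant further down in this thread.
double padeExpMinusX(double x)
{
  if (x >= 13.0)
    return 0.0;  // exp(-x) < 2.3e-6 here; treat as zero.
  double numerator = 24 - 12 * x + 4 * x * x - x * x * x;
  double denominator = 24 + 12 * x + 4 * x * x + x * x * x;
  return numerator / denominator;
}
```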
```c++
output.transform([padeApproximant](double x) {
  return padeApproximant(x);
});
```
I think it would be a bit cleaner to just inline the whole approximant into the lambda, but it's up to you.
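For illustration, the inlined version the reviewer describes might look like this (a sketch of the suggestion, not the PR's actual change), using the rational function shown later in the thread:

```c++
// Everything folded into one lambda, so no capture is needed.
output.transform([](double x) {
  double numerator = 24 - 12 * x + 4 * x * x - x * x * x;
  double denominator = 24 + 12 * x + 4 * x * x + x * x * x;
  return numerator / denominator;
});
```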
@shrit @rcurtin Sorry for the delay with the benchmarks; I needed to run a more detailed analysis to find an effective solution.

[Benchmark plots: MNIST simple, new implementation (scale 4, `x < 13.0`) vs. the initial idea of adding only the `x < 13.0` cutoff]

Despite the graph showing a seemingly doubled duration in runtime, the actual difference in the CNN run is minor. Could this improvement be a viable option for implementation? What do you think?

```c++
auto scaledPadeApproxExpMinusX = [](double x) {
  if (x < 13.0) {
    // Argument reduction: approximate exp(-x / s), then raise to the s-th power.
    double s = 4.0;
    double xs = x / s;
    double numerator = 24 - 12 * xs + 4 * xs * xs - xs * xs * xs;
    double denominator = 24 + 12 * xs + 4 * xs * xs + xs * xs * xs;
    double pade = numerator / denominator;
    return std::pow(pade, s);
  }
  return 0.0;  // exp(-x) < 2.3e-6 for x >= 13, so zero is a safe shortcut.
};

output.transform([scaledPadeApproxExpMinusX](double x) {
  return scaledPadeApproxExpMinusX(x);
});
```

I think I will also test the algorithm on mnist_cnn soon.
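As an aside, a standalone sanity check of the scaled approximant against std::exp could look like the following (my sketch, not part of the PR):

```c++
#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
  // Same scaled approximant as above, repeated so this check is self-contained.
  auto scaledPadeApproxExpMinusX = [](double x) {
    if (x < 13.0) {
      double s = 4.0;
      double xs = x / s;
      double numerator = 24 - 12 * xs + 4 * xs * xs - xs * xs * xs;
      double denominator = 24 + 12 * xs + 4 * xs * xs + xs * xs * xs;
      return std::pow(numerator / denominator, s);
    }
    return 0.0;
  };

  // Scan [0, 13) and report the worst absolute deviation from std::exp(-x).
  double maxErr = 0.0;
  for (double x = 0.0; x < 13.0; x += 0.01)
    maxErr = std::max(maxErr, std::fabs(scaledPadeApproxExpMinusX(x) - std::exp(-x)));
  std::printf("max abs error on [0, 13): %g\n", maxErr);
  return 0;
}
```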
@MarkFischinger give it a try. What I find weird is that when we tested this separately, it was way faster, while here it looks much slower than the original one.
Hey @shrit, I think the trouble we're seeing comes from the higher scale `s`. Here are the results:

[Benchmark results]

I'm thinking, since lower `s` values [...]
Yeah, a switch to the existing implementation at about `x = 13` seems reasonable. The scaling trick is definitely a good one for convergence, but I suspect the [...]
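If `std::pow` turns out to be part of the slowdown (an assumption on my part, not something measured in this thread), the fact that `s = 4` is a small integer means the final power can be computed by squaring twice instead:

```c++
// Hypothetical micro-optimization: p^4 = (p^2)^2, avoiding std::pow entirely.
auto scaledPadeApproxExpMinusXFast = [](double x) {
  if (x < 13.0) {
    double xs = x / 4.0;
    double numerator = 24 - 12 * xs + 4 * xs * xs - xs * xs * xs;
    double denominator = 24 + 12 * xs + 4 * xs * xs + xs * xs * xs;
    double p = numerator / denominator;
    double p2 = p * p;
    return p2 * p2;  // Approximates exp(-x).
  }
  return 0.0;
};
```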
@rcurtin I did some backtesting, and unfortunately the results showed no or only minor runtime improvements. Statistically, combining those two algorithms should reduce the error, but I'm still looking for faster implementations because I'm hopeful I can find a better solution. I'll update you as soon as possible, though my available time will be limited for the next few days due to exams :/
Following our discussion in issue #3662, I've implemented the Padé approximant in the log softmax layer. Due to time constraints, I haven't run the tests yet, but I plan to do so shortly and update you with the results.
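For anyone arriving from #3662, here is a hedged sketch of where an exp(-x) approximant slots into a log-softmax computation; this is illustrative Armadillo code, not mlpack's actual LogSoftMax layer. After subtracting the maximum, exp is only evaluated at non-positive arguments, i.e. as exp(-x) with x >= 0, which is exactly the domain the approximant (plus cutoff) targets:

```c++
#include <armadillo>
#include <cmath>

arma::vec LogSoftmaxWithApproxExp(const arma::vec& input)
{
  // Pade-style exp(-x) for x >= 0, with cutoff (same form as in this PR).
  auto padeExpMinusX = [](double x) {
    if (x >= 13.0)
      return 0.0;
    double s = 4.0;
    double xs = x / s;
    double numerator = 24 - 12 * xs + 4 * xs * xs - xs * xs * xs;
    double denominator = 24 + 12 * xs + 4 * xs * xs + xs * xs * xs;
    return std::pow(numerator / denominator, s);
  };

  arma::vec shifted = input - input.max();  // All entries are <= 0.
  arma::vec expVals = shifted;
  expVals.transform([&](double v) { return padeExpMinusX(-v); });  // -v >= 0.
  return shifted - std::log(arma::accu(expVals));
}
```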