Fix derivative of LogSoftMax (Resolves #3468) #3469
Conversation
Can you cite the paper, so I can double-check this?
See this colab file for my explanation with references to the paper: LogSoftMax.
I have updated the colab file (LogSoftMax) to include a simpler and more complete calculation of the derivative. An important prerequisite for understanding the derivative of LogSoftMax is understanding how the derivative of Softmax is calculated. For that, you can see the mlpack implementation (great job to whoever wrote it), and this blog is also a good reference for how the softmax derivative is calculated and how it is multiplied with the back-propagated derivatives. The derivative in mlpack is a further simplified form.
Want to add a note to HISTORY.md about this?
I found that the tests still pass with both implementations. Maybe we should add a Jacobian test to log_softmax.cpp too? I tried that, but it fails---I haven't dug into exactly why yet:
/**
* Jacobian log-softmax module test.
*/
TEST_CASE("JacobianLogSoftMaxLayerTest", "[ANNLayerTest]")
{
for (size_t i = 0; i < 5; ++i)
{
const size_t inputElements = RandInt(2, 1000);
arma::mat input;
input.set_size(inputElements, 1);
input.randu();
LogSoftMax module;
module.InputDimensions() = std::vector<size_t>({ inputElements });
module.ComputeOutputDimensions();
double error = JacobianTest(module, input);
REQUIRE(error <= 1e-5);
}
}
Sorry it took so long to get to this. I've been so heads-down on bandicoot that a lot of mlpack stuff has gotten put on the back burner.
I will add the Jacobian test after the PR correcting it is merged.
@rcurtin I have added the JacobianTest. It is mostly your implementation from your previous comment, because after looking at it, it was probably better than anything I could come up with 😅. Hope you don't have a problem with that. One slight change: JacobianTest initializes the input matrix itself, so there's no need to initialize it before passing.
Sounds good @mrdaybird; thanks for adding it. I'm going to debug the static code analysis checks job on this PR, so don't mind the noise...
@mlpack-jenkins test this please |
Sweet, got the static code analysis job fixed on the first try. :)
We should add a note to HISTORY.md; if you want to do that, feel free, otherwise I'll handle it during merge. 👍
Thanks for finding and fixing this important issue!
Second approval provided automatically after 24 hours. 👍
Forgot to update the history during merge... oops... so I opened #3479.
Update the Backward method of LogSoftMax and fix the test corresponding to it. Fixes #3468 (see the issue for more details).
See this colab file for my explanation, with references, of the new derivative: LogSoftMax.
Note also that the new derivative was tested against the corrected version of JacobianTest.