
Resolve random failing of LMNN Test #1469

Merged
3 commits merged on Jul 24, 2018

Conversation

@manish7294 (Member) commented Jul 19, 2018

It seems the LMNNOptimizerTest from the main tests is randomly failing. This PR deals with that issue.

@manish7294 (Member Author) commented Jul 19, 2018

The issue is not fully debugged yet, but it seems to have something to do with shuffle(), as it's only visible if linear_scan is set to true.

@manish7294 (Member Author) commented Jul 19, 2018

@rcurtin Other than the change I have made here, I can't seem to find any other problem, but the error is still there. Can you think of any reason why, after a number of iterations, the objective suddenly changes to NaN with BBSGD, and only when linear_scan is set to false?

@rcurtin (Member) commented Jul 19, 2018

I agree that the change you've made will solve a problem, but I'm not sure about the BBSGD issue. It's worth noting that the BBSGD optimizer is the only one you have been using that calls the Evaluate() and Gradient() functions separately instead of EvaluateWithGradient().

One way you could debug this more is to try and use gdb conditional breakpoints:
https://www.fayewilliams.com/2011/07/13/gdb-conditional-breakpoints/
So, for instance, you could set a breakpoint at the end of Evaluate() and then your condition would just be something like if cost != cost (remember NaN is not equal to itself). Or possibly you could check for when any given eval was NaN, and then inspect all of the local variables to see if you can figure out what is going on.

(Alternately it's possible to avoid gdb and simply do a lot of printing to debug this. I think either way is valid.)
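For the printing route, here is a minimal sketch of the kind of check that could be sprinkled through Evaluate()/Gradient() while debugging; the helper name is hypothetical and this is not mlpack code:

#include <cmath>
#include <iostream>
#include <string>

// Debug-only helper: report whenever a quantity becomes NaN or infinite.
inline void CheckFinite(const double value, const std::string& name)
{
  if (std::isnan(value) || std::isinf(value))
    std::cout << "Non-finite value for " << name << ": " << value << "\n";
}

int main()
{
  double zero = 0.0;
  const double cost = zero / zero; // Stand-in for a value computed by Evaluate().
  CheckFinite(cost, "cost");       // Prints, since 0 / 0 is NaN.
}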

@rcurtin (Member) commented Jul 19, 2018

Also, it's worth considering the possibility that some element of transformation is NaN, which probably means that some part of the gradient returned in an iteration is NaN---or very large. Depending on the step size, it is possible that BBSGD is simply diverging and needs a smaller step size, so maybe that is a better first thing to check.

@manish7294 (Member Author) commented Jul 19, 2018

I think I have pinpointed the origin of the error:

stepSizeDecay = (1 - (1 / ((double) batchSize - 1) * sampleVariance) /
    (batchSize * gradientNorm)) / batchSize;

Here, if sampleVariance or gradientNorm is 0, stepSizeDecay will become NaN.

I honestly don't know whether gradientNorm can ideally become 0 or not. Please tell me your views on this.
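(As a quick illustration of the failure mode, not mlpack code: with IEEE doubles the expression stops being finite as soon as gradientNorm is zero; it is NaN if the numerator is also zero, and infinite otherwise. A standalone repro with made-up values:)

#include <iostream>

int main()
{
  const double batchSize = 32;
  const double sampleVariance = 0.0; // Degenerate minibatch: no variance...
  const double gradientNorm = 0.0;   // ...and a zero gradient norm.

  // Same expression as above: the inner division becomes 0 / 0, i.e. NaN.
  const double stepSizeDecay = (1 - (1 / (batchSize - 1) * sampleVariance) /
      (batchSize * gradientNorm)) / batchSize;

  std::cout << stepSizeDecay << std::endl; // Prints nan (or -nan on some platforms).
}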

@rcurtin (Member) commented Jul 19, 2018

Nice work digging in. 👍

In general if the gradient norm is 0, then we are at a local minimum. But if we are only operating on a batch of points, then we are at a local minimum only for those points in the batch, not in general. So it's not clear to me what the best choice is here.

@zoq: what do you think in this situation? My quick intuition is that maybe in this case we do not decay the step size at all. Also, don't the two batchSizes on the second line cancel out? In that case I guess the result should be that stepSize = 0 after the update, but that is probably a bad idea.

@zoq (Member) commented Jul 19, 2018

You are right, batchSize cancels out the batchSize in (batchSize * gradientNorm). Skipping the decay step sounds reasonable to me; I still have to think about the issue.

@rcurtin (Member) commented Jul 21, 2018

Ok, @manish7294, if you'd like to modify the BBSGD optimizer to skip the decay step on iterations like this, I think that is fine for now.

@manish7294 (Member Author) commented Jul 21, 2018

So, shall we keep stepSizeDecay at 0 for iterations where sampleVariance or gradientNorm is 0, or should we just bypass the if condition and move to the else branch, calculating stepSizeDecay on the basis of NumFunctions()?

if (batchSize < function.NumFunctions())
{
  stepSizeDecay = (1 - (1 / ((double) batchSize - 1) * sampleVariance) /
      (batchSize * gradientNorm)) / batchSize;
}
else
{
  stepSizeDecay = 1 / function.NumFunctions();
}
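For concreteness, here is a minimal sketch of what the first option could look like, factored into a free function; the name and signature are hypothetical and this is not the patch that was actually merged:

#include <cstddef>

// Sketch: keep stepSizeDecay at 0 (i.e. skip the decay) on degenerate
// minibatches, so the expression can never divide by zero.
double StepSizeDecay(const double sampleVariance,
                     const double gradientNorm,
                     const std::size_t batchSize,
                     const std::size_t numFunctions)
{
  if (sampleVariance == 0.0 || gradientNorm == 0.0)
    return 0.0; // Degenerate minibatch: skip the decay computation.

  if (batchSize < numFunctions)
  {
    return (1 - (1 / ((double) batchSize - 1) * sampleVariance) /
        (batchSize * gradientNorm)) / batchSize;
  }

  return 1.0 / numFunctions;
}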

manish7294 added some commits Jul 21, 2018

@manish7294 (Member Author) commented Jul 21, 2018

@rcurtin Do you think we should merge this now? It will at least fix the failing build, though a better solution needs to be developed. I have added a comment regarding that.

@rcurtin (Member) left a comment

Thanks, I think this will work. I'll leave it for 3 days until merge (for the merge policy), and if @zoq thinks we should, we can open an issue to try and find a better solution for the BBSGD fix. Personally I suspect that what we've done here is reasonable---if the gradient is 0, then no step will be taken (well, I guess depending on momentum), so maybe there is no reason to adjust the step size. I'd wager that the paper doesn't consider the possibility that the gradient norm on a minibatch is 0, since that is such a (seemingly) unlikely edge case.

@manish7294 (Member Author) commented Jul 22, 2018

I just had a quick look over the big batch SGD paper, and I think that the last batchSize which was creating the cancellation issue is probably function.NumFunctions():

(batchSize * gradientNorm)) / batchSize;

Here's the reference: https://arxiv.org/pdf/1610.05792.pdf, page 10, Algorithm 3 (Big batch SGD: with BB stepsizes), lines 14-18.

@zoq (Member) commented Jul 23, 2018

I'm not following, can you clarify:

I think that the last batchSize which was creating the cancellation issue is probably function.NumFunctions()

@manish7294 (Member Author) commented Jul 23, 2018

@zoq I was referring to this statement of yours and Ryan's about the computation of stepSizeDecay:

You are right, batchSize cancels out the batchSize in (batchSize * gradientNorm).

Please let me know if this helps clarify anything, or if you were expecting something else.

@zoq (Member) commented Jul 23, 2018

I see, thanks for the clarification.

@zoq (Member) approved these changes Jul 24, 2018

Looks good to me; also, I agree with Ryan that the paper probably doesn't consider the possibility that the gradient norm is 0.

@rcurtin rcurtin merged commit 455b009 into mlpack:master Jul 24, 2018

5 checks passed

Memory Checks: Build finished.
Static Code Analysis Checks: Build finished.
Style Checks: Build finished.
continuous-integration/appveyor/pr: AppVeyor build succeeded.
continuous-integration/travis-ci/pr: The Travis CI build passed.
@rcurtin (Member) commented Jul 24, 2018

Thanks! 👍
