Rectified Big Batch SGD implementation #91

Merged
merged 3 commits into mlpack:master on May 12, 2019

Conversation

@manish7294
Member

commented Mar 9, 2019

Solves the issues below:

  • Use the curvature of the quadratic approximation while calculating stepSizeDecay (see the sketch after this list).
  • Update the gradient, sample variance and gradient norm before the second backtrack call within the iteration.
  • Correct the backtracking line search termination condition.
  • Use 1/2 as the default backtracking step size.
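
For the first item, here is a minimal sketch of the curvature estimate of the quadratic approximation (the function name and signature are illustrative only; exactly how the patch folds this into stepSizeDecay may differ):

#include <armadillo>

// Sketch: curvature of the local quadratic approximation, estimated from the
// secant pair s = x_t - x_{t-1} and y = grad_B(x_t) - grad_B(x_{t-1}).  Its
// reciprocal gives a Barzilai-Borwein-style step size; the variance
// correction from the paper is not reproduced here.
inline double CurvatureEstimate(const arma::mat& iterate,
                                const arma::mat& iteratePrev,
                                const arma::mat& gradient,
                                const arma::mat& gradientPrev)
{
  const arma::mat s = iterate - iteratePrev;
  const arma::mat y = gradient - gradientPrev;

  // <s, y> / <s, s>, computed elementwise over the (possibly matrix-shaped)
  // iterate.
  return arma::accu(s % y) / arma::accu(s % s);
}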

manish7294 added some commits Mar 9, 2019

@manish7294

Member Author

commented Mar 9, 2019

@zoq Can you please have a quick look over the implementation? As far as I can see, the code should work, but it does not seem to, as BBSBBLogisticRegressionTest is failing. I tried to dig into the issue and found that at the very start of the optimization, stepSize gets reduced to a very low value, making the iterate update negligible, which in turn causes v to reach 0. Maybe you can spot the error I may be overlooking.

@manish7294

Member Author

commented Mar 10, 2019

@zoq There was much more to it than I initially expected. Today, while debugging and going through the paper several more times, I realized that although we were updating the iterate after the first backtrack call, we were still using the gradient, sample variance and gradient norm calculated from the earlier iterate for the second backtrack call. Hopefully, everything should be correct now.
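
Roughly, the fix recomputes these quantities on the updated iterate before the second backtrack call, along the lines of the sketch below (the helper name and signature are illustrative, and the variance normalization constant is omitted):

#include <armadillo>

// Sketch: refresh the quantities consumed by the second backtracking call so
// that they correspond to the updated iterate rather than the previous one.
template<typename SeparableFunctionType>
void RefreshBacktrackState(SeparableFunctionType& function,
                           const arma::mat& iterate,
                           const size_t offset,
                           const size_t backtrackingBatchSize,
                           arma::mat& gradient,
                           double& gradientNorm,
                           double& sampleVariance)
{
  // Batch gradient at the updated iterate.
  function.Gradient(iterate, offset, gradient, backtrackingBatchSize);
  const arma::mat meanGradient = gradient / double(backtrackingBatchSize);

  // Sample variance of the per-example gradients around the mean gradient.
  sampleVariance = 0;
  arma::mat functionGradient(iterate.n_rows, iterate.n_cols);
  for (size_t j = 0; j < backtrackingBatchSize; ++j)
  {
    function.Gradient(iterate, offset + j, functionGradient, 1);
    sampleVariance += arma::accu(arma::square(functionGradient - meanGradient));
  }

  // Squared norm of the mean gradient, used by the Armijo condition.
  gradientNorm = arma::accu(arma::square(meanGradient));
}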

arma::mat delta0, delta1;

// Initialize previous iterate, if not already initialized.
if (iteratePrev.is_empty())

zoq Mar 10, 2019

Member

What about moving this into the constructor?

manish7294 Mar 10, 2019

Author Member

We need the iterate dimensions for this, which we don't have access to in the constructor.

zoq Mar 10, 2019

Member

You are right, thanks for the clarification.
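
For context, a minimal sketch of the lazy initialization being discussed (assuming zero initialization; this mirrors the snippet above rather than the exact patch):

#include <armadillo>

// Sketch: iteratePrev can only be sized once the iterate is known, so it is
// initialized lazily on the first call rather than in the constructor, which
// has no access to the iterate's dimensions.
inline void EnsureIteratePrev(arma::mat& iteratePrev, const arma::mat& iterate)
{
  // Initialize previous iterate, if not already initialized.
  if (iteratePrev.is_empty())
    iteratePrev.zeros(iterate.n_rows, iterate.n_cols);
}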

gradient += functionGradient;

// Used for curvature calculation.
function.Gradient(iteratePrev, offset + j, functionGradientPrev, 1);

zoq Mar 10, 2019

Member

Looks like we could move the gradient call outside the for loop and use backtrackingBatchSize instead of 1.

manish7294 Mar 10, 2019

Author Member

Right, we can do that.
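
Concretely, the change would look something like the sketch below (the wrapper function is illustrative; only the batched Gradient() call matters):

#include <armadillo>

// Sketch: a single batched Gradient() call on the previous iterate, replacing
// the per-example calls with batch size 1 inside the for loop.
template<typename SeparableFunctionType>
void PreviousGradientSketch(SeparableFunctionType& function,
                            const arma::mat& iteratePrev,
                            const size_t offset,
                            const size_t backtrackingBatchSize,
                            arma::mat& functionGradientPrev)
{
  // Used for the curvature calculation; covers the whole backtracking batch.
  function.Gradient(iteratePrev, offset, functionGradientPrev,
      backtrackingBatchSize);
}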

@manish7294

Member Author

commented Mar 10, 2019

@zoq I have observed one more thing:
In the BBS_Armijo backtracking line search we use

while (overallObjectiveUpdate >
    (overallObjective - searchParameter * stepSize * gradientNorm))

as the termination condition, whereas, looking carefully at the condition given in the paper,

while l_B(x_t − α∇l_B(x_t)) > l_B(x_t) − c·α_t·‖∇l_B(x_t)‖² do (Algorithm 2, line 15)

we can see that the authors use α_t in the c·α_t·‖∇l_B(x_t)‖² term, whereas the implementation uses the plain step size α. (α_t can be calculated using equation 8.)
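
To make the proposed change concrete, something along these lines (just a sketch assuming the separable FunctionType API used above; stepSizeDecay is a placeholder for the α_t of equation 8, whose formula is not reproduced here):

#include <armadillo>

// Sketch of the corrected Armijo backtracking loop: the termination condition
// uses alpha_t (stepSizeDecay) rather than the raw step size, and the default
// backtracking factor is 0.5.
template<typename SeparableFunctionType>
void ArmijoBacktrackSketch(SeparableFunctionType& function,
                           const arma::mat& iterate,
                           const arma::mat& gradient,
                           const double gradientNorm,
                           const double overallObjective,
                           const double stepSizeDecay,
                           const size_t offset,
                           const size_t backtrackingBatchSize,
                           const double searchParameter,
                           double& stepSize)
{
  const double backtrackStepSize = 0.5;  // New default backtracking factor.

  arma::mat iterateUpdate = iterate - stepSize * gradient;
  double overallObjectiveUpdate = function.Evaluate(iterateUpdate, offset,
      backtrackingBatchSize);

  // Shrink the step size until the Armijo condition, written with alpha_t
  // instead of alpha, is satisfied.
  while (overallObjectiveUpdate >
      (overallObjective - searchParameter * stepSizeDecay * gradientNorm))
  {
    stepSize *= backtrackStepSize;
    iterateUpdate = iterate - stepSize * gradient;
    overallObjectiveUpdate = function.Evaluate(iterateUpdate, offset,
        backtrackingBatchSize);
  }
}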

Also, there is this sentence related to the BBS_BB implementation which I am not able to understand:

Further, instead of calculating two gradients on each iteration (∇l_{B_t}(x_t) and ∇l_{B_t}(x_{t−1})), our implementation uses the same batch (and step size) on two consecutive iterations. Thus, one parameter update takes place for each gradient calculation.

Maybe you can help me out with this.

Let me know your thoughts on them.

@zoq

Member

commented Mar 21, 2019

Sorry for the slow response; it would be great to get this fixed. You are right, I think I missed the t or interpreted it as a time step. We should use equation 8 here instead.

> Also, there is this sentence related to the BBS_BB implementation which I am not able to understand:
>
> Further, instead of calculating two gradients on each iteration (∇l_{B_t}(x_t) and ∇l_{B_t}(x_{t−1})), our implementation uses the same batch (and step size) on two consecutive iterations. Thus, one parameter update takes place for each gradient calculation.
>
> Maybe you can help me out with this.

I'm not entirely sure; maybe they just mean that they reuse the gradient for the next iteration. I don't see any gradient step we could remove here.

@manish7294

Member Author

commented Mar 23, 2019

I went through the backtracking line search algorithm, and it is exactly the same as what has already been implemented. So I have doubts about the interpretation of the notation given in the paper.

@zoq

Member

commented Mar 26, 2019

I think unless we can clarify that part, we could still go with the current implementation; what do you think?

@manish7294

Member Author

commented Mar 27, 2019

Sure, I agree. I guess that's the best option under current circumstances.

@rcurtin

rcurtin approved these changes Apr 2, 2019

Member

left a comment

This looks good to me---if you want to add something to HISTORY.md feel free, otherwise we can do it during merge and then I can release a bugfix version. 👍

@mlpack-bot

mlpack-bot bot approved these changes Apr 3, 2019


left a comment

Second approval provided automatically after 24 hours. 👍

@rcurtin rcurtin merged commit 7318be7 into mlpack:master May 12, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@rcurtin

Member

commented May 12, 2019

Thanks, happy to merge this in. 👍
