Implementation of Stochastic Coordinate Descent #1075

Merged

merged 31 commits into mlpack:master Sep 14, 2017

Conversation

4 participants
@shikharbhardwaj
Contributor

shikharbhardwaj commented Aug 1, 2017

Here is an implementation of another optimizer, SCD. I have started with a serial implementation first, and made some changes to LogisticRegressionFunction to test this optimizer. The optimizer requires another addition to the requirements for the function type to be optimized.

Currently, I am working on getting two things done:

  1. Implement the greedy descent policy, using ideas from http://www.maths.ed.ac.uk/~prichtar/Optimization_and_Big_Data_2015/slides/Schmidt.pdf.

  2. Optimize the FeatureGradient method in LogisticRegressionFunction to not use Gradient.

I'll update soon with more changes.

@rcurtin

Looks good to me. I do think we need a formalized write-up of all the different FunctionTypes that we have now and what the differences between them are. Would you be willing to sketch up some documentation for that? I figured it could look like the documentation in doc/policies/.

Another question I have is, are you planning on making any of the other functions (like softmax regression) into ResolvableFunctionTypes?

And last question for now, have you done any timing simulations to compare SCD with, e.g., LBFGS or SGD for logistic regression? I am excited to see how it performs. :)

(outdated review thread) src/mlpack/tests/scd_test.cpp
(outdated review thread) src/mlpack/core/optimizers/scd/scd.hpp
@shikharbhardwaj
Contributor

shikharbhardwaj commented Aug 10, 2017

Thanks for the suggestions Ryan.
I am currently working on some more improvements. I'll add more functions with the Resolvable requirements soon.

Documenting the policies added is a nice improvement. I'll do that.

@mentekid

Only minor comments from me.

This looks good. I am glad you went into the testing rabbit hole with HOGWILD!; now testing this feels more straightforward.

(outdated review thread) src/mlpack/core/optimizers/parallel_sgd/sparse_test_function.hpp
(outdated review thread) src/mlpack/core/optimizers/scd/descent_policies/random_descent.hpp
* @tparam DescentPolicy Descent policy to decide the order in which the
* coordinate for descent is selected.
*/
template <typename DescentPolicyType = RandomDescent>

@mentekid
Contributor

mentekid Aug 10, 2017

I like that you have a default policy here.

size_t updateInterval;
//! The descent policy used to pick the coordinates for the update.
DescentPolicyType descentPolicy;

@mentekid
Contributor

mentekid Aug 10, 2017

I would argue that the user would be unlikely to change the descent policy during the object's lifetime - would it make sense to make it const and remove the set function? I realise it's a very minor point though.

@shikharbhardwaj
Contributor

shikharbhardwaj commented Aug 15, 2017

I finished up the changes in SoftmaxRegressionFunction with the last commit. A minor inconsistency I noticed while working with the code in softmax and logistic regression is the way the decision variable is represented. Softmax regression has the features arranged column-wise, whereas logistic regression has them row-wise (so it gives out a single column vector as the output from Gradient).

I guess it would be nice to make this uniform across all the functions (SCD could then do the update on the relevant column instead of working on the entire decision variable), also easing up the parallelisation?

@zoq
Member

zoq commented Aug 17, 2017

I think we should see if we can work out the inconsistency; @rcurtin already looked into the logistic regression method, so he can probably provide some more insight regarding what needs to be done.

@shikharbhardwaj
Contributor

shikharbhardwaj commented Aug 18, 2017

I am working on the changes in logistic regression (using a rowvec for the decision variable and obtaining the submatrix views with tail_cols). I am done with the changes in the function, but they break some other tests (which assumed the gradient to be in the other shape).

@zoq
Member

zoq commented Aug 18, 2017

Can you refactor the test cases too? Don't feel obligated.

@shikharbhardwaj
Contributor

shikharbhardwaj commented Aug 19, 2017

Sure, I am already on it. :)

@mentekid

Again only minor comments.

Thanks for normalizing the logistic regression interface as well, that was a good catch :)

@mentekid
Contributor

mentekid commented Aug 22, 2017

This looks ready to merge on my end - AppVeyor failure seems to be a random timeout.

I am leaving it open to any comments or reviews, and I'll merge this on Friday if nothing comes up.

@shikharbhardwaj nice work - thank you and well done :)

@zoq
Member

zoq commented Aug 22, 2017

Let's restart the Windows build; it looks like the build could not fetch the repo.

(outdated review thread) src/mlpack/core/optimizers/scd/descent_policies/cyclic_descent.hpp
(outdated review thread) src/mlpack/core/optimizers/scd/descent_policies/cyclic_descent.hpp
SCD(const double stepSize = 0.01,
    const size_t maxIterations = 100000,
    const double tolerance = 1e-5,
    const size_t updateInterval = 1e3,

@zoq
Member

zoq Aug 22, 2017

Not sure, but it sounds like updateInterval is what we call batchSize in e.g. MinibatchSGD. If it's the same, we should think about renaming the parameter. Let me know what you think.

@shikharbhardwaj
Contributor

shikharbhardwaj Aug 23, 2017

updateInterval is different from batchSize. It is the number of iterations after which we print the diagnostic information to the logs. As printing requires a call to Evaluate (which may take time), we need to make sure that the diagnostic info does not slow down the iteration (each iteration in SCD is expected to be much faster than Evaluate).

@zoq
Member

zoq Aug 23, 2017

I see, thanks for the clarification.

  // Calculate sigmoid.
  const double exponent = parameters(0, 0) + arma::dot(predictors.col(i),
-     parameters.col(0).subvec(1, parameters.n_elem - 1));
+     parameters.tail_cols(parameters.n_elem - 1).t());

@zoq
Member

zoq Aug 22, 2017

Nice!

(outdated review thread) ...mlpack/methods/logistic_regression/logistic_regression_function_impl.hpp
(outdated review thread) src/mlpack/tests/scd_test.cpp
@rcurtin

Looks great to me. Thank you so much for adding the tutorial on FunctionType parameters, I think it is a great improvement. These are the last comments I have; I think they are all pretty minor. From my end the PR is good to go if you can address them. Great work!

(outdated review thread) doc/policies/functiontype.hpp
(outdated review thread) src/mlpack/core/optimizers/scd/descent_policies/cyclic_descent.hpp
(outdated review thread) doc/policies/functiontype.hpp
(outdated review thread) doc/policies/functiontype.hpp
(outdated review thread) src/mlpack/core/optimizers/scd/descent_policies/greedy_descent.hpp
(outdated review thread) doc/policies/functiontype.hpp
(outdated review thread) ...mlpack/methods/logistic_regression/logistic_regression_function_impl.hpp
(outdated review thread) src/mlpack/methods/softmax_regression/softmax_regression_function.cpp
/**
* Test the greedy descent policy.
*/
BOOST_AUTO_TEST_CASE(GreedyDescentTest)

@rcurtin
Member

rcurtin Aug 22, 2017

It might also be useful to add a simple test for RandomDescent and CyclicDescent; very simple tests can just make sure they give the expected result. These can be helpful later, to make sure that nothing is broken by later maintenance/refactorings of the code.

(outdated review thread) src/mlpack/tests/scd_test.cpp

shikharbhardwaj added some commits Aug 23, 2017

Better formatting and minor fixes in FunctionType documentation
Add descriptive comments and inline linear algebra operations in LogisticRegressionFunction::FeatureGradient
Add references for relevant papers in SCD implementation.
Use arma::dot instead of arma::norm in LogisticRegressionFunction::PartialGradient
@rcurtin
Member

rcurtin commented Aug 28, 2017

@shikharbhardwaj: thanks for the responses, everything looks good to me. I think there are still two comments worth addressing: SoftmaxRegressionFunction::PartialGradient() could still be accelerated, and tests could be added for RandomDescent and CyclicDescent, but those are up to you. I think it's fine to merge as-is regardless. Thanks for the hard work, I think this is a nice addition. 👍

@shikharbhardwaj
Contributor

shikharbhardwaj commented Sep 1, 2017

Sure, I had added tests for the descent policies (CyclicDescentTest and RandomDescentTest in the SCD test file).
I had started with optimizing PartialGradient in SoftmaxRegressionFunction but got confused. I'll start again with a clear head today.
I also looked into the idea of checking the function API for consistency at compile time. I guess this could be applied to almost all optimizers, so I'll do this as a separate PR.

@rcurtin
Member

rcurtin commented Sep 14, 2017

I think this is ready to merge; @mentekid: was there anything else you were waiting on for this one?

@mentekid
Contributor

mentekid commented Sep 14, 2017

Sorry, I was planning to review this but forgot. I haven't looked at the latest push, but if you think it is ready to go just go ahead and merge.

Sorry for the delay, @shikharbhardwaj thanks for the contribution!

@rcurtin
Member

rcurtin commented Sep 14, 2017

Ok, sure, I'll go ahead and merge it then. I think the latest changes were good. Thanks! :)

@rcurtin rcurtin merged commit 21349b3 into mlpack:master Sep 14, 2017

4 checks passed

Static Code Analysis Checks: Build finished.
Style Checks: Build finished.
continuous-integration/appveyor/pr: AppVeyor build succeeded.
continuous-integration/travis-ci/pr: The Travis CI build passed.