Stochastic Optimization for PCA. #1391
Only two quick comments so far; then I realized that to give anything more in-depth I'll have to read the paper fully. I'll try to do that in the next week. The technique looks really interesting, and I think some stochastic methods could give fast solutions to PCA that are pretty good. By any chance, have you done any timing comparisons?
```cpp
 * For more information, see the following.
 *
 * @code
 * @inproceedings{Musco2015,
```
Maybe this should be `arora2012`? I met one of the Musco brothers at NIPS last year, but I can't remember which one (probably Cameron?). A fun person to talk to. :)
You are absolutely right.
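For reference, a corrected entry would look something like this (author list and venue written from memory, so worth double-checking before committing):

```
@inproceedings{arora2012,
  author    = {Arora, Raman and Cotter, Andrew and Livescu, Karen and
               Srebro, Nathan},
  title     = {Stochastic Optimization for {PCA} and {PLS}},
  booktitle = {50th Annual Allerton Conference on Communication, Control,
               and Computing},
  year      = {2012}
}
```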
```cpp
void Shuffle()
{
  visitationOrder = arma::shuffle(
      arma::linspace<arma::Row<size_t> >(0, data.n_cols - 1, data.n_cols));
```
It might give a nice speedup to shuffle the entire data matrix instead of keeping a `visitationOrder`.
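Something like this rough sketch (untested; it assumes `data` is the function class's column-major data matrix):

```cpp
void Shuffle()
{
  // Shuffle the columns of the data matrix directly, so later column
  // accesses are contiguous instead of going through an index indirection.
  data = data.cols(arma::shuffle(
      arma::regspace<arma::uvec>(0, data.n_cols - 1)));
}
```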
Agreed, stochastic methods can provide faster solutions. In this case the runtime depends heavily on the number of optimization steps (`MaxIterations`), since there is no condition to stop early, so this has to be tuned for each task. Besides the paper, what I really like is the combination of the existing optimization framework with the PCA class. Timings:
Looks good to me; I enjoyed reading the paper. It had the shortest abstract of any paper I have ever read.

I only have a couple of minor comments, mostly about not calculating the objective function. The timings look good to me, and I think it's a nice improvement. We could add it to `pca_main.cpp` if you want, and we should probably modify `HISTORY.md`. :)
```cpp
{
  Log::Warn << "StepSize(): invalid value (> 0), automatically take the "
      << "negative direction." << std::endl;
  optimizer.StepSize() *= -1;
```
Hmm, would it be easier to simply negate the objective function? The formulation of stochastic PCA here is a maximization, but the mlpack optimizers minimize. So if you just use the negative objective (and derive the gradient accordingly) I think that removes the need for this.
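Roughly what I have in mind, as an untested sketch (the member name `data` and the exact signatures are placeholders, not the PR's actual code):

```cpp
// Negated objective -tr(U^T x x^T U): minimizing this is equivalent to
// maximizing the original PCA objective, so no step-size sign trick is needed.
double Evaluate(const arma::mat& U, const size_t begin, const size_t batchSize)
{
  const arma::mat x = data.cols(begin, begin + batchSize - 1);
  // tr(U^T x x^T U) == ||x^T U||_F^2.
  return -arma::accu(arma::square(x.t() * U));
}

void Gradient(const arma::mat& U, const size_t begin, arma::mat& gradient,
              const size_t batchSize)
{
  const arma::mat x = data.cols(begin, begin + batchSize - 1);
  gradient = -2.0 * x * (x.t() * U);  // Gradient of the negated objective.
}
```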
```cpp
arma::mat newData = data.cols(ordering);

// If we are an alias, make sure we don't write to the original data.
math::ClearAlias(data);
```
Hm, I think that even if `data` is an alias, when we call `data = std::move(newData)` then `data` will no longer be an alias, since it just takes the memory pointer from `newData` (which is not an alias). I am not 100% sure, but about 95% sure on that, based on looking at the Armadillo matrix `operator=` for rvalue references.
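A quick standalone way to check that claim (this assumes the alias is made with Armadillo's advanced constructor, which is what `MakeAlias()` uses internally, if I remember right; whether the pointer actually changes depends on Armadillo's `steal_mem()` internals):

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  arma::mat original(3, 5, arma::fill::randu);
  // 'data' is a non-owning alias of 'original' (no copy, no ownership).
  arma::mat data(original.memptr(), original.n_rows, original.n_cols,
                 false, false);

  arma::mat newData(3, 5, arma::fill::randu);
  data = std::move(newData);

  // If the move steals newData's buffer, 'data' no longer aliases
  // 'original' and this prints 1.
  std::cout << (data.memptr() != original.memptr()) << std::endl;
}
```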
Right, at this point I copied:
mlpack/src/mlpack/methods/logistic_regression/logistic_regression_function_impl.hpp
Lines 68 to 80 in e3fe135
```cpp
{
  MatType newPredictors;
  arma::Row<size_t> newResponses;
  math::ShuffleData(predictors, responses, newPredictors, newResponses);

  // If we are an alias, make sure we don't write to the original data.
  math::ClearAlias(predictors);
  math::ClearAlias(responses);

  // Take ownership of the new data.
  predictors = std::move(newPredictors);
  responses = std::move(newResponses);
```
(Sorry for the slow response here.) I see, I guess I originally wrote this code. I think that maybe we could omit the `ClearAlias()` call in both places, but I suppose it's not a huge deal either way; up to you if you want to make the change.
```cpp
    const size_t /* begin */,
    const size_t /* batchSize */)
{
  return pl;
```
I'm a bit confused about the pseudoloss here... I think you are trying to avoid the computation of the objective entirely (so maybe my comment about negating the objective makes no sense, since I wrote it before I got here!), but I'd be concerned that the optimizer might terminate early because it detects convergence.

From equation 1, I guess that the objective is just `tr(U^T x x^T U)`. If we have some `EvaluateWithGradient()`, then we only have to do the extra computation of pre-multiplying `U^T` to the term we take in the gradient, and then taking the trace of that. So maybe it is not that much extra, but I could see that it might still make a difference.
What do you think, is there any problem computing the actual loss? I am not opposed to avoiding the objective calculation if it's not any problem.
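Concretely, something like this sketch (hypothetical names, same data layout as above); the objective falls out of an intermediate of the gradient computation almost for free:

```cpp
double EvaluateWithGradient(const arma::mat& U, const size_t begin,
                            arma::mat& gradient, const size_t batchSize)
{
  const arma::mat x = data.cols(begin, begin + batchSize - 1);

  // Shared intermediate: x^T U.
  const arma::mat xtU = x.t() * U;

  gradient = -2.0 * x * xtU;  // Gradient of the negated objective.

  // tr(U^T x x^T U) == ||x^T U||_F^2, so no extra matrix product is needed.
  return -arma::accu(arma::square(xtU));
}
```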
```cpp
updatePolicy.Update(iterate, Fargs...);

arma::mat R;
arma::qr_econ(iterate, R, iterate);
```
Just to check that I understand correctly: I believe this is the renormalization step that isn't necessary every iteration. If that's the case, no need to respond; I got it. :)
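If it helps, the relaxation I have in mind would look roughly like this (untested sketch; `iteration` and `renormFreq` are hypothetical names for the loop counter and a new parameter):

```cpp
// Re-orthonormalize the columns of the iterate only every renormFreq
// iterations, instead of paying for a QR decomposition at each step.
if ((iteration % renormFreq) == 0)
{
  arma::mat R;
  arma::qr_econ(iterate, R, iterate);
}
```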
```cpp
 * @param iterate Parameters that minimize the function.
 */
template<typename... Targs>
void Update(arma::mat& iterate, Targs... Fargs)
```
I guess if you want to be really picky it should be `fargs` (or `fArgs`?), not `Fargs`, but I dunno if you want to handle template parameter packs differently. :)
Everything looks good to me, just one confused comment. If you can clarify it would be great.
```cpp
template<typename... Targs>
void Update(arma::mat& iterate, Targs... fArgs)
{
  iterate *= -1;
```
(Oops, I put this in the wrong place originally.)

I guess I am still just confused about this bit: shouldn't this be unnecessary if we simply negate the results of `Evaluate()`, `EvaluateWithGradient()`, and `Gradient()`? Sorry if I am overlooking something; it's possible I've misunderstood. It just seems like extra computation here.
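To make sure we're talking about the same thing, here is a tiny self-contained sketch of the equivalence I have in mind (hypothetical setup, single data point): with a negated gradient, the optimizer's built-in descent step already is the positive step of eq. 3, so no extra sign flip of the iterate is needed.

```cpp
#include <armadillo>

int main()
{
  arma::mat x(10, 1, arma::fill::randu);        // A single data point.
  arma::mat iterate(10, 3, arma::fill::randu);  // Current estimate U.
  const double stepSize = 0.01;

  // Gradient of the negated objective -tr(U^T x x^T U).
  const arma::mat gradient = -2.0 * x * (x.t() * iterate);

  // The optimizer's standard descent update...
  iterate -= stepSize * gradient;
  // ...equals the ascent step of eq. 3:
  //   iterate += stepSize * 2 * x * x^T * iterate_old.
}
```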
For PCA/PLS we take a (positive) step in the stochastic gradient direction (eq. 3); maybe I missed something that reverts it:

```cpp
iterate -= stepSize * gradient;
```
Ah, sorry that I never responded to this. I follow now, sorry for the confusion.
Second approval provided automatically after 24 hours. 👍
I think this needs to transition to ensmallen before merge, but it should be otherwise good to go. 👍

Agreed, let's see if I can do this in the next few days.

@mlpack-jenkins test this please

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@zoq what are you thinking for this one? I do think it would be nice to get merged, but I'm not sure how much time you have to work on it.

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

Let's see if I can make the necessary changes in the next few days.

…sis to the HISTORY.

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

This is ready, so let's keep this open.

Bumping this; is this still ready?

Hey @zoq, this one got approved a long time ago but was never merged; is everything still ready? If so, if you want to merge master in, we can go ahead and merge it in. 👍

Agreed, this is a cool feature to merge before releasing mlpack 4. @zoq, great work 💯

I have marked this for mlpack 4. If you get a chance to merge it before then, that would be great 👍
Implementation of Stochastic PCA as described in "Stochastic Optimization for PCA and PLS", R. Arora et al.