
Stochastic Optimization for PCA. #1391

Closed
wants to merge 17 commits into from

Conversation

@zoq (Member) commented May 11, 2018

Implementation of Stochastic PCA as described in "Stochastic Optimization for PCA and PLS", R. Arora et al.

@rcurtin (Member) left a comment

Only two quick comments so far; then I realized that to give anything more in-depth I'll have to read the paper fully. I'll try to do that in the next week. The technique looks really interesting, and I think some stochastic methods could give fast solutions to PCA that are pretty good. By any chance, have you done any timing comparisons?

* For more information, see the following.
*
* @code
* @inproceedings{Musco2015,
Member

Maybe this should be arora2012? I met one of the Musco brothers at NIPS last year, but I can't remember which one (probably Cameron?). A fun person to talk to. :)

Member Author

You are absolutely right.

void Shuffle()
{
  visitationOrder = arma::shuffle(
      arma::linspace<arma::Row<size_t> >(0, data.n_cols - 1, data.n_cols));
Member

It might give a nice speedup to shuffle the entire data matrix instead of keeping a visitationOrder.
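A minimal sketch of what that could look like (an assumption about the suggestion, not code from the PR; it presumes data is the function's arma::mat member):

void Shuffle()
{
  // Draw a random permutation of the column indices once ...
  const arma::uvec ordering = arma::shuffle(
      arma::regspace<arma::uvec>(0, data.n_cols - 1));

  // ... and physically reorder the data matrix, so later visits are
  // sequential instead of indirecting through visitationOrder.
  arma::mat newData = data.cols(ordering);
  data = std::move(newData);
}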

@zoq (Member Author) commented Jun 12, 2018

Agreed, stochastic methods can provide faster solutions. In this case the runtime depends heavily on the number of optimization steps (MaxIterations), since there is no condition to stop early, so this has to be tuned for each task.

Besides the paper, what I really like is the combination of the existing optimization framework with the PCA class.
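
A rough usage sketch of that combination (an assumption, not code from the PR: StochasticAdamPolicy is the policy name used in the timings below, the PCAType/Apply interface is mlpack's existing PCA API, and the include for the new policy is omitted since its location depends on the PR):

// Run PCA with the stochastic decomposition policy on random data.
arma::mat data(1000, 1000, arma::fill::randu);

mlpack::pca::PCAType<StochasticAdamPolicy> stochasticPCA;

arma::mat transformed;
arma::vec eigVal;
arma::mat eigVec;
stochasticPCA.Apply(data, transformed, eigVal, eigVec);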

Timings:

data size: 100x100 rank: 100
----------------------------
ExactSVDPolicy: 6.29743 ms
RandomizedBlockKrylovSVDPolicy: 14.3532 ms
RandomizedSVDPolicy: 12.5829 ms
StochasticAdamPolicy (Iterations 300): 33.0697 ms
StochasticAdamPolicy (Iterations 100): 16.1772 ms

data size: 1000x1000 rank: 100
----------------------------
ExactSVDPolicy: 1039.41 ms
RandomizedBlockKrylovSVDPolicy: 757.691 ms
RandomizedSVDPolicy: 143.457 ms
StochasticAdamPolicy (Iterations 300): 422.688 ms
StochasticAdamPolicy (Iterations 100): 225.264 ms

data size: 5000x5000 rank: 100
----------------------------
ExactSVDPolicy: 119041 ms
RandomizedBlockKrylovSVDPolicy: 10375.3 ms
RandomizedSVDPolicy: 3881.91 ms
StochasticAdamPolicy (Iterations 300): 6435.66 ms
StochasticAdamPolicy (Iterations 100): 2148.02 ms

data size: 4000x8000 rank: 100
----------------------------
ExactSVDPolicy: 154379 ms
RandomizedBlockKrylovSVDPolicy: 7940.06 ms
RandomizedSVDPolicy: 2792.8 ms
StochasticAdamPolicy (Iterations 300): 3306.98 ms
StochasticAdamPolicy (Iterations 100): 2605.95 ms

data size: 6000x12000 rank: 100
----------------------------
ExactSVDPolicy: 530332 ms
RandomizedBlockKrylovSVDPolicy: 14385.7 ms
RandomizedSVDPolicy: 5892.35 ms
StochasticAdamPolicy (Iterations 300): 5990.87 ms
StochasticAdamPolicy (Iterations 100): 4726.83 ms

data size: 7000x19000 rank: 100
----------------------------
ExactSVDPolicy: 839472 ms
RandomizedBlockKrylovSVDPolicy: 25140.6 ms
RandomizedSVDPolicy: 10154.6 ms
StochasticAdamPolicy (Iterations 300): 9784.8 ms
StochasticAdamPolicy (Iterations 100): 8343.91 ms

@rcurtin (Member) left a comment

Looks good to me, I enjoyed reading the paper. It had the shortest abstract of any paper I have ever read.

I only have a couple minor comments, mostly about not calculating the objective function. The timings look good to me, and I think it's a nice improvement. We could add it to pca_main.cpp if you want, and probably we should modify HISTORY.md. :)

{
  Log::Warn << "StepSize(): invalid value (> 0), automatically take the "
      << "negative direction." << std::endl;
  optimizer.StepSize() *= -1;
Member

Hmm, would it be easier to simply negate the objective function? The formulation of stochastic PCA here is a maximization, but the mlpack optimizers minimize. So if you just use the negative objective (and derive the gradient accordingly) I think that removes the need for this.
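
A minimal sketch of that idea (an assumption, not code from the PR; the Evaluate()/Gradient() signatures follow the optimizer framework's separable-function convention): negate the per-batch objective tr(U^T x x^T U) and its gradient so the maximization becomes a standard minimization and the step size stays positive.

double Evaluate(const arma::mat& U, const size_t begin, const size_t batchSize)
{
  const arma::mat x = data.cols(begin, begin + batchSize - 1);
  // Negated objective: minimizing this maximizes tr(U^T x x^T U).
  return -arma::trace(U.t() * x * (x.t() * U));
}

void Gradient(const arma::mat& U,
              const size_t begin,
              arma::mat& gradient,
              const size_t batchSize)
{
  const arma::mat x = data.cols(begin, begin + batchSize - 1);
  // Gradient of the negated objective: -2 x x^T U.
  gradient = -2.0 * x * (x.t() * U);
}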

arma::mat newData = data.cols(ordering);

// If we are an alias, make sure we don't write to the original data.
math::ClearAlias(data);
Member

Hm, I think that even if data is an alias, when we call data = std::move(newData) then data will no longer be an alias since it just takes the memory pointer from newData (which is not an alias). I am not 100% sure but about 95% sure on that, based on looking at the Armadillo matrix operator= for rvalue references.

Member Author

Right, at this point I copied:

{
  MatType newPredictors;
  arma::Row<size_t> newResponses;
  math::ShuffleData(predictors, responses, newPredictors, newResponses);
  // If we are an alias, make sure we don't write to the original data.
  math::ClearAlias(predictors);
  math::ClearAlias(responses);
  // Take ownership of the new data.
  predictors = std::move(newPredictors);
  responses = std::move(newResponses);

Member

(Sorry for the slow response here.) I see, I guess I originally wrote this code. I think that maybe we could omit the ClearAlias() call in both places, but I suppose it's not a huge deal either way---up to you if you want to make the change.

const size_t /* begin */,
const size_t /* batchSize */)
{
  return pl;
Member

I'm a bit confused about the pseudo-loss here... I think you are trying to avoid the computation of the objective entirely (so maybe my comment about negating the objective makes no sense, since I wrote it before I got here!), but I'd be concerned that the optimizer might terminate early because it detects convergence.

From equation 1, I guess that the objective is just tr(U^T x x^T U). If we have some EvaluateWithGradient(), then we only have to do the extra computation of pre-multiplying U^T to the term we take in the gradient, and then taking the trace of that. So maybe it is not that much extra, but I could see that it might still make a difference.

What do you think, is there any problem computing the actual loss? I am not opposed to avoiding the objective calculation if it's not any problem.
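
A sketch of what that could look like (an assumption, not code from the PR, reusing the negation idea from the earlier comment): the x x^T U term is computed once for the gradient and reused for the trace, so the actual objective costs only one extra multiplication and a trace.

double EvaluateWithGradient(const arma::mat& U,
                            const size_t begin,
                            arma::mat& gradient,
                            const size_t batchSize)
{
  const arma::mat x = data.cols(begin, begin + batchSize - 1);

  // Shared term x x^T U: needed for the gradient anyway.
  const arma::mat xxtU = x * (x.t() * U);

  gradient = -2.0 * xxtU;             // Negated gradient (we minimize).
  return -arma::trace(U.t() * xxtU);  // Negated objective tr(U^T x x^T U).
}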

updatePolicy.Update(iterate, Fargs...);

arma::mat R;
arma::qr_econ(iterate, R, iterate);
Member

Just to check that I understand correctly, I believe that this is the renormalization step that isn't necessary every iteration. If that's the case, no need to respond, then I got it. :)
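
A hypothetical sketch of doing it less often (not part of the PR; orthoInterval and the loop counter i are made-up names for illustration):

// Re-orthonormalize the columns of the iterate only every few steps.
if ((i + 1) % orthoInterval == 0)
{
  arma::mat R;
  arma::qr_econ(iterate, R, iterate);
}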

* @param iterate Parameters that minimize the function.
*/
template<typename... Targs>
void Update(arma::mat& iterate, Targs... Fargs)
Member

I guess if you want to be really picky it should be fargs (or fArgs?) not Fargs, but I dunno if you want to handle template pack parameters differently. :)

@rcurtin (Member) left a comment

Everything looks good to me; there is just one comment where I'm still confused. If you can clarify, that would be great.

template<typename... Targs>
void Update(arma::mat& iterate, Targs... fArgs)
{
  iterate *= -1;
Member

(Oops, I put this in the wrong place originally.)

I guess I am still just confused about this bit---shouldn't this be unnecessary if we simply negate the results of Evaluate(), EvaluateWithGradient(), and Gradient()? Sorry if I am overlooking something---it's possible I've misunderstood. It just seems like extra computation here.

Member Author

For PCA/PLS we take a (positive) step in the stochastic gradient direction (eq. 3); maybe I missed something that reverses that:
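(For reference, a paraphrase of that update from the paper, not the exact equation; P_orth denotes re-orthonormalization of the columns, e.g. via QR:)

U^{(t)} = P_{\mathrm{orth}}\left( U^{(t-1)} + \eta_t \, x_t x_t^{\top} U^{(t-1)} \right)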

Member

Ah, sorry that I never responded to this. I follow now, sorry for the confusion.

@mlpack-bot (bot) left a comment

Second approval provided automatically after 24 hours. 👍

@rcurtin (Member) commented Jan 31, 2019

I think this needs a transition to ensmallen before merge, but it should otherwise be good to go. 👍

@zoq (Member Author) commented Mar 5, 2019

Agreed, let's see if I can do this in the next few days.

@zoq (Member Author) commented Apr 14, 2019

@mlpack-jenkins test this please

2 similar comments
@zoq (Member Author) commented Apr 14, 2019

@mlpack-jenkins test this please

@zoq (Member Author) commented Apr 14, 2019

@mlpack-jenkins test this please

@mlpack-bot (bot) commented Jul 19, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Jul 19, 2019
@rcurtin (Member) commented Jul 19, 2019

@zoq what are you thinking for this one? I do think it would be nice to get this merged, but I'm not sure how much time you have to work on it.

@mlpack-bot mlpack-bot bot removed the s: stale label Jul 19, 2019
@mlpack-bot (bot) commented Aug 18, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Aug 18, 2019
@mlpack-bot mlpack-bot bot closed this Aug 25, 2019
@zoq zoq reopened this Aug 25, 2019
@mlpack-bot mlpack-bot bot removed the s: stale label Aug 25, 2019
@zoq (Member Author) commented Aug 25, 2019

Let's see if I can make the necessary changes in the next few days.

@mlpack-bot (bot) commented Sep 30, 2019

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@zoq (Member Author) commented Sep 30, 2019

This is ready, so let's keep this open.

@birm (Member) commented Sep 27, 2020

Bumping this; is this still ready?

@rcurtin (Member) commented Oct 24, 2020

Hey @zoq, this one got approved a long time ago but was never merged---is everything still ready? If so, merge master in and we can go ahead and merge it. 👍

@shrit (Member) commented Dec 11, 2020

Agreed, this is a cool feature to merge before releasing mlpack 4; great work, @zoq 💯

@shrit shrit added this to the mlpack 4.0.0 milestone Dec 18, 2020
@shrit (Member) commented Dec 18, 2020

I have marked this for mlpack 4. If you get a chance to merge it before then, that would be great 👍

@zoq zoq added this to mlpack 4.0.0 in Roadmap Jan 1, 2021
@mlpack-bot mlpack-bot bot closed this Sep 17, 2023