
SparsePCA implementation based on sparse dictionary learning #13127

Open
iancovert opened this issue Feb 10, 2019 · 6 comments

@iancovert

Description

Hi developers,

There's a scikit-learn model called SparsePCA, but it doesn't seem to be SPCA at all. According to the documentation, it's based on a sparse dictionary learning paper (the Mairal et al. paper in the references below). This seems like a naming mistake, and it's especially odd because there's also a model called DictionaryLearning that seems to optimize a very similar sparse coding objective.
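For reference, if I'm reading the user guide correctly, what SparsePCA actually optimizes is the dictionary learning objective (U is the code/embedding, V holds the components):

$$(U^*, V^*) = \underset{U,\,V}{\arg\min}\ \tfrac{1}{2}\lVert X - UV\rVert_F^2 + \alpha \lVert V\rVert_1 \quad \text{subject to } \lVert U_k\rVert_2 \le 1 \text{ for all } k$$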

I was looking for an open-source SPCA implementation, ideally based on the original paper, and was confused to find that the sklearn one is something else altogether. I think I'm not the only one; for example, open issue #11512 has a user requesting a feature based on the SPCA paper that doesn't make as much sense in the context of dictionary learning.

Are there plans to fix this inconsistency? Would there be interest from other users in an SPCA implementation?

I ended up implementing the model myself, so I'd be happy to help if this seems important.

Thanks,
Ian

References

H. Zou, T. Hastie, R. Tibshirani: Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics, 2006.

J. Mairal, F. Bach, J. Ponce, G. Sapiro: Online Dictionary Learning for Sparse Coding, ICML 2009.

@agramfort
Member

agramfort commented Feb 10, 2019 via email

@vincentbillaut

I also found that it doesn't do SPCA at all.
I think the very least would be to rename the class to something less ambiguous.

In the meantime I'm using the R package from the paper.

@ogrisel
Member

ogrisel commented Aug 5, 2020

Indeed, it's not the same algorithmic approach that is currently used. Still, it maximizes explained variance at a reduced rank under sparsity constraints on the components (not on the loadings, as in dictionary learning).

The definition in the Wikipedia article uses a constraint on the l0 norm of the components:

https://en.wikipedia.org/wiki/Sparse_PCA

The Sparse PCA paper by Zou et al. uses a constraint on the l1 norm of the components, which is more in line with the l1 penalty used in the objective function of the SparsePCA class in scikit-learn.

However, both the Wikipedia article and the Zou et al. paper also have an orthogonality constraint, which is not enforced in scikit-learn (and does not hold in practice either).
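For reference, the "self-contained regression" criterion from Zou et al. (if I transcribe it correctly) penalizes the loadings β_j, the columns of B, and puts the orthogonality constraint on A:

$$\min_{A,\,B}\ \sum_{i=1}^{n} \lVert x_i - A B^\top x_i\rVert_2^2 + \lambda \sum_{j=1}^{k} \lVert \beta_j\rVert_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \lVert \beta_j\rVert_1 \quad \text{subject to } A^\top A = I_k$$

The normalized columns of B then give the sparse principal components.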

So indeed, the naming in scikit-learn is highly misleading. We should at least implement a solver that lets the user enforce the orthogonality constraint, and maybe keep the current solver as an alternative that cannot enforce it.
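A quick way to see the missing orthogonality (a sketch only; the dataset and hyperparameters here are arbitrary):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, SparsePCA

X = load_digits().data
X = X - X.mean(axis=0)

pca = PCA(n_components=5).fit(X)
spca = SparsePCA(n_components=5, alpha=1.0, random_state=0).fit(X)

# PCA components form an orthonormal set: V V^T is the identity.
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(5)))  # True
# SparsePCA components generally do not: the Gram matrix has nonzero
# off-diagonal entries and non-unit diagonal.
print(np.round(spca.components_ @ spca.components_.T, 2))
```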

@ogrisel
Member

ogrisel commented Aug 11, 2020

Speaking offline with @GaelVaroquaux, it's debatable whether SparsePCA without orthogonality constraints can be considered natural/intuitive:

When there is no penalty on the components, PCA with n_components < n_features will automatically converge to orthogonal components when optimizing for maximal explained variance (equivalent to minimizing the squared reconstruction error). This is no longer the case once we add an l1 penalty on the components to the objective function. One could therefore argue that adding an orthogonality constraint on top of the "max explained variance with l1 penalty" objective is arbitrary, and according to Gael the literature was not clear about what Sparse PCA means.

Recently, the definition of Zou et al. (which is quite similar to the one used in Wikipedia, but not exactly the same) seems to have won out, so we might still want to rename the model we implemented in scikit-learn to avoid confusion, but I am not sure what to rename it to.

@ogrisel
Member

ogrisel commented Aug 11, 2020

We should start by clarifying the doc, and possibly add the original solver from Zou et al.

Adding the Zou et al. implementation (with orthogonality constraints) in the same class is another option if people find it practically useful. @iancovert what is your use case? In particular why do you need to enforce the orthogonality of the components for your application?

@iancovert
Author

iancovert commented Aug 19, 2020

Hi Olivier. At the time I was looking for an SPCA solver that would produce a low-dimensional embedding using sparse loading vectors, and it was mainly the sparse loading vectors that I wanted. The solver from Zou et al. calculates the loadings via an alternating optimization algorithm, and that's what I personally expected to be implemented here. But you seem to be right that there are many SPCA variants out there.
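For concreteness, here's a minimal sketch of that alternating procedure as I understand it. The function name and hyperparameter defaults are mine, and I lean on scikit-learn's ElasticNet, whose (alpha, l1_ratio) parameterization only loosely maps to the paper's (λ, λ_1,j):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def spca_zou(X, n_components, alpha=1e-3, l1_ratio=0.9, n_iter=50):
    X = X - X.mean(axis=0)  # center the data
    # Initialize A with the ordinary PCA loadings (right singular vectors).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:n_components].T               # shape (n_features, k)
    B = A.copy()
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                      fit_intercept=False, max_iter=5000)
    for _ in range(n_iter):
        # B-step: elastic-net regression of each score X @ a_j onto X.
        for j in range(n_components):
            enet.fit(X, X @ A[:, j])
            B[:, j] = enet.coef_
        # A-step: Procrustes update, A = U V^T where X^T X B = U D V^T.
        U, _, Vt = np.linalg.svd(X.T @ (X @ B), full_matrices=False)
        A = U @ Vt
    # Normalize the nonzero loading vectors to unit length.
    norms = np.linalg.norm(B, axis=0)
    B[:, norms > 0] = B[:, norms > 0] / norms[norms > 0]
    return B  # columns are the sparse loadings
```

The sparse loadings come out as the columns of B, which is exactly what I couldn't get from the current implementation.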

I had no need for the loadings to be orthogonal, which, as you've mentioned, is not guaranteed by this solver (p.272 of the paper).

Since I required the loading vectors, my issue with the algorithm currently implemented here (according to this) is that it solves directly for the embedding and does not calculate loading vectors. Other users may have different requirements; I'm not sure, since I'm not a frequent SPCA user.
