New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SparsePCA implementation based on sparse dictionary learning #13127
Comments
indeed it's not the same algorithmic approach that is currently used. Yet
it maximizes explained variance
with a reduced rank under sparsity constraints on the components (not the
loadings as dict learning).
We should start by clarifying the doc + eventually add the original solver
from Zou et al. provided
it scales as well or gives better result (but better how?).
my 2c
… |
Found out it didn't do SPCA at all, also. In the meantime I'm using the R package from the paper. |
The definition in the Wikipedia article uses a constraint on the l_0 norm of the components: https://en.wikipedia.org/wiki/Sparse_PCA The Sparse PCA paper by Zou et al. uses a constraint on the l_1 norm of the components which is more in line with the l1 penalty used in the objective function of the SparsePCA class in scikit-learn. However, both the Wikipedia article and the Zou et al. paper also have an orthogonality contraint which is not enforced in scikit-learn (and does not hold in practice either). So indeed, the naming in scikit-learn is highly misleading. We should at least implement a solver that provides the user with the ability to respect the orthogonality constraint and maybe keep the current solver as an alternative solver that cannot enforce this constraint. |
Speaking offline with @GaelVaroquaux, it's debatable whether SparsePCA without orthogonality constraints can be considered natural/intuitive or not: When there is no penalty on the components, PCA with Recently the definition of Zou et al. (which is quite similar to the one used in Wikipedia but not exactly the same) seems to have won so we might still want to rename the model we implemented in scikit-learn to avoid confusion but I am not sure what to rename it too. |
Adding the Zou et al. implementation (with orthogonality constraints) in the same class is another option if people find it practically useful. @iancovert what is your use case? In particular why do you need to enforce the orthogonality of the components for your application? |
Hi Olivier. At the time I was looking for a SPCA solver that would produce a low-dimensional embedding using sparse loading vectors, and it was mainly the sparse loading vectors that I wanted. The solver from Zou et al. calculates the loadings via an alternating optimization algorithm, and that's what I personally expected to be implemented here. But you seem to be right that there are many SPCA variants out there. I had no need for the loadings to be orthogonal, which, as you've mentioned, is not guaranteed by this solver (p.272 of the paper). Since I required the loading vectors, my issue with the algorithm currently implemented here (according to this) is that it solves directly for the embedding and does not calculate loading vectors. Other users may have different requirements, I'm not sure because I'm not a frequent SPCA user. |
Description
Hi developers,
There's a sklearn model called SparsePCA, but it doesn't seem to be SPCA at all. According to documentation here, it's based on a sparse dictionary learning paper (this one). This seems like a naming mistake, and is especially odd because there's also a model called DictionaryLearning that seems to optimize a very similar sparse coding objective.
I was looking for an open-source SPCA implementation, ideally based on the original paper, and was confused to find out that the sklearn one is something else altogether. I think I'm not the only one, e.g. I found open bug #11512 where a user requests a feature based on the SPCA paper that doesn't make as much sense in the context of dictionary learning.
Are there plans to fix this inconsistency? Would there be interest from other users in having a SPCA implementation?
I ended up implementing the model myself so would be happy to help, if this seems important.
Thanks,
Ian
References
H. Zou, T. Hastie, R. Tibshirani: Sparse Principal Components Analysis, Journal of Computational and Graphical Statistics, 2006.
J. Mairal, F. Bach, J. Ponce, G. Sapiro: Online Dictionary Learning for Sparse Coding, ICML 2009.
The text was updated successfully, but these errors were encountered: