Make it possible to use ARPACK or randomized_svd solvers in PLSSVD #18931

Open
ogrisel opened this issue Nov 27, 2020 · 1 comment

Comments

ogrisel commented Nov 27, 2020

As initially discussed in #18302 (comment), it might be worth adding an extra constructor parameter to PLSSVD to select ARPACK or scikit-learn's randomized_svd solver instead of the default LAPACK solver (scipy.linalg.svd).

However, ARPACK and randomized_svd are both non-deterministic, so we would also need to add a random_state parameter.
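
To make the proposal concrete, here is a minimal sketch of dispatching between the three solvers on a cross-covariance matrix `X.T @ Y`, which is the kind of matrix PLSSVD decomposes. The helper `truncated_svd`, its `svd_solver` argument, and the option names are hypothetical and not part of scikit-learn's API; only `scipy.linalg.svd`, `scipy.sparse.linalg.svds`, and `sklearn.utils.extmath.randomized_svd` are existing functions. The toy data is built with a shared low-rank structure so that all three solvers should agree closely.

```python
import numpy as np
from scipy.linalg import svd
from scipy.sparse.linalg import svds
from sklearn.utils import check_random_state
from sklearn.utils.extmath import randomized_svd


def truncated_svd(C, n_components, svd_solver="lapack", random_state=None):
    """Hypothetical helper: top singular triplets (U, s, Vt) of C."""
    if svd_solver == "lapack":
        # Full deterministic decomposition, then truncate.
        U, s, Vt = svd(C, full_matrices=False)
        return U[:, :n_components], s[:n_components], Vt[:n_components]

    rng = check_random_state(random_state)
    if svd_solver == "arpack":
        # ARPACK iterates from a starting vector; seeding it makes runs repeatable.
        v0 = rng.uniform(-1, 1, size=min(C.shape))
        U, s, Vt = svds(C, k=n_components, v0=v0)
        # Do not rely on the order returned by svds: sort in decreasing order.
        order = np.argsort(s)[::-1]
        return U[:, order], s[order], Vt[order]
    if svd_solver == "randomized":
        return randomized_svd(C, n_components=n_components, random_state=rng)
    raise ValueError(f"Unknown svd_solver: {svd_solver!r}")


# Toy data: X and Y share 3 latent factors, so X.T @ Y is close to rank 3.
rng = np.random.RandomState(0)
T = rng.normal(size=(200, 3))
X = T @ rng.normal(size=(3, 30)) + 0.01 * rng.normal(size=(200, 30))
Y = T @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(200, 20))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)
C = X.T @ Y  # shape (30, 20)

singular_values = {
    solver: truncated_svd(C, 3, svd_solver=solver, random_state=0)[1]
    for solver in ("lapack", "arpack", "randomized")
}
```

On data without such a clear spectral gap, the randomized solver can be noticeably less accurate, which is exactly what a benchmark would need to quantify.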

Careful benchmarking to evaluate the speed vs numerical or statistical accuracy trade-off should be conducted to:

  • help the user choose the value of this parameter (both in the docstring and the user guide)
  • suggest an "auto" strategy to automatically select a good solver based on the shape of the data and the n_components parameter, similar to what is done in the PCA and TruncatedSVD estimators. This "auto" option would become the default after the usual deprecation period.
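
As an illustration of what such an "auto" strategy could look like, the sketch below picks a solver from the shape of the matrix to decompose and from n_components. The function name, the option names, and the thresholds (500 and 0.8) are placeholders, loosely inspired by the kind of rule PCA uses; the actual values would have to come out of the benchmarks mentioned above.

```python
def choose_svd_solver(shape, n_components, small_threshold=500, ratio=0.8):
    """Hypothetical 'auto' rule for picking an SVD solver.

    shape is the shape of the matrix being decomposed.
    The thresholds are placeholders, not benchmarked values.
    """
    n_rows, n_cols = shape
    if max(n_rows, n_cols) <= small_threshold:
        # Small problem: the exact LAPACK solver is cheap enough.
        return "lapack"
    if 1 <= n_components < ratio * min(n_rows, n_cols):
        # Few components requested from a large matrix: approximate solver.
        return "randomized"
    # Most of the spectrum is requested: fall back to the exact solver.
    return "lapack"


print(choose_svd_solver((100, 50), n_components=2))
print(choose_svd_solver((5000, 2000), n_components=10))
print(choose_svd_solver((5000, 2000), n_components=1900))
```

Whether ARPACK should also be a candidate for "auto" is left open here and would likewise depend on the benchmark results.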

ogrisel commented Nov 27, 2020

This is related to #12069, which does a similar thing for KernelPCA.
