Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify spectral clustering solver logic #14713

Open
amueller opened this issue Aug 21, 2019 · 5 comments
Open

Simplify spectral clustering solver logic #14713

amueller opened this issue Aug 21, 2019 · 5 comments

Comments

@amueller
Copy link
Member

We need to simplify the logic for selecting a solver in spectral_clustering.
See discussion here:
#10715 (comment)
#14647 (comment)
#10720 (comment)

@alexshacked
Copy link
Contributor

Hi @amueller. I would like to take a shot at this. I can see it's demanding but would still like to take a look. Is it OK, or is someone already looking at it?

@amueller
Copy link
Member Author

@alexshacked please go for it!

@alexshacked
Copy link
Contributor

Thank you, I will have a design based on #10715 (comment), #14647 (comment), #10720 (comment) in a couple of days

@alexshacked
Copy link
Contributor

alexshacked commented Oct 4, 2019

After reading the issues and PRs referenced here and also studying the spectral clustering implementation, I understand that this enhancement's goal is to refactor the logic for chosing an eigen solver in function spectral_embedding() located in manifold/spectral_embedding.py
I opened a [WIP ] PR #15136, where I show an inital suggestion for refactoring spectral_embedding(). My intention is just to present an idea for simplifing the logic that choses the solver. It felt easier to write code, then to write a design document. I hope to get feed-back that will focus me onwards.
I also intend to add more regressinon tests, but I thought to work on the tests only after the refactoring goal becomes more clear to me.

alexshacked pushed a commit to alexshacked/scikit-learn that referenced this issue Oct 4, 2019
alexshacked pushed a commit to alexshacked/scikit-learn that referenced this issue Oct 4, 2019
@alexshacked
Copy link
Contributor

alexshacked commented Oct 7, 2019

After we finish the refactoring and achieve a clear code, there are a couple of candidates for improving the algorithm. I found those by reading the issues referenced here. For now I have 2 ideas based on @lobpcg suggestions .

  1. call scipy.linalg.eigh() always if n_nodes < 5 * n_components.
    Meaning, if number of samples is smaller than the (number_of_requested_eigen_vectors x 5)
    original comment: Amg arpack workaround fix #14647 (comment)

  2. When calling arpack with shift-invert use a sigma value of -1e-5 instead of 1.0 and do not multiply the laplacian by -1, just before calling arpack
    original comment: Amg arpack workaround fix #14647 (comment)

@amueller what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants