
ENH propagate eigen_tol to all eigen solver #11968

Closed
wants to merge 12 commits

Conversation

@massich (Contributor) commented Sep 1, 2018

Reference Issues/PRs

closes #21243
closes #6489

What does this implement/fix? Explain your changes.

Propagate the eigen_tol to the 'arpack', 'amg', and 'lobpcg' solvers. By default we use the scipy defaults ('arpack' => 0, otherwise None).
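
For concreteness, a minimal sketch of the dispatch being proposed (the function name solve_eigen is illustrative, not the PR's actual code; only the scipy calls and their defaults are real):

import numpy as np
from scipy.sparse.linalg import eigsh, lobpcg

def solve_eigen(laplacian, n_components, eigen_solver="arpack", eigen_tol=None):
    # Forward the user's tolerance, falling back to each scipy solver's
    # own default when eigen_tol is None.
    if eigen_solver == "arpack":
        # eigsh interprets tol=0 as machine precision
        tol = 0 if eigen_tol is None else eigen_tol
        return eigsh(laplacian, k=n_components, sigma=1.0, which="LM", tol=tol)
    # 'amg' is lobpcg with a preconditioner, so both take the same tol;
    # tol=None lets lobpcg compute its internal default.
    X = np.random.default_rng(0).standard_normal((laplacian.shape[0], n_components))
    return lobpcg(laplacian, X, tol=eigen_tol, largest=False)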

Any other comments?

@massich (Contributor, Author) commented Sep 1, 2018

This is not how the default should be done, but it gives an idea.
cc:@lobpcg, @glemaitre

@lobpcg (Contributor) commented Sep 3, 2018

I ran a few tests using the new code and it now seems to work as I expected. Thanks!

Outdated review threads (resolved): sklearn/manifold/spectral_embedding_.py; sklearn/cluster/tests/test_spectral.py (two threads)

@massich (Contributor, Author) commented Sep 18, 2018 via email

@massich changed the title from "propagate eigen_tol" to "[MRG+1] propagate eigen_tol" on Sep 19, 2018
@massich (Contributor, Author) commented Sep 21, 2018

ping: @rth

@rth (Member) left a comment

Maybe add a note to the docstring of eigen_tol saying that for the amg solver it will always be set greater than or equal to 1e-12, and document the behavior of this parameter for lobpcg. Also see the comment below. Otherwise LGTM.

@@ -303,6 +305,10 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
             raise ValueError

     elif eigen_solver == "lobpcg":
+        if eigen_tol == 0:
+            eigen_tol = 1e-15
@rth (Member) commented:

Would there be a disadvantage to doing eigen_tol = max(1e-15, eigen_tol), same as for amg? I find it strange that eigen_tol is not monotonic in this case: e.g. 0 will produce 1e-15 but 1e-16 will produce 1e-16. Or are there use cases for tolerances below 1e-15? cc @lobpcg
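
For illustration, the branch above rendered standalone; it shows the non-monotonic mapping in question:

def effective_tol(eigen_tol):
    return 1e-15 if eigen_tol == 0 else eigen_tol

assert effective_tol(0) == 1e-15
assert effective_tol(1e-16) == 1e-16  # requesting 1e-16 yields a stricter tol than requesting 0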

@lobpcg (Contributor) replied:


  1. "amg" solve actually is the same lobpcg, only with an extra option, so indeed, the default eigen_tol should be the same for "amg" and "lobpcg" My recommendation is to set eigen_tol=None in both cases and thus let the lobpcg code to choose its default.

  2. 1e-15 is way too small for the default eigen_tol. The default eigen_tol in the https://github.com/scipy/scipy/blob/v1.1.0/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py line 307-308 is np.sqrt(1e-15) * n where n is the size of the matrix. The clean fix may be NOT to specify ANY default eigen_tol in "spectral_embedding" but rather let lobpcg to do it, as I suggest above.

  3. I do not like in lines 136-138 of in "spectral_embedding":

def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
random_state=None, eigen_tol=0.0,

Why not eigen_tol=None ? As far as I can see, eigen_tol=0.0 makes little sense, while eigen_tol=None would be passed to "lobpcg" and let it choose its default, as I propose. I do not know how "arpack" reacts to eigen_tol=None, however.

  1. Yes, there may be use cases for tolerances below 1e-15 in general, because the stopping criteria currently used in "lobpcg" is not scale invariant, i.e., finding eigenvectors of the matrix 1e-10*A with the same accuracy as of finding the eigenvectors of the matrix A would require to scale the eigen_tol by the same magnitude, 1e-10. I have not actually tested it, but this is how the code is written. The default eigen_tol in "lobpcg" which is np.sqrt(1e-15) * n implicitly assumes that the matrix A if reasonably scaled.
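
To make point 2 concrete, scipy's internal default computed in isolation (only the two cited lines are mirrored; n=1000 is an arbitrary example size):

import numpy as np

n = 1000  # size of the matrix
default_tol = np.sqrt(1e-15) * n  # scipy 1.1.0 lobpcg.py, lines 307-308
print(default_tol)  # ~3.16e-05, ten orders of magnitude above 1e-15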

@amueller (Member) commented Aug 5, 2019

what's the status of this?

@glemaitre self-assigned this Aug 12, 2019
@massich changed the title from "[MRG+1] propagate eigen_tol" to "[MRG] propagate eigen_tol" on Sep 6, 2019
@massich (Contributor, Author) commented Sep 6, 2019

This PR has been updated taking into account the changes in #13707 that were merged in master.

The original issue in #6489 is already fixed in master, probably by #13707. But the tolerance passed in the function call was still not propagated to all solvers.

This PR proposes to unify them. It changes the default to 1e-15 for all solvers that support tol, except for amg and lobpcg, where we kept 1e-5.

@lobpcg do you think that 1e-15 is a good default in general? What's your opinion?

@massich (Contributor, Author) commented Sep 6, 2019

Maybe it would also be nice to catch the case where eigen_tol is passed to spectral_embedding but the chosen method does not support tol.
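
One possible shape for such a guard (the helper name _check_eigen_tol and the solver set are hypothetical, sketched for illustration):

def _check_eigen_tol(eigen_tol, eigen_solver):
    # Hypothetical guard: reject a tolerance that the chosen solver ignores.
    solvers_with_tol = {"arpack", "lobpcg", "amg"}
    if eigen_tol is not None and eigen_solver not in solvers_with_tol:
        raise ValueError(
            f"eigen_tol={eigen_tol!r} was given, but eigen_solver="
            f"{eigen_solver!r} does not accept a tolerance."
        )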

@lobpcg (Contributor) commented Sep 6, 2019

> This PR has been updated taking into account the changes in #13707 that were merged in master.
>
> The original issue in #6489 is already fixed in master, probably by #13707. But the tolerance passed in the function call was still not propagated to all solvers.
>
> This PR proposes to unify them. It changes the default to 1e-15 for all solvers that support tol, except for amg and lobpcg, where we kept 1e-5.
>
> @lobpcg do you think that 1e-15 is a good default in general? What's your opinion?

Propagating eigen_tol is a good idea. There are only two solvers involved: arpack and lobpcg; AMG is also lobpcg, just with an extra option. 1e-15 is a good default only for trivial unit tests. It is never a good default in practical calculations: way too small. It is difficult to come up with a good default, as it needs to depend, e.g., on the size of the matrix and on whether the matrix is float64 or float32. So I strongly advocate delegating this to the solver, choosing the default None.
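
As one illustration of that dependence (our own sketch; scipy 1.1.0 hardcodes np.sqrt(1e-15) * n and does not consult the dtype):

import numpy as np

def suggested_tol(A):
    # Scale the tolerance with matrix size and floating-point precision.
    return np.sqrt(np.finfo(A.dtype).eps) * A.shape[0]

print(suggested_tol(np.eye(100)))                    # float64: ~1.5e-06
print(suggested_tol(np.eye(100, dtype=np.float32)))  # float32: ~3.5e-02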

@lobpcg (Contributor) commented Sep 6, 2019

I also advocate propagating max_it (the cap on the number of iterations) and introducing and propagating a convergence_flag output, True or False, showing whether the tolerance requirement has been achieved.
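
A sketch of what such a flag could look like from the caller's side (the wrapper is ours; only lobpcg's tol and maxiter parameters are real, and scipy's lobpcg as of 1.1.0 returns no flag itself):

import numpy as np
from scipy.sparse.linalg import lobpcg

def lobpcg_with_flag(A, X, tol, maxiter=20):
    w, v = lobpcg(A, X, tol=tol, maxiter=maxiter, largest=False)
    # Recompute residual norms ||A v_i - w_i v_i|| to decide convergence.
    residuals = np.linalg.norm(A @ v - v * w, axis=0)
    return w, v, bool(np.all(residuals <= tol))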

@cmarmo (Contributor) commented Aug 15, 2020

Following @lobpcg's comment, the corresponding issue seems to be solved. I'm going to close this PR then; feel free to reopen if necessary.

@cmarmo closed this Aug 15, 2020
@lobpcg (Contributor) commented Aug 15, 2020

> Following @lobpcg's comment, the corresponding issue seems to be solved. I'm going to close this PR then; feel free to reopen if necessary.

@cmarmo My comment quoted above is unrelated to this issue, which is still unresolved to my knowledge and needs to be reopened...

@cmarmo (Contributor) commented Aug 15, 2020

OK, sorry for the misunderstanding.
To be clear, what is missing here is:

  • set the default eigen_tol=None
  • introduce a max_iter parameter
  • introduce a ConvergenceWarning when convergence is not achieved

Am I understanding correctly?

@lobpcg (Contributor) commented Aug 15, 2020

> OK, sorry for the misunderstanding.
> To be clear, what is missing here is:
>
>   • set the default eigen_tol=None
>   • introduce a max_iter parameter
>   • introduce a ConvergenceWarning when convergence is not achieved
>
> Am I understanding correctly?

Yes. And of course unit tests checking that each one actually works for all (both) implemented solvers.
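
A sketch of such a test (spectral_embedding and its listed parameters are real; the eigen_tol=None behavior under test is the proposal, not yet merged):

import numpy as np
import pytest
from sklearn.manifold import spectral_embedding

@pytest.mark.parametrize("eigen_solver", ["arpack", "lobpcg"])
def test_eigen_tol_none_delegates_to_solver(eigen_solver):
    rng = np.random.RandomState(0)
    X = rng.rand(50, 50)
    adjacency = X @ X.T  # symmetric, positive semi-definite affinity
    maps = spectral_embedding(
        adjacency,
        n_components=2,
        eigen_solver=eigen_solver,
        eigen_tol=None,
        random_state=0,
    )
    assert maps.shape == (50, 2)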

@albertvillanova (Contributor) commented

Take

@albertvillanova (Contributor) commented

@glemaitre sorry, I just saw the "help wanted" label, but afterwards I realized you are the assignee. I guess you are already working on this...

Base automatically changed from master to main January 22, 2021 10:50
@glemaitre changed the title from "[MRG] propagate eigen_tol" to "ENH propagate eigen_tol to all eigen solver" on Dec 17, 2021
@jeremiedbb (Member) left a comment

Here are some suggestions. It will also require a what's new entry and some tests.

@@ -8,7 +8,7 @@ doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS
 testpaths = sklearn
 addopts =
     --doctest-modules
-    --disable-pytest-warnings
+    # --disable-pytest-warnings

Suggested change:
-    # --disable-pytest-warnings
+    --disable-pytest-warnings

@@ -339,7 +355,13 @@ def spectral_embedding(
         X = random_state.randn(laplacian.shape[0], n_components + 1)
         X[:, 0] = dd.ravel()
         X = X.astype(laplacian.dtype)
-        _, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.0e-5, largest=False)
+        # Until scikit-learn minimum scipy dependency <1.4.0 we require high

Suggested change:
- # Until scikit-learn minimum scipy dependency <1.4.0 we require high
+ # As long as scikit-learn has minimum scipy dependency <1.4.0 we require high

-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when using arpack eigen_solver.
+    eigen_tol : float or None, default=None

Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None

-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when ``eigen_solver='arpack'``.
+    eigen_tol : float or None, default=None

Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None

-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when using arpack eigen_solver.
+    eigen_tol : float or None, default=None

Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None

@@ -443,6 +470,21 @@ class SpectralEmbedding(BaseEstimator):
         to be installed. It can be faster on very large, sparse problems.
         If None, then ``'arpack'`` is used.

+    eigen_tol : float or None, default=None

Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None

@thomasjpfan (Member) left a comment

Recently, we have been using "auto" to change behavior depending on another parameter.

@jeremiedbb What do you think of eigen_tol="auto"?
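
For example, an "auto" sentinel could resolve per solver along these lines (a sketch, not the eventual implementation; the helper name is hypothetical):

def _resolve_eigen_tol(eigen_tol, eigen_solver):
    if eigen_tol == "auto":
        # eigsh treats tol=0 as machine precision; lobpcg/amg pick their
        # own internal default when tol is None.
        return 0 if eigen_solver == "arpack" else None
    return eigen_tol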

# Until scikit-learn minimum scipy dependency <1.4.0 we require high
# tolerance as explained in:
# https://github.com/scikit-learn/scikit-learn/pull/13707#discussion_r314028509
tol = max(1e-5, 1e-5 if eigen_tol is None else eigen_tol)
@thomasjpfan (Member) commented Apr 6, 2022

May we update the docstrings to explain this clipping behavior for eigen_tol + lobpcg.

A Member replied:

From an IRL discussion with @ogrisel and @glemaitre, we don't think clipping is a good solution here. We should let the user choose the tol they want. Instead, we can document in the parameter description that it is advised not to use too low a tolerance when scipy < 1.4.
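
For instance, the parameter description could carry the advisory instead of clipping (the wording below is a sketch, not merged text):

# One possible wording for the advisory (ours, not merged text):
EIGEN_TOL_DOC = """\
eigen_tol : float, default=None
    Stopping criterion for eigendecomposition of the Laplacian matrix.
    With ``eigen_solver='lobpcg'`` or ``'amg'`` and scipy < 1.4, tolerances
    much below 1e-5 may fail to converge and are not advised.
"""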

@jeremiedbb (Member) commented

> @jeremiedbb What do you think of eigen_tol="auto"?

Makes sense.

@Micky774 (Contributor) commented

Is someone still actively working on this PR? @massich are you still involved with this one?

@glemaitre (Member) commented

@Micky774 You can go ahead. I personally know @massich and he will not be able to carry on the work here.

@cmarmo added the Superseded (PR has been replaced by a newer PR) label on May 10, 2022
@cmarmo (Contributor) commented May 10, 2022

I'm closing this one as superseded by #23210.
Thanks @massich for your work: all your commits have been included in the new PR.

@cmarmo closed this May 10, 2022
Labels: module:manifold, Superseded (PR has been replaced by a newer PR)