
Amg arpack workaround fix #14647

Merged: 9 commits merged into scikit-learn:master on Aug 29, 2019

Conversation

amueller (Member) commented Aug 13, 2019

Fixes #10715.
See discussion here:
#10720 (comment)

Further discussion by @glemaitre and @lesteve and @sky88088 and @lobpcg here:
#10715 (comment)

This logic can probably be simplified further but this is a fix for an obvious logic bug.

Right now, in the first case of the "try", if it succeeds and eigen_solver == "amg", the result is discarded, a different (apparently less suitable, according to the heuristic) solver is used, and the Laplacian is left sign-flipped.
This is clearly not the intention of the code. If the try succeeds, its results should be used.
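The intended control flow can be sketched like this (a minimal standalone sketch, not scikit-learn's actual code; the function name and the toy cycle graph are hypothetical, and the fallback solver is simplified to plain lobpcg):

```python
import numpy as np
from scipy.sparse import csgraph, diags
from scipy.sparse.linalg import eigsh, lobpcg

def smallest_eigenvectors(laplacian, n_components, seed=0):
    """If the arpack call succeeds, use its result; fall back only on failure."""
    rng = np.random.RandomState(seed)
    v0 = rng.uniform(-1, 1, laplacian.shape[0])
    try:
        # shift-invert around a slightly negative sigma targets the
        # eigenvalues closest to zero without a singular factorization
        _, vecs = eigsh(laplacian, k=n_components, sigma=-1e-5,
                        which='LM', v0=v0)
        return vecs  # success: use this result, never fall through
    except RuntimeError:
        # arpack did not converge: only now switch solvers
        X = rng.standard_normal((laplacian.shape[0], n_components))
        _, vecs = lobpcg(laplacian, X, largest=False, maxiter=2000)
        return vecs

# toy example: Laplacian of a 30-node cycle graph
adj = diags([np.ones(29), np.ones(29)], [1, -1]).toarray()
adj[0, -1] = adj[-1, 0] = 1.0
lap = csgraph.laplacian(adj)
print(smallest_eigenvectors(lap, n_components=3).shape)  # (30, 3)
```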

FYI, the tolerances in the checks were way larger than any of the values.

@amueller amueller added the Bug label Aug 13, 2019
amueller added 2 commits Aug 13, 2019
amueller (Member Author) commented Aug 13, 2019

Lol after debugging and fixing this and then reading #10715 it's obvious that @sky88088 diagnosed the problem very accurately in his initial post.

lobpcg commented Aug 13, 2019

You may also want to completely remove the sign flip in the Laplacian

jnothman (Member) left a comment

The preceding if condition also looks dodgy. It is

(eigen_solver == 'arpack' or eigen_solver != 'lobpcg' and
       (not sparse.isspmatrix(laplacian) or n_nodes < 5 * n_components))

I think this is interpreted as

(
   (eigen_solver == 'arpack' or eigen_solver != 'lobpcg')
 and
   (not sparse.isspmatrix(laplacian) or n_nodes < 5 * n_components)
)

If this is the correct interpretation, then the first clause can be simplified to eigen_solver != 'lobpcg'.

lobpcg commented Aug 14, 2019

For speed and simplicity, you may want just to call a dense solver if n_nodes < 5 * n_components no matter what the other conditions are.

glemaitre (Contributor) commented Aug 14, 2019

The conditions are really complicated to follow. Recalling and trusting my old self (probably I should not do that) #10715 (comment), we could simplify with something like:

if not sparse.issparse(laplacian) and eigen_solver == 'amg':
    # warn that we switch to arpack
    eigen_solver = 'arpack'

if eigen_solver == 'arpack':
    # solve the problem
    try:
        ...
    except RuntimeError:
        eigen_solver = 'lobpcg'

# from that point eigen_solver will be either amg or lobpcg
# check if we have enough n_nodes
if n_nodes < 5 * n_components:
    # use eigh
else:
    # check that we should precondition
    if eigen_solver == 'amg':
        # precondition and get M
    # call lobpcg with M if available otherwise go for default

Anyway, we should probably merge the regression test and later refactor the code to be readable.

@@ -301,7 +301,7 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
     if embedding.shape[0] == 1:
         raise ValueError

-    elif eigen_solver == "lobpcg":
+    if eigen_solver == "lobpcg":
amueller (Member Author) commented Aug 14, 2019

@lobpcg dense solver meaning arpack?

amueller (Member Author) commented Aug 14, 2019

@glemaitre your code has the same issue I'm fixing here. There is no return anywhere, so "# from that point eigen_solver will be either amg or lobpcg" is not true?

amueller (Member Author) commented Aug 14, 2019

Ok I also really don't understand the sign flip tbh.

lobpcg commented Aug 14, 2019

@lobpcg dense solver meaning arpack?

dense solver meaning scipy.linalg.eigh or numpy.linalg.eigh (unsure about the difference), also turning a sparse Laplacian into dense if needed. The point being that if n_nodes < 5 * n_components it is likely cheaper to compute all eigenvectors and drop some.
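That suggestion sketched (hypothetical helper name; this is not what the scikit-learn code does today):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csgraph, issparse

def dense_smallest(laplacian, n_components):
    """Compute the full dense eigendecomposition and keep the smallest
    n_components eigenpairs. Sensible when n_nodes < 5 * n_components."""
    if issparse(laplacian):
        laplacian = laplacian.toarray()
    vals, vecs = eigh(laplacian)  # all eigenvalues, ascending order
    return vals[:n_components], vecs[:, :n_components]

# 8-node path graph, 3 components: 8 < 5 * 3, so the dense route applies
adj = np.diag(np.ones(7), 1) + np.diag(np.ones(7), -1)
vals, vecs = dense_smallest(csgraph.laplacian(adj), 3)
print(vecs.shape)  # (8, 3)
```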

amueller (Member Author) commented Aug 14, 2019

@lobpcg? Shouldn't the fallback for arpack just be the standard eigsh, not lobpcg? The benefits of shift-invert mode are not within my expertise, can you comment on the benefits of using that instead of using the standard eigsh?

lobpcg commented Aug 14, 2019

Ok I also really don't understand the sign flip tbh.

laplacian *= -1 in several places makes no sense whatsoever to me.
There used to be a bug in lobpcg for small matrices that required this, but it has since been fixed.

amueller (Member Author) commented Aug 14, 2019

@lobpcg did you see the comment above the arpack magic? That explains what's happening. I don't think it's related to lobpcg.

amueller (Member Author) commented Aug 14, 2019

There's a bunch of other issues here, but I think this PR fixes one obvious bug and adds one clear regression test and we should merge it. Then we can triage the other bugs.

amueller (Member Author) commented Aug 14, 2019

@glemaitre would love to increase coverage but calling lobpcg actually fails, see #13393 (comment)

lobpcg commented Aug 14, 2019

@lobpcg? Shouldn't the fallback for arpack just be the standard eigsh, not lobpcg? The benefits of shift-invert mode are not within my expertise, can you comment on the benefits of using that instead of using the standard eigsh?

For "small" matrices, scipy.sparse.linalg.eigsh is typically the best in all respects. For large matrices:

  1. scipy.linalg.eigh is the most expensive, scaling cubically, but never fails.
  2. arpack shift-and-invert typically scales quadratically, and may fail and miss eigenvalues.
  3. lobpcg without amg scales linearly (with the number of nnz in the sparse matrix), but the convergence may be slow. The latest version scipy/scipy#10621 fixes stability issues.
  4. lobpcg with amg should scale linearly and converge very fast in most cases, but needs #13707 to be merged and also may need scipy/scipy#10621.
amueller (Member Author) commented Aug 14, 2019

(4 should be with amg I think ;)

amueller (Member Author) commented Aug 14, 2019

@jnothman 'and' binds more tightly than 'or', so it's parsed as

    if (eigen_solver == 'arpack' or (eigen_solver != 'lobpcg' and
        ((not sparse.isspmatrix(laplacian)) or n_nodes < 5 * n_components))):
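That precedence can be checked directly in plain Python (no scikit-learn needed; size_clause is a hypothetical stand-in for the isspmatrix/n_nodes test):

```python
# 'and' binds more tightly than 'or': A or B and C == A or (B and C)
A, B, C = True, False, False
assert (A or B and C) == (A or (B and C))   # True or (False and False) -> True
assert (A or B and C) != ((A or B) and C)   # (True or False) and False -> False

# so with eigen_solver == 'arpack' the overall condition holds even when
# the sparse/size clause is False
eigen_solver = 'arpack'
size_clause = False  # stands in for the isspmatrix/n_nodes test
print(eigen_solver == 'arpack' or eigen_solver != 'lobpcg' and size_clause)  # True
```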
lobpcg commented Aug 14, 2019

(4 should be with amg I think ;)

thanks, fixed

@lobpcg did you see the comment above the arpack magic? That explains what's happening. I don't think it's related to lobpcg.

Do you mean

            laplacian *= -1
            v0 = random_state.uniform(-1, 1, laplacian.shape[0])
            _, diffusion_map = eigsh(
                laplacian, k=n_components, sigma=1.0, which='LM',
                tol=eigen_tol, v0=v0)

It's just a bad way to call arpack shift-and-invert in this case. A good way is just to change sigma, replacing the above with something like

            v0 = random_state.uniform(-1, 1, laplacian.shape[0])
            _, diffusion_map = eigsh(
                laplacian, k=n_components, sigma=-1e-5, which='LM',
                tol=eigen_tol, v0=v0)

to find the eigenvalues closest to zero. You want sigma to be a bit negative, to avoid LU factorization failures, so that arpack never fails. Above I use a safe choice sigma=-1e-5 that should work in single precision as well (if arpack supports it). If you make sigma even more negative, it should slow down the convergence a bit.
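On a toy Laplacian (a 20-node path graph, chosen here purely for illustration) the two call styles recover the same eigenvalues closest to zero:

```python
import numpy as np
from scipy.sparse import csgraph
from scipy.sparse.linalg import eigsh

adj = np.diag(np.ones(19), 1) + np.diag(np.ones(19), -1)  # 20-node path graph
lap = csgraph.laplacian(adj)
v0 = np.random.RandomState(0).uniform(-1, 1, lap.shape[0])

# current style: negate the Laplacian, then shift-invert around sigma=1.0
w_old, _ = eigsh(-lap, k=3, sigma=1.0, which='LM', v0=v0)

# suggested style: keep the Laplacian, use a slightly negative sigma
w_new, _ = eigsh(lap, k=3, sigma=-1e-5, which='LM', v0=v0)

# both find the 3 eigenvalues of the Laplacian closest to zero
print(np.allclose(np.sort(-w_old), np.sort(w_new), atol=1e-6))  # True
```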

amueller (Member Author) commented Aug 16, 2019

I'd still like to see this merged before we go into rewriting the logic.

glemaitre (Contributor) left a comment

+1 for introducing the regression test first

jnothman (Member) left a comment

Please

  • add what's new
  • open an issue to finish this cleanup
amueller (Member Author) commented Aug 21, 2019

added whatsnew, opened #14713

amueller added 2 commits Aug 21, 2019
@@ -175,7 +175,22 @@ def test_spectral_embedding_amg_solver(seed=36):
         random_state=np.random.RandomState(seed))
     embed_amg = se_amg.fit_transform(S)
     embed_arpack = se_arpack.fit_transform(S)
-    assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.05)
+    assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4)

ogrisel (Member) suggested a change on Aug 29, 2019:

-    assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4)
+    assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5)
se_arpack.affinity = "precomputed"
embed_amg = se_amg.fit_transform(affinity)
embed_arpack = se_arpack.fit_transform(affinity)
assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4)

ogrisel (Member) suggested a change on Aug 29, 2019:

-    assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 0.1e-4)
+    assert _check_with_col_sign_flipping(embed_amg, embed_arpack, 1e-5)
ogrisel (Member) left a comment

LGTM. Will merge.

@ogrisel ogrisel merged commit c52b6e1 into scikit-learn:master Aug 29, 2019
16 of 17 checks passed
codecov/patch: 93.75% of diff hit (target 96.9%)
LGTM analysis: C/C++: No code changes detected
LGTM analysis: JavaScript: No code changes detected
LGTM analysis: Python: No new or fixed alerts
ci/circleci: deploy: Your tests passed on CircleCI!
ci/circleci: doc: Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies: Your tests passed on CircleCI!
ci/circleci: lint: Your tests passed on CircleCI!
codecov/project: 96.9% (+<.01%) compared to 82e06af
scikit-learn.scikit-learn: Build #20190822.50 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas): succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas): succeeded
scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas): succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit): succeeded
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit): succeeded
scikit-learn.scikit-learn (Windows py37_conda_mkl): succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl): succeeded
ogrisel added a commit that referenced this pull request Aug 29, 2019