
Numba-accelerated truncated SVD fails to converge #189

Closed
juliendrapeau opened this issue Jul 5, 2023 · 2 comments

juliendrapeau commented Jul 5, 2023

Hi @jcmgray,

What is happening?

While adding gates to large MPSs with the gate_with_auto_swap function, I sometimes get a convergence-failure error during the SVD computation of the swapping process. The problem lies in the numba-accelerated part of the svd_truncated function in the decomp module: the gesdd LAPACK routine used for the SVD there does not guarantee convergence. Although the failure is rare, it still raises a hard error when it occurs.
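For reference, the two drivers can be compared directly through SciPy (a minimal sketch; the matrix here is just a random example, not one that actually triggers the failure):

```python
import numpy as np
import scipy.linalg as sla

# An arbitrary well-conditioned example matrix (not a failing case).
x = np.random.default_rng(0).normal(size=(8, 6))

# SciPy's default driver is the divide-and-conquer 'gesdd': fast, but it
# can raise LinAlgError("SVD did not converge") on rare inputs.
U, s, VH = sla.svd(x, full_matrices=False)

# 'gesvd' uses QR iteration: slower, but much more robust.
U2, s2, VH2 = sla.svd(x, full_matrices=False, lapack_driver="gesvd")

# When both succeed, each factorization reconstructs x.
assert np.allclose((U * s) @ VH, x)
assert np.allclose((U2 * s2) @ VH2, x)
```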

How to possibly fix it?

I was able to circumvent this issue by using the gesvd LAPACK routine provided by SciPy. A quick fix to guarantee convergence would be to fall back to the SciPy function whenever the numba function fails to converge. For example, replacing the original function:

@svd_truncated.register("numpy")
@njit  # pragma: no cover
def svd_truncated_numba(
    x, cutoff=-1.0, cutoff_mode=3, max_bond=-1, absorb=0, renorm=0
):
    """Accelerated version of ``svd_truncated`` for numpy arrays."""
    U, s, VH = np.linalg.svd(x, full_matrices=False)
    return _trim_and_renorm_svd_result_numba(
        U, s, VH, cutoff, cutoff_mode, max_bond, absorb, renorm
    )

with the following functions seems to fix the problem:

@njit  # pragma: no cover
def svd_truncated_numba(
    x, cutoff=-1.0, cutoff_mode=3, max_bond=-1, absorb=0, renorm=0
):
    """Accelerated version of ``svd_truncated`` for numpy arrays."""
    U, s, VH = np.linalg.svd(x, full_matrices=False)
    return _trim_and_renorm_svd_result_numba(
        U, s, VH, cutoff, cutoff_mode, max_bond, absorb, renorm
    )


def svd_truncated_scipy(
    x, cutoff=-1.0, cutoff_mode=3, max_bond=-1, absorb=0, renorm=0
):
    """Non-accelerated version of ``svd_truncated`` for numpy arrays with guaranteed convergence by scipy."""
    U, s, VH = sp.linalg.svd(x, full_matrices=False, lapack_driver="gesvd")
    return _trim_and_renorm_svd_result_numba(
        U, s, VH, cutoff, cutoff_mode, max_bond, absorb, renorm
    )


@svd_truncated.register("numpy")
def svd_truncated_numba_scipy(
    x, cutoff=-1.0, cutoff_mode=3, max_bond=-1, absorb=0, renorm=0
):
    """Accelerated version of ``svd_truncated`` for numpy arrays with guaranteed convergence by scipy."""
    try:
        return svd_truncated_numba(x, cutoff, cutoff_mode, max_bond, absorb, renorm)
    except np.linalg.LinAlgError:
        # the numba/gesdd path failed to converge; retry with scipy's gesvd
        return svd_truncated_scipy(x, cutoff, cutoff_mode, max_bond, absorb, renorm)

Is there a better way to deal with this issue?

Thank you for your help.


jcmgray commented Jul 13, 2023

Hi @juliendrapeau, yes something along those lines seems reasonable, maybe with a warning raised as well? A fallback like this happens in the core part of quimb for some functions too.
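The fallback-plus-warning idea might look something like the following (a sketch only; svd_with_fallback is a hypothetical helper name used for illustration, not quimb's actual API):

```python
import warnings

import numpy as np
import scipy.linalg as sla


def svd_with_fallback(x):
    """SVD via numpy's fast gesdd-based path, falling back to scipy's
    robust gesvd driver, with a warning, on convergence failure.

    (Hypothetical helper name, for illustration only.)
    """
    try:
        return np.linalg.svd(x, full_matrices=False)
    except np.linalg.LinAlgError:
        warnings.warn(
            "np.linalg.svd (gesdd) failed to converge; "
            "falling back to scipy's gesvd driver.",
            RuntimeWarning,
        )
        return sla.svd(x, full_matrices=False, lapack_driver="gesvd")


U, s, VH = svd_with_fallback(np.eye(4))
```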

Generally I have found that if one is having problems with linear algebra routine convergence then there is usually some instability in the higher tensor network algorithm that should be addressed, e.g. large differences in norms of tensors appearing.

However, with #192 as well, I wonder if something has changed about the numba SVD implementation that has begun to cause this issue. If you have time, it would be helpful to know whether:

  1. it appears for earlier versions of numba,
  2. it appears for a different backend such as torch, and
  3. you could save an actual matrix/tensor it occurs on.

No worries if not; it's just a little hard to reproduce these things otherwise.
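For point 3, a thin wrapper that dumps the offending matrix before re-raising would make the failing case easy to attach to an issue (a sketch; the function and file names here are arbitrary examples, not quimb code):

```python
import numpy as np


def svd_dumping_failures(x, dump_path="svd_failure_input.npy"):
    """Run the SVD, but save the input matrix to disk if it fails to
    converge, so the exact failing case can be shared and reproduced.

    (Hypothetical helper; ``dump_path`` is an arbitrary example name.)
    """
    try:
        return np.linalg.svd(x, full_matrices=False)
    except np.linalg.LinAlgError:
        np.save(dump_path, x)  # preserve the exact failing input
        raise  # re-raise so the caller still sees the error
```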


jcmgray commented Apr 26, 2024

I restored the fallback to scipy in b507abc, so I'm closing this for the moment. Feel free to re-open if the problem persists!

@jcmgray jcmgray closed this as completed Apr 26, 2024