
ENH: linalg.solve: detect and exploit matrix structure #12824

Merged · 19 commits merged into scipy:main from solve_cython · Aug 10, 2024

Conversation

@ilayn ilayn commented Sep 6, 2020

summary

The solve function is one of the most heavily used in the linalg module. Its counterparts in other packages and languages mostly enjoy varying degrees of automation via so-called polyalgorithms. These try the simplest cases first (is it diagonal, bidiagonal, tridiagonal, triangular, and so on) so that simple problems are handled much faster than less structured ones.

The best-known version of this is MATLAB's A\b syntax, hence the name "backslash". Julia, Octave, and many other languages attempt something similar. Following MathWorks, the logic below is generally used:

[figure: MATLAB's backslash decision tree]

scope

There is a lot of structure to be tested for, and I am aiming to address most of it (enumerated below). The main mechanism of the algorithm is to discover structure as quickly as possible and branch off to the relevant solver. This is made possible via the find_array_structure function written in Cython. If the entrywise check doesn't turn up anything, we also check for symmetry/Hermitian-ness, and if such structure exists we additionally test for positive/negative definiteness. If all checks fail, we fall back to the generic, no-structure solution.
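As an illustration of the polyalgorithm idea only (not the PR's actual code, which does the detection in Cython and calls LAPACK directly), a rough Python sketch of the dispatch could look like this; solve_poly and its branch order are hypothetical, and b is assumed to be an (n, m) array:

import numpy as np
from scipy import linalg

def solve_poly(a, b):
    # Hypothetical dispatch; cheapest structural checks come first.
    lo, up = linalg.bandwidth(a)                  # (lower, upper) bandwidths
    if lo == 0 and up == 0:                       # diagonal: O(n*m) division
        return b / np.diag(a)[:, None]
    if lo <= 1 and up <= 1:                       # bi-/tridiagonal
        ab = np.zeros((3, a.shape[0]), dtype=a.dtype)
        ab[0, 1:] = np.diag(a, 1)                 # superdiagonal
        ab[1, :] = np.diag(a)                     # main diagonal
        ab[2, :-1] = np.diag(a, -1)               # subdiagonal
        return linalg.solve_banded((1, 1), ab, b)
    if lo == 0 or up == 0:                        # triangular
        return linalg.solve_triangular(a, b, lower=(up == 0))
    if linalg.ishermitian(a):                     # try Cholesky, else fall back
        try:
            return linalg.cho_solve(linalg.cho_factor(a), b)
        except linalg.LinAlgError:
            pass
    return linalg.solve(a, b, assume_a='gen')     # generic LU path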

With this PR, linalg.solve will be able to automatically distinguish, solve, and check for singularity/ill-conditioning in the following cases (checked items are done):

  • generic
  • diagonal
  • bidiagonal (lower, upper)
  • triangular (lower, upper)
  • tridiagonal
  • hessenberg (lower, upper)
  • symmetric/Hermitian
  • positive/negative definite

Permuted triangular arrays are deliberately out of scope: no shuffled-triangular solvers are available, and reshuffling the arrays back to triangular form unfortunately eats up all the benefits of this polyalgorithm and destroys the memory layout. Hence there is no point in that specialization for now.

performance

The main performance gains come from the custom solvers; however, they are not the only tie-breakers for small n in the complete performance measurements. Arguably the next most important detail is that an array can now survive without any copies if its dtype is LAPACK-compatible and it is contiguous, since C/F layout is meticulously tracked and transposes are used whenever possible. This also opens up more opportunities for overwriting the original array. A custom check_finite function removes the slowdown of the _asarray_validated passes, and the get_lapack_funcs bottleneck is removed via completely separate (s-d-c-z) type paths.
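For illustration, the transpose trick relies on the fact that the transpose of a C-contiguous array is a zero-copy, Fortran-contiguous view, which is what LAPACK wants:

import numpy as np

a = np.random.rand(4, 4)            # C-contiguous by default
assert a.flags['C_CONTIGUOUS']
# The transpose is a zero-copy, Fortran-contiguous view: LAPACK can
# consume it directly if we solve a.T x = b and flip the 'trans' flag.
assert a.T.flags['F_CONTIGUOUS']
assert np.shares_memory(a, a.T)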

I will add the relevant plots here. As a teaser: it is always faster than the current version of scipy.linalg.solve in the generic case, even while doing lots of extra work. It could go much faster still, but most of the pace is lost in the array regularization at intake. Compared to numpy.linalg.solve, I don't know exactly why, but up to about n=10 the NumPy version is very fast with no checks whatsoever; beyond that, the proposed version overtakes both the SciPy and NumPy versions.

Diagonal case

[figure: diagonal-case benchmarks; left: fat B array, right: m=10 with varying N]

There is no point in comparing to the current SciPy master, since NumPy pretty much smokes it in every case. In the fat-B case on the left, we only pay the price of copying B into a Fortran-contiguous array. On the right, the number of columns of B is fixed at 10 and N is varied. As you can see, in both cases we are roughly an order of magnitude faster for large arrays. On very small arrays (N in [3, 10]) we pay no penalty even though we perform many more checks.

Bi/Tridiagonal case

[figure: bidiagonal/tridiagonal-case benchmarks]

Similarly, the bidiagonal/tridiagonal gap is just the corresponding complexity difference, so still no surprise. In fact, checking the condition number takes the same amount of time as the solver step (the rcond part is not optimized yet). Neither SciPy's nor NumPy's generic solve has a special path for these cases.
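For intuition about why this path is so much faster, the tridiagonal case admits an O(n) forward/backward sweep. A plain-Python sketch of the Thomas algorithm (no pivoting, so only safe for diagonally dominant systems; the PR's actual solver lives in Cython):

import numpy as np

def thomas_solve(dl, d, du, b):
    # Solve a tridiagonal system given sub- (dl), main (d), and
    # superdiagonals (du); b is the (n,) right-hand side.
    n = d.size
    d, b = d.astype(float).copy(), b.astype(float).copy()
    for i in range(1, n):                 # forward elimination
        w = dl[i - 1] / d[i - 1]
        d[i] -= w * du[i - 1]
        b[i] -= w * b[i - 1]
    x = np.empty(n)
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (b[i] - du[i] * x[i + 1]) / d[i]
    return x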

Triangular case

For triangular cases, it is already much faster than the generic solvers in both SciPy and NumPy, and slightly faster than linalg.solve_triangular. As n gets larger, the difference grows.
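For a rough sense of what is being compared, something along these lines (sizes and seed are arbitrary):

import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n = 2000
a = np.triu(rng.random((n, n))) + n * np.eye(n)   # well-conditioned upper triangular
b = rng.random((n, 10))

x1 = linalg.solve_triangular(a, b)   # explicit triangular API
x2 = linalg.solve(a, b)              # structure detected automatically by this PR
assert np.allclose(x1, x2)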

More to come here....

early feedback points needed

  • Cython code quality: my Cython skills and typing knowledge are mediocre, and in some places I have been imitating blindly. I would be grateful if as many people as possible took a look for anti-patterns or flat-out stupidity.
  • Cython templating possibilities: this is something I actually don't care too much about. Rolled-out copies hurt readability but are significantly easier to write, and they make compilation bug hunting easier. --> Switched to Tempita to stay sane, since fused types are too experimental in Cython to be used robustly; in many simple cases I hit compiler crashes.
  • ILP64 details: this is probably going to cost me a lot of rewriting, but it is an accepted cost. I just wanted to see if this project can fly at all; then I would add the details on top.
  • Should we incorporate the rectangular case here too, as MATLAB and others do? In other words, instead of the user choosing between solve and lstsq, the branch could be decided internally based on the matrix shape.
  • More items will be populated here.

backward incompatible details

  • The function no longer checks whether the inputs are sparse arrays. That check had the side effect of importing scipy.sparse through _lib._util._asarray_validated for no reason. Instead, passing sparse inputs will be discouraged through the documentation.

  • If everybody is OK with it, I would like to finally zap the dysfunctional debug keyword (maybe sym_pos too?).

TO DO

  • megatons of new tests while keeping the existing ones green.
  • announce on the mailing list
  • Finish the solvers
  • Finish the structure discovery
  • Test ill-conditioning behavior before/after
  • Get the dreaded transposed keyword handling working properly for complex arrays when array has C-layout
  • other tasks...

I would appreciate all the help and all kinds of feedback for this overhaul.

Code for generating various structured a, b pairs

import numpy as np
import scipy.linalg as la


def get_test_matrices(case, n, m=1, a_dtype=float, b_dtype=None):
    """Generate test matrices.

    Parameters
    ----------
    case : str
        which case to be tested;
        'diag', 'ubidiag', 'lbidiag', 'tridiag', 'uptri', 'lotri', 'uhess',
        'lhess', 'sym', 'her', 'pos'
    n : int
        number of rows of the problem
    m : int
        number of columns of the RHS array ``b``
    a_dtype : dtype, optional
        dtype of the LHS array ``a`` (default: float)
    b_dtype : dtype, optional
        dtype of the RHS array ``b``; defaults to ``a_dtype``

    Returns
    -------
    a : ndarray
        2D LHS nxn array with the given property defined by case
    b : ndarray
        2D RHS nxm array with random data

    """
    if b_dtype is None:
        b_dtype = a_dtype
    if case == 'diag':
        a = np.diag(np.random.rand(n)).astype(a_dtype)
    elif case == 'ubidiag':
        a = np.triu(np.tril(np.random.rand(n, n), 1)).astype(a_dtype)
        a += n/1.9 * np.eye(n, dtype=a_dtype)
    elif case == 'lbidiag':
        a = np.tril(np.triu(np.random.rand(n, n), -1)).astype(a_dtype)
        a += n/1.9 * np.eye(n, dtype=a_dtype)
    elif case == 'tridiag':
        a = np.tril(np.triu(np.random.rand(n, n), -1), 1).astype(a_dtype)
    elif case == 'uptri':
        a = np.triu(np.random.rand(n, n)).astype(a_dtype)
    elif case == 'lotri':
        a = np.tril(np.random.rand(n, n)).astype(a_dtype)
    elif case == 'uhess':
        a = np.triu(np.random.rand(n, n), -1).astype(a_dtype)
    elif case == 'lhess':
        a = np.tril(np.random.rand(n, n), 1).astype(a_dtype)
    elif case == 'sym':
        a = np.random.rand(n, n).astype(a_dtype)
        a = a + a.T
        if np.iscomplexobj(a):
            a += (1j*np.random.rand(n, n)).astype(a_dtype)
            a = a + a.T
    elif case == 'her':
        a = np.random.rand(n, n).astype(a_dtype)
        a = a + a.T
        if np.iscomplexobj(a):
            a += (1j*np.random.rand(n, n)).astype(a_dtype)
            a = a + a.T.conj()
    elif case == 'pos':
        a = np.random.rand(n, n).astype(a_dtype)
        a = a + a.T
        if np.iscomplexobj(a):
            a += (1j*np.random.rand(n, n)).astype(a_dtype)
            a = a + a.T.conj()
        a += n/1.9 * np.eye(n, dtype=a_dtype)
    elif case == 'gen':
        a = (np.random.rand(n, n).astype(a_dtype) +
             (n/1.9)*np.eye(n, dtype=a_dtype))

    b = np.random.rand(n, m).astype(b_dtype)

    return a, b
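As a quick illustrative check of the generator (the 'pos' case is diagonally shifted, hence well conditioned):

a, b = get_test_matrices('pos', n=500, m=10)
x = la.solve(a, b)
print(np.allclose(a @ x, b))  # expect True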

@ilayn ilayn added enhancement A new feature or improvement scipy.linalg labels Sep 6, 2020
ilayn commented Sep 6, 2020

I don't get which part of [skip ci] do these webhooks not understand?

mdhaber commented Sep 7, 2020

Should we incorporate the rectangular case here too as matlab and others do?

Yes. I think so.

C/F layout is meticulously tracked and their transposes are used whenever possible

This means that the issue like gh-10722 will not occur, right? (I guess this PR won't solve that issue because it won't change solve_triangular, but it will essentially make solve_triangular obsolete, right?)

The function no-longer checks whether the inputs are sparse arrays or not. This had the side effect of importing scipy.sparse

By "sparse arrays" do you mean whether it's a sparse matrix object, a np.array that is relatively sparse, or actually a sparse nd array? Can't this be checked some other way that doesn't import scipy.sparse? Seems like it would be useful to at least raise an appropriate NotImplementedError (and maybe eventually support sparse array solves). It's transparent to the user in Matlab backslash, right?

ilayn commented Sep 7, 2020

This means that the issue like gh-10722 will not occur, right? (I guess this PR won't solve that issue because it won't change solve_triangular, but it will essentially make solve_triangular obsolete, right?)

Indeed, that's the idea. Maybe not obsolete but a bit laborious.

By "sparse arrays" do you mean whether it's a sparse matrix object, a np.array that is relatively sparse, or actually a sparse nd array? Can't this be checked some other way that doesn't import scipy.sparse?

I mean in this sense:

https://github.com/scipy/scipy/blob/master/scipy/_lib/_util.py#L252-L258

If there are easier ways to check sparseness I can put it back in. In matlab it switches internally but all objects are in the same namespace so checking it doesn't cost a lot of code.

mdhaber commented Sep 8, 2020

If there are easier ways to check sparseness I can put it back in.

Not sure if it is fast or clean enough but how about 'scipy.sparse.base.spmatrix' in repr(A.__class__.__mro__), where A is the object in question? I don't like to use strings to test types either, but thought that could be a way around importing scipy.sparse.

Would from scipy.sparse.base import spmatrix and isinstance(A, spmatrix) get around the problem? I thought that might get around loading all of scipy.sparse.

Could look for (or, probably faster, just try to use in a try-except) a method or field that spmatrix has but ndarray doesn't, like nnz or getformat. Or we could add a private attribute to spmatrix that is even less likely to exist for any other type and check for it, but I guess I'm really stretching now.

If these don't work, I understand. I know it needs to be really efficient.
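A minimal sketch of the duck-typing idea above (the helper name is hypothetical):

def _looks_like_spmatrix(a):
    # spmatrix exposes nnz and getformat(); a plain ndarray does not,
    # so this avoids importing scipy.sparse altogether.
    return hasattr(a, "nnz") and hasattr(a, "getformat")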

rgommers commented Sep 8, 2020

This looks really good already! Cython looks pretty idiomatic to me.

I don't get which part of [skip ci] do these webhooks not understand?

Long-standing issue, only TravisCI has always understood [skip ci]. CircleCI didn't seem to have run here, so maybe it finally started to understand it as well.

transposed : bool, optional
    If True, ``a^T x = b`` for real matrices, raises `NotImplementedError`
    for complex matrices (only for True).
sym_pos : bool, optional
@mdhaber mdhaber Sep 8, 2020 (Contributor)

Yes, deprecating this sounds good. I know it's been discussed for a while.

Member Author

We need @pv 's blessing for that as he didn't like the idea of dropping it previously.

Contributor

Would it be easier to create a new function?
I know that solve is the obvious name, and it's a shame that it would not be available. But maybe create a new function (backslash?) for now, deprecate the things we don't like about solve, and after a few versions make solve and the new function synonymous?
Maybe it's a crazy idea, but I've been tiptoeing around some unfortunate parts of the interface of svds recently and thinking how nice it would be to just start fresh.

Member Author

I know exactly how you feel and I feel the same about many linalg parts but I think these are pretty central to usage of SciPy. That is to say, there are too many users importing scipy just for linalg and I don't know how to handle them.

Every time I sit down for some work on this, I get lost in a backwards-compatible API design mess and rage quit. This happens every weekend, which is the only actual coding time I can spare recently.

If I find a way to regularize the input then, say, I can't handle the transposed keyword. If I fix that, then Cython doesn't like certain inputs; if we fix that, then we lose too much time checking too many things, and it becomes pointless to implement the whole thing, etc. So I think a few more weekends and there will be some progress on this.

@mdhaber mdhaber Jul 18, 2021 (Contributor)

these are pretty central to usage of SciPy. That is to say, there are too many users importing scipy just for linalg and I don't know how to handle them.

Sure. So maybe forget what I said about deprecating the bad parts of solve; just consider the part about making this a new function so you don't have to rage quit anymore. If the name were backslash, it would go at the very top of the linalg documentation:
[screenshot: top of the scipy.linalg API reference listing]

It would also be good if inv were not the first thing to appear there...

Users who want to keep using solve the same old way can. Why should they expect to get the benefits of the new features you're adding? If they want the new features, they can put in a little work to call a different function. I don't think you should be required to do all that work for everyone, going crazy in the process.

After all, this function is going to do more than strictly "solve" a linear system, since it would also handle the non-square case, right?

What do you think @mckib2 ?

Contributor

Is it not too counterintuitive to call this backslash? Wouldn't something like ldivide be clearer? (That might actually be the MATLAB name.)

Sorry, I can only contribute to the colour of the bikeshed :)

@mdhaber mdhaber Jul 18, 2021 (Contributor)

It might be for non-matlab users. I didn't think other people would agree with it, but I couldn't stop myself from suggesting it : )

lower : bool, optional
    If True, only the data contained in the lower triangle of `a` is used.
    Default is to use the upper triangle. (ignored for ``'gen'``)
assume_a : str, optional
Contributor

In the current version of the code, this isn't used any more, right? Do you plan on bringing it back and skipping to the appropriate solver if it's supplied? (Maybe the default would be None?)

Member Author

It will be used to bypass the structure checks if you already know what you have, but currently I don't have names or abbreviations for them, so it's left for later when the code stabilizes.

rejones7 commented Nov 2, 2020

Ilhan... If the diagram above is complete, and for non-square systems just QR is called, then this is probably the place to call my fast ill-conditioning test; if it says the system is ill-conditioned (which is very likely correct), then call arls() (or aregls, or whatever the final name is). If you are interested, I will provide my latest version of it. I have not been promoting incremental versions so as not to be a nuisance.
In contrast, linalg.lstsq seems like a complicated place to put it, but that is based on a brief examination of the code.
Ron Jones

ilayn commented Nov 2, 2020

Ah yes, I didn't mean to ignore this; I have very little time to come back to this or any other algorithm-heavy stuff. And when I do, I am mostly fighting Cython, which has turned out to be very laborious to develop with (I have hit many of the reported, unsolved compiler-crash cases in the Cython repo). But I finally made some progress. I can't make any promises, but if I can stabilize the square case, then yes, let's try to sneak that in.

The API will be a hot mess for a few versions to accommodate the old and new behavior simultaneously for backwards compatibility, though. So we have to think very hard about the least disruptive path in any case.

cdef int r, c
cdef lapack_t entry
if lapack_t in lapack_cz_t:
    for r in xrange(n):
Contributor

I tried googling a little and couldn't quite find the difference between range and xrange in Cython, but it appears that they favor range (for example)

Member Author

Yes, this is a confusing one; search for the phrase "And in fact, just changing range to xrange" in this link:

https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html

I am slowly losing track of the v3 changes and best practices in the Cython docs.

Contributor

Interesting. I think what is going on here is that xrange deduces the index type? The Cython code snippets below that quote revert to cdef-ing the index variable and using range.

for finite arrays and faster if array has inf/nan due to
early exit capability.
"""
cdef int n = <int>A.shape[0]
Contributor

This is assumed to be size_t in other places; size_t would prevent overflow if the matrices are really large or a LinearOperator is used.

Member Author

Ah yes, I'll fix that.

cdef size_t n = A.shape[0]
cdef int r, c
cdef lapack_t x, y
cdef bint symmetric = 1
Contributor

Other places in this code assign bint using boolean literals (True/False); here they are ints. I think True/False makes it slightly more readable.

Member Author

It's still far from consistent code because the parts were written at different times, so there will be a lot of these until the API stabilizes, but I totally agree.

if lapack_t in lapack_sd_t:
    for r in xrange(n):
        for c in xrange(n):
            if r == c:
@mckib2 mckib2 Jul 18, 2021 (Contributor)

I think you can refactor this as

for r in range(n):
    for c in range(n):
        if A[r, c] != A[c, r]:
            return False, False

This avoids a nested if statement in the loop and needs the same number of comparisons or fewer (granted, they might be more expensive comparisons, but I usually find that avoiding bad branch predictions is worth it).

# look-up once
x, y = A[r, c], A[c, r]
# once caught no need to keep checking continue for sym
if hermitian:
@mckib2 mckib2 Jul 18, 2021 (Contributor)

You'd have to do some benchmarking to see if this really helps, but you can avoid some branch misprediction by doing the following (at the cost of readability, of course); the flags start as True:

for r in range(n):
    for c in range(n):
        x, y = A[r, c], A[c, r]
        symmetric &= (x == y)
        hermitian &= (x == y.conjugate())
        if not (symmetric or hermitian):
            return False, False

Contributor

Unless symmetric != hermitian. I guess I assumed real-valued matrices would be both symmetric and Hermitian, but Hermitian is usually discussed only as a property of complex-valued matrices.

cdef char* trans = 'T' if a_is_c else 'N'
cdef char* norm = '1' if a_is_c else 'I'
cdef int info, res
cdef int n = <int>a.shape[0]
Contributor

Another possible int -> size_t

@mdhaber mdhaber marked this pull request as ready for review August 6, 2024 00:08
@mdhaber mdhaber requested a review from larsoner as a code owner August 6, 2024 00:08
ilayn commented Aug 6, 2024

For positive/negative definiteness it starts with:

  • Check whether the array is Hermitian (complex) or symmetric (real); if it is only symmetric (the complex symmetric-but-not-Hermitian case), send it to zsycon and zsysv.
  • If it is Hermitian, check the main diagonal for negative entries. If all are positive, send it to the Cholesky factorization potrf.
  • If Cholesky passes, then it is positive definite -> send it to pocon and then to the Cholesky solver. Otherwise, send it to the symmetric/Hermitian sycon/hecon and the corresponding solver.

For Hessenberg and bidiagonal matrices I have to refresh my memory. I need to implement the Hessenberg solver from https://dl.acm.org/doi/abs/10.5555/866641 to do ??con and ??sv in one go. For bidiagonals, it is a similar story.
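A minimal Python sketch of that definiteness probe, using the scipy.linalg.lapack wrappers (the PR does this at the Cython level; error handling omitted):

import numpy as np
from scipy.linalg.lapack import dpotrf

def looks_posdef(a):
    # Necessary condition: a positive definite matrix has a strictly
    # positive diagonal, so this cheap scan can reject early.
    if np.any(np.diag(a) <= 0):
        return False
    # Sufficient test: attempt the Cholesky factorization; potrf
    # signals success with info == 0.
    _, info = dpotrf(a, lower=1)
    return info == 0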

mdhaber commented Aug 6, 2024

OK, so for symmetric matrices: try Cholesky if the necessary conditions pass; if not, fall back to LDL.
Yeah, I saw that Julia used that Hessenberg solver. That does not look like low-hanging fruit : )
And I guess bidiagonal would be nice in the long run, but using tridiagonal gets you most of the way there. Does it really matter after the bandwidth check, which is more expensive?

ilayn commented Aug 6, 2024

try Cholesky if it passes some necessary conditions

No conditions; if info == 0 in potrf, it is positive definite. That's surprisingly the cheapest check for positive definiteness (as far as I know).

Does it really matter after the bandwidth check, which is more expensive?

You mean it is already tridiagonal?

mdhaber commented Aug 6, 2024

No conditions

Right. No sufficient conditions (other than trying Cholesky). But having all positive diagonal entries is a necessary condition, right? (And being Hermitian if complex.)

ilayn commented Aug 6, 2024

Yes, positive diagonal entries are a necessary condition for positive definiteness. You can also check whether all are negative and compute the Cholesky of the negated matrix, etc., but that's probably for later. The diagram at the top (also in the MATLAB docs) describes the workflow.

mdhaber commented Aug 6, 2024

You mean it is already tridiagonal?

I just mean that if we're accepting a full matrix rather than just the diagonals, there are few circumstances under which using a bidiagonal solver rather than a tridiagonal solver will make a significant difference in overall computation time. So personally, writing custom bidiagonal routines would be lower on my priority list than some other things. Just thinking out loud. : )

mdhaber commented Aug 6, 2024

Anyway, LMK if there is anything else you wanted here. This is the scope of what I had in mind for a first PR. In follow-ups, we can try Cholesky and fall back to LDL, add the missing reciprocal condition number checks, and then see how much compiling helps?

Last question for now, I think: I would have expected the different slopes of triangular vs general solves to become apparent toward the right of some of the graphs, like the top right one in #12824 (comment). Is the limited benefit of using triangular solve not surprising? (Is LAPACK LU just unreasonably fast?)

return max(a, b, c)


def _bandwidth(a):
Member Author

I don't think we need these excepts; if a dtype is not supported, it will not be possible to do linear algebra on it anyway. So we should either quit or convert things at the entrance to np.float32, np.float64, np.complex64, or np.complex128. Otherwise we lose time for no apparent reason, because the conversion will happen later anyway, either by us or by f2py.

@mdhaber mdhaber Aug 6, 2024 (Contributor)

K. At the beginning of solve I will coerce to one of those types. (Why is that not already done? We just always rely on that to happen when the LAPACK routine is called? Isn't there at least a private function for this?)

Member Author

In a previous state of this PR there was a casting table and I was handling it that way, but that is probably overwritten by now.

Contributor

K. It just seems fundamental, since all of linalg is based on LAPACK, which accepts only the four flavors. There seems to be no harm in taking care of the conversion early when LAPACK will be used, rather than risk letting multiple wrapper calls repeat the same conversions.

Member Author

Yes, it is a major pain to massage things with BLAS and LAPACK until the functions accept the input.

Contributor

Hopefully the approach I took looks OK. It seems to work for typical dtypes.

# Triangular case
elif assume_a in {'lower triangular', 'upper triangular'}:
    lower = assume_a == 'lower triangular'
    x = _solve_triangular(a1, b1, lower=lower, overwrite_b=overwrite_b,
Member Author

This needs to do the same explicit LAPACK calls, but I can add those later.

Contributor

You mean _solve_triangular needs the condition number check?
If so, yeah, after the tridiagonal condition number PR, I planned to add the triangular version, then use both of them here.

Contributor

OK I made gh-21328 and gh-21331 to add the relevant routines, then it should be easy to incorporate them here.

ilayn commented Aug 6, 2024

I just mean that if we're accepting a full matrix rather than just the diagonals, there are few circumstances under which using a bidiagonal solver rather than a tridiagonal solver will make a significant difference in overall computation time.

I agree. The main point of the polyalgorithm is to work automagically, whether the user knows the structure or not. So if they pass a bidiagonal array, the machine should recognize it. We are increasing the number of structures the solver can identify, not necessarily providing a separate solver for each case. The more the better.

Is the limited benefit of using triangular solve not surprising? (Is LAPACK LU just unreasonably fast?)

The main slowdown is calling ??con and then the ??sv routines; that solves the problem twice. What has to happen is: first hit ??trf, feed its output to ??con, and then send it to the ??trs routines. ??sv is a higher-level front end for this operation that skips ??con, hence you get closer to generic-solver speeds. That's probably what is happening inside solve_triangular.
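For reference, the factor-once chain in terms of the scipy.linalg.lapack wrappers would look roughly like this (a sketch; error handling omitted):

import numpy as np
from scipy.linalg.lapack import dgetrf, dgecon, dgetrs

def lu_solve_with_rcond(a, b):
    anorm = np.linalg.norm(a, 1)   # gecon estimates relative to the 1-norm
    lu, piv, info = dgetrf(a)      # ?trf: factor once
    rcond, _ = dgecon(lu, anorm)   # ?con: reuse the factors
    x, _ = dgetrs(lu, piv, b)      # ?trs: reuse them again to solve
    return x, rcond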

In follow-ups, we can try Cholesky and fall back to LDL, add the missing reciprocal condition number checks, and then see how much compiling helps?

Sure this is already good progress.

mdhaber commented Aug 6, 2024

So we are increasing the structures that the solver can identify. Not necessarily providing separate solvers for each cases.

Ok. Just to be explicit, this already identifies bidiagonal matrices as tridiagonal and calls the tridiagonal routines for them.

mdhaber commented Aug 6, 2024

The main slow down is to call ??con and then calling ??sv algorithms. That is solving the problem twice.

But ??con is not currently being called at all for triangular matrices. _solve_triangular just calls ?trtrs.

ilayn commented Aug 6, 2024

But ??con is not currently being called at all for triangular matrices. _solve_triangular just calls ?trtrs.

Hmm then we need to do some surgery. Are we sure that solve_triangular is not doing any slow things?

mdhaber commented Aug 6, 2024

Hmm then we need to do some surgery. Are we sure that solve_triangular is not doing any slow things?

For a triangular matrix with n=1000 solving with assume_a=None (~3.66 ms):

  • finiteness check: 7.3%
  • find matrix structure: 24%
  • lange: 52%
  • solve_triangular: 13.3%, 93% of which is trtrs

With assume_a='gen' (~21.1 ms):

  • finiteness check: 2.2%
  • lange: 16% (but about the same wall clock time as for triangular, since it's the same, general matrix LAPACK routine)
  • getrf: 65.7%
  • gecon: 12.1%

This suggests we could speed up the finiteness check, especially for the diagonal and tridiagonal cases, by checking only the elements that will be used in the calculation. However, that raises the concerns of gh-2116.
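A sketch of such a targeted check for the tridiagonal case (hypothetical helper; only the three central diagonals participate in the solve):

import numpy as np

def tridiag_isfinite(a):
    return (np.isfinite(np.diag(a)).all()
            and np.isfinite(np.diag(a, 1)).all()
            and np.isfinite(np.diag(a, -1)).all())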

But I think the conclusion is that getrf is just unreasonably fast, and the wall clock time doesn't appear to scale much faster than $O(n^2)$ for matrices of this size.

mdhaber commented Aug 10, 2024

@ilayn Anything else to do here? Did you want someone else to take a look or does this look safe enough?

ilayn commented Aug 10, 2024

No, I think the tests are comprehensive enough as far as I can see, and we have a good indication that performance is improved. Let's see if the nightlies cause any issues downstream. Thank you for pushing this over the line. Great stuff.

@ilayn ilayn merged commit 5fe2473 into scipy:main Aug 10, 2024
39 checks passed
@ilayn ilayn deleted the solve_cython branch August 10, 2024 07:39
@lucascolley lucascolley added this to the 1.15.0 milestone Aug 10, 2024
@lucascolley lucascolley added the needs-release-note a maintainer should add a release note written by a reviewer/author to the wiki label Aug 14, 2024
anushkasuyal pushed a commit to anushkasuyal/scipy that referenced this pull request Sep 13, 2024
* ENH:linalg: Initial Cythonized solve commit

* ENH:linalg: Added dtype validation

* ENH: gen, tri, bidiag solvers are added [skip ci]

* MAINT:linalg: Switch to Cython templating in solve

* PERF:linalg.solve: Cython optimize diagsolve [ci skip]

* MAINT: linalg.solve: allow NumPy to copy as needed

* MAINT: linalg.solve: revert some changes

* MAINT: linalg.solve: add diagonal, tri-diagonal, and triangular cases

* TST: linalg.solve: add tests

* MAINT: linalg.solve: spell out assume_a values

* MAINT: linalg.solve: avoid double input validation with triangular solve

* MAINT: linalg.solve: define bandwidth/issymetric/ishermitian for all dtypes

* Update scipy/linalg/tests/test_basic.py

[skip circle] [skip cirrus]

* ENH: linalg.solve: speed up with custom matrix norm

[skip cirrus] [skip circle]

* DOC: linalg.solve: table instead of list

* MAINT: linalg.solve: consider aliases throughout instead of mapping them

* ENH: linalg.solve: streamline _lange_tridiagonal

* MAINT: linalg.solve: convert to cdsz at the start

---------

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>