ENH: linalg.solve: detect and exploit matrix structure #12824
Conversation
I don't get which part of
Yes. I think so.
This means that an issue like gh-10722 will not occur, right? (I guess this PR won't solve that issue because it won't change
By "sparse arrays" do you mean whether it's a sparse matrix object, a
Indeed, that's the idea. Maybe not obsolete but a bit laborious.
I mean in this sense: https://github.com/scipy/scipy/blob/master/scipy/_lib/_util.py#L252-L258 If there are easier ways to check sparseness I can put it back in. In MATLAB it switches internally, but all objects are in the same namespace, so checking it doesn't cost a lot of code.
Not sure if it is fast or clean enough, but how about Would Could look for (or probably faster to just try to use in a If these don't work, I understand. I know it needs to be really efficient.
This looks really good already! Cython looks pretty idiomatic to me.
Long-standing issue, only TravisCI has always understood
scipy/linalg/basic.py (outdated diff):

    transposed: bool, optional
        If True, ``a^T x = b`` for real matrices, raises `NotImplementedError`
        for complex matrices (only for True).
    sym_pos : bool, optional
Yes, deprecating this sounds good. I know it's been discussed for a while.
We need @pv's blessing for that, as he didn't like the idea of dropping it previously.
Would it be easier to create a new function?
I know that solve is the obvious name, and it's a shame that it would not be available. But maybe create a new function (backslash?) for now, deprecate the things we don't like about solve, and after a few versions make solve and the new function synonymous?
Maybe it's a crazy idea, but I've been tiptoeing around some unfortunate parts of the interface of svds recently and thinking how nice it would be to just start fresh.
I know exactly how you feel and I feel the same about many linalg parts but I think these are pretty central to usage of SciPy. That is to say, there are too many users importing scipy just for linalg and I don't know how to handle them.
Every time I sit down for some work on this, I get lost in a backwards-compatible API design mess and I rage quit. This happens every weekend, which is the only actual coding time I can spare recently.
If I find a way to regularize the input, then, say, I can't handle the transposed keyword. If I do that, then Cython doesn't like certain inputs; if we fix that, then we lose too much time checking too many things, and it becomes pointless to implement the whole thing, etc. So I think a few more weekends and there will be some progress on this.
> these are pretty central to usage of SciPy. That is to say, there are too many users importing scipy just for linalg and I don't know how to handle them.
Sure. So maybe forget what I said about deprecating the bad parts of solve; just consider the part about making this a new function so you don't have to rage quit anymore. If the name were backslash, it would go at the very top of the linalg documentation:
It would also be good if inv were not the first thing to appear there...
Users who want to keep using solve the same old way can. Why should they expect to get the benefits of the new features you're adding? If they want the new features, they can put in a little work to call a different function. I don't think you should be required to do all that work for everyone, going crazy in the process.
After all, this function is going to do more than strictly "solve" a linear system, since it would also handle the non-square case, right?
What do you think @mckib2 ?
Is it not too counterintuitive to call this backslash? Wouldn't something like ldivide be clearer? (That might actually be the MATLAB name.)
Sorry, I can only contribute to the colour of the bikeshed :)
It might be for non-MATLAB users. I didn't think other people would agree with it, but I couldn't stop myself from suggesting it : )
scipy/linalg/basic.py (outdated diff):

    lower : bool, optional
        If True, only the data contained in the lower triangle of `a`. Default
        is to use upper triangle. (ignored for ``'gen'``)
    assume_a : str, optional
In the current version of the code, this isn't used any more, right? Do you plan on bringing it back and skipping to the appropriate solver if it's supplied? (Maybe the default would be None?)
It will be used to bypass the structure checks if you already know what you have, but currently I don't have names or abbreviations for them, so it's left for later when the code stabilizes.
Ilhan... If the diagram above is complete, and for non-square systems just QR is called, then this is probably the place to call my fast ill-conditioned test, and if it says the system is ill-conditioned (which is very likely correct) then call arls() (or aregls, or whatever the final name is). If you are interested I will promote my latest version of it. I have not been promoting incremental versions so as not to be a nuisance.
Ah yes. I didn't mean to ignore it. I have very little time to come back to this or any other algo-heavy stuff. And when I do, I am mostly fighting with Cython, which turned out to be very laborious to develop anything with (I have basically hit many reported unsolved compiler-crash cases in the Cython repo). But I finally got some progress. I can't make any promises, but if I can stabilize the square case then yes, let's try to sneak that in. The API will be a hot mess for a few versions to accommodate the old and new behavior simultaneously for backwards compatibility, though. So we have to think very hard for the least disruption in any case.
scipy/linalg/_linsolve.pyx.in (outdated diff):

    cdef int r, c
    cdef lapack_t entry
    if lapack_t in lapack_cz_t:
        for r in xrange(n):
I tried googling a little and couldn't quite find the difference between range and xrange in Cython, but it appears that they favor range (for example).
Yes, this is a confusing one; search for the phrase "And in fact, just changing range to xrange" in this link:
https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html
I am slowly losing track of v3 changes and best practices in the Cython docs.
Interesting, I think what is going on here is that xrange deduces the index type? The Cython code snippets below that quote revert to cdef-ing the index variable and using range.
scipy/linalg/_linsolve.pyx.in (outdated diff):

        for finite arrays and faster if array has inf/nan due to
        early exit capability.
        """
        cdef int n = <int>A.shape[0]
This is assumed size_t in other places; that would prevent overflow if matrices are really large or a LinearOperator is used.
Ah yes, I'll fix that.
scipy/linalg/_linsolve.pyx.in (outdated diff):

    cdef size_t n = A.shape[0]
    cdef int r, c
    cdef lapack_t x, y
    cdef bint symmetric = 1
Other places in this code bint is assigned using boolean literals (True/False); here they are ints. I think True/False makes it slightly more readable.
It's still far from consistent code because every part was written at a different time, hence there will be a lot of these until the API stabilizes, but I totally agree.
scipy/linalg/_linsolve.pyx.in (outdated diff):

    if lapack_t in lapack_sd_t:
        for r in xrange(n):
            for c in xrange(n):
                if r == c:
I think you can refactor this as

    for r in range(n):
        for c in range(n):
            if A[r, c] != A[c, r]:
                return False, False

This avoids a nested if statement in the loop and needs the same number of comparisons or fewer in total (though each comparison may be more expensive, granted, but I usually find that avoiding bad branch predictions is worth it).
scipy/linalg/_linsolve.pyx.in (outdated diff):

    # look-up once
    x, y = A[r, c], A[c, r]
    # once caught no need to keep checking continue for sym
    if hermitian:
You'd have to do some benchmarking to see if this really helps, but you can avoid some predictive branching by doing the following (at the cost of readability, of course):

    for r in range(n):
        for c in range(n):
            x, y = A[r, c], A[c, r]
            symmetric &= (x == y)
            hermitian &= (x == y.conjugate())
            if not (symmetric or hermitian):
                return False, False
Unless symmetric != hermitian, I guess. I assumed real-valued matrices would be both symmetric and Hermitian, but Hermitian is usually only discussed as a property of complex-valued matrices.
scipy/linalg/_linsolve.pyx.in (outdated diff):

    cdef char* trans = 'T' if a_is_c else 'N'
    cdef char* norm = '1' if a_is_c else 'I'
    cdef int info, res
    cdef int n = <int>a.shape[0]
Another possible int -> size_t.
For positiveness/negativeness it starts with
For Hessenberg and bidiagonals I have to refresh my memory. I need to implement the Hessenberg solver from https://dl.acm.org/doi/abs/10.5555/866641 to make
Ok, so for symmetric matrices, try Cholesky if it passes some necessary conditions; if not, fall back to LDL.
No conditions, if
You mean it is already tridiagonal?
Right. No sufficient conditions (other than trying Cholesky). But having all positive diagonal entries is a necessary condition, right? (And being Hermitian if complex.)
Yes, all-positive diagonals is a necessary condition for positive definiteness. You can also check if they are all negative and compute the Cholesky of the negative, etc., but that's probably for later. The diagram at the top describes the workflow, or in the matlab docs
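A hedged Python sketch of that flow: check the cheap necessary condition (all-positive diagonal), attempt Cholesky, and fall back to LDL on failure. `try_pos_def` is a hypothetical name for illustration, not the PR's code path:

```python
import numpy as np
from numpy.linalg import LinAlgError
from scipy.linalg import cholesky, ldl

def try_pos_def(A):
    # All-positive diagonal is necessary (not sufficient) for a
    # Hermitian matrix to be positive definite, so a failing check
    # lets us skip a doomed Cholesky attempt entirely.
    if np.all(np.diagonal(A).real > 0):
        try:
            return cholesky(A, lower=True), 'cholesky'
        except LinAlgError:
            pass
    # Symmetric/Hermitian indefinite: LDL still succeeds.
    lu, d, perm = ldl(A, lower=True)
    return (lu, d, perm), 'ldl'

# Positive definite: Cholesky path
_, method = try_pos_def(np.array([[4., 2.], [2., 3.]]))
assert method == 'cholesky'
# Indefinite with positive diagonal: falls back to LDL
_, method = try_pos_def(np.array([[1., 2.], [2., 1.]]))
assert method == 'ldl'
```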
I just mean that if we're accepting a full matrix rather than just the diagonals, there are few circumstances under which using a bidiagonal solver rather than a tridiagonal solver will make a significant difference in overall computation time. So personally, writing custom bidiagonal routines would be lower on my priority list than some other things. Just thinking out loud. : )
Anyway, LMK if there is anything else you wanted here. This is the scope of what I had in mind for a first PR. In follow-ups, we can try Cholesky and fall back to LDL, add the missing reciprocal condition number checks, and then see how much compiling helps? Last question for now, I think: I would have expected the different slopes of triangular vs general solves to become apparent toward the right of some of the graphs, like the top right one in #12824 (comment). Is the limited benefit of using the triangular solve not surprising? (Is LAPACK LU just unreasonably fast?)
scipy/linalg/_basic.py (outdated diff):

        return max(a, b, c)


    def _bandwidth(a):
I don't think we need these excepts; if a dtype is not supported, it will not be possible to do linear algebra on it anyway. So we either need to quit or convert things at the entrance to np.float32, np.float64, np.complex64, or np.complex128. Otherwise it will be losing time for no apparent reason, because it will be converted later anyway, either by us or by f2py.
K. At the beginning of solve I will coerce to one of those types. (Why is that not already done? Do we just always rely on that to happen when the LAPACK routine is called? Isn't there at least a private function for this?)
In a previous state of this PR there was a casting table and I was doing it with that, but it has probably been overwritten by now.
K. Just seems fundamental if all of linalg is based on LAPACK, which accepts only the four flavors. There seems to be no harm in taking care of the conversion early if LAPACK will be used, rather than running the risk of letting multiple calls to wrappers do the same conversions repeatedly.
Yes, it is a major pain to use BLAS and LAPACK; you have to massage things until the functions accept the input.
Hopefully the approach I took looks OK. It seems to work for typical dtypes.
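The early coercion being discussed can be sketched at the NumPy level like this (`coerce_to_lapack` is a hypothetical helper, not the PR's exact code; the real mapping may differ, e.g. for float16 or longdouble):

```python
import numpy as np

# LAPACK only speaks these four flavors (s, d, c, z).
_LAPACK_DTYPES = (np.float32, np.float64, np.complex64, np.complex128)

def coerce_to_lapack(a):
    """Promote `a` once at the entrance so downstream wrappers
    never re-convert."""
    a = np.asarray(a)
    if a.dtype in _LAPACK_DTYPES:
        return a          # already a LAPACK flavor: no copy needed
    if np.iscomplexobj(a):
        return a.astype(np.complex128)
    return a.astype(np.float64)
```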
    # Triangular case
    elif assume_a in {'lower triangular', 'upper triangular'}:
        lower = assume_a == 'lower triangular'
        x = _solve_triangular(a1, b1, lower=lower, overwrite_b=overwrite_b,
This needs to do the same explicit LAPACK calls, but I can add those later.
You mean _solve_triangular needs the condition number check?
If so, yeah: after the tridiagonal condition number PR, I planned to add the triangular version, then use both of them here.
I agree. The main use of a polyalgorithm is to work automagically regardless of whether the user knows the structure or not. So if they pass a bidiagonal array, the machine should recognize it. So we are increasing the number of structures the solver can identify, not necessarily providing separate solvers for each case. The more the better.
The main slowdown is to call
Sure, this is already good progress.
Ok. Just to be explicit, this already identifies bidiagonal matrices as tridiagonal and calls the tridiagonal routines for them.
But ??con is not currently being called at all for triangular matrices.
Hmm, then we need to do some surgery. Are we sure that solve_triangular is not doing any slow things?
For a triangular matrix with
With
This brings to mind that we could speed up the finiteness check, especially for the diagonal and tridiagonal cases, by checking only the elements that will be used in the calculation. That said, it also brings gh-2116 to mind. But I think the conclusion is that
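The shortcut being floated here can be sketched as checking only the band a tridiagonal solver actually reads (`tridiag_is_finite` is a hypothetical helper; whether skipping the rest of the array is safe is exactly the gh-2116 question):

```python
import numpy as np

def tridiag_is_finite(A):
    # Only the three diagonals participate in a tridiagonal solve,
    # so an O(n) check replaces the O(n^2) full-array isfinite scan.
    return bool(np.isfinite(np.diagonal(A)).all()
                and np.isfinite(np.diagonal(A, 1)).all()
                and np.isfinite(np.diagonal(A, -1)).all())

A = np.eye(4)
A[0, 3] = np.inf          # junk outside the band is ignored
assert tridiag_is_finite(A)
A[1, 1] = np.nan          # but a bad entry on the band is caught
assert not tridiag_is_finite(A)
```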
@ilayn Anything else to do here? Did you want someone else to take a look, or does this look safe enough?
No, I think the tests are comprehensive enough as far as I can see, and we have a good indication that the performance is improved. Let's see if the nightlies cause any issues downstream. Thank you for pushing this over the line. Great stuff.
Commits:

* ENH:linalg: Initial Cythonized solve commit
* ENH:linalg: Added dtype validation
* ENH: gen, tri, bidiag solvers are added [skip ci]
* MAINT:linalg: Switch to Cython templating in solve
* PERF:linalg.solve: Cython optimize diagsolve [ci skip]
* MAINT: linalg.solve: allow NumPy to copy as needed
* MAINT: linalg.solve: revert some changes
* MAINT: linalg.solve: add diagonal, tri-diagonal, and triangular cases
* TST: linalg.solve: add tests
* MAINT: linalg.solve: spell out assume_a values
* MAINT: linalg.solve: avoid double input validation with triangular solve
* MAINT: linalg.solve: define bandwidth/issymmetric/ishermitian for all dtypes
* Update scipy/linalg/tests/test_basic.py [skip circle] [skip cirrus]
* ENH: linalg.solve: speed up with custom matrix norm [skip cirrus] [skip circle]
* DOC: linalg.solve: table instead of list
* MAINT: linalg.solve: consider aliases throughout instead of mapping them
* ENH: linalg.solve: streamline _lange_tridiagonal
* MAINT: linalg.solve: convert to cdsz at the start

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
summary
The solve function is basically one of the most used in the linalg module. Its counterparts in other packages and languages mostly enjoy varying degrees of automation via so-called polyalgorithms. These basically try the simplest cases (is it diagonal, bidiagonal, tridiagonal, triangular, ... and so on) such that simple problems are handled much faster than less structured problems. The most well-known version of this is the matlab syntax A\b, hence the name backslash. Julia, Octave, and many other languages have an attempt at this. And through mathworks the following logic is generally followed

scope
There is basically a lot of structure to be tested, and I am aiming to address most of it (enumerated below). The main mechanism of the algorithm is to discover structure as quickly as possible and branch off to the relevant solver. This is made possible via the find_array_structure function written in Cython. If the entrywise checking didn't turn up anything, we also check for symmetry/hermitianness, and if there is indeed structure, we also test for positive/negative definiteness. If all fail, we fall back to the generic no-structure solution.

With this PR, linalg.solve will be able to automatically distinguish, solve, and check for singularity/ill-conditioning for the following cases (marked are done). Permuted triangular arrays are out of scope on purpose, because there are no shuffled-triangular solvers available, and reshuffling the arrays back to triangular form unfortunately eats up all the benefits of this polyalgorithm and destroys the memory layout. Hence there is no point in that specialization for now.
performance
Main performance gains are through the custom solvers; however, they are not the only tie-breakers for small n when it comes to the complete performance measurements. Arguably the next most important detail is that now the a array can survive without any copies if its dtype is suitable for LAPACK and it is contiguous, since C/F layout is meticulously tracked and transposes are used whenever possible. This also made it possible to have more opportunities for overwriting the original array. Due to the custom check_finite function, the slowdowns of the _asarray_validated passes are sped up too. Finally, the get_lapack_funcs bottleneck is removed via completely separate (s-d-c-z) type paths.

I will add the relevant plots here. But as a teaser, it is always faster than the current version of scipy.linalg.solve for the generic case, even though it does lots of extra work. It could go much faster, but it loses most of its pace in the array regularization at the intake. Compared to numpy.linalg.solve, I don't know exactly why, but up to about n=10 the NumPy version is very fast with no checks whatsoever. Then the proposed version takes over both the SciPy and NumPy versions.

Diagonal case
There is no point in comparing to the current SciPy master since NumPy pretty much smokes it in every case. In the fat-B-array case on the left, we are only paying the price of copying B into a Fortran-contiguous array. On the right, the number of columns of B is set to 10 and N is varied. As you can see, in both cases we are pretty much an order of magnitude faster for large arrays. On very small arrays (N = [3, 10]) we don't pay any price even though we perform many more checks.
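For scale, the diagonal "solver" is just elementwise division, which is where the order-of-magnitude gap comes from (an illustrative comparison, not the PR's internals):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
d = rng.uniform(1.0, 2.0, n)     # well-conditioned diagonal
A = np.diag(d)
B = rng.standard_normal((n, 10))

x_fast = B / d[:, None]          # O(n * nrhs) work
x_ref = np.linalg.solve(A, B)    # O(n^3) factorization, same answer
assert np.allclose(x_fast, x_ref)
```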
Bi/Tridiagonal case
Similarly, the bidiagonal/tridiagonal difference is the same complexity difference, hence still no surprise. In fact, checking the condition number takes the same amount of time as the solver step (the rcond part is not optimized yet). Neither scipy.linalg.solve nor numpy.linalg.solve has a special path for these cases.
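A sketch of what the dedicated tridiagonal path buys, using the existing solve_banded interface (LAPACK ??gtsv-style O(n) work) to stand in for it; the point of the PR is that solve routes such matrices there automatically instead of requiring this manual band packing:

```python
import numpy as np
from scipy.linalg import solve_banded

rng = np.random.default_rng(2)
n = 50
A = (np.diag(rng.uniform(2, 3, n))
     + np.diag(rng.uniform(0, 1, n - 1), 1)
     + np.diag(rng.uniform(0, 1, n - 1), -1))
b = rng.standard_normal(n)

# Pack the three diagonals into the (1, 1)-banded layout
# ab[u + i - j, j] == A[i, j] expected by solve_banded.
ab = np.zeros((3, n))
ab[0, 1:] = np.diagonal(A, 1)    # superdiagonal
ab[1, :] = np.diagonal(A)        # main diagonal
ab[2, :-1] = np.diagonal(A, -1)  # subdiagonal

x_banded = solve_banded((1, 1), ab, b)   # O(n) tridiagonal solve
x_dense = np.linalg.solve(A, b)          # O(n^3) dense reference
assert np.allclose(x_banded, x_dense)
```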
Triangular case
For triangular cases, it is already much faster than the generic solvers in both SciPy and NumPy, and slightly faster than linalg.solve_triangular. As n gets larger, the difference grows.

More to come here....
early feedback points needed
backward incompatible details
The function no longer checks whether the inputs are sparse arrays or not. This had the side effect of importing scipy.sparse through _lib._util._asarray_validated for no reason. Instead, sparse input will be discouraged through the documentation.

If everybody is OK with it, I would like to finally zap the dysfunctional debug keyword (maybe sym_pos too?).

TO DO
transposed keyword handling working properly for complex arrays when the array has C layout

I would appreciate all the help and all kinds of feedback for this overhaul.
Code for generating various structured a, b pairs