Performance Issues with sprs-ldl
#199
Comments
Thanks for the report @PTNobel! Similarly to #188, it would probably be nice to adapt your code and add it to the benchmarks as a starting point. I'm not familiar with the implementation, so I can't comment on the reasons behind this difference in performance. Someone would probably need to profile it to see what is happening.
Hello @PTNobel and thanks for the report. I'm afraid the performance of […]

Something troubles me about your benchmark though: it is expected that […]

Another thing about the benchmark: it includes the cost to convert from triplet format to CSC, but it would probably be better to avoid counting this as solve time.

I should investigate this once I'm done improving the matrix product performance (currently investigating parallelization opportunities). Please note this can take some time, I don't have much free time at the moment. Any help is welcome of course :) Thanks for the report!
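As a side note, the conversion cost mentioned above can be isolated with a separate timer. This is a minimal sketch of the idea (my own illustration, not the issue's benchmark; scipy's COO/CSC types stand in for sprs' triplet and CSC matrices, and the data is randomly generated):

```python
import time
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
rng = np.random.default_rng(0)
# Random symmetric, diagonally dominant matrix built in triplet (COO) form.
rows = rng.integers(0, n, size=5 * n)
cols = rng.integers(0, n, size=5 * n)
vals = rng.standard_normal(5 * n)
A = sp.coo_matrix((vals, (rows, cols)), shape=(n, n))
A = A + A.T + sp.eye(n) * (5.0 * n)  # symmetrize; large diagonal keeps it nonsingular

t0 = time.perf_counter()
A_csc = sp.csc_matrix(A)          # the triplet->CSC conversion the benchmark times
t1 = time.perf_counter()
lu = spla.splu(A_csc)             # numeric factorization
x = lu.solve(np.ones(n))          # backsolve
t2 = time.perf_counter()
print(f"convert: {t1 - t0:.4f}s  factor+solve: {t2 - t1:.4f}s")
```

Reporting the two timings separately makes it clear how much of the "solve time" is really format conversion.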
@vbarrielle, I think it sounds like the lowest hanging fruit here is to […]
Hello, just a note that I can start investigating now that #201 is done.
Anything you want me to do? I'm happy to try and work through my list of items from my last comment.
@PTNobel I've checked 1, and the CSC transform does not take too much time, so it does not matter if it stays in the timed section. I'm going to focus on 4 and 6, which mostly require that I publish a new version of […]

So if you want you can look into 2, 3 or 5. Thanks for proposing your help by the way.
I wanted to have comparison points with other factorization algorithms, so I serialized the first jacobian in your example and loaded it in scipy. There I've been able to test […]
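The thread doesn't preserve which scipy factorizations were tried. As an illustrative sketch of this kind of comparison (hypothetical random data, not the serialized jacobian), scipy.sparse.linalg.splu can be run under each of its fill-reducing column permutations, comparing factor sizes:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.random(n, n, density=0.01, random_state=42, format='csc')
A = (A + A.T + 10.0 * sp.eye(n)).tocsc()  # symmetric, well conditioned
b = np.ones(n)

# permc_spec selects the fill-reducing column permutation SuperLU applies.
for spec in ("NATURAL", "COLAMD", "MMD_AT_PLUS_A"):
    lu = spla.splu(A, permc_spec=spec)
    x = lu.solve(b)
    print(spec, "nnz(L)+nnz(U):", lu.L.nnz + lu.U.nnz)
```

The nnz of the computed factors is a direct proxy for the fill-in discussed later in this thread.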
I've also timed the CHOLMOD algorithm, which is probably the best library for Cholesky decomposition. I've had to add a diagonal offset to avoid an error signaling the matrix is not positive definite:
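The CHOLMOD timing output was lost in extraction. Separately, the diagonal-offset trick just described can be illustrated with a tiny dense numpy example (made-up numbers, numpy's Cholesky instead of CHOLMOD):

```python
import numpy as np

# A symmetric matrix that is not positive definite (eigenvalues 3 and -1).
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

try:
    np.linalg.cholesky(A)
except np.linalg.LinAlgError:
    print("not positive definite")

# Shift the diagonal by more than the most negative eigenvalue's magnitude.
alpha = 1.5
L = np.linalg.cholesky(A + alpha * np.eye(2))
print(np.allclose(L @ L.T, A + alpha * np.eye(2)))  # True
```

The offset changes the system being solved, so it only makes the timing comparable, not the results.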
For the record, on my machine, using Reverse Cuthill-McKee, I get the following timing on your benchmark:
which would mean about 1.46s per solve, but this is not an accurate comparison, as the symbolic part is re-used while the backsolve is included.

In general, the number that governs the performance of the factorization is the fill-in, i.e. the number of additional nonzeros created in the factorization. The jacobian has 41818 nonzeros. Factoring with […]

So part of the better performance in CHOLMOD is probably explained by its usage of a better fill-in reducing permutation (it probably uses AMD). It's also an extremely well tuned library. However, I find the fill-in quite big, even for CHOLMOD. But I notice the graph is wired randomly, and I'm pretty sure fill-in reducing permutations work by discovering structure in the nonzero graph, so this is quite a pathological example. Maybe we can expect better numbers if the graph is for instance a regular grid, or a surface mesh. That's something to try.
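The point that fill-in reducing permutations work by discovering structure can be demonstrated on a regular grid, as suggested above: shuffle a 2-D grid Laplacian randomly, then let reverse Cuthill-McKee recover the banded structure and compare LU fill-in. A sketch using scipy (synthetic grid, not the issue's jacobian):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.sparse.csgraph import reverse_cuthill_mckee

def lu_fill(M):
    """Nonzeros in L and U under the natural (identity) column ordering."""
    lu = spla.splu(M.tocsc(), permc_spec="NATURAL")
    return lu.L.nnz + lu.U.nnz

g = 20                                        # 20x20 grid -> 400 unknowns
I = sp.eye(g)
T = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(g, g))
S = sp.diags([-1.0, -1.0], [-1, 1], shape=(g, g))
A = (sp.kron(I, T) + sp.kron(S, I)).tocsc()   # 5-point Laplacian, banded

rng = np.random.default_rng(0)
p = rng.permutation(A.shape[0])
A_shuffled = (A[p][:, p]).tocsc()             # destroy the banded structure

q = reverse_cuthill_mckee(A_shuffled, symmetric_mode=True)
A_rcm = (A_shuffled[q][:, q]).tocsc()         # RCM rediscovers a narrow band

print("fill, shuffled ordering:", lu_fill(A_shuffled))
print("fill, RCM ordering:     ", lu_fill(A_rcm))
```

On a genuinely random graph there is no such hidden band for RCM to find, which matches the "pathological example" observation above.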
I'll add that this is because my use case encounters genuinely random graphs quite frequently. Sadly, the more structure is present, the less useful the work we're doing is.
So as you suspected, the results being returned are wildly different, suggesting that […]
Question: What is the difference between LdlLongNumeric and LdlNumeric in […]?
The difference comes from the integer type that the underlying C library expects for the storage of the indices and indptr of the system matrix; the former expects a […]
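For a concrete sense of what the two index widths allow (an illustration of the 32- vs 64-bit limits, not the binding's actual code):

```python
import numpy as np

# 32-bit indices cap the addressable dimensions/nonzeros of the matrix:
print(np.iinfo(np.int32).max)  # 2147483647
# 64-bit ("long") indices raise that limit far beyond practical sizes:
print(np.iinfo(np.int64).max)  # 9223372036854775807
```

In practice the narrower type saves memory on the index arrays, while the wider one is needed once a matrix's dimension or nonzero count exceeds the 32-bit range.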
Latest micro-optimizations in […]

I guess the way to go now would be to have a better fill-in reducing permutation, which will require binding to those available in SuiteSparse, or implementing one of them. I'll probably look into the former in the near future.
The binding of SuiteSparse's CAMD has enabled some more performance gains, and, as mentioned in PTNobel/sprs-performance-test#2 (comment), I don't think there's much more improvement to be hoped for using LDL. A good way to improve performance would be to have a supernodal Cholesky decomposition (like CHOLMOD), or a multifrontal one to exploit parallelism. I think the latter is a bit more accessible than the former, so I'll probably have a look into it at some point in the future, but it's probably going to take a while, as my free time is scarce these days.
I published a pure Rust translation of […]. It works well with this solver:
@rwl That seems like it could fit very well with this crate!
I've been using sprs-ldl to solve some symmetric sparse matrix systems and found the performance to be surprisingly poor. lsolve_csc_dense_rhs has had significantly better performance.

Here is the benchmark I've been using: https://github.com/PTNobel/sprs-performance-test
which produced the following output:
I'm happy to answer more questions if needed, but that'll probably have to come on the weekend.
This issue was filed per https://www.reddit.com/r/rust/comments/gzazna/whats_the_current_state_of_rusts_sparse_linear/fv9hf0e/?context=3