
Allow UMFPACK to be run with an alternative BLAS under the hood #453

Closed
ChrisRackauckas opened this issue Oct 5, 2023 · 13 comments

@ChrisRackauckas

In LinearSolve.jl we use MKL_jll.jl directly in order to avoid OpenBLAS, but we don't use LBT because we don't want to override the user's global state. However, this doesn't change UMFPACK, and thus UMFPACK is rather slow in comparison to everything else. We'd hope to just swap the underlying BLAS without making a global change, to get the performance we want without affecting global state.
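To make the approach concrete, here is a hedged sketch (not LinearSolve.jl's actual implementation) of calling an MKL LAPACK routine directly through MKL_jll via `ccall`, bypassing libblastrampoline so the user's global BLAS choice is untouched. The function name `mkl_getrf!` is illustrative; it assumes `libmkl_rt`'s default LP64 (`Int32`) interface.

```julia
using MKL_jll, LinearAlgebra

# LU-factorize A in place via MKL's dgetrf, without touching LBT's
# global BLAS mapping. Assumes the LP64 (Int32) interface of libmkl_rt.
function mkl_getrf!(A::Matrix{Float64})
    m, n = size(A)
    ipiv = Vector{Int32}(undef, min(m, n))
    info = Ref{Int32}(0)
    ccall((:dgetrf_, MKL_jll.libmkl_rt), Cvoid,
          (Ref{Int32}, Ref{Int32}, Ptr{Float64}, Ref{Int32},
           Ptr{Int32}, Ptr{Int32}),
          m, n, A, m, ipiv, info)
    @assert info[] >= 0  # negative info means an invalid argument
    return A, ipiv
end
```

The catch, as this issue points out, is that UMFPACK's internal BLAS calls can't be redirected this way: they resolve through whatever BLAS the SuiteSparse binary was linked against.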

@rayegun
Member

rayegun commented Oct 5, 2023

The only really feasible way to do this is to set up a second, entirely disjoint instance of LBT. That would have to be an entirely separate package, and probably a JLL, though. Which seems pretty crazy to me compared to just setting up an @info telling the user to use MKL on x86.

Why exactly would we want to give users better performance on just UMFPACK when they could have it for every BLAS call already with LBT?

@ChrisRackauckas
Author

Is the answer for LinearSolve.jl to just start doing `using MKL`?

@rayegun
Member

rayegun commented Oct 5, 2023

If it's available for that platform, yeah. I really do think we're shooting ourselves in the foot trying to pull out specific MKL functions when the user probably wants it all. I'm not aware of any functionality where OpenBLAS beats MKL, are you?

Whether that should be done in LinearSolve, or whether we should explicitly inform the user to do it themselves, I don't know.

@ViralBShah
Member

At least for this issue, it would be good to have a simple example that shows where UMFPACK + OpenBLAS is slow. I assume you are using one OpenBLAS thread.

@ChrisRackauckas
Author

The point is that the threading performance is what is bad, so it has bad scaling. There is currently no situation where UMFPACK benchmarks as the best method (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/SparsePDE/), and the benchmarks show very poor performance for OpenBLAS (https://docs.sciml.ai/SciMLBenchmarksOutput/dev/LinearSolve/LUFactorization/).

@ViralBShah
Member

My question is: does it get better with OpenBLAS threading turned off?
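For anyone wanting to check this themselves, a minimal sketch of the experiment using only stdlib APIs (`BLAS.set_num_threads` and SparseArrays' UMFPACK-backed `lu`); the matrix here is an arbitrary illustrative example, not one of the SciMLBenchmarks problems:

```julia
using LinearAlgebra, SparseArrays

# A random sparse system; the +10I shift keeps it well-conditioned.
n = 5_000
A = sprand(n, n, 1e-3) + 10I
b = rand(n)

# UMFPACK with a threaded BLAS underneath:
BLAS.set_num_threads(Sys.CPU_THREADS)
@time lu(A) \ b

# Same factorization with single-threaded BLAS:
BLAS.set_num_threads(1)
@time lu(A) \ b
```

Comparing the two timings shows how much (or how little) of UMFPACK's runtime is actually spent inside threaded BLAS kernels.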

@rayegun
Member

rayegun commented Oct 6, 2023

I'm not sure if UMFPACK is multithreaded @ViralBShah. I think it relies on threaded BLAS for performance.

@ViralBShah
Member

ViralBShah commented Oct 11, 2023

Here's Tim's answer on the topic: DrTimothyAldenDavis/SuiteSparse#432 (comment)

Also, a parallel UMFPACK is coming (unrelated to this), and will also have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. We can do an @info for now in Base.

The only realistic solution is SuiteSparseMKL if you want it to all work automatically and by default. Even then, AMD has its own BLAS, and all this will fail on Mac ARM. Of course, I feel that the kind of person who wants faster LinearSolve performance, also generally wants faster overall BLAS performance, and it may be best to ask people to pick a better BLAS by default. Eventually we should have a global preference to set the BLAS, rather than doing it through startup.jl.
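The startup.jl route mentioned above is a one-line config fragment; loading MKL early means every LBT-routed BLAS/LAPACK call, including UMFPACK's, picks it up:

```julia
# ~/.julia/config/startup.jl
# Swap the global LBT-backed BLAS to MKL for every session.
# Only sensible on x86; on other architectures leave this out.
using MKL
```

This is exactly the "global trigger" that LinearSolve.jl was trying to avoid imposing on users, which is the crux of this issue.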

@ChrisRackauckas
Author

Well, I made the LinearSolve.jl change and 🤷 I'm glad we got some information back. I got about 50 happy responses from people saying PDE stuff is magically faster, but two bug reports that effectively bring it all down (SciML/LinearSolve.jl#427) and will force a revert. Pretty sad. So this issue has at least been shown to be unsolvable without changes to the binaries, and it's the cause of major (100x) slowdowns in many real-world use cases. And it's pretty clear that the coming parallel UMFPACK will be very good but probably still slower than KLU if you use OpenBLAS; it's that bad 🤷. So we really do need a way to build with MKL without LBT, since we cannot use the global trigger in general.

The other path would be to actually fix MKL.jl to cover the whole interface (i.e. fixing JuliaLinearAlgebra/MKL.jl#138) and then have LinearAlgebra.jl do this. LinearAlgebra.jl should just check the CPU and load the "right" BLAS.

> I'm not aware of any functionality where OpenBLAS beats MKL, are you?

Matrix multiplications on AMD Epyc chips. That came up in two responses. Everyone else reported either no change or a speedup.

> The only realistic solution is SuiteSparseMKL if you want it to all work automatically and by default.

I think that is the case.

> Also, a parallel UMFPACK is coming (unrelated to this), and will also have the same performance issue. People who care about this sort of thing should just use MKL in their startup.jl. We can do an @info for now in Base.

We should throw a warning if anyone does `A\b` with a sparse matrix and has OpenBLAS on. Tim Davis is pretty explicit that, in his eyes, it's effectively not correct behavior, and it would be good to tell users that they are missing a ton of performance in a very well-known and fixable way. In the meantime, since we are side-stepping LinearAlgebra anyway, we will at least add the warnings to LinearSolve.jl directly.
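A hedged sketch of what such a warning could look like (this is not LinearSolve.jl's actual code; the function name and warning text are illustrative). It uses the stdlib `BLAS.get_config()` API to inspect which libraries LBT currently has loaded:

```julia
using LinearAlgebra

# Warn once if LBT is currently routing BLAS calls to OpenBLAS,
# since UMFPACK's performance suffers badly in that configuration.
function warn_if_openblas()
    libs = LinearAlgebra.BLAS.get_config().loaded_libs
    if any(lib -> occursin("openblas", lowercase(basename(lib.libname))), libs)
        @warn "Sparse factorization (UMFPACK) is much slower with OpenBLAS; " *
              "consider `using MKL` on x86 for a large speedup."
    end
    return nothing
end
```

A package could call `warn_if_openblas()` once, e.g. from its `__init__`, before the first sparse factorization.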

@ViralBShah
Member

I think those issues are addressable and we should go about fixing them. This is probably the first large scale push for MKL and it is not surprising it uncovered more issues.

Incomplete interface issues should be filed against libblastrampoline. We picked up every single thing we could find (though naturally a few might have been missed), and we even allow for BLAS extensions.

Packages that are not linked against LBT should probably be recompiled. However, if they call OpenBLAS directly, it really shouldn't be an issue. The SLICOT one is a bit puzzling, but I'll try to look into it.

@ChrisRackauckas
Author

Indeed, this was just the first large attempt to break the shackles of OpenBLAS. We will lick our wounds, fix up a few things, and come back to it with improved packages in a bit. I personally want LinearAlgebra to start shipping MKL on appropriate CPUs within the next year.

@ViralBShah
Member

Part of the fix might be this: JuliaLinearAlgebra/MKL.jl#140

@ViralBShah
Member

The OpenBLAS perf issue is resolved and has also been backported. For the SuiteSparse that ships as a stdlib, we won't be able to do this.

If you really want it, maybe there can be an external package, but I have no idea how reliably that will work.
