[enhancement]: Ship a correctly rounded threaded OpenBLAS as an Artifact #131

Open
orkolorko opened this issue Jan 31, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@orkolorko
Collaborator

orkolorko commented Jan 31, 2023

Feature description

I think it would be a good idea to ship, together with the library, a version of OpenBLAS built with the CONSISTENT_FPCSR=1 flag enabled, either as an Artifact or compiled during installation.

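The main reason is that the OpenBLAS shipped with the system (or with Julia) may not have been built with this flag.
Even if Julia is started with a single thread, OpenBLAS may still use multiple threads unless told otherwise, and these threads may run with different rounding modes: the rounding mode is per-thread state, so setting it on the calling thread does not change it on OpenBLAS's worker threads.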

Currently, one workaround that gives consistent rounding is to start Julia with the environment variable

OPENBLAS_NUM_THREADS=1

but this hurts performance, since all BLAS calls then run on a single thread.

See:
Julia Threads + BLAS Threads
Using directed rounding in Octave/Matlab

@lucaferranti
Member

Hi @orkolorko, apologies for the delay in answering.

This sounds very interesting!

Exploiting BLAS multithreading is also what makes Rump's multiplication algorithm faster. We can use matrix multiplication as a benchmark to see how this affects performance.
