The new GPU libraries (cuBLAS, clBLAS) do not actually implement CBLAS: they implement BLAS-like routines with non-standard prefixes (and I'm unsure about use from Fortran... they are certainly not binary compatible with LAPACK). Although this lets users target the GPU explicitly, it is impractical to expect users - and the many tiers of middleware above them - to make source code changes.
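Roughly, the mismatch looks like this (signatures abbreviated; the GPU entry point names are indicative only):

```c
/* Reference BLAS / ATLAS export the Fortran symbol that LAPACK and
   ARPACK link against: */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc);

/* cuBLAS and clBLAS export differently named entry points instead,
   e.g. cublasDgemm(...) and clblasDgemm(...), which also take extra
   handle/queue arguments - so neither is a drop-in replacement for
   the symbol above. */
```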
In addition, it is clear that GPU acceleration is actually slower for small arrays (unless batched, which is a non-trivial departure from the BLAS API).
A more practical solution would be to create a libblas (implementing BLAS and then wrapping with CBLAS) that delegates to the correct implementation at runtime. The deciding factors in choosing an implementation (ATLAS vs clBLAS) for each routine could be calculated empirically on a per-machine basis and saved into a config file that allows the delegating lib to decide based on the call's parameters (e.g. array size).
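A minimal sketch of that dispatch, assuming a per-machine crossover threshold (`dgemm_gpu_threshold`, read from the empirical config file at load time) and two backend function pointers resolved elsewhere - all names here are illustrative, not an existing API:

```c
#include <stddef.h>

typedef void (*dgemm_fn)(const char *, const char *, const int *, const int *,
                         const int *, const double *, const double *, const int *,
                         const double *, const int *, const double *, double *,
                         const int *);

dgemm_fn cpu_dgemm;            /* e.g. ATLAS, resolved via dlopen/dlsym */
dgemm_fn gpu_dgemm;            /* e.g. a clBLAS adapter that copies buffers to the device */
size_t   dgemm_gpu_threshold;  /* loaded from the per-machine config */

/* The symbol LAPACK/ARPACK (and netlib-java) already call. */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc) {
    /* Only pay the GPU transfer cost when the problem is big enough. */
    size_t work = (size_t)*m * (size_t)*n * (size_t)*k;
    dgemm_fn impl = (work >= dgemm_gpu_threshold) ? gpu_dgemm : cpu_dgemm;
    impl(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
}
```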
From a C perspective, I do not know how to load a library that exports functions with the same names as those we are implementing. There might need to be some dynamic library loading jiggery pokery.
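One plausible form of that jiggery pokery, assuming the `dgemm_fn` typedef and `cpu_dgemm` pointer from the previous sketch: the shim exports `dgemm_` itself and pulls the real CPU implementation out of ATLAS with `dlopen`/`dlsym`, so the two identically named symbols never clash. The library path and constructor trick are illustrative assumptions, not a worked-out design:

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

extern dgemm_fn cpu_dgemm;   /* defined in the shim, see previous sketch */

/* Resolve the CPU backend once, when the shim is loaded. RTLD_LOCAL keeps
   ATLAS's own dgemm_ out of the global symbol table, so it cannot shadow
   the dgemm_ that the shim itself exports. */
__attribute__((constructor))
static void load_cpu_backend(void) {
    void *atlas = dlopen("libatlas.so", RTLD_NOW | RTLD_LOCAL); /* path is an assumption */
    if (!atlas) {
        fprintf(stderr, "failed to load CPU BLAS: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }
    cpu_dgemm = (dgemm_fn) dlsym(atlas, "dgemm_");
    if (!cpu_dgemm) {
        fprintf(stderr, "dgemm_ not found: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }
}
```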
From a Java perspective, this library would look identical to libblas, and therefore no code changes would be necessary. Note that we cannot work around the issue of name collisions, because the native LAPACK and ARPACK need to be able to call the correctly named BLAS symbols.