This repository was archived by the owner on Aug 5, 2022. It is now read-only.

BLAS desiderata

njsmith edited this page Apr 11, 2014 · 4 revisions

The numerical ecosystem could really use a modern, optionally-multithreaded BLAS under a BSD-like license with a priority on

  • Correctness
  • Out-of-the-box single-binary functionality (e.g., runtime kernel selection, runtime thread control)
  • Speed
  • Portability

...in roughly that order.
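
To make "runtime kernel selection" concrete, here is a minimal sketch of the dispatch pattern a single-binary library might use. Everything in it is hypothetical and for illustration only: the kernel names, the `detect_cpu` placeholder, and the `MYBLAS_CORETYPE` environment variable are assumptions, not the API of any existing BLAS.

```python
import os

def dot_generic(x, y):
    """Portable fallback kernel: plain dot product."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def dot_unrolled(x, y):
    """Stand-in for a CPU-specific kernel: 4-way unrolled accumulation."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x) - len(x) % 4
    acc0 = acc1 = acc2 = acc3 = 0.0
    for i in range(0, n, 4):
        acc0 += x[i] * y[i]
        acc1 += x[i + 1] * y[i + 1]
        acc2 += x[i + 2] * y[i + 2]
        acc3 += x[i + 3] * y[i + 3]
    # Handle the leftover elements that do not fill a group of four.
    tail = sum(x[i] * y[i] for i in range(n, len(x)))
    return acc0 + acc1 + acc2 + acc3 + tail

# All kernels ship in the same binary; one is picked at load time.
KERNELS = {"generic": dot_generic, "unrolled": dot_unrolled}

def detect_cpu():
    """Placeholder for real CPU detection (e.g. CPUID); always 'generic' here."""
    return "generic"

def select_kernel(env=None):
    """Return (name, kernel), letting a hypothetical env var override detection."""
    env = os.environ if env is None else env
    name = env.get("MYBLAS_CORETYPE") or detect_cpu()
    if name not in KERNELS:
        raise ValueError(f"unknown kernel {name!r}; choose from {sorted(KERNELS)}")
    return name, KERNELS[name]
```

With an override like this, a test runner can exercise every kernel on one machine by calling `select_kernel({"MYBLAS_CORETYPE": name})` for each name in `KERNELS`, rather than only the kernel the host CPU happens to select.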

OpenBLAS is currently the library that comes closest to providing these things, but it has a number of shortcomings. Fixing them would make good concrete targets for people to go after:

  • The path leading to getting a generally-useful build is lined with tricky booby-traps (e.g., automagic capping of the maximum number of threads and the famous NO_AFFINITY).
  • There are concerns about the lack of tests. That link lists a number of specific bugs that made it past the existing test suite and are still not tested for. In general, it would be very useful to build up a comprehensive set of BLAS/LAPACK tests that covers realistic problem sizes.
  • It's not possible (?) to override CPU detection at runtime, which makes it hard to run comprehensive tests.
  • The use of AT&T-syntax inline asm (?) prevents building with MSVC; using intrinsics instead might be more maintainable and would certainly be more portable.
  • ...any more?
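
As a rough illustration of the testing item above, the following sketch compares a GEMM implementation against a naive reference over a list of shapes, including sizes that are not multiples of typical block widths. The function names, the shape list, and the tolerance are assumptions made for this example. A real suite would call the library under test through its actual interface and use far larger matrices than pure Python can handle.

```python
import random

def reference_gemm(a, b):
    """Naive triple-loop product of an m-by-k and a k-by-n matrix (ground truth)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Example shapes (m, k, n); the odd sizes exercise edge and tail handling.
SHAPES = [(1, 1, 1), (3, 5, 7), (16, 16, 16), (17, 33, 9)]

def check_gemm(gemm_under_test, shapes=SHAPES, seed=0, tol=1e-12):
    """Return the list of shapes for which gemm_under_test disagrees with the reference."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    failures = []
    for m, k, n in shapes:
        a = random_matrix(m, k, rng)
        b = random_matrix(k, n, rng)
        expected = reference_gemm(a, b)
        actual = gemm_under_test(a, b)
        bound = tol * k  # absolute tolerance scaled by the length of each inner product
        ok = all(abs(actual[i][j] - expected[i][j]) <= bound
                 for i in range(m) for j in range(n))
        if not ok:
            failures.append((m, k, n))
    return failures
```

Running `check_gemm` once per available kernel is what makes a runtime override of CPU detection valuable: without one, only the kernel chosen for the host CPU ever gets tested.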